| Bayesian statistics | |
| --- | --- |
| Core idea | Updating probability distributions for uncertain parameters using Bayes’ theorem |
| Primary theorem | Bayes’ theorem |
| Common computational methods | Markov chain Monte Carlo, variational inference |
| Relationship to other frameworks | Alternative to frequentist inference |
Bayesian statistics is a branch of statistics that uses probability to quantify uncertainty and updates that uncertainty as new evidence becomes available. It is based on Bayes’ theorem and contrasts with frequentist approaches, which typically treat parameters as fixed but unknown quantities rather than random variables. Bayesian methods are used widely in machine learning, data analysis, and scientific inference, including applications that rely on hierarchical models and Markov chain Monte Carlo (MCMC) computation.
Bayesian statistics formalizes inference using Bayesian probability, in which all quantities of interest (such as model parameters and latent variables) may be represented by probability distributions. The key mathematical relationship is Bayes’ theorem, which expresses how a prior belief about an unknown quantity is revised in light of observed data to obtain a posterior distribution. In this framework, the likelihood function links parameters to data, while the prior distribution represents uncertainty before observing the data.
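In symbols, for a parameter $\theta$ and observed data $y$, Bayes’ theorem states

$$p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{p(y)}, \qquad p(y) = \int p(y \mid \theta)\, p(\theta)\, d\theta,$$

where $p(\theta)$ is the prior, $p(y \mid \theta)$ the likelihood, $p(y)$ the marginal likelihood (or evidence), and $p(\theta \mid y)$ the posterior.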
A central output of Bayesian analysis is the posterior distribution, from which point estimates and uncertainty intervals can be derived. Common summaries include the posterior mean, median, and mode, as well as credible intervals, which contain a stated probability mass under the posterior distribution. Bayesian inference also supports decision-making through expected utility, connecting it to decision theory and related concepts such as loss functions.
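As a minimal sketch of how such summaries are computed in practice, the following example draws samples from an assumed Beta(12, 5) posterior (the numbers are purely illustrative) and reports the posterior mean, median, and a 95% equal-tailed credible interval:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Illustrative posterior: Beta(12, 5), e.g. from a Beta(2, 1) prior
# updated with 10 successes in 14 Bernoulli trials.
posterior = stats.beta(12, 5)
samples = posterior.rvs(size=100_000, random_state=rng)

print("posterior mean:  ", samples.mean())
print("posterior median:", np.median(samples))
# Equal-tailed 95% credible interval: the central region holding
# 95% of the posterior probability mass.
lo, hi = np.quantile(samples, [0.025, 0.975])
print(f"95% credible interval: ({lo:.3f}, {hi:.3f})")
```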
Bayesian models typically consist of a prior distribution for unknown parameters, a likelihood function for the observed data given those parameters, and a mechanism for computing the posterior. When the likelihood and prior are chosen to form a conjugate family, the posterior may have a closed-form expression, simplifying computation and interpretation. In practice, conjugacy is not always available, so numerical methods are frequently required.
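A standard illustration of conjugacy is the Beta–Binomial pair: if $\theta$ has a Beta prior and $k$ successes are observed in $n$ Bernoulli trials, the posterior is again a Beta distribution,

$$\theta \sim \mathrm{Beta}(\alpha, \beta), \quad k \mid \theta \sim \mathrm{Binomial}(n, \theta) \;\Longrightarrow\; \theta \mid k \sim \mathrm{Beta}(\alpha + k,\; \beta + n - k),$$

so updating reduces to adding the observed counts to the prior hyperparameters.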
Posterior inference depends critically on the prior specification. For example, noninformative and weakly informative priors are used when the intent is to minimize prior influence, while informative priors can incorporate external knowledge. Bayesian analysis also commonly uses hierarchical Bayesian models, in which parameters themselves have distributions governed by hyperparameters. This can improve estimation in settings with limited data and supports partial pooling across groups, as sketched below.
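To illustrate partial pooling, the following sketch applies an empirical-Bayes version of the normal–normal hierarchical model: each group mean is shrunk toward the grand mean, with shrinkage determined by the ratio of within-group to between-group variance. The simulated data and the moment-based estimate of the between-group variance are illustrative assumptions, not a full Bayesian treatment (which would also place a prior on the hyperparameters).

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 8 groups with a few noisy observations each.
true_means = rng.normal(0.0, 1.0, size=8)
n_obs = 5
data = rng.normal(true_means[:, None], 2.0, size=(8, n_obs))

group_means = data.mean(axis=1)
# Approximate sampling variance of each group mean (common across groups).
sigma2 = data.var(axis=1, ddof=1).mean() / n_obs
grand_mean = group_means.mean()

# Method-of-moments estimate of the between-group variance tau^2.
tau2 = max(group_means.var(ddof=1) - sigma2, 1e-8)

# Partial pooling: each group's estimate is a precision-weighted
# compromise between its own mean and the grand mean.
shrinkage = sigma2 / (sigma2 + tau2)
partially_pooled = shrinkage * grand_mean + (1 - shrinkage) * group_means

print("no pooling:     ", np.round(group_means, 2))
print("partial pooling:", np.round(partially_pooled, 2))
```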
Because the posterior distribution often lacks a closed-form solution, Bayesian statistics relies on computational techniques. Markov chain Monte Carlo methods generate samples from the posterior by constructing a Markov chain whose stationary distribution equals the posterior. Well-known MCMC algorithms include Gibbs sampling and Metropolis–Hastings, each requiring careful consideration of convergence diagnostics and computational cost.
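As a concrete sketch, the following minimal random-walk Metropolis–Hastings sampler targets the illustrative Beta(12, 5) posterior used above; the proposal scale, chain length, and burn-in are illustrative choices, and a real analysis would add convergence diagnostics such as trace plots or R-hat.

```python
import numpy as np

rng = np.random.default_rng(2)

def log_post(theta):
    # Unnormalized log posterior: Beta(12, 5) density on (0, 1).
    if not 0.0 < theta < 1.0:
        return -np.inf
    return 11 * np.log(theta) + 4 * np.log(1 - theta)

n_iter, scale = 20_000, 0.2
chain = np.empty(n_iter)
theta = 0.5                              # starting value
lp = log_post(theta)
for i in range(n_iter):
    prop = theta + scale * rng.normal()  # symmetric random-walk proposal
    lp_prop = log_post(prop)
    # Accept with probability min(1, posterior ratio).
    if np.log(rng.uniform()) < lp_prop - lp:
        theta, lp = prop, lp_prop
    chain[i] = theta

burned = chain[5_000:]                   # discard burn-in
print("MCMC mean:", burned.mean(), " exact mean:", 12 / 17)
```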
Alternative approaches include variational inference, which approximates the posterior using a family of distributions and optimizes an objective to find the closest approximation under a divergence measure. Variational methods can be faster than MCMC but may introduce approximation error. For model checking and assessment, Bayesian workflows often use posterior predictive checks to evaluate whether simulated data from the posterior replicate salient features of the observed data.
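The following sketch illustrates a posterior predictive check in the Beta–Binomial setting: parameter draws from the posterior are used to simulate replicated datasets, and a test statistic (here the success count) on the replicates is compared with the observed value. The data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

n, k = 14, 10                      # hypothetical data: 10 successes in 14 trials
alpha, beta = 2 + k, 1 + n - k     # Beta(2, 1) prior -> Beta(12, 5) posterior

# Simulate replicated datasets by first drawing theta from the posterior,
# then drawing new data from the likelihood at that theta.
theta_draws = rng.beta(alpha, beta, size=10_000)
k_rep = rng.binomial(n, theta_draws)

# Posterior predictive p-value: how extreme is the observed statistic
# relative to its replicated distribution?
p_val = np.mean(k_rep >= k)
print(f"Pr(k_rep >= k_obs) = {p_val:.3f}")
```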
Bayesian statistics is used for both estimation and model comparison. In estimation, the posterior distribution provides a complete probabilistic description of uncertainty in parameters and predictions. Predictive distributions can be obtained by averaging over parameter uncertainty, producing Bayesian predictive intervals and robust forecasts.
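Formally, the posterior predictive distribution of a future observation $\tilde{y}$ averages the likelihood over the posterior:

$$p(\tilde{y} \mid y) = \int p(\tilde{y} \mid \theta)\, p(\theta \mid y)\, d\theta.$$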
For model comparison, Bayesians often employ Bayes factors, which compare the relative evidence for competing models based on their marginal likelihoods. In many applications, model specification is aided by probabilistic graphical structures, such as Bayesian networks, which encode conditional dependencies among random variables. Such tools support scalable inference in systems where variables have structured relationships, including domains like medical diagnosis, engineering systems, and natural language processing.
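As a small worked example (again with hypothetical binomial data), the following compares a point-null model $M_0\colon \theta = 0.5$ against $M_1\colon \theta \sim \mathrm{Uniform}(0, 1)$. Under the uniform prior the marginal likelihood has a closed form via the Beta function, so the Bayes factor can be computed exactly:

```python
from math import comb
import numpy as np
from scipy.special import betaln

n, k = 14, 10                      # hypothetical data

# M0: theta fixed at 0.5 -> marginal likelihood is the binomial pmf.
log_m0 = np.log(comb(n, k)) + n * np.log(0.5)

# M1: theta ~ Uniform(0, 1) -> marginal likelihood C(n, k) * B(k+1, n-k+1).
log_m1 = np.log(comb(n, k)) + betaln(k + 1, n - k + 1)

bf_10 = np.exp(log_m1 - log_m0)
print(f"Bayes factor BF10 = {bf_10:.2f}")   # relative evidence for M1 over M0
```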
In applied settings, Bayesian practice also involves sensitivity analysis to understand how conclusions depend on prior choices and modeling assumptions. This is particularly relevant when priors are not strongly justified by domain knowledge or when data are limited. Robustness considerations are also influenced by the choice of likelihood, transformation of variables, and treatment of missing data.
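A minimal sensitivity check in the conjugate setting is to re-fit the posterior under several plausible priors and compare the resulting summaries; the prior grid below is illustrative.

```python
from scipy import stats

n, k = 14, 10                                   # hypothetical data

# Compare posterior summaries under alternative Beta priors.
for a0, b0, label in [(1, 1, "uniform"),
                      (0.5, 0.5, "Jeffreys"),
                      (2, 2, "weakly informative"),
                      (8, 2, "informative, optimistic")]:
    post = stats.beta(a0 + k, b0 + n - k)
    lo, hi = post.ppf([0.025, 0.975])
    print(f"{label:25s} mean={post.mean():.3f}  95% CI=({lo:.3f}, {hi:.3f})")
```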
Bayesian methods have been widely adopted, but they also face common criticisms and limitations. One concern is the subjectivity implied by prior selection, which can be addressed by using transparent prior elicitation methods, conducting sensitivity analyses, and reporting how results change under alternative plausible priors. Computational challenges are another limitation: MCMC and variational inference can be expensive, and convergence failures or poor approximations may lead to misleading conclusions.
Interpretation of Bayesian uncertainty can also differ from frequentist practice. For example, credible intervals are interpreted probabilistically under the posterior distribution, while frequentist confidence intervals have coverage properties defined in repeated-sampling thought experiments. This difference affects how results are communicated and validated. Bayesian model assessment additionally depends on the adequacy of the assumed likelihood and prior structures, so misspecification can propagate into posterior summaries and predictions.
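The numerical difference can be seen directly in the binomial setting by comparing a 95% equal-tailed credible interval from the Beta posterior with the standard Wald confidence interval for the same (hypothetical) data:

```python
import numpy as np
from scipy import stats

n, k = 14, 10                      # hypothetical data
p_hat = k / n

# Bayesian: 95% equal-tailed credible interval from the
# Beta(1 + k, 1 + n - k) posterior under a uniform prior.
cred = stats.beta(1 + k, 1 + n - k).ppf([0.025, 0.975])

# Frequentist: 95% Wald interval, p_hat +/- z * sqrt(p_hat (1 - p_hat) / n).
se = np.sqrt(p_hat * (1 - p_hat) / n)
wald = p_hat + np.array([-1, 1]) * stats.norm.ppf(0.975) * se

print("credible interval:  ", np.round(cred, 3))
print("confidence interval:", np.round(wald, 3))
```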
Categories: Bayesian inference, Statistics, Probability theory, Statistical modeling