Maximum Likelihood Estimation (MLE)

Overview
Maximum likelihood estimation (MLE) is a statistical method for estimating the parameters of a probabilistic model by choosing the parameter values that make the observed data most probable under the model. MLE underpins many modern techniques, including parameter fitting in generalized linear models, likelihood-based inference that connects to Bayesian statistics, and a wide range of statistical learning methods.
In MLE, the parameter vector is chosen to maximize the likelihood function, which measures how probable the observed data are under different parameter values. Likelihood-based methods are closely related to hypothesis testing, since likelihood ratios form the basis of procedures such as the likelihood-ratio test. The approach can equivalently be expressed in terms of the log-likelihood, which turns products of probabilities into sums and simplifies computation.
A key motivation for MLE is its connection to convex optimization and numerical methods: many models yield smooth log-likelihood functions that can be optimized with standard algorithms. In practice, researchers compare fitted models using criteria such as the Akaike information criterion (AIC) or the Bayesian information criterion (BIC), especially when the candidate models have different numbers of parameters.
Suppose data \(x_1,\dots,x_n\) are modeled as independent observations from a distribution with density (or mass function) \(f(x\mid\theta)\), where \(\theta\) denotes the unknown parameters. The likelihood function is
\[ L(\theta)=\prod_{i=1}^n f(x_i\mid\theta), \]
and the MLE is any maximizer
\[ \hat{\theta}_{\text{MLE}}=\arg\max_{\theta} L(\theta). \]
Maximization is often performed on the log-likelihood
\[ \ell(\theta)=\sum_{i=1}^n \log f(x_i\mid\theta), \]
since \(\log(\cdot)\) is monotonic and preserves the maximizing argument.
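As a concrete illustration, here is a minimal Python sketch (function names are illustrative, not from any standard library) of the normal-model log-likelihood and its closed-form maximizer, the sample mean and the biased sample variance:

```python
import math

def normal_log_likelihood(data, mu, sigma2):
    """Log-likelihood of i.i.d. data under a Normal(mu, sigma2) model."""
    n = len(data)
    return (-0.5 * n * math.log(2 * math.pi * sigma2)
            - sum((x - mu) ** 2 for x in data) / (2 * sigma2))

def normal_mle(data):
    """Closed-form MLE for the normal model: sample mean and
    biased (divide-by-n) sample variance."""
    n = len(data)
    mu_hat = sum(data) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in data) / n
    return mu_hat, sigma2_hat

data = [1.2, 0.8, 1.5, 1.1, 0.9, 1.3]
mu_hat, sigma2_hat = normal_mle(data)
# The MLE attains at least as high a log-likelihood as nearby parameter values.
assert normal_log_likelihood(data, mu_hat, sigma2_hat) >= \
       normal_log_likelihood(data, mu_hat + 0.1, sigma2_hat)
```

The final assertion illustrates the defining property: perturbing the estimate away from the maximizer can only lower the log-likelihood.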
Depending on the model, the likelihood can be maximized by solving analytic equations (e.g., by setting derivatives of (\ell(\theta)) equal to zero) or by using iterative procedures. For complex models, numerical maximizers such as gradient-based methods and second-order schemes are common; these are connected to Newton's method and related optimization techniques.
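The iterative route can be sketched with Newton's method applied to the exponential-distribution log-likelihood \(\ell(\lambda)=n\log\lambda-\lambda\sum_i x_i\), whose score and curvature have simple closed forms (the helper name is an illustrative choice):

```python
def exponential_mle_newton(data, lam0=1.0, tol=1e-10, max_iter=100):
    """Newton's method on the exponential log-likelihood
    l(lam) = n*log(lam) - lam*sum(x):
    score l'(lam) = n/lam - S, curvature l''(lam) = -n/lam**2."""
    n, S = len(data), sum(data)
    lam = lam0
    for _ in range(max_iter):
        score = n / lam - S           # first derivative of l
        curv = -n / lam ** 2          # second derivative of l
        step = score / curv
        lam -= step                   # Newton update: root of the score
        if abs(step) < tol:
            break
    return lam

data = [0.5, 1.7, 0.9, 2.2, 1.1]
lam_hat = exponential_mle_newton(data)
# The exponential model also has a closed-form MLE, n / sum(x),
# so we can check the numerical answer against it.
assert abs(lam_hat - len(data) / sum(data)) < 1e-8
```

Here the closed form serves as a correctness check; in models without one, the same Newton loop (or a quasi-Newton variant) is the workhorse.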
Under regularity conditions, MLE estimators have important asymptotic properties. In particular, consistency and asymptotic normality are standard outcomes: as the sample size grows, the MLE converges to the true parameter value and the distribution of the estimator approaches a normal distribution centered at the true parameter, with variance tied to the Fisher information.
The Fisher information also provides the basis for approximate uncertainty quantification, which can be expressed through asymptotic standard errors. In that setting, likelihood methods connect to the Cramér–Rao bound, offering a lower bound on the variance of unbiased estimators. MLE is often viewed as efficient because, asymptotically, it can achieve this bound in many well-behaved models.
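A minimal sketch of Fisher-information-based standard errors, again using the exponential model, where the observed information at the MLE is \(n/\hat\lambda^2\) (the function name is an illustrative assumption):

```python
import math

def exponential_mle_with_se(data):
    """MLE of the exponential rate plus its asymptotic standard error.
    Observed information: -l''(lam_hat) = n / lam_hat**2, so
    SE ~ sqrt(1 / information) = lam_hat / sqrt(n)."""
    n = len(data)
    lam_hat = n / sum(data)          # closed-form MLE
    info = n / lam_hat ** 2          # observed Fisher information
    se = math.sqrt(1.0 / info)
    return lam_hat, se

lam_hat, se = exponential_mle_with_se([0.5, 1.7, 0.9, 2.2, 1.1])
# Approximate 95% Wald interval based on asymptotic normality:
lo, hi = lam_hat - 1.96 * se, lam_hat + 1.96 * se
```

The Wald interval at the end is exactly the "asymptotic normality plus inverse Fisher information" recipe described above.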
For many parametric families, the MLE can be computed efficiently, but for high-dimensional or non-linear models, maximizing the likelihood may require careful numerical strategies. In generalized linear models, likelihood maximization is frequently implemented via iteratively reweighted least squares or gradient-based routines. When models contain nuisance parameters or latent variables, likelihood maximization is often performed with algorithms such as the expectation–maximization (EM) algorithm.
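A compact EM sketch for the latent-variable case: a two-component Gaussian mixture with known unit variances, estimating the mixing weight and the two means (the initialization and function name are illustrative choices, not a reference implementation):

```python
import math

def em_two_gaussians(data, n_iter=50):
    """EM for a two-component Gaussian mixture with unit variances:
    estimates mixing weight pi and component means mu1, mu2."""
    pi, mu1, mu2 = 0.5, min(data), max(data)   # simple initialization
    for _ in range(n_iter):
        # E-step: responsibility of component 1 for each point
        resp = []
        for x in data:
            p1 = pi * math.exp(-0.5 * (x - mu1) ** 2)
            p2 = (1 - pi) * math.exp(-0.5 * (x - mu2) ** 2)
            resp.append(p1 / (p1 + p2))
        # M-step: weighted updates maximizing the expected log-likelihood
        r = sum(resp)
        pi = r / len(data)
        mu1 = sum(g * x for g, x in zip(resp, data)) / r
        mu2 = sum((1 - g) * x for g, x in zip(resp, data)) / (len(data) - r)
    return pi, mu1, mu2

# Two well-separated clusters around 0 and 5
data = [-0.2, 0.1, 0.0, 0.3, 4.8, 5.1, 5.0, 5.2]
pi, mu1, mu2 = em_two_gaussians(data)
```

Each iteration provably does not decrease the observed-data likelihood, which is why EM is a standard tool when direct maximization over latent variables is awkward.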
MLE may also be sensitive to modeling assumptions and identifiability. If parameters are not identifiable, multiple values can yield the same likelihood, leading to non-unique maximizers. Additionally, in small samples or when regularity conditions fail, asymptotic approximations used for inference may be unreliable, motivating diagnostics and alternative inference approaches, including bootstrap and resampling-based methods.
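The resampling alternative can be sketched as a nonparametric bootstrap: resample the data with replacement and recompute the MLE on each resample (the function name and sample data are illustrative):

```python
import random

def bootstrap_se(data, estimator, n_boot=2000, seed=0):
    """Nonparametric bootstrap: resample with replacement, re-estimate,
    and report the standard deviation of the resampled estimates."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        sample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(sample))
    mean = sum(estimates) / n_boot
    var = sum((e - mean) ** 2 for e in estimates) / (n_boot - 1)
    return var ** 0.5

# MLE of the exponential rate as the estimator: n / sum(x)
rate_mle = lambda xs: len(xs) / sum(xs)
data = [0.5, 1.7, 0.9, 2.2, 1.1, 0.4, 1.9, 0.7]
se = bootstrap_se(data, rate_mle)
```

Unlike the Fisher-information interval, this makes no appeal to asymptotic normality, which is why it is attractive in small samples or when regularity conditions are in doubt.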
MLE sits within the broader framework of statistical inference. Likelihood-based estimation connects to information criteria such as the Akaike information criterion for model selection, and to asymptotic tests derived from the likelihood function. The likelihood perspective also links MLE to Bayesian estimation through the role of the likelihood in Bayes' theorem, even though Bayesian methods summarize parameters with posterior distributions rather than point maximizers alone.
When prior information or constraints are incorporated, one obtains variants such as penalized likelihood approaches, which are central to modern regularization methods in statistical learning. These ideas intersect with regularization and with maximum a posteriori estimation in Bayesian contexts, where optimizing a posterior can be viewed as modifying the likelihood with a prior term.
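As a worked example of the penalized/MAP view, consider estimating a normal mean with known noise variance under a zero-centered normal prior: maximizing log-likelihood plus log-prior yields a closed-form shrinkage estimate (the function name and data are illustrative):

```python
def map_normal_mean(data, prior_var, noise_var=1.0):
    """MAP estimate of a Normal mean with a Normal(0, prior_var) prior.
    Penalized objective in mu:
        -(n / (2*noise_var)) * (mu - xbar)**2 - mu**2 / (2*prior_var)
    Setting its derivative to zero gives a shrinkage formula."""
    n = len(data)
    xbar = sum(data) / n
    return xbar * (n / noise_var) / (n / noise_var + 1.0 / prior_var)

data = [2.1, 1.8, 2.4, 2.0]
mle = sum(data) / len(data)             # plain MLE: the sample mean
map_est = map_normal_mean(data, prior_var=1.0)
assert abs(map_est) < abs(mle)          # the prior shrinks the estimate toward 0
```

The quadratic log-prior acts exactly like an L2 penalty on the log-likelihood, which is the sense in which MAP estimation and ridge-style regularization coincide.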
Categories: Statistical estimation, Maximum likelihood, Mathematical statistics, Optimization
This article was generated by AI using GPT Wiki. Content may contain inaccuracies. Generated on March 27, 2026. Made by Lattice Partners.