Chapter 2 Maximum Likelihood Estimation

2.1 Introduction

Maximum likelihood estimation (MLE) is a method for estimating the parameters of a statistical model, and it is one of the most widely used estimation methods.
The method of maximum likelihood selects the set of values of the model parameters that maximizes the likelihood function. Intuitively, this maximizes the “agreement” of the selected model with the observed data.
Maximum likelihood estimation gives a unified approach to estimation.

2.2 The Principle of Maximum Likelihood

We take Poisson-distributed random variables as an example. Suppose that \(X_1,X_2,\dots,X_N\) are i.i.d. discrete random variables such that \(X_i\sim Pois(\theta)\), with a pmf (probability mass function) defined as:

\[ Pr(X_i = x_i) = \frac{\exp(-\theta)\theta^{x_i}}{x_i!} \]

where \(\theta\) is an unknown parameter to estimate.

Question: What is the probability of observing the particular sample \(\{x_1, x_2,\dots, x_N\}\), assuming that a Poisson distribution with as yet unknown parameter \(\theta\) generated the data?

This probability is equal to

\[ Pr((X_1 = x_1)\cap\dots\cap (X_N = x_N )) \]

Since the variables \(X_i\) are i.i.d., this joint probability is equal to the product of the marginal probabilities:

\[ Pr((X_1 = x_1)\cap\dots\cap (X_N = x_N )) = \prod_{i=1}^N Pr(X_i = x_i) \]

Given the pmf of the Poisson distribution, we have:

\[ Pr((X_1 = x_1)\cap\dots\cap (X_N = x_N )) = \prod_{i=1}^N \frac{\exp(-\theta)\theta^{x_i}}{x_i!} = \exp(-\theta N)\frac{\theta^{\sum_{i=1}^N x_i}}{\prod_{i=1}^N x_i !} \]
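The equality between the product of pmfs and the collapsed closed form can be checked numerically. The following is a minimal sketch using only the Python standard library; the sample `x` and the trial value `theta = 1.5` are hypothetical and chosen purely for illustration.

```python
import math

# Hypothetical count data and trial parameter value (illustration only).
x = [2, 0, 3, 1, 4]
theta = 1.5


def poisson_pmf(k, theta):
    """Pr(X = k) for X ~ Pois(theta)."""
    return math.exp(-theta) * theta**k / math.factorial(k)


# Joint probability as the product of the marginal pmfs.
product_form = math.prod(poisson_pmf(k, theta) for k in x)

# The same quantity from the collapsed closed-form expression:
# exp(-theta * N) * theta^(sum of x_i) / prod(x_i!)
n = len(x)
closed_form = (
    math.exp(-theta * n)
    * theta ** sum(x)
    / math.prod(math.factorial(k) for k in x)
)

print(product_form, closed_form)
```

Up to floating-point rounding, the two printed values should agree for any \(\theta > 0\).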

This joint probability is a function of \(\theta\) (the unknown parameter) and corresponds to the likelihood of the sample \(\{x_1, x_2,\dots, x_N\}\) denoted by

\[ \mathcal{L}(x_1,\dots,x_N|\theta) = Pr((X_1 = x_1)\cap\dots\cap (X_N = x_N )) \]

Consider maximizing the likelihood function \(\mathcal{L}(x_1,\dots,x_N|\theta)\) with respect to \(\theta\). Since the log function is strictly increasing, \(\ln \mathcal{L}(x_1,\dots,x_N|\theta)\) attains its maximum at the same \(\theta\), and it is usually easier to work with, so we maximize it instead. We call this the log-likelihood function: \(\ell(x_1,\dots,x_N|\theta) = \ln \mathcal{L}(x_1,\dots,x_N|\theta)\), or simply \(\ell(\theta)\). In this case:

\[ \ell (x_1,\dots,x_N|\theta) = -\theta N + \ln(\theta)\sum_{i=1}^N x_i - \ln(\prod_{i=1}^N x_i !) \]

The simplest way to find the \(\theta\) that maximizes \(\ell(\theta)\) is to differentiate with respect to \(\theta\) and set the derivative equal to zero. The derivative is:

\[ \frac{\partial\ell(\theta)}{\partial \theta} = -N + \frac{1}{\theta}\sum_{i=1}^N x_i \]

To make sure that we maximize rather than minimize \(\ell(\theta)\), we should also check that the second derivative is negative, which holds for \(\theta > 0\) whenever at least one \(x_i\) is positive:

\[ \frac{\partial^2\ell(\theta)}{\partial \theta^2} = - \frac{1}{\theta^2}\sum_{i=1}^N x_i < 0 \]
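As a sanity check on the two derivatives, they can be compared against central finite differences of \(\ell(\theta)\). This is only a sketch: the sample `x`, the evaluation point `theta = 1.5`, and the step size `h` are hypothetical choices.

```python
import math

x = [2, 0, 3, 1, 4]  # hypothetical sample
n = len(x)


def ell(theta):
    """Poisson log-likelihood for the sample x."""
    return (
        -theta * n
        + math.log(theta) * sum(x)
        - sum(math.log(math.factorial(k)) for k in x)
    )


def first_derivative(theta):
    """Analytic first derivative: -N + (1/theta) * sum(x_i)."""
    return -n + sum(x) / theta


def second_derivative(theta):
    """Analytic second derivative: -(1/theta^2) * sum(x_i)."""
    return -sum(x) / theta**2


theta = 1.5
h = 1e-4  # finite-difference step (an arbitrary small value)

# Central finite-difference approximations of the two derivatives.
first_fd = (ell(theta + h) - ell(theta - h)) / (2 * h)
second_fd = (ell(theta + h) - 2 * ell(theta) + ell(theta - h)) / h**2

print(first_derivative(theta), first_fd)
print(second_derivative(theta), second_fd)
```

The analytic and numerical values should agree to several decimal places, and the second derivative should be negative.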

Therefore, the maximum likelihood estimator \(\hat{\theta}_{mle}\) is:

\[ \hat{\theta}_{mle} = \frac{1}{N}\sum_{i=1}^N x_i \]
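The closed-form result can be compared with a brute-force maximization of \(\ell(\theta)\). The sketch below evaluates the log-likelihood on a grid and compares the best grid point with the sample mean; the sample `x` and the grid range are hypothetical choices, and a grid search stands in for a proper numerical optimizer.

```python
import math

x = [2, 0, 3, 1, 4]  # hypothetical sample
n = len(x)


def log_likelihood(theta):
    """Poisson log-likelihood ell(theta) for the sample x."""
    return (
        -theta * n
        + math.log(theta) * sum(x)
        - sum(math.log(math.factorial(k)) for k in x)
    )


# Evaluate ell(theta) on the grid 0.01, 0.02, ..., 10.00 and keep the best point.
grid = [i / 100 for i in range(1, 1001)]
theta_grid = max(grid, key=log_likelihood)

# Closed-form MLE: the sample mean.
theta_closed = sum(x) / n

print(theta_grid, theta_closed)
```

The grid maximizer can only match the sample mean up to the grid spacing; a finer grid or a numerical optimizer would narrow the gap when the sample mean does not fall on a grid point.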

For the Laplace model with location parameter \(\mu\) and scale parameter \(b\), the maximum-likelihood estimates are:

\[ \hat{\mu} = median(x_1,\dots,x_n)\\ \hat{b} = \frac{1}{n}\sum_{i=1}^n|x_i-\hat{\mu}| \]

Note that these differ from the method-of-moments (MOM) estimates.
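The Laplace estimates are straightforward to compute. In the sketch below, the sample `x` is hypothetical. The moment-based estimates use the standard Laplace moment relations (mean \(\mu\), variance \(2b^2\)); this is an assumption about which MOM results the text refers to.

```python
import math
import statistics

x = [1.2, -0.4, 3.1, 0.7, 2.5]  # hypothetical sample
n = len(x)

# Maximum-likelihood estimates: sample median and mean absolute deviation
# around the median.
mu_mle = statistics.median(x)
b_mle = sum(abs(xi - mu_mle) for xi in x) / n

# Moment-based estimates (assumed form): match the sample mean to mu and
# the sample variance (divisor n) to 2 * b^2.
mu_mom = statistics.fmean(x)
b_mom = math.sqrt(statistics.pvariance(x) / 2)

print(mu_mle, b_mle)
print(mu_mom, b_mom)
```

On this sample the median and the mean differ, so the two sets of estimates do not coincide.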

2.3 Example (Exponential Distribution)

Suppose that \(D_1, D_2,\dots, D_N\) are i.i.d. positive random variables (durations, for instance), with \(D_i\sim Exp (\theta)\) and \(\theta > 0\). The likelihood and log-likelihood contributions of observation \(i\) are:

\[ \mathcal{L}_i(d_i|\theta) = f_{D_i}(d_i|\theta) = \frac{1}{\theta}\exp\left(-\frac{d_i}{\theta}\right)\\ \ell_i (d_i|\theta) = \ln(f_{D_i}(d_i|\theta)) = -\ln(\theta)-\frac{d_i}{\theta} \]
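The per-observation contributions can be evaluated directly. The sketch below uses hypothetical durations `d` and a trial value `theta = 2.0`; it checks that each \(\ell_i\) equals the log of the density and, using the i.i.d. assumption, sums the contributions into a sample log-likelihood.

```python
import math

d = [0.8, 2.3, 1.1, 4.0]  # hypothetical durations
theta = 2.0               # arbitrary trial value of the parameter


def density(d_i, theta):
    """Exponential density with mean theta: (1/theta) * exp(-d_i/theta)."""
    return math.exp(-d_i / theta) / theta


def log_lik_i(d_i, theta):
    """Log-likelihood contribution of one observation: -ln(theta) - d_i/theta."""
    return -math.log(theta) - d_i / theta


contributions = [log_lik_i(d_i, theta) for d_i in d]

# Under independence, the sample log-likelihood is the sum of contributions.
total = sum(contributions)

for d_i, ell_i in zip(d, contributions):
    print(d_i, ell_i, math.log(density(d_i, theta)))
print(total)
```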