Chapter 11 Additional Properties of Estimators
11.1 Introduction
In this section, we introduce four key concepts associated with estimators, especially maximum likelihood estimators (MLE):
Sufficiency considers the question of what information from the data $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ is sufficient to estimate a population parameter $\theta$ without loss of information. Often sufficient statistics will take the form of a summary statistic, for example, the sample mean.
Minimising the mean square error (MSE) of an estimator is a desirable property. For unbiased estimators, minimising the variance of the estimator is equivalent to minimising the MSE. We introduce the Cramér-Rao lower bound, which is the minimum variance obtainable by an unbiased estimator, and the concept of minimum variance unbiased estimators (MVUE) as estimators which attain the Cramér-Rao lower bound.
For large $n$, the MLE $\hat{\theta}$ is approximately normally distributed about the true population parameter $\theta$ with variance determined by the second derivative of the log-likelihood. The variance of the asymptotic normal distribution coincides with the Cramér-Rao lower bound, providing further support for using maximum likelihood estimation.

Finally, the invariance property states that if $\hat{\theta}$ is the MLE of $\theta$, then $g(\hat{\theta})$ is the MLE of $g(\theta)$ for suitable functions $g$, so the good properties of the MLE carry over to functions of the parameter.
11.2 Sufficiency
Sufficient Statistic
Let $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ where $X_1, X_2, \ldots, X_n$ are i.i.d. random variables dependent on a parameter $\theta$. A statistic $T = T(\mathbf{X})$ is sufficient for $\theta$ if the conditional distribution of $\mathbf{X}$ given $T = t$ does not depend on $\theta$, that is,
$$ f(\mathbf{x} \,|\, T = t, \theta) = k(\mathbf{x}), $$
where $k(\mathbf{x})$ is a function of $\mathbf{x}$ only. Thus, $T(\mathbf{X})$ contains all the information in the sample about $\theta$.
The key point is that a sufficient statistic, as the name suggests, is sufficient for the estimation of a parameter $\theta$. This is particularly useful if the sufficient statistic is a low-dimensional summary of the data. As the following examples show, in many cases there is a one-dimensional summary statistic of the data which is sufficient to estimate the population parameter of interest, $\theta$. The Neyman-Fisher factorisation criterion provides an easy-to-check condition for sufficiency.
Neyman-Fisher factorisation criterion
The statistic $T = T(\mathbf{x})$ is sufficient for $\theta$ if and only if one can factor the likelihood function such that
$$ L(\theta; \mathbf{x}) = g(T(\mathbf{x}), \theta) \, h(\mathbf{x}), $$
where $h(\mathbf{x})$ does not depend on $\theta$ (whenever $h(\mathbf{x}) > 0$) and $g$ is a non-negative function of $T(\mathbf{x})$ and $\theta$ only.
The Neyman-Fisher factorisation criterion is equivalent to the log-likelihood function being expressible in the form:
$$ \ell(\theta; \mathbf{x}) = \log g(T(\mathbf{x}), \theta) + \log h(\mathbf{x}). $$
Then if we differentiate with respect to $\theta$, we have that
$$ \frac{\partial \ell(\theta; \mathbf{x})}{\partial \theta} = \frac{\partial}{\partial \theta} \log g(T(\mathbf{x}), \theta). $$
Setting $\frac{\partial \ell(\theta; \mathbf{x})}{\partial \theta} = 0$ and solving to obtain the MLE is equivalent to solving
$$ \frac{\partial}{\partial \theta} \log g(T(\mathbf{x}), \theta) = 0. $$
We observe that $h(\mathbf{x})$ plays no role in the computation of the MLE and that $g$ is a function of the sufficient statistic $T(\mathbf{x})$ and $\theta$ only.
Let $X_1, X_2, \ldots, X_n$ be a random sample from $N(\mu, \sigma^2)$, where $\sigma^2$ is known. Show that $\bar{X}$ is sufficient for $\mu$.
Consider the likelihood function:
$$ L(\mu; \mathbf{x}) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x_i - \mu)^2}{2\sigma^2} \right) = (2\pi\sigma^2)^{-n/2} \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^n (x_i - \bar{x})^2 \right) \exp\left( -\frac{n(\bar{x} - \mu)^2}{2\sigma^2} \right), $$
using $\sum_{i=1}^n (x_i - \mu)^2 = \sum_{i=1}^n (x_i - \bar{x})^2 + n(\bar{x} - \mu)^2$.
Therefore, letting $g(\bar{x}, \mu) = \exp\left( -\frac{n(\bar{x} - \mu)^2}{2\sigma^2} \right)$ and $h(\mathbf{x}) = (2\pi\sigma^2)^{-n/2} \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^n (x_i - \bar{x})^2 \right)$, we can factor the likelihood function. So, by the Neyman-Fisher factorisation criterion, $\bar{X}$ is a sufficient statistic for $\mu$.
Remember that in Section 10.3, Example 10.3.7, we have shown that the sample mean $\bar{X}$ is the MLE of $\mu$ for $N(\mu, \sigma^2)$, so the MLE is a function of the sufficient statistic.
Let $X_1, X_2, \ldots, X_n$ be i.i.d. random variables from a Poisson distribution with parameter $\lambda$. Show that $T(\mathbf{X}) = \sum_{i=1}^n X_i$ is a sufficient statistic for $\lambda$ using the Neyman-Fisher factorisation criterion.
The likelihood function is
$$ L(\lambda; \mathbf{x}) = \prod_{i=1}^n \frac{\lambda^{x_i} e^{-\lambda}}{x_i!} = \lambda^{\sum_{i=1}^n x_i} e^{-n\lambda} \times \frac{1}{\prod_{i=1}^n x_i!}. $$
If we let $g\left(\sum_{i=1}^n x_i, \lambda\right) = \lambda^{\sum_{i=1}^n x_i} e^{-n\lambda}$ and $h(\mathbf{x}) = \frac{1}{\prod_{i=1}^n x_i!}$, then we have factorised the likelihood function according to the Neyman-Fisher factorisation criterion. So, $T(\mathbf{X}) = \sum_{i=1}^n X_i$ must be a sufficient statistic for $\lambda$.
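To see the factorisation in action, here is a minimal R sketch (an illustration added here, not part of the original notes; the data vectors and the grid of $\lambda$ values are arbitrary choices). It evaluates the $\lambda$-dependent factor $\log g\left(\sum_i x_i, \lambda\right) = \left(\sum_i x_i\right) \log \lambda - n\lambda$ for two different samples with the same total: the two curves are identical, so the MLE depends on the data only through the sufficient statistic.

```r
# The lambda-dependent factor of the Poisson likelihood depends on the data only
# through T(x) = sum(x); the h(x) = 1/prod(x!) term is omitted as it does not
# involve lambda.
log_g <- function(lambda, x) {
  sum(x) * log(lambda) - length(x) * lambda
}

x1 <- c(2, 0, 3, 1, 4)   # sum = 10, n = 5
x2 <- c(2, 2, 2, 2, 2)   # different data, same sum and same n

lambda_grid <- seq(0.1, 6, by = 0.01)
g1 <- sapply(lambda_grid, log_g, x = x1)
g2 <- sapply(lambda_grid, log_g, x = x2)

all.equal(g1, g2)              # TRUE: identical functions of lambda
lambda_grid[which.max(g1)]     # both maximised at the sample mean, 2
```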
Note that
- Generally we prefer to use a sufficient statistic as an estimator for $\theta$, since the sufficient statistic uses all of the sample information to estimate $\theta$.
- Sufficient statistics always exist, since $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ is itself a sufficient statistic. However, we would prefer a statistic that has as low a dimension as possible. A sufficient statistic with the lowest possible dimension is called a minimal sufficient statistic.
- The MLE, if it exists, will always be a function of a sufficient statistic.
11.3 Minimum variance estimators
Given a population parameter $\theta$, does there exist a best estimator in general?
Recall that in our previous discussion in Section 9.3 on qualities of estimators, we said we would prefer an estimator with as small an MSE as possible. Unfortunately, if we consider the class of all estimators for a particular parameter, no single estimator minimises the MSE for every value of the parameter. If we restrict ourselves to particular classes of estimators, then such optimality results do exist.
Let us constrain ourselves to the class of unbiased estimators. Suppose that the random variables $X_1, X_2, \ldots, X_n$ and their distributions satisfy the following regularity conditions:
- The range of the random variables does not depend on $\theta$. The random variable $X \sim U(0, \theta)$, whose range is $(0, \theta)$, is an example that does not satisfy this condition.
- The likelihood function is sufficiently smooth to allow us to interchange the operations of differentiation and integration.
- The second derivative of the log-likelihood function exists.
Cramér-Rao inequality
Under the above regularity conditions, if $\hat{\theta}$ is an unbiased estimator of $\theta$, then
$$ \mathrm{Var}(\hat{\theta}) \geq \frac{1}{I(\theta)}, $$
where
$$ I(\theta) = E\left[ -\frac{\partial^2}{\partial \theta^2} \log L(\theta; \mathbf{X}) \right]. $$
Fisher’s information
$I(\theta) = E\left[ -\dfrac{\partial^2}{\partial \theta^2} \log L(\theta; \mathbf{X}) \right]$ is called the expected information or Fisher's information.
Cramér-Rao lower bound
$\dfrac{1}{I(\theta)}$ is called the Cramér-Rao lower bound.
The Cramér-Rao inequality implies that the smallest the variance of any unbiased estimator can be is $\dfrac{1}{I(\theta)}$.
Minimum variance unbiased estimator (MVUE)
If an unbiased estimator $\hat{\theta}$ is such that $\mathrm{Var}(\hat{\theta}) = \dfrac{1}{I(\theta)}$, then we say that $\hat{\theta}$ is a minimum variance unbiased estimator (MVUE), as no other unbiased estimator will be able to obtain a smaller variance.
Suppose $X_1, X_2, \ldots, X_n$ are i.i.d. random variables from a Poisson distribution with parameter $\lambda$. Does the maximum likelihood estimator $\hat{\lambda} = \bar{X}$ achieve the Cramér-Rao lower bound?
Firstly note that
$$ E[\bar{X}] = \frac{1}{n} \sum_{i=1}^n E[X_i] = \frac{1}{n} \cdot n\lambda = \lambda. $$
Therefore $\bar{X}$ is an unbiased estimator of $\lambda$. Now the log-likelihood is
$$ \ell(\lambda; \mathbf{x}) = \sum_{i=1}^n x_i \log \lambda - n\lambda - \sum_{i=1}^n \log(x_i!), $$
so
$$ \frac{\partial^2 \ell(\lambda; \mathbf{x})}{\partial \lambda^2} = -\frac{\sum_{i=1}^n x_i}{\lambda^2} \qquad \text{and} \qquad I(\lambda) = E\left[ \frac{\sum_{i=1}^n X_i}{\lambda^2} \right] = \frac{n\lambda}{\lambda^2} = \frac{n}{\lambda}. $$
Hence the Cramér-Rao lower bound is $\dfrac{1}{I(\lambda)} = \dfrac{\lambda}{n}$.
Now, since $X_i \sim \mathrm{Po}(\lambda)$, $\mathrm{Var}(\bar{X}) = \dfrac{\mathrm{Var}(X_1)}{n} = \dfrac{\lambda}{n}$. Therefore, $\bar{X}$ attains the Cramér-Rao lower bound and is a MVUE for $\lambda$.
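The following short simulation (added as an illustration; the values of $\lambda$, $n$ and the number of replications are arbitrary choices) checks numerically that the variance of the Poisson MLE $\bar{X}$ is close to the Cramér-Rao lower bound $\lambda/n$.

```r
set.seed(42)
lambda <- 3
n <- 50

# Sampling distribution of the MLE over repeated samples of size n
mle <- replicate(10000, mean(rpois(n, lambda)))

var(mle)       # empirical variance of the MLE
lambda / n     # Cramer-Rao lower bound = 3/50 = 0.06
```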
11.4 Asymptotic normality of the MLE
Asymptotic normality of the MLE
If $\hat{\theta}$ is the MLE of $\theta$, then under certain regularity conditions it can be shown that
$$ \sqrt{I(\theta)} \left( \hat{\theta} - \theta \right) \xrightarrow{D} N(0, 1) \qquad \text{as } n \to \infty. $$
Hence, approximately for sufficiently large sample sizes,
$$ \hat{\theta} \sim N\left( \theta, \frac{1}{I(\theta)} \right). $$
As a consequence the MLE has the following asymptotic properties:
- $\hat{\theta}$ is asymptotically unbiased;
- $\hat{\theta}$ is asymptotically fully efficient, that is, the variance of $\hat{\theta}$ approaches the Cramér-Rao lower bound $\dfrac{1}{I(\theta)}$;
- $\hat{\theta}$ is asymptotically normally distributed.
Although the asymptotic properties of the MLE are quite good, they hold only for sufficiently large samples. The properties do not necessarily hold for small samples, and for any finite sample they are approximations. The asymptotic normality of the MLE is closely related to the Central Limit Theorem, and consequently the quality of the approximation will depend on the underlying distribution.
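As an illustration (a sketch added here, not part of the original notes; the Poisson model and the values of $\lambda$ and $n$ are arbitrary choices), the following R code simulates the sampling distribution of the Poisson MLE $\bar{X}$ and overlays its asymptotic normal approximation $N(\lambda, \lambda/n)$.

```r
set.seed(11)
lambda <- 2
n <- 100

# Sampling distribution of the MLE over repeated samples of size n
mle <- replicate(10000, mean(rpois(n, lambda)))

hist(mle, breaks = 50, freq = FALSE,
     main = "Sampling distribution of the Poisson MLE", xlab = "MLE")
curve(dnorm(x, mean = lambda, sd = sqrt(lambda / n)), add = TRUE, lwd = 2)
```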
11.5 Invariance property
If $\phi = g(\theta)$, where $g$ is a one-to-one monotonic function of $\theta$, then $\hat{\phi} = g(\hat{\theta})$ is the MLE of $\phi$, and for large $n$:
$$ \hat{\phi} \approx N\left( g(\theta), \; \left[ g'(\theta) \right]^2 \frac{1}{I(\theta)} \right), $$
where $g'(\theta) = \dfrac{d}{d\theta} g(\theta)$.
Note that for $g(\hat{\theta})$ to be the MLE of $g(\theta)$ it is not necessary for $g$ to be strictly one-to-one. It is sufficient for the range of $g$ to be an interval.
Properties of Poisson MLE
Let $X_1, X_2, \ldots, X_n$ be a random sample from a Poisson distribution with parameter $\lambda$. We have shown that $\hat{\lambda} = \bar{X}$ is the MLE of $\lambda$.
- What is its asymptotic distribution?
- Compute $P(X = 0)$.
- Find the MLE for $P(X = 0)$ and its asymptotic distribution.
- An alternative approach to estimating $P(X = 0)$ is the proportion of observations which are equal to 0, $\tilde{p} = \frac{1}{n} \sum_{i=1}^n I(X_i = 0)$, where $I(A) = 1$ if the event $A$ occurs and 0 otherwise. Show that $\tilde{p}$ is unbiased and find its asymptotic distribution.
Attempt Example 11.5.1: Properties of Poisson MLE and then watch Video 19 for the solutions.
Video 19: Properties of Poisson MLE
Solution to Example 11.5.1: Properties of Poisson MLE.
- According to the asymptotic normality of the MLE theorem, since $\bar{X}$ is the MLE of $\lambda$, then $\bar{X} \approx N\left( \lambda, \frac{1}{I(\lambda)} \right)$. We have shown that $I(\lambda) = \frac{n}{\lambda}$; therefore, $\bar{X} \approx N\left( \lambda, \frac{\lambda}{n} \right)$.
- We calculate
$$ P(X = 0) = \frac{\lambda^0 e^{-\lambda}}{0!} = e^{-\lambda}. $$
- Set $g(\lambda) = e^{-\lambda} = P(X = 0)$. Then since the range of $g$ is an interval, specifically $(0, 1)$, the MLE of $P(X = 0)$ is given by $g(\hat{\lambda}) = e^{-\bar{X}}$. Since $g'(\lambda) = -e^{-\lambda}$, for large $n$,
$$ e^{-\bar{X}} \approx N\left( e^{-\lambda}, \; e^{-2\lambda} \frac{\lambda}{n} \right). $$
- For an event $A$ the function $I(A)$ (known as the indicator function of $A$) takes the value 1 if $A$ occurs and 0 otherwise. Thus $E[I(A)] = P(A)$: the expectation of the indicator of an event is simply the probability that the event occurs. Compare this with the Bernoulli distribution.
Therefore
$$ E[\tilde{p}] = \frac{1}{n} \sum_{i=1}^n E[I(X_i = 0)] = \frac{1}{n} \sum_{i=1}^n P(X_i = 0) = e^{-\lambda}, $$
so $\tilde{p}$ is an unbiased estimator of $P(X = 0) = e^{-\lambda}$.
Moreover, if $Y = \sum_{i=1}^n I(X_i = 0)$ denotes the number of observations equal to 0, then $Y \sim \mathrm{Bin}(n, e^{-\lambda})$. For large $n$, the Central Limit Theorem (Section 7.2) states that
$$ Y \approx N\left( n e^{-\lambda}, \; n e^{-\lambda} (1 - e^{-\lambda}) \right), $$
and hence
$$ \tilde{p} = \frac{Y}{n} \approx N\left( e^{-\lambda}, \; \frac{e^{-\lambda} (1 - e^{-\lambda})}{n} \right). $$
Figure 11.1: Asymptotic variance times sample size, n, for varying p.
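The comparison can also be checked by simulation. The sketch below (added here; the values of $\lambda$, $n$ and the number of replications are arbitrary choices) estimates the variances of the two estimators of $P(X = 0) = e^{-\lambda}$ and compares them with the asymptotic variances derived above; the MLE $e^{-\bar{X}}$ has the smaller variance, consistent with the asymptotic efficiency of the MLE.

```r
set.seed(7)
lambda <- 1.5
n <- 200

# Each replication returns both estimators of P(X = 0) = exp(-lambda)
est <- replicate(10000, {
  x <- rpois(n, lambda)
  c(mle = exp(-mean(x)), prop = mean(x == 0))
})

apply(est, 1, var)                                  # empirical variances
c(mle  = lambda * exp(-2 * lambda) / n,             # asymptotic variance of exp(-xbar)
  prop = exp(-lambda) * (1 - exp(-lambda)) / n)     # asymptotic variance of the proportion
```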
Task: Session 6
Attempt the R Markdown file for Session 6:
Session 6: Properties of MLEs
Student Exercises
Attempt the exercises below.
Consider the situation where independent Bernoulli trials, each with success probability $\theta$, are available. The number of trials required before 5 successes are obtained can be modelled by the negative binomial distribution with parameters $r = 5$ and $\theta$. The probability mass function of a negative binomial random variable $Y$ with parameters $r$ and $\theta$ is:
$$ P(Y = y) = \binom{y - 1}{r - 1} \theta^r (1 - \theta)^{y - r}, \qquad y = r, r + 1, r + 2, \ldots $$
Find the maximum likelihood estimator of $\theta$ based on a random sample of $n$ sets of trials. What are the maximum likelihood estimators of:
(a) the mean of the distribution, $r/\theta = 5/\theta$, and,
(b) the quantity ?
Solution to Exercise 11.1.
Given a random sample $y_1, y_2, \ldots, y_n$ of numbers of trials, the log-likelihood is
$$ \ell(\theta) = \sum_{i=1}^n \log \binom{y_i - 1}{r - 1} + nr \log \theta + \left( \sum_{i=1}^n y_i - nr \right) \log(1 - \theta). $$
Differentiating and setting the derivative equal to zero gives
$$ \frac{nr}{\theta} - \frac{\sum_{i=1}^n y_i - nr}{1 - \theta} = 0 \quad \Longrightarrow \quad \hat{\theta} = \frac{nr}{\sum_{i=1}^n y_i} = \frac{r}{\bar{y}}, $$
and it is easily checked that this is a maximum. In the present example, $r = 5$, so the MLE of $\theta$ is $\hat{\theta} = 5 / \bar{Y}$.
By the invariance property, the MLE of
- the mean $5/\theta$ is $5/\hat{\theta} = \bar{Y}$;
- the quantity in (b) is obtained by substituting $\hat{\theta} = 5/\bar{Y}$ into it.
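As a quick numerical check (an added sketch, not part of the original exercise; the values of $\theta$, $n$ and the seed are arbitrary), the closed-form MLE $\hat{\theta} = 5/\bar{Y}$ can be compared with a direct numerical maximisation of the log-likelihood for simulated data.

```r
set.seed(3)
r <- 5
theta <- 0.3
n <- 100

# rnbinom() returns the number of failures before r successes,
# so the number of trials is failures + r
y <- rnbinom(n, size = r, prob = theta) + r

theta_hat <- r / mean(y)    # closed-form MLE

loglik <- function(p) {
  sum(lchoose(y - 1, r - 1)) + n * r * log(p) + sum(y - r) * log(1 - p)
}
theta_num <- optimize(loglik, interval = c(0.01, 0.99), maximum = TRUE)$maximum

c(closed_form = theta_hat, numerical = theta_num)   # should agree closely
```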
To determine the amount of a particular type of bacteria present in river water, one finds out whether or not any bacteria are present in multiple samples. Let $\mu$ be the average number of bacteria per unit volume in the river, and assume that the bacteria are distributed at random in the water. A number of test tubes, each containing a volume $v$ of river water, are incubated and tested. A negative test shows no bacteria, whereas a positive test shows that at least one bacterium is present. If $y$ tubes out of $n$ tested give negative results, what is the m.l.e. of $\mu$?
Hint.
If $X$ denotes the number of bacteria in a volume $v$ of river water, then $X \sim \mathrm{Po}(\mu v)$.

Solution to Exercise 11.2.

The probability that a tube gives a negative result is $P(X = 0) = e^{-\mu v}$, so the number of negative tubes $Y \sim \mathrm{Bin}(n, e^{-\mu v})$. The MLE of the binomial success probability $p = e^{-\mu v}$ is $\hat{p} = y/n$. Since $\mu = -\frac{1}{v} \log p$ is a one-to-one function of $p$, by the invariance property
$$ \hat{\mu} = -\frac{1}{v} \log\left( \frac{y}{n} \right) = \frac{1}{v} \log\left( \frac{n}{y} \right). $$
Thus the MLE of $\mu$ is $\hat{\mu} = \frac{1}{v} \log\left( \frac{n}{y} \right)$.
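A small numerical check (an added sketch with assumed values of $v$, $n$ and $y$, not part of the original solution) compares the closed-form MLE with a direct maximisation of the binomial log-likelihood.

```r
v <- 10    # volume of river water per tube (assumed value)
n <- 40    # number of tubes tested (assumed value)
y <- 13    # number of negative tubes (assumed value)

mu_hat <- log(n / y) / v    # closed-form MLE from the solution above

# Binomial log-likelihood for mu (up to an additive constant)
loglik <- function(mu) -mu * v * y + (n - y) * log(1 - exp(-mu * v))
mu_num <- optimize(loglik, interval = c(1e-6, 5), maximum = TRUE)$maximum

c(closed_form = mu_hat, numerical = mu_num)   # should agree closely
```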