Chapter 9 Parameter Estimation

9.1 Introduction

In this section, we introduce the general definition of a statistic as a summary of a random sample. Statistics are used as estimators of population quantities, and an estimate is a particular realisation of an estimator. We explore key properties that we wish estimators to have, such as unbiasedness, efficiency and consistency. We then study the properties of the sample mean and sample variance as estimators of the population mean and variance, respectively.

9.2 Preliminaries

A statistic, \(T(\mathbf{X})\), is any function of the random sample.

Note that since \(T(\mathbf{X})\) is a function of random variables, it is also a random variable. Hence it will also have all the properties of a random variable. Most importantly, it has a distribution associated with it.

A statistic that is used for the purpose of estimating an unknown population parameter is called an estimator.

A realised value of an estimator, \(T(\mathbf{x})\), that is the value of \(T(\mathbf{X})\) evaluated at a particular outcome of the random sample, is called an estimate.

That is, if we let \(Y = T (\mathbf{X})\) then \(Y\) is a random variable and \(y= T (\mathbf{x})\) is a realisation of the random variable \(Y\) based on the sample \(\mathbf{x} = (x_1, x_2, \ldots, x_n)\). The properties of the estimator \(T (\mathbf{X})\) will typically depend upon \(n\), the number of observations in the random sample.

Average Income

Suppose that we want to estimate the average annual income in the U.K. Let \(X_1,X_2,\dots,X_n\) be a random sample of annual incomes. Possible estimators might include:

\(T_1(\mathbf{X}) = \frac{X_1 + X_2 + \cdots + X_n}{n}\);
\(T_2(\mathbf{X}) = \min \{X_1,X_2,\dots,X_n\}\);
\(T_3(\mathbf{X}) = X_1\).

Which of these is the best choice of estimator?

9.3 Judging estimators

Let \(\theta\) be a population parameter we wish to estimate. Since any function of the sample data is a potential estimator of \(\theta\), how should we determine whether an estimator is good or not? What qualities should our estimator have?

Quality 1: Unbiasedness

The estimator \(T(\mathbf{X})\) is an unbiased estimator of \(\theta\) if \[E \left[ T(\mathbf{X}) \right] = \theta.\] Otherwise, we say that the estimator \(T(\mathbf{X})\) is biased and we define \[B(T) = E \left[ T(\mathbf{X}) \right] - \theta\] to be the bias of \(T\).

If \(B(T) \rightarrow 0\) as the sample size \(n \rightarrow \infty\), then we say that \(T(\mathbf{X})\) is asymptotically unbiased for \(\theta\).

Quality 2: Small variance

If two estimators \(T_1(\mathbf{X})\) and \(T_2(\mathbf{X})\) are both unbiased for \(\theta\), then \(T_1(\mathbf{X})\) is said to be more efficient than \(T_2(\mathbf{X})\) if \[var \left( T_1(\mathbf{X}) \right) < var \left( T_2(\mathbf{X}) \right).\]

We would ideally like an estimator that is unbiased with a small variance. So given multiple unbiased estimators, we choose the most efficient estimator (the estimator with the smallest variance).

To compare an unbiased estimator with a biased estimator, we can use the mean-square error, which quantifies the trade-off between bias and variance:

The mean-square error of an estimator is defined by

\[\text{MSE}(T) = E \left[ \left( T(\mathbf{X}) - \theta \right) ^2 \right].\]

Exercise 1. Prove \(\text{MSE}(T) = \text{var} (T) + \left( B(T) \right)^2\).

Watch Video 16 for the proof of Exercise 1; alternatively, the proof is given below:

Proof of Exercise 1

The first step is to note that we can write
\[\begin{eqnarray*} T (\mathbf{X}) - \theta &=& T (\mathbf{X}) - E[T (\mathbf{X})] + E[T (\mathbf{X})] - \theta \\ &=& T (\mathbf{X}) - E[T (\mathbf{X})] + B(T). \end{eqnarray*}\] Therefore
\[\begin{eqnarray*} E \left[ \left( T(\mathbf{X}) - \theta \right) ^2 \right] &=& E \left[ \left( T (\mathbf{X}) - E[T (\mathbf{X})] + B(T) \right) ^2 \right] \\ &=& E \left[ \left( T (\mathbf{X}) - E[T (\mathbf{X})] \right)^2 + 2 B(T) \left( T (\mathbf{X}) - E[T (\mathbf{X})] \right) + B(T)^2\right] \\ &=& E \left[ \left( T (\mathbf{X}) - E[T (\mathbf{X})] \right)^2 \right] + 2 E \left[ B(T) \left( T (\mathbf{X}) - E[T (\mathbf{X})] \right) \right] + E \left[ B(T)^2\right]. \end{eqnarray*}\] Since \(B(T)\) is a constant, the middle term in the above equation is
\[\begin{eqnarray*} 2 E \left[ B(T) \left( T (\mathbf{X}) - E[T (\mathbf{X})] \right) \right] &=& 2 B(T) E \left[ T (\mathbf{X}) - E[T (\mathbf{X})] \right] \\ &=& 2 B(T) \left\{E[T (\mathbf{X})] -E[T (\mathbf{X})] \right\} =0. \end{eqnarray*}\] Therefore, since \(E \left[ \left( T (\mathbf{X}) - E[T (\mathbf{X})] \right)^2 \right] = var (T (\mathbf{X}))\), we have that
\[ E \left[ \left( T(\mathbf{X}) - \theta \right) ^2 \right] = var (T (\mathbf{X})) + 0 + B(T)^2 \]

as required.

Video 16: Derivation of MSE
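
The decomposition can also be checked numerically. The following is a minimal sketch, not part of the course material: it assumes a normal population and uses a hypothetical "shrunken mean" estimator \(T(\mathbf{X}) = \frac{1}{n+1} \sum_{i=1}^n X_i\), chosen only because it is biased for \(\mu\). The function name and all parameter values are illustrative assumptions.

```python
import random

def mse_decomposition(n=4, mu=10.0, sigma=2.0, reps=20000, seed=1):
    """Monte Carlo estimates of MSE(T), var(T) and B(T) for T = sum(X) / (n + 1)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        # Hypothetical shrunken mean: dividing by n + 1 makes it biased for mu.
        estimates.append(sum(sample) / (n + 1))
    mean_t = sum(estimates) / reps
    mse = sum((t - mu) ** 2 for t in estimates) / reps      # E[(T - theta)^2]
    var = sum((t - mean_t) ** 2 for t in estimates) / reps  # var(T)
    bias = mean_t - mu                                      # B(T)
    return mse, var, bias

mse, var, bias = mse_decomposition()
print(f"MSE = {mse:.4f}, var + bias^2 = {var + bias ** 2:.4f}, bias = {bias:.4f}")
```

Under these assumptions the theoretical values are \(B(T) = -\mu/(n+1) = -2\) and \(\text{var}(T) = n\sigma^2/(n+1)^2 = 0.64\), so the simulated mean-square error should be close to \(4.64\). The two printed quantities agree up to rounding because the same identity holds for the empirical moments.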

Quality 3: Consistency

An estimator \(T(\mathbf{X})\) is said to be a consistent estimator for \(\theta\) if

\[T(\mathbf{X}) \stackrel{p}{\longrightarrow} \theta, \qquad \text{ as } n \rightarrow \infty.\]

Recall that convergence in probability (\(\stackrel{p}{\longrightarrow}\)) is defined in Section 7.4. The definition of consistency means that, for any \(\epsilon >0\),
\[ P (|T (\mathbf{X})- \theta|> \epsilon) \rightarrow 0 \qquad \text{ as } n \rightarrow \infty.\]

That is, as \(n\) becomes large the probability that \(T(\mathbf{X})\) differs from \(\theta\) by more than \(\epsilon\), for any positive \(\epsilon\), becomes small and goes to 0 as \(n \rightarrow \infty\).

This third desirable property can sometimes be established using the following theorem:

Consistency Theorem

If \(E \left[ T(\mathbf{X}) \right] \rightarrow \theta\) and \(\text{Var} \left( T(\mathbf{X}) \right) \rightarrow 0\) as \(n \rightarrow \infty\), then \(T(\mathbf{X})\) is a consistent estimator for \(\theta\).

Note that the Consistency Theorem gives sufficient but not necessary conditions for consistency. Since by Exercise 1 \(\text{MSE}(T) = \text{var} (T) + \left( B(T) \right)^2\), the Consistency Theorem implies that if \(\text{MSE}(T) \rightarrow 0\) as \(n \rightarrow \infty\), then \(T(\mathbf{X})\) is a consistent estimator for \(\theta\).
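
One way to see why a vanishing mean-square error gives consistency is to apply Markov's inequality to the non-negative random variable \(\left( T(\mathbf{X}) - \theta \right)^2\). This is a sketch of a standard argument and is not necessarily the proof used in the course. Writing \(T\) for \(T(\mathbf{X})\), for any \(\epsilon > 0\),
\[\begin{align*} P \left( |T - \theta| > \epsilon \right) &= P \left( (T - \theta)^2 > \epsilon^2 \right) \\ &\leq \frac{E \left[ (T - \theta)^2 \right]}{\epsilon^2} \\ &= \frac{\text{var}(T) + \left( B(T) \right)^2}{\epsilon^2}. \end{align*}\]
If \(E[T] \rightarrow \theta\) and \(\text{var}(T) \rightarrow 0\) as \(n \rightarrow \infty\), then the right-hand side tends to 0 for every fixed \(\epsilon\), which is exactly the definition of consistency.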

Suppose \(X_1,X_2,\ldots,X_n\) is a random sample from any population with mean \(\mu\) and variance \(\sigma^2\). The sample mean is \(\bar{X} = \frac{1}{n} \sum_{i=1}^n X_i\) and is an estimator of \(\mu\). What are the properties of \(\bar{X}\)?

Firstly, we can show that \(\bar{X}\) is unbiased: \[\begin{align*} E[\bar{X}] &= E \left[ \frac{1}{n} \left( X_1 + X_2 + \ldots + X_n \right) \right] \\ &= \frac{1}{n} \left\{ E [X_1] + E[X_2] + \ldots + E[X_n] \right\} \\ &= \frac{1}{n} \left\{ \mu + \mu + \ldots + \mu \right\} \\ &= \frac{1}{n} n \mu \\ &= \mu. \end{align*}\] The variance of \(\bar{X}\) is \(\frac{\sigma^2}{n}\) since, by the independence of \(X_1, X_2, \ldots, X_n\):
\[\begin{align*} var(\bar{X}) &= var \left( \frac{1}{n} \sum_{i=1}^n X_i \right) \\ &= \frac{1}{n^2} \sum_{i=1}^n \text{Var}(X_i) \\ &= \frac{1}{n^2} \sum_{i=1}^n \sigma^2 \\ &= \frac{1}{n^2} n\sigma^2 \\ &= \frac{\sigma^2}{n}. \end{align*}\]

Since \(\bar{X}\) is an unbiased estimator, the mean-square error of \(\bar{X}\) is equal to its variance, that is, \(\text{MSE}(\bar{X}) = var(\bar{X}) = \frac{\sigma^2}{n}\).

Since \(E[\bar{X}] \rightarrow \mu\) and \(var(\bar{X}) \rightarrow 0\) as \(n \rightarrow \infty\), it follows from the Consistency Theorem that \(\bar{X}\) is a consistent estimator for \(\mu\).
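
The consistency of \(\bar{X}\) can be illustrated by simulation. The sketch below is an illustration under stated assumptions rather than a proof: the population is taken to be exponential with mean \(\mu = 1\), the tolerance is \(\epsilon = 0.1\), and the function name `exceedance_probability` is hypothetical. It estimates \(P(|\bar{X} - \mu| > \epsilon)\) for increasing \(n\).

```python
import random

def exceedance_probability(n, eps=0.1, mu=1.0, reps=1000, seed=0):
    """Estimate P(|Xbar - mu| > eps) for samples of size n from Exp with mean mu."""
    rng = random.Random(seed)
    count = 0
    for _ in range(reps):
        # random.expovariate takes the rate, which is 1 / mean.
        xbar = sum(rng.expovariate(1.0 / mu) for _ in range(n)) / n
        if abs(xbar - mu) > eps:
            count += 1
    return count / reps

probs = {n: exceedance_probability(n) for n in (10, 100, 1000)}
for n, p in probs.items():
    print(f"n = {n:4d}: estimated P(|Xbar - mu| > 0.1) = {p:.3f}")
```

The estimated probabilities should shrink towards 0 as \(n\) grows, in line with \(var(\bar{X}) = \sigma^2/n \rightarrow 0\).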

We return to the Average Income example concerning the average annual income in the UK.

It follows from Exercise 2 that
\[T_1 (\mathbf{X}) = \frac{X_1 + X_2 +\ldots + X_n}{n}\]

is an unbiased and consistent estimator of the mean annual income.

Let \(L\) denote the lowest annual income in the UK. Then \[ T_2 (\mathbf{X}) = \min \{ X_1, X_2, \ldots, X_n \} \rightarrow L \qquad \mbox{ as } n \rightarrow \infty. \]

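Except in the case \(n=1\), the mean of \(T_2 (\mathbf{X})\) will be below the mean annual income (the exact value will depend on the distribution of annual incomes) and will decrease as \(n\) increases, with limit \(L\) as \(n \rightarrow \infty\). Hence, unless all annual incomes are equal, \(T_2 (\mathbf{X})\) is a biased estimator of the mean annual income and is not consistent for it.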

The final estimator \(T_3 (\mathbf{X}) =X_1\) is unbiased as \(E[X_1]\) is the average annual income. However, for all \(n =1,2,\ldots\), \(var (T_3 (\mathbf{X})) = var (X_1)\) and unless the annual income is constant, \(var (X_1)>0\). Therefore \(T_3 (\mathbf{X})\) is not a consistent estimator since the estimator, and hence its variance, does not change as we increase the sample size.
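
The behaviour of the three estimators can be compared in a small simulation. This is a sketch under an assumed income model: incomes are drawn from a log-normal distribution with hypothetical log-scale parameters `MU_LOG` and `SIGMA_LOG`, which are not taken from any real data. The function name `summarise_estimators` is likewise illustrative.

```python
import math
import random
import statistics

MU_LOG, SIGMA_LOG = 10.0, 0.5  # hypothetical log-scale parameters (assumption)
TRUE_MEAN = math.exp(MU_LOG + SIGMA_LOG ** 2 / 2)  # mean of the log-normal model

def summarise_estimators(n, reps=1000, seed=42):
    """Return the simulated (mean, standard deviation) of T1, T2 and T3."""
    rng = random.Random(seed)
    t1, t2, t3 = [], [], []
    for _ in range(reps):
        x = [rng.lognormvariate(MU_LOG, SIGMA_LOG) for _ in range(n)]
        t1.append(sum(x) / n)  # T1: sample mean
        t2.append(min(x))      # T2: sample minimum
        t3.append(x[0])        # T3: first observation only
    named = (("T1", t1), ("T2", t2), ("T3", t3))
    return {name: (statistics.fmean(v), statistics.pstdev(v)) for name, v in named}

results = {n: summarise_estimators(n) for n in (5, 50, 500)}
print(f"true mean = {TRUE_MEAN:.0f}")
for n, summary in results.items():
    for name, (mean, sd) in summary.items():
        print(f"n = {n:3d}  {name}: mean = {mean:8.0f}  sd = {sd:8.0f}")
```

Under this model one would expect the mean of \(T_1\) to stay near the true mean while its standard deviation shrinks, the mean of \(T_2\) to fall further below the true mean as \(n\) grows, and the standard deviation of \(T_3\) to stay roughly constant in \(n\).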

9.4 Sample Variance

Variance Estimator

Suppose \(X_1,X_2,\ldots,X_n\) is a random sample from any population with mean \(\mu\) and variance \(\sigma^2\). Consider the estimator
\[ \hat{\sigma}^2 = \frac{1}{n} \sum\limits_{i=1}^{n} \left( X_i - \bar{X} \right)^2.\]

Before considering the estimator \(\hat{\sigma}^2\) in Example 2, we prove Lemma 2, which is useful in manipulating sums of squares.

Splitting square

\[\begin{eqnarray*} \sum\limits_{i=1}^{n} (X_i - \mu)^2 &=& \sum\limits_{i=1}^{n} (X_i - \bar{X})^2 + \sum\limits_{i=1}^{n} (\bar{X} - \mu)^2 \\ &=& \sum\limits_{i=1}^{n} (X_i - \bar{X})^2 + n (\bar{X} - \mu)^2. \end{eqnarray*}\]

The proof uses the same approach as that given for the \(MSE (T)\) in Exercise 1, in that we can write \[\begin{eqnarray*} \sum\limits_{i=1}^{n} (X_i - \mu)^2 &=& \sum\limits_{i=1}^{n} (X_i - \bar{X} + \bar{X} - \mu)^2 \\ &=& \sum\limits_{i=1}^{n} \left\{ (X_i - \bar{X})^2 + 2 (X_i - \bar{X}) (\bar{X}-\mu) + (\bar{X} - \mu)^2 \right\} \\ &=& \sum\limits_{i=1}^{n} (X_i - \bar{X})^2 + 2 (\bar{X} - \mu ) \sum\limits_{i=1}^{n} (X_i - \bar{X}) + \sum\limits_{i=1}^{n} (\bar{X} - \mu )^2. \end{eqnarray*}\] Note that \[ \sum\limits_{i=1}^{n} (X_i - \bar{X}) = \sum\limits_{i=1}^{n} X_i - n \bar{X} = n \bar{X} - n \bar{X} =0,\]

and the Lemma follows.

Lemma 2 is an example of a common trick in statistics. Suppose that we have \(A_i = B_i +K\) \((i=1,2,\ldots, n)\) such that \(\sum_{i=1}^n B_i=0\), then
\[ \sum_{i=1}^n A_i^2 = \sum_{i=1}^n (B_i +K)^2 = \sum_{i=1}^n B_i^2 + n K^2.\]
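
The identity is purely algebraic, so it holds for any data and any reference value \(\mu\). A quick numerical check, using made-up data and a hypothetical value of \(\mu\), might look like this:

```python
import random

rng = random.Random(3)
mu = 2.0  # hypothetical reference value; the identity holds for any mu
x = [rng.gauss(mu, 1.5) for _ in range(8)]  # made-up data
n = len(x)
xbar = sum(x) / n

lhs = sum((xi - mu) ** 2 for xi in x)                            # sum of A_i^2
rhs = sum((xi - xbar) ** 2 for xi in x) + n * (xbar - mu) ** 2   # sum of B_i^2 + n K^2
cross = sum(xi - xbar for xi in x)                               # sum of B_i, should be 0

print(f"lhs = {lhs:.10f}, rhs = {rhs:.10f}, sum of deviations = {cross:.2e}")
```

Here \(A_i = X_i - \mu\), \(B_i = X_i - \bar{X}\) and \(K = \bar{X} - \mu\); the two sides agree up to floating-point rounding.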

We check whether the variance estimator \(\hat{\sigma}^2\) is biased or unbiased:

\[\begin{align*} E[\hat{\sigma}^2] &= E \left[ \frac{1}{n} \sum\limits_{i=1}^{n} (X_i - \bar{X})^2 \right] \\ &= E\left[\frac{1}{n} \sum\limits_{i=1}^{n} (X_i - \mu)^2 - \frac{1}{n} \sum\limits_{i=1}^{n} (\bar{X} - \mu)^2 \right] \\ &= \frac{1}{n} \sum\limits_{i=1}^{n} E \left[ (X_i - \mu)^2 \right] - \frac{1}{n} \sum\limits_{i=1}^{n} E \left[ (\bar{X} - \mu)^2 \right] \\ &= \frac{1}{n} \sum\limits_{i=1}^{n} \text{Var} (X_i) - \frac{1}{n} \sum\limits_{i=1}^{n} \text{Var} (\bar{X}) \\ &= \frac{1}{n} n\sigma^2 - \frac{1}{n} n \frac{\sigma^2}{n} \\ &= \frac{(n-1)\sigma^2}{n}. \end{align*}\]

Hence \(E[\hat{\sigma}^2] \neq \sigma^2 = Var(X_i)\) and so \(\hat{\sigma}^2\) is a biased, although asymptotically unbiased, estimator for \(\sigma^2\). Under weak additional conditions, such as \(E [X_1^4] < \infty\), it can be shown that \(\hat{\sigma}^2\) is a consistent estimator.

It follows from Variance Estimator that given a random sample \(X_1, X_2, \ldots, X_n\), the quantity,
\[s^2 = \frac{n}{n-1} \hat{\sigma}^2 = \frac{1}{n-1} \sum\limits_{i=1}^n (X_i - \bar{X})^2\]

is an unbiased estimator of \(\sigma^2\). This is the definition of the sample variance that we gave in Section 2.3.
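
The difference between the divisors \(n\) and \(n-1\) can be seen in a short simulation. The sketch below assumes a normal population with \(\sigma^2 = 9\) and a small sample size \(n = 5\); these values and the function name `average_variance_estimates` are illustrative assumptions.

```python
import random

def average_variance_estimates(n=5, sigma=3.0, reps=20000, seed=7):
    """Average the divisor-n and divisor-(n-1) variance estimators over many samples."""
    rng = random.Random(seed)
    total_biased = 0.0
    total_unbiased = 0.0
    for _ in range(reps):
        x = [rng.gauss(0.0, sigma) for _ in range(n)]
        xbar = sum(x) / n
        ss = sum((xi - xbar) ** 2 for xi in x)  # sum of squared deviations
        total_biased += ss / n                  # sigma-hat squared
        total_unbiased += ss / (n - 1)          # s squared
    return total_biased / reps, total_unbiased / reps

mean_sigma_hat_sq, mean_s_sq = average_variance_estimates()
print(f"average of sigma-hat^2 = {mean_sigma_hat_sq:.3f}  (theory: (n-1)/n * 9 = 7.2)")
print(f"average of s^2         = {mean_s_sq:.3f}  (theory: 9)")
```

The averages are Monte Carlo approximations of \(E[\hat{\sigma}^2]\) and \(E[s^2]\), so they will match the theoretical values only up to simulation error.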

It can be shown that \[ s^2 = \frac{1}{n-1} \left( \sum_{i=1}^n X_i^2 - \frac{\left( \sum_{i=1}^n X_i \right)^2}{n} \right) = \frac{1}{n-1} \left( \sum_{i=1}^n X_i^2 - n \bar{X}^2 \right). \]
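
A sketch of the derivation: expanding the square and using \(\sum_{i=1}^n X_i = n \bar{X}\) gives
\[\begin{align*} \sum_{i=1}^n (X_i - \bar{X})^2 &= \sum_{i=1}^n \left( X_i^2 - 2 \bar{X} X_i + \bar{X}^2 \right) \\ &= \sum_{i=1}^n X_i^2 - 2 \bar{X} \sum_{i=1}^n X_i + n \bar{X}^2 \\ &= \sum_{i=1}^n X_i^2 - 2 n \bar{X}^2 + n \bar{X}^2 \\ &= \sum_{i=1}^n X_i^2 - n \bar{X}^2, \end{align*}\]
and \(n \bar{X}^2 = \frac{1}{n} \left( \sum_{i=1}^n X_i \right)^2\). Dividing by \(n-1\) gives both forms of \(s^2\).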

Sample variance and covariance

Given observed data \(x_1, x_2, \ldots, x_n\), we define the sample variance by
\[ s_{x}^2 = \frac{1}{n-1} \sum\limits_{i=1}^n (x_i - \bar{x})^2 = \frac{1}{n-1} \left( \sum\limits_{i=1}^n x_i^2 - \frac{\left( \sum\limits_{i=1}^n x_i \right)^2}{n} \right) = \frac{1}{n-1} \left(\sum\limits_{i=1}^n x_i^2 - n \bar{x}^2 \right).\] Similarly, if we have data pairs \((x_1, y_1), (x_2, y_2), \ldots, (x_n,y_n)\), we define the sample covariance by: \[ s_{xy} = \frac{1}{n-1} \sum\limits_{i=1}^n (x_i - \bar{x})(y_i -\bar{y}). \]
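
These definitions translate directly into code. The following sketch uses made-up data and hypothetical function names; it computes the sample variance both from the definition and from the computational formula, and cross-checks against `statistics.variance` from the Python standard library, which also uses the divisor \(n-1\).

```python
import statistics

def sample_variance(x):
    """Sample variance from the definition, with divisor n - 1."""
    n = len(x)
    xbar = sum(x) / n
    return sum((xi - xbar) ** 2 for xi in x) / (n - 1)

def sample_variance_shortcut(x):
    """Sample variance from the computational formula using sums of x and x^2."""
    n = len(x)
    return (sum(xi ** 2 for xi in x) - sum(x) ** 2 / n) / (n - 1)

def sample_covariance(x, y):
    """Sample covariance of paired data, with divisor n - 1."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)

x = [2.0, 4.0, 4.0, 5.0, 7.0]  # made-up data
y = [1.0, 3.0, 2.0, 5.0, 6.0]  # made-up data

print(sample_variance(x), sample_variance_shortcut(x), statistics.variance(x))
print(sample_covariance(x, y))
```

For these data \(\sum (x_i - \bar{x})^2 = 13.2\) and \(\sum (x_i - \bar{x})(y_i - \bar{y}) = 14.2\), so \(s_x^2 = 3.3\) and \(s_{xy} = 3.55\) up to floating-point rounding. The computational formula can lose precision when the mean is large relative to the spread, which is one reason to prefer the definition in numerical work.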

Task: Lab 5

Attempt the R Markdown file for Lab 5:
Lab 5: Estimators

Student Exercises

Attempt the exercises below.

Question 1.

Suggest a reasonable statistical model for each of the following situations, and say which parameter or function of the parameter(s) in the model is likely to be of main interest:

(a) The number of reportable accidents that occur in the University in the month of October is ascertained, with a view to estimating the overall accident rate for the academic year;
(b) In a laboratory test the times to failure of 10 computer hard disk units are measured, to enable the manufacturer to quote for the mean time to failure in sales literature.

Of course in practice one needs to check whether the suggested models are reasonable, e.g. by examining a histogram.

Solution to Question 1.

(a) The number of October accidents could be \({\rm Po}(\theta)\) (if accidents occurred at random and independently).
Parameter: \(\theta\), the expected number of accidents per month.
The function of the parameter of interest is \(12 \theta\).
(b) The failure times \(T_1,T_2,\dots,T_{10}\) could be independent \({\rm Exp} (\theta)\) (if disk failures occurred at random and independently).
Parameter: \(\theta\), the failure rate.
The function of the parameter of interest is the mean failure time, \(1/\theta\).

Question 2.

Suppose that a surveyor is trying to determine the area of a rectangular field, in which the measured length \(Y_1\) and the measured width \(Y_2\) are independent random variables taking values according to the following distributions: \[ \begin{array}{l|lll} y_1 & 8 & 10 & 11 \\ \hline p(y_1) & 0.25 & 0.25 & 0.5 \end{array} \hspace{2cm} \begin{array}{l|ll} y_2 & 4 & 6 \\ \hline p(y_2) & 0.5 & 0.5 \end{array} \]

The calculated area \(A = Y_1 Y_2\) is also a random variable, and is used to estimate the true area. Suppose that the true length and width are 10 and 5, respectively.

(a) Is \(Y_1\) an unbiased estimator of the true length?
(b) Is \(Y_2\) an unbiased estimator of the true width?
(c) Is \(A\) an unbiased estimator of the true area?

Solution to Question 2.

(a) Yes, \(Y_1\) is an unbiased estimator, since \[E[Y_1] = 8 \times 0.25 + 10 \times 0.25 + 11 \times 0.5 = 10.\]
(b) Yes, \(Y_2\) is an unbiased estimator, since \[E[Y_2] = 4 \times 0.5 + 6 \times 0.5 = 5.\]
(c) Yes, \(A\) is an unbiased estimator, since by independence \[ E[A] = E[Y_1 Y_2] = E[Y_1] E[Y_2]\] and therefore \[ E[A] = 10 \times 5 = 50,\] which is the true area.
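
The answers can be verified by enumerating the distributions directly. This sketch uses exact fractions from the Python standard library; the variable and function names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Probability mass functions from the question.
length_pmf = {8: Fraction(1, 4), 10: Fraction(1, 4), 11: Fraction(1, 2)}
width_pmf = {4: Fraction(1, 2), 6: Fraction(1, 2)}

def expectation(pmf):
    """Expected value of a discrete random variable given as {value: probability}."""
    return sum(value * prob for value, prob in pmf.items())

# Distribution of A = Y1 * Y2: joint probabilities multiply under independence.
area_pmf = {}
for (y1, p1), (y2, p2) in product(length_pmf.items(), width_pmf.items()):
    area_pmf[y1 * y2] = area_pmf.get(y1 * y2, 0) + p1 * p2

e_length = expectation(length_pmf)
e_width = expectation(width_pmf)
e_area = expectation(area_pmf)
print(e_length, e_width, e_area)  # prints: 10 5 50
```

Computing \(E[A]\) from the distribution of \(A\) itself, rather than from \(E[Y_1] E[Y_2]\), gives an independent check of the product rule used in the solution.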