Chapter 9 Parameter Estimation

9.1 Introduction

In this section, we define a statistic as a summary of a random sample. Statistics are used as estimators of population quantities, and an estimate is a particular realisation of an estimator. We explore key properties that we would like estimators to have, such as unbiasedness, efficiency and consistency. We then study the properties of the sample mean and sample variance as estimators of the population mean and variance, respectively.

9.2 Preliminaries

A statistic, T(X), is any function of the random sample.

Note that since T(X) is a function of random variables, it is also a random variable. Hence it will also have all the properties of a random variable. Most importantly, it has a distribution associated with it.

A statistic that is used for the purpose of estimating an unknown population parameter is called an estimator.

A realised value of an estimator, T(x), that is the value of T(X) evaluated at a particular outcome of the random sample, is called an estimate.

That is, if we let $Y=T(X)$, then $Y$ is a random variable and $y=T(x)$ is a realisation of $Y$ based on the sample $x=(x_1,x_2,\ldots,x_n)$. The properties of the estimator $T(X)$ will typically depend upon $n$, the number of observations in the random sample.

Average Income

Suppose that we want to estimate the average annual income in the UK. Let $X_1,X_2,\ldots,X_n$ be a random sample of annual incomes. Possible estimators might include:

  • $T_1(X)=\dfrac{X_1+X_2+\cdots+X_n}{n}$;
  • $T_2(X)=\min\{X_1,X_2,\ldots,X_n\}$;
  • $T_3(X)=X_1$.

Which of these is the best choice of estimator?

9.3 Judging estimators

Let θ be a population parameter we wish to estimate. Since any function of the sample data is a potential estimator of θ, how should we determine whether an estimator is good or not? What qualities should our estimator have?

Quality 1: Unbiasedness

The estimator $T(X)$ is an unbiased estimator of $\theta$ if $E[T(X)]=\theta$. Otherwise, we say that the estimator $T(X)$ is biased, and we define $B(T)=E[T(X)]-\theta$ to be the bias of $T$.

If $B(T)\to 0$ as the sample size $n\to\infty$, then we say that $T(X)$ is asymptotically unbiased for $\theta$.

Quality 2: Small variance

If two estimators $T_1(X)$ and $T_2(X)$ are both unbiased for $\theta$, then $T_1(X)$ is said to be more efficient than $T_2(X)$ if
$$\mathrm{var}(T_1(X)) < \mathrm{var}(T_2(X)).$$

We would ideally like an estimator that is unbiased with a small variance. So given multiple unbiased estimators, we choose the most efficient estimator (the estimator with the smallest variance).

To compare estimators when one or both of them are biased, we can use the mean-square error, which quantifies the trade-off between bias and variance:

The mean-square error of an estimator is defined by

$$\mathrm{MSE}(T)=E[(T(X)-\theta)^2].$$

Exercise 1. Prove that $\mathrm{MSE}(T)=\mathrm{var}(T)+(B(T))^2$.

Watch Video 16 for the proof of Exercise 1; alternatively, a written proof is given here:

Proof of Exercise 1
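The first step is to note that we can write
$$T(X)-\theta = T(X)-E[T(X)]+E[T(X)]-\theta = T(X)-E[T(X)]+B(T).$$
Therefore
$$\begin{aligned}
E[(T(X)-\theta)^2] &= E\left[(T(X)-E[T(X)]+B(T))^2\right] \\
&= E\left[(T(X)-E[T(X)])^2+2B(T)(T(X)-E[T(X)])+B(T)^2\right] \\
&= E\left[(T(X)-E[T(X)])^2\right]+2E\left[B(T)(T(X)-E[T(X)])\right]+E\left[B(T)^2\right].
\end{aligned}$$
Since $B(T)$ is a constant, the middle term in the above equation is
$$2E\left[B(T)(T(X)-E[T(X)])\right]=2B(T)\,E\left[T(X)-E[T(X)]\right]=2B(T)\{E[T(X)]-E[T(X)]\}=0.$$
Therefore, since $E[(T(X)-E[T(X)])^2]=\mathrm{var}(T(X))$ and $E[B(T)^2]=B(T)^2$, we have that
$$E[(T(X)-\theta)^2]=\mathrm{var}(T(X))+0+B(T)^2$$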

as required.
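The decomposition can also be checked numerically. The following is a minimal sketch, not part of the original notes: it assumes an Exp(1) population (so $\theta=1$), uses the sample minimum of $n=5$ draws as a deliberately biased estimator, and the function name `mse_decomposition` is purely illustrative. Because the identity is algebraic, it also holds for the empirical moments of the simulated estimates, up to floating-point rounding.

```python
import random
import statistics

def mse_decomposition(estimates, theta):
    """Return (mse, variance, bias) computed from simulated estimates."""
    mean_t = statistics.fmean(estimates)
    mse = statistics.fmean([(t - theta) ** 2 for t in estimates])
    variance = statistics.pvariance(estimates, mu=mean_t)  # divides by len(estimates)
    bias = mean_t - theta
    return mse, variance, bias

rng = random.Random(0)   # fixed seed so the run is repeatable
theta = 1.0              # mean of the assumed Exp(1) population
n = 5                    # sample size
reps = 20000             # number of simulated samples

# A deliberately biased estimator of theta: the minimum of n Exp(1) draws.
estimates = [min(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

mse, variance, bias = mse_decomposition(estimates, theta)
print(f"MSE = {mse:.4f}, var + bias^2 = {variance + bias ** 2:.4f}")
```

The two printed numbers should agree to rounding error, whichever estimator is substituted for the minimum.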

Video 16: Derivation of MSE

Quality 3: Consistency

An estimator T(X) is said to be a consistent estimator for θ if

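$$T(X)\xrightarrow{p}\theta \quad \text{as } n\to\infty.$$
Recall that convergence in probability ($\xrightarrow{p}$) is defined in Section 7.4. The definition of consistency means that, for any $\epsilon>0$,
$$P(|T(X)-\theta|>\epsilon)\to 0 \quad \text{as } n\to\infty.$$

That is, for any positive $\epsilon$, the probability that $T(X)$ differs from $\theta$ by more than $\epsilon$ becomes small as $n$ becomes large, and tends to 0 as $n\to\infty$.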
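To make the definition concrete, the tail probability can be computed exactly in a simple case. The sketch below is an illustration under assumptions that are not in the notes: the estimator is the proportion of successes in $n$ Bernoulli($p$) trials, with $p=0.3$ and $\epsilon=0.1$ chosen arbitrarily, and `tail_probability` is a hypothetical helper name. Exact fractions are used so that the boundary comparison with $\epsilon$ is not affected by rounding.

```python
from fractions import Fraction
from math import comb

def tail_probability(n, p, eps):
    """Exact P(|S/n - p| > eps), where S ~ Binomial(n, p)."""
    total = Fraction(0)
    for k in range(n + 1):
        if abs(Fraction(k, n) - p) > eps:
            total += comb(n, k) * p ** k * (1 - p) ** (n - k)
    return float(total)

p = Fraction(3, 10)     # assumed true success probability
eps = Fraction(1, 10)   # tolerance epsilon

probs = {n: tail_probability(n, p, eps) for n in (10, 100, 1000)}
for n, prob in probs.items():
    print(f"n = {n:4d}: P(|estimate - p| > eps) = {prob:.3e}")
```

The printed probabilities decrease towards 0 as $n$ grows, which is the behaviour the definition of consistency requires.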

This third desirable property can sometimes be established using the following theorem:

Consistency Theorem

If $E[T(X)]\to\theta$ and $\mathrm{var}(T(X))\to 0$ as $n\to\infty$, then $T(X)$ is a consistent estimator for $\theta$.

Note that the Consistency Theorem gives sufficient but not necessary conditions for consistency. Since by Exercise 1 $\mathrm{MSE}(T)=\mathrm{var}(T)+(B(T))^2$, the Consistency Theorem implies that if $\mathrm{MSE}(T)\to 0$ as $n\to\infty$, then $T(X)$ is a consistent estimator for $\theta$.
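One standard way to see why a vanishing mean-square error forces consistency is to apply Markov's inequality to the non-negative random variable $(T(X)-\theta)^2$. This argument is offered as a sketch and is not taken from the notes. For any $\epsilon>0$:

```latex
\begin{aligned}
P\left(|T(X)-\theta|>\epsilon\right)
  &\le P\left((T(X)-\theta)^2 \ge \epsilon^2\right) \\
  &\le \frac{E\left[(T(X)-\theta)^2\right]}{\epsilon^2}
   = \frac{\mathrm{MSE}(T)}{\epsilon^2}
   = \frac{\mathrm{var}(T)+(B(T))^2}{\epsilon^2}.
\end{aligned}
```

If the bias and the variance both tend to 0 as $n\to\infty$, the right-hand side tends to 0 for every fixed $\epsilon$, which is exactly the definition of consistency.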

Suppose $X_1,X_2,\ldots,X_n$ is a random sample from any population with mean $\mu$ and variance $\sigma^2$. The sample mean is $\bar{X}=\frac{1}{n}\sum_{i=1}^n X_i$ and is an estimator of $\mu$. What are the properties of $\bar{X}$?

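Firstly, we can show that $\bar{X}$ is unbiased:
$$E[\bar{X}]=E\left[\frac{1}{n}(X_1+X_2+\cdots+X_n)\right]=\frac{1}{n}\{E[X_1]+E[X_2]+\cdots+E[X_n]\}=\frac{1}{n}\{\mu+\mu+\cdots+\mu\}=\frac{1}{n}\,n\mu=\mu.$$
The variance of $\bar{X}$ is $\sigma^2/n$ since, using the independence of the $X_i$,
$$\mathrm{var}(\bar{X})=\mathrm{var}\left(\frac{1}{n}\sum_{i=1}^n X_i\right)=\frac{1}{n^2}\sum_{i=1}^n \mathrm{var}(X_i)=\frac{1}{n^2}\sum_{i=1}^n\sigma^2=\frac{1}{n^2}\,n\sigma^2=\frac{\sigma^2}{n}.$$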

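Since $\bar{X}$ is an unbiased estimator, the mean-square error of $\bar{X}$ is equal to its variance: $\mathrm{MSE}(\bar{X})=\mathrm{var}(\bar{X})=\sigma^2/n$.

Since $E[\bar{X}]=\mu$ for every $n$ and $\mathrm{var}(\bar{X})=\sigma^2/n\to 0$ as $n\to\infty$, it follows from the Consistency Theorem that $\bar{X}$ is a consistent estimator for $\mu$.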
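These properties of the sample mean can be verified exactly for a small population by enumerating every possible sample. The sketch below assumes, purely for illustration, that the population is a fair six-sided die (so $\mu=7/2$ and $\sigma^2=35/12$); the function name `sample_mean_moments` is hypothetical.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)   # fair six-sided die: an illustrative population

def sample_mean_moments(n):
    """Exact mean and variance of the sample mean of n fair die rolls."""
    prob = Fraction(1, 6 ** n)                       # every sample is equally likely
    means = [Fraction(sum(s), n) for s in product(FACES, repeat=n)]
    expectation = sum(m * prob for m in means)
    variance = sum((m - expectation) ** 2 * prob for m in means)
    return expectation, variance

mu = Fraction(7, 2)        # population mean of one roll
sigma2 = Fraction(35, 12)  # population variance of one roll

for n in (1, 2, 3, 4):
    expectation, variance = sample_mean_moments(n)
    print(n, expectation, variance, sigma2 / n)
```

For each $n$ the enumerated expectation equals $\mu$ and the enumerated variance equals $\sigma^2/n$, so the last two printed columns coincide.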


We return to the Average Income example concerning the average annual income in the UK.

It follows from Exercise 2 that
$$T_1(X)=\frac{X_1+X_2+\cdots+X_n}{n}$$

is an unbiased and consistent estimator of the mean annual income.

Let $L$ denote the lowest annual income in the UK. Then
$$T_2(X)=\min\{X_1,X_2,\ldots,X_n\}\to L \quad \text{as } n\to\infty.$$

Except in the case $n=1$, the mean of $T_2(X)$ will be below the mean annual income (the exact value will depend on the distribution of annual incomes) and will become smaller as $n$ increases, with limit $L$ as $n\to\infty$.

The final estimator $T_3(X)=X_1$ is unbiased, as $E[X_1]$ is the average annual income. However, for all $n=1,2,\ldots$, $\mathrm{var}(T_3(X))=\mathrm{var}(X_1)$ and, unless the annual income is constant, $\mathrm{var}(X_1)>0$. Therefore $T_3(X)$ is not a consistent estimator, since the estimator, and hence its variance, does not change as we increase the sample size.
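The behaviour of the three estimators can be illustrated by exact enumeration over a toy population. The income distribution below is hypothetical (values in thousands of pounds, chosen only for illustration and not real data), and the names `INCOMES`, `ESTIMATORS` and `exact_moments` are introduced for this sketch.

```python
from fractions import Fraction
from itertools import product

# Hypothetical income distribution (thousands of pounds): value -> probability.
INCOMES = {15: Fraction(3, 10), 25: Fraction(4, 10),
           40: Fraction(2, 10), 120: Fraction(1, 10)}

ESTIMATORS = {
    "T1 (mean)": lambda xs: Fraction(sum(xs), len(xs)),
    "T2 (min)": lambda xs: Fraction(min(xs)),
    "T3 (first)": lambda xs: Fraction(xs[0]),
}

def exact_moments(estimator, n):
    """Exact (expectation, variance) of an estimator over all samples of size n."""
    first = second = Fraction(0)
    for sample in product(INCOMES, repeat=n):
        prob = Fraction(1)
        for x in sample:
            prob *= INCOMES[x]          # independent draws: multiply probabilities
        t = estimator(sample)
        first += prob * t
        second += prob * t * t
    return first, second - first ** 2

SIZES = (1, 2, 4, 6)
results = {name: {n: exact_moments(f, n) for n in SIZES}
           for name, f in ESTIMATORS.items()}

for name, by_n in results.items():
    for n, (mean, var) in by_n.items():
        print(f"{name:10s} n={n}: E = {float(mean):7.3f}, var = {float(var):8.3f}")
```

In this toy population the mean income is 34.5 and the lowest income is 15. The output shows $T_1$ staying unbiased with shrinking variance, the expectation of $T_2$ falling towards 15, and $T_3$ keeping the same variance for every $n$.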

9.4 Sample Variance

Variance Estimator

Suppose $X_1,X_2,\ldots,X_n$ is a random sample from any population with mean $\mu$ and variance $\sigma^2$. Consider the estimator
$$\hat{\sigma}^2=\frac{1}{n}\sum_{i=1}^n (X_i-\bar{X})^2.$$

Before considering the estimator $\hat{\sigma}^2$ in Example 2, we prove Lemma 2, which is useful in manipulating sums of squares.

Splitting square

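$$\sum_{i=1}^n (X_i-\mu)^2=\sum_{i=1}^n (X_i-\bar{X})^2+\sum_{i=1}^n(\bar{X}-\mu)^2=\sum_{i=1}^n (X_i-\bar{X})^2+n(\bar{X}-\mu)^2.$$
The proof uses the same approach as that given for $\mathrm{MSE}(T)$ in Exercise 1, in that we can write
$$\begin{aligned}
\sum_{i=1}^n (X_i-\mu)^2 &= \sum_{i=1}^n (X_i-\bar{X}+\bar{X}-\mu)^2 \\
&= \sum_{i=1}^n \left\{(X_i-\bar{X})^2+2(X_i-\bar{X})(\bar{X}-\mu)+(\bar{X}-\mu)^2\right\} \\
&= \sum_{i=1}^n (X_i-\bar{X})^2+2(\bar{X}-\mu)\sum_{i=1}^n (X_i-\bar{X})+\sum_{i=1}^n(\bar{X}-\mu)^2.
\end{aligned}$$
Note that
$$\sum_{i=1}^n (X_i-\bar{X})=\sum_{i=1}^n X_i-n\bar{X}=n\bar{X}-n\bar{X}=0,$$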

and the Lemma follows.


Lemma 2 is an example of a common trick in statistics. Suppose that we have $A_i=B_i+K$ $(i=1,2,\ldots,n)$ such that $\sum_{i=1}^n B_i=0$. Then
$$\sum_{i=1}^n A_i^2=\sum_{i=1}^n (B_i+K)^2=\sum_{i=1}^n B_i^2+nK^2.$$
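A quick numerical check of the splitting identity may help. The sketch below uses arbitrary made-up data and an arbitrary value of $\mu$, with exact fractions so that the two sides can be compared without rounding; the function name `split_sum_of_squares` is hypothetical.

```python
from fractions import Fraction

def split_sum_of_squares(xs, mu):
    """Return (total, within, shift) for the splitting identity:
    total  = sum (x_i - mu)^2
    within = sum (x_i - xbar)^2
    shift  = n (xbar - mu)^2
    """
    n = len(xs)
    xbar = Fraction(sum(xs), n)
    total = sum((x - mu) ** 2 for x in xs)
    within = sum((x - xbar) ** 2 for x in xs)
    shift = n * (xbar - mu) ** 2
    return total, within, shift

xs = [2, 4, 4, 5, 7, 9]   # arbitrary illustrative data (integers)
mu = 5                    # arbitrary reference value

total, within, shift = split_sum_of_squares(xs, mu)
print(total, within, shift, within + shift)
```

Here the deviations $x_i-\bar{x}$ play the role of the $B_i$ (they sum to zero) and $\bar{x}-\mu$ plays the role of $K$.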

We check whether the variance estimator σ^2 is biased or unbiased:

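$$\begin{aligned}
E[\hat{\sigma}^2] &= E\left[\frac{1}{n}\sum_{i=1}^n (X_i-\bar{X})^2\right] \\
&= E\left[\frac{1}{n}\sum_{i=1}^n (X_i-\mu)^2-\frac{1}{n}\sum_{i=1}^n(\bar{X}-\mu)^2\right] \\
&= \frac{1}{n}\sum_{i=1}^n E[(X_i-\mu)^2]-\frac{1}{n}\sum_{i=1}^n E[(\bar{X}-\mu)^2] \\
&= \frac{1}{n}\sum_{i=1}^n \mathrm{var}(X_i)-\frac{1}{n}\sum_{i=1}^n \mathrm{var}(\bar{X}) \\
&= \frac{1}{n}\,n\sigma^2-\frac{1}{n}\,n\,\frac{\sigma^2}{n}=\frac{(n-1)\sigma^2}{n}.
\end{aligned}$$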

Hence $E[\hat{\sigma}^2]\neq\sigma^2=\mathrm{var}(X_i)$, and so $\hat{\sigma}^2$ is a biased, although asymptotically unbiased, estimator for $\sigma^2$. Under weak additional conditions, such as $E[X_1^4]<\infty$, it can be shown that $\hat{\sigma}^2$ is a consistent estimator.

It follows from Variance Estimator that, given a random sample $X_1,X_2,\ldots,X_n$, the quantity
$$s^2=\frac{n}{n-1}\hat{\sigma}^2=\frac{1}{n-1}\sum_{i=1}^n (X_i-\bar{X})^2$$

is an unbiased estimator of $\sigma^2$. This is the definition of the sample variance that we gave in Section 2.3.
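The bias of the divisor-$n$ estimator and the unbiasedness of the divisor-$(n-1)$ estimator can be confirmed exactly by enumeration. As an illustrative assumption, the population below is a fair six-sided die with $\sigma^2=35/12$; the function name `expected_variance_estimates` is hypothetical.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)          # fair six-sided die: an illustrative population
SIGMA2 = Fraction(35, 12)    # its population variance

def expected_variance_estimates(n):
    """Exact expectations of the divisor-n and divisor-(n-1) variance
    estimators over all 6**n equally likely samples (requires n >= 2)."""
    prob = Fraction(1, 6 ** n)
    e_biased = e_unbiased = Fraction(0)
    for sample in product(FACES, repeat=n):
        xbar = Fraction(sum(sample), n)
        ss = sum((x - xbar) ** 2 for x in sample)   # sum of squared deviations
        e_biased += prob * ss / n
        e_unbiased += prob * ss / (n - 1)
    return e_biased, e_unbiased

for n in (2, 3, 4):
    e_biased, e_unbiased = expected_variance_estimates(n)
    print(n, e_biased, (n - 1) * SIGMA2 / n, e_unbiased, SIGMA2)
```

For each $n$, the first expectation matches $(n-1)\sigma^2/n$ and the second matches $\sigma^2$.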

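It can be shown that
$$s^2=\frac{1}{n-1}\left(\sum_{i=1}^n X_i^2-\frac{\left(\sum_{i=1}^n X_i\right)^2}{n}\right)=\frac{1}{n-1}\left(\sum_{i=1}^n X_i^2-n\bar{X}^2\right).$$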

Sample variance and covariance

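Given observed data $x_1,x_2,\ldots,x_n$, we define the sample variance by
$$s_x^2=\frac{1}{n-1}\sum_{i=1}^n (x_i-\bar{x})^2=\frac{1}{n-1}\left(\sum_{i=1}^n x_i^2-\frac{\left(\sum_{i=1}^n x_i\right)^2}{n}\right)=\frac{1}{n-1}\left(\sum_{i=1}^n x_i^2-n\bar{x}^2\right).$$
Similarly, if we have data pairs $(x_1,y_1),(x_2,y_2),\ldots,(x_n,y_n)$, we define the sample covariance by
$$s_{xy}=\frac{1}{n-1}\sum_{i=1}^n (x_i-\bar{x})(y_i-\bar{y}).$$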
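These definitions translate directly into code. The following minimal sketch uses made-up data and hypothetical function names, and compares the hand-written sample variance with `statistics.variance` from the Python standard library, which also uses the divisor $n-1$.

```python
import statistics

def sample_variance(xs):
    """Sample variance with divisor n - 1 (definition form)."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def sample_variance_shortcut(xs):
    """Sample variance via the sum-of-squares shortcut formula."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)

def sample_covariance(xs, ys):
    """Sample covariance with divisor n - 1 for paired data."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)

xs = [1.0, 2.0, 4.0, 7.0]   # made-up data
ys = [2.0, 3.0, 7.0, 8.0]   # made-up paired data

print(sample_variance(xs), sample_variance_shortcut(xs), statistics.variance(xs))
print(sample_covariance(xs, ys))
```

The shortcut formula is convenient by hand, but it subtracts two potentially large numbers, so the definition form is usually the safer choice in floating-point arithmetic.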

Task: Lab 5

Attempt the R Markdown file for Lab 5:
Lab 5: Estimators


Student Exercises

Attempt the exercises below.

Question 1.

Suggest a reasonable statistical model for each of the following situations, and say which parameter or function of the parameter(s) in the model is likely to be of main interest:

  1. The number of reportable accidents that occur in the University in the month of October is ascertained, with a view to estimating the overall accident rate for the academic year;
  2. In a laboratory test the times to failure of 10 computer hard disk units are measured, to enable the manufacturer to quote for the mean time to failure in sales literature.

Of course in practice one needs to check whether the suggested models are reasonable, e.g. by examining a histogram.

Solution to Question 1.
  1. The number of October accidents could be Po(θ) (if accidents occurred at random and independently).
    Parameter: θ, the expected number of accidents per month.
    Function of parameter of interest is 12θ.
  2. Failure times $T_1,T_2,\ldots,T_{10}$ could be independent Exp(θ) (if disk failures occurred at random and independently).
    Function of parameter of interest is the mean failure time, 1/θ.

Question 2.

Suppose that a surveyor is trying to determine the area of a rectangular field, in which the measured length $Y_1$ and the measured width $Y_2$ are independent random variables taking values according to the following distributions:
$$\begin{array}{c|ccc} y_1 & 8 & 10 & 11 \\ \hline p(y_1) & 0.25 & 0.25 & 0.5 \end{array} \qquad \begin{array}{c|cc} y_2 & 4 & 6 \\ \hline p(y_2) & 0.5 & 0.5 \end{array}$$

The calculated area $A=Y_1Y_2$ is also a random variable, and is used to estimate the true area. Suppose that the true length and width are 10 and 5, respectively.

  1. Is $Y_1$ an unbiased estimator of the true length?
  2. Is $Y_2$ an unbiased estimator of the true width?
  3. Is $A$ an unbiased estimator of the true area?

Solution to Question 2.
  1. Yes, $Y_1$ is an unbiased estimator, since
    $E[Y_1]=8\times 0.25+10\times 0.25+11\times 0.5=10$.
  2. Yes, $Y_2$ is an unbiased estimator, since
    $E[Y_2]=4\times 0.5+6\times 0.5=5$.
  3. Yes, $A$ is an unbiased estimator, since by independence
    $E[A]=E[Y_1Y_2]=E[Y_1]E[Y_2]$
    and therefore
    $E[A]=10\times 5=50$.
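The three expectations in Question 2 can be double-checked by enumerating the joint distribution directly. The sketch below uses the probabilities from the question; the variable and function names are illustrative. The expectation of the area is computed from the joint probabilities, which by independence are the products of the marginal probabilities.

```python
from fractions import Fraction
from itertools import product

# Distributions of the measured length and width, as given in the question.
Y1 = {8: Fraction(1, 4), 10: Fraction(1, 4), 11: Fraction(1, 2)}
Y2 = {4: Fraction(1, 2), 6: Fraction(1, 2)}

def expectation(pmf):
    """Expected value of a discrete distribution given as {value: probability}."""
    return sum(value * prob for value, prob in pmf.items())

e_length = expectation(Y1)
e_width = expectation(Y2)

# E[A] summed over the joint distribution of (Y1, Y2).
e_area = sum(y1 * y2 * p1 * p2
             for (y1, p1), (y2, p2) in product(Y1.items(), Y2.items()))

print(e_length, e_width, e_area)
```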