Chapter 9 Parameter Estimation

9.1 Introduction

In this section, we consider the general definition of a statistic as a summary of a random sample. Statistics are used as estimators of population quantities, with an estimate denoting a particular realisation of an estimator. We explore key properties that we wish estimators to have, such as unbiasedness, efficiency and consistency. We then study the properties of the sample mean and sample variance as estimators of the population mean and variance, respectively.

9.2 Preliminaries

Statistic

A statistic, $T(\mathbf{X})$, is any function of the random sample $\mathbf{X} = (X_1, X_2, \ldots, X_n)$.

Note that since $T(\mathbf{X})$ is a function of random variables, it is also a random variable. Hence it has all the properties of a random variable. Most importantly, it has a distribution associated with it.

Estimator

A statistic that is used for the purpose of estimating an unknown population parameter is called an estimator.

Estimate

A realised value of an estimator, $T(\mathbf{x})$, that is, the value of $T(\mathbf{X})$ evaluated at a particular outcome of the random sample, is called an estimate.

That is, if we let $Y = T(\mathbf{X})$ then $Y$ is a random variable and $y = T(\mathbf{x})$ is a realisation of the random variable $Y$ based on the sample $\mathbf{x} = (x_1, x_2, \ldots, x_n)$. The properties of the estimator $T(\mathbf{X})$ will typically depend upon $n$, the number of observations in the random sample.

Average Income

Suppose that we want to estimate the average annual income in the U.K. Let $X_1, X_2, \ldots, X_n$ be a random sample of annual incomes. Possible estimators might include:

  • $T_1(\mathbf{X}) = \dfrac{X_1 + X_2 + \cdots + X_n}{n}$;
  • $T_2(\mathbf{X}) = \min\{X_1, X_2, \ldots, X_n\}$;
  • $T_3(\mathbf{X}) = X_1$.

Which of these is the best choice of estimator?
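As a minimal sketch, the three candidate estimators can be written as functions of a sample and evaluated at one realisation to give three estimates. The income values below are hypothetical and chosen purely for illustration; the function names `t1`, `t2` and `t3` are not part of the notes.

```python
from statistics import mean

def t1(sample):
    """Sample mean: (X_1 + ... + X_n) / n."""
    return mean(sample)

def t2(sample):
    """Sample minimum: min{X_1, ..., X_n}."""
    return min(sample)

def t3(sample):
    """First observation only: X_1."""
    return sample[0]

# Hypothetical annual incomes in pounds (illustrative values, not real data).
incomes = [35000, 21000, 28000, 52000, 24000]

# Each estimator evaluated at the observed sample gives an estimate.
estimates = {"T1": t1(incomes), "T2": t2(incomes), "T3": t3(incomes)}
print(estimates)
```

A single realisation cannot tell us which estimator is best; that depends on the distribution of each estimator, which is what the qualities in the next section assess.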

9.3 Judging estimators

Let $\theta$ be a population parameter we wish to estimate. Since any function of the sample data is a potential estimator of $\theta$, how should we determine whether an estimator is good or not? What qualities should our estimator have?

Quality 1: Unbiasedness

Unbiased

The estimator $T(\mathbf{X})$ is an unbiased estimator of $\theta$ if $E[T(\mathbf{X})] = \theta$. Otherwise, we say that the estimator $T(\mathbf{X})$ is biased, and we define $B(T) = E[T(\mathbf{X})] - \theta$ to be the bias of $T$.

Asymptotically unbiased

If $B(T) \to 0$ as the sample size $n \to \infty$, then we say that $T(\mathbf{X})$ is asymptotically unbiased for $\theta$.

Quality 2: Small variance

Efficiency

If two estimators $T_1(\mathbf{X})$ and $T_2(\mathbf{X})$ are both unbiased for $\theta$, then $T_1(\mathbf{X})$ is said to be more efficient than $T_2(\mathbf{X})$ if
$$\mathrm{var}(T_1(\mathbf{X})) < \mathrm{var}(T_2(\mathbf{X})).$$

We would ideally like an estimator that is unbiased with a small variance. So given multiple unbiased estimators, we choose the most efficient estimator (the estimator with the smallest variance).

To compare an unbiased estimator with a biased estimator, we can use the mean-square error, which quantifies the trade-off between bias and variance:

Mean-square error

The mean-square error of an estimator is defined by

$$\mathrm{MSE}(T) = E\left[(T(\mathbf{X}) - \theta)^2\right].$$

Prove $\mathrm{MSE}(T) = \mathrm{var}(T) + (B(T))^2$.

Watch Video 16 for the proof of Example 9.3.5.

Video 16: Derivation of MSE

Proof of Example 9.3.5.
The first step is to note that we can write
$$T(\mathbf{X}) - \theta = T(\mathbf{X}) - E[T(\mathbf{X})] + E[T(\mathbf{X})] - \theta = T(\mathbf{X}) - E[T(\mathbf{X})] + B(T).$$
Therefore
$$
\begin{aligned}
E\left[(T(\mathbf{X}) - \theta)^2\right]
&= E\left[(T(\mathbf{X}) - E[T(\mathbf{X})] + B(T))^2\right] \\
&= E\left[(T(\mathbf{X}) - E[T(\mathbf{X})])^2 + 2B(T)(T(\mathbf{X}) - E[T(\mathbf{X})]) + B(T)^2\right] \\
&= E\left[(T(\mathbf{X}) - E[T(\mathbf{X})])^2\right] + 2E\left[B(T)(T(\mathbf{X}) - E[T(\mathbf{X})])\right] + E\left[B(T)^2\right].
\end{aligned}
$$
Since $B(T)$ is a constant, the middle term in the above equation is
$$2E\left[B(T)(T(\mathbf{X}) - E[T(\mathbf{X})])\right] = 2B(T)\,E\left[T(\mathbf{X}) - E[T(\mathbf{X})]\right] = 2B(T)\left\{E[T(\mathbf{X})] - E[T(\mathbf{X})]\right\} = 0.$$
Therefore, since $E\left[(T(\mathbf{X}) - E[T(\mathbf{X})])^2\right] = \mathrm{var}(T(\mathbf{X}))$ and $E\left[B(T)^2\right] = B(T)^2$, we have that
$$E\left[(T(\mathbf{X}) - \theta)^2\right] = \mathrm{var}(T(\mathbf{X})) + 0 + B(T)^2$$

as required.
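The decomposition can be checked exactly on a small case. The sketch below assumes a toy population taking the values 1, 2, 3 with equal probability, a sample of size 2, and the (deliberately biased) estimator given by the sample minimum. All of these choices are illustrative assumptions rather than part of the notes. Exact fractions are used so that the identity holds without rounding error.

```python
from fractions import Fraction
from itertools import product

# Assumed toy population: values 1, 2, 3, each with probability 1/3.
values = [1, 2, 3]
prob = Fraction(1, 3)
theta = sum(v * prob for v in values)  # population mean, equal to 2

n = 2
# Every possible sample of size n, paired with its probability (i.i.d. draws).
samples = [(s, prob ** n) for s in product(values, repeat=n)]

def expectation(g):
    """Exact expectation of g(sample) over all samples of size n."""
    return sum(p * g(s) for s, p in samples)

T = min  # a deliberately biased estimator of the mean

mean_T = expectation(T)
bias = mean_T - theta
var_T = expectation(lambda s: (T(s) - mean_T) ** 2)
mse = expectation(lambda s: (T(s) - theta) ** 2)

print("bias =", bias, " var =", var_T, " MSE =", mse)
print("var + bias^2 =", var_T + bias ** 2)
```

Here the bias is $-4/9$, the variance is $38/81$ and the mean-square error is $2/3 = 38/81 + 16/81$, in agreement with the identity.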

Quality 3: Consistency

Consistency

An estimator $T(\mathbf{X})$ is said to be a consistent estimator for $\theta$ if

$$T(\mathbf{X}) \xrightarrow{\;p\;} \theta, \quad \text{as } n \to \infty.$$
Recall that convergence in probability ($\xrightarrow{\;p\;}$) is defined in Section 7.4; the definition of consistency means that, for any $\epsilon > 0$,
$$P(|T(\mathbf{X}) - \theta| > \epsilon) \to 0 \quad \text{as } n \to \infty.$$

That is, for any positive $\epsilon$, the probability that $T(\mathbf{X})$ differs from $\theta$ by more than $\epsilon$ goes to 0 as $n \to \infty$.

This third desirable property can sometimes be established using the following theorem:

Consistency Theorem

If $E[T(\mathbf{X})] \to \theta$ and $\mathrm{var}(T(\mathbf{X})) \to 0$ as $n \to \infty$, then $T(\mathbf{X})$ is a consistent estimator for $\theta$.

Note that the Consistency Theorem gives sufficient but not necessary conditions for consistency. Since by Example 9.3.5 $\mathrm{MSE}(T) = \mathrm{var}(T) + (B(T))^2$, the Consistency Theorem implies that if $\mathrm{MSE}(T) \to 0$ as $n \to \infty$, then $T(\mathbf{X})$ is a consistent estimator for $\theta$.


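Suppose $X_1, X_2, \ldots, X_n$ is a random sample from any population with mean $\mu$ and variance $\sigma^2$. The sample mean is $\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$ and is an estimator of $\mu$. What are the properties of $\bar{X}$?

Firstly, we can show that $\bar{X}$ is unbiased:
$$
\begin{aligned}
E[\bar{X}] &= E\left[\frac{1}{n}(X_1 + X_2 + \cdots + X_n)\right] \\
&= \frac{1}{n}\left\{E[X_1] + E[X_2] + \cdots + E[X_n]\right\} \\
&= \frac{1}{n}\left\{\mu + \mu + \cdots + \mu\right\} = \frac{1}{n}\, n\mu = \mu.
\end{aligned}
$$
The variance of $\bar{X}$ is $\frac{\sigma^2}{n}$ since, using the independence of $X_1, X_2, \ldots, X_n$,
$$\mathrm{var}(\bar{X}) = \mathrm{var}\left(\frac{1}{n}\sum_{i=1}^n X_i\right) = \frac{1}{n^2}\sum_{i=1}^n \mathrm{var}(X_i) = \frac{1}{n^2}\sum_{i=1}^n \sigma^2 = \frac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n}.$$

Given that $\bar{X}$ is an unbiased estimator, the mean-square error of $\bar{X}$ is equal to $\mathrm{var}(\bar{X}) = \frac{\sigma^2}{n}$.

Since $E[\bar{X}] \to \mu$ and $\mathrm{var}(\bar{X}) \to 0$ as $n \to \infty$, it follows from the Consistency Theorem that $\bar{X}$ is a consistent estimator for $\mu$.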
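These properties can be verified exactly in a simple special case. The sketch below assumes a Bernoulli population with success probability $p = 0.3$ (so $\mu = p$ and $\sigma^2 = p(1-p)$) and tolerance $\epsilon = 0.1$; both values are illustrative assumptions. In this case $n\bar{X}$ is binomial, so the mean, the variance and the tail probability $P(|\bar{X} - \mu| > \epsilon)$ can all be computed from the binomial probabilities.

```python
from fractions import Fraction
from math import comb

p = Fraction(3, 10)    # assumed Bernoulli success probability (population mean)
eps = Fraction(1, 10)  # assumed tolerance epsilon

def sample_mean_summary(n):
    """Exact mean, variance and tail probability of X-bar for sample size n."""
    # X-bar = K / n where K ~ Binomial(n, p).
    pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    mean = sum(Fraction(k, n) * q for k, q in enumerate(pmf))
    var = sum((Fraction(k, n) - mean) ** 2 * q for k, q in enumerate(pmf))
    tail = sum(q for k, q in enumerate(pmf) if abs(Fraction(k, n) - p) > eps)
    return mean, var, tail

for n in (10, 100, 1000):
    mean, var, tail = sample_mean_summary(n)
    print(f"n={n}: E={mean}, var={float(var):.6f}, tail={float(tail):.3g}")
```

For each $n$ the mean equals $p$ and the variance equals $p(1-p)/n$, while the tail probability shrinks as $n$ grows, which is what consistency requires.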


We return to the Average Income example concerning the average annual income in the UK.

It follows from Example 9.3.8 that
$$T_1(\mathbf{X}) = \frac{X_1 + X_2 + \cdots + X_n}{n}$$

is an unbiased and consistent estimator of the mean annual income.

Let $L$ denote the lowest annual income in the UK. Then
$$T_2(\mathbf{X}) = \min\{X_1, X_2, \ldots, X_n\} \to L \quad \text{as } n \to \infty.$$

Except in the case $n = 1$, the mean of $T_2(\mathbf{X})$ will be below the mean annual income (the exact value will depend on the distribution of annual incomes), and it will become smaller as $n$ increases, with limit $L$ as $n \to \infty$.

The final estimator $T_3(\mathbf{X}) = X_1$ is unbiased, as $E[X_1]$ is the average annual income. However, for all $n = 1, 2, \ldots$, $\mathrm{var}(T_3(\mathbf{X})) = \mathrm{var}(X_1)$ and, unless the annual income is constant, $\mathrm{var}(X_1) > 0$. Therefore $T_3(\mathbf{X})$ is not a consistent estimator, since the estimator, and hence its variance, does not change as we increase the sample size.
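The behaviour of the sample minimum can be illustrated exactly on a toy example. The sketch below assumes a hypothetical income distribution with four equally likely values (in thousands of pounds), which is not real data. It uses the fact that, for independent draws, $P(\min = v) = P(X \ge v)^n - P(X > v)^n$.

```python
from fractions import Fraction

# Assumed toy income distribution (thousands of pounds); illustrative only.
incomes = {10: Fraction(1, 4), 20: Fraction(1, 4),
           30: Fraction(1, 4), 60: Fraction(1, 4)}
mu = sum(v * p for v, p in incomes.items())  # population mean income
lowest = min(incomes)                        # L, the lowest income

def expected_min(n):
    """Exact expected value of the minimum of n i.i.d. incomes."""
    total = Fraction(0)
    for v in incomes:
        at_least = sum(p for w, p in incomes.items() if w >= v)
        above = sum(p for w, p in incomes.items() if w > v)
        # P(min = v) = P(X >= v)^n - P(X > v)^n
        total += v * (at_least ** n - above ** n)
    return total

for n in (1, 2, 5, 20):
    print(n, float(expected_min(n)))
```

With these values the population mean is 30. The expected minimum equals 30 when $n = 1$, drops to 20 when $n = 2$, and approaches the lowest income 10 as $n$ increases, so the bias of the minimum grows in magnitude with the sample size.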

9.4 Sample Variance

Variance Estimator

Suppose $X_1, X_2, \ldots, X_n$ is a random sample from any population with mean $\mu$ and variance $\sigma^2$. Consider the estimator
$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2.$$

Before considering the estimator $\hat{\sigma}^2$ in Example 9.4.1, we prove Lemma 9.4.2, which is useful in manipulating sums of squares.

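Splitting square

$$\sum_{i=1}^n (X_i - \mu)^2 = \sum_{i=1}^n (X_i - \bar{X})^2 + \sum_{i=1}^n (\bar{X} - \mu)^2 = \sum_{i=1}^n (X_i - \bar{X})^2 + n(\bar{X} - \mu)^2.$$
The proof uses the same approach as that given for $\mathrm{MSE}(T)$ in Example 9.3.5, in that we can write
$$
\begin{aligned}
\sum_{i=1}^n (X_i - \mu)^2 &= \sum_{i=1}^n (X_i - \bar{X} + \bar{X} - \mu)^2 \\
&= \sum_{i=1}^n \left\{(X_i - \bar{X})^2 + 2(X_i - \bar{X})(\bar{X} - \mu) + (\bar{X} - \mu)^2\right\} \\
&= \sum_{i=1}^n (X_i - \bar{X})^2 + 2(\bar{X} - \mu)\sum_{i=1}^n (X_i - \bar{X}) + \sum_{i=1}^n (\bar{X} - \mu)^2.
\end{aligned}
$$
Note that
$$\sum_{i=1}^n (X_i - \bar{X}) = \sum_{i=1}^n X_i - n\bar{X} = n\bar{X} - n\bar{X} = 0,$$

and the Lemma follows.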
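The identity is purely algebraic, so it holds for any numbers and any constant in place of $\mu$. A minimal numerical check, using arbitrary illustrative data and an arbitrary constant (both assumptions made only for this sketch), is:

```python
from fractions import Fraction

x = [Fraction(v) for v in (2, 5, 7, 10)]  # arbitrary illustrative data
mu = Fraction(4)                          # any fixed constant works here
n = len(x)
xbar = sum(x) / n                         # sample mean, equal to 6

# Left-hand side: sum of squared deviations about mu.
lhs = sum((xi - mu) ** 2 for xi in x)
# Right-hand side: deviations about the sample mean plus n * (xbar - mu)^2.
rhs = sum((xi - xbar) ** 2 for xi in x) + n * (xbar - mu) ** 2

print(lhs, rhs)
```

Both sides equal 50 for this data: the sum of squares about the sample mean is 34, and $n(\bar{x} - \mu)^2 = 4 \times 2^2 = 16$.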


Lemma 9.4.2 is an example of a common trick in statistics. Suppose that we have $A_i = B_i + K$ $(i = 1, 2, \ldots, n)$ such that $\sum_{i=1}^n B_i = 0$, then
$$\sum_{i=1}^n A_i^2 = \sum_{i=1}^n (B_i + K)^2 = \sum_{i=1}^n B_i^2 + nK^2.$$

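We check whether the variance estimator $\hat{\sigma}^2$ is biased or unbiased:

$$
\begin{aligned}
E[\hat{\sigma}^2] &= E\left[\frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2\right] \\
&= E\left[\frac{1}{n}\sum_{i=1}^n (X_i - \mu)^2 - \frac{1}{n}\sum_{i=1}^n (\bar{X} - \mu)^2\right] \\
&= \frac{1}{n}\sum_{i=1}^n E\left[(X_i - \mu)^2\right] - \frac{1}{n}\sum_{i=1}^n E\left[(\bar{X} - \mu)^2\right] \\
&= \frac{1}{n}\sum_{i=1}^n \mathrm{var}(X_i) - \frac{1}{n}\sum_{i=1}^n \mathrm{var}(\bar{X}) \\
&= \frac{1}{n}\, n\sigma^2 - \frac{1}{n}\, n\,\frac{\sigma^2}{n} = \frac{(n-1)\sigma^2}{n}.
\end{aligned}
$$

Hence $E[\hat{\sigma}^2] \neq \sigma^2 = \mathrm{var}(X_i)$ and so $\hat{\sigma}^2$ is a biased, although asymptotically unbiased, estimator for $\sigma^2$. Under weak additional conditions, such as $E[X_1^4] < \infty$, it can be shown that $\hat{\sigma}^2$ is a consistent estimator.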

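It follows from Variance Estimator that, given a random sample $X_1, X_2, \ldots, X_n$, the quantity
$$s^2 = \frac{n}{n-1}\hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2$$

is an unbiased estimator of $\sigma^2$. This is the definition of the sample variance that we gave in Section 2.3.

It can be shown that
$$s^2 = \frac{1}{n-1}\left(\sum_{i=1}^n X_i^2 - \frac{\left(\sum_{i=1}^n X_i\right)^2}{n}\right) = \frac{1}{n-1}\left(\sum_{i=1}^n X_i^2 - n\bar{X}^2\right).$$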
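The bias of the divisor-$n$ estimator and the unbiasedness of the divisor-$(n-1)$ estimator can both be confirmed exactly by enumerating every possible sample in a small case. The sketch below assumes a toy population taking the values 1, 2, 3 with equal probability (so $\sigma^2 = 2/3$) and a sample of size $n = 3$; these choices, and the helper names, are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

values = [1, 2, 3]   # assumed toy population, each value with probability 1/3
prob = Fraction(1, 3)
mu = sum(v * prob for v in values)
sigma2 = sum((v - mu) ** 2 * prob for v in values)  # population variance, 2/3

def sum_sq_dev(sample):
    """Sum of squared deviations of the sample about its own mean."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample)

def expected(n, divisor):
    """Exact expectation of sum_sq_dev / divisor over all samples of size n."""
    return sum(prob ** n * sum_sq_dev(s) / divisor
               for s in product(values, repeat=n))

n = 3
e_sigma_hat2 = expected(n, n)   # divisor n: the estimator sigma-hat squared
e_s2 = expected(n, n - 1)       # divisor n - 1: the sample variance s squared

print(e_sigma_hat2, e_s2, sigma2)
```

The divisor-$n$ estimator has expectation $(n-1)\sigma^2/n = 4/9$, whereas the divisor-$(n-1)$ estimator has expectation exactly $\sigma^2 = 2/3$.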

Sample variance and covariance

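Given observed data $x_1, x_2, \ldots, x_n$, we define the sample variance by
$$s_x^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i - \bar{x})^2 = \frac{1}{n-1}\left(\sum_{i=1}^n x_i^2 - \frac{\left(\sum_{i=1}^n x_i\right)^2}{n}\right) = \frac{1}{n-1}\left(\sum_{i=1}^n x_i^2 - n\bar{x}^2\right).$$
Similarly, if we have data pairs $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, we define the sample covariance by
$$s_{xy} = \frac{1}{n-1}\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}).$$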
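As a minimal sketch, these definitions translate directly into code. The function names and the small data set below are illustrative assumptions. The first two functions implement the two equivalent forms of the sample variance, and the standard-library function `statistics.variance`, which also uses the divisor $n - 1$, provides a cross-check.

```python
from statistics import variance

def sample_variance(x):
    """Definition: sum of squared deviations about the mean, divided by n - 1."""
    n = len(x)
    xbar = sum(x) / n
    return sum((xi - xbar) ** 2 for xi in x) / (n - 1)

def sample_variance_shortcut(x):
    """Equivalent form: (sum of squares - (sum)^2 / n) / (n - 1)."""
    n = len(x)
    return (sum(xi ** 2 for xi in x) - sum(x) ** 2 / n) / (n - 1)

def sample_covariance(x, y):
    """Sum of products of paired deviations, divided by n - 1."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)

# Illustrative paired data (not from the notes).
x = [2, 4, 4, 6]
y = [1, 3, 2, 6]

print(sample_variance(x), sample_variance_shortcut(x), variance(x))
print(sample_covariance(x, y))
```

For this data $s_x^2 = 8/3$ and $s_{xy} = 10/3$. Note also that the sample covariance of a variable with itself is its sample variance. The shortcut form is convenient by hand but can lose accuracy in floating-point arithmetic when the data have a large mean relative to their spread.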

Task: Session 5

Attempt the R Markdown file for Session 5:
Session 5: Estimators


Student Exercises

Attempt the exercises below.


Suggest a reasonable statistical model for each of the following situations, and say which parameter or function of the parameter(s) in the model is likely to be of main interest:

  1. The number of reportable accidents that occur in the University in the month of October is ascertained, with a view to estimating the overall accident rate for the academic year;
  2. In a laboratory test the times to failure of 10 computer hard disk units are measured, to enable the manufacturer to quote for the mean time to failure in sales literature.

Of course in practice one needs to check whether the suggested models are reasonable, e.g. by examining a histogram.

Solution to Exercise 9.1.
  1. The number of October accidents could be $\mathrm{Po}(\theta)$ (if accidents occurred at random and independently).
    Parameter: $\theta$, the expected number of accidents per month.
    Function of parameter of interest is $12\theta$.
  2. Failure times $T_1, T_2, \ldots, T_{10}$ could be independent $\mathrm{Exp}(\theta)$ (if disk failures occurred at random and independently).
    Function of parameter of interest is the mean failure time, $1/\theta$.



Suppose that a surveyor is trying to determine the area of a rectangular field, in which the measured length $Y_1$ and the measured width $Y_2$ are independent random variables taking values according to the following distributions:
$$
\begin{array}{c|ccc} y_1 & 8 & 10 & 11 \\ \hline p(y_1) & 0.25 & 0.25 & 0.5 \end{array}
\qquad\qquad
\begin{array}{c|cc} y_2 & 4 & 6 \\ \hline p(y_2) & 0.5 & 0.5 \end{array}
$$

The calculated area $A = Y_1 Y_2$ is also a random variable, and is used to estimate the true area. Suppose that the true length and width are 10 and 5, respectively.

  1. Is $Y_1$ an unbiased estimator of the true length?
  2. Is $Y_2$ an unbiased estimator of the true width?
  3. Is $A$ an unbiased estimator of the true area?

Solution to Exercise 9.2.
  1. Yes, $Y_1$ is an unbiased estimator, since
    $E[Y_1] = 8 \times 0.25 + 10 \times 0.25 + 11 \times 0.5 = 10$.
  2. Yes, $Y_2$ is an unbiased estimator, since
    $E[Y_2] = 4 \times 0.5 + 6 \times 0.5 = 5$.
  3. Yes, $A$ is an unbiased estimator, since by independence
    $E[A] = E[Y_1 Y_2] = E[Y_1]\,E[Y_2]$
    and therefore
    $E[A] = 10 \times 5 = 50$.
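The three expectations can be checked by direct enumeration. The sketch below uses the two distributions given in the exercise and computes $E[A]$ from the joint distribution of length and width, which under independence is the product of the marginal probabilities. The variable names are illustrative.

```python
from fractions import Fraction

# Distributions of the measured length and width, as given in the exercise.
length_pmf = {8: Fraction(1, 4), 10: Fraction(1, 4), 11: Fraction(1, 2)}
width_pmf = {4: Fraction(1, 2), 6: Fraction(1, 2)}

e_y1 = sum(y * p for y, p in length_pmf.items())
e_y2 = sum(y * p for y, p in width_pmf.items())

# Independence: the joint probability is the product of the marginals.
e_area = sum(y1 * y2 * p1 * p2
             for y1, p1 in length_pmf.items()
             for y2, p2 in width_pmf.items())

print(e_y1, e_y2, e_area)
```

The enumeration gives expectations 10, 5 and 50, matching the true length, width and area.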