Chapter 9 Parameter Estimation
9.1 Introduction
In this section, we consider the general definition of a statistic as a summary of a random sample. Statistics are used as estimators of population quantities, with an estimate denoting a given realisation of an estimator. We explore key properties that we wish estimators to have, such as unbiasedness, efficiency and consistency. We study the properties of the sample mean and sample variance as estimators of the population mean and variance, respectively.
9.2 Preliminaries
A statistic, $T(\mathbf{X})$, is any function of the random sample.
Note that since $T(\mathbf{X})$ is a function of random variables, it is also a random variable. Hence it will also have all the properties of a random variable. Most importantly, it has a distribution associated with it.
A statistic that is used for the purpose of estimating an unknown population parameter is called an estimator.
A realised value of an estimator, $T(\mathbf{x})$, that is, the value of $T(\mathbf{X})$ evaluated at a particular outcome of the random sample, is called an estimate.
That is, if we let $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ then $T(\mathbf{X})$ is a random variable and $T(\mathbf{x})$ is a realisation of the random variable $T(\mathbf{X})$ based on the sample $\mathbf{x} = (x_1, x_2, \ldots, x_n)$. The properties of the estimator $T(\mathbf{X})$ will typically depend upon $n$, the number of observations in the random sample.
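The distinction between an estimator (a random variable) and an estimate (one realised value) can be illustrated with a short simulation. This is only a sketch: the exponential population with mean 2, the sample size of 20 and the function names are assumptions made for illustration, not part of the notes.

```python
import random

rng = random.Random(1)  # seeded so that the run is repeatable


def sample_mean(sample):
    """A statistic: a function of the sample values only."""
    return sum(sample) / len(sample)


def draw_sample(n):
    # Hypothetical population (an assumption): exponential with mean 2.
    return [rng.expovariate(0.5) for _ in range(n)]


# The estimator is the rule `sample_mean`; each new random sample
# produces a different estimate, i.e. a different realised value.
estimates = [sample_mean(draw_sample(20)) for _ in range(5)]
print(estimates)
```

The five printed values differ from one another because each is computed from a different realisation of the random sample.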
Average Income
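Suppose that we want to estimate the average annual income in the U.K. Let $X_1, X_2, \ldots, X_n$ be a random sample of annual incomes. Possible estimators might include:
- $\hat{\theta}_1 = \bar{X} = \frac{1}{n} \sum_{i=1}^n X_i$;
- $\hat{\theta}_2 = \min \{X_1, X_2, \ldots, X_n\}$;
- $\hat{\theta}_3 = X_1$.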
Which of these is the best choice of estimator?
9.3 Judging estimators
Let $\theta$ be a population parameter we wish to estimate. Since any function of the sample data is a potential estimator of $\theta$, how should we determine whether an estimator is good or not? What qualities should our estimator have?
Quality 1: Unbiasedness
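The estimator $\hat{\theta}$ is an unbiased estimator of $\theta$ if $E[\hat{\theta}] = \theta$. Otherwise, we say that the estimator is biased and we define $B(\hat{\theta}) = E[\hat{\theta}] - \theta$ to be the bias of $\hat{\theta}$.
If $B(\hat{\theta}) \to 0$ as the sample size $n \to \infty$, then we say that $\hat{\theta}$ is asymptotically unbiased for $\theta$.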
Quality 2: Small variance
We would ideally like an estimator that is unbiased with a small variance. So given multiple unbiased estimators, we choose the most efficient estimator (the estimator with the smallest variance).
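For comparing an unbiased estimator with a biased estimator, we can use the mean-square error to quantify the trade-off between bias and variance:
The mean-square error of an estimator $\hat{\theta}$ is defined by
$$\mathrm{MSE}(\hat{\theta}) = E\left[(\hat{\theta} - \theta)^2\right].$$
Prove $\mathrm{MSE}(\hat{\theta}) = \mathrm{var}(\hat{\theta}) + B(\hat{\theta})^2$.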
Watch Video 16 for the proof of Exercise 1 or alternatively the proof is available:
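Proof of Exercise 1
Adding and subtracting $E[\hat{\theta}]$ inside the square and expanding,
$$E\left[(\hat{\theta} - \theta)^2\right] = E\left[(\hat{\theta} - E[\hat{\theta}])^2\right] + 2 (E[\hat{\theta}] - \theta) E\left[\hat{\theta} - E[\hat{\theta}]\right] + (E[\hat{\theta}] - \theta)^2.$$
The first term is $\mathrm{var}(\hat{\theta})$, the middle term vanishes because $E\left[\hat{\theta} - E[\hat{\theta}]\right] = 0$, and the last term is the squared bias. Hence the mean-square error equals $\mathrm{var}(\hat{\theta}) + (E[\hat{\theta}] - \theta)^2$, as required.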
Video 16: Derivation of MSE
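The decomposition of the mean-square error into variance plus squared bias can be checked numerically. The sketch below uses a deliberately biased estimator, 0.8 times the sample mean, of the mean of a normal population; the population (mean 5, standard deviation 2), the sample size and the shrinkage factor are all assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(3)  # seeded for repeatability
THETA = 5.0             # true population mean (assumed)
N = 10                  # sample size
REPS = 20000            # number of simulated samples


def shrunk_mean(sample):
    # A deliberately biased estimator: 0.8 times the sample mean.
    return 0.8 * sum(sample) / len(sample)


estimates = [
    shrunk_mean([rng.gauss(THETA, 2.0) for _ in range(N)])
    for _ in range(REPS)
]

# Empirical mean-square error, bias and variance over the simulated estimates.
mse = statistics.fmean([(e - THETA) ** 2 for e in estimates])
bias = statistics.fmean(estimates) - THETA
variance = statistics.pvariance(estimates)

print(mse, variance + bias ** 2)
```

The two printed numbers agree up to floating-point rounding, because the decomposition holds exactly for the empirical distribution of the simulated estimates as well as for the true distribution.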
Quality 3: Consistency
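An estimator $\hat{\theta}$ is said to be a consistent estimator for $\theta$ if, for any $\epsilon > 0$,
$$P\left(|\hat{\theta} - \theta| > \epsilon\right) \to 0 \quad \text{as } n \to \infty.$$
That is, as $n$ becomes large the probability that $\hat{\theta}$ differs from $\theta$ by more than $\epsilon$, for any positive $\epsilon$, becomes small and goes to 0 as $n \to \infty$.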
This third desirable property can sometimes be established using the following theorem:
Consistency Theorem
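If $E[\hat{\theta}] \to \theta$ and $\mathrm{var}(\hat{\theta}) \to 0$ as $n \to \infty$, then $\hat{\theta}$ is a consistent estimator for $\theta$.
Note that the Consistency Theorem gives sufficient but not necessary conditions for consistency. Since by Exercise 1 $\mathrm{MSE}(\hat{\theta}) = \mathrm{var}(\hat{\theta}) + (E[\hat{\theta}] - \theta)^2$, the Consistency Theorem implies that if $\mathrm{MSE}(\hat{\theta}) \to 0$ as $n \to \infty$, then $\hat{\theta}$ is a consistent estimator for $\theta$.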
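Suppose $X_1, X_2, \ldots, X_n$ is a random sample from any population with mean $\mu$ and variance $\sigma^2$. The sample mean is $\bar{X} = \frac{1}{n} \sum_{i=1}^n X_i$ and is an estimator of $\mu$. What are the properties of $\bar{X}$?
By linearity of expectation, $E[\bar{X}] = \frac{1}{n} \sum_{i=1}^n E[X_i] = \mu$, so $\bar{X}$ is unbiased. By independence of the observations, $\mathrm{var}(\bar{X}) = \frac{1}{n^2} \sum_{i=1}^n \mathrm{var}(X_i) = \frac{\sigma^2}{n}$.
Given that $\bar{X}$ is an unbiased estimator, the mean-square error of $\bar{X}$ is equal to $\mathrm{var}(\bar{X})$, that is, $\mathrm{MSE}(\bar{X}) = \frac{\sigma^2}{n}$.
Since $E[\bar{X}] = \mu$ and $\mathrm{var}(\bar{X}) = \frac{\sigma^2}{n} \to 0$ as $n \to \infty$, it follows from the Consistency Theorem that $\bar{X}$ is a consistent estimator for $\mu$.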
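The consistency of the sample mean can be seen in a small simulation that estimates the probability that the sample mean differs from the population mean by more than a fixed tolerance, for increasing sample sizes. This is a sketch under assumed settings: a Uniform(0, 10) population (mean 5), a tolerance of 0.5 and 1000 repetitions per sample size are illustrative choices only.

```python
import random

rng = random.Random(0)  # seeded for repeatability
MU = 5.0                # mean of the assumed Uniform(0, 10) population
EPS = 0.5               # tolerance epsilon


def exceed_probability(n, reps=1000):
    """Estimate P(|sample mean - MU| > EPS) for samples of size n."""
    count = 0
    for _ in range(reps):
        xbar = sum(rng.uniform(0.0, 10.0) for _ in range(n)) / n
        if abs(xbar - MU) > EPS:
            count += 1
    return count / reps


sizes = (10, 100, 1000)
probs = [exceed_probability(n) for n in sizes]
for n, p in zip(sizes, probs):
    print(n, p)
```

The estimated probabilities shrink towards 0 as the sample size grows, which is the behaviour that the definition of consistency requires for every positive tolerance.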
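We return to the Average Income example concerning the average annual income in the UK. Let $\mu$ and $\sigma^2$ denote the mean and variance of annual income.
It follows from Exercise 2 that the sample mean $\hat{\theta}_1 = \bar{X}$ is an unbiased and consistent estimator of the mean annual income.
The second estimator, $\hat{\theta}_2 = \min \{X_1, X_2, \ldots, X_n\}$, is biased. Except in the case $n = 1$, the mean of $\hat{\theta}_2$ will be below the mean annual income (the exact value will depend on the distribution of annual incomes) and will become smaller as $n$ increases, with the lowest annual income in the UK as its limit as $n \to \infty$.
The final estimator $\hat{\theta}_3 = X_1$ is unbiased as $E[X_1] = \mu$ is the average annual income. However, for all $n$, $\mathrm{var}(\hat{\theta}_3) = \mathrm{var}(X_1) = \sigma^2$ and, unless the annual income is constant, $\sigma^2 > 0$. Therefore $\hat{\theta}_3$ is not a consistent estimator since the estimator, and hence its variance, does not change as we increase the sample size.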
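The behaviour of the three candidate estimators (the sample mean, the sample minimum and the first observation) can be compared by simulation. The sketch below assumes a hypothetical log-normal income distribution; the distribution, its parameters and the helper names are illustrative assumptions and do not describe real UK incomes.

```python
import random
import statistics

rng = random.Random(42)  # seeded for repeatability


def draw_income():
    # Hypothetical income model (an assumption): log-normal incomes.
    return rng.lognormvariate(10.0, 0.5)


def simulate(n, reps=2000):
    """Return (mean, variance) of each estimator over `reps` samples of size n."""
    values = {"mean": [], "min": [], "first": []}
    for _ in range(reps):
        sample = [draw_income() for _ in range(n)]
        values["mean"].append(sum(sample) / n)   # sample mean
        values["min"].append(min(sample))        # sample minimum
        values["first"].append(sample[0])        # first observation only
    return {
        name: (statistics.fmean(v), statistics.pvariance(v))
        for name, v in values.items()
    }


results = {n: simulate(n) for n in (5, 50, 500)}
for n, res in results.items():
    print(n, res)
```

In runs of this sketch the variance of the sample mean falls roughly in proportion to $1/n$, the average of the minimum drifts downwards as $n$ grows, and the variance of the first observation stays at about the same level for every sample size.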
9.4 Sample Variance
Variance Estimator
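Suppose $X_1, X_2, \ldots, X_n$ is a random sample from any population with mean $\mu$ and variance $\sigma^2$. Consider the estimator
$$\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \bar{X})^2.$$
Before considering the estimator in Example 2 we prove Lemma 2, which is useful in manipulating sums of squares.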
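Splitting square
$$\sum_{i=1}^n (X_i - \bar{X})^2 = \sum_{i=1}^n X_i^2 - n \bar{X}^2.$$
Expanding the square and using $\sum_{i=1}^n X_i = n \bar{X}$,
$$\sum_{i=1}^n (X_i - \bar{X})^2 = \sum_{i=1}^n X_i^2 - 2 \bar{X} \sum_{i=1}^n X_i + n \bar{X}^2 = \sum_{i=1}^n X_i^2 - 2 n \bar{X}^2 + n \bar{X}^2,$$
and the Lemma follows.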
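As a quick numerical check of the standard sum-of-squares identity, namely that the sum of squared deviations from the mean equals the sum of squares minus $n$ times the squared mean, the sketch below evaluates both sides on a small made-up data set (the data values are an assumption for illustration).

```python
data = [1, 3, 5, 7, 9]  # made-up data for illustration
n = len(data)
xbar = sum(data) / n    # sample mean, here 5.0

# Left-hand side: sum of squared deviations from the mean.
lhs = sum((x - xbar) ** 2 for x in data)

# Right-hand side: sum of squares minus n times the squared mean.
rhs = sum(x ** 2 for x in data) - n * xbar ** 2

print(lhs, rhs)  # both equal 40.0
```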
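We check whether the variance estimator $\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \bar{X})^2$ is biased or unbiased. Writing $\sum_{i=1}^n (X_i - \bar{X})^2 = \sum_{i=1}^n X_i^2 - n \bar{X}^2$ and using $E[X_i^2] = \sigma^2 + \mu^2$ and $E[\bar{X}^2] = \mathrm{var}(\bar{X}) + \mu^2 = \frac{\sigma^2}{n} + \mu^2$,
$$E[\hat{\sigma}^2] = \frac{1}{n} \left( n (\sigma^2 + \mu^2) - n \left( \frac{\sigma^2}{n} + \mu^2 \right) \right) = \frac{n-1}{n} \sigma^2.$$
Hence $E[\hat{\sigma}^2] - \sigma^2 = -\frac{\sigma^2}{n} \to 0$ as $n \to \infty$, and so $\hat{\sigma}^2$ is a biased, although asymptotically unbiased, estimator for $\sigma^2$. Under weak additional conditions, such as $E[X_1^4] < \infty$, it can be shown that $\hat{\sigma}^2$ is a consistent estimator.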
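The bias of the variance estimator with divisor $n$ can be seen by simulation: for a population with variance $\sigma^2$, its average over many samples settles near $\frac{n-1}{n} \sigma^2$ rather than $\sigma^2$. The sketch below assumes a normal population with $\sigma^2 = 9$ and samples of size 5; these settings and the function name are illustrative assumptions.

```python
import random
import statistics

rng = random.Random(7)  # seeded for repeatability
SIGMA2 = 9.0            # assumed population variance (standard deviation 3)
N = 5                   # sample size
REPS = 20000            # number of simulated samples


def divisor_n_variance(sample):
    """Variance estimator that divides the sum of squares by n."""
    xbar = sum(sample) / len(sample)
    return sum((x - xbar) ** 2 for x in sample) / len(sample)


estimates = [
    divisor_n_variance([rng.gauss(0.0, 3.0) for _ in range(N)])
    for _ in range(REPS)
]

average = statistics.fmean(estimates)
expected = (N - 1) / N * SIGMA2  # theoretical mean of the estimator, 7.2

print(average, expected)
```

The simulated average lies close to 7.2 and clearly below the true variance of 9, in line with the factor $\frac{n-1}{n}$.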
It follows from Variance Estimator that given a random sample $X_1, X_2, \ldots, X_n$, the quantity,
$$S^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})^2,$$
is an unbiased estimator of $\sigma^2$. This is the definition of the sample variance that we gave in Section 2.3.
It can be shown that
Given observed data $x_1, x_2, \ldots, x_n$, we define the sample variance by
$$s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2.$$
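For observed data, the sample variance with divisor $n - 1$ can be computed directly or with Python's standard library. In the sketch below the data values are made up for illustration; `statistics.variance` uses the divisor $n - 1$, whereas `statistics.pvariance` uses the divisor $n$.

```python
import statistics


def sample_variance(data):
    """Sample variance: sum of squared deviations divided by n - 1."""
    n = len(data)
    xbar = sum(data) / n
    return sum((x - xbar) ** 2 for x in data) / (n - 1)


data = [2.0, 4.0, 4.0, 6.0]  # made-up observations
s2 = sample_variance(data)

print(s2)                          # divisor n - 1, computed by hand
print(statistics.variance(data))   # divisor n - 1, library version
print(statistics.pvariance(data))  # divisor n, for comparison
```

For these four observations the sum of squared deviations is 8, so the sample variance is $8/3$ while the divisor-$n$ version gives 2.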
Student Exercises
Attempt the exercises below.
Question 1.
Suggest a reasonable statistical model for each of the following situations, and say which parameter or function of the parameter(s) in the model is likely to be of main interest:
- The number of reportable accidents that occur in the University in the month of October is ascertained, with a view to estimating the overall accident rate for the academic year;
- In a laboratory test the times to failure of 10 computer hard disk units are measured, to enable the manufacturer to quote for the mean time to failure in sales literature.
Of course in practice one needs to check whether the suggested models are reasonable, e.g. by examining a histogram.
Solution to Question 1.
- The number of October accidents could be $\text{Po}(\lambda)$ (if accidents occurred at random and independently).
Parameter: $\lambda$, the expected number of accidents per month.
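Function of parameter of interest is the expected number of accidents over the academic year, that is, $\lambda$ multiplied by the number of months in the academic year.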
- Failure times could be independent $\text{Exp}(\lambda)$ (if disk failures occurred at random and independently).
Function of parameter of interest is the mean failure time, $1/\lambda$.
Question 2.
Suppose that a surveyor is trying to determine the area of a rectangular field, in which the measured length and the measured width are independent random variables taking values according to the following distributions:
The calculated area, the product of the measured length and the measured width, is also a random variable, and is used to estimate the true area. The true length and width are 10 and 5, respectively.
- Is the measured length an unbiased estimator of the true length?
- Is the measured width an unbiased estimator of the true width?
- Is the calculated area an unbiased estimator of the true area?
Solution to Question 2.
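- Yes, the measured length is an unbiased estimator, since its expectation equals the true length, 10.
- Yes, the measured width is an unbiased estimator, since its expectation equals the true width, 5.
- Yes, the calculated area is an unbiased estimator, since by independence the expectation of the product is the product of the expectations, $10 \times 5 = 50$, which is the true area.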