3.1 Estimation of the Population Mean
Key Concept 3.1
Estimators and Estimates
Estimators are functions of sample data drawn from an unknown population. Estimates are numeric values computed by estimators based on the sample data. Estimators are random variables because they are functions of random data. Estimates are nonrandom numbers.
Think of some economic variable, for example hourly earnings of college graduates, denoted by $Y$. Suppose we are interested in $\mu_Y$, the mean of $Y$. In order to calculate $\mu_Y$ exactly, we would have to interview every working graduate in the economy. We simply cannot do this due to time and cost constraints. However, we can draw a random sample of $n$ i.i.d. observations $Y_1, \dots, Y_n$ and estimate $\mu_Y$ using one of the simplest estimators in the sense of Key Concept 3.1 one can think of, that is,
$$\overline{Y} = \frac{1}{n} \sum_{i=1}^n Y_i,$$
the sample mean of $Y$. Then again, we could use an even simpler estimator for $\mu_Y$: the very first observation in the sample, $Y_1$. Is $Y_1$ a good estimator? For now, assume that
$$Y \sim \chi^2_{12},$$
which is not too unreasonable, as hourly income is non-negative and we expect many hourly earnings to fall within a limited range. Moreover, it is common for income distributions to be skewed to the right, a property of the $\chi^2_{12}$ distribution.
# plot the chi_12^2 distribution
curve(dchisq(x, df=12),
from = 0,
to = 40,
ylab = "density",
xlab = "hourly earnings in Euro")
We now draw a sample of $n = 100$ observations and take the first observation $Y_1$ as an estimate for $\mu_Y$:
# set seed for reproducibility
set.seed(1)
# sample from the chi_12^2 distribution, keep only the first observation
rchisq(n = 100, df = 12)[1]
## [1] 8.257893
The estimate $8.26$ is not too far away from $\mu_Y = 12$, but it is somewhat intuitive that we could do better: the estimator $Y_1$ discards a lot of information and its variance is the population variance:
$$\text{Var}(Y_1) = \text{Var}(Y) = 2 \cdot 12 = 24$$
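The claim about the variance of the first observation can be checked by simulation. The following is a minimal sketch, assuming the $\chi^2_{12}$ population used above, whose variance is $2 \cdot 12 = 24$. The names `reps`, `first_obs` and `var_first` and the number of repetitions are illustrative choices, not part of the original example.

```r
# Monte Carlo check of the variance of the first-observation estimator Y_1
# (assumption: the population is chi-squared with 12 degrees of freedom)
set.seed(1)
reps <- 10000  # number of simulated samples (illustrative choice)
n <- 100       # sample size, as in the example above

# draw `reps` samples of size n and keep only the first observation of each
first_obs <- replicate(reps, rchisq(n = n, df = 12)[1])

# empirical variance of Y_1; the theoretical value is 2 * 12 = 24
var_first <- var(first_obs)
var_first
```

The empirical variance should be close to 24 no matter how large `n` is chosen, since the estimator never uses more than one observation.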
This brings us to the following question: What is a good estimator of an unknown parameter in the first place? This question is tackled in Key Concepts 3.2 and 3.3.
Key Concept 3.2
Bias, Consistency and Efficiency
Desirable characteristics of an estimator include unbiasedness, consistency and efficiency.
Unbiasedness:
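If the mean of the sampling distribution of some estimator $\hat{\mu}_Y$ for the population mean $\mu_Y$ equals $\mu_Y$, that is,
$$E(\hat{\mu}_Y) = \mu_Y,$$
the estimator is unbiased for $\mu_Y$. The bias of $\hat{\mu}_Y$ then is $0$:
$$E(\hat{\mu}_Y) - \mu_Y = 0$$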
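Unbiasedness can be illustrated by approximating the mean of the sampling distribution with a simulation. The sketch below assumes the $\chi^2_{12}$ population from the example above (so the true mean is 12) and compares two estimators, the sample mean and the first observation. The variable names and the number of repetitions are illustrative.

```r
# Approximate the bias of two estimators of the population mean by simulation
# (assumption: population is chi-squared with 12 degrees of freedom, mean 12)
set.seed(1)
reps <- 10000  # number of simulated samples (illustrative choice)
n <- 100       # observations per sample
mu_Y <- 12     # true mean of the chi^2_12 distribution

# each row of `samples` is one i.i.d. sample of size n
samples <- matrix(rchisq(reps * n, df = 12), nrow = reps)

# estimated bias = average of the estimates minus the true mean
bias_mean  <- mean(rowMeans(samples)) - mu_Y  # sample mean
bias_first <- mean(samples[, 1]) - mu_Y       # first observation only
c(sample_mean = bias_mean, first_obs = bias_first)
```

Both estimated biases should be close to zero, which suggests that unbiasedness alone does not distinguish between the two estimators.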
Consistency:
We want the uncertainty of the estimator $\hat{\mu}_Y$ to decrease as the number of observations in the sample grows. More precisely, we want the probability that the estimate $\hat{\mu}_Y$ falls within a small interval around the true value $\mu_Y$ to get increasingly closer to $1$ as $n$ grows. We write this as
$$\hat{\mu}_Y \xrightarrow{p} \mu_Y.$$
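The idea behind consistency can be made concrete with a small simulation. The sketch below assumes the $\chi^2_{12}$ population from the example above and uses the sample mean as the estimator. The interval half-width `eps`, the sample sizes and the number of repetitions are arbitrary illustrative choices.

```r
# Estimate P(|sample mean - 12| < eps) for growing sample sizes
# (assumption: population is chi-squared with 12 degrees of freedom, mean 12)
set.seed(1)
eps <- 0.5                        # half-width of the interval around the true mean
reps <- 2000                      # simulated samples per sample size
sample_sizes <- c(10, 100, 1000)

coverage <- sapply(sample_sizes, function(n) {
  # sample means of `reps` independent samples of size n
  y_bar <- replicate(reps, mean(rchisq(n, df = 12)))
  # share of sample means that fall inside the interval
  mean(abs(y_bar - 12) < eps)
})
names(coverage) <- sample_sizes
coverage
```

The estimated probabilities should increase towards 1 as the sample size grows, which is the behavior we expect from a consistent estimator.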
Variance and efficiency:
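We want the estimator to be efficient. Suppose we have two estimators, $\hat{\mu}_Y$ and $\tilde{\mu}_Y$, and for some given sample size $n$ it holds that
$$E(\hat{\mu}_Y) = E(\tilde{\mu}_Y) = \mu_Y$$
but
$$\text{Var}(\hat{\mu}_Y) < \text{Var}(\tilde{\mu}_Y).$$
We then prefer to use $\hat{\mu}_Y$ as it has a lower variance than $\tilde{\mu}_Y$, meaning that $\hat{\mu}_Y$ is more efficient in using the information provided by the observations in the sample.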
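The comparison of variances can be sketched numerically for two unbiased estimators of the mean: the sample mean and the first observation. The code assumes the $\chi^2_{12}$ population from the example above, for which the variance of the sample mean is $24 / n$. Variable names and the simulation size are illustrative.

```r
# Compare the variances of two unbiased estimators of the population mean
# (assumption: population is chi-squared with 12 degrees of freedom)
set.seed(1)
reps <- 10000  # number of simulated samples (illustrative choice)
n <- 100       # observations per sample

# each row of `samples` is one i.i.d. sample of size n
samples <- matrix(rchisq(reps * n, df = 12), nrow = reps)

var_mean  <- var(rowMeans(samples))  # theoretical value: 24 / 100 = 0.24
var_first <- var(samples[, 1])       # theoretical value: 24
c(sample_mean = var_mean, first_obs = var_first)
```

Under these assumptions the sample mean has a far smaller variance than the first observation, so it is the more efficient of the two.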