5.5 The Gauss-Markov Theorem
When estimating regression models, we know that the results of the estimation procedure are random. However, when using unbiased estimators, we estimate the true parameter at least on average. When comparing different unbiased estimators, it is therefore interesting to know which one has the highest precision: since the probability of estimating the exact value of the parameter of interest is 0 in an empirical application, we want the probability of obtaining an estimate very close to the true value to be as high as possible. This means we want to use the estimator with the lowest variance among all unbiased estimators, provided we care about unbiasedness. The Gauss-Markov theorem states that, in the class of conditionally unbiased linear estimators, the OLS estimator has this property under certain conditions.
Key Concept 5.5
The Gauss-Markov Theorem for $\hat{\beta}_1$
Suppose that the assumptions made in Key Concept 4.3 hold and that the errors are homoskedastic. The OLS estimator is the best (in the sense of smallest variance) linear conditionally unbiased estimator (BLUE) in this setting.
Let us have a closer look at what this means:
- Estimators of $\beta_1$ that are linear functions of the $Y_1, \dots, Y_n$ and that are unbiased conditionally on the regressors $X_1, \dots, X_n$ can be written as $$\tilde{\beta}_1 = \sum_{i=1}^n a_i Y_i,$$ where the $a_i$ are weights that are allowed to depend on the $X_i$ but not on the $Y_i$.
- We already know that $\tilde{\beta}_1$ has a sampling distribution: $\tilde{\beta}_1$ is a linear function of the $Y_i$, which are random variables. If now $$E(\tilde{\beta}_1 \,|\, X_1, \dots, X_n) = \beta_1,$$ then $\tilde{\beta}_1$ is a linear unbiased estimator of $\beta_1$, conditionally on the $X_1, \dots, X_n$.
- We may ask whether $\tilde{\beta}_1$ is also the best estimator in this class, i.e., the most efficient one of all linear conditionally unbiased estimators, where "most efficient" means smallest variance. The weights $a_i$ play an important role here, and it turns out that OLS uses just the right weights to have the BLUE property (the short numerical check below illustrates these weights).
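In fact, for the simple regression model the OLS slope estimator itself has exactly this form: it can be written as $\hat{\beta}_1 = \sum_{i=1}^n a_i Y_i$ with weights $a_i = \frac{X_i - \overline{X}}{\sum_{j=1}^n (X_j - \overline{X})^2}$. The following minimal sketch (with simulated data chosen purely for illustration) verifies this numerically by comparing the weighted sum with the slope reported by lm().

# verify that the OLS slope is a weighted sum of the Y_i
# (illustrative simulated data)
set.seed(1)
x <- rnorm(50)
y <- 2 + 0.5 * x + rnorm(50)

# OLS weights a_i = (x_i - mean(x)) / sum((x_j - mean(x))^2)
a <- (x - mean(x)) / sum((x - mean(x))^2)

# the weighted sum of the y_i ...
sum(a * y)

# ... coincides with the slope estimate from lm()
unname(coef(lm(y ~ x))[2])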
Simulation Study: BLUE Estimator
Consider the case of a regression of $Y_1, \dots, Y_n$ only on a constant. Here, the $Y_i$ are assumed to be a random sample from a population with mean $\mu$ and variance $\sigma^2$. The OLS estimator in this model is simply the sample mean, see Chapter 3.2:
$$\hat{\beta}_1 = \sum_{i=1}^n \underbrace{\frac{1}{n}}_{=a_i} Y_i. \tag{5.3}$$

Clearly, each observation is weighted by $a_i = \frac{1}{n}$, and we also know that $\text{Var}(\hat{\beta}_1) = \frac{\sigma^2}{n}$.
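As a quick check of this claim (a minimal sketch with simulated data), an intercept-only regression in R indeed returns the sample mean as its coefficient estimate.

# the intercept-only regression returns the sample mean
set.seed(2)
y <- rnorm(25, mean = 5, sd = 2)

unname(coef(lm(y ~ 1)))  # estimated intercept
mean(y)                  # sample mean: the same value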
We now use R to conduct a simulation study that demonstrates what happens to the variance of (5.3) if different weights $w_i = \frac{1 \pm \epsilon}{n}$ are assigned to either half of the sample $Y_1, \dots, Y_n$ instead of using the OLS weights $a_i = \frac{1}{n}$.
# set sample size and number of repetitions
n <- 100
reps <- 1e5
# choose epsilon and create a vector of weights as defined above
epsilon <- 0.8
w <- c(rep((1 + epsilon) / n, n / 2),
       rep((1 - epsilon) / n, n / 2))
# draw a random sample y_1,...,y_n from the standard normal distribution,
# use both estimators 1e5 times and store the result in the vectors 'ols' and
# 'weightedestimator'
ols <- rep(NA, reps)
weightedestimator <- rep(NA, reps)
for (i in 1:reps) {
  y <- rnorm(n)
  ols[i] <- mean(y)
  weightedestimator[i] <- crossprod(w, y)
}
# plot kernel density estimates of the estimators' distributions:
# OLS
plot(density(ols),
     col = "purple",
     lwd = 3,
     main = "Density of OLS and Weighted Estimator",
     xlab = "Estimates")
# weighted
lines(density(weightedestimator),
      col = "steelblue",
      lwd = 3)
# add a dashed line at 0 and add a legend to the plot
abline(v = 0, lty = 2)
legend('topright',
       c("OLS", "Weighted"),
       col = c("purple", "steelblue"),
       lwd = 3)
What conclusion can we draw from the result?
- Both estimators seem to be unbiased: the means of their estimated distributions are zero.
- The estimator using weights that deviate from those implied by OLS is less efficient than the OLS estimator: there is higher dispersion when the weights are $w_i = \frac{1 \pm 0.8}{100}$ instead of $w_i = \frac{1}{100}$, as required by the OLS solution.
Hence, the simulation results support the Gauss-Markov Theorem.
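We can make this comparison more precise by looking at the simulated variances (this assumes the vectors ols and weightedestimator from the simulation above are still in the workspace). Since $\sigma^2 = 1$ here, theory predicts $\text{Var}(\hat{\beta}_1) = \sigma^2/n = 0.01$ for OLS, while for the weighted estimator $\text{Var}(\tilde{\beta}_1) = \sigma^2 \sum_{i=1}^n w_i^2 = \sigma^2 (1 + \epsilon^2)/n = 0.0164$, an efficiency loss of roughly 64%.

# compare simulated variances with their theoretical counterparts
var(ols)                         # close to 1 / n = 0.01
var(weightedestimator)           # close to (1 + epsilon^2) / n = 0.0164

# ratio of variances: roughly 1 + epsilon^2 = 1.64
var(weightedestimator) / var(ols)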