5.4 OLS properties
The Gauss-Markov Theorem: given assumptions (1)-(7), the least squares (OLS) estimator has the minimum variance in the class of all linear unbiased estimators, i.e. it is BLUE (Best Linear Unbiased Estimator).
If the OLS estimator is linear and unbiased and at the same time has the smallest variance, then it is efficient (in finite samples).
As the sample size increases indefinitely (n→∞), the variance of the OLS estimator converges to zero; together with unbiasedness, this means the estimator converges in probability to the true parameter. This property is called consistency.
Also, as the sample size increases, the distribution of the OLS estimator asymptotically approaches the normal distribution (by the Central Limit Theorem).
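Both properties can be checked with a short Monte Carlo sketch. Everything in it (the model y = 1 + 2x + u, standard normal errors, the sample sizes) is an assumption chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def ols_slope(n):
    """Draw one sample of size n from y = 1 + 2x + u and return the OLS slope."""
    x = rng.normal(size=n)
    u = rng.normal(size=n)
    y = 1.0 + 2.0 * x + u
    X = np.column_stack([np.ones(n), x])
    return np.linalg.solve(X.T @ X, X.T @ y)[1]  # (X'X)^{-1} X'y, slope entry

for n in (50, 500, 5000):
    slopes = np.array([ols_slope(n) for _ in range(2000)])
    print(f"n={n:5d}  mean={slopes.mean():.4f}  var={slopes.var():.6f}")
# The mean stays near the true slope 2 while the variance shrinks roughly
# like 1/n (consistency); a histogram of `slopes` looks increasingly normal
# (asymptotic normality).
```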
Considering the relations $\hat{\beta} = (X^\top X)^{-1}X^\top y$ and $y = X\beta + u$, the expectation and the variance of the OLS estimator $\hat{\beta}$ can be determined.
$$\hat{\beta} = (X^\top X)^{-1}X^\top y = (X^\top X)^{-1}X^\top (X\beta + u) = \beta + (X^\top X)^{-1}X^\top u \qquad (5.23)$$
By taking the expectation of the last equality in (5.23), where only $u$ is a random vector and $E(u) = 0$ by assumption, we get
$$E(\hat{\beta}) = \beta + (X^\top X)^{-1}X^\top E(u) = \beta \qquad (5.24)$$
According to the result in (5.24), the estimator's expectation is equal to the vector of the true parameters. This proves that the OLS estimator is unbiased.
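A small numerical check of the decomposition in (5.23) and its consequence (5.24); the design (n = 200, an intercept plus two regressors, standard normal errors) is assumed purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # intercept + 2 regressors
beta = np.array([1.0, 2.0, -0.5])                           # "true" parameters
u = rng.normal(size=n)
y = X @ beta + u

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y              # (X'X)^{-1} X'y
decomposition = beta + XtX_inv @ X.T @ u  # beta + (X'X)^{-1} X'u, as in (5.23)

print(np.allclose(beta_hat, decomposition))  # True: both routes agree
# Averaging beta_hat over many simulated samples would reproduce beta,
# the unbiasedness result in (5.24).
```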
The difference between $\hat{\beta}$ and $\beta$ is the estimator's sampling error
$$\hat{\beta} - \beta = (X^\top X)^{-1}X^\top u$$
The variance of the OLS estimator can be determined by taking the expectation of the outer product of this sampling error:
$$\operatorname{Var}(\hat{\beta}) = \Gamma = E\left[(\hat{\beta}-\beta)(\hat{\beta}-\beta)^\top\right] = E\left[(X^\top X)^{-1}X^\top u u^\top X (X^\top X)^{-1}\right]$$
$$= (X^\top X)^{-1}X^\top E(uu^\top) X (X^\top X)^{-1} = \sigma_u^2 (X^\top X)^{-1} \qquad (5.26)$$
where the last step uses $E(uu^\top) = \sigma_u^2 I$ (homoskedastic and serially uncorrelated errors).
According to the results in (5.24) and (5.26), the expectation of an estimator $\hat{\beta}_j$ is equal to the parameter $\beta_j$, with variance
$$\operatorname{Var}(\hat{\beta}_j) = \sigma_u^2 \left[(X^\top X)^{-1}\right]_{jj} \qquad (5.27)$$
The square root of the estimator's variance in (5.27) is called the standard error, $se(\hat{\beta}_j) = \sqrt{\operatorname{Var}(\hat{\beta}_j)}$.
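A sketch of turning (5.26)-(5.27) into standard errors. Since $\sigma_u^2$ is unknown in practice, it is replaced below by the usual unbiased estimator RSS/(n − k); that substitution, and the simulated data, are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - X.shape[1])   # RSS / (n - k)
var_beta = sigma2_hat * np.linalg.inv(X.T @ X)  # estimate of (5.26)
se = np.sqrt(np.diag(var_beta))                 # se(beta_hat_j), from (5.27)

for j in range(len(beta_hat)):
    print(f"beta_{j}: estimate={beta_hat[j]:.3f}  se={se[j]:.3f}")
```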
Exercise 26. Two estimated models are given as:
(1) $y_i = 3.54 + 2.87\,x_i + \hat{u}_i$
(2) $y_i = 4.11 + 1.96\,x_i + 0.53\,z_i + \hat{u}_i$
- Which problem appears in model (1) when variable z is omitted although it is relevant?
Solution
If a relevant variable is omitted from model (1), it leads to a problem called omitted variable bias: the estimated coefficient 2.87 is biased and does not reflect the true relationship between x and y. Omitting a relevant variable can also lead to an endogeneity problem, because the omitted variable ends up in the error term, which then correlates with x. The simulation below illustrates the bias.
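A minimal simulation of omitted variable bias; the data-generating process (true model y = 1 + 2x + 1.5z + u with z correlated with x) is an assumption made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 500, 2000
slopes_short = []
for _ in range(reps):
    x = rng.normal(size=n)
    z = 0.8 * x + rng.normal(size=n)      # z is relevant and correlated with x
    y = 1.0 + 2.0 * x + 1.5 * z + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])  # short regression: z omitted
    slopes_short.append(np.linalg.solve(X.T @ X, X.T @ y)[1])

print(np.mean(slopes_short))  # ~3.2 = 2 + 1.5*0.8, not the true slope 2.0
```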
- Which problem appears in model (2) when variable z is included although it is irrelevant?
Solution
Including an irrelevant variable typically leads to an inefficiency problem: higher standard errors for the estimated coefficients of the relevant variables, as the sketch below suggests.
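A sketch comparing the sampling spread of the slope on x with and without an irrelevant z (true coefficient zero, but z highly correlated with x); all numbers are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 200, 2000
b_without, b_with = [], []
for _ in range(reps):
    x = rng.normal(size=n)
    z = 0.9 * x + 0.3 * rng.normal(size=n)  # irrelevant but collinear with x
    y = 1.0 + 2.0 * x + rng.normal(size=n)  # z truly has no effect on y
    X1 = np.column_stack([np.ones(n), x])
    X2 = np.column_stack([np.ones(n), x, z])
    b_without.append(np.linalg.solve(X1.T @ X1, X1.T @ y)[1])
    b_with.append(np.linalg.solve(X2.T @ X2, X2.T @ y)[1])

# Both means are near the true slope 2 (no bias), but the spread is much
# larger when the irrelevant z is included: the efficiency loss.
print("sd without z:", np.std(b_without))
print("sd with z:   ", np.std(b_with))
```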
- Why does the slope coefficient on variable x change in model (2) after including variable z?
Solution
The slope coefficient on variable x changes to 1.96 after including variable z in model (2) because x and z are correlated. Including an additional independent variable that is highly correlated with the existing regressors leads to the multicollinearity problem (highly correlated RHS variables). The change also means that x affects y both directly and indirectly through z (the mediating role of variable z), as the decomposition below makes precise.
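The change can be stated with the standard omitted-variable decomposition, an exact in-sample identity for OLS; the symbol $\hat{\delta}$ below is introduced here for illustration and denotes the slope from the auxiliary regression of z on x:

```latex
% Short-regression slope = long-regression slope + indirect effect through z
\hat{\beta}_x^{(1)} = \hat{\beta}_x^{(2)} + \hat{\beta}_z^{(2)}\,\hat{\delta},
\qquad
\hat{\delta} = \frac{\widehat{\operatorname{Cov}}(x,z)}{\widehat{\operatorname{Var}}(x)}
```

If both models were estimated by OLS on the same sample, the reported coefficients would imply $\hat{\delta} = (2.87 - 1.96)/0.53 \approx 1.72$.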
- In which case should the slope coefficient on variable x not change?
Solution
The slope coefficient on variable x should not change if x and z are uncorrelated (orthogonal) in the sample, as the check below illustrates.
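A quick check of this claim under an assumed design, where z is constructed to be exactly orthogonal to x in the sample:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 400
x = rng.normal(size=n)
z = rng.normal(size=n)
x = x - x.mean()
z = z - z.mean()
z = z - (z @ x / (x @ x)) * x  # force z to be exactly orthogonal to x
y = 1.0 + 2.0 * x + 0.5 * z + rng.normal(size=n)

X1 = np.column_stack([np.ones(n), x])     # model without z
X2 = np.column_stack([np.ones(n), x, z])  # model with z
b1 = np.linalg.solve(X1.T @ X1, X1.T @ y)[1]
b2 = np.linalg.solve(X2.T @ X2, X2.T @ y)[1]
print(b1, b2)  # identical slopes on x (up to floating-point error)
```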