5.4 OLS properties

The Gauss-Markov Theorem: under assumptions (1)-(7), the least squares (OLS) estimator has minimum variance in the class of all linear unbiased estimators, i.e. it is BLUE (Best Linear Unbiased Estimator).

  • If the OLS estimator is linear and unbiased and at the same time has the smallest variance, then it is efficient (in finite samples)

  • As the sample size increases indefinitely (n → ∞), the variance of the OLS estimator converges to zero; together with unbiasedness, this means the estimator converges in probability to the true parameter. This property is called consistency!

  • Also, as the sample size increases, the distribution of the OLS estimator asymptotically approaches the normal distribution (by the Central Limit Theorem)
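These two asymptotic properties can be checked with a quick Monte Carlo sketch: re-estimate the slope on many samples and watch its sampling variance shrink as $n$ grows. The data-generating process below (intercept 1.0, slope 2.0, standard normal errors) is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)

def slope_estimates(n, n_reps=500):
    """Re-estimate the OLS slope n_reps times on fresh samples of size n."""
    slopes = np.empty(n_reps)
    for r in range(n_reps):
        x = rng.normal(size=n)
        u = rng.normal(size=n)                  # errors with E(u) = 0
        y = 1.0 + 2.0 * x + u                   # true slope is 2.0 (illustrative)
        X = np.column_stack([np.ones(n), x])
        slopes[r] = np.linalg.solve(X.T @ X, X.T @ y)[1]
    return slopes

small, large = slope_estimates(20), slope_estimates(2000)
# the sampling variance of the slope shrinks as n grows (consistency)
print(np.var(small), np.var(large))
```

A histogram of either set of estimates would also look approximately normal and centered at 2.0, as the Central Limit Theorem predicts.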

  • Combining the relations $\hat{\beta} = (x^T x)^{-1} x^T y$ and $y = x\beta + u$, the expectation and the variance of the OLS estimator $\hat{\beta}$ can be determined

$$\hat{\beta} = (x^T x)^{-1} x^T y = (x^T x)^{-1} x^T (x\beta + u) = \beta + (x^T x)^{-1} x^T u \tag{5.23}$$
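The closed-form estimator in (5.23) can be verified numerically by solving the normal equations and cross-checking against a library least-squares routine. A minimal sketch on simulated data (the parameter values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
u = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])   # design matrix with an intercept column
beta = np.array([4.0, 1.5, -0.8])           # illustrative true parameters
y = X @ beta + u

# beta_hat = (X'X)^{-1} X'y -- solve the normal equations rather than inverting
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# cross-check against NumPy's least-squares routine
beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat, beta_ls)
```

Solving the linear system is preferred over forming `np.linalg.inv(X.T @ X)` explicitly, since it is numerically more stable.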

  • Taking the expectation of the last equation in (5.23), where only $u$ is a random vector and $E(u) = 0$ by assumption, we get
$$E(\hat{\beta}) = \beta + (x^T x)^{-1} x^T E(u) = \beta \tag{5.24}$$

  • According to the result in (5.24), the estimator’s expectation is equal to the vector of true parameters. This proves that the OLS estimator is unbiased!

  • The difference between $\hat{\beta}$ and $\beta$ is the estimator’s sampling error

$$\hat{\beta} - \beta = (x^T x)^{-1} x^T u \tag{5.25}$$

  • The variance of the OLS estimator can be determined by taking the expectation of the outer product of this sampling error

$$\mathrm{Var}(\hat{\beta}) = \Gamma = E\big[(\hat{\beta} - \beta)(\hat{\beta} - \beta)^T\big] = E\big[(x^T x)^{-1} x^T u u^T x (x^T x)^{-1}\big] = (x^T x)^{-1} x^T E(u u^T)\, x\, (x^T x)^{-1} = \sigma_u^2 (x^T x)^{-1} \tag{5.26}$$

where the last equality uses $E(u u^T) = \sigma_u^2 I$ (homoskedastic, serially uncorrelated errors).

  • According to the results in (5.24) and (5.26), the expectation of an estimator $\hat{\beta}_j$ is equal to the parameter $\beta_j$, with variance
$$\mathrm{Var}(\hat{\beta}_j) = \sigma_u^2 \big[(x^T x)^{-1}\big]_{jj} \tag{5.27}$$
i.e. $\sigma_u^2$ times the $j$-th diagonal element of $(x^T x)^{-1}$

  • The square root of the estimator’s variance (5.27) is called the standard error, $se(\hat{\beta}_j) = \sqrt{\mathrm{Var}(\hat{\beta}_j)}$
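In practice $\sigma_u^2$ is unknown and is replaced by its unbiased estimate $\hat{u}^T\hat{u}/(n-k)$, where $k$ is the number of estimated parameters. A sketch of the variance-covariance matrix (5.26) and the standard errors $se(\hat{\beta}_j)$ on simulated data (all parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
u = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = X @ np.array([2.0, 3.0]) + u            # illustrative true parameters

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat                     # residuals u_hat
k = X.shape[1]
sigma2_hat = resid @ resid / (n - k)         # unbiased estimate of sigma_u^2
cov = sigma2_hat * np.linalg.inv(X.T @ X)    # estimated Var(beta_hat), eq. (5.26)
se = np.sqrt(np.diag(cov))                   # standard errors se(beta_hat_j)
print(beta_hat, se)
```

The diagonal of `cov` gives the per-coefficient variances of (5.27); the off-diagonal entries are the covariances between coefficient estimates.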

Exercise 26. Two estimated models are given as:

$$(1)\quad y_i = 3.54 + 2.87\, x_i + \hat{u}_i \qquad\qquad (2)\quad y_i = 4.11 + 1.96\, x_i + 0.53\, z_i + \hat{u}_i$$

  1. Which problem appears in model (1) when variable z is omitted although it is relevant?
    Solution If a relevant variable is omitted, as in model (1), the result is a problem called omitted variable bias. This means that the estimated coefficient 2.87 is biased (it does not reflect the true relationship between x and y). Omitting a relevant variable can also lead to an endogeneity problem, since the omitted variable ends up in the error term, which then correlates with x.
  2. Which problem appears in model (2) when variable z is not omitted although it is irrelevant?
    Solution Including an irrelevant variable typically leads to an inefficiency problem: higher standard errors for the estimated coefficients of the relevant variables (the estimates remain unbiased).
  3. Why does the slope coefficient with respect to variable x change in model (2) after variable z is included?
    Solution The slope coefficient with respect to variable x changes to 1.96 after including variable z in model (2) because x and z are correlated. Including an additional correlated regressor can lead to the multicollinearity problem (highly correlated RHS variables). This also means that x affects y both directly and indirectly (the mediating role of variable z).
  4. In which case should the slope coefficient with respect to variable x not change?
    Solution The slope coefficient with respect to variable x should not change if x and z are uncorrelated.
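The omitted variable bias from part 1 and the coefficient shift from part 3 can be illustrated with a small simulation. The numbers below are illustrative, not the exercise's actual data: z is relevant (true coefficient 0.5) and correlated with x, so the short model's slope is biased by $\beta_z \cdot \mathrm{Cov}(x,z)/\mathrm{Var}(x)$:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 10_000
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)    # x and z are correlated
u = rng.normal(size=n)
y = 4.0 + 2.0 * x + 0.5 * z + u     # z is relevant (true coefficient 0.5)

def fit(X, y):
    """OLS via the normal equations."""
    return np.linalg.solve(X.T @ X, X.T @ y)

ones = np.ones(n)
b_short = fit(np.column_stack([ones, x]), y)       # model (1): z omitted
b_long = fit(np.column_stack([ones, x, z]), y)     # model (2): z included

# short-model slope absorbs part of z's effect; long-model slope is near 2.0
print(b_short[1], b_long[1])
```

Re-running the simulation with `x = rng.normal(size=n)` (so x and z are uncorrelated) makes the two slope estimates nearly coincide, which is exactly the condition in part 4.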