4.6 Moments of coefficient estimates, MLR matrix edition
Let’s start with the expected value. If we consider our whole vector of coefficient estimates, \(\boldsymbol{b}\), what is the expected value of this vector?
Check yourself! Why is the expected value of the vector \(\boldsymbol{y}\) equal to \(\boldsymbol{X\beta}\)?
\[ \begin{aligned} E({\boldsymbol b}) &= E[(\boldsymbol X' \boldsymbol X)^{-1}\boldsymbol X' \boldsymbol y]\\ &= (\boldsymbol X' \boldsymbol X)^{-1}\boldsymbol X' E[{\boldsymbol y}]\\ &= (\boldsymbol X' \boldsymbol X)^{-1}\boldsymbol X' \boldsymbol X \boldsymbol \beta\\ &= \boldsymbol \beta \end{aligned} \]
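If you'd like to see this unbiasedness in action, here is a minimal numpy sketch; the design matrix, true coefficients, and error standard deviation are arbitrary values chosen purely for illustration. Averaging the OLS estimates over many simulated response vectors should land close to \(\boldsymbol\beta\).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: n observations, true coefficients chosen for illustration
n = 100
beta = np.array([2.0, -1.0, 0.5])            # (intercept, slope1, slope2)
X = np.column_stack([np.ones(n),
                     rng.uniform(0, 10, n),
                     rng.normal(5, 2, n)])   # fixed design matrix

# Average the OLS estimates over many simulated response vectors
reps = 5000
b_draws = np.empty((reps, 3))
for r in range(reps):
    y = X @ beta + rng.normal(0, 1.5, n)     # errors with sd 1.5 (made up)
    b_draws[r] = np.linalg.solve(X.T @ X, X.T @ y)

print(b_draws.mean(axis=0))  # close to (2.0, -1.0, 0.5)
print(beta)
```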
How about variance? Well, since we have a whole vector of coefficient estimates, we need a variance-covariance matrix: \[ \begin{aligned} Var({\boldsymbol b}) &= Var[ (\boldsymbol X' \boldsymbol X)^{-1}\boldsymbol X' \boldsymbol y]\\ &= (\boldsymbol X' \boldsymbol X)^{-1}\boldsymbol X'Var[{\boldsymbol y}] ((\boldsymbol X' \boldsymbol X)^{-1}\boldsymbol X')'\\ &= (\boldsymbol X' \boldsymbol X)^{-1}\boldsymbol X'Var[{\boldsymbol y}] \boldsymbol X(\boldsymbol X' \boldsymbol X)^{-1}\\ &= (\boldsymbol X' \boldsymbol X)^{-1}\boldsymbol X'\sigma^2 \boldsymbol I \boldsymbol X(\boldsymbol X' \boldsymbol X)^{-1}\\ &= \sigma^2 (\boldsymbol X' \boldsymbol X)^{-1}\boldsymbol X' \boldsymbol X(\boldsymbol X' \boldsymbol X)^{-1}\\ &= \sigma^2(\boldsymbol X' \boldsymbol X)^{-1} \end{aligned} \]
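We can check the variance-covariance formula the same way. In the sketch below (again with a made-up design and error standard deviation), the empirical covariance matrix of the simulated estimates should be close to \(\sigma^2(\boldsymbol X' \boldsymbol X)^{-1}\).

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical design: intercept plus two predictors; sigma is the error sd
n, sigma = 100, 1.5
beta = np.array([2.0, -1.0, 0.5])
X = np.column_stack([np.ones(n),
                     rng.uniform(0, 10, n),
                     rng.normal(5, 2, n)])

# Theoretical variance-covariance matrix of b: sigma^2 (X'X)^{-1}
theory = sigma**2 * np.linalg.inv(X.T @ X)

# Empirical covariance of b over repeated simulated responses
reps = 20000
b_draws = np.array([np.linalg.solve(X.T @ X, X.T @ (X @ beta + rng.normal(0, sigma, n)))
                    for _ in range(reps)])
empirical = np.cov(b_draws, rowvar=False)

print(np.round(theory, 4))
print(np.round(empirical, 4))  # should be close to the theoretical matrix
```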
Let’s check what happens with the simple linear regression case, with just an intercept and a single slope coefficient. Previously we found \((\boldsymbol{X}'\boldsymbol{X})^{-1}\) for this scenario, so let’s use it!
\[ \begin{aligned} (\boldsymbol X' \boldsymbol X)^{-1} &= \frac{1}{nS_{xx}}\left(\begin{array}{cc} \sum_{i=1}^nx_i^2&-\sum_{i=1}^nx_i\\ -\sum_{i=1}^n x_i&n \end{array}\right)\\ &= \frac{1}{S_{xx}}\left(\begin{array}{cc} n^{-1}\sum_{i=1}^nx_i^2&-\bar{x}\\ -\bar{x}&1 \end{array}\right)\\ &= \frac{1}{S_{xx}}\left(\begin{array}{cc} n^{-1}(\sum_{i=1}^nx_i^2 - n\bar{x}^2+ n\bar{x}^2)&-\bar{x}\\ -\bar{x}&1 \end{array}\right)\\ &= \frac{1}{S_{xx}}\left(\begin{array}{cc} n^{-1}S_{xx} + \bar{x}^2&-\bar{x}\\ -\bar{x}&1 \end{array}\right)\\ &= \left(\begin{array}{cc} \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}&\frac{-\bar{x}}{S_{xx}}\\ \frac{-\bar{x}}{S_{xx}}&\frac{1}{S_{xx}} \end{array}\right) \end{aligned} \] So, writing \(\sigma^2_{\varepsilon}\) for the error variance \(\sigma^2\),
\[ \begin{aligned} \sigma^2_{\varepsilon}(\boldsymbol X' \boldsymbol X)^{-1} &= \left(\begin{array}{cc} \sigma^2_{\varepsilon}(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}})&\frac{-\bar{x}\sigma^2_{\varepsilon}}{S_{xx}}\\ \frac{-\bar{x}\sigma^2_{\varepsilon}}{S_{xx}}&\frac{\sigma^2_{\varepsilon}}{S_{xx}} \end{array}\right)\\ \end{aligned} \]
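As a quick sanity check, here is a short numpy sketch (with a made-up single predictor and error variance) confirming that the matrix form \(\sigma^2_{\varepsilon}(\boldsymbol X' \boldsymbol X)^{-1}\) matches the element-wise formulas we just derived.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical single-predictor example with error variance sigma2
n, sigma2 = 30, 4.0
x = rng.uniform(0, 10, n)
X = np.column_stack([np.ones(n), x])

xbar = x.mean()
Sxx = np.sum((x - xbar)**2)

# Matrix form: sigma^2 (X'X)^{-1}
matrix_form = sigma2 * np.linalg.inv(X.T @ X)

# Element-wise formulas from the derivation above
elementwise = np.array([
    [sigma2 * (1/n + xbar**2 / Sxx), -xbar * sigma2 / Sxx],
    [-xbar * sigma2 / Sxx,            sigma2 / Sxx]])

print(np.allclose(matrix_form, elementwise))  # True
```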
The diagonal elements of the variance-covariance matrix are the variances of the individual components of \(\boldsymbol{b}\) (so \(Var(b_0)\) and \(Var(b_1)\) here). If you've seen formulas for the variance or standard error of \(b_0\) or \(b_1\) before, these are equivalent, though you may not have seen them written in sum-of-squares notation. Meanwhile, the off-diagonal element is the covariance of \(b_0\) and \(b_1\). Are they independent?
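One way to explore that last question numerically: the sketch below (again using made-up values) computes the off-diagonal entry of \(\sigma^2(\boldsymbol X' \boldsymbol X)^{-1}\) for a raw predictor and for the same predictor after centering, so you can see how the covariance depends on \(\bar{x}\).

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical illustration: how Cov(b0, b1) behaves before and after centering x
n, sigma2 = 30, 4.0
x = rng.uniform(0, 10, n)

def cov_b0_b1(x, sigma2):
    """Off-diagonal entry of sigma^2 (X'X)^{-1} for an intercept-plus-slope model."""
    X = np.column_stack([np.ones(len(x)), x])
    return (sigma2 * np.linalg.inv(X.T @ X))[0, 1]

print(cov_b0_b1(x, sigma2))             # negative here, since xbar > 0
print(cov_b0_b1(x - x.mean(), sigma2))  # essentially zero once x is centered
```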