2.7 The hat matrix
Let’s hop back to the matrix-form solution of the normal equations for a minute: \[ \boldsymbol{b} = (\boldsymbol X' \boldsymbol X)^{-1}\boldsymbol X' \boldsymbol y\]
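As a quick sanity check on that formula, here's a minimal NumPy sketch (not from the notes themselves; the design matrix `X` and response `y` are just made-up placeholders). The coefficients from the closed-form expression match what NumPy's own least-squares routine returns.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up design matrix (with an intercept column) and response, purely for illustration
n = 20
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = rng.normal(size=n)

# b = (X'X)^{-1} X'y, computed with solve() rather than an explicit inverse
b = np.linalg.solve(X.T @ X, X.T @ y)

# Same answer as NumPy's built-in least-squares routine
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(b, b_lstsq))  # True
```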
Now, if we want to get actual predictions out of this vector of estimated coefficients, we do it with multiplication: \(\boldsymbol{Xb}\). But according to our equations, \[\hat{\boldsymbol y}=\boldsymbol{Xb} = \boldsymbol X((\boldsymbol X' \boldsymbol X)^{-1}\boldsymbol X' \boldsymbol y)\] or, regrouping the multiplication: \[\hat{\boldsymbol y}= \boldsymbol{Xb} = (\boldsymbol X(\boldsymbol X' \boldsymbol X)^{-1}\boldsymbol X')\boldsymbol{y}\] So the vector of predicted values is obtained by multiplying \(\boldsymbol{y}\) by a single matrix. This is a projection matrix: it projects the \(\boldsymbol{y}\) vector down onto the column space of \(\boldsymbol{X}\), where our predictions must lie. Because it’s the matrix that “turns \(\boldsymbol{y}\) into \(\hat{\boldsymbol y}\),” it’s called the hat matrix, \(\boldsymbol H = \boldsymbol X(\boldsymbol X' \boldsymbol X)^{-1}\boldsymbol X'\). The hat matrix has some other cool features too:
- Square: it’s \(n \times n\), where \(n\) is the number of observations
- Symmetric: \(\boldsymbol H' = \boldsymbol H\)
- Idempotent: \(\boldsymbol H \boldsymbol H = \boldsymbol H\). This makes sense since it’s a projection matrix: once you project down into \(Col(\boldsymbol X)\), you’re already there, so projecting again won’t do anything. So \(\boldsymbol{Hc} = \boldsymbol{HHc}\) for any \(\boldsymbol c\)!
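Here's a small NumPy sketch (again with a made-up `X` and `y`, just placeholders) that builds \(\boldsymbol H\) explicitly and checks that it turns \(\boldsymbol y\) into \(\hat{\boldsymbol y}\), and that it's square, symmetric, and idempotent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design matrix and response, just to exercise the algebra
n = 20
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = rng.normal(size=n)

# Hat matrix H = X (X'X)^{-1} X'
H = X @ np.linalg.solve(X.T @ X, X.T)

# H turns y into the fitted values y-hat = Xb
b = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(H @ y, X @ b))  # True

print(H.shape)                  # (20, 20): square
print(np.allclose(H, H.T))      # True: symmetric
print(np.allclose(H @ H, H))    # True: idempotent, so HHc = Hc for any c
```

Forming the full \(n \times n\) matrix like this is only sensible for small examples; it's here to make the projection idea concrete, not as a recipe for fitting models.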