3.7 The hat matrix

Let’s hop back to the matrix solution of the normal equations for a minute:

\boldsymbol{b} = (\boldsymbol X' \boldsymbol X)^{-1}\boldsymbol X' \boldsymbol y
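
If it helps to see this numerically, here’s a minimal NumPy sketch. The data and variable names are made up purely for illustration; the point is just that the formula above is a straightforward matrix computation.

```python
import numpy as np

# Made-up data: a column of ones (intercept) plus one predictor.
rng = np.random.default_rng(0)
x = rng.normal(size=20)
X = np.column_stack([np.ones(20), x])            # n x p design matrix
y = 3 + 2 * x + rng.normal(scale=0.5, size=20)   # response with a little noise

# b = (X'X)^{-1} X'y, written exactly as in the formula above
b = np.linalg.inv(X.T @ X) @ X.T @ y

# In practice you'd solve the linear system rather than form the inverse explicitly
b_alt = np.linalg.solve(X.T @ X, X.T @ y)
print(b)      # roughly [3, 2]
print(b_alt)  # same thing
```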

Now, if we want to get actual predictions out of this vector of coefficients we’re estimating, we do it with multiplication: \boldsymbol{Xb}. But according to our equations,

\hat{\boldsymbol y} = \boldsymbol{Xb} = \boldsymbol X\left((\boldsymbol X' \boldsymbol X)^{-1}\boldsymbol X' \boldsymbol y\right)

or rather:

\hat{\boldsymbol y} = \boldsymbol{Xb} = \left(\boldsymbol X(\boldsymbol X' \boldsymbol X)^{-1}\boldsymbol X'\right)\boldsymbol{y}

So the vector of predicted values is obtained by multiplying \boldsymbol{y} by a single matrix, \boldsymbol X(\boldsymbol X' \boldsymbol X)^{-1}\boldsymbol X'. This is a projection matrix: it projects the \boldsymbol{y} vector down onto the column space of \boldsymbol{X}, where our predictions must lie. Because it’s the matrix that “turns \boldsymbol{y} into \hat{\boldsymbol y},” it’s called the hat matrix, \boldsymbol H:

\boldsymbol H = \boldsymbol X(\boldsymbol X' \boldsymbol X)^{-1}\boldsymbol X', \qquad \hat{\boldsymbol y} = \boldsymbol H \boldsymbol y

The hat matrix has some other cool features too:

  • Square: it’s n \times n, where n is the number of observations.
  • Symmetric: \boldsymbol H' = \boldsymbol H.
  • Idempotent: \boldsymbol H \boldsymbol H = \boldsymbol H. This makes sense since it’s a projection matrix: once you project down into Col(\boldsymbol X), you’re already there, so projecting again won’t do anything. So \boldsymbol{Hc} = \boldsymbol{HHc} for any \boldsymbol c! (There’s a quick numerical check of all three properties in the sketch after this list.)
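
Here’s a quick numerical check of those properties, again with made-up data; nothing about the conclusions depends on this particular example.

```python
import numpy as np

# Same made-up setup as before.
rng = np.random.default_rng(0)
x = rng.normal(size=20)
X = np.column_stack([np.ones(20), x])
y = 3 + 2 * x + rng.normal(scale=0.5, size=20)

# The hat matrix: H = X (X'X)^{-1} X'
H = X @ np.linalg.inv(X.T @ X) @ X.T

b = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(H @ y, X @ b))        # True: Hy is the vector of fitted values
print(H.shape)                          # (20, 20): square, n x n
print(np.allclose(H, H.T))              # True: symmetric
print(np.allclose(H @ H, H))            # True: idempotent, HH = H
print(np.allclose(H @ (H @ y), H @ y))  # projecting a second time changes nothing
```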