3.7 The hat matrix

Let’s hop back to the matrix solution of the normal equations for a minute:

\boldsymbol{b} = (\boldsymbol X' \boldsymbol X)^{-1}\boldsymbol X' \boldsymbol y
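
If it helps to see this numerically, here’s a minimal NumPy sketch. The data and variable names are made up purely for illustration; the point is just that the formula above is a straightforward matrix computation.

```python
import numpy as np

# Made-up data: a column of ones (intercept) plus one predictor.
rng = np.random.default_rng(0)
x = rng.normal(size=20)
X = np.column_stack([np.ones(20), x])            # n x p design matrix
y = 3 + 2 * x + rng.normal(scale=0.5, size=20)   # response with a little noise

# b = (X'X)^{-1} X'y, written exactly as in the formula above
b = np.linalg.inv(X.T @ X) @ X.T @ y

# In practice you'd solve the linear system rather than form the inverse explicitly
b_alt = np.linalg.solve(X.T @ X, X.T @ y)
print(b)      # roughly [3, 2]
print(b_alt)  # same thing
```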

Now, if we want to get actual predictions out of this vector of coefficients we’re estimating, we do it with multiplication: \boldsymbol{Xb}. But according to our equations,

\hat{\boldsymbol y} = \boldsymbol{Xb} = \boldsymbol X\left((\boldsymbol X' \boldsymbol X)^{-1}\boldsymbol X' \boldsymbol y\right)

or rather:

\hat{\boldsymbol y} = \boldsymbol{Xb} = \left(\boldsymbol X(\boldsymbol X' \boldsymbol X)^{-1}\boldsymbol X'\right)\boldsymbol{y}

So the vector of predicted values is obtained by multiplying \boldsymbol{y} by a single matrix, \boldsymbol X(\boldsymbol X' \boldsymbol X)^{-1}\boldsymbol X'. This is a projection matrix: it projects the \boldsymbol{y} vector down onto the column space of \boldsymbol{X}, where our predictions must lie. Because it’s the matrix that “turns \boldsymbol{y} into \hat{\boldsymbol y},” it’s called the hat matrix, \boldsymbol H:

\boldsymbol H = \boldsymbol X(\boldsymbol X' \boldsymbol X)^{-1}\boldsymbol X', \qquad \hat{\boldsymbol y} = \boldsymbol H \boldsymbol y

The hat matrix has some other cool features too:

  • Square: it’s n \times n, where n is the number of observations.
  • Symmetric: \boldsymbol H' = \boldsymbol H.
  • Idempotent: \boldsymbol H \boldsymbol H = \boldsymbol H. This makes sense since it’s a projection matrix: once you project down into Col(\boldsymbol X), you’re already there, so projecting again won’t do anything. So \boldsymbol{Hc} = \boldsymbol{HHc} for any \boldsymbol c! (There’s a quick numerical check of all three properties in the sketch after this list.)
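
Here’s a quick numerical check of those properties, again with made-up data; nothing about the conclusions depends on this particular example.

```python
import numpy as np

# Same made-up setup as before.
rng = np.random.default_rng(0)
x = rng.normal(size=20)
X = np.column_stack([np.ones(20), x])
y = 3 + 2 * x + rng.normal(scale=0.5, size=20)

# The hat matrix: H = X (X'X)^{-1} X'
H = X @ np.linalg.inv(X.T @ X) @ X.T

b = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(H @ y, X @ b))        # True: Hy is the vector of fitted values
print(H.shape)                          # (20, 20): square, n x n
print(np.allclose(H, H.T))              # True: symmetric
print(np.allclose(H @ H, H))            # True: idempotent, HH = H
print(np.allclose(H @ (H @ y), H @ y))  # projecting a second time changes nothing
```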