## 3.7 The hat matrix

Let’s hop back to the solution of the normal equations in matrix form for a minute: $\boldsymbol{b} = (\boldsymbol X' \boldsymbol X)^{-1}\boldsymbol X' \boldsymbol y$

Now, if we want to get actual predictions out of this vector of coefficients we’re estimating, we do it with multiplication: $\boldsymbol{Xb}$. But according to our equations,

$$\hat{\boldsymbol y} = \boldsymbol{Xb} = \boldsymbol X\left((\boldsymbol X' \boldsymbol X)^{-1}\boldsymbol X' \boldsymbol y\right)$$

or, regrouping the parentheses,

$$\hat{\boldsymbol y} = \boldsymbol{Xb} = \left(\boldsymbol X(\boldsymbol X' \boldsymbol X)^{-1}\boldsymbol X'\right)\boldsymbol{y}.$$

So the vector of predicted values is obtained by multiplying $\boldsymbol{y}$ by a matrix. This is a projection matrix: it projects the $\boldsymbol{y}$ vector down onto the column space of $\boldsymbol{X}$, where our predictions must lie. Because it’s the matrix that “turns $\boldsymbol{y}$ into $\hat{\boldsymbol y}$,” it’s called the hat matrix: $\boldsymbol H = \boldsymbol X(\boldsymbol X' \boldsymbol X)^{-1}\boldsymbol X'$. The hat matrix has some other cool features too:

• Square: with $n$ observations, $\boldsymbol H$ is $n \times n$
• Symmetric: $\boldsymbol H' = \boldsymbol H$
• Idempotent: $\boldsymbol H \boldsymbol H = \boldsymbol H$. This makes sense since it’s a projection matrix: once you project down into $Col(\boldsymbol X)$, you’re already there, so projecting again won’t do anything. So $\boldsymbol{Hc} = \boldsymbol{HHc}$ for any $\boldsymbol c$!
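
We can check all of this numerically. Here’s a minimal sketch in NumPy (the design matrix and response are made-up illustrative data, not from the text): it builds $\boldsymbol H = \boldsymbol X(\boldsymbol X' \boldsymbol X)^{-1}\boldsymbol X'$, confirms that $\boldsymbol H \boldsymbol y$ matches $\boldsymbol{Xb}$, and verifies symmetry and idempotence.

```python
import numpy as np

# Made-up example data: 6 observations, an intercept column plus one predictor.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(6), rng.normal(size=6)])
y = rng.normal(size=6)

# Hat matrix: H = X (X'X)^{-1} X'
H = X @ np.linalg.inv(X.T @ X) @ X.T

# Coefficients from the normal equations (solve is more stable than inv here).
b = np.linalg.solve(X.T @ X, X.T @ y)

print(np.allclose(H @ y, X @ b))  # H "puts the hat on" y: H y = X b
print(np.allclose(H, H.T))        # symmetric
print(np.allclose(H @ H, H))      # idempotent: projecting twice = projecting once
```

All three checks print `True`. Note that in practice you’d rarely form $\boldsymbol H$ explicitly (it’s $n \times n$, which gets big fast); it’s shown here just to see the algebra in action.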