## 3.3 Matrix algebra for the normal equations

Let’s start with a definition you might have seen before.

Remember $$I$$? That’s the identity matrix: a square matrix with 1’s on the diagonal and 0’s everywhere else. If you multiply any (appropriately sized) matrix by $$I$$, you get your original matrix back, which is why it’s called the “identity.”
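As a quick illustration in R (with an arbitrary made-up matrix, just to show the property), diag(n) builds the $$n \times n$$ identity:

```r
A  <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)   # any 2 x 3 matrix will do
I2 <- diag(2)                                 # the 2 x 2 identity matrix
all.equal(I2 %*% A, A)                        # TRUE: multiplying by I gives A back
```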

Invertible matrix: a square matrix $$A$$ for which there exists a matrix $$A^{-1}$$ (its inverse) satisfying $$A^{-1}A = I$$.

Handy facts about invertible matrices:

• A square matrix is invertible if and only if it is nonsingular.
• A square matrix is nonsingular if and only if its rank equals the number of rows (equivalently, the number of columns): it is of full rank.
• The determinant of a square matrix $$A$$ is 0 if and only if $$A$$ is singular. (A quick R check of these facts appears below.)
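Here’s a minimal R check of these facts, using two small made-up matrices, one of full rank and one singular:

```r
A <- matrix(c(2, 1, 1, 3), nrow = 2)   # full rank
det(A)            # 5, nonzero
qr(A)$rank        # 2 = number of rows/columns, so A is nonsingular
solve(A) %*% A    # recovers the 2 x 2 identity (up to rounding)

B <- matrix(c(1, 2, 2, 4), nrow = 2)   # second column is twice the first
det(B)            # 0
qr(B)$rank        # 1 < 2, so B is singular; solve(B) would throw an error
```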

If we want to solve the normal equations $$\boldsymbol X' \boldsymbol X \boldsymbol b = \boldsymbol X' \boldsymbol y$$, we can’t just multiply both sides by $$\boldsymbol X'^{-1}$$, because that matrix doesn’t exist ($$\boldsymbol{X}'$$ is not square). So instead, we multiply both sides on the left by $$(\boldsymbol X' \boldsymbol X)^{-1}$$ and obtain:

$$\boldsymbol{b} = (\boldsymbol X' \boldsymbol X)^{-1}\boldsymbol X' \boldsymbol y.$$

This is the least squares solution! Yeah, it’s still “least squares”: remember, we got those normal equations by minimizing the norm of the residual vector, and that norm was a sum of squares.

For this to work, we need $$\boldsymbol X' \boldsymbol X$$ to be nonsingular, which means $$\boldsymbol{X}$$ has to be of full rank: no column of $$\boldsymbol{X}$$ can be a linear combination of the others.

Linking this full-rank constraint back to actual regression: if we have two variables $$x_{k_1}$$ and $$x_{k_2}$$ whose measurements on all the observations are identical (up to a linear transformation), we have confounding. In context, this means we can’t tell the difference between the effects of predictors $$k_1$$ and $$k_2$$. So there’s no single best option for $$\boldsymbol{b}$$, since the coefficients $$b_{k_1}$$ and $$b_{k_2}$$ aren’t distinguishable from each other.
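Here’s a small R sketch of this situation, using made-up data in which one predictor is an exact linear function of another (the variable names are just illustrative):

```r
set.seed(42)
n   <- 50
x_C <- rnorm(n, mean = 20, sd = 5)    # temperatures in Celsius
x_F <- 32 + 1.8 * x_C                 # the same temperatures in Fahrenheit
X   <- cbind(1, x_C, x_F)             # design matrix with an intercept column
y   <- 1 + 0.5 * x_C + rnorm(n)

qr(X)$rank   # 2, not 3: X is not of full rank, so t(X) %*% X is singular
# solve(t(X) %*% X) will typically fail with a "computationally singular" error,
# and lm() reports NA for one of the two confounded coefficients:
coef(lm(y ~ x_C + x_F))
```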

Side note: in R, solve(A) returns the matrix inverse of A, t(X) is the transpose, and %*% is matrix multiplication, so you’d say:

b = solve(t(X) %*% X) %*% t(X) %*% y

That’s nice, in that it’s simple to do, but it’s worth noting that it doesn’t always behave well numerically. Inverting a matrix can involve going back and forth between very small and very large numbers, and if a little bit of machine error or rounding sneaks in, the final answer can be off, or even inconsistent. We’d better try and get an analytical solution….
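To see the kind of trouble this is hinting at, here’s a sketch with made-up, nearly collinear data: $$\boldsymbol X' \boldsymbol X$$ becomes badly conditioned, and the explicit-inverse recipe can drift away from what lm() reports (lm() fits by a QR decomposition rather than forming the inverse):

```r
set.seed(1)
n <- 100
x <- runif(n)
X <- cbind(1, x, x + 1e-6 * rnorm(n))   # third column nearly duplicates the second
y <- 2 + 3 * x + rnorm(n, sd = 0.1)

kappa(t(X) %*% X)   # enormous condition number: inversion amplifies rounding error

b_inv <- solve(t(X) %*% X) %*% t(X) %*% y   # explicit inverse (the formula above)
b_lm  <- coef(lm(y ~ X - 1))                # QR-based fit, no explicit inverse
cbind(b_inv, b_lm)   # with data this collinear, the two can disagree noticeably
```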