3.3 Matrix algebra for the normal equations

Let’s start with a definition you might have seen before.

Remember \(I\)? That’s the identity matrix: a square matrix with 1’s on the diagonal and 0’s everywhere else. If you multiply any (appropriately sized) matrix by \(I\), you get your original matrix back, which is why it’s called the “identity.”
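
For instance, here's a quick R illustration (a minimal sketch; the 3-by-3 matrix A is just an invented example):

A <- matrix(c(2, 0, 1,
              1, 3, 2,
              0, 1, 4), nrow = 3, byrow = TRUE)
I <- diag(3)           # the 3 x 3 identity matrix
all.equal(A %*% I, A)  # TRUE: multiplying by I returns A unchanged
all.equal(I %*% A, A)  # TRUE: same on the other side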

Invertible matrix: a square matrix \(A\) for which there exists a matrix \(A^{-1}\), called its inverse, such that \(A^{-1}A = AA^{-1} = I\).

Handy facts about invertible matrices:

  • A square matrix is invertible if and only if it is nonsingular.
  • A square matrix is nonsingular if and only if its rank equals the number of rows (or, equivalently, columns), that is, if it has full rank.
  • The determinant of a square matrix \(A\) is 0 if and only if \(A\) is singular. (Both rank and determinant are easy to check in R; see the sketch below.)
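
In R, we can check the rank and the determinant directly (another small sketch, with an invented 2-by-2 matrix A):

A <- matrix(c(1, 2,
              3, 4), nrow = 2, byrow = TRUE)
qr(A)$rank   # 2: full rank, so A is nonsingular
det(A)       # -2: nonzero determinant, so A is invertible
solve(A)     # and indeed the inverse exists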

If we want to solve the normal equations \(\boldsymbol X' \boldsymbol X \boldsymbol b = \boldsymbol X' \boldsymbol y\) for \(\boldsymbol{b}\), we can't just multiply both sides by \(\boldsymbol X'^{-1}\), because that inverse doesn't exist: \(\boldsymbol{X}'\) is not a square matrix, so it can't be invertible. Instead, we multiply both sides by \((\boldsymbol X' \boldsymbol X)^{-1}\) and obtain:

\[ \boldsymbol{b} = (\boldsymbol X' \boldsymbol X)^{-1}\boldsymbol X' \boldsymbol y.\]
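
To see why this works, look at what the multiplication does to the left-hand side of the normal equations: \((\boldsymbol X' \boldsymbol X)^{-1}\) cancels \(\boldsymbol X' \boldsymbol X\) down to the identity, which is exactly what inverses are for.

\[ (\boldsymbol X' \boldsymbol X)^{-1}\boldsymbol X' \boldsymbol X \boldsymbol b = I\boldsymbol{b} = \boldsymbol{b}.\]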

Yeah, it’s still “least squares!” Remember, we got those normal equations by minimizing the norm of the residual vector – and that norm was a sum of squares.

This is the least squares solution!

For this to work, we need \(\boldsymbol X' \boldsymbol X\) to be nonsingular, which requires \(\boldsymbol{X}\) to be of full rank: no column of \(\boldsymbol{X}\) can be a linear combination of the other columns.

Let's link this full-rank constraint back to actual regression. If two predictors \(x_{k_1}\) and \(x_{k_2}\) have measurements that are linearly dependent across all the observations (for example, one is an exact linear function of the other), we have confounding. In context, this means we can't tell the difference between the effects of predictors \(k_1\) and \(k_2\). So there's no single best option for \(\boldsymbol{b}\), since the coefficients \(b_{k_1}\) and \(b_{k_2}\) aren't distinguishable from each other.
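
Here's a small R sketch of what goes wrong (the data are made up, and x2 is an exact copy of x1, so the design matrix is not of full rank):

set.seed(1)
x1 <- rnorm(10)
x2 <- x1                  # x2 is an exact linear copy of x1: confounded
X  <- cbind(1, x1, x2)    # design matrix with an intercept column
y  <- rnorm(10)
qr(X)$rank                # 2, not 3: X is not of full rank
try(solve(t(X) %*% X))    # X'X is singular, so this throws an error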

Side note: in R, solve(A) computes the matrix inverse and t(X) the transpose, so you'd write:

b <- solve(t(X) %*% X) %*% t(X) %*% y
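
As a quick sanity check on made-up data (x and y here are invented, and cbind(1, x) builds a design matrix with an intercept column), this hand-rolled b matches the coefficients reported by R's built-in lm():

set.seed(2)
x <- rnorm(20)
y <- 3 + 2 * x + rnorm(20)              # invented data
X <- cbind(1, x)                        # column of 1's plus the predictor
b <- solve(t(X) %*% X) %*% t(X) %*% y
b                                       # least squares estimates by hand...
coef(lm(y ~ x))                         # ...agree with lm()'s estimates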

That's nice, in that it's simple to do, but it's worth noting that it doesn't always work well. Inverting a matrix can involve going back and forth between very small and very large numbers, and if a little bit of machine error or rounding sneaks in there, the final answer can be off, or even inconsistent. We'd better try to get an analytical solution….