3.3 Matrix algebra for the normal equations

Let’s start with a definition you might have seen before.

Remember \(I\)? That’s the identity matrix: a square matrix with 1’s on the diagonal and 0’s everywhere else. If you multiply any (appropriately sized) matrix by \(I\), you get your original matrix back, which is why it’s called the “identity.”
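
For instance, here's a quick R illustration (a minimal sketch; the 3-by-3 matrix A is just an invented example):

A <- matrix(c(2, 0, 1,
              1, 3, 2,
              0, 1, 4), nrow = 3, byrow = TRUE)
I <- diag(3)           # the 3 x 3 identity matrix
all.equal(A %*% I, A)  # TRUE: multiplying by I returns A unchanged
all.equal(I %*% A, A)  # TRUE: same on the other side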

Invertible matrix: a square matrix \(A\) for which there exists a matrix \(A^{-1}\), called its inverse, such that \(A^{-1}A = AA^{-1} = I\).

Handy facts about invertible matrices:

  • A square matrix is invertible if and only if it is nonsingular.
  • A square matrix is nonsingular if and only if its rank equals the number of rows (or, equivalently, columns), that is, if it has full rank.
  • The determinant of a square matrix \(A\) is 0 if and only if \(A\) is singular. (Both rank and determinant are easy to check in R; see the sketch below.)
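
In R, we can check the rank and the determinant directly (another small sketch, with an invented 2-by-2 matrix A):

A <- matrix(c(1, 2,
              3, 4), nrow = 2, byrow = TRUE)
qr(A)$rank   # 2: full rank, so A is nonsingular
det(A)       # -2: nonzero determinant, so A is invertible
solve(A)     # and indeed the inverse exists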

If we want to solve the normal equations \(\boldsymbol X' \boldsymbol X \boldsymbol b = \boldsymbol X' \boldsymbol y\) for \(\boldsymbol{b}\), we can't just multiply both sides by \(\boldsymbol X'^{-1}\), because that inverse doesn't exist: \(\boldsymbol{X}'\) is not a square matrix, so it can't be invertible. Instead, we multiply both sides by \((\boldsymbol X' \boldsymbol X)^{-1}\) and obtain:

\[ \boldsymbol{b} = (\boldsymbol X' \boldsymbol X)^{-1}\boldsymbol X' \boldsymbol y.\]
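
To see why this works, look at what the multiplication does to the left-hand side of the normal equations: \((\boldsymbol X' \boldsymbol X)^{-1}\) cancels \(\boldsymbol X' \boldsymbol X\) down to the identity, which is exactly what inverses are for.

\[ (\boldsymbol X' \boldsymbol X)^{-1}\boldsymbol X' \boldsymbol X \boldsymbol b = I\boldsymbol{b} = \boldsymbol{b}.\]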

Yeah, it’s still “least squares!” Remember, we got those normal equations by minimizing the norm of the residual vector – and that norm was a sum of squares.

This is the least squares solution!

For this to work, we need \(\boldsymbol X' \boldsymbol X\) to be nonsingular, which requires \(\boldsymbol{X}\) to be of full rank: no column of \(\boldsymbol{X}\) can be a linear combination of the other columns.

Let's link this full-rank constraint back to actual regression. If two predictors \(x_{k_1}\) and \(x_{k_2}\) have measurements that are linearly dependent across all the observations (for example, one is an exact linear function of the other), we have confounding. In context, this means we can't tell the difference between the effects of predictors \(k_1\) and \(k_2\). So there's no single best option for \(\boldsymbol{b}\), since the coefficients \(b_{k_1}\) and \(b_{k_2}\) aren't distinguishable from each other.
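
Here's a small R sketch of what goes wrong (the data are made up, and x2 is an exact copy of x1, so the design matrix is not of full rank):

set.seed(1)
x1 <- rnorm(10)
x2 <- x1                  # x2 is an exact linear copy of x1: confounded
X  <- cbind(1, x1, x2)    # design matrix with an intercept column
y  <- rnorm(10)
qr(X)$rank                # 2, not 3: X is not of full rank
try(solve(t(X) %*% X))    # X'X is singular, so this throws an error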

Side note: in R, solve(A) computes the matrix inverse and t(X) the transpose, so you'd write:

b <- solve(t(X) %*% X) %*% t(X) %*% y
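
As a quick sanity check on made-up data (x and y here are invented, and cbind(1, x) builds a design matrix with an intercept column), this hand-rolled b matches the coefficients reported by R's built-in lm():

set.seed(2)
x <- rnorm(20)
y <- 3 + 2 * x + rnorm(20)              # invented data
X <- cbind(1, x)                        # column of 1's plus the predictor
b <- solve(t(X) %*% X) %*% t(X) %*% y
b                                       # least squares estimates by hand...
coef(lm(y ~ x))                         # ...agree with lm()'s estimates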

That's nice, in that it's simple to do, but it's worth noting that it doesn't always work well. Inverting a matrix can involve going back and forth between very small and very large numbers, and if a little bit of machine error or rounding sneaks in there, the final answer can be off, or even inconsistent. We'd better try to get an analytical solution….