3.5 Elementary matrices and inverting a matrix

We now want to discuss inverses of matrices in some more detail.

Definition 3.43: (Invertible and singular matrices)
A matrix \(A\in M_{n,n}(\mathbb{R})\) is called invertible, or non-singular, if there exists a matrix \(A^{-1}\in M_{n,n}(\mathbb{R})\) such that \[A^{-1}A=I .\] If \(A\) is not invertible then it is called singular.

We will first give some properties of inverses, namely that a left inverse is also a right inverse, and that the inverse is unique. Note that these properties are direct consequences of the corresponding properties of linear maps, a concept we will explore in Chapter 7. (In fact, they also follow from Theorem 3.53 below.) Recall that \(M_n(\mathbb{R})=M_{n,n}(\mathbb{R})\).

Theorem 3.44:

Let \(A\in M_{n}(\mathbb{R})\) be invertible with inverse \(A^{-1}\). Then:

  • (i) \(AA^{-1}=I\).

  • (ii) If \(BA=I\) for some matrix \(B\in M_{n}(\mathbb{R})\) then \(B=A^{-1}\).

  • (iii) If \(B\in M_{n}(\mathbb{R})\) is also invertible, then \(AB\) is invertible with \((AB)^{-1}=B^{-1}A^{-1}\).

We will prove this theorem later in the chapter, once we have introduced some further concepts which will help us do so.
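Properties (i) and (iii) are easy to check numerically for any particular pair of matrices; the following short NumPy sketch (with two arbitrarily chosen invertible \(2\times 2\) matrices) does exactly that.

```python
import numpy as np

# Two arbitrarily chosen invertible 2x2 matrices.
A = np.array([[2.0, 1.0],
              [1.0, 1.0]])
B = np.array([[1.0, 3.0],
              [0.0, 1.0]])

A_inv = np.linalg.inv(A)

# (i): the left inverse is also a right inverse, i.e. A A^{-1} = I.
print(np.allclose(A @ A_inv, np.eye(2)))                             # True
# (iii): (AB)^{-1} = B^{-1} A^{-1}.
print(np.allclose(np.linalg.inv(A @ B), np.linalg.inv(B) @ A_inv))   # True
```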

Property (iii) implies that arbitrarily long products of invertible matrices are invertible too. Property (i) can be interpreted as saying that \(A^{-1}\) is itself invertible, with inverse \(A\), i.e., \((A^{-1})^{-1}=A\).¹

We will now turn to the question of how to compute the inverse of a matrix. This will involve similar techniques as for solving systems of linear equations, in particular the use of elementary row operations. The first step will be to show that elementary row operations can be performed using matrix multiplication. To this end we introduce a special type of matrix.

Definition 3.45:
A matrix \(E\in M_{n}(\mathbb{R})\) is called an elementary matrix if it is obtained from the identity matrix \(I_n\) by precisely one elementary row operation.
Example 3.46:
  • Switching rows in \(I_2\) gives \(E=\begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix}\).

  • Multiplying row 1 by \(\lambda\in \mathbb{R}\) in \(I_2\) gives \(E=\begin{pmatrix}\lambda & 0\\ 0 & 1\end{pmatrix}\).

  • Adding row 2 to row 1 in \(I_2\) gives \(E=\begin{pmatrix}1 & 1 \\ 0 & 1\end{pmatrix}\).

  • Switching row 3 and row 5 in \(I_5\) gives \(\begin{pmatrix}1 & 0 &0 &0 &0\\ 0 & 1 &0&0&0\\ 0 & 0 & 0 & 0 & 1\\ 0 & 0 & 0 & 1 & 0\\ 0 & 0 & 1 & 0 & 0 \end{pmatrix}\).

Exercise 3.47:
What is the elementary matrix corresponding to adding three times row 2 to row 3 in a \(3\times 3\) matrix?
Exercise 3.48:
Are \(A=\begin{pmatrix}0&1&0\\0&0&1\\1&0&0 \end{pmatrix}\) and \(B= \begin{pmatrix}1&0\\0&0 \end{pmatrix}\) elementary matrices?

The important property of elementary matrices is the following: multiplying an arbitrary matrix \(A\) on the left by an elementary matrix \(E\) has the same effect as applying to \(A\) the row operation that was used to obtain \(E\) from the identity. Before stating this as a theorem, let us consider an example.

Example 3.49:
We consider the effect of multiplying a general \(2\times 2\) matrix \(\begin{pmatrix}a & b\\ c& d\end{pmatrix}\) by some of the elementary matrices from Example 3.46. We find \[\begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix}\begin{pmatrix}a & b\\ c& d\end{pmatrix}=\begin{pmatrix}c & d\\ a & b\end{pmatrix}, \quad \begin{pmatrix}\lambda & 0\\ 0 & 1\end{pmatrix}\begin{pmatrix}a & b\\ c& d\end{pmatrix}=\begin{pmatrix}\lambda a & \lambda b\\ c& d\end{pmatrix}\] and \[\begin{pmatrix}1 & 1 \\ 0 & 1\end{pmatrix}\begin{pmatrix}a & b\\ c& d\end{pmatrix}=\begin{pmatrix}a+c & b+d \\ c & d\end{pmatrix},\] which indeed correspond to the associated elementary row operations.
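The same can be checked numerically; the following short NumPy sketch (illustrative only) builds an elementary matrix by swapping two rows of \(I_3\) and confirms that left multiplication by it swaps the same rows of an arbitrary matrix.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])

# Elementary matrix obtained from I_3 by swapping rows 1 and 3.
E = np.eye(3)
E[[0, 2], :] = E[[2, 0], :]

# Swapping rows 1 and 3 of A directly ...
B = A.copy()
B[[0, 2], :] = B[[2, 0], :]

# ... gives the same matrix as the product E A.
print(np.allclose(E @ A, B))  # True
```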

So we now state this as a general theorem:

Theorem 3.50:

Let \(A\in M_{m,n}(\mathbb{R})\) and assume \(B\) is obtained from \(A\) by an elementary row operation with corresponding elementary matrix \(E\). Then \[B=EA .\]

Proof.

Let \(c_1,\ldots,c_n\in\mathbb{R}^m\) be the columns of \(A\), let \(a^t_1,\ldots,a^t_m\) be the rows of \(A\), and let \(f^t_1, \ldots, f^t_m\) be the rows of \(E\) (so \(f_i\in\mathbb{R}^m\)). Then the matrix \(EA\) has rows \(b^t_1, \ldots , b^t_m\) with \[b^t_i=(f_i\cdot c_1\;\; f_i\cdot c_2 \;\;\cdots \;\;f_i\cdot c_n) .\] We have to show that these are exactly the rows of \(B\). Since \(E\) is an elementary matrix, there are only four possibilities for \(f_i\) (here \(e_1,\ldots,e_m\) denote the standard basis vectors of \(\mathbb{R}^m\)):

  • if the elementary row operation didn’t change row \(i\), then \(f^t_i=e^t_i\) and \(b_i^t=a_i^t\);

  • if the elementary row operation exchanged row \(i\) and row \(j\), then \(f^t_i=e^t_j\) and \(b^t_i=a^t_j\);

  • if the elementary row operation multiplied row \(i\) by \(\lambda\), then \(f^t_i=\lambda e^t_i\) and \(b^t_i=\lambda a^t_i\);

  • if the elementary row operation added \(\lambda\) times row \(j\) to row \(i\), then \(f^t_i=e^t_i+\lambda e^t_j\) and \(b^t_i=a^t_i+\lambda a^t_j\).

So in every case row \(i\) of \(EA\) coincides with row \(i\) of \(B\), that is, multiplying \(A\) on the left by \(E\) has the same effect as applying the elementary row operation to \(A\).

Exercise 3.51:
What can we deduce about the invertibility of elementary matrices?

Since any elementary row operation can be undone by another elementary row operation, we immediately obtain the following corollary.

Corollary 3.52:

Any elementary matrix is invertible.


Now let us see how we can use these elementary matrices. Assume we can find a sequence of \(N\) elementary row operations which transforms a matrix \(A\) into the identity matrix \(I\), and let \(E_1, \ldots, E_N\) be the elementary matrices associated with these elementary row operations. Then repeated application of Theorem 3.50 gives \(I=E_N\cdots E_2E_1A\), hence \[A^{-1}=E_N\cdots E_2E_1 .\] So we have found a representation of \(A^{-1}\) as a product of elementary matrices, but we can simplify this even further. Since \(E_N\cdots E_2E_1=E_N\cdots E_2E_1I\), we can again invoke Theorem 3.50 to conclude that \(A^{-1}\) is obtained by applying the same sequence of elementary row operations to the identity matrix \(I\). This means that we don't have to compute the elementary matrices, nor their product.
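As a concrete illustration, here are the elementary matrices for one small \(2\times 2\) matrix (the same matrix reappears in Example 3.54 below), checked with NumPy; their product is indeed the inverse.

```python
import numpy as np

A = np.array([[0.0, 2.0],
              [1.0, -1.0]])

# Elementary matrices for the row operations that reduce A to I:
E1 = np.array([[0.0, 1.0],    # swap rows 1 and 2
               [1.0, 0.0]])
E2 = np.array([[1.0, 0.0],    # multiply row 2 by 1/2
               [0.0, 0.5]])
E3 = np.array([[1.0, 1.0],    # add row 2 to row 1
               [0.0, 1.0]])

print(np.allclose(E3 @ E2 @ E1 @ A, np.eye(2)))      # True
print(E3 @ E2 @ E1)                                  # A^{-1} = [[1/2, 1], [1/2, 0]]
print(np.allclose(E3 @ E2 @ E1, np.linalg.inv(A)))   # True
```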

On the other hand, since \(A\) is an \(n \times n\) matrix, if its reduced row echelon form is not \(I\) then it must have strictly fewer than \(n\) leading \(1\)'s, so Theorem 3.39 implies that the equation \(Ax=0\) has infinitely many solutions; hence \(A\) can't be invertible (if it were, then \(Ax=0\) would force \(x=A^{-1}Ax=0\), so \(x=0\) would be the only solution).

What we found is summarised in the following theorem.

Theorem 3.53:

Let \(A\in M_{n}(\mathbb{R})\). If \(A\) can be transformed by successive elementary row operations into the identity matrix, then \(A\) is invertible and the inverse is obtained by applying the same sequence of elementary row operations to \(I\).

This leads to the following algorithm, known as the Gauss-Jordan procedure (a short sketch in code follows the steps below):

  • Form the \(n\times 2n\) matrix \(\begin{pmatrix}A & I \end{pmatrix}\) and apply elementary row operations until \(A\) is in reduced row echelon form \(C\), i.e., \(\begin{pmatrix}A & I \end{pmatrix}\) is transformed to \(\begin{pmatrix}C & B \end{pmatrix}\) for some matrix \(B\).

  • If the reduced row echelon form of \(A\) is \(I\), i.e., \(C=I\), then \(B=A^{-1}\).

  • If the reduced row echelon form is not \(I\), then \(A\) is not invertible.
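A minimal sketch of this procedure in code could look as follows; the function name, the pivoting choice and the singularity test via a small tolerance are illustrative choices rather than part of the procedure itself.

```python
import numpy as np

def gauss_jordan_inverse(A, tol=1e-12):
    """Invert A by row reducing the augmented matrix (A | I).

    Returns the inverse, or None if A turns out to be singular.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    M = np.hstack([A, np.eye(n)])            # the n x 2n matrix (A | I)

    for col in range(n):
        # Pick as pivot the largest entry in this column, at or below row `col`.
        pivot = col + np.argmax(np.abs(M[col:, col]))
        if abs(M[pivot, col]) < tol:
            return None                      # no pivot: the RREF is not I, so A is singular
        M[[col, pivot]] = M[[pivot, col]]    # swap rows
        M[col] /= M[col, col]                # scale the pivot row to get a leading 1
        for row in range(n):                 # clear the rest of the column
            if row != col:
                M[row] -= M[row, col] * M[col]

    return M[:, n:]                          # M is now (I | B) with B = A^{-1}

# Example 3.54 below: the inverse of [[0, 2], [1, -1]] is [[1/2, 1], [1/2, 0]].
print(gauss_jordan_inverse([[0, 2], [1, -1]]))
# Example 3.55 below: [[2, -4], [-1, 2]] is singular.
print(gauss_jordan_inverse([[2, -4], [-1, 2]]))   # None
```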

Let us look at a few examples to see how this algorithm works.

Example 3.54:
Let \(A=\begin{pmatrix}0 & 2\\ 1 & -1\end{pmatrix}\), then \(\begin{pmatrix}A& I\end{pmatrix}=\begin{pmatrix}0 & 2 & 1 &0\\ 1 & -1 & 0 & 1\end{pmatrix}\) and consecutive elementary row operations give \[\begin{array}{cccccc} \begin{pmatrix}0 & 2 & 1 &0\\ 1 & -1 & 0 & 1\end{pmatrix}&\to \begin{pmatrix}1 & -1 & 0 & 1\\ 0 & 2 & 1 &0\end{pmatrix} &\to \begin{pmatrix}1 & -1 & 0 & 1\\ 0 & 1 & 1/2 &0\end{pmatrix} &\to \begin{pmatrix}1 & 0 & 1/2 & 1\\ 0 & 1 & 1/2 &0\end{pmatrix}, \end{array}\] and so \(A^{-1}=\begin{pmatrix}1/2 & 1\\ 1/2 & 0\end{pmatrix}\).
Example 3.55:
Let \(A=\begin{pmatrix}2 & -4\\ -1 & 2 \end{pmatrix}\), then adding 2 times the second row to the first gives \(\begin{pmatrix}0 & 0\\ -1 & 2\end{pmatrix}\) and the reduced row echelon form of this matrix is \(\begin{pmatrix}1 & -2 \\ 0 & 0\end{pmatrix}\) hence \(A\) is not invertible.
Example 3.56:
Let \(A=\begin{pmatrix}2 & 1 &\ 0\\ 0 & 1 & 2\\ 2 & 1 & -1 \end{pmatrix}\), then elementary row operations give \[\begin{split} \begin{pmatrix}2 & 1 & 0& 1 & 0 & 0\\ 0 & 1 & 2 & 0 & 1 & 0\\ 2 & 1 & -1 & 0 & 0 & 1\end{pmatrix}&\to \begin{pmatrix}2 & 1 & 0& 1 & 0 & 0\\ 0 & 1 & 2 & 0 & 1 & 0\\ 0 & 0 & -1 & -1 & 0 & 1\end{pmatrix}\\ &\to \begin{pmatrix}2 & 1 & 0& 1 & 0 & 0\\ 0 & 1 & 0 & -2& 1 & 2\\ 0 & 0 & -1 & -1 & 0 & 1\end{pmatrix}\\ &\to \begin{pmatrix}2 & 0 & 0& 3 & -1 & -2\\ 0 & 1 & 0 & -2& 1 & 2\\ 0 & 0 & -1 & -1 & 0 & 1\end{pmatrix}\\ &\to \begin{pmatrix}1 & 0 & 0& 3/2 & -1/2 & -1\\ 0 & 1 & 0 & -2& 1 & 2\\ 0 & 0 & 1 & 1 & 0 & -1\end{pmatrix} \end{split}\] and so \(A^{-1}=\begin{pmatrix}3/2 & -1/2 & -1\\ -2 & 1 & 2\\ 1 & 0 & -1\end{pmatrix}.\)
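The result of such a hand computation can always be double-checked numerically, for instance with NumPy's built-in inverse (or with the Gauss-Jordan sketch above):

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 2.0],
              [2.0, 1.0, -1.0]])

print(np.linalg.inv(A))
# approximately [[ 1.5, -0.5, -1.0],
#                [-2.0,  1.0,  2.0],
#                [ 1.0,  0.0, -1.0]]
```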

For a general \(2\times 2\) matrix \(A=\begin{pmatrix}a & b\\ c& d\end{pmatrix}\) we find \[A^{-1}=\frac{1}{ad-bc}\begin{pmatrix}d & -b\\ -c & a\end{pmatrix},\] which is well defined if \[ad-bc\neq 0 .\]
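This formula can be verified by direct multiplication: \[\frac{1}{ad-bc}\begin{pmatrix}d & -b\\ -c & a\end{pmatrix}\begin{pmatrix}a & b\\ c& d\end{pmatrix}=\frac{1}{ad-bc}\begin{pmatrix}da-bc & db-bd\\ -ca+ac & -cb+ad\end{pmatrix}=\begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix}.\]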

We can now sketch the proof of Theorem 3.44 above.

Proof (of Theorem 3.44).

To address (iii) we note that if \(A,B\) are both invertible, then \[(B^{-1}A^{-1})(AB) = B^{-1}(A^{-1}A)B = B^{-1} I B = B^{-1}B= I,\] so by definition of the inverse, \(B^{-1}A^{-1}\) is the inverse of \(AB\).

Next, if \(A\) is invertible, then \(Ax=0\) implies \(x=A^{-1}Ax=0\), so \(x=0\) is the only solution of \(Ax=0\); by Theorem 3.39 the reduced row echelon form of \(A\) cannot have fewer than \(n\) leading \(1\)'s, so it must be \(I\). Hence we can find elementary matrices so that \(I=E_N\cdots E_2E_1A\), and therefore \(A^{-1}=E_N\cdots E_2E_1\). Moreover, by Corollary 3.52 all the \(E_i\) are invertible (and their inverses are again elementary matrices, since each elementary row operation is undone by another one). Multiplying \(I=E_N \cdots E_2 E_1 A\) on the left by \(E_N^{-1}\), then \(E_{N-1}^{-1}\), and so on, one gets \[E_1^{-1}E_2^{-1}\cdots E_N^{-1} = E_1^{-1}E_2^{-1}\cdots E_N^{-1} E_N \cdots E_2 E_1 A = A.\] So \(AA^{-1} = E_1^{-1}E_2^{-1}\cdots E_N^{-1}E_N \cdots E_2E_1 = I\), giving (i).

Finally, if \(I=BA\) then \(IA^{-1}=BAA^{-1}\) so by (i) \(A^{-1}=B\), giving (ii).

Note that matrix inverses also provide another way of solving a system of equations. If we have a system \(Ax=b\) where \(A\) is an invertible matrix, then \(x=A^{-1}b\). As it is usually harder to invert the matrix than to solve the system using Gaussian elimination, we will not often use this method, but it can be useful when you already know the inverse of the matrix.
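For instance, using the matrix and the inverse from Example 3.54 (with an arbitrarily chosen right-hand side \(b\)), both approaches give the same solution:

```python
import numpy as np

A = np.array([[0.0, 2.0],
              [1.0, -1.0]])
b = np.array([4.0, 1.0])          # an arbitrarily chosen right-hand side

A_inv = np.array([[0.5, 1.0],     # the inverse found in Example 3.54
                  [0.5, 0.0]])

print(A_inv @ b)                  # x = A^{-1} b = [3. 2.]
print(np.linalg.solve(A, b))      # the same solution, computed by elimination
```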

Having explored matrices and systems of linear equations, we next look at an important function on matrices, which has links to the invertibility of matrices and the number of solutions to a system of equations. This is the determinant function.


  1. These results establish that the set of invertible \(n\times n\) matrices forms a ‘group’ under multiplication — we define what this means exactly later in the course — which is called the general linear group over \(\mathbb{R}\), \[GL_n(\mathbb{R}):=\{A\in M_{n}(\mathbb{R}) : A \text{ is invertible}\} .\]