3.4 Solving systems of linear equations
To solve a system of linear equations we will introduce a systematic way to simplify it until we can read off directly whether it is solvable and compute the solutions easily. Again it will be useful to write the system of equations in matrix form and then observe that operations on equations boil down to operations on the rows of the matrix. The procedure in question is often referred to as Gaussian elimination.
Let us return for a moment to the original way of writing a set of \(m\) linear equations in \(n\) unknowns, \[\begin{aligned} a_{11}x_1+a_{12}x_2+\cdots +a_{1n}x_n &=b_1,\\ a_{21}x_1+a_{22}x_2+\cdots +a_{2n}x_n &=b_2,\\ \cdots & \\ a_{m1}x_1+a_{m2}x_2+\cdots +a_{mn}x_n &=b_m. \end{aligned}\] We can perform the following operations on the set of equations without changing the solutions:
(i) multiply an equation by a non-zero constant,
(ii) add a multiple of any equation to one of the other equations,
(iii) exchange two of the equations.
It is clear that operations (i) and (iii) don’t change the set of solutions, and to see that operation (ii) doesn’t change the set of solutions we can argue as follows: If \(x\in\mathbb{R}^n\) is a solution to the system of equations and we change the system by adding \(\lambda\) times equation \(i\) to equation \(j\) then \(x\) is clearly a solution of the new system, too. But if \(x'\in\mathbb{R}^n\) is a solution to the new system we can return to the old system by subtracting \(\lambda\) times equation \(i\) from equation \(j\), and so \(x'\) must be a solution to the old system, too. Hence both systems have the same set of solutions.
The way to solve a system of equations is to use the above operations to simplify a system of equations systematically until we can basically read off the solutions. It is useful to formulate this using the matrix representation of a system of linear equations \[Ax=b.\]
Now we translate the above operations on systems of equations into operations on the augmented matrix.
An elementary row operation (ERO) is one of the following operations on matrices:
(i) multiply a row by a non-zero number (row \(i\) \(\to\) \(\lambda\times \text{row } i\)),
(ii) add a multiple of one row to another row (row \(i\) \(\to\) row \(i\) \(+\lambda\times\text{row } j\)),
(iii) exchange two rows (row \(i\) \(\leftrightarrow\) row \(j\)).
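These operations are straightforward to experiment with on a computer. The following is a minimal sketch in Python with NumPy; the function names are our own, chosen for illustration, and rows are indexed from \(0\) as usual in programming.

```python
import numpy as np

def scale_row(A, i, lam):
    """ERO (i): row i -> lam * row i, with lam non-zero."""
    assert lam != 0, "scaling by zero is not an elementary row operation"
    A[i] = lam * A[i]

def add_multiple(A, i, j, lam):
    """ERO (ii): row i -> row i + lam * row j."""
    A[i] = A[i] + lam * A[j]

def swap_rows(A, i, j):
    """ERO (iii): row i <-> row j."""
    A[[i, j]] = A[[j, i]]
```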
Let \(A\in M_{m,n}(\mathbb{R})\), \(b\in \mathbb{R}^m\). If the augmented matrix \((A'\ b')\) is obtained from \((A\ b)\) by a sequence of ERO’s, then the corresponding system of linear equations has the same solutions, i.e., \[S(A, b)=S(A',b') .\]
If we apply these operations to the augmented matrix of a system of linear equations then they clearly correspond to the three operations (i), (ii), and (iii) we introduced above, hence the system corresponding to the new matrix has the same set of solutions.
□
We want to use these operations to systematically simplify the augmented matrix. Let us look at an example to get an idea of which type of simplification we can achieve.
Consider the following system of equations: \[\begin{aligned} x+y+2z&=9,\\ 2x+4y-3z&=1,\\ 3x+6y-5z&=0. \end{aligned}\] This is of the form \(Ax=b\) with \[A=\begin{pmatrix}1 & 1 & 2 \\ 2 & 4 &-3\\ 3 & 6 &-5\end{pmatrix},\quad b=\begin{pmatrix}9\\1\\0\end{pmatrix},\] hence the corresponding augmented matrix is given by \[\begin{pmatrix}1 & 1& 2& 9\\ 2 & 4 & -3 & 1 \\ 3 & 6 & -5 & 0\end{pmatrix}.\] Applying elementary row operations gives \[\begin{aligned} \begin{pmatrix}1 & 1& 2& 9\\ 2& 4 & -3 & 1 \\ 3 & 6 & -5 & 0\end{pmatrix}& \qquad \xrightarrow{\text{row }2-2\times \text{row }1,\quad \text{row } 3-3\times \text{row }1 }\\ \begin{pmatrix}1 & 1& 2& 9\\ 0 & 2 & -7 & -17 \\ 0 & 3 & -11 & -27\end{pmatrix}& \qquad \xrightarrow{\text{row } 3- \text{row } 2} \\ \begin{pmatrix}1 & 1& 2& 9\\ 0 & 2 & -7 & -17 \\ 0 & 1 & -4 & -10\end{pmatrix}&\qquad \xrightarrow{\text{row } 3\leftrightarrow \text{row } 2}\\ \begin{pmatrix}1 & 1& 2& 9\\ 0 & 1 & -4 & -10\\ 0 & 2 & -7 & -17 \end{pmatrix}& \qquad \xrightarrow{\text{row } 3-2\times \text{row } 2} \\ \begin{pmatrix}1 & 1& 2& 9\\ 0 & 1 & -4 & -10\\ 0 & 0 & 1 & 3 \end{pmatrix}& \end{aligned}\] where we have written next to each matrix which elementary row operations we applied in order to arrive at it from the previous one. The system of equations corresponding to the last matrix is \[\begin{aligned} x+y+2z&=9,\\ y-4z&=-10,\\ z&=3, \end{aligned}\] so we have \(z=3\) from the last equation. Substituting this in the second equation gives \(y=-10+4z=-10+12=2\), and substituting both in the first equation gives \(x=9-y-2z=9-2-6=1\). So we see that if the augmented matrix is in the above triangular-like form, we can solve the system of equations easily by what is sometimes called back-substitution.
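For readers who want to replay this computation numerically, here is a sketch using the helper functions defined above (note the shift to zero-based row indices):

```python
import numpy as np

M = np.array([[1, 1,  2,  9],
              [2, 4, -3,  1],
              [3, 6, -5,  0]], dtype=float)

add_multiple(M, 1, 0, -2)   # row 2 - 2 x row 1
add_multiple(M, 2, 0, -3)   # row 3 - 3 x row 1
add_multiple(M, 2, 1, -1)   # row 3 - row 2
swap_rows(M, 1, 2)          # row 3 <-> row 2
add_multiple(M, 2, 1, -2)   # row 3 - 2 x row 2
# M is now the row echelon form [[1,1,2,9],[0,1,-4,-10],[0,0,1,3]]

# back-substitution, from the last equation upwards
z = M[2, 3]                 # z = 3
y = M[1, 3] + 4 * z         # y = -10 + 12 = 2
x = M[0, 3] - y - 2 * z     # x = 9 - 2 - 6 = 1
```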
But we can also continue applying elementary row operations and find \[\begin{aligned} \begin{pmatrix}1 & 1& 2& 9\\ 0 & 1 & -4 & -10\\ 0 & 0 & 1 & 3 \end{pmatrix}& \qquad \xrightarrow{\text{row }1-\text{row }2}\\ \begin{pmatrix}1& 0& 6& 19\\ 0 & 1 & -4 & -10\\ 0 & 0 & 1 & 3 \end{pmatrix}& \qquad \xrightarrow{ \text{row }1-6\times \text{row }3,\quad \text{row }2+ 4\times\text{row }3}\\ \begin{pmatrix}1& 0& 0& 1\\ 0 & 1 & 0 & 2\\ 0 & 0 & 1 & 3 \end{pmatrix}. & \end{aligned}\] Now the corresponding system of equations is of an even simpler form, \[\begin{aligned} x&=1,\\ y&=2,\\z&=3, \end{aligned}\] and gives the solution directly. The different forms into which we brought the matrix by elementary row operations are of a special type.
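Continuing the numerical sketch, the same three row operations bring `M` to this simplest form, after which the last column contains the solution:

```python
add_multiple(M, 0, 1, -1)   # row 1 - row 2
add_multiple(M, 0, 2, -6)   # row 1 - 6 x row 3
add_multiple(M, 1, 2, 4)    # row 2 + 4 x row 3
print(M[:, 3])              # [1. 2. 3.], i.e. x = 1, y = 2, z = 3
```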
A matrix \(M\) is in row echelon form (REF) if
(i) each row is either all zeros, or its leftmost non-zero number is \(1\) (this is called the leading \(1\) in that row), and
(ii) if rows \(i\) and \(j\) are non-zero and row \(i\) is above row \(j\), then the leading \(1\) of row \(i\) is to the left of the leading \(1\) of row \(j\); any zero rows are below all non-zero rows.
A matrix is in reduced row echelon form (RREF) if, in addition to (i) and (ii), it satisfies
(iii) in each column which contains a leading \(1\), all other numbers are \(0\).
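These conditions translate directly into a computational check. The following is our own illustrative sketch (assuming NumPy), testing conditions (i)-(iii) for a matrix:

```python
import numpy as np

def leading_one_col(row):
    """Column of the leading 1 in a row; None for a zero row; -1 if the
    leftmost non-zero entry is not 1."""
    nz = np.nonzero(row)[0]
    if len(nz) == 0:
        return None
    return nz[0] if row[nz[0]] == 1 else -1

def is_ref(M):
    cols = [leading_one_col(r) for r in M]
    if -1 in cols:                       # condition (i) violated
        return False
    lead = [c for c in cols if c is not None]
    # zero rows last, leading 1's strictly further right as we go down: (ii)
    return cols == lead + [None] * (len(cols) - len(lead)) and lead == sorted(set(lead))

def is_rref(M):
    if not is_ref(M):
        return False
    # condition (iii): each column with a leading 1 is zero elsewhere
    return all(np.count_nonzero(M[:, c]) == 1
               for c in [leading_one_col(r) for r in M] if c is not None)
```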
In Example 3.36 we have marked the leading \(1\)’s in bold; we will see later that their distribution determines the nature of the solutions of the corresponding system of equations.
The reason for introducing these definitions is that elementary row operations can be used to bring any matrix to these forms.
Any matrix \(M\) can be brought, by a finite number of elementary row operations (a procedure called Gaussian elimination), to
(i) row echelon form, or
(ii) reduced row echelon form.
Let \(M=(m_1,m_2,\cdots ,m_n)\) where \(m_i\in\mathbb{R}^m\) are the column vectors of \(M\). Take the leftmost column vector which is non-zero, say this is \(m_{j}\); exchange rows so that the first entry of this column is non-zero, and divide the first row by that number. Now the matrix is \(M'= (m_1',m_2',\cdots ,m_n')\), its columns to the left of \(m_j'\) are zero, and \[m_j'=\begin{pmatrix}1\\ a_{2j}\\ \vdots \\ a_{mj}\end{pmatrix}.\] Now we can subtract multiples of the first row from the other rows until all numbers in the \(j\)th column below the top \(1\) are \(0\); more precisely, we subtract \(a_{ij}\) times the first row from row \(i\). We have now transformed the matrix to the form \[\begin{pmatrix}0 & \cdots & 0 & 1 & * & \cdots & *\\ 0 & \cdots & 0 & 0 & & & \\ \vdots & & \vdots & \vdots & & \tilde{M} & \\ 0 & \cdots & 0 & 0 & & & \end{pmatrix}\] and we apply the same procedure to the submatrix \(\tilde{M}\); the row operations involved do not disturb the zero columns to the left. Eventually we arrive at row echelon form. To arrive at reduced row echelon form we start from row echelon form and use the leading \(1\)’s to clear out all non-zero elements in the columns containing a leading \(1\).
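The proof is constructive and translates almost line by line into code. The following sketch (assuming NumPy with floating-point entries; no care is taken about numerical stability) implements exactly the procedure described: find the leftmost non-zero column, produce a leading \(1\), clear the column below it, and recurse on the remaining submatrix.

```python
import numpy as np

def to_ref(M, top=0, left=0):
    """Bring M to row echelon form in place, working on the rows from
    `top` down and the columns from `left` rightwards."""
    m, n = M.shape
    for j in range(left, n):
        rows = [i for i in range(top, m) if M[i, j] != 0]
        if not rows:
            continue                             # column is all zeros, move right
        M[[top, rows[0]]] = M[[rows[0], top]]    # exchange rows
        M[top] = M[top] / M[top, j]              # make the leading entry a 1
        for i in range(top + 1, m):
            M[i] = M[i] - M[i, j] * M[top]       # clear the column below the leading 1
        return to_ref(M, top + 1, j + 1)         # same procedure on the submatrix
    return M

def to_rref(M):
    """Row echelon form first, then use each leading 1 to clear its column above."""
    to_ref(M)
    for i in range(M.shape[0]):
        nz = np.nonzero(M[i])[0]
        if len(nz) == 0:
            continue                             # zero row, nothing to clear
        for k in range(i):
            M[k] = M[k] - M[k, nz[0]] * M[i]
    return M
```

Applied to the augmented matrix of Example 3.34 (as a float array), `to_rref` reproduces the reduced row echelon form computed there.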
□
Example 3.34 illustrates how the reduction to row echelon and reduced row echelon form works.
Let us now turn to the question of what the row echelon form tells us about the structure of the set of solutions to a system of linear equations. The key information lies in the distribution of the leading \(1\)’s.
Let \(Ax=b\) be a system of equations in \(n\) unknowns, and \(M\) be the row echelon form of the associated augmented matrix. Then
(i) the system has no solutions if and only if the last column of \(M\) contains a leading \(1\),
(ii) the system has a unique solution if and only if every column except the last one contains a leading \(1\),
(iii) the system has infinitely many solutions if and only if the last column of \(M\) does not contain a leading \(1\) and there are fewer than \(n\) leading \(1\)’s. In this case there are \(n-k\) unknowns which can be chosen arbitrarily, where \(k\) is the number of leading \(1\)’s of \(M\).
Let us first observe that the leading \(1\)’s of the reduced row echelon form of a system are the same as the leading \(1\)’s of the row echelon form. Therefore we can assume the system is in reduced row echelon form, which makes the arguments slightly simpler. Let us start with the last non-zero row, that is the row with the rightmost leading \(1\), and consider the corresponding equation. If the leading \(1\) is in the last column, then this equation is of the form \[0x_1+0x_2+\cdots +0x_n=1 ,\] and so we have the contradiction \(0=1\) and therefore there is no \(x\in\mathbb{R}^n\) solving the set of equations. This is case (i) of the theorem.
If the last column does not contain a leading \(1\), but all other columns contain leading \(1\)’s, then the reduced row echelon form is \[\begin{pmatrix}1 & 0 & \cdots & 0 & b_1'\\ 0 & 1 & \cdots & 0 & b_2'\\ \vdots &\vdots & \ddots & \vdots &\vdots \\ 0& 0& \cdots & 1 & b_n' \\ 0 & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & & \vdots & \vdots \end{pmatrix}\] where, if \(m>n\), there are \(m-n\) rows with only \(0\)’s. The corresponding system of equations is then \[x_1=b_1' ,\quad x_2=b_2' ,\quad \cdots \quad x_n=b_n'\] and so there is a unique solution. This is case (ii) in the theorem.
Finally let us consider the case that there are \(k\) leading \(1\)’s with \(k<n\) and none of them is in the last column. Let us index the columns containing leading \(1\)’s by \(j_1,j_2,\cdots ,j_k\). Then the system of equations corresponding to the reduced row echelon form is of the form \[\begin{aligned} x_{j_1}+\sum_{i \text{ not leading}} a_{1 i}x_i &= b_{1}',\\ x_{j_2}+\sum_{i \text{ not leading}} a_{2 i}x_i &= b_{2}',\\ \cdots & \\ x_{j_k}+\sum_{i \text{ not leading}} a_{k i}x_i &= b_{k}', \end{aligned}\] where the sums contain only those \(x_i\) whose index does not label a column with a leading \(1\). There are \(n-k\) such unknowns \(x_i\), whose values can be chosen arbitrarily, and once their values are fixed, the remaining \(k\) unknowns are determined uniquely. This proves part (iii) of the theorem.
Note that in each case we have proved the ‘if’ part of the statement; the ‘only if’ parts then follow by taking contrapositives, since the three conditions on \(M\) are mutually exclusive and cover all possibilities, as are the three alternatives for the set of solutions.□
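The trichotomy of the theorem can be read off mechanically from the pivot data a computer algebra system reports; for instance, SymPy’s `Matrix.rref` returns the reduced row echelon form together with the tuple of columns that contain leading \(1\)’s. The function `classify` below is our own illustrative wrapper around this:

```python
from sympy import Matrix

def classify(A, b):
    n = A.cols
    _, pivots = Matrix.hstack(A, b).rref()   # pivots = columns with leading 1's
    if n in pivots:                          # leading 1 in the last column: case (i)
        return "no solutions"
    if len(pivots) == n:                     # leading 1 in every other column: case (ii)
        return "unique solution"
    return f"infinitely many solutions, {n - len(pivots)} free unknowns"  # case (iii)

print(classify(Matrix([[1, 1, 2], [2, 4, -3], [3, 6, -5]]), Matrix([9, 1, 0])))
# unique solution
print(classify(Matrix([[1, 1], [1, 1]]), Matrix([1, 2])))
# no solutions (the two equations are inconsistent)
```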
Let us consider one simple consequence of this general theorem which we will use in a couple of proofs later on.
This gives a rigorous basis to the intuitive idea that if you have \(n\) unknowns, you need at least \(n\) linear equations to determine the unknowns uniquely. We state the result, which also holds for complex systems, formally below.
Let \(A\in M_{m,n}(\mathbb{F})\) and assume that \(S(A,\mathbf{0})=\{\mathbf{0}\} ,\) i.e., the only solution \(x\in \mathbb{F}^n\) to \(Ax=\mathbf{0}\) is \(x=\mathbf{0}\). Then \(m\geq n\).
This follows from part \((ii)\) of the previous theorem: since the solution is unique, every column except the last one of the augmented matrix, i.e., every column of \(A\), contains a leading \(1\) in row echelon form. Since each row contains at most one leading \(1\), there are at least as many rows as columns, i.e., \(m\geq n\).
□
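As a quick illustration of the contrapositive: if \(m<n\), then by part (iii) of the theorem the homogeneous system \(Ax=\mathbf{0}\) must have non-zero solutions, and SymPy’s `nullspace` exhibits them:

```python
from sympy import Matrix

A = Matrix([[1, 2, 3],
            [4, 5, 6]])   # m = 2 equations, n = 3 unknowns
print(A.nullspace())      # [Matrix([[1], [-2], [1]])], a non-zero solution of Ax = 0
```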