3.4 Systems of Linear Equations

Consider the system of two linear equations: \begin{align} x+y & =1,\tag{3.1}\\ 2x-y & =1.\tag{3.2} \end{align} As shown in Figure 3.1, equations (3.1) and (3.2) represent two straight lines which intersect at the point x=2/3 and y=1/3. This point of intersection is determined by solving for the values of x and y such that x+y=2x-y.¹

Figure 3.1: System of two linear equations.

The two linear equations can be written in matrix form as: \left[\begin{array}{cc} 1 & 1\\ 2 & -1 \end{array}\right]\left[\begin{array}{c} x\\ y \end{array}\right]=\left[\begin{array}{c} 1\\ 1 \end{array}\right], or, \mathbf{A}\mathbf{z}=\mathbf{b}, where, \mathbf{A}=\left[\begin{array}{cc} 1 & 1\\ 2 & -1 \end{array}\right], \mathbf{z}=\left[\begin{array}{c} x\\ y \end{array}\right] and \mathbf{b}=\left[\begin{array}{c} 1\\ 1 \end{array}\right].

If there were a (2\times2) matrix \mathbf{B}, with elements b_{ij}, such that \mathbf{B}\mathbf{A}=\mathbf{I}_{2}, where \mathbf{I}_{2} is the (2\times2) identity matrix, then we could solve for the elements in \mathbf{z} as follows. In the equation \mathbf{A}\mathbf{z}=\mathbf{b}, pre-multiply both sides by \mathbf{B} to give: \mathbf{B}\mathbf{A}\mathbf{z}=\mathbf{B}\mathbf{b}\Longrightarrow\mathbf{I}_{2}\mathbf{z}=\mathbf{B}\mathbf{b}\Longrightarrow\mathbf{z}=\mathbf{B}\mathbf{b}, or \left[\begin{array}{c} x\\ y \end{array}\right]=\left[\begin{array}{cc} b_{11} & b_{12}\\ b_{21} & b_{22} \end{array}\right]\left[\begin{array}{c} 1\\ 1 \end{array}\right]=\left[\begin{array}{c} b_{11}\cdot1+b_{12}\cdot1\\ b_{21}\cdot1+b_{22}\cdot1 \end{array}\right]. If such a matrix \mathbf{B} exists, it is called the inverse of \mathbf{A} and is denoted \mathbf{A}^{-1}. Intuitively, the inverse matrix \mathbf{A}^{-1} plays a similar role to the inverse of a number. Suppose a is a number; e.g., a=2. Then we know that (1/a)\cdot a=a^{-1}a=1. Similarly, in matrix algebra \mathbf{A}^{-1}\mathbf{A}=\mathbf{I}_{2}, where \mathbf{I}_{2} is the identity matrix. Now, consider solving the equation ax=b. By simple division we have that x=(1/a)b=a^{-1}b. Similarly, in matrix algebra, if we want to solve the system of linear equations \mathbf{A}\mathbf{x}=\mathbf{b}, we pre-multiply both sides by \mathbf{A}^{-1} and get the solution \mathbf{x}=\mathbf{A}^{-1}\mathbf{b}.

Using \mathbf{B}=\mathbf{A}^{-1}, we may express the solution for \mathbf{z} as: \mathbf{z}=\mathbf{A}^{-1}\mathbf{b}. As long as we can determine the elements in \mathbf{A}^{-1}, we can solve for the values of x and y in the vector \mathbf{z}. The system of linear equations has a solution as long as the two lines intersect, so we can determine the elements in \mathbf{A}^{-1} provided the two lines are not parallel. If the two lines are parallel, then one of the equations is a multiple of the other. In this case, we say that \mathbf{A} is not invertible.

There are general numerical algorithms for finding the elements of \mathbf{A}^{-1} (e.g., Gaussian elimination) and R has these algorithms available. However, if \mathbf{A} is a (2\times2) matrix then there is a simple formula for \mathbf{A}^{-1}. Let \mathbf{A} be a (2\times2) matrix such that: \mathbf{A}=\left[\begin{array}{cc} a_{11} & a_{12}\\ a_{21} & a_{22} \end{array}\right]. Then, \mathbf{A}^{-1}=\frac{1}{\det(\mathbf{A})}\left[\begin{array}{cc} a_{22} & -a_{12}\\ -a_{21} & a_{11} \end{array}\right], where \det(\mathbf{A})=a_{11}a_{22}-a_{21}a_{12} denotes the determinant of \mathbf{A} and is assumed to be not equal to zero. By brute force matrix multiplication we can verify this formula: \begin{align*} \mathbf{A}^{-1}\mathbf{A} & =\frac{1}{a_{11}a_{22}-a_{21}a_{12}}\left[\begin{array}{cc} a_{22} & -a_{12}\\ -a_{21} & a_{11} \end{array}\right]\left[\begin{array}{cc} a_{11} & a_{12}\\ a_{21} & a_{22} \end{array}\right]\\ & =\frac{1}{a_{11}a_{22}-a_{21}a_{12}}\left[\begin{array}{cc} a_{22}a_{11}-a_{12}a_{21} & a_{22}a_{12}-a_{12}a_{22}\\ -a_{21}a_{11}+a_{11}a_{21} & -a_{21}a_{12}+a_{11}a_{22} \end{array}\right]\\ & =\frac{1}{a_{11}a_{22}-a_{21}a_{12}}\left[\begin{array}{cc} a_{22}a_{11}-a_{12}a_{21} & 0\\ 0 & -a_{21}a_{12}+a_{11}a_{22} \end{array}\right]\\ & =\left[\begin{array}{cc} \dfrac{a_{22}a_{11}-a_{12}a_{21}}{a_{11}a_{22}-a_{21}a_{12}} & 0\\ 0 & \dfrac{-a_{21}a_{12}+a_{11}a_{22}}{a_{11}a_{22}-a_{21}a_{12}} \end{array}\right]\\ & =\left[\begin{array}{cc} 1 & 0\\ 0 & 1 \end{array}\right]. \end{align*} Let’s apply the above rule to find the inverse of \mathbf{A} in our example linear system (3.1)-(3.2): \mathbf{A}^{-1}=\frac{1}{-1-2}\left[\begin{array}{cc} -1 & -1\\ -2 & 1 \end{array}\right]=\left[\begin{array}{cc} \frac{1}{3} & \frac{1}{3}\\ \frac{2}{3} & \frac{-1}{3} \end{array}\right]. Notice that, \mathbf{A}^{-1}\mathbf{A}=\left[\begin{array}{cc} \frac{1}{3} & \frac{1}{3}\\ \frac{2}{3} & \frac{-1}{3} \end{array}\right]\left[\begin{array}{cc} 1 & 1\\ 2 & -1 \end{array}\right]=\left[\begin{array}{cc} 1 & 0\\ 0 & 1 \end{array}\right]. Our solution for \mathbf{z} is then, \begin{align*} \mathbf{z} & =\mathbf{A}^{-1}\mathbf{b}\\ & =\left[\begin{array}{cc} \frac{1}{3} & \frac{1}{3}\\ \frac{2}{3} & \frac{-1}{3} \end{array}\right]\left[\begin{array}{c} 1\\ 1 \end{array}\right]\\ & =\left[\begin{array}{c} \frac{2}{3}\\ \frac{1}{3} \end{array}\right]=\left[\begin{array}{c} x\\ y \end{array}\right], \end{align*} so that x=2/3 and y=1/3.
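
To make the (2\times2) formula concrete, here is a minimal R function that implements it directly; the name inv2x2() is our own helper for illustration, not a base R function:

inv2x2 = function(A) {
  # explicit 2x2 inverse: adjugate of A divided by det(A)
  detA = A[1,1]*A[2,2] - A[2,1]*A[1,2]
  if (detA == 0) stop("matrix is not invertible")
  matrix(c(A[2,2], -A[1,2], -A[2,1], A[1,1]), 2, 2, byrow=TRUE) / detA
}
inv2x2(matrix(c(1,1,2,-1), 2, 2, byrow=TRUE))
##           [,1]       [,2]
## [1,] 0.3333333  0.3333333
## [2,] 0.6666667 -0.3333333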

Example 3.5 (Solving systems of linear equations in R)

In R, the solve() function is used to compute the inverse of a matrix and solve a system of linear equations. The linear system x+y=1 and 2x-y=1 can be represented using:

matA = matrix(c(1,1,2,-1), 2, 2, byrow=TRUE)
vecB = c(1,1)

First we solve for \mathbf{A}^{-1}:²

matA.inv = solve(matA)
matA.inv
##           [,1]       [,2]
## [1,] 0.3333333  0.3333333
## [2,] 0.6666667 -0.3333333
matA.inv%*%matA
##      [,1]          [,2]
## [1,]    1 -5.551115e-17
## [2,]    0  1.000000e+00
matA%*%matA.inv
##      [,1]         [,2]
## [1,]    1 5.551115e-17
## [2,]    0 1.000000e+00

Then we solve the system \mathbf{z}=\mathbf{A}^{-1}\mathbf{b}:

z = matA.inv%*%vecB
z
##           [,1]
## [1,] 0.6666667
## [2,] 0.3333333
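
Alternatively, calling solve() with two arguments, solve(matA, vecB), solves the system directly without explicitly forming \mathbf{A}^{-1}, which is generally faster and more numerically stable:

solve(matA, vecB)
## [1] 0.6666667 0.3333333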

\blacksquare

In general, if we have n linear equations in n unknown variables we may write the system of equations as \begin{align*} a_{11}x_{1}+a_{12}x_{2}+\cdots+a_{1n}x_{n} & =b_{1}\\ a_{21}x_{1}+a_{22}x_{2}+\cdots+a_{2n}x_{n} & =b_{2}\\ \vdots & =\vdots\\ a_{n1}x_{1}+a_{n2}x_{2}+\cdots+a_{nn}x_{n} & =b_{n} \end{align*} which we may then express in matrix form as \left[\begin{array}{cccc} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{array}\right]\left[\begin{array}{c} x_{1}\\ x_{2}\\ \vdots\\ x_{n} \end{array}\right]=\left[\begin{array}{c} b_{1}\\ b_{2}\\ \vdots\\ b_{n} \end{array}\right] or \underset{(n\times n)}{\mathbf{A}}\cdot\underset{(n\times1)}{\mathbf{x}}=\underset{(n\times1)}{\mathbf{b}}. The solution to the system of equations is given by: \mathbf{x=A}^{-1}\mathbf{b}, where \mathbf{A}^{-1}\mathbf{A}=\mathbf{I}_n and \mathbf{I}_{n} is the (n\times n) identity matrix. If the number of equations is greater than two, then we generally use numerical algorithms to find the elements in \mathbf{A}^{-1}.
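
For example, a three-equation system is solved with solve() in exactly the same way. The matrix bigA and vector bigB below are our own illustrative values; the system x+y+z=6, y+z=5, z=3 has solution x=1, y=2, z=3:

bigA = matrix(c(1,1,1,
                0,1,1,
                0,0,1), 3, 3, byrow=TRUE)
bigB = c(6, 5, 3)
solve(bigA, bigB)
## [1] 1 2 3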

3.4.1 Rank of a matrix

An n\times m matrix \mathbf{A} has m columns \mathbf{a}_{1},\mathbf{a}_{2},\ldots,\mathbf{a}_{m}, where each column is an n\times1 vector, and n rows, where each row is a 1\times m row vector: \underset{(n\times m)}{\mathbf{A}}=\left[\begin{array}{cccc} a_{11} & a_{12} & \ldots & a_{1m}\\ a_{21} & a_{22} & \ldots & a_{2m}\\ \vdots & \vdots & \ddots & \vdots\\ a_{n1} & a_{n2} & \ldots & a_{nm} \end{array}\right]=\left[\begin{array}{cccc} \mathbf{a}_{1} & \mathbf{a}_{2} & \ldots & \mathbf{a}_{m}\end{array}\right]. Two vectors \mathbf{a}_{1} and \mathbf{a}_{2} are linearly independent if c_{1}\mathbf{a}_{1}+c_{2}\mathbf{a}_{2}=0 implies that c_{1}=c_{2}=0. A set of vectors \mathbf{a}_{1},\mathbf{a}_{2},\ldots,\mathbf{a}_{m} is linearly independent if c_{1}\mathbf{a}_{1}+c_{2}\mathbf{a}_{2}+\cdots+c_{m}\mathbf{a}_{m}=0 implies that c_{1}=c_{2}=\cdots=c_{m}=0. That is, no vector \mathbf{a}_{i} can be expressed as a non-trivial linear combination of the other vectors.
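
As a quick numerical illustration with our own toy vectors, \mathbf{a}_{1}=(1,2)^{\prime} and \mathbf{a}_{2}=(2,4)^{\prime} are linearly dependent because \mathbf{a}_{2}=2\mathbf{a}_{1}, so the non-trivial combination with c_{1}=2 and c_{2}=-1 gives the zero vector:

a1 = c(1, 2)
a2 = c(2, 4)    # a2 = 2*a1, so {a1, a2} is linearly dependent
2*a1 - 1*a2     # non-trivial combination (c1=2, c2=-1) equals zero
## [1] 0 0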

The column rank of the n\times m matrix \mathbf{A}, denoted \mathrm{rank}(\mathbf{A}), is equal to the maximum number of linearly independent columns. The row rank of a matrix is equal to the maximum number of linearly independent rows, and is given by \mathrm{rank}(\mathbf{A}^{\prime}). It turns out that the column rank and row rank of a matrix are the same. Hence, \mathrm{rank}(\mathbf{A})=\mathrm{rank}(\mathbf{A}^{\prime})\leq\mathrm{min}(m,n). If \mathrm{rank}(\mathbf{A})=m then \mathbf{A} is called full column rank, and if \mathrm{rank}(\mathbf{A})=n then \mathbf{A} is called full row rank. If \mathbf{A} is not full rank then it is called reduced rank.

Example 3.6 (Determining the rank of a matrix in R)

Consider the 2\times3 matrix \mathbf{A}=\left(\begin{array}{ccc} 1 & 3 & 5\\ 2 & 4 & 6 \end{array}\right). Here, \mathrm{rank}(\mathbf{A})\leq\mathrm{min}(3,2)=2, the number of rows. Now, \mathrm{rank}(\mathbf{A})=2 since the rows of \mathbf{A} are linearly independent. In R, the rank of a matrix can be found using the rankMatrix() function from the Matrix package:

library(Matrix)
Amat = matrix(c(1,3,5,2,4,6), 2, 3, byrow=TRUE)
as.numeric(rankMatrix(Amat))
## [1] 2

\blacksquare

The rank of an n\times n square matrix \mathbf{A} is directly related to its invertibility. If \mathrm{rank}(\mathbf{A})=n then \mathbf{A}^{-1} exists. This result makes sense: since \mathbf{A}^{-1} is used to solve a system of n linear equations in n unknowns, the system has a unique solution as long as no equation in the system can be written as a linear combination of the other equations.
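
As an illustration (the matrix singularMat below is our own example), solve() throws an error when applied to a reduced rank matrix; here we capture the error with try() and check the rank with rankMatrix() from the Matrix package loaded above:

singularMat = matrix(c(1, 2,
                       2, 4), 2, 2, byrow=TRUE)  # row 2 = 2 * row 1
as.numeric(rankMatrix(singularMat))
## [1] 1
res = try(solve(singularMat), silent=TRUE)
class(res)
## [1] "try-error"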

3.4.2 Partitioned matrices and partitioned inverses

Consider a general n\times m matrix \mathbf{A}: \underset{(n\times m)}{\mathbf{A}}=\left[\begin{array}{cccc} a_{11} & a_{12} & \ldots & a_{1m}\\ a_{21} & a_{22} & \ldots & a_{2m}\\ \vdots & \vdots & \ddots & \vdots\\ a_{n1} & a_{n2} & \ldots & a_{nm} \end{array}\right]. In some situations we might want to partition \mathbf{A} into sub-matrices containing sub-blocks of the elements of \mathbf{A}: \mathbf{A}=\left[\begin{array}{cc} \mathbf{A}_{11} & \mathbf{A}_{12}\\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{array}\right], where the sub-matrices \mathbf{A}_{11}, \mathbf{A}_{12}, \mathbf{A}_{21}, and \mathbf{A}_{22} contain the appropriate sub-elements of \mathbf{A}. For example, consider \mathbf{A}=\left[\begin{array}{cc|cc} 1 & 2 & 3 & 4\\ 5 & 6 & 7 & 8\\ \hline 9 & 10 & 11 & 12\\ 13 & 14 & 15 & 16 \end{array}\right]=\left[\begin{array}{cc} \mathbf{A}_{11} & \mathbf{A}_{12}\\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{array}\right], where \begin{eqnarray*} \mathbf{A}_{11} & = & \left[\begin{array}{cc} 1 & 2\\ 5 & 6 \end{array}\right],\,\mathbf{A}_{12}=\left[\begin{array}{cc} 3 & 4\\ 7 & 8 \end{array}\right],\\ \mathbf{A}_{21} & = & \left[\begin{array}{cc} 9 & 10\\ 13 & 14 \end{array}\right],\,\mathbf{A}_{22}=\left[\begin{array}{cc} 11 & 12\\ 15 & 16 \end{array}\right]. \end{eqnarray*}

Example 3.7 (Combining sub-matrices in R)

In R sub-matrices can be combined column-wise using cbind(), and row-wise using rbind(). For example, to create the matrix \mathbf{A} from the sub-matrices use

A11mat = matrix(c(1,2,5,6), 2, 2, byrow=TRUE) 
A12mat = matrix(c(3,4,7,8), 2, 2, byrow=TRUE) 
A21mat = matrix(c(9,10,13,14), 2, 2, byrow=TRUE) 
A22mat = matrix(c(11,12,15,16), 2, 2, byrow=TRUE) 
Amat = rbind(cbind(A11mat, A12mat), cbind(A21mat, A22mat)) 
Amat
##      [,1] [,2] [,3] [,4]
## [1,]    1    2    3    4
## [2,]    5    6    7    8
## [3,]    9   10   11   12
## [4,]   13   14   15   16

\blacksquare

The basic matrix operations work on sub-matrices in the obvious way. To illustrate, consider another matrix \mathbf{B} that is conformable with \mathbf{A} and partitioned in the same way: \mathbf{B}=\left[\begin{array}{cc} \mathbf{B}_{11} & \mathbf{B}_{12}\\ \mathbf{B}_{21} & \mathbf{B}_{22} \end{array}\right]. Then \mathbf{A}+\mathbf{B}=\left[\begin{array}{cc} \mathbf{A}_{11} & \mathbf{A}_{12}\\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{array}\right]+\left[\begin{array}{cc} \mathbf{B}_{11} & \mathbf{B}_{12}\\ \mathbf{B}_{21} & \mathbf{B}_{22} \end{array}\right]=\left[\begin{array}{cc} \mathbf{A}_{11}+\mathbf{B}_{11} & \mathbf{A}_{12}+\mathbf{B}_{12}\\ \mathbf{A}_{21}+\mathbf{B}_{21} & \mathbf{A}_{22}+\mathbf{B}_{22} \end{array}\right]. If all of the sub-matrices of \mathbf{A} and \mathbf{B} are conformable then \mathbf{AB}=\left[\begin{array}{cc} \mathbf{A}_{11} & \mathbf{A}_{12}\\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{array}\right]\left[\begin{array}{cc} \mathbf{B}_{11} & \mathbf{B}_{12}\\ \mathbf{B}_{21} & \mathbf{B}_{22} \end{array}\right]=\left[\begin{array}{cc} \mathbf{A}_{11}\mathbf{B}_{11}+\mathbf{A}_{12}\mathbf{B}_{21} & \mathbf{A}_{11}\mathbf{B}_{12}+\mathbf{A}_{12}\mathbf{B}_{22}\\ \mathbf{A}_{21}\mathbf{B}_{11}+\mathbf{A}_{22}\mathbf{B}_{21} & \mathbf{A}_{21}\mathbf{B}_{12}+\mathbf{A}_{22}\mathbf{B}_{22} \end{array}\right].
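
We can verify the block multiplication formula in R by reusing the sub-matrices from Example 3.7 and setting \mathbf{B}=\mathbf{A} (our choice, purely for illustration):

# block-wise product of Amat with itself
AB.blocks = rbind(
  cbind(A11mat%*%A11mat + A12mat%*%A21mat, A11mat%*%A12mat + A12mat%*%A22mat),
  cbind(A21mat%*%A11mat + A22mat%*%A21mat, A21mat%*%A12mat + A22mat%*%A22mat))
all.equal(AB.blocks, Amat%*%Amat)
## [1] TRUE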

The transpose of a partitioned matrix satisfies \mathbf{A^{\prime}}=\left[\begin{array}{cc} \mathbf{A}_{11} & \mathbf{A}_{12}\\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{array}\right]^{\prime}=\left[\begin{array}{cc} \mathbf{A}_{11}^{\prime} & \mathbf{A}_{21}^{\prime}\\ \mathbf{A}_{12}^{\prime} & \mathbf{A}_{22}^{\prime} \end{array}\right]. Notice the interchange of the two off-diagonal blocks.
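
A quick R check of the partitioned transpose, again reusing the sub-matrices from Example 3.7:

# off-diagonal blocks swap positions and each block is transposed
tA.blocks = rbind(cbind(t(A11mat), t(A21mat)),
                  cbind(t(A12mat), t(A22mat)))
all.equal(tA.blocks, t(Amat))
## [1] TRUE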

A partitioned matrix \mathbf{A} with appropriate invertible sub-matrices \mathbf{A}_{11} and \mathbf{A}_{22} has a partitioned inverse of the form \mathbf{A}^{-1}=\left[\begin{array}{cc} \mathbf{A}_{11} & \mathbf{A}_{12}\\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{array}\right]^{-1}=\left[\begin{array}{cc} \mathbf{A}_{11}^{-1}+\mathbf{A}_{11}^{-1}\mathbf{A}_{12}\mathbf{C}^{-1}\mathbf{A}_{21}\mathbf{A}_{11}^{-1} & -\mathbf{A}_{11}^{-1}\mathbf{A}_{12}\mathbf{C}^{-1}\\ -\mathbf{C}^{-1}\mathbf{A}_{21}\mathbf{A}_{11}^{-1} & \mathbf{C}^{-1} \end{array}\right], where \mathbf{C}=\mathbf{A}_{22}-\mathbf{A}_{21}\mathbf{A}_{11}^{-1}\mathbf{A}_{12}. This formula can be verified by direct calculation.
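
The matrix \mathbf{A} from Example 3.7 has rank 2 and so is not invertible. To check the partitioned inverse formula in R we therefore use an invertible 4\times4 matrix of our own construction:

A11 = matrix(c(4,1,1,3), 2, 2, byrow=TRUE)
A12 = matrix(c(2,0,0,1), 2, 2, byrow=TRUE)
A21 = t(A12)
A22 = matrix(c(5,1,1,4), 2, 2, byrow=TRUE)
bigA = rbind(cbind(A11, A12), cbind(A21, A22))
A11.inv = solve(A11)
C = A22 - A21 %*% A11.inv %*% A12          # C = A22 - A21 A11^{-1} A12
C.inv = solve(C)
# assemble the four blocks of the partitioned inverse
upper.left  = A11.inv + A11.inv %*% A12 %*% C.inv %*% A21 %*% A11.inv
upper.right = -A11.inv %*% A12 %*% C.inv
lower.left  = -C.inv %*% A21 %*% A11.inv
A.inv.blocks = rbind(cbind(upper.left, upper.right), cbind(lower.left, C.inv))
all.equal(A.inv.blocks, solve(bigA))
## [1] TRUE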


  1. Solving for x gives x=2y. Substituting this value into the equation x+y=1 gives 2y+y=1 and solving for y gives y=1/3. Solving for x then gives x=2/3.↩︎

  2. Notice that the calculations in R do not show \mathbf{A}^{-1}\mathbf{A}=\mathbf{I} exactly. The (1,2) element of \mathbf{A}^{-1}\mathbf{A} is -5.551115e-17, which for all practical purposes is zero. However, due to the limitations of machine calculations the result is not exactly zero.↩︎