## 3.4 Systems of Linear Equations

Consider the system of two linear equations: \begin{align} x+y & =1,\tag{3.1}\\ 2x-y & =1.\tag{3.2} \end{align} As shown in Figure 3.1, equations (3.1) and (3.2) represent two straight lines which intersect at the point $$x=2/3$$ and $$y=1/3$$. This point of intersection is determined by solving for the values of $$x$$ and $$y$$ such that $$x+y=2x-y$$14

The two linear equations can be written in matrix form as: $\left[\begin{array}{cc} 1 & 1\\ 2 & -1 \end{array}\right]\left[\begin{array}{c} x\\ y \end{array}\right]=\left[\begin{array}{c} 1\\ 1 \end{array}\right],$ or, $\mathbf{A\cdot z=b,}$ where, $\mathbf{A}=\left[\begin{array}{cc} 1 & 1\\ 2 & -1 \end{array}\right],~\mathbf{z}=\left[\begin{array}{c} x\\ y \end{array}\right]\textrm{ and }\mathbf{b}=\left[\begin{array}{c} 1\\ 1 \end{array}\right].$

If there was a $$(2\times2)$$ matrix $$\mathbf{B},$$ with elements $$b_{ij}$$, such that $$\mathbf{B\cdot A=I}_{2},$$ where $$\mathbf{I}_{2}$$ is the $$(2\times2)$$ identity matrix, then we could solve for the elements in $$\mathbf{z}$$ as follows. In the equation $$\mathbf{A\cdot z=b}$$, pre-multiply both sides by $$\mathbf{B}$$ to give: \begin{align*} \mathbf{B\cdot A\cdot z} & =\mathbf{B\cdot b}\\ & \Longrightarrow\mathbf{I\cdot z=B\cdot b}\\ & \Longrightarrow\mathbf{z=B\cdot b}, \end{align*} or $\left[\begin{array}{c} x\\ y \end{array}\right]=\left[\begin{array}{cc} b_{11} & b_{12}\\ b_{21} & b_{22} \end{array}\right]\left[\begin{array}{c} 1\\ 1 \end{array}\right]=\left[\begin{array}{c} b_{11}\cdot1+b_{12}\cdot1\\ b_{21}\cdot1+b_{22}\cdot1 \end{array}\right].$ If such a matrix $$\mathbf{B}$$ exists it is called the inverse of $$\mathbf{A}$$ and is denoted $$\mathbf{A}^{-1}$$. Intuitively, the inverse matrix $$\mathbf{A}^{-1}$$ plays a similar role as the inverse of a number. Suppose $$a$$ is a number; e.g., $$a=2$$. Then we know that $$(1/a)\cdot a=a^{-1}a=1$$. Similarly, in matrix algebra $$\mathbf{A}^{-1}\mathbf{A=I}_{2}$$ where $$\mathbf{I}_{2}$$ is the identity matrix. Now, consider solving the equation $$a\cdot x=b$$. By simple division we have that $$x=(1/a)\cdot b=a^{-1}\cdot b$$. Similarly, in matrix algebra, if we want to solve the system of linear equations $$\mathbf{Ax=b}$$ we pre-multiply by $$\mathbf{A}^{-1}$$ and get the solution $$\mathbf{x=A}^{-1}\mathbf{b}$$.

Using $$\mathbf{B=A}^{-1}$$, we may express the solution for $$\mathbf{z}$$ as: $\mathbf{z=A}^{-1}\mathbf{b}.$ As long as we can determine the elements in $$\mathbf{A}^{-1}$$ then we can solve for the values of $$x$$ and $$y$$ in the vector $$\mathbf{z}$$. The system of linear equations has a solution as long as the two lines intersect, so we can determine the elements in $$\mathbf{A}^{-1}$$ provided the two lines are not parallel. If the two lines are parallel, then one of the equations is a multiple of the other. In this case, we say that $$\mathbf{A}$$ is not invertible.

There are general numerical algorithms for finding the elements of $$\mathbf{A}^{-1}$$ (e.g., Gaussian elimination) and R has these algorithms available. However, if $$\mathbf{A}$$ is a $$(2\times2)$$ matrix then there is a simple formula for $$\mathbf{A}^{-1}$$. Let $$\mathbf{A}$$ be a $$(2\times2)$$ matrix such that: $\mathbf{A}=\left[\begin{array}{cc} a_{11} & a_{12}\\ a_{21} & a_{22} \end{array}\right].$ Then, $\mathbf{A}^{-1}=\frac{1}{\det(\mathbf{A})}\left[\begin{array}{cc} a_{22} & -a_{12}\\ -a_{21} & a_{11} \end{array}\right],$ where $$\det(\mathbf{A})=a_{11}a_{22}-a_{21}a_{12}$$ denotes the determinant of $$\mathbf{A}$$ and is assumed to be not equal to zero. By brute force matrix multiplication we can verify this formula: \begin{align*} \mathbf{A}^{-1}\mathbf{A} & =\frac{1}{a_{11}a_{22}-a_{21}a_{12}}\left[\begin{array}{cc} a_{22} & -a_{12}\\ -a_{21} & a_{11} \end{array}\right]\left[\begin{array}{cc} a_{11} & a_{12}\\ a_{21} & a_{22} \end{array}\right]\\ & =\frac{1}{a_{11}a_{22}-a_{21}a_{12}}\left[\begin{array}{cc} a_{22}a_{11}-a_{12}a_{21} & a_{22}a_{12}-a_{12}a_{22}\\ -a_{21}a_{11}+a_{11}a_{21} & -a_{21}a_{12}+a_{11}a_{22} \end{array}\right]\\ & =\frac{1}{a_{11}a_{22}-a_{21}a_{12}}\left[\begin{array}{cc} a_{22}a_{11}-a_{12}a_{21} & 0\\ 0 & -a_{21}a_{12}+a_{11}a_{22} \end{array}\right]\\ & =\left[\begin{array}{cc} \dfrac{a_{22}a_{11}-a_{12}a_{21}}{a_{11}a_{22}-a_{21}a_{12}} & 0\\ 0 & \dfrac{-a_{21}a_{12}+a_{11}a_{22}}{a_{11}a_{22}-a_{21}a_{12}} \end{array}\right]\\ & =\left[\begin{array}{cc} 1 & 0\\ 0 & 1 \end{array}\right]. \end{align*} Let’s apply the above rule to find the inverse of $$\mathbf{A}$$ in our example linear system (3.1)-(3.2): $\mathbf{A}^{-1}=\frac{1}{-1-2}\left[\begin{array}{cc} -1 & -1\\ -2 & 1 \end{array}\right]=\left[\begin{array}{cc} \frac{1}{3} & \frac{1}{3}\\ \frac{2}{3} & \frac{-1}{3} \end{array}\right].$ Notice that, $\mathbf{A}^{-1}\mathbf{A}=\left[\begin{array}{cc} \frac{1}{3} & \frac{1}{3}\\ \frac{2}{3} & \frac{-1}{3} \end{array}\right]\left[\begin{array}{cc} 1 & 1\\ 2 & -1 \end{array}\right]=\left[\begin{array}{cc} 1 & 0\\ 0 & 1 \end{array}\right].$ Our solution for $$\mathbf{z}$$ is then, \begin{align*} \mathbf{z} & =\mathbf{A}^{-1}\mathbf{b}\\ & =\left[\begin{array}{cc} \frac{1}{3} & \frac{1}{3}\\ \frac{2}{3} & \frac{-1}{3} \end{array}\right]\left[\begin{array}{c} 1\\ 1 \end{array}\right]\\ & =\left[\begin{array}{c} \frac{2}{3}\\ \frac{1}{3} \end{array}\right]=\left[\begin{array}{c} x\\ y \end{array}\right], \end{align*} so that $$x=2/3$$ and $$y=1/3$$.

Example 2.20 (Solving systems of linear equations in R)

In R, the solve() function is used to compute the inverse of a matrix and solve a system of linear equations. The linear system $$x+y=1$$ and $$2x-y=1$$ can be represented using:

matA = matrix(c(1,1,2,-1), 2, 2, byrow=TRUE)
vecB = c(1,1)

First we solve for $$\mathbf{A}^{-1}$$:15

matA.inv = solve(matA)
matA.inv
##           [,1]       [,2]
## [1,] 0.3333333  0.3333333
## [2,] 0.6666667 -0.3333333
matA.inv%*%matA
##      [,1]          [,2]
## [1,]    1 -5.551115e-17
## [2,]    0  1.000000e+00
matA%*%matA.inv
##      [,1]         [,2]
## [1,]    1 5.551115e-17
## [2,]    0 1.000000e+00

Then we solve the system $$\mathbf{z}=\mathbf{A}^{-1}\mathbf{b}$$:

z = matA.inv%*%vecB
z
##           [,1]
## [1,] 0.6666667
## [2,] 0.3333333

$$\blacksquare$$

In general, if we have $$n$$ linear equations in $$n$$ unknown variables we may write the system of equations as \begin{align*} a_{11}x_{1}+a_{12}x_{2}+\cdots+a_{1n}x_{n} & =b_{1}\\ a_{21}x_{1}+a_{22}x_{2}+\cdots+a_{2n}x_{n} & =b_{2}\\ \vdots & =\vdots\\ a_{n1}x_{1}+a_{n2}x_{2}+\cdots+a_{nn}x_{n} & =b_{n} \end{align*} which we may then express in matrix form as $\left[\begin{array}{cccc} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{array}\right]\left[\begin{array}{c} x_{1}\\ x_{2}\\ \vdots\\ x_{n} \end{array}\right]=\left[\begin{array}{c} b_{1}\\ b_{2}\\ \vdots\\ b_{n} \end{array}\right]$ or $\underset{(n\times n)}{\mathbf{A}}\cdot\underset{(n\times1)}{\mathbf{x}}=\underset{(n\times1)}{\mathbf{b}}.$ The solution to the system of equations is given by: $\mathbf{x=A}^{-1}\mathbf{b},$ where $$\mathbf{A}^{-1}\mathbf{A}=\mathbf{I}_n$$ and $$\mathbf{I}_{n}$$ is the $$(n\times n)$$ identity matrix. If the number of equations is greater than two, then we generally use numerical algorithms to find the elements in $$\mathbf{A}^{-1}$$.

### 3.4.1 Rank of a matrix

A $$n\times m$$ matrix $$\mathbf{A}$$ has $$m$$ columns $$\mathbf{a}_{1},\mathbf{a}_{2},\ldots,\mathbf{a}_{m}$$ where each column is an $$n\times1$$ vector and $$n$$ rows where each row is an $$1\times m$$ row vector: $\underset{(n\times m)}{\mathbf{A}}=\left[\begin{array}{cccc} a_{11} & a_{12} & \ldots & a_{1m}\\ a_{21} & a_{22} & \ldots & a_{2m}\\ \vdots & \vdots & \ddots & \vdots\\ a_{n1} & a_{n2} & \ldots & a_{nm} \end{array}\right]=\left[\begin{array}{cccc} \mathbf{a}_{1} & \mathbf{a}_{2} & \ldots & \mathbf{a}_{m}\end{array}\right].$ Two vectors $$\mathbf{a}_{1}$$ and $$\mathbf{a}_{2}$$ are linearly independent if $$c_{1}\mathbf{a}_{1}+c_{2}\mathbf{a}_{2}=0$$ implies that $$c_{1}=c_{2}=0$$. A set of vectors $$\mathbf{a}_{1},\mathbf{a}_{2},\ldots,\mathbf{a}_{m}$$ are linearly independent if $$c_{1}\mathbf{a}_{1}+c_{2}\mathbf{a}_{2}+\cdots+c_{m}\mathbf{a}_{m}=0$$ implies that $$c_{1}=c_{2}=\cdots=c_{m}=0$$. That is, no vector $$\mathbf{a}_{i}$$ can be expressed as a non-trivial linear combination of the other vectors.

The column rank of the $$n\times m$$ matrix $$\mathbf{A}$$, denoted $$\mathrm{rank}(\mathbf{A})$$, is equal to the maximum number of linearly independent columns. The row rank of a matrix is equal to the maximum number of linearly independent rows, and is given by $$\mathrm{rank}(\mathbf{A}^{\prime})$$. It turns out that the column rank and row rank of a matrix are the same. Hence, $$\mathrm{rank}(\mathbf{A})$$$$=\mathrm{rank}(\mathbf{A}^{\prime})\leq\mathrm{min}(m,n)$$. If $$\mathrm{rank}(\mathbf{A})=m$$ then $$\mathbf{A}$$ is called full column rank, and if $$\mathrm{rank}(\mathbf{A})=n$$ then $$\mathbf{A}$$ is called full row rank. If $$\mathbf{A}$$ is not full rank then it is called reduced rank.

Example 3.6 (Determining the rank of a matrix in R)

Consider the $$2\times3$$ matrix $\mathbf{A}=\left(\begin{array}{ccc} 1 & 3 & 5\\ 2 & 4 & 6 \end{array}\right)$ Here, $$\mathrm{rank}(\mathbf{A})\leq min(3,2)=2$$, the number of rows. Now, $$\mathrm{rank}(\mathbf{A})=2$$ since the rows of $$\mathbf{A}$$ are linearly independent. In R, the rank of a matrix can be found using the Matrix function rankMatrix():

library(Matrix)
Amat = matrix(c(1,3,5,2,4,6), 2, 3, byrow=TRUE)
as.numeric(rankMatrix(Amat))
## [1] 2

$$\blacksquare$$

The rank of an $$n\times n$$ square matrix $$\mathbf{A}$$ is directly related to its invertibility. If $$\mathrm{rank}(\mathbf{A})=n$$ then $$\mathbf{A}^{-1}$$ exists. This result makes sense. Since $$\mathbf{A}^{-1}$$ is used to solve a system of $$n$$ linear equations in $$n$$ unknowns, it will have a unique solution as long as no equation in the system can be written as a linear combination of the other equations.

### 3.4.2 Partitioned matrices and partitioned inverses

Consider a general $$n\times m$$ matrix $$\mathbf{A}$$: $\underset{(n\times m)}{\mathbf{A}}=\left[\begin{array}{cccc} a_{11} & a_{12} & \ldots & a_{1m}\\ a_{21} & a_{22} & \ldots & a_{2m}\\ \vdots & \vdots & \ddots & \vdots\\ a_{n1} & a_{n2} & \ldots & a_{nm} \end{array}\right].$ In some situations we might want to partition $$\mathbf{A}$$ into sub-matrices containing sub-blocks of the elements of $$\mathbf{A}$$: $\mathbf{A}=\left[\begin{array}{cc} \mathbf{A}_{11} & \mathbf{A}_{12}\\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{array}\right],$ where the sub-matrices $$\mathbf{A}_{11}$$, $$\mathbf{A}_{12}$$, $$\mathbf{A}_{21}$$, and $$\mathbf{A}_{22}$$ contain the appropriate sub-elements of $$\mathbf{A}$$. For example, consider $\mathbf{A}=\left[\begin{array}{cc|cc} 1 & 2 & 3 & 4\\ 5 & 6 & 7 & 8\\ \hline 9 & 10 & 11 & 12\\ 13 & 14 & 15 & 16 \end{array}\right]=\left[\begin{array}{cc} \mathbf{A}_{11} & \mathbf{A}_{12}\\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{array}\right],$ where $\begin{eqnarray*} \mathbf{A}_{11} & = & \left[\begin{array}{cc} 1 & 2\\ 5 & 6 \end{array}\right],\,\mathbf{A}_{12}=\left[\begin{array}{cc} 3 & 4\\ 7 & 8 \end{array}\right],\\ \mathbf{A}_{21} & = & \left[\begin{array}{cc} 9 & 10\\ 13 & 14 \end{array}\right],\,\mathbf{A}_{22}=\left[\begin{array}{cc} 11 & 12\\ 15 & 16 \end{array}\right]. \end{eqnarray*}$

Example 3.7 (Combining sub-matrices in R)

In R sub-matrices can be combined column-wise using cbind(), and row-wise using rbind(). For example, to create the matrix $$\mathbf{A}$$ from the sub-matrices use

A11mat = matrix(c(1,2,5,6), 2, 2, byrow=TRUE)
A12mat = matrix(c(3,4,7,8), 2, 2, byrow=TRUE)
A21mat = matrix(c(9,10,13,14), 2, 2, byrow=TRUE)
A22mat = matrix(c(11,12,15,16), 2, 2, byrow=TRUE)
Amat = rbind(cbind(A11mat, A12mat), cbind(A21mat, A22mat))
Amat
##      [,1] [,2] [,3] [,4]
## [1,]    1    2    3    4
## [2,]    5    6    7    8
## [3,]    9   10   11   12
## [4,]   13   14   15   16

$$\blacksquare$$

The basic matrix operations work on sub-matrices in the obvious way. To illustrate, consider another matrix $$\mathbf{B}$$ that is conformable with $$\mathbf{A}$$ and partitioned in the same way: $\mathbf{B}=\left[\begin{array}{cc} \mathbf{B}_{11} & \mathbf{B}_{12}\\ \mathbf{B}_{21} & \mathbf{B}_{22} \end{array}\right].$ Then $\mathbf{A}+\mathbf{B}=\left[\begin{array}{cc} \mathbf{A}_{11} & \mathbf{A}_{12}\\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{array}\right]+\left[\begin{array}{cc} \mathbf{B}_{11} & \mathbf{B}_{12}\\ \mathbf{B}_{21} & \mathbf{B}_{22} \end{array}\right]=\left[\begin{array}{cc} \mathbf{A}_{11}+\mathbf{B}_{11} & \mathbf{A}_{12}+\mathbf{B}_{12}\\ \mathbf{A}_{21}+\mathbf{B}_{21} & \mathbf{A}_{22}+\mathbf{B}_{22} \end{array}\right].$ If all of the sub-matrices of $$\mathbf{A}$$ and $$\mathbf{B}$$ are comformable then $\mathbf{AB}=\left[\begin{array}{cc} \mathbf{A}_{11} & \mathbf{A}_{12}\\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{array}\right]\left[\begin{array}{cc} \mathbf{B}_{11} & \mathbf{B}_{12}\\ \mathbf{B}_{21} & \mathbf{B}_{22} \end{array}\right]=\left[\begin{array}{cc} \mathbf{A}_{11}\mathbf{B}_{11}+\mathbf{A}_{12}\mathbf{B}_{21} & \mathbf{A}_{11}\mathbf{B}_{12}+\mathbf{A}_{12}\mathbf{B}_{22}\\ \mathbf{A}_{21}\mathbf{B}_{11}+\mathbf{A}_{22}\mathbf{B}_{21} & \mathbf{A}_{21}\mathbf{B}_{12}+\mathbf{A}_{22}\mathbf{B}_{22} \end{array}\right].$

The transpose of a partitioned matrix satisfies $\mathbf{A^{\prime}}=\left[\begin{array}{cc} \mathbf{A}_{11} & \mathbf{A}_{12}\\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{array}\right]^{\prime}=\left[\begin{array}{cc} \mathbf{A}_{11}^{\prime} & \mathbf{A}_{21}^{\prime}\\ \mathbf{A}_{12}^{\prime} & \mathbf{A}_{22}^{\prime} \end{array}\right].$ Notice the interchange of the two off-diagonal blocks.

A partitioned matrix $$\mathbf{A}$$ with appropriate invertible sub-matrices $$\mathbf{A}_{11}$$ and $$\mathbf{A}_{22}$$ has a partitioned inverse of the form $\mathbf{A}^{-1}=\left[\begin{array}{cc} \mathbf{A}_{11} & \mathbf{A}_{12}\\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{array}\right]^{-1}=\left[\begin{array}{cc} \mathbf{A}_{11}^{-1}+\mathbf{A}_{11}^{-1}\mathbf{A}_{12}\mathbf{C}^{-1}\mathbf{A}_{21}\mathbf{A}_{11}^{-1} & -\mathbf{A}_{11}\mathbf{A}_{12}\mathbf{C}^{-1}\\ -\mathbf{C}^{-1}\mathbf{A}_{21}\mathbf{A}_{11}^{-1} & \mathbf{C}^{-1} \end{array}\right],$ where $$\mathbf{C}=\mathbf{A}_{22}-\mathbf{A}_{21}\mathbf{A}_{11}^{-1}\mathbf{A}_{12}$$. This formula can be verified by direct calculation.

1. Soving for $$x$$ gives $$x=2y$$. Substituting this value into the equation $$x+y=1$$ gives $$2y+y=1$$ and solving for $$y$$ gives $$y=1/3$$. Solving for $$x$$ then gives $$x=2/3$$.↩︎

2. Notice that the calculations in R do not show $$\mathbf{A}^{-1}\mathbf{A}=\mathbf{I}$$ exactly. The (1,2) element of $$\mathbf{A}^{-1}\mathbf{A}$$ is -5.552e-17, which for all practical purposes is zero. However, due to the limitations of machine calculations the result is not exactly zero.↩︎