3.4 Systems of Linear Equations
Consider the system of two linear equations
\begin{align}
x+y & =1,\tag{3.1}\\
2x-y & =1.\tag{3.2}
\end{align}
As shown in Figure 3.1, equations (3.1) and (3.2) represent two straight lines which intersect at the point x=2/3 and y=1/3. This point of intersection is determined by solving for the values of x and y such that x+y=2x-y (see footnote 14).

Figure 3.1: System of two linear equations.
The two linear equations can be written in matrix form as
\left[\begin{array}{cc}
1 & 1\\
2 & -1
\end{array}\right]\left[\begin{array}{c}
x\\
y
\end{array}\right]=\left[\begin{array}{c}
1\\
1
\end{array}\right],
or
\mathbf{A}\cdot\mathbf{z}=\mathbf{b},
where
\mathbf{A}=\left[\begin{array}{cc}
1 & 1\\
2 & -1
\end{array}\right],\,\mathbf{z}=\left[\begin{array}{c}
x\\
y
\end{array}\right]\textrm{ and }\mathbf{b}=\left[\begin{array}{c}
1\\
1
\end{array}\right].
If there were a (2\times2) matrix \mathbf{B}, with elements b_{ij}, such that \mathbf{B}\cdot\mathbf{A}=\mathbf{I}_{2}, where \mathbf{I}_{2} is the (2\times2) identity matrix, then we could solve for the elements in \mathbf{z} as follows. In the equation \mathbf{A}\cdot\mathbf{z}=\mathbf{b}, pre-multiply both sides by \mathbf{B} to give:
\mathbf{B}\cdot\mathbf{A}\cdot\mathbf{z}=\mathbf{B}\cdot\mathbf{b}\Longrightarrow\mathbf{I}_{2}\cdot\mathbf{z}=\mathbf{B}\cdot\mathbf{b}\Longrightarrow\mathbf{z}=\mathbf{B}\cdot\mathbf{b},
or
\left[\begin{array}{c}
x\\
y
\end{array}\right]=\left[\begin{array}{cc}
b_{11} & b_{12}\\
b_{21} & b_{22}
\end{array}\right]\left[\begin{array}{c}
1\\
1
\end{array}\right]=\left[\begin{array}{c}
b_{11}\cdot1+b_{12}\cdot1\\
b_{21}\cdot1+b_{22}\cdot1
\end{array}\right].
If such a matrix \mathbf{B} exists it is called the inverse of \mathbf{A} and is denoted \mathbf{A}^{-1}. Intuitively, the inverse matrix \mathbf{A}^{-1} plays a similar role to the inverse of a number. Suppose a is a number; e.g., a=2. Then we know that (1/a)\cdot a=a^{-1}a=1. Similarly, in matrix algebra \mathbf{A}^{-1}\mathbf{A}=\mathbf{I}_{2} where \mathbf{I}_{2} is the identity matrix. Now, consider solving the equation a\cdot x=b. By simple division we have that x=(1/a)\cdot b=a^{-1}\cdot b. Similarly, in matrix algebra, if we want to solve the system of linear equations \mathbf{A}\mathbf{x}=\mathbf{b} we pre-multiply by \mathbf{A}^{-1} and get the solution \mathbf{x}=\mathbf{A}^{-1}\mathbf{b}.
Using \mathbf{B}=\mathbf{A}^{-1}, we may express the solution for \mathbf{z} as: \mathbf{z}=\mathbf{A}^{-1}\mathbf{b}. As long as we can determine the elements in \mathbf{A}^{-1}, we can solve for the values of x and y in the vector \mathbf{z}. The system of linear equations has a unique solution as long as the two lines intersect, so we can determine the elements in \mathbf{A}^{-1} provided the two lines are not parallel. If the two lines are parallel, then one of the equations is a multiple of the other, and we say that \mathbf{A} is not invertible.
There are general numerical algorithms for finding the elements of \mathbf{A}^{-1} (e.g., Gaussian elimination), and R has these algorithms available. However, if \mathbf{A} is a (2\times2) matrix then there is a simple formula for \mathbf{A}^{-1}. Let \mathbf{A} be a (2\times2) matrix such that:
\mathbf{A}=\left[\begin{array}{cc}
a_{11} & a_{12}\\
a_{21} & a_{22}
\end{array}\right].
Then,
\mathbf{A}^{-1}=\frac{1}{\det(\mathbf{A})}\left[\begin{array}{cc}
a_{22} & -a_{12}\\
-a_{21} & a_{11}
\end{array}\right],
where \det(\mathbf{A})=a_{11}a_{22}-a_{21}a_{12} denotes the determinant of \mathbf{A} and is assumed to be not equal to zero. By brute force matrix multiplication we can verify this formula:
\begin{align*}
\mathbf{A}^{-1}\mathbf{A} & =\frac{1}{a_{11}a_{22}-a_{21}a_{12}}\left[\begin{array}{cc}
a_{22} & -a_{12}\\
-a_{21} & a_{11}
\end{array}\right]\left[\begin{array}{cc}
a_{11} & a_{12}\\
a_{21} & a_{22}
\end{array}\right]\\
 & =\frac{1}{a_{11}a_{22}-a_{21}a_{12}}\left[\begin{array}{cc}
a_{22}a_{11}-a_{12}a_{21} & a_{22}a_{12}-a_{12}a_{22}\\
-a_{21}a_{11}+a_{11}a_{21} & -a_{21}a_{12}+a_{11}a_{22}
\end{array}\right]\\
 & =\frac{1}{a_{11}a_{22}-a_{21}a_{12}}\left[\begin{array}{cc}
a_{22}a_{11}-a_{12}a_{21} & 0\\
0 & -a_{21}a_{12}+a_{11}a_{22}
\end{array}\right]\\
 & =\left[\begin{array}{cc}
\dfrac{a_{22}a_{11}-a_{12}a_{21}}{a_{11}a_{22}-a_{21}a_{12}} & 0\\
0 & \dfrac{-a_{21}a_{12}+a_{11}a_{22}}{a_{11}a_{22}-a_{21}a_{12}}
\end{array}\right]\\
 & =\left[\begin{array}{cc}
1 & 0\\
0 & 1
\end{array}\right].
\end{align*}
Let’s apply the above rule to find the inverse of \mathbf{A} in our example linear system (3.1)-(3.2):
\mathbf{A}^{-1}=\frac{1}{-1-2}\left[\begin{array}{cc}
-1 & -1\\
-2 & 1
\end{array}\right]=\left[\begin{array}{cc}
\frac{1}{3} & \frac{1}{3}\\
\frac{2}{3} & \frac{-1}{3}
\end{array}\right].
Notice that,
\mathbf{A}^{-1}\mathbf{A}=\left[\begin{array}{cc}
\frac{1}{3} & \frac{1}{3}\\
\frac{2}{3} & \frac{-1}{3}
\end{array}\right]\left[\begin{array}{cc}
1 & 1\\
2 & -1
\end{array}\right]=\left[\begin{array}{cc}
1 & 0\\
0 & 1
\end{array}\right].
Our solution for \mathbf{z} is then,
\begin{align*}
\mathbf{z} & =\mathbf{A}^{-1}\mathbf{b}\\
 & =\left[\begin{array}{cc}
\frac{1}{3} & \frac{1}{3}\\
\frac{2}{3} & \frac{-1}{3}
\end{array}\right]\left[\begin{array}{c}
1\\
1
\end{array}\right]\\
 & =\left[\begin{array}{c}
\frac{2}{3}\\
\frac{1}{3}
\end{array}\right]=\left[\begin{array}{c}
x\\
y
\end{array}\right],
\end{align*}
so that x=2/3 and y=1/3.
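The (2\times2) formula is easy to code directly. Below is a minimal sketch, where inv2x2() is a helper name introduced here for illustration; applied to the \mathbf{A} from our example system, it reproduces the inverse computed above by hand.
# Sketch: 2 x 2 inverse via the determinant formula (inv2x2 is a hypothetical helper)
inv2x2 = function(A) {
  detA = A[1,1]*A[2,2] - A[2,1]*A[1,2]
  if (detA == 0) stop("matrix is not invertible")
  (1/detA)*matrix(c(A[2,2], -A[1,2], -A[2,1], A[1,1]), 2, 2, byrow=TRUE)
}
inv2x2(matrix(c(1, 1, 2, -1), 2, 2, byrow=TRUE))
## [,1] [,2]
## [1,] 0.3333333 0.3333333
## [2,] 0.6666667 -0.3333333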
In R, the solve() function is used to compute the inverse of a matrix and solve a system of linear equations. The linear system x+y=1 and 2x-y=1 can be represented using:
matA = matrix(c(1,1,2,-1), 2, 2, byrow=TRUE)
vecB = c(1,1)
First we solve for \mathbf{A}^{-1} (see footnote 15):
matA.inv = solve(matA)
matA.inv
## [,1] [,2]
## [1,] 0.3333333 0.3333333
## [2,] 0.6666667 -0.3333333
matA.inv%*%matA
## [,1] [,2]
## [1,] 1 -5.551115e-17
## [2,] 0 1.000000e+00
matA%*%matA.inv
## [,1] [,2]
## [1,] 1 5.551115e-17
## [2,] 0 1.000000e+00
Then we solve the system \mathbf{z}=\mathbf{A}^{-1}\mathbf{b}:
z = matA.inv%*%vecB
z
## [,1]
## [1,] 0.6666667
## [2,] 0.3333333
\blacksquare
In general, if we have n linear equations in n unknown variables we may write the system of equations as \begin{align*} a_{11}x_{1}+a_{12}x_{2}+\cdots+a_{1n}x_{n} & =b_{1}\\ a_{21}x_{1}+a_{22}x_{2}+\cdots+a_{2n}x_{n} & =b_{2}\\ \vdots & =\vdots\\ a_{n1}x_{1}+a_{n2}x_{2}+\cdots+a_{nn}x_{n} & =b_{n} \end{align*} which we may then express in matrix form as \left[\begin{array}{cccc} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{array}\right]\left[\begin{array}{c} x_{1}\\ x_{2}\\ \vdots\\ x_{n} \end{array}\right]=\left[\begin{array}{c} b_{1}\\ b_{2}\\ \vdots\\ b_{n} \end{array}\right] or \underset{(n\times n)}{\mathbf{A}}\cdot\underset{(n\times1)}{\mathbf{x}}=\underset{(n\times1)}{\mathbf{b}}. The solution to the system of equations is given by: \mathbf{x}=\mathbf{A}^{-1}\mathbf{b}, where \mathbf{A}^{-1}\mathbf{A}=\mathbf{I}_{n} and \mathbf{I}_{n} is the (n\times n) identity matrix. If the number of equations is greater than two, then we generally use numerical algorithms to find the elements in \mathbf{A}^{-1}.
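In R this is again handled by solve(): passing \mathbf{b} as a second argument, solve(A, b), solves the system directly without explicitly forming \mathbf{A}^{-1}. The sketch below uses a hypothetical 3\times3 system; the names bigA and bigB are illustrative.
# Hypothetical system: x1 + 2*x2 + x3 = 4, 2*x1 - x2 + 3*x3 = 9, 3*x1 + x2 - 2*x3 = -1
bigA = matrix(c(1, 2, 1,
                2, -1, 3,
                3, 1, -2), 3, 3, byrow=TRUE)
bigB = c(4, 9, -1)
solve(bigA, bigB)    # generally more stable than solve(bigA)%*%bigB
## [1] 1.1333333 0.2666667 2.3333333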
3.4.1 Rank of a matrix
An n\times m matrix \mathbf{A} has m columns \mathbf{a}_{1},\mathbf{a}_{2},\ldots,\mathbf{a}_{m}, where each column is an n\times1 vector, and n rows, where each row is a 1\times m row vector: \underset{(n\times m)}{\mathbf{A}}=\left[\begin{array}{cccc} a_{11} & a_{12} & \ldots & a_{1m}\\ a_{21} & a_{22} & \ldots & a_{2m}\\ \vdots & \vdots & \ddots & \vdots\\ a_{n1} & a_{n2} & \ldots & a_{nm} \end{array}\right]=\left[\begin{array}{cccc} \mathbf{a}_{1} & \mathbf{a}_{2} & \ldots & \mathbf{a}_{m}\end{array}\right]. Two vectors \mathbf{a}_{1} and \mathbf{a}_{2} are linearly independent if c_{1}\mathbf{a}_{1}+c_{2}\mathbf{a}_{2}=0 implies that c_{1}=c_{2}=0. A set of vectors \mathbf{a}_{1},\mathbf{a}_{2},\ldots,\mathbf{a}_{m} is linearly independent if c_{1}\mathbf{a}_{1}+c_{2}\mathbf{a}_{2}+\cdots+c_{m}\mathbf{a}_{m}=0 implies that c_{1}=c_{2}=\cdots=c_{m}=0. That is, no vector \mathbf{a}_{i} can be expressed as a non-trivial linear combination of the other vectors.
The column rank of the n\times m matrix \mathbf{A}, denoted \mathrm{rank}(\mathbf{A}), is equal to the maximum number of linearly independent columns. The row rank of a matrix is equal to the maximum number of linearly independent rows, and is given by \mathrm{rank}(\mathbf{A}^{\prime}). It turns out that the column rank and row rank of a matrix are the same. Hence, \mathrm{rank}(\mathbf{A})=\mathrm{rank}(\mathbf{A}^{\prime})\leq\mathrm{min}(m,n). If \mathrm{rank}(\mathbf{A})=m then \mathbf{A} is called full column rank, and if \mathrm{rank}(\mathbf{A})=n then \mathbf{A} is called full row rank. If \mathbf{A} is not full rank then it is called reduced rank.
Consider the 2\times3 matrix
\mathbf{A}=\left[\begin{array}{ccc}
1 & 3 & 5\\
2 & 4 & 6
\end{array}\right].
Here, \mathrm{rank}(\mathbf{A})\leq\mathrm{min}(3,2)=2, the number of rows. Now, \mathrm{rank}(\mathbf{A})=2 since the rows of \mathbf{A} are linearly independent. In R, the rank of a matrix can be found using the rankMatrix() function from the Matrix package:
library(Matrix)
Amat = matrix(c(1,3,5,2,4,6), 2, 3, byrow=TRUE)
as.numeric(rankMatrix(Amat))
## [1] 2
\blacksquare
The rank of an n\times n square matrix \mathbf{A} is directly related to its invertibility. If \mathrm{rank}(\mathbf{A})=n then \mathbf{A}^{-1} exists; if \mathrm{rank}(\mathbf{A})<n then \mathbf{A} is singular and \mathbf{A}^{-1} does not exist. This result makes sense. Since \mathbf{A}^{-1} is used to solve a system of n linear equations in n unknowns, the system will have a unique solution as long as no equation in the system can be written as a linear combination of the other equations.
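To illustrate, here is a sketch with a hypothetical reduced-rank matrix Smat whose second row is twice its first row (two parallel lines), so solve() must fail:
# Hypothetical reduced-rank matrix: row 2 is twice row 1
Smat = matrix(c(1, 1, 2, 2), 2, 2, byrow=TRUE)
as.numeric(rankMatrix(Smat))
## [1] 1
# solve(Smat) signals an error; tryCatch() captures its message instead of stopping
tryCatch(solve(Smat), error = function(e) conditionMessage(e))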
3.4.2 Partitioned matrices and partitioned inverses
Consider a general n\times m matrix \mathbf{A}: \underset{(n\times m)}{\mathbf{A}}=\left[\begin{array}{cccc} a_{11} & a_{12} & \ldots & a_{1m}\\ a_{21} & a_{22} & \ldots & a_{2m}\\ \vdots & \vdots & \ddots & \vdots\\ a_{n1} & a_{n2} & \ldots & a_{nm} \end{array}\right]. In some situations we might want to partition \mathbf{A} into sub-matrices containing sub-blocks of the elements of \mathbf{A}: \mathbf{A}=\left[\begin{array}{cc} \mathbf{A}_{11} & \mathbf{A}_{12}\\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{array}\right], where the sub-matrices \mathbf{A}_{11}, \mathbf{A}_{12}, \mathbf{A}_{21}, and \mathbf{A}_{22} contain the appropriate sub-elements of \mathbf{A}. For example, consider \mathbf{A}=\left[\begin{array}{cc|cc} 1 & 2 & 3 & 4\\ 5 & 6 & 7 & 8\\ \hline 9 & 10 & 11 & 12\\ 13 & 14 & 15 & 16 \end{array}\right]=\left[\begin{array}{cc} \mathbf{A}_{11} & \mathbf{A}_{12}\\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{array}\right], where \begin{eqnarray*} \mathbf{A}_{11} & = & \left[\begin{array}{cc} 1 & 2\\ 5 & 6 \end{array}\right],\,\mathbf{A}_{12}=\left[\begin{array}{cc} 3 & 4\\ 7 & 8 \end{array}\right],\\ \mathbf{A}_{21} & = & \left[\begin{array}{cc} 9 & 10\\ 13 & 14 \end{array}\right],\,\mathbf{A}_{22}=\left[\begin{array}{cc} 11 & 12\\ 15 & 16 \end{array}\right]. \end{eqnarray*}
In R, sub-matrices can be combined column-wise using cbind(), and row-wise using rbind(). For example, to create the matrix \mathbf{A} from the sub-matrices use:
A11mat = matrix(c(1,2,5,6), 2, 2, byrow=TRUE)
A12mat = matrix(c(3,4,7,8), 2, 2, byrow=TRUE)
A21mat = matrix(c(9,10,13,14), 2, 2, byrow=TRUE)
A22mat = matrix(c(11,12,15,16), 2, 2, byrow=TRUE)
Amat = rbind(cbind(A11mat, A12mat), cbind(A21mat, A22mat))
Amat
## [,1] [,2] [,3] [,4]
## [1,] 1 2 3 4
## [2,] 5 6 7 8
## [3,] 9 10 11 12
## [4,] 13 14 15 16
\blacksquare
The basic matrix operations work on sub-matrices in the obvious way. To illustrate, consider another matrix \mathbf{B} that is conformable with \mathbf{A} and partitioned in the same way: \mathbf{B}=\left[\begin{array}{cc} \mathbf{B}_{11} & \mathbf{B}_{12}\\ \mathbf{B}_{21} & \mathbf{B}_{22} \end{array}\right]. Then \mathbf{A}+\mathbf{B}=\left[\begin{array}{cc} \mathbf{A}_{11} & \mathbf{A}_{12}\\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{array}\right]+\left[\begin{array}{cc} \mathbf{B}_{11} & \mathbf{B}_{12}\\ \mathbf{B}_{21} & \mathbf{B}_{22} \end{array}\right]=\left[\begin{array}{cc} \mathbf{A}_{11}+\mathbf{B}_{11} & \mathbf{A}_{12}+\mathbf{B}_{12}\\ \mathbf{A}_{21}+\mathbf{B}_{21} & \mathbf{A}_{22}+\mathbf{B}_{22} \end{array}\right]. If all of the sub-matrices of \mathbf{A} and \mathbf{B} are conformable then \mathbf{AB}=\left[\begin{array}{cc} \mathbf{A}_{11} & \mathbf{A}_{12}\\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{array}\right]\left[\begin{array}{cc} \mathbf{B}_{11} & \mathbf{B}_{12}\\ \mathbf{B}_{21} & \mathbf{B}_{22} \end{array}\right]=\left[\begin{array}{cc} \mathbf{A}_{11}\mathbf{B}_{11}+\mathbf{A}_{12}\mathbf{B}_{21} & \mathbf{A}_{11}\mathbf{B}_{12}+\mathbf{A}_{12}\mathbf{B}_{22}\\ \mathbf{A}_{21}\mathbf{B}_{11}+\mathbf{A}_{22}\mathbf{B}_{21} & \mathbf{A}_{21}\mathbf{B}_{12}+\mathbf{A}_{22}\mathbf{B}_{22} \end{array}\right].
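A quick numerical check of the blockwise product, reusing Amat and its blocks from the example above; the matrix Bmat and its blocks are hypothetical values chosen for illustration:
# Hypothetical conformable matrix B, partitioned the same way as A
B11mat = matrix(1:4, 2, 2); B12mat = matrix(5:8, 2, 2)
B21mat = matrix(9:12, 2, 2); B22mat = matrix(13:16, 2, 2)
Bmat = rbind(cbind(B11mat, B12mat), cbind(B21mat, B22mat))
# Blockwise product matches the ordinary matrix product
blockProd = rbind(cbind(A11mat%*%B11mat + A12mat%*%B21mat,
                        A11mat%*%B12mat + A12mat%*%B22mat),
                  cbind(A21mat%*%B11mat + A22mat%*%B21mat,
                        A21mat%*%B12mat + A22mat%*%B22mat))
all.equal(blockProd, Amat%*%Bmat)
## [1] TRUE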
The transpose of a partitioned matrix satisfies \mathbf{A^{\prime}}=\left[\begin{array}{cc} \mathbf{A}_{11} & \mathbf{A}_{12}\\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{array}\right]^{\prime}=\left[\begin{array}{cc} \mathbf{A}_{11}^{\prime} & \mathbf{A}_{21}^{\prime}\\ \mathbf{A}_{12}^{\prime} & \mathbf{A}_{22}^{\prime} \end{array}\right]. Notice the interchange of the two off-diagonal blocks.
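The transpose identity is just as easy to verify numerically with the blocks from above:
# Off-diagonal blocks swap places under transposition
blockT = rbind(cbind(t(A11mat), t(A21mat)), cbind(t(A12mat), t(A22mat)))
all.equal(blockT, t(Amat))
## [1] TRUE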
A partitioned matrix \mathbf{A} with invertible sub-matrix \mathbf{A}_{11} and invertible \mathbf{C}=\mathbf{A}_{22}-\mathbf{A}_{21}\mathbf{A}_{11}^{-1}\mathbf{A}_{12} has a partitioned inverse of the form \mathbf{A}^{-1}=\left[\begin{array}{cc} \mathbf{A}_{11} & \mathbf{A}_{12}\\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{array}\right]^{-1}=\left[\begin{array}{cc} \mathbf{A}_{11}^{-1}+\mathbf{A}_{11}^{-1}\mathbf{A}_{12}\mathbf{C}^{-1}\mathbf{A}_{21}\mathbf{A}_{11}^{-1} & -\mathbf{A}_{11}^{-1}\mathbf{A}_{12}\mathbf{C}^{-1}\\ -\mathbf{C}^{-1}\mathbf{A}_{21}\mathbf{A}_{11}^{-1} & \mathbf{C}^{-1} \end{array}\right]. This formula can be verified by direct calculation.
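A minimal sketch of that check in R, using a hypothetical invertible 4\times4 matrix (the Amat constructed above has reduced rank, so a new matrix is needed; the names Pmat, Cmat, etc. are illustrative):
# Hypothetical invertible partitioned matrix
P11 = matrix(c(2,1,1,2), 2, 2); P12 = matrix(c(0,1,1,0), 2, 2)
P21 = diag(2); P22 = matrix(c(3,1,1,3), 2, 2)
Pmat = rbind(cbind(P11, P12), cbind(P21, P22))
P11inv = solve(P11)
Cmat = P22 - P21%*%P11inv%*%P12
Cinv = solve(Cmat)
# Assemble the inverse block-by-block from the formula above
Pinv = rbind(cbind(P11inv + P11inv%*%P12%*%Cinv%*%P21%*%P11inv,
                   -P11inv%*%P12%*%Cinv),
             cbind(-Cinv%*%P21%*%P11inv, Cinv))
all.equal(Pinv, solve(Pmat))
## [1] TRUE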
14. Solving x+y=2x-y for x gives x=2y. Substituting this value into the equation x+y=1 gives 2y+y=1, and solving for y gives y=1/3. Solving for x then gives x=2/3.
15. Notice that the calculations in R do not show \mathbf{A}^{-1}\mathbf{A}=\mathbf{I} exactly. The (1,2) element of \mathbf{A}^{-1}\mathbf{A} is -5.551115e-17, which for all practical purposes is zero. However, due to the limitations of machine arithmetic the result is not exactly zero.