7.2 Linear maps and matrices
For most of this section we will focus on linear maps in Euclidean space, that is, maps from \(\mathbb{R}^n\) to \(\mathbb{R}^m\) for some \(m,n \in \mathbb{N}\). It turns out that such maps correspond to matrices in \(M_{m,n}(\mathbb{R})\).
Note: although we focus on real matrices, we could again replace every \(\mathbb{R}\) by \(\mathbb{C}\) and proceed in the same way.
We have already seen that any matrix can be thought of as a linear map, so we need only consider how an arbitrary linear map can be represented as a matrix. Consider a linear map \(T:\mathbb{R}^n \to \mathbb{R}^m\). In order for this map to be linear, each component of \(T(x)\) must be a linear expression in \(x_1,\dots,x_n\) with zero constant term. The proof of this is left as an exercise.
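To see why the constant term must vanish, note that linearity forces \(T(0)=T(0+0)=T(0)+T(0)\), and hence \(T(0)=0\). For example, the map \(T:\mathbb{R}\to\mathbb{R}\) given by \(T(x)=x+1\) is not linear, since \(T(0)=1\neq 0\).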
Now if we think about how this map acts on \(x=(x_1, \dots, x_n)\in \mathbb{R}^n\) we must have that \[T\begin{pmatrix}x_1\\ \vdots \\ x_n \end{pmatrix}=\begin{pmatrix}a_{11}x_1+ a_{12}x_2+\cdots + a_{1n}x_n \\ \vdots \\ a_{m1}x_1+a_{m2}x_2 +\cdots +a_{mn}x_n \end{pmatrix}\] for some coefficients \(a_{ij}\in \mathbb{R}\). But this is exactly the same as the action of the matrix \(A=(a_{ij})\in M_{m,n}(\mathbb{R}).\)
In cases where the map is presented as linear equations in each component it is straightforward to read off the matrix as above. We have also seen some examples where the form of the linear map is less obvious. In general, we can use the fact that the \(i\)th column of the matrix is given by \(T(e_i)\), where \(e_1,\dots,e_n\) are the standard basis vectors of \(\mathbb{R}^n\).
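For example, let \(T:\mathbb{R}^2\to\mathbb{R}^2\) be the anticlockwise rotation by an angle \(\theta\). Then \(T(e_1)=(\cos\theta,\sin\theta)\) and \(T(e_2)=(-\sin\theta,\cos\theta)\), so these two vectors form the columns of the corresponding matrix \[\begin{pmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{pmatrix}.\]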
Note that the matrix corresponding to a linear map is not unique, but instead depends on our choice of basis for the domain and co-domain. For now we will assume that we use the standard bases for both, but later in the course we will explore how to deal with different choices of basis, and some reasons why we may prefer to use alternative bases.
We previously introduced some operations for linear maps: addition, multiplication by a scalar, and composition. We want to study now how these translate to matrices.
Let \(S,T:\mathbb{R}^n\to\mathbb{R}^m\) be linear maps with corresponding matrices \(M_T=(t_{ij}), M_S=(s_{ij})\), and let \(\lambda\in\mathbb{R}\). Then the matrices corresponding to the maps \(\lambda T\) and \(S+T\) are given by \[M_{\lambda T}=\lambda M_T=(\lambda t_{ij}) \quad \text{and}\quad M_{S+T}=(s_{ij}+t_{ij})=M_S+M_T.\]
Let \(R=S+T\). By Definition 7.29, the matrix associated with \(R\) is \(M_R=(r_{ij})\) with \(i\)th column \(R(e_i)\). But since \(R(e_i)=(S+T)(e_i)=S(e_i)+T(e_i)\), the \(i\)th column of \(M_R\) is the \(i\)th column of \(M_S\) plus the \(i\)th column of \(M_T\), and so \(M_R=M_S+M_T\).
Similarly we find that \(M_{\lambda T}\) has \(i\)th column \(\lambda T(e_i)\) and so \(M_{\lambda T}=\lambda M_T\).□
So the above theorem tells us that when adding linear maps or multiplying them by a scalar we just add the corresponding matrix elements or multiply them by a scalar. Note that this extends to expressions of the form \[M_{\lambda S+\mu T}=\lambda M_S+\mu M_T ,\] and these expressions actually define the addition and scalar multiplication of matrices.
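As a quick illustration, take \(M_S=\begin{pmatrix}1&2\\0&1\end{pmatrix}\), \(M_T=\begin{pmatrix}0&1\\1&0\end{pmatrix}\), \(\lambda=2\) and \(\mu=3\). Then \[2M_S+3M_T=\begin{pmatrix}2&4\\0&2\end{pmatrix}+\begin{pmatrix}0&3\\3&0\end{pmatrix}=\begin{pmatrix}2&7\\3&2\end{pmatrix},\] which is exactly the matrix of the map \(2S+3T\).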
The composition of maps leads to multiplication of matrices.
Let \(T:\mathbb{R}^n\to\mathbb{R}^m\) and \(S:\mathbb{R}^m\to \mathbb{R}^l\) be linear maps with corresponding matrices \(M_T=(t_{ij})\) and \(M_S=(s_{ij})\), where \(M_T\) is \(m\times n\) and \(M_S\) is \(l\times m\). Then the matrix \(M_{S\circ T}=(r_{ik})\) corresponding to the composition \(R=S\circ T\) of \(T\) and \(S\) has elements \[r_{ik}=\sum_{j=1}^m s_{ij}t_{jk}\] and is an \(l\times n\) matrix.
The proof of this is left as an exercise. However, note that it follows from our definitions, and in fact it is for this reason that we defined matrix multiplication in the way that we did.
We can think about the formula for matrix multiplication as \(r_{ik}\) being the dot product of the \(i\)th row vector of \(M_S\) and the \(k\)th column vector of \(M_T\). This formula defines a product of matrices by \[M_SM_T:=M_{S\circ T} .\]
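For example, take \(M_S=\begin{pmatrix}1&0&2\\0&1&-1\end{pmatrix}\) (so \(l=2\), \(m=3\)) and \(M_T=\begin{pmatrix}1&2\\0&1\\3&0\end{pmatrix}\) (so \(n=2\)). The entry \(r_{11}\) is the dot product of the first row of \(M_S\) with the first column of \(M_T\), namely \(1\cdot 1+0\cdot 0+2\cdot 3=7\), and computing the remaining entries in the same way gives the \(2\times 2\) matrix \[M_SM_T=\begin{pmatrix}7&2\\-3&1\end{pmatrix}.\]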
So we have now used the notions of addition and composition of linear maps to define addition and products of matrices. The results about maps then immediately imply corresponding results for matrices.
Let \(A,B\) be \(m\times n\) matrices and \(C\) an \(l\times m\) matrix, then \[C(A+B)=CA+CB .\] Let \(A,B\) be \(l\times m\) matrices and \(C\) an \(m\times n\) matrix, then \[(A+B)C=AC+BC .\] Let \(C\) be an \(m\times n\) matrix, \(B\) be an \(l\times m\) matrix and \(A\) a \(k\times l\) matrix, then \[A(BC)=(AB)C .\]
We saw in Theorem 3.11 that matrices define linear maps, and the above properties were shown for linear maps in Theorem 7.12.
□
The first two properties mean that matrix multiplication is distributive over addition, and the last one is called associativity. In particular, associativity would be quite cumbersome to prove directly for matrix multiplication, whereas the proof for linear maps is very simple. This shows that an abstract approach often simplifies proofs a lot. The price one pays for this is that it sometimes takes longer to learn and understand the material in a more abstract language.
Having identified our matrices with linear maps, we can now consider concepts like the image, kernel, rank and nullity of a matrix. Once again we can make use of Gaussian elimination, this time to find the rank and nullity of a matrix and hence of the corresponding linear map.
Let \(A\in M_{m,n}(\mathbb{R})\) and suppose that a row echelon form of \(A\) has \(k\) leading \(1\)’s. Then \(\operatorname{rank}A=k\) and \(\operatorname{nullity}A=n-k\).
So in order to find the rank of a matrix we use elementary row operations to bring it to row echelon form and then we just count the number of leading \(1\)’s. The proof will be left as an exercise.
Consider \(A=\begin{pmatrix}1&-1 &3\\2&0&4\\0&3&1\\-1&-1&1 \end{pmatrix}\in M_{4,3}(\mathbb{R})\). We want to find the rank and nullity of this matrix. Using row operations, we find that a row echelon form of the matrix is \(\begin{pmatrix}1&-1&3\\0&1&-1\\0&0&1\\0&0&0 \end{pmatrix}\).
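One possible sequence of row operations is \(R_2\to R_2-2R_1\) and \(R_4\to R_4+R_1\), then \(R_2\to \tfrac{1}{2}R_2\), \(R_3\to R_3-3R_2\) and \(R_4\to R_4+2R_2\), and finally \(R_3\to \tfrac{1}{4}R_3\) followed by \(R_4\to R_4-2R_3\).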
Hence we have that \(\operatorname{rank}A=3\) and \(\operatorname{nullity}A=0\).
We also previously saw that we could use the determinant to tell whether a system of \(n\) equations in \(n\) unknowns has a unique solution, but we can now go a step further, using the image of a matrix to determine the outcome when the determinant is zero.
The system of linear equations \(Ax=b\), with \(A\in M_{n}(\mathbb{R})\), has a unique solution if and only if \(\det A\neq 0.\) If \(\det A=0\) and \(b\notin \operatorname{Im}A\) no solution exists, and if \(\det A=0\) and \(b\in \operatorname{Im}A\) then infinitely many solutions exist.
We know that \(A\) is invertible if and only if \(\det A\neq 0\), and then we find \[x=A^{-1}b.\] So \(\det A\neq 0\) means the system has a unique solution. If \(\det A=0\), then \(\operatorname{rank}A<n\) and hence \(\operatorname{nullity}A>0\), so a solution exists only if \(b\in \operatorname{Im}A\). If a solution \(x_0\) exists, then all vectors in \(\{x_0\}+\ker A\) are solutions too, and since \(\ker A\) is a nonzero subspace it contains infinitely many vectors, hence there are infinitely many solutions.
□
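For example, take \(A=\begin{pmatrix}1&1\\2&2\end{pmatrix}\), so that \(\det A=0\) and \(\operatorname{Im}A=\operatorname{span}\{(1,2)\}\). For \(b=(1,1)\notin\operatorname{Im}A\) the system \(Ax=b\) has no solution, while for \(b=(1,2)\in\operatorname{Im}A\) every vector of the form \((t,1-t)\) with \(t\in\mathbb{R}\) is a solution, so there are infinitely many.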
In the next chapter, we will continue to explore linear maps and different ways we can represent them.