7.1 Properties of linear maps

In this section we will study some properties of linear maps and develop some of the related structures. We will later see that this connects to our earlier exploration of matrices in the course.

7.1.1 Combinations of linear maps

Throughout this section we assume that \(U, V, W\) and \(X\) are vector spaces over \(\mathbb{F}\).

We first notice that we can add linear maps if they relate the same spaces, and multiply them by scalars:

Definition 7.8: (Addition and scalar multiplication of linear maps)

Let \(S:V\to W\) and \(T:V\to W\) be linear maps, and \(\lambda\in\mathbb{F}\). Then we define

  • \((\lambda T)(x):=\lambda T(x)\) and

  • \((S+T)(x):=S(x)+T(x)\).

Exercise 7.9:
Do you think that \(S+T\) and \(\lambda T\) will also be linear maps?
Theorem 7.10:

Let \(\lambda \in \mathbb{F}\) and \(T, S:V \to W\) be linear maps. The maps \(\lambda T\) and \(S+T\) are linear maps from \(V\) to \(W\).

The proof follows directly from the definitions.
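As a quick illustration, for maps given by matrices these operations correspond to addition and scalar multiplication of the matrices themselves. A minimal sympy sketch, where the matrices \(A\) and \(B\) are arbitrary illustrative choices:

```python
from sympy import Matrix

# S and T as matrix maps R^2 -> R^2 (arbitrary illustrative choices)
A = Matrix([[1, 2], [0, 3]])   # S(x) = A*x
B = Matrix([[4, 0], [1, 1]])   # T(x) = B*x
x = Matrix([5, -2])
lam = 7

# (S+T)(x) = S(x) + T(x) corresponds to the matrix sum A + B,
# and (lam*T)(x) = lam*T(x) corresponds to the scalar multiple lam*B
assert (A + B) * x == A * x + B * x
assert (lam * B) * x == lam * (B * x)
```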


We can also compose maps in general, and for linear maps we find that the composition (if defined) is linear.

Theorem 7.11:

Let \(T:U\to V\) and \(S:V\to W\) be linear maps, then the composition \[S\circ T(x):=S(T(x))\] is a linear map \(S\circ T:U\to W\).

Proof.

We consider first the action of \(S\circ T\) on \(\lambda x\): Since \(T\) is linear we have \(S\circ T(\lambda x)=S(T(\lambda x))=S(\lambda T(x))\) and since \(S\) is linear, too, we get \(S(\lambda T(x))=\lambda S(T(x))=\lambda S\circ T(x)\). Now we apply \(S\circ T\) to \(x+y\): \[\begin{aligned} S\circ T(x+y)&=S(T(x+y))\\ &=S(T(x)+T(y))\\ &=S(T(x))+S(T(y))=S\circ T(x)+S\circ T(y) . \end{aligned}\]
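For matrix maps the composition \(S\circ T\) is given by the matrix product, so Theorem 7.11 can be spot-checked numerically. A small sympy sketch, using illustrative matrices:

```python
from sympy import Matrix

# T: R^3 -> R^2 and S: R^2 -> R^2 as matrix maps (illustrative choices)
T = Matrix([[1, 0, 1], [0, 4, 0]])
S = Matrix([[2, 1], [0, 3]])

# For matrix maps, the composition S∘T is the matrix product S*T
C = S * T

x = Matrix([1, 2, 3])
y = Matrix([-1, 0, 5])
lam = 6
assert C * (x + y) == C * x + C * y     # additivity of S∘T
assert C * (lam * x) == lam * (C * x)   # homogeneity of S∘T
```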

In a similar way one can prove the following.

Theorem 7.12:

Let \(T:U\to V\) and \(R, S:V\to W\) be linear maps, then \[(R+S)\circ T=R\circ T+S\circ T ,\] and if \(S, T:U\to V\) and \(R :V\to W\) are linear maps, then \[R\circ(S+T)=R\circ S+R\circ T .\] Furthermore if \(T:U\to V\), \(S:V\to W\) and \(R:W\to X\) are linear maps then \[(R\circ S)\circ T=R\circ(S\circ T) .\]

Note that the first and third identities hold for arbitrary functions (given the pointwise definition of addition), but the second relies on the linearity of \(R\).
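In the matrix picture all three identities of Theorem 7.12 become familiar matrix identities; the following sympy sketch (with arbitrary illustrative matrices) checks them:

```python
from sympy import Matrix

# R, S, T: R^2 -> R^2 as matrix maps (arbitrary illustrative choices)
R = Matrix([[1, 1], [0, 2]])
S = Matrix([[3, 0], [1, 1]])
T = Matrix([[0, 5], [2, 2]])

assert (R + S) * T == R * T + S * T   # (R+S)∘T = R∘T + S∘T
assert R * (S + T) == R * S + R * T   # R∘(S+T) = R∘S + R∘T (uses linearity of R)
assert (R * S) * T == R * (S * T)     # associativity of composition
```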

7.1.2 Image and kernel

Let us define two subsets naturally associated with each linear map.

Definition 7.13: (Image and kernel)

Let \(T:V\to W\) be a linear map, then we define

  • the image of \(T\) to be \[\operatorname{Im}T=\{ y\in W: \text{there exists } x\in V \text{ with } T(x)=y\}\subseteq W,\]

  • the kernel of \(T\) to be \[\ker T=\{ x\in V: T(x)=\mathbf{0}\}\subseteq V .\]

Example 7.14:
Let \(T:\mathbb{R}^3 \to \mathbb{R}^2\) be defined by \(T\begin{pmatrix}x_1\\x_2\\x_3 \end{pmatrix}=\begin{pmatrix} x_1+x_3 \\ 4x_2\end{pmatrix}\). We have seen that this is a linear map. Then \[\operatorname{Im}T=\left\{\begin{pmatrix} x_1+x_3 \\ 4x_2\end{pmatrix}: x_1, x_2, x_3 \in \mathbb{R}\right\}=\mathbb{R}^2\] and \[\begin{align*} \ker T &=\left\{\begin{pmatrix} x_1\\x_2\\x_3\end{pmatrix}\in \mathbb{R}^3:\begin{pmatrix} x_1+x_3 \\ 4x_2\end{pmatrix}=\mathbf{0}\right\}\\ &=\left\{\begin{pmatrix} x_1\\x_2\\x_3\end{pmatrix}\in \mathbb{R}^3: x_1+x_3=0 \text{ and } 4x_2=0\right\}\\ &=\left\{\begin{pmatrix} x_1\\0\\-x_1\end{pmatrix}: x_1\in \mathbb{R}\right\}\\ &=\operatorname{span}\left\{\begin{pmatrix} 1\\0\\-1\end{pmatrix}\right\}. \end{align*}\]
Example 7.15:
Let \(A\in M_{m,n}(\mathbb{R})\) and let \(a_1, a_2,\cdots ,a_n\in \mathbb{R}^m\) be the column vectors of \(A\), and set \(T_Ax:=Ax\), which is a linear map \(T_A:\mathbb{R}^n\to\mathbb{R}^m\). Then since \(T_Ax=\mathbf{0}\) means \(Ax=\mathbf{0}\) we have \[\ker T_A=S(A,\mathbf{0}) ,\] and from the relation \(T_Ax=Ax=x_1a_1+x_2a_2+\cdots +x_na_n\) we see that \[\operatorname{Im}T_A=\operatorname{span}\{a_1, a_2,\cdots ,a_n\} .\]
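These computations can be carried out with a computer algebra system; in sympy, `nullspace` and `columnspace` return bases of the kernel and image of a matrix map. A short sketch for the matrix of Example 7.14:

```python
from sympy import Matrix

# The matrix of Example 7.14: T(x1, x2, x3) = (x1 + x3, 4*x2)
A = Matrix([[1, 0, 1], [0, 4, 0]])

print(A.nullspace())    # one basis vector (-1, 0, 1): the same line as span{(1, 0, -1)}
print(A.columnspace())  # two independent columns, so Im T_A = R^2
```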

In these examples we see that the image and kernel are actually subspaces rather than just subsets. This was not a coincidence, as the following theorem shows.

Theorem 7.16:

Let \(T:V\to W\) be a linear map, then \(\operatorname{Im}T\) is a linear subspace of \(W\) and \(\ker T\) is a linear subspace of \(V\).

The proof is left as an exercise.

Now let us relate the image and the kernel to some general properties of a map.

Recall that if \(A,B\) are sets and \(f:A\to B\) is a map (not necessarily linear), then \(f\) is called

  • surjective, if for any \(b\in B\) there exists an \(a\in A\) such that \(f(a)=b\).

  • injective, if whenever \(f(a)=f(a')\) then \(a=a'\).

  • bijective, if \(f\) is injective and surjective, that is if for any \(b\in B\) there exists exactly one \(a\in A\) with \(f(a)=b\).

Theorem 7.17:

If \(f:A\to B\) is bijective, then there exists a unique map \(f^{-1}:B\to A\) with \(f\circ f^{-1}(b)=b\) for all \(b\in B\) and \(f^{-1}\circ f(a)=a\) for all \(a\in A\); moreover, \(f^{-1}\) is bijective, too.

Proof.

Let us first show existence: For any \(b\in B\), there exists an \(a\in A\) such that \(f(a)=b\), since \(f\) is surjective. Since \(f\) is injective, this \(a\) is unique, i.e., if \(f(a')=b\), then \(a'=a\), so we can set \[f^{-1}(b):=a .\] By definition this map satisfies \(f\circ f^{-1}(b)=f(f^{-1}(b))=f(a)=b\) and \(f^{-1}(f(a))=f^{-1}(b)=a\). From these identities we also get that \(f^{-1}\) is bijective. For uniqueness, suppose \(g:B\to A\) also satisfies \(f\circ g(b)=b\) for all \(b\in B\); then \(g(b)\) is a preimage of \(b\) under \(f\), and since \(f\) is injective this preimage is unique, so \(g(b)=f^{-1}(b)\).
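The construction in this proof is easy to see on a finite set; here is a tiny Python illustration with a hypothetical three-element bijection stored as a dictionary:

```python
# A bijection on a finite set, stored as a dictionary (hypothetical data)
f = {1: 'a', 2: 'b', 3: 'c'}

# Since f is injective, each value has exactly one key, so this is well defined
f_inv = {b: a for a, b in f.items()}

assert all(f_inv[f[a]] == a for a in f)        # f^{-1}∘f = id on A
assert all(f[f_inv[b]] == b for b in f_inv)    # f∘f^{-1} = id on B
```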

In the special case of linear maps we can connect these general properties to the image and kernel of our map.

Theorem 7.18:

Let \(T: V\to W\) be a linear map, then

  1. \(T\) is surjective if and only if \(\operatorname{Im}T=W\),

  2. \(T\) is injective if and only if \(\ker T=\{\mathbf{0}\}\), and

  3. \(T\) is bijective if and only if \(\operatorname{Im}T=W\) and \(\ker T=\{\mathbf{0}\}\).

Proof.

  1. That surjective is equivalent to \(\operatorname{Im}T=W\) follows directly from the definitions of surjectivity and \(\operatorname{Im}T\).

  2. Notice that we always have \(\mathbf{0}\in \ker T\), since \(T(\mathbf{0})=\mathbf{0}\). Now assume \(T\) is injective and let \(x\in \ker T\), i.e., \(T(x)=\mathbf{0}\). But then \(T(x)=T(\mathbf{0})\) and injectivity of \(T\) gives \(x=\mathbf{0}\), hence \(\ker T=\{\mathbf{0}\}\). For the converse, let \(\ker T=\{\mathbf{0}\}\), and assume there are \(x,x'\in V\) with \(T(x)=T(x')\). Using linearity of \(T\) we then get \(\mathbf{0}=T(x)-T(x')=T(x-x')\), hence \(x-x'\in \ker T\), and since \(\ker T=\{\mathbf{0}\}\) this means that \(x=x'\), so \(T\) is injective.

  3. This follows immediately from the previous two parts of the theorem.

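For matrix maps, Theorem 7.18 turns injectivity and surjectivity into computable conditions on the nullspace and the rank. A sympy sketch (the helper names and the test matrix are illustrative choices):

```python
from sympy import Matrix

def is_injective(A):
    # ker T_A = {0}  <=>  the nullspace has no basis vectors
    return len(A.nullspace()) == 0

def is_surjective(A):
    # Im T_A = R^m  <=>  the columns span R^m, i.e. rank A = m
    return A.rank() == A.rows

A = Matrix([[1, 2], [2, 4]])               # illustrative: second column = 2 * first
print(is_injective(A), is_surjective(A))   # False False
```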

Exercise 7.19:
Is \(T:\mathbb{R}^3 \to \mathbb{R}^2\), \(T\begin{pmatrix}x_1\\x_2\\x_3 \end{pmatrix}=\begin{pmatrix} x_1+x_3 \\ 4x_2\end{pmatrix}\) injective? Is it surjective?

An important property of linear maps with \(\ker T=\{\mathbf{0}\}\) is the following.

Theorem 7.20:

Let \(x_1,x_2,\cdots,x_k\in V\) be linearly independent, and \(T:V\to W\) be a linear map with \(\ker T=\{\mathbf{0}\}\). Then \(T(x_1), T(x_2) ,\cdots ,T(x_k)\) are linearly independent.

Proof.

Assume \(T(x_1), T(x_2),\cdots ,T(x_k)\) are linearly dependent, i.e., there exist \(\lambda_1,\lambda_2,\cdots,\lambda_k\), not all \(0\), such that \[\lambda_1T(x_1)+\lambda_2T(x_2)+\cdots +\lambda_kT(x_k)=\mathbf{0}.\] But since \(T\) is linear we have \[T(\lambda_1x_1+\lambda_2x_2+\cdots +\lambda_kx_k)= \lambda_1T(x_1)+\lambda_2T(x_2)+\cdots +\lambda_kT(x_k)=\mathbf{0},\] and hence \(\lambda_1x_1+\lambda_2x_2+\cdots +\lambda_kx_k\in \ker T\). But since \(\ker T=\{\mathbf{0}\}\) it follows that \[\lambda_1x_1+\lambda_2x_2+\cdots +\lambda_kx_k=\mathbf{0},\] which means that the vectors \(x_1,x_2, \cdots ,x_k\) are linearly dependent, and this contradicts the assumption. Therefore \(T(x_1), T(x_2),\cdots ,T(x_k)\) are linearly independent.
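A quick numerical check of Theorem 7.20 with sympy: the matrix \(T\) below is an illustrative injective map \(\mathbb{R}^2\to\mathbb{R}^3\), and the images of two independent vectors are stacked as columns so that independence can be read off from the rank:

```python
from sympy import Matrix

# An injective map T: R^2 -> R^3 (illustrative choice with independent columns)
T = Matrix([[1, 0], [0, 1], [1, 1]])
assert len(T.nullspace()) == 0   # ker T = {0}

x1 = Matrix([1, 1])
x2 = Matrix([1, -1])             # x1, x2 linearly independent in R^2

# Stack the images as columns; full column rank <=> linear independence
images = Matrix.hstack(T * x1, T * x2)
assert images.rank() == 2        # T(x1), T(x2) are still independent
```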

Notice that this result implies that if \(T\) is bijective, it maps a basis of \(V\) to a basis of \(W\), and hence \(\dim V=\dim W\).

We saw that a bijective map has an inverse; we now show that if \(T\) is linear, then the inverse is linear, too.

Theorem 7.21:

Let \(T:V\to V\) be a linear map and assume \(T\) is bijective. Then \(T^{-1}:V\to V\) is also linear.

Proof.

Let \(y, y' \in V\). We want to show \(T^{-1}(y+y')=T^{-1}(y)+T^{-1}(y')\). Since \(T\) is bijective we know that there are unique \(x,x'\in V\) with \(y=T(x)\) and \(y'=T(x')\), therefore \[y+y'=T(x)+T(x')=T(x+x')\] and applying \(T^{-1}\) to both sides of this equation gives \[T^{-1}(y+y')=T^{-1}(T(x+x'))=x+x'=T^{-1}(y)+T^{-1}(y') .\]

The second property of a linear map (compatibility with scalar multiplication) is shown in a similar way; this is left as an exercise.
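For an invertible matrix map, \(T^{-1}\) is given by the inverse matrix, and its linearity can be checked directly. A sympy sketch with an illustrative invertible matrix:

```python
from sympy import Matrix

# A bijective linear map T: R^2 -> R^2 (an invertible matrix, chosen for illustration)
T = Matrix([[2, 1], [1, 1]])
T_inv = T.inv()                  # the matrix of T^{-1}

y = Matrix([3, 0])
yp = Matrix([-1, 4])
lam = 5
assert T_inv * (y + yp) == T_inv * y + T_inv * yp   # additivity of T^{-1}
assert T_inv * (lam * y) == lam * (T_inv * y)       # homogeneity of T^{-1}
```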

7.1.3 Rank and nullity

As the image and kernel of a linear map are subspaces, we can find bases for them and hence determine their dimensions. We give these dimensions names, as they are important quantities that can tell us a lot about a map.

Definition 7.22: (Nullity and rank)
Let \(T:V\to W\) be a linear map between finite-dimensional vector spaces, then we define the nullity of \(T\) as \[\operatorname{nullity}T:=\dim \ker T ,\] and the rank of \(T\) as \[\operatorname{rank}T:=\dim \operatorname{Im}T .\]
Example 7.23:
Let \(T:\mathbb{R}^2 \to \mathbb{R}^2\), \(T(x)=\begin{pmatrix}0 & 1 \\ 0 & 0\end{pmatrix}\begin{pmatrix}x_1\\x_2\end{pmatrix}=\begin{pmatrix}x_2\\ 0\end{pmatrix}.\) This is a linear map (left as an exercise). Then \(x\in \ker T\) means \(x_2=0\), hence \(\ker T=\operatorname{span}\{e_1\}\), and \(\operatorname{Im}T=\operatorname{span}\{e_1\}\). Therefore we find \(\operatorname{rank}T=1\) and \(\operatorname{nullity}T=1\).
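In sympy the rank is computed directly, and the nullity can be read off as the number of basis vectors of the nullspace; for the matrix of Example 7.23:

```python
from sympy import Matrix

A = Matrix([[0, 1], [0, 0]])   # the matrix of Example 7.23

rank = A.rank()                # dim Im T
nullity = len(A.nullspace())   # dim ker T = number of basis vectors of the nullspace
print(rank, nullity)           # 1 1
```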
Exercise 7.24:
Find the rank and nullity of the linear map \(T:\mathbb{R}^3 \to \mathbb{R}^2\), \(T\begin{pmatrix}x_1\\x_2\\x_3 \end{pmatrix}=\begin{pmatrix} x_1+x_3 \\ 4x_2\end{pmatrix}\).

So in view of our discussion in the previous subsection we have that a map \(T: V\to W\) is injective if and only if \(\operatorname{nullity}T=0\), and surjective if and only if \(\operatorname{rank}T=\dim W\). It turns out that rank and nullity are closely related; this is the content of the Rank-Nullity Theorem.

Theorem 7.25: (Rank-Nullity Theorem)

Let \(T:V \to W\) be a linear map, then \[\operatorname{rank}T+\operatorname{nullity}T=\dim V .\]

We will explore this theorem and its proof in more detail later in the course. For now, let us consider a few of its consequences.
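One consequence we can already test: for a map given by an \(m\times n\) matrix, the rank and nullity must sum to \(n\). A sympy spot-check on random integer matrices (`randMatrix` is sympy's random matrix generator):

```python
from sympy import randMatrix

# rank T + nullity T = n for maps T: R^n -> R^m given by m x n matrices
for (m, n) in [(2, 3), (3, 3), (4, 2)]:
    A = randMatrix(m, n, min=-5, max=5)
    assert A.rank() + len(A.nullspace()) == n
```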

Exercise 7.26:
Suppose that \(T:\mathbb{R}^n\to\mathbb{R}^m\) is invertible. What can we say about \(n\) and \(m\)?

We must have that \(n=m\). In fact we can state this as a result about vector spaces in general.

Corollary 7.27:

If the linear map \(T:V\to W\) is invertible then \(\dim V= \dim W\).

Proof.

Let \(\dim V=n\) and \(\dim W=m\). Since \(T\) is invertible we have that \(\operatorname{rank}T=m\) and \(\operatorname{nullity}T=0\). Hence by the Rank-Nullity Theorem we have that \(m=\operatorname{rank}T=\operatorname{rank}T+\operatorname{nullity}T=\dim V=n\).

 

Corollary 7.28:

Let \(T: V\to W\) be a linear map with \(\dim V=\dim W\), then

  • if \(\operatorname{rank}T=\dim V\), then \(T\) is invertible,

  • if \(\operatorname{nullity}T=0\), then \(T\) is invertible.

Proof.

We have that \(T\) is invertible if and only if \(\operatorname{rank}T=\dim W\) and \(\operatorname{nullity}T=0\). Since \(\dim W=\dim V\), the Rank-Nullity Theorem \(\operatorname{rank}T+\operatorname{nullity}T=\dim V\) shows that \(\operatorname{rank}T=\dim V\) holds if and only if \(\operatorname{nullity}T=0\); hence either condition implies the other, and together they give that \(T\) is bijective, i.e., invertible.
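For square matrices this corollary says that full rank, trivial kernel, and invertibility stand or fall together; a small sympy sketch (the helper `check` is an illustrative name) verifies this on an invertible and a singular example:

```python
from sympy import Matrix

def check(A):
    # For a square matrix (dim V = dim W = n), the three conditions agree
    n = A.rows
    full_rank = A.rank() == n
    trivial_kernel = len(A.nullspace()) == 0
    invertible = A.det() != 0
    return full_rank == trivial_kernel == invertible

assert check(Matrix([[2, 1], [1, 1]]))   # invertible: all three conditions hold
assert check(Matrix([[1, 2], [2, 4]]))   # singular: all three conditions fail together
```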

We have already seen a couple of examples where we have defined linear maps using matrices. However, the connection between matrices and linear maps is even stronger, as we will now explore.