3.2 Matrices

The most important property of matrices is that one can multiply them under suitable conditions on the number of rows and columns. The product of matrices appears naturally if we consider a vector \(y=Ax\) and apply another matrix to it, i.e., \(By=B(Ax)\). The question is then whether there exists a matrix \(C\) such that \[Cx=B(Ax) ;\] if so, we call \(C=BA\) the matrix product of \(B\) and \(A\). If we use equation (3.3) and Theorem 3.11 we obtain \[\begin{equation} \begin{split} B(Ax)&=B(x_1a_1+x_2a_2+\cdots +x_na_n)\\ &= x_1 Ba_1+x_2 Ba_2+\cdots +x_nBa_n . \end{split} \tag{3.4}\end{equation}\] Hence if \(C\) is the matrix with columns \(Ba_1, \cdots , Ba_n\), then, again by (3.3), we have \(Cx=B(Ax)\).

We formulate this now a bit more precisely:

Theorem 3.12:

Let \(A=(a_{ij})\in M_{m,n}(\mathbb{R})\) and \(B=(b_{ij})\in M_{l,m}(\mathbb{R})\), then there exists a matrix \(C=(c_{ij})\in M_{l,n}(\mathbb{R})\) such that for all \(x\in \mathbb{R}^n\) we have \[Cx=B(Ax)\] and the elements of \(C\) are given by \[c_{ij}=\sum_{k=1}^mb_{ik}a_{kj} .\] Note that \(c_{ij}\) is the dot product between the \(i\)th row vector of \(B\) and the \(j\)th column vector of \(A\). We call \(C=BA\) the product of \(B\) and \(A\).

The theorem follows from (3.4), but to provide a different perspective we are going to give another proof.

Proof.

We write \(y=Ax\) and note that \(y=(y_1,y_2, \cdots , y_m)\) with \[\begin{equation}y_k=\sum_{j=1}^n a_{kj}x_j \tag{3.5}\end{equation}\] and similarly we write \(z=By\) and note that \(z=(z_1,z_2,\cdots , z_l)\) with \[\begin{equation} z_i=\sum_{k=1}^m b_{ik}y_k. \tag{3.6}\end{equation}\] Now inserting the expression (3.5) for \(y_k\) into (3.6) gives \[z_i=\sum_{k=1}^m b_{ik}\sum_{j=1}^n a_{kj}x_j=\sum_{j=1}^n \sum_{k=1}^mb_{ik}a_{kj} x_j=\sum_{j=1}^nc_{ij}x_j ,\] where we have exchanged the order of summation. Hence \(z=Cx\), as claimed.
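To make the formula concrete, here is a minimal sketch in Python with NumPy (the code and variable names are illustrative, not part of the text): it builds \(C\) entry by entry from \(c_{ij}=\sum_k b_{ik}a_{kj}\) and checks that \(Cx=B(Ax)\) for a random vector \(x\).

```python
import numpy as np

def matrix_product(B, A):
    """Form C = BA entry by entry via c_ij = sum_k b_ik * a_kj (Theorem 3.12)."""
    l, m = B.shape            # B is l x m
    m2, n = A.shape           # A is m x n
    assert m == m2, "number of columns of B must equal number of rows of A"
    C = np.zeros((l, n))
    for i in range(l):
        for j in range(n):
            C[i, j] = sum(B[i, k] * A[k, j] for k in range(m))
    return C

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))   # a 4x3 matrix
B = rng.standard_normal((2, 4))   # a 2x4 matrix
x = rng.standard_normal(3)

C = matrix_product(B, A)
print(np.allclose(C @ x, B @ (A @ x)))   # True: Cx = B(Ax)
print(np.allclose(C, B @ A))             # True: agrees with NumPy's built-in product
```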

Note that in order to form the product \(BA\) of two matrices \(A\) and \(B\), the number of rows of \(A\) must be the same as the number of columns of \(B\).

Exercise 3.13:
Let \(A=\begin{pmatrix}1 &0 &-4\\ -1& 4 &2 \end{pmatrix}\) and \(B=\begin{pmatrix}3&1\\-2&0\end{pmatrix}\). Which of the products \(A^2, AB, BA, B^2\) are defined?

If the matrices are of appropriate sizes so that the multiplication is defined, then matrix multiplication is distributive and associative.

Theorem 3.14:

Let \(A,B\) be \(m\times n\) matrices and \(C\) an \(l\times m\) matrix, then \[C(A+B)=CA+CB .\] Let \(A,B\) be \(m\times n\) matrices and \(C\) an \(n\times l\) matrix, then \[(A+B)C=AC+BC .\] Let \(A\) be an \(m\times n\) matrix, \(B\) be an \(n \times l\) matrix and \(C\) a \(l\times k\) matrix, then \[A(BC)=(AB)C .\]

The proof of this theorem will be a simple consequence of general properties of linear maps which we will discuss in Chapter 7.
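Although the proof is deferred, the identities are easy to test numerically. The following sketch (NumPy, illustrative only and not part of the text) checks them on randomly generated matrices of the stated sizes.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, l, k = 3, 4, 2, 5

A = rng.standard_normal((m, n))
B = rng.standard_normal((m, n))
C = rng.standard_normal((l, m))
print(np.allclose(C @ (A + B), C @ A + C @ B))      # C(A + B) = CA + CB

C2 = rng.standard_normal((n, l))
print(np.allclose((A + B) @ C2, A @ C2 + B @ C2))   # (A + B)C = AC + BC

A2 = rng.standard_normal((m, n))
B2 = rng.standard_normal((n, l))
C3 = rng.standard_normal((l, k))
print(np.allclose(A2 @ (B2 @ C3), (A2 @ B2) @ C3))  # A(BC) = (AB)C
```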

Now let us look at a few examples of matrices and products of them. We say that a matrix is a square matrix if \(m=n\). If \(A=(a_{ij})\) is an \(n\times n\) square matrix, then we call the elements \(a_{ii}\) the diagonal elements of \(A\) and \(a_{ij}\) for \(i\neq j\) the off-diagonal elements of \(A\). A square matrix \(A\) is called a diagonal matrix if all off-diagonal elements are \(0\).

Example 3.15:
The following is a \(3\times 3\) diagonal matrix \[\begin{pmatrix}-2 & 0 &0 \\ 0 & 3 & 0\\ 0 & 0 &1\end{pmatrix},\] with diagonal elements \(a_{11}=-2,a_{22}=3\) and \(a_{33}=1\).

A special role is played by the so-called unit matrix \(I\), also known as the identity matrix. This is a matrix with elements \[\delta_{ij}:=\begin{cases} 1 & i=j\\ 0 & i\neq j\end{cases} ,\] i.e., a diagonal matrix with all diagonal elements equal to 1: \[I=\begin{pmatrix}1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}.\] The symbol \(\delta_{ij}\) is often called the Kronecker delta. If we want to specify the size of the unit matrix we write \(I_n\) for the \(n\times n\) unit matrix (although often the size will be left implicit). The unit matrix is the identity element for matrix multiplication, so for any \(m\times n\) matrix \(A\) we have \[AI_n=I_mA=A .\]
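As a quick illustration (a NumPy sketch, not part of the text), `np.eye(n)` produces \(I_n\), and multiplying by it on either side leaves a matrix unchanged.

```python
import numpy as np

A = np.array([[1.0, 0.0, -4.0],
              [-1.0, 4.0, 2.0]])      # a 2x3 matrix
I3, I2 = np.eye(3), np.eye(2)
print(np.allclose(A @ I3, A) and np.allclose(I2 @ A, A))   # True: A I_3 = I_2 A = A
```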

Let us now look at a couple of examples of products of matrices. We will start by looking at an example of multiplying a pair of \(2\times 2\) matrices.

Example 3.16:

We have that \[\begin{split} \begin{pmatrix}1 & 2 \\ -1 & 0\end{pmatrix}\begin{pmatrix}-5 & 1 \\ 3 & -1\end{pmatrix}&=\begin{pmatrix}1\times(-5)+2\times 3 & 1\times 1+2\times (-1)\\ -1\times (-5)+0\times 3& -1\times 1+0\times (-1)\end{pmatrix}\\ &=\begin{pmatrix}1 & -1\\ 5 & -1\end{pmatrix}, \end{split}\]

where we have explicitly written out the intermediate step, in which each element of the product matrix is the dot product of a row vector of the first matrix with a column vector of the second matrix.
Exercise 3.17:
Is matrix multiplication commutative? That is, must we have \(AB=BA\) for any two matrices \(A\) and \(B\)?
Example 3.18:
Let us compute the product from Example 3.16 the other way round: \[\begin{pmatrix}-5 & 1 \\ 3 & -1\end{pmatrix}\begin{pmatrix}1 & 2 \\ -1 & 0\end{pmatrix}=\begin{pmatrix}-6 & -10 \\ 4 & 6\end{pmatrix}\] and we see that the result is different.

So, contrary to the multiplication of numbers, the product of matrices depends on the order in which we take the product. In other words, matrix multiplication is not commutative: in general \[AB\neq BA .\]
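The two products from Examples 3.16 and 3.18 can be checked directly; a minimal NumPy sketch (not part of the text):

```python
import numpy as np

P = np.array([[1, 2], [-1, 0]])
Q = np.array([[-5, 1], [3, -1]])
print(P @ Q)                          # [[ 1 -1], [ 5 -1]]
print(Q @ P)                          # [[-6 -10], [ 4  6]]
print(np.array_equal(P @ Q, Q @ P))   # False: the two products differ
```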

 

A few other interesting matrix products are noted below; a short numerical check of all three follows the list.

  • It is possible for the product of two non-zero matrices to be a zero matrix, for example \(\begin{pmatrix}1 & 0 \\ 0 & 0\end{pmatrix}\begin{pmatrix}0 & 0 \\ 0 & 1\end{pmatrix}=\begin{pmatrix}0 & 0 \\ 0 & 0\end{pmatrix} =0\). We use \(0\) to denote a zero matrix (again, the size of this should be clear from the context, or can be specified using subscript notation).

  • Similarly, the square of a non-zero matrix can be \(0\), for example \(\begin{pmatrix}0 & 1 \\ 0 & 0 \end{pmatrix}^2=\begin{pmatrix}0 & 1 \\ 0 & 0 \end{pmatrix}\begin{pmatrix}0 & 1 \\ 0 & 0 \end{pmatrix}= \begin{pmatrix}0 & 0 \\ 0 & 0\end{pmatrix}=0\).

  • Let \(J=\begin{pmatrix}0 & -1 \\ 1 & 0\end{pmatrix}\) then \(J^2=-I\), i.e., the square of \(J\) is \(-I\), very similar to \(\mathrm{i}=\sqrt{-1}\).
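Here is the promised check of all three products (a NumPy sketch, not part of the text):

```python
import numpy as np

E1 = np.array([[1, 0], [0, 0]])
E2 = np.array([[0, 0], [0, 1]])
print(E1 @ E2)                        # the zero matrix, although E1, E2 are non-zero

N = np.array([[0, 1], [0, 0]])
print(N @ N)                          # the zero matrix: N^2 = 0

J = np.array([[0, -1], [1, 0]])
print(np.array_equal(J @ J, -np.eye(2, dtype=int)))   # True: J^2 = -I
```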

Exercise 3.19:
Let \(A, B, C \in M_{n}(\mathbb{R})\) be non-zero matrices. Is it true that if \(AB = AC,\) then \(B=C\)?
Click for solution
No, this is not true in general. The first two bullet points above give a counterexample. However, if \(A\) is invertible, a concept we will explore towards the end of the chapter, then \(B=C\) does follow.

 

These examples show that matrix multiplication behaves very differently from the multiplication of numbers that we are used to.

It is also instructive to look at products of matrices which are not square matrices. Recall that by definition we can only form the product of \(A\) and \(B\), \(AB\), if the number of rows of \(B\) is equal to the number of columns of \(A\).

Example 3.20:
Consider for instance the following matrices \[A=\begin{pmatrix}1 & -1 &2\end{pmatrix},\quad B=\begin{pmatrix}2\\ 0\\1\end{pmatrix},\quad C=\begin{pmatrix}1 & 3 & 0\\ -2 & 1& 3\end{pmatrix}, \quad D=\begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix},\] then \(A\) is a \(1\times 3\) matrix, \(B\) a \(3\times 1\) matrix, \(C\) a \(2\times 3\) matrix and \(D\) a \(2\times 2\) matrix. So we can form the following products \[AB=4 ,\quad BA =\begin{pmatrix}2 & -2 & 4\\ 0 & 0 & 0\\ 1 & -1 & 2\end{pmatrix},\quad CB=\begin{pmatrix}2 \\ -1\end{pmatrix}, \quad DC=\begin{pmatrix}-2 & 1 & 3 \\ 1 & 3 & 0\end{pmatrix}, \quad D^2=I\] and no others. Notice that the product of \(A\) and \(B\) is the \(1 \times 1\) matrix \(\begin{pmatrix} 4 \end{pmatrix}\), but this is naturally identified with the scalar \(4\).
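The dimension bookkeeping in this example can also be automated. The short NumPy sketch below (illustrative, not part of the text) tests each ordered pair and lists exactly which products are defined, together with their shapes.

```python
import numpy as np

A = np.array([[1, -1, 2]])                 # 1x3
B = np.array([[2], [0], [1]])              # 3x1
C = np.array([[1, 3, 0], [-2, 1, 3]])      # 2x3
D = np.array([[0, 1], [1, 0]])             # 2x2
mats = {"A": A, "B": B, "C": C, "D": D}

# XY is defined exactly when the number of columns of X equals the number of rows of Y
for name1, X in mats.items():
    for name2, Y in mats.items():
        if X.shape[1] == Y.shape[0]:
            print(name1 + name2, "has shape", (X @ Y).shape)
# prints AB, BA, CB, DC, DD and nothing else
```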

There are a few types of matrices which occur quite often and therefore have special names. We will give a list of some we will encounter:

  • Triangular matrices: These come in two types,

    • upper triangular: \(A=(a_{ij})\) with \(a_{ij}=0\) if \(i>j\), e.g., \(\begin{pmatrix}1 &3& -1 \\ 0 & 2 &1\\ 0 & 0 & 3\end{pmatrix}\)

    • lower triangular: \(A=(a_{ij})\) with \(a_{ij}=0\) if \(i<j\), e.g., \(\begin{pmatrix}1 &0& 0 \\ 5 & 2 &0\\ 2 & -7 & 3\end{pmatrix}\)

  • Symmetric matrices: \(A=(a_{ij})\) with \(a_{ij}=a_{ji}\), e.g., \(\begin{pmatrix}1 & 2 & 3 \\ 2& -1 & 0 \\ 3 & 0 & 1\end{pmatrix}\).

  • Anti- (or skew-)symmetric matrices: \(A=(a_{ij})\) with \(a_{ij}=-a_{ji}\), e.g., \(\begin{pmatrix}0 & -1 & 2 \\ 1 & 0 & 3 \\ -2 & -3 & 0\end{pmatrix}\).

The following operation on matrices occurs quite often in applications.

Definition 3.21: (Transpose)
Let \(A=(a_{ij})\in M_{m,n}(\mathbb{R})\) then the transpose of \(A\), denoted \(A^t\), is a matrix in \(M_{n,m}(\mathbb{R})\) with elements \(A^t=(a_{ji})\) (the indices \(i\) and \(j\) are switched). In other words, \(A^t\) is obtained from \(A\) by exchanging the rows with the columns.
Example 3.22:
For the matrices \(A,B,C, D\) we considered in Example 3.20 we obtain \[A^t=\begin{pmatrix}1 \\ -1 \\2\end{pmatrix},\quad B^t=\begin{pmatrix}2&0 &1\end{pmatrix},\quad C^t=\begin{pmatrix}1 & -2\\ 3 & 1\\ 0 & 3\end{pmatrix}, \quad D^t=\begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix}.\]

A matrix is symmetric if \(A^t=A\) and anti-symmetric if \(A^t=-A\). Any square matrix \(A\in M_{n,n}(\mathbb{R})\) can be decomposed into a sum of a symmetric and an anti-symmetric matrix by \[A=\frac{1}{2}(A+A^t)+\frac{1}{2}(A-A^t) ,\] where the first summand is symmetric and the second is anti-symmetric.
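The decomposition is easy to compute and verify; a minimal NumPy sketch (not part of the text):

```python
import numpy as np

A = np.array([[1.0, 4.0, -2.0],
              [0.0, 3.0, 5.0],
              [6.0, -1.0, 2.0]])
S = (A + A.T) / 2        # symmetric part: S^t = S
K = (A - A.T) / 2        # anti-symmetric part: K^t = -K
print(np.allclose(S, S.T), np.allclose(K, -K.T), np.allclose(S + K, A))   # True True True
```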

One of the reasons why the transpose is important is the following relation with the dot-product.

Theorem 3.23:

Let \(A\in M_{m,n}(\mathbb{R})\), then we have for any \(x\in \mathbb{R}^n\) and \(y\in \mathbb{R}^m\) \[y\cdot Ax=(A^ty)\cdot x.\]

Proof.

The \(i\)th component of \(Ax\) is \(\displaystyle \sum_{j=1}^na_{ij}x_j\) and so \[\displaystyle y\cdot Ax=\sum_{i=1}^m\sum_{j=1}^n y_ia_{ij}x_j.\] On the other hand the \(j\)th component of \(A^ty\) is \(\displaystyle \sum_{i=1}^ma_{ij}y_i\) and so \[\displaystyle (A^ty)\cdot x=\sum_{j=1}^n\sum_{i=1}^m x_ja_{ij}y_i.\] Since the order of summation does not matter in a finite double sum the two expressions agree.
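A quick numerical sanity check of the identity (a NumPy sketch, not part of the text):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 5))     # m = 3, n = 5
x = rng.standard_normal(5)
y = rng.standard_normal(3)
print(np.isclose(y @ (A @ x), (A.T @ y) @ x))   # True: y . Ax = (A^t y) . x
```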

One important property of the transpose which can be derived from this relation is

Theorem 3.24:

Let \(A\in M_{m,n}(\mathbb{R})\) and \(B \in M_{n,k}(\mathbb{R})\). Then \[(AB)^t=B^tA^t .\]

Proof.

Using Theorem 3.23 for \((AB)\) gives \(\big((AB)^ty\big)\cdot x=y\cdot (ABx)\) and now we apply Theorem 3.23 first to \(A\) and then to \(B\) which gives \(y\cdot (ABx)=(A^ty)\cdot (Bx)=(B^tA^t y)\cdot x\) and so we have \[\big((AB)^ty\big)\cdot x=(B^tA^t y)\cdot x.\] Since this is true for any \(x,y\) we have \((AB)^t=B^t A^t\).
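Again the identity can be checked on random matrices (a NumPy sketch, not part of the text):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 4))     # m x n
B = rng.standard_normal((4, 2))     # n x k
print(np.allclose((A @ B).T, B.T @ A.T))   # True: (AB)^t = B^t A^t
```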

There is another connection between transposes and dot products. When we have a vector \(x\in \mathbb{R}^n\) and wish to view it as a matrix, the standard convention is to view it as a column vector, that is, as an \(n \times 1\) matrix. With this convention, multiplying an \(m \times n\) matrix \(A\) and \(x\) together gives the \(m \times 1\) matrix \(Ax\in \mathbb{R}^m\) as in Definition 3.9.

Moreover, if \(x= \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, y=\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} \in \mathbb{R}^n\), then \[x\cdot y= x_1y_1+\cdots + x_ny_n = \begin{pmatrix} x_1 & \cdots & x_n \end{pmatrix}\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = x^ty.\] So, for example, \(y\cdot Ax= y^tAx\). Beware that \(xy^t\) is very different: it is an \(n \times n\) matrix \[xy^t = \begin{pmatrix}x_1y_1 & x_1y_2& \cdots & x_1y_n \\ x_2y_1&x_2y_2&\cdots&x_2y_n\\ \vdots&\vdots&\ddots&\vdots \\ x_ny_1 & x_ny_2& \cdots & x_ny_n \end{pmatrix}= (x_iy_j) \in M_{n,n}(\mathbb{R}).\]
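The difference between \(x^ty\) (a scalar) and \(xy^t\) (an \(n\times n\) matrix) is worth seeing in code; a short NumPy sketch (not part of the text), with the vectors stored as column vectors:

```python
import numpy as np

x = np.array([[1.0], [2.0], [3.0]])    # column vectors, i.e. 3x1 matrices
y = np.array([[4.0], [0.0], [-1.0]])
print(x.T @ y)     # [[1.]]  -- the 1x1 matrix x^t y, identified with the scalar x . y
print(x @ y.T)     # a 3x3 matrix with entries x_i * y_j
```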

In this section we have primarily focused on multiplication of matrices, as this is our most important operation. However, before we move on it is worth mentioning that we can also add matrices and multiply them by scalars, both of which are just done componentwise.

Definition 3.25: (Matrix addition)
For matrices \(A=(a_{ij})\) and \(B=(b_{ij})\in M_{m,n}(\mathbb{R})\) we have that \(A+B=C\in M_{m,n}(\mathbb{R})\) with \(c_{ij}=a_{ij}+b_{ij}\).
Definition 3.26: (Scalar multiplication of matrices)
For a matrix \(A=(a_{ij})\in M_{m,n}(\mathbb{R})\) and \(\lambda \in \mathbb{R}\) we have that \(\lambda A=C\in M_{m,n}(\mathbb{R})\) with \(c_{ij}=\lambda a_{ij}\).
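Both operations act componentwise, as in the short sketch below (NumPy, illustrative only and not part of the text).

```python
import numpy as np

A = np.array([[1, 0, -4], [-1, 4, 2]])
B = np.array([[3, 1, 0], [-2, 0, 5]])
print(A + B)       # componentwise sum of the two 2x3 matrices
print(2 * A)       # each entry of A multiplied by the scalar 2
```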