4.1 Definition and basic properties

To begin with, we will use the axiomatic approach to define the determinant of a \(2\times 2\) matrix, and show that it gives the formula (4.1). We will generalise this to \(n\times n\) matrices later.

We will write the determinant as a function of the column vectors of a matrix [4]: it takes the column vectors as input and outputs a real number. Recall that for the \(2\times 2\) matrix \(A=\begin{pmatrix}a_{11} & a_{12}\\ a_{21} & a_{22}\end{pmatrix}\) the two column vectors are \(a_1=\begin{pmatrix}a_{11}\\a_{21}\end{pmatrix}\) and \(a_2=\begin{pmatrix}a_{12}\\ a_{22}\end{pmatrix}\).

Definition 4.1: (2-determinant function)

A \(2\)-determinant function \(d_2(a_1,a_2)\) is a function \[d_2 :\mathbb{R}^2\times \mathbb{R}^2\to\mathbb{R},\] which satisfies the following three conditions:

  • Multilinearity (ML): The function is linear in each argument, that is

    • \(d_2(\lambda a_1+\mu b_1,a_2)=\lambda d_2(a_1,a_2)+\mu d_2(b_1,a_2)\) for all \(\lambda,\mu\in\mathbb{R}\) and \(a_1,a_2,b_1\in \mathbb{R}^2\), and

    • \(d_2(a_1,\lambda a_2+\mu b_2)=\lambda d_2(a_1,a_2)+\mu d_2(a_1,b_2)\) for all \(\lambda,\mu\in\mathbb{R}\) and \(a_1,a_2,b_2\in \mathbb{R}^2\).

  • Alternating (A): The function is antisymmetric under exchange of its arguments, so \[d_2(a_2,a_1)=-d_2(a_1,a_2)\] for all \(a_1,a_2\in \mathbb{R}^2\).

  • Normalisation (N): \(d_2(e_1,e_2)=1\).

These three conditions prescribe what happens to the determinant if we manipulate the columns of a matrix, e.g., (A) says that exchanging columns changes the sign. In particular we can rewrite (A) as \[d_2(a_1,a_2)+d_2(a_2,a_1)=0 ,\] and so if \(a_1=a_2=a\), then \[\begin{equation} d_2(a,a)=0. \tag{4.2}\end{equation}\] That means if the two columns in a matrix are equal, then the determinant is \(0\).

The first condition can be used to find out how a determinant function behaves under elementary column operations on the matrix [5]. For example, if we multiply column 1 by \(\lambda\), then \[d_2(\lambda a_1,a_2)=\lambda d_2(a_1,a_2) ,\] and if we add \(\lambda\) times column \(2\) to column \(1\) we get \[d_2(a_1+\lambda a_2,a_2)=d_2(a_1,a_2)+\lambda d_2(a_2,a_2)=d_2(a_1,a_2) ,\] by (4.2).

Now let us see how much the conditions in the definition restrict the function \(d_2\). If we write \(a_1=a_{11}e_1+a_{21}e_2\) and \(a_2=a_{12}e_1+a_{22}e_2\), then we can use multilinearity to obtain \[\begin{split} d_2(a_1,a_2)&=d_2(a_{11}e_1+a_{21}e_2,a_{12}e_1+a_{22}e_2)\\ &=a_{11}d_2(e_1,a_{12}e_1+a_{22}e_2)+a_{21}d_2(e_2,a_{12}e_1+a_{22}e_2)\\ &=a_{11}a_{12}d_2(e_1,e_1)+a_{11}a_{22}d_2(e_1,e_2) +a_{21}a_{12}d_2(e_2,e_1)+a_{21}a_{22}d_2(e_2,e_2) . \end{split}\] This means that the function is completely determined by its values on pairs of the vectors \(e_i\). Now (4.2) implies that \[d_2(e_1,e_1)=d_2(e_2,e_2)=0 ,\] and by antisymmetry \(d_2(e_2,e_1)=-d_2(e_1,e_2)\), hence \[d_2(a_1,a_2)=(a_{11}a_{22}-a_{21}a_{12})d_{2}(e_1,e_2) .\] Finally, the normalisation \(d_2(e_1,e_2)=1\) means that \(d_2\) is actually uniquely determined and \[d_2(a_1,a_2)=a_{11}a_{22}-a_{21}a_{12} =\det A,\] as defined by formula (4.1). Let us also note that if we invoke only the axioms (ML) and (A), we get \[d_2(a_1,a_2) = \det A \cdot d_{2}(e_1,e_2),\] that is, any such function \(d_2\), without the (N) axiom, equals \(\det A\) times the number \(d_{2}(e_1,e_2)\), its value on the identity matrix.

But with all the three axioms in play, there is only one determinant function \(d_2\) satisfying the three properties (ML), (A), (N), and it coincides with the expression (4.1), i.e., \[d_2 (a_1,a_2)=a_{11}a_{22}-a_{21}a_{12}=\det \begin{pmatrix}a_{11} & a_{12}\\ a_{21} & a_{22}\end{pmatrix}.\]
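As a quick numerical sanity check, here is a minimal sketch (assuming NumPy is available; the matrix entries are just illustrative) comparing the formula \(a_{11}a_{22}-a_{21}a_{12}\) with NumPy's built-in determinant routine.

```python
import numpy as np

# A sample 2x2 matrix; its columns play the role of a_1 and a_2.
A = np.array([[3.0, 5.0],
              [2.0, 4.0]])

# The formula derived above: a11*a22 - a21*a12.
d2 = A[0, 0] * A[1, 1] - A[1, 0] * A[0, 1]

# Compare with NumPy's determinant routine.
print(d2, np.linalg.det(A))              # both should be 2.0
assert np.isclose(d2, np.linalg.det(A))
```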

The determinant was originally not introduced this way; it emerged from the study of systems of linear equations as a combination of coefficients whose value indicates whether the system is solvable.

The conditions in the definition probably seem a bit ad hoc. If we think of \(d_2(a,b)\) as a “signed area” they seem less strange: consider the area of the parallelogram with sides \(a, b\), with sign \(+\) if the direction of \(b\) is obtained from that of \(a\) by an anti-clockwise rotation through an angle less than \(\pi\) radians, and with sign \(-\) otherwise. So for \(e_1, e_2\) we get a signed area of \(1\) (this is property (N)), while for \(e_1, -e_2\) we get a signed area of \(-1\). Swapping \(a,b\) to \(b,a\) switches the sign of the signed area, giving (A). Finally, on drawing a picture it is easy to see that \(a, b\) and \(a, b+\lambda a\) give the same signed areas, and (ML) can be shown similarly.

We extend the definition now to \(n\times n\) matrices:

Definition 4.2: (\(n\)-determinant function)

An \(n\)-determinant function \(d_n(a_1,a_2,\cdots, a_n)\) is a function \[d_n :\underbrace{\mathbb{R}^n\times \mathbb{R}^n\times \cdots \times \mathbb{R}^n}_{n\text{ times}}\to \mathbb{R},\] which satisfies the following three conditions:

  • Multilinearity (ML): The function is linear in each column, that is for any \(j\) and any \(a_j,b_j\in\mathbb{R}^n\), \(\lambda,\mu\in\mathbb{R}\) \[d_n(\cdots , \lambda a_j+\mu b_j,\cdots)=\lambda d_n(\cdots , a_j,\cdots) + \mu d_n(\cdots , b_j,\cdots) ,\] where the \(\cdots\) mean the other \(n-1\) vectors stay fixed.

  • Alternating (A): The function is antisymmetric in each pair of arguments, that is whenever we exchange two vectors we pick up a factor \(-1\), so if \(i\neq j\) then \[d_n(\cdots, a_i,\cdots,a_j,\cdots)=-d_n(\cdots, a_j,\cdots,a_i,\cdots) .\]

  • Normalisation (N): \(d_n(e_1,e_2,\cdots,e_n)=1\).

We will sometimes call these three properties the axioms of the determinant. We have formulated the determinant function as a function of vectors; to connect it to matrices we take these vectors to be the column vectors of a matrix. The properties (ML) and (A) then correspond to column operations in the same way as we discussed after the definition of a 2-determinant. The property (N) means that the identity matrix has determinant \(1\).

Before proceeding to the proof that there is only one \(n\)-determinant, let us consider some properties of the determinant.

Exercise 4.3:
Let \(A \in M_n(\mathbb{R})\) and \(\lambda\in \mathbb{R}\). Does \(d_n(\lambda A)=\lambda d_n(A)\)?

We next consider a generalisation of (4.2).

Proposition 4.4:

Let \(d_n(a_1,a_2,\cdots, a_n)\) be an \(n\)-determinant, then

  1. whenever one of the vectors \(a_1,a_2,\cdots ,a_n\) is \(\mathbf{0}\) then \[d_n(a_1,a_2,\cdots, a_n)=0.\]

  2. whenever two of the vectors \(a_1,a_2,\cdots ,a_n\) are equal, then \[d_n(a_1,a_2,\cdots, a_n)=0.\]

Proof.

  1. We use multilinearity. We have for any \(a_j\) that \(d_n(\cdots ,\lambda a_j, \cdots)=\lambda d_n(\cdots ,a_j, \cdots)\) for any \(\lambda\in \mathbb{R}\), and setting \(\lambda=0\) gives \(d_n(\cdots ,\mathbf{0},\cdots )=0\).

  2. We rewrite condition \((A)\) in the definition as \[d_n(\cdots, a_i,\cdots,a_j,\cdots)+d_n(\cdots, a_j,\cdots,a_i,\cdots)=0 ,\] and so if \(a_i=a_j\), then \(2d_n(\cdots, a_i,\cdots,a_j,\cdots)=0\), hence \(d_n(\cdots, a_i,\cdots,a_j,\cdots)=0\).


As a direct consequence we obtain the following useful property: we can add to a column any multiple of one of the other columns without changing the value of the determinant function.

Corollary 4.5:

We have for any \(j\neq i\) and \(\lambda\in\mathbb{R}\) that \[d_n(a_1, \cdots , a_i+\lambda a_j, \cdots a_n)=d_n(a_1, \cdots , a_i, \cdots a_n) .\]

Proof.

By multilinearity we have \[d_n(a_1, \cdots , a_i+\lambda a_j, \cdots a_n)=d_n(a_1, \cdots , a_i, \cdots a_n)+\lambda d_n(a_1, \cdots , a_j, \cdots a_n),\] but in the second term two of the vectors in the arguments are the same, hence the term is \(0\).
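The following short sketch (a hypothetical example, assuming NumPy; the matrix is chosen arbitrarily) checks Proposition 4.4(2) and Corollary 4.5 numerically for a \(3\times 3\) matrix.

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [4.0, 1.0, 3.0],
              [2.0, 5.0, 1.0]])

# Proposition 4.4(2): two equal columns give determinant 0.
B = A.copy()
B[:, 2] = B[:, 0]                    # make column 3 equal to column 1
print(np.linalg.det(B))              # approximately 0

# Corollary 4.5: adding a multiple of one column to another
# leaves the determinant unchanged.
C = A.copy()
C[:, 0] = C[:, 0] + 7.0 * C[:, 1]    # column 1 -> column 1 + 7 * column 2
assert np.isclose(np.linalg.det(C), np.linalg.det(A))
```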

Exercise 4.6:

What effect will the elementary column operations

  1. swapping two columns,

  2. multiplying a column by \(\lambda \in \mathbb{R}\),

  3. adding a multiple of one column to another

have on the determinant of a matrix?

Click for solution

We have that

  1. alters the sign of the value of \(d_n\),

  2. multiplies the determinant by \(\lambda\), and

  3. does not change the determinant (see the numerical check below).
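As a numerical illustration of the first two operations (the third was checked after Corollary 4.5), here is a minimal sketch, again assuming NumPy and an arbitrarily chosen matrix.

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [1.0, 0.0, 4.0]])
dA = np.linalg.det(A)

# 1. Swapping two columns changes the sign of the determinant.
swapped = A[:, [1, 0, 2]]            # exchange columns 1 and 2
assert np.isclose(np.linalg.det(swapped), -dA)

# 2. Multiplying one column by lambda multiplies the determinant by lambda.
lam = 5.0
scaled = A.copy()
scaled[:, 0] = lam * scaled[:, 0]
assert np.isclose(np.linalg.det(scaled), lam * dA)
```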

 

We can now compute the determinant function for some special classes of matrices.

Example 4.7:
For diagonal matrices, i.e., \(A=(a_{ij})\) with \(a_{ij}=0\) if \(i\neq j\), the columns are \(a_1=a_{11}e_1, a_2=a_{22}e_2, \cdots , a_n=a_{nn}e_n\), and using multilinearity in each argument and normalisation we get \[\begin{equation}\begin{split} d_n(a_{11}e_1, a_{22}e_2, \cdots , a_{nn}e_n)&=a_{11}d_n(e_1, a_{22}e_2, \cdots , a_{nn}e_n)\\ &=a_{11}a_{22}d_n(e_1, e_2, \cdots , a_{nn}e_n)\\ &=a_{11}a_{22}\cdots a_{nn}d_n(e_1, e_2, \cdots , e_n) =a_{11}a_{22}\cdots a_{nn} . \end{split}\tag{4.3}\end{equation}\]
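A quick numerical check of (4.3), assuming NumPy; the diagonal entries are chosen arbitrarily.

```python
import numpy as np

# A diagonal matrix: by (4.3) its determinant is the product of the
# diagonal entries.
d = np.array([2.0, -3.0, 5.0, 0.5])
A = np.diag(d)

assert np.isclose(np.linalg.det(A), d.prod())   # 2 * (-3) * 5 * 0.5 = -15
```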
Example 4.8:

Using the properties of a determinant function that we have established so far, we can actually already compute determinants, though not in a very efficient way. If we repeat what we did for \(d_2\) in the case of \(3\) vectors in \(\mathbb{R}^3\), we get the slightly cumbersome formula

\[\begin{equation}\det \begin{pmatrix}a_{11}&a_{12}&a_{13} \\ a_{21}&a_{22}&a_{23} \\ a_{31}&a_{32}&a_{33}\end{pmatrix}= a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{11}a_{23}a_{32} - a_{13} a_{22} a_{31} - a_{33}a_{12}a_{21}, \tag{4.4}\end{equation}\]

but we’ll see an easier way to find such determinants later (that works for \(n > 3\) too).
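The following sketch (assuming NumPy; the matrix entries are illustrative) compares the six-term formula (4.4) with NumPy's determinant routine.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [0.0, 4.0, 5.0],
              [1.0, 0.0, 6.0]])

# The six-term formula (4.4), written out term by term.
det_formula = (A[0,0]*A[1,1]*A[2,2] + A[0,1]*A[1,2]*A[2,0] + A[0,2]*A[1,0]*A[2,1]
               - A[0,0]*A[1,2]*A[2,1] - A[0,2]*A[1,1]*A[2,0] - A[2,2]*A[0,1]*A[1,0])

assert np.isclose(det_formula, np.linalg.det(A))   # both equal 22
```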

We can now show that there exists only one \(n\)-determinant function.

Theorem 4.9:

There exists one, and only one, \(n\)-determinant function.

Proof (Non-examinable).

We only give a sketch of the proof. Let us expand the vectors \(a_j\), \(j=1,\dots,n\), by writing them as a sum of multiples of the vectors \(e_1, \dots , e_n\) [6]: \[a_{j}=\sum_{i=1}^n a_{i j}e_{i} = \sum_{i_j=1}^n a_{i_j j}e_{i_j} .\] The reason for the second expression is bookkeeping: we will have to use different indices \(i_j\) in the expansion for every vector \(a_j\) involved. So the subscript \(j\) on the summation index \(i_j\) just indicates that this index belongs to the expansion of \(a_j\).

Insert these expansions into \(d_n(a_1,a_2, \cdots, a_n)\). Doing this for \(a_1\) and using multilinearity gives \[d_n(a_1,a_2, \cdots, a_n)=d_n\bigg(\sum_{i_1=1}^n a_{i_1 1}e_{i_1} ,a_2, \cdots, a_n\bigg) =\sum_{i_1=1}^n a_{i_1 1}d_n(e_{i_1} ,a_2, \cdots, a_n) .\] Repeating the same step for \(a_2, a_3\), etc., gives us \[\begin{equation} d_n(a_1,a_2, \cdots, a_n)=\sum_{i_1=1}^n \sum_{i_2=1}^n\cdots \sum_{i_n=1}^na_{i_1 1}a_{i_2 2}\cdots a_{i_n n} d_n(e_{i_1} ,e_{i_2}, \cdots, e_{i_n}). \tag{4.5}\end{equation}\]

This formula tells us that the function \(d_n\) is determined by its values on the vectors \(e_1, \dots , e_n\). There are \(n^n\) choices of ordered \(n\)-tuples \((e_{i_1} ,e_{i_2}, \cdots, e_{i_n})\). By Proposition 4.4, whenever at least two of the vectors \(e_{i_1} ,e_{i_2}, \cdots, e_{i_n}\) are equal we have \(d_n(e_{i_1} ,e_{i_2}, \cdots, e_{i_n})=0\). Hence there are at most \(n!\) non-zero terms in the sum, corresponding to the different ways of ordering the indices \(1,2,\ldots,n\).

If the vectors \(e_{i_1},\ldots,e_{i_n}\) are all different, we can swap pairs of vectors, and after finitely many steps we reach \(e_1,\ldots,e_n\), picking up a factor \(-1\) each time by (A); so if there are \(k\) swaps we get \(d_n(e_{i_1},\ldots,e_{i_n})=(-1)^k d_n(e_1,\ldots,e_n)=(-1)^k\) by (N). So we have determined the unique value of each term, and hence of the sum, in (4.5).

What we haven’t shown is existence: perhaps no such function \(d_n\) can be defined. We don’t prove this here as it uses a little group theory, namely permutations. The rearrangement of \(e_{i_1},\ldots,e_{i_n}\) into \(e_1,\ldots,e_n\) is a permutation \(\sigma\) of the set \(\{1,\ldots,n\}\). While the number of swaps \(k\) is not well-defined, its parity is, so the sign \(\mathrm{sign}(\sigma)=(-1)^k\) is well-defined, and we can define the determinant using the Leibniz formula: \[d_n(a_{1} ,a_{2}, \cdots, a_{n}) := \sum_{\text{permutations } \sigma} \mathrm{sign}(\sigma)\, a_{\sigma(1)1}a_{\sigma(2)2}\cdots a_{\sigma(n)n}\ .\] Using group theory one can show that this function satisfies the three axioms.

Given this result, we define the determinant of a matrix by applying \(d_n\) to its column vectors \(a_1,\ldots,a_n\), so \[\begin{equation}\det A := d_n(a_{1} ,a_{2}, \cdots, a_{n}) = \sum_{\sigma\in S_n} \mathrm{sign}(\sigma)\, a_{\sigma(1)1}a_{\sigma(2)2}\cdots a_{\sigma(n)n} = \sum_{\sigma\in S_n} \mathrm{sign}(\sigma) \prod_{j=1}^n a_{\sigma(j)j}, \tag{4.6}\end{equation}\] where \(S_n\) denotes the set of all permutations of \(\{1,\ldots,n\}\).
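For readers who like to experiment, here is a minimal sketch of the Leibniz formula (4.6) in code, assuming NumPy and the standard library; the helper `sign`, which computes \(\mathrm{sign}(\sigma)\) by counting inversions, is our own illustrative choice and not part of the text. It is hopelessly inefficient for large \(n\) (the sum has \(n!\) terms), but it agrees with NumPy's determinant for small matrices.

```python
import itertools
import numpy as np

def sign(perm):
    """Sign (-1)^k of a permutation, computed via the number of inversions."""
    inversions = sum(1 for i in range(len(perm))
                       for j in range(i + 1, len(perm))
                       if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def det_leibniz(A):
    """Leibniz formula (4.6): sum over permutations sigma of
    sign(sigma) * a_{sigma(1)1} * ... * a_{sigma(n)n}."""
    n = A.shape[0]
    total = 0.0
    for sigma in itertools.permutations(range(n)):
        term = sign(sigma)
        for j in range(n):
            term *= A[sigma[j], j]
        total += term
    return total

A = np.random.rand(4, 4)
assert np.isclose(det_leibniz(A), np.linalg.det(A))
```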

From now on we have two equivalent notations for the determinant function: \(\det\) and \(d_n\). We will mostly use the former, as far as square matrices are concerned, but may return to \(d_n\) when we view the determinant as a function of the columns of the matrix.

In fact, the argument in the proof of Theorem 4.9 shows more than just the uniqueness of the determinant. It also proves the more general theorem below, which says that any function that is multilinear and alternating is a scalar multiple of the determinant function, the scalar being the value of that function on the identity matrix.

Theorem 4.10:

Let \(f_n\) be a real-valued function of \(n\) vectors \(a_1,a_2, \cdots, a_n\in \mathbb{R}^n\) satisfying just (ML) and (A). Then \[f_n(a_1,a_2, \cdots, a_n) = C\cdot \det A,\] where \(C= f_n(e_1, \cdots, e_n)\) and \(A\) is the matrix whose columns are \(a_1,a_2, \cdots, a_n\).

Let us now continue with computing some determinants. We learned in (4.3) that the determinant of a diagonal matrix is just the product of the diagonal elements. The same is true for upper triangular matrices.

Theorem 4.11:

Let \(A=(a_{ij})\in M_n(\mathbb{R})\) be upper triangular, i.e., \(a_{ij}=0\) if \(i>j\), and let \(a_1, a_2,\cdots, a_n\) be the column vectors of \(A\). Then we have \[\det A=a_{11}a_{22}\cdots a_{nn} ,\] i.e., the determinant is the product of the diagonal elements.

Proof.

Let us first assume that all the diagonal elements \(a_{ii}\) are nonzero. The matrix \(A\) is of the form \[A=\begin{pmatrix} a_{11} & a_{12} & a_{13} &\cdots & a_{1n}\\ 0 & a_{22} & a_{23} &\cdots & a_{2n}\\ 0 & 0 & a_{33} & \cdots & a_{3n}\\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0& \cdots & a_{nn}\end{pmatrix}.\] In the first step we subtract multiples of column 1 from the other columns to remove the entries in the first row, i.e., we replace \(a_2\) by \(a_2-(a_{12}/a_{11}) a_1\), \(a_3\) by \(a_3-(a_{13}/a_{11}) a_1\), etc. By Corollary 4.5 these operations do not change the determinant and hence we have \[\det A=\det \begin{pmatrix} a_{11} &0 & 0 &\cdots & 0\\ 0 & a_{22} & a_{23} &\cdots & a_{2n}\\ 0 & 0 & a_{33} & \cdots & a_{3n}\\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0& \cdots & a_{nn} \end{pmatrix}.\] In the next step we repeat the same procedure with the second row, i.e., we subtract suitable multiples of the second column from the columns to its right, and then we continue with the third row, etc. At the end we arrive at a diagonal matrix, and then by (4.3) \[\det A=\det \begin{pmatrix} a_{11} &0 & 0 &\cdots & 0\\ 0 & a_{22} & 0 &\cdots & 0\\ 0 & 0 & a_{33} & \cdots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0& \cdots & a_{nn} \end{pmatrix}=a_{11}a_{22} a_{33} \cdots a_{nn} .\] If one of the diagonal elements is \(0\), we can follow the same procedure until we arrive at the first column whose diagonal element is \(0\). This column is then entirely \(0\), and so by Proposition 4.4 the determinant is \(0\), which agrees with the product \(a_{11}a_{22}\cdots a_{nn}\), since one of its factors is \(0\).

Example 4.12:
We have that \(\det \begin{pmatrix}1 & 4 & 7\\ 0 & 1 & 3 \\ 0 & 0 & -2\end{pmatrix}=-2.\)
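A one-line numerical check of Theorem 4.11 on the matrix of Example 4.12, assuming NumPy.

```python
import numpy as np

# The upper triangular matrix from Example 4.12.
A = np.array([[1.0, 4.0, 7.0],
              [0.0, 1.0, 3.0],
              [0.0, 0.0, -2.0]])

# Theorem 4.11: the determinant is the product of the diagonal entries.
assert np.isclose(np.linalg.det(A), np.prod(np.diag(A)))   # both equal -2
```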

Let us now collect a few important properties of the determinant.

Theorem 4.13:

Let \(A\) and \(B\) be \(n\times n\) matrices, then \[\det(AB)=\det(A)\det(B) .\]

Proof.

The theorem follows from Theorem 4.10. Let \(B=(b_1\; \ldots\;b_n)\). The columns of \(AB\) are \(Ab_1,\ldots,Ab_n\), so \(\det(AB) = d_n(Ab_1,\ldots, Ab_n)\), where \(d_n\) is the unique determinant function satisfying the axioms (ML), (A), and (N).

Now consider \(\det(AB)\) as a function of just \(b_1,\ldots,b_n\), that is, \(\det(AB)=f_n(b_1,\ldots,b_n)\) for some function \(f_n\). This \(f_n\) satisfies (ML), because multiplication on the left by \(A\) is linear in each \(b_j\), and it satisfies (A), since swapping \(b_i\) and \(b_j\) swaps \(Ab_i\) and \(Ab_j\). So by Theorem 4.10 the value of this function equals \(\det B\) times the constant \(C\) obtained by setting \(b_j=e_j\) for \(j=1,\ldots,n\). But in that case \(Ae_1 = a_1, \ldots,Ae_n=a_n\); in other words, if \(b_j=e_j\) for \(j=1,\ldots,n\), then \(\det (AB)=\det(AI)=\det A\). Hence \(C=\det A\), and \(\det(AB)=\det(A)\det(B)\).

This multiplicative property of determinants is very important. Note that in general \(AB\neq BA\); however, according to this theorem we do have \(\det(AB)=\det(BA)= \det(A)\det(B)\).
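A quick numerical illustration of Theorem 4.13 and of this remark, assuming NumPy and using random matrices.

```python
import numpy as np

A = np.random.rand(4, 4)
B = np.random.rand(4, 4)

# Theorem 4.13: the determinant is multiplicative ...
assert np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))

# ... even though matrix multiplication is generally not commutative.
assert np.isclose(np.linalg.det(A @ B), np.linalg.det(B @ A))
```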

Exercise 4.14:
Is the determinant linear? That is, do we always have \(\det(A+B)= \det A+\det B\)?
Click for solution

The determinant is not linear, so in general \[\det(A+B)\neq \det A+\det B.\] For example if \(A=B=I_2\) then \(\det A=\det B=1\) but \(\det(A+B)=4\).

 

One of the consequences of Theorem 4.13 is that if \(A\) is invertible, i.e., there exists an \(A^{-1}\) such that \(A^{-1}A=I\), then \(\det A^{-1} \det A=1\), and hence \(\det A\neq 0\) and \[\det A^{-1}=\frac{1}{\det A} .\] So if \(A\) is invertible, then \(\det A\neq 0\). This is an important result, and one we will return to later in Chapter 6.
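A small sketch checking \(\det A^{-1}=1/\det A\) on a concrete invertible matrix, assuming NumPy; the matrix is illustrative.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [5.0, 3.0]])             # det A = 1, so A is invertible

A_inv = np.linalg.inv(A)
assert np.isclose(np.linalg.det(A_inv), 1.0 / np.linalg.det(A))
```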

Finally, we have the following theorem, which we will not prove.

Theorem 4.15:

Let \(A\) be an \(n\times n\) matrix, then \[\det A=\det A^t.\]

Let us comment on the meaning of this result. We defined the determinant of a matrix in two steps: we first defined the determinant function \(d_n(a_1,a_2,\cdots,a_n)\) as a function of \(n\) vectors, and then we related it to a matrix \(A\) by choosing for \(a_1,a_2,\cdots,a_n\) the column vectors of \(A\). We could have chosen the row vectors of \(A\) instead; that would have given an alternative definition of the determinant. The theorem tells us that both ways we get the same result.
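A numerical check of Theorem 4.15 on a random matrix, assuming NumPy.

```python
import numpy as np

A = np.random.rand(5, 5)

# Theorem 4.15: a matrix and its transpose have the same determinant.
assert np.isclose(np.linalg.det(A), np.linalg.det(A.T))
```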

Properties (ML) and (A) from the basic Definition 4.2 tell us what happens to determinants if we manipulate the columns by linear operations, in particular they tell us what happens if we apply elementary column operations to the matrix. But using \(\det A^t=\det A\) we get the same properties for elementary row operations, as the following theorem states.

Theorem 4.16:

Let \(A\) be an \(n\times n\) matrix, then we have:

  • If \(A'\) is obtained from \(A\) by exchanging two rows, then \(\det A'=-\det A\) and if \(E\) is the elementary matrix corresponding to the row exchange, then \(\det E=-1\).

  • If \(A'\) is obtained from \(A\) by adding \(\lambda\) times row \(j\) to row \(i\), (\(i\neq j\)), then \(\det A =\det A'\) and the corresponding elementary matrix satisfies \(\det E=1\).

  • If \(A'\) is obtained from \(A\) by multiplying row \(i\) by \(\lambda\in \mathbb{R}\), then \(\det A'=\lambda\det A\) and the corresponding elementary matrix satisfies \(\det E=\lambda\).

An interesting consequence of this result is that it shows, independently of Theorem 4.13, that \(\det (EA)=\det E\det A\) for any elementary matrix \(E\). We can see this by simply computing the left- and right-hand sides for each case in Theorem 4.16. This observation can be used to give a different proof of the multiplicative property \(\det (AB)=\det A\det B\); the main idea is to write \(A\) as a product of elementary matrices, which turns out to be possible whenever \(A\) is non-singular, and then use the multiplicative property for elementary matrices.
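The following sketch (assuming NumPy; the particular elementary matrices are illustrative) builds one elementary matrix of each type and checks both the determinant values from Theorem 4.16 and the relation \(\det(EA)=\det E\det A\).

```python
import numpy as np

n = 4
A = np.random.rand(n, n)
I = np.eye(n)

# Row-exchange elementary matrix (swap rows 0 and 2): det E = -1.
E_swap = I[[2, 1, 0, 3], :]
# Add 3 times row 1 to row 0: det E = 1.
E_add = I.copy()
E_add[0, 1] = 3.0
# Multiply row 2 by lambda = 7: det E = lambda.
E_scale = I.copy()
E_scale[2, 2] = 7.0

for E in (E_swap, E_add, E_scale):
    # det(EA) = det(E) det(A), checked case by case as in the text.
    assert np.isclose(np.linalg.det(E @ A), np.linalg.det(E) * np.linalg.det(A))

print(np.linalg.det(E_swap), np.linalg.det(E_add), np.linalg.det(E_scale))
# approximately -1.0, 1.0, 7.0
```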

Similarly, the results of Proposition 4.4 hold for rows as well.


  4. One could instead choose row vectors to define a determinant; both approaches give the same result. This is later expressed in the theorem that \(\det A=\det A^t\).

  5. Elementary column operations are defined the same way as elementary row operations. They can also be viewed as matrix multiplication by elementary matrices, but on the right, rather than on the left.

  6. This is known as a linear combination, an idea we will explore further in Chapters 5 and 6.