4.1 Definition and basic properties

To begin with, we will use the axiomatic approach to define the determinant of a 2\times 2 matrix, and show that it gives the formula (4.1). We will generalise this to n\times n matrices later.

We will write the determinant as a function of the column vectors of a matrix[1] which will take the column vectors as input and output a real number. Recall that for the 2\times 2 matrix A=\begin{pmatrix}a_{11} & a_{12}\\ a_{21} & a_{22}\end{pmatrix} the two column vectors are a_1=\begin{pmatrix}a_{11}\\a_{21}\end{pmatrix} and a_2=\begin{pmatrix}a_{12}\\ a_{22}\end{pmatrix}.

Definition 4.1: (2-determinant function)

A 2-determinant function d_2(a_1,a_2) is a function d_2 :\mathbb{R}^2\times \mathbb{R}^2\to\mathbb{R} which satisfies the following three conditions:

  • Multilinearity (ML): The function is linear in each argument, that is

    • d_2(\lambda a_1+\mu b_1,a_2)=\lambda d_2(a_1,a_2)+\mu d_2(b_1,a_2) for all \lambda,\mu\in\mathbb{R} and a_1,a_2,b_1\in \mathbb{R}^2, and

    • d_2(a_1,\lambda a_2+\mu b_2)=\lambda d_2(a_1,a_2)+\mu d_2(a_1,b_2) for all \lambda,\mu\in\mathbb{R} and a_1,a_2,b_2\in \mathbb{R}^2.

  • Alternating (A): The function is antisymmetric under exchange of arguments, so d_2(a_2,a_1)=-d_2(a_1,a_2) for all a_1,a_2\in \mathbb{R}^2.

  • Normalisation (N): d_2(e_1,e_2)=1.

These three conditions prescribe what happens to the determinant if we manipulate the columns of a matrix, e.g., (A) says that exchanging columns changes the sign. In particular we can rewrite (A) as d_2(a_1,a_2)+d_2(a_2,a_1)=0 , and so if a_1=a_2=a, then \begin{equation} d_2(a,a)=0. \tag{4.2}\end{equation} That means if the two columns in a matrix are equal, then the determinant is 0.

The first condition can be used to find out how a determinant function behaves under elementary column operations on the matrix.[2] For example, if we multiply column 1 by \lambda, then d_2(\lambda a_1,a_2)=\lambda d_2(a_1,a_2) , and if we add \lambda times column 2 to column 1 we get d_2(a_1+\lambda a_2,a_2)=d_2(a_1,a_2)+\lambda d_2(a_2,a_2)=d_2(a_1,a_2) , by (4.2).

Now let us see how much the conditions in the definition restrict the function d_2. If we write a_1=a_{11}e_1+a_{21}e_2 and a_2=a_{12}e_1+a_{22}e_2, then we can use multilinearity to obtain \begin{split} d_2(a_1,a_2)&=d_2(a_{11}e_1+a_{21}e_2,a_{12}e_1+a_{22}e_2)\\ &=a_{11}d_2(e_1,a_{12}e_1+a_{22}e_2)+a_{21}d_2(e_2,a_{12}e_1+a_{22}e_2)\\ &=a_{11}a_{12}d_2(e_1,e_1)+a_{11}a_{22}d_2(e_1,e_2) +a_{21}a_{12}d_2(e_2,e_1)+a_{21}a_{22}d_2(e_2,e_2) . \end{split} This means that the function is completely determined by its values on the vectors e_i. Now (4.2) implies that d_2(e_1,e_1)=d_2(e_2,e_2)=0 , and by antisymmetry d_2(e_2,e_1)=-d_2(e_1,e_2), hence d_2(a_1,a_2)=(a_{11}a_{22}-a_{21}a_{12})d_{2}(e_1,e_2) . Finally, the normalisation d_2(e_1,e_2)=1 means that d_2 is uniquely determined and d_2(a_1,a_2)=a_{11}a_{22}-a_{21}a_{12} =\det A, as defined by formula (4.1). Let us also note that if we invoke only the axioms (ML) and (A), we get d_2(a_1,a_2) = \det A \cdot d_{2}(e_1,e_2); that is, a function d_2 satisfying (ML) and (A) but not necessarily (N) equals \det A times the number d_{2}(e_1,e_2), its value on the identity matrix.

But with all three axioms in play, there is only one determinant function d_2 satisfying the three properties (ML), (A), (N), and it coincides with the expression (4.1), i.e., d_2 (a_1,a_2)=a_{11}a_{22}-a_{21}a_{12}=\det \begin{pmatrix}a_{11} & a_{12}\\ a_{21} & a_{22}\end{pmatrix}.
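If you want to convince yourself numerically, the following minimal sketch (assuming Python with numpy is available; the helper name d2 is our own, not a standard function) checks that the explicit formula a_{11}a_{22}-a_{21}a_{12} satisfies (ML), (A) and (N) on a few sample vectors.

```python
import numpy as np

def d2(a1, a2):
    # the explicit 2x2 formula: d_2(a_1, a_2) = a_11 a_22 - a_21 a_12
    return a1[0] * a2[1] - a1[1] * a2[0]

rng = np.random.default_rng(0)
a1, a2, b1 = rng.standard_normal((3, 2))
lam, mu = 2.0, -3.0

# (ML) in the first argument
assert np.isclose(d2(lam * a1 + mu * b1, a2), lam * d2(a1, a2) + mu * d2(b1, a2))
# (A): swapping the arguments changes the sign
assert np.isclose(d2(a2, a1), -d2(a1, a2))
# (N): value 1 on the standard basis vectors
assert d2(np.array([1.0, 0.0]), np.array([0.0, 1.0])) == 1.0
```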

The determinant was not originally introduced this way; it emerged from the study of systems of linear equations as a combination of coefficients that seemed to be indicative of solvability.

The conditions in the definition probably seem to be a bit ad hoc. If we think of d_2(a,b) as a “signed area” they seem less strange: consider the area of the parallelogram with sides a, b, with sign + if the direction of b is obtained from that of a by rotating up to \pi radians anti-clockwise, and with sign - otherwise. So for e_1, e_2 we get a signed area of 1 (this is property (N)), while for e_1, -e_2 we get a signed area of -1. Swapping a,b to b,a switches the sign of the signed area, giving (A). Finally, on drawing a picture it is easy to see that a, b and a, b+\lambda a give the same signed areas, and (ML) can be shown similarly.
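The claim that d_2 computes this signed area can also be spot-checked numerically. The sketch below (Python with numpy assumed; signed_area is our own illustrative helper) computes the signed area as |a|\,|b|\sin(\theta_b-\theta_a), using the angles of a and b, and compares it with the formula a_1b_2-a_2b_1.

```python
import numpy as np

def signed_area(a, b):
    # signed area of the parallelogram with sides a, b:
    # |a| |b| sin(angle from a to b), positive for an anticlockwise rotation of less than pi
    angle = np.arctan2(b[1], b[0]) - np.arctan2(a[1], a[0])
    return np.linalg.norm(a) * np.linalg.norm(b) * np.sin(angle)

rng = np.random.default_rng(6)
for _ in range(5):
    a, b = rng.standard_normal((2, 2))
    assert np.isclose(signed_area(a, b), a[0] * b[1] - a[1] * b[0])
```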

We extend the definition now to n\times n matrices:

Definition 4.2: (n-determinant function)

An n-determinant function d_n(a_1,a_2,\cdots, a_n) is a function d_n :\underbrace{\mathbb{R}^n\times \mathbb{R}^n\times \cdots \times \mathbb{R}^n}_{n\text{ times}}\to \mathbb{R} which satisfies

  • Multilinearity (ML): The function is linear in each argument, that is, for any j and any a_j,b_j\in\mathbb{R}^n, \lambda,\mu\in\mathbb{R}, d_n(\cdots , \lambda a_j+\mu b_j,\cdots)=\lambda d_n(\cdots , a_j,\cdots) + \mu d_n(\cdots , b_j,\cdots) , where the \cdots mean the other n-1 vectors stay fixed.

  • Alternating (A): The function is antisymmetric in each pair of arguments, that is, whenever we exchange two vectors we pick up a factor -1, so if i\neq j then d_n(\cdots, a_i,\cdots,a_j,\cdots)=-d_n(\cdots, a_j,\cdots,a_i,\cdots) .

  • Normalisation (N): d_n(e_1,e_2,\cdots,e_n)=1.

We will sometimes call these three properties the axioms of the determinant. We have formulated the determinant function as a function of vectors; to connect it to matrices we take these vectors to be the column vectors of a matrix. The properties (ML) and (A) then correspond to column operations in the same way as we discussed after the definition of a 2-determinant. The property (N) means that the identity matrix has determinant 1.

Before proceeding to the proof that there is only one n-determinant let us consider some properties of the determinant.

Exercise 4.3:
Let A \in M_n(\mathbb{R}) and \lambda\in \mathbb{R}. Is it true that d_n(\lambda A)=\lambda\, d_n(A), where d_n(A) means d_n applied to the columns of A?

We next consider a generalisation of (4.2).

Proposition 4.4:

Let d_n(a_1,a_2,\cdots, a_n) be an n-determinant, then

  1. whenever one of the vectors a_1,a_2,\cdots ,a_n is \mathbf{0} then d_n(a_1,a_2,\cdots, a_n)=0.

  2. whenever two of the vectors a_1,a_2,\cdots ,a_n are equal, then d_n(a_1,a_2,\cdots, a_n)=0.

Proof.

  1. We use multilinearity. We have for any a_j that d_n(\cdots ,\lambda a_j, \cdots)=\lambda d_n(\cdots ,a_j, \cdots) for any \lambda\in \mathbb{R}, and setting \lambda=0 gives d_n(\cdots ,\mathbf{0},\cdots )=0.

  2. We rewrite condition (A) in the definition as d_n(\cdots, a_i,\cdots,a_j,\cdots)+d_n(\cdots, a_j,\cdots,a_i,\cdots)=0 , and so if a_i=a_j, then 2d_n(\cdots, a_i,\cdots,a_j,\cdots)=0, hence d_n(\cdots, a_i,\cdots,a_j,\cdots)=0.


As a direct consequence we obtain the following useful property: we can add to a column any multiple of one of the other columns without changing the value of the determinant function.

Corollary 4.5:

We have for any j\neq i and \lambda\in\mathbb{R} that d_n(a_1, \cdots , a_i+\lambda a_j, \cdots, a_n)=d_n(a_1, \cdots , a_i, \cdots, a_n) .

Proof.

By multilinearity we have d_n(a_1, \cdots , a_i+\lambda a_j, \cdots, a_n)=d_n(a_1, \cdots , a_i, \cdots, a_n)+\lambda d_n(a_1, \cdots , a_j, \cdots, a_n), but in the second term two of the vector arguments are the same, hence that term is 0.

Exercise 4.6:

What effect will the elementary column operations

  1. swapping two columns,

  2. multiplying a column by \lambda \in \mathbb{R},

  3. adding a multiple of one column to another

have on the determinant of a matrix?

Solution:

We have that

  1. alters the sign of the value of d_n,

  2. multiplies the determinant by \lambda, and

  3. does not change the determinant (illustrated numerically in the sketch below).
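To illustrate these three effects numerically (assuming Python with numpy, and using numpy.linalg.det purely as an independent reference), the following sketch applies each elementary column operation to a sample matrix.

```python
import numpy as np

A = np.array([[1.0, 4.0, 7.0],
              [2.0, 1.0, 3.0],
              [0.0, 5.0, -2.0]])
d = np.linalg.det(A)

# 1. Swap columns 1 and 2: the determinant changes sign.
B = A[:, [1, 0, 2]]
assert np.isclose(np.linalg.det(B), -d)

# 2. Multiply column 3 by lambda = 3: the determinant is multiplied by 3.
C = A.copy(); C[:, 2] *= 3.0
assert np.isclose(np.linalg.det(C), 3.0 * d)

# 3. Add 5 times column 1 to column 2: the determinant is unchanged.
D = A.copy(); D[:, 1] += 5.0 * A[:, 0]
assert np.isclose(np.linalg.det(D), d)
```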

 

We can now compute the determinant function for some special classes of matrices.

Example 4.7:
For diagonal matrices, i.e., A=(a_{ij}) with a_{ij}=0 if i\neq j, the columns are a_1=a_{11}e_1, a_2=a_{22}e_2, \cdots , a_n=a_{nn}e_n, and using multilinearity in each argument and normalisation we get \begin{equation}\begin{split} d_n(a_{11}e_1, a_{22}e_2, \cdots , a_{nn}e_n)&=a_{11}d_n(e_1, a_{22}e_2, \cdots , a_{nn}e_n)\\ &=a_{11}a_{22}d_n(e_1, e_2, \cdots , a_{nn}e_n)\\ &=a_{11}a_{22}\cdots a_{nn}d_n(e_1, e_2, \cdots , e_n) =a_{11}a_{22}\cdots a_{nn}. \end{split}\tag{4.3}\end{equation}
Example 4.8:

Using the properties of a determinant function established so far, we can already compute determinants, though not in a very efficient way. If we repeat what we did for d_2 in the case of 3 vectors in \mathbb{R}^3, we get the slightly cumbersome formula

\begin{equation}\det \begin{pmatrix}a_{11}&a_{12}&a_{13} \\ a_{21}&a_{22}&a_{23} \\ a_{31}&a_{32}&a_{33}\end{pmatrix}= a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{11}a_{23}a_{32} - a_{13} a_{22} a_{31} - a_{33}a_{12}a_{21}, \tag{4.4}\end{equation}

but we’ll see an easier way to find such determinants later (that works for n > 3 too).
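As a hedged check (assuming Python with numpy, and using numpy.linalg.det only as an independent reference), the sketch below compares formula (4.4) with numpy's determinant on a random 3\times 3 matrix; the 0-based entry a[i-1, j-1] corresponds to a_{ij} in the notes.

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.standard_normal((3, 3))

# formula (4.4), written out term by term
det_formula = (a[0, 0]*a[1, 1]*a[2, 2] + a[0, 1]*a[1, 2]*a[2, 0] + a[0, 2]*a[1, 0]*a[2, 1]
               - a[0, 0]*a[1, 2]*a[2, 1] - a[0, 2]*a[1, 1]*a[2, 0] - a[2, 2]*a[0, 1]*a[1, 0])

assert np.isclose(det_formula, np.linalg.det(a))
```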

We can now show that there exists only one n-determinant function.

Theorem 4.9:

There exists one, and only one, n-determinant function.

Proof (Non-examinable).

We only give a sketch of the proof. Let us expand the vectors a_j, j=1,\dots,n, by writing them as a sum of multiples of the vectors e_1, \dots , e_n:[3] a_{j}=\sum_{i=1}^n a_{i j}e_{i} = \sum_{i_j=1}^n a_{i_j j}e_{i_j} . The reason for the second expression is bookkeeping: we will have to use a different index i_j in the expansion of each vector a_j involved. So the subscript j on i_j just indicates that this summation index belongs to the expansion of a_j.

Insert these expansions into d_n(a_1,a_2, \cdots, a_n). Doing this for a_1 and using multilinearity gives d_n(a_1,a_2, \cdots, a_n)=d_n\bigg(\sum_{i_1=1}^n a_{i_1 1}e_{i_1} ,a_2, \cdots, a_n\bigg) =\sum_{i_1=1}^n a_{i_1 1}d_n(e_{i_1} ,a_2, \cdots, a_n) . Repeating the same step for a_2, a_3, etc., gives us \begin{equation} d_n(a_1,a_2, \cdots, a_n)=\sum_{i_1=1}^n \sum_{i_2=1}^n\cdots \sum_{i_n=1}^na_{i_1 1}a_{i_2 2}\cdots a_{i_n n} d_n(e_{i_1} ,e_{i_2}, \cdots, e_{i_n}). \tag{4.5}\end{equation}

This formula tells us that the function d_n is determined by its values on n-tuples of the vectors e_1, \dots , e_n. There are n^n choices of ordered n-tuples (e_{i_1} ,e_{i_2}, \cdots, e_{i_n}). By Proposition 4.4, whenever at least two of the vectors e_{i_1} ,e_{i_2}, \cdots, e_{i_n} are equal, d_n(e_{i_1} ,e_{i_2}, \cdots, e_{i_n})=0. Hence there are at most n! non-zero terms in the sum, one for each way of reordering the indices 1,2,\ldots,n.

If the vectors e_{i_1},\ldots,e_{i_n} are all different, we can swap pairs of vectors and after finitely many steps reach e_1,\ldots,e_n, picking up a factor -1 for each swap by (A); so if there are k swaps we get d_n(e_{i_1},\ldots,e_{i_n})=(-1)^k d_n(e_1,\ldots,e_n)=(-1)^k by (N). So we have determined the unique possible value of each term, and hence of the sum, in (4.5).

What we haven’t shown is existence: perhaps no such function d_n can be defined. We don’t prove this here as it uses a little group theory, namely permutations. The rearrangement of e_{i_1},\ldots,e_{i_n} to e_1,\ldots,e_n is a permutation \sigma of the set \{1,\ldots,n\}. While the number of swaps k is not well-defined, its parity is, so the sign \mathrm{sign}(\sigma)=(-1)^k is well-defined, and we can define the determinant using the Leibniz formula: d_n(a_{1} ,a_{2}, \cdots, a_{n}) := \sum_{\text{permutations } \sigma} \mathrm{sign}(\sigma)\, a_{\sigma(1)1}a_{\sigma(2)2}\cdots a_{\sigma(n)n}\ . Using a little group theory one can show that this function satisfies the three axioms.

Given this result, we define the determinant of a matrix by applying d_n to its column vectors a_1,\ldots,a_n, so \begin{equation}\det A := d_n(a_{1} ,a_{2}, \cdots, a_{n}) = \sum_{\sigma\in S_n} \mathrm{sign}(\sigma)\, a_{\sigma(1)1}a_{\sigma(2)2}\cdots a_{\sigma(n)n} = \sum_{\sigma\in S_n} \mathrm{sign}(\sigma) \prod_{j=1}^n a_{\sigma(j)j}, \tag{4.6}\end{equation} where S_n denotes the set of all permutations of \{1,\ldots,n\}.
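The Leibniz formula can be turned directly into code. The sketch below (Python with numpy and itertools assumed; det_leibniz and sign are our own illustrative names, and the permutation sign is computed by counting inversions) implements (4.6) and compares it with numpy.linalg.det. It is of course hopelessly inefficient for large n, since it sums n! terms.

```python
from itertools import permutations
import numpy as np

def sign(perm):
    # sign of a permutation = (-1)^(number of inversions)
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def det_leibniz(A):
    # sum over permutations sigma of sign(sigma) * prod_j A[sigma(j), j], as in (4.6)
    n = A.shape[0]
    return sum(sign(p) * np.prod([A[p[j], j] for j in range(n)])
               for p in permutations(range(n)))

A = np.random.default_rng(2).standard_normal((4, 4))
assert np.isclose(det_leibniz(A), np.linalg.det(A))
```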

From now on we have two equivalent notations for the determinant function: \det and d_n. We will mostly use the former, as far as square matrices are concerned, but may return to d_n when we view the determinant as a function of the columns of the matrix.

In fact, the argument from the proof of Theorem 4.9 shows more than just the uniqueness of the determinant. It also establishes the more general theorem below, which tells us that any function which is multilinear and alternating is a multiple of the determinant function, with the constant given by the value of the function on the identity matrix.

Theorem 4.10:

Let f_n be a real-valued function of n vectors a_1,a_2, \cdots, a_n\in \mathbb{R}^n satisfying just (ML) and (A). Then f_n(a_1,a_2, \cdots, a_n) = C\cdot \det A, where C= f_n(e_1, \cdots, e_n) and A is the matrix whose columns are a_1,a_2, \cdots, a_n.

Let us now continue with computing some determinants. We learned in (4.3) that the determinant of a diagonal matrix is just the product of the diagonal elements. The same is true for upper triangular matrices.

Theorem 4.11:

Let A=(a_{ij})\in M_n(\mathbb{R}) be upper triangular, i.e., a_{ij}=0 if i>j, and let a_1, a_2,\cdots, a_n be the column vectors of A. Then we have \det A=a_{11}a_{22}\cdots a_{nn} , i.e., the determinant is the product of the diagonal elements.

Proof.

Let us first assume that all the diagonal elements a_{ii} are nonzero. The matrix A is of the form A=\begin{pmatrix} a_{11} & a_{12} & a_{13} &\cdots & a_{1n}\\ 0 & a_{22} & a_{23} &\cdots & a_{2n}\\ 0 & 0 & a_{33} & \cdots & a_{3n}\\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0& \cdots & a_{nn}\end{pmatrix}. In the first step we subtract multiples of column 1 from the other columns to remove the entries in the first row, replacing a_2 by a_2-(a_{12}/a_{11}) a_1, a_3 by a_3-(a_{13}/a_{11}) a_1, etc. By Corollary 4.5 these operations do not change the determinant and hence we have \det A=\det \begin{pmatrix} a_{11} &0 & 0 &\cdots & 0\\ 0 & a_{22} & a_{23} &\cdots & a_{2n}\\ 0 & 0 & a_{33} & \cdots & a_{3n}\\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0& \cdots & a_{nn} \end{pmatrix}. In the next step we repeat the same procedure with the second row, i.e., subtract suitable multiples of the second column from the later columns, and then we continue with the third row, etc. At the end we arrive at a diagonal matrix and then by (4.3) \det A=\det \begin{pmatrix} a_{11} &0 & 0 &\cdots & 0\\ 0 & a_{22} & 0 &\cdots & 0\\ 0 & 0 & a_{33} & \cdots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0& \cdots & a_{nn} \end{pmatrix}=a_{11}a_{22} a_{33} \cdots a_{nn} . If one of the diagonal elements is 0, then we can follow the same procedure until we reach the first column whose diagonal element is 0. This column is then entirely 0, and so by Proposition 4.4 the determinant is 0.

Example 4.12:
We have that \det \begin{pmatrix}1 & 4 & 7\\ 0 & 1 & 3 \\ 0 & 0 & -2\end{pmatrix}=-2.
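A quick numerical confirmation (Python with numpy assumed) of this example, and of Theorem 4.11 and formula (4.3) more generally:

```python
import numpy as np

# Example 4.12: upper triangular, so the determinant is the product of the diagonal
A = np.array([[1.0, 4.0, 7.0],
              [0.0, 1.0, 3.0],
              [0.0, 0.0, -2.0]])
assert np.isclose(np.linalg.det(A), 1.0 * 1.0 * (-2.0))

# a diagonal matrix, as in (4.3)
D = np.diag([2.0, 3.0, -1.0])
assert np.isclose(np.linalg.det(D), 2.0 * 3.0 * (-1.0))
```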

Let us now collect a few important properties of the determinant.

Theorem 4.13:

Let A and B be n\times n matrices, then \det(AB)=\det(A)\det(B) .

Proof.

The theorem follows from Theorem 4.10. Let B=(b_1\; \ldots\;b_n). The columns of AB are Ab_1,\ldots,Ab_n. So \det(AB) = d_n(Ab_1,\ldots, Ab_n), where d_n is the unique function satisfying the axioms (ML), (A), and (N).

Now consider \det(AB) as a function of just b_1,\ldots,b_n, that is, \det(AB)=f_n(b_1,\ldots,b_n) for some function f_n. This f_n satisfies (ML), because multiplication on the left by A is linear in b_j, and it satisfies (A), since swapping b_i and b_j swaps Ab_i and Ab_j. So by Theorem 4.10 the value of this function equals \det B times the constant C=f_n(e_1,\ldots,e_n), obtained by setting b_j=e_j for j=1,\ldots,n. But in that case Ae_1 = a_1, \ldots, Ae_n=a_n; in other words, if b_j=e_j for j=1,\ldots,n, then \det (AB)=\det(AI)=\det A. Hence C=\det A, and \det(AB)=\det(A)\det(B).

This multiplicative property of determinants is very important. Note that in general AB\neq BA; however, according to this theorem we do have \det(AB)=\det(BA)= \det(A)\det(B).
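Here is a small numerical illustration (Python with numpy assumed) of the multiplicative property, including the fact that \det(AB)=\det(BA) even though AB and BA are generally different matrices.

```python
import numpy as np

rng = np.random.default_rng(3)
A, B = rng.standard_normal((2, 4, 4))   # two random 4x4 matrices

assert not np.allclose(A @ B, B @ A)    # AB != BA for these matrices
assert np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))
assert np.isclose(np.linalg.det(B @ A), np.linalg.det(A @ B))
```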

Exercise 4.14:
Is the determinant linear? That is, must we have \det(A+B)= \det A+\det B?
Solution:

The determinant is not linear, so in general \det(A+B)\neq \det A+\det B. For example if A=B=I_2 then \det A=\det B=1 but \det(A+B)=4.

 

One of the consequences of Theorem 4.13 is that if A is invertible, i.e., there exists an A^{-1} such that A^{-1}A=I, then \det A^{-1} \det A=\det(A^{-1}A)=\det I=1, and hence \det A\neq 0 and \det A^{-1}=\frac{1}{\det A} . So if A is invertible, then \det A\neq 0. This is an important result, and one we will return to later in Chapter 6.
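A minimal numerical check of the inverse relation (Python with numpy assumed):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 1.0]])             # det A = 1, so A is invertible
A_inv = np.linalg.inv(A)
assert np.isclose(np.linalg.det(A_inv), 1.0 / np.linalg.det(A))
```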

Finally, we have the following theorem, which we will not prove.

Theorem 4.15:

Let A be an n\times n matrix, then \det A=\det A^t.

Let us comment on the meaning of this result. We defined the determinant of a matrix in two steps: we first defined the determinant function d_n(a_1,a_2,\cdots,a_n) as a function of n vectors, and then related it to a matrix A by choosing for a_1,a_2,\cdots,a_n the column vectors of A. We could instead have chosen the row vectors of A; that would have given an alternative definition of the determinant. The theorem tells us that both definitions give the same result.
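A numerical spot check of Theorem 4.15 (Python with numpy assumed):

```python
import numpy as np

A = np.random.default_rng(4).standard_normal((5, 5))
# the determinant is unchanged by transposition
assert np.isclose(np.linalg.det(A), np.linalg.det(A.T))
```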

Properties (ML) and (A) from the basic Definition 4.2 tell us what happens to determinants if we manipulate the columns by linear operations, in particular they tell us what happens if we apply elementary column operations to the matrix. But using \det A^t=\det A we get the same properties for elementary row operations, as the following theorem states.

Theorem 4.16:

Let A be an n\times n matrix, then we have:

  • If A' is obtained from A by exchanging two rows, then \det A'=-\det A and if E is the elementary matrix corresponding to the row exchange, then \det E=-1.

  • If A' is obtained from A by adding \lambda times row j to row i, (i\neq j), then \det A =\det A' and the corresponding elementary matrix satisfies \det E=1.

  • If A' is obtained from A by multiplying row i by \lambda\in \mathbb{R}, then \det A'=\lambda\det A and the corresponding elementary matrix satisfies \det E=\lambda.

An interesting consequence of this result is that it shows, independently of Theorem 4.13, that \det(EA)=\det E\det A for any elementary matrix E. We can see this by computing the left- and right-hand sides for each case in Theorem 4.16. This observation can be used to give a different proof of the multiplicative property \det (AB)=\det A\det B: the main idea is to write A as a product of elementary matrices, which turns out to be possible if A is non-singular, and then use the multiplicative property for elementary matrices.
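As an illustration of Theorem 4.16 and of the observation \det(EA)=\det E\det A (Python with numpy assumed; the 3\times 3 elementary matrices E_swap, E_add and E_scale below are constructed by hand for this sketch):

```python
import numpy as np

I = np.eye(3)
E_swap = I[:, [1, 0, 2]]                 # exchange rows 1 and 2 (a permutation of I)
E_add = I.copy(); E_add[0, 2] = 5.0      # add 5 times row 3 to row 1
E_scale = np.diag([1.0, 1.0, 7.0])       # multiply row 3 by lambda = 7

A = np.random.default_rng(5).standard_normal((3, 3))
for E, dE in [(E_swap, -1.0), (E_add, 1.0), (E_scale, 7.0)]:
    assert np.isclose(np.linalg.det(E), dE)                          # det E as in Theorem 4.16
    assert np.isclose(np.linalg.det(E @ A), dE * np.linalg.det(A))   # det(EA) = det E * det A
```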

Similarly, the result of Proposition 4.4 holds for rows as well.


  1. One could instead choose row vectors to define a determinant; both approaches give the same result. This is later expressed in the theorem that \det A=\det A^t.

  2. Elementary column operations are defined the same way as elementary row operations. They can also be viewed as matrix multiplication by elementary matrices, but on the right, rather than on the left.

  3. This is known as a linear combination, an idea we will explore further in Chapters 5 and 6.