2.1 Matrix Theory

\[ A= \left[ \begin{array} {cc} a_{11} & a_{12} \\ a_{21} & a_{22} \end{array} \right] \]

\[ A' = \left[ \begin{array} {cc} a_{11} & a_{21} \\ a_{12} & a_{22} \end{array} \right] \]

\[ \begin{aligned} \mathbf{(ABC)'} & = \mathbf{C'B'A'} \\ \mathbf{A(B+C)} & = \mathbf{AB + AC} \\ \mathbf{AB} & \neq \mathbf{BA} \\ \mathbf{(A')'} & = \mathbf{A} \\ \mathbf{(A+B)'} & = \mathbf{A' + B'} \\ \mathbf{(AB)'} & = \mathbf{B'A'} \\ \mathbf{(AB)^{-1}} & = \mathbf{B^{-1}A^{-1}} \\ \mathbf{A+B} & = \mathbf{B +A} \\ \mathbf{AA^{-1}} & = \mathbf{I} \end{aligned} \]

If A has an inverse, it is called invertible. If A is not invertible it is called singular.
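The identities above are easy to verify numerically. Below is a minimal NumPy sketch (not part of the original notes; the matrices are arbitrary random examples):

```python
# Numerical check of a few matrix identities with arbitrary random matrices.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(2, 2))
B = rng.normal(size=(2, 2))
C = rng.normal(size=(2, 2))

# (ABC)' = C'B'A'
print(np.allclose((A @ B @ C).T, C.T @ B.T @ A.T))        # True
# (AB)^{-1} = B^{-1} A^{-1}
print(np.allclose(np.linalg.inv(A @ B),
                  np.linalg.inv(B) @ np.linalg.inv(A)))   # True
# AB != BA in general
print(np.allclose(A @ B, B @ A))                           # False (generically)
# A A^{-1} = I
print(np.allclose(A @ np.linalg.inv(A), np.eye(2)))        # True
```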

The product of a \(2 \times 3\) matrix \(\mathbf{A}\) and a \(3 \times 3\) matrix \(\mathbf{B}\) is

\[ \begin{aligned} \mathbf{AB} &= \left(\begin{array} {ccc} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ \end{array}\right) \left(\begin{array} {ccc} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \\ \end{array}\right) \\ &= \left(\begin{array} {ccc} a_{11}b_{11}+a_{12}b_{21}+a_{13}b_{31} & \sum_{i=1}^{3}a_{1i}b_{i2} & \sum_{i=1}^{3}a_{1i}b_{i3} \\ \sum_{i=1}^{3}a_{2i}b_{i1} & \sum_{i=1}^{3}a_{2i}b_{i2} & \sum_{i=1}^{3}a_{2i}b_{i3} \\ \end{array}\right) \end{aligned} \]
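A small NumPy illustration of this product and the element-wise sum formula (the entries of A and B below are arbitrary):

```python
# Product of a 2x3 and a 3x3 matrix, checked against the entry formula.
import numpy as np

A = np.arange(1, 7).reshape(2, 3)      # 2 x 3
B = np.arange(1, 10).reshape(3, 3)     # 3 x 3
AB = A @ B                             # 2 x 3 product

# Entry (1,2) of AB equals sum_i a_{1i} b_{i2} (0-based indices (0,1) here).
manual_12 = sum(A[0, i] * B[i, 1] for i in range(3))
print(AB[0, 1] == manual_12)           # True
print(AB.shape)                        # (2, 3)
```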

Let \(\mathbf{a}\) be a \(3 \times 1\) vector and \(\mathbf{B}\) a \(3 \times 3\) matrix; then the quadratic form is

\[ \mathbf{a'Ba} = \sum_{i=1}^{3}\sum_{j=1}^{3}a_i b_{ij} a_{j} \]

Length of a vector
Let \(\mathbf{a}\) be a vector. Its length, \(||\mathbf{a}||\) (the 2-norm of the vector), is the square root of the inner product of the vector with itself:

\[ ||\mathbf{a}|| = \sqrt{\mathbf{a'a}} \]
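A short NumPy check of the quadratic form and the 2-norm (the vector a and matrix B below are arbitrary illustrations):

```python
# Quadratic form a'Ba and vector length ||a|| = sqrt(a'a).
import numpy as np

a = np.array([1.0, 2.0, 3.0])
B = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])

quad = a @ B @ a                                   # a'Ba
double_sum = sum(a[i] * B[i, j] * a[j]
                 for i in range(3) for j in range(3))
print(np.isclose(quad, double_sum))                # True

length = np.sqrt(a @ a)                            # sqrt(a'a)
print(np.isclose(length, np.linalg.norm(a)))       # True
```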

2.1.1 Rank

The rank of a matrix is

  • the dimension of the space spanned by its columns (equivalently, its rows);
  • the number of linearly independent columns (or rows).

For an \(n \times k\) matrix A and a \(k \times k\) matrix B (a numerical check follows this list):

  • \(rank(A)\leq min(n,k)\)
  • \(rank(A) = rank(A') = rank(A'A)=rank(AA')\)
  • \(rank(AB)\leq min(rank(A),rank(B))\)
  • B is invertible if and only if \(rank(B) = k\) (non-singular)
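A minimal NumPy check of these facts, using generic random matrices (which have full rank with probability one):

```python
# Rank facts for an n x k matrix A and a k x k matrix B.
import numpy as np

rng = np.random.default_rng(1)
n, k = 5, 3
A = rng.normal(size=(n, k))
B = rng.normal(size=(k, k))

r = np.linalg.matrix_rank
print(r(A) <= min(n, k))                            # True
print(r(A) == r(A.T) == r(A.T @ A) == r(A @ A.T))   # True
print(r(A @ B) <= min(r(A), r(B)))                  # True
print(r(B) == k)                                    # True: B is invertible (non-singular)
```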

2.1.2 Inverse

In the scalar case, if \(a = 0\), then \(1/a\) does not exist.

In the matrix case, being non-zero is not enough: a matrix is invertible only when it is non-singular (i.e., has full rank).

A square matrix \(\mathbf{A}\) is invertible (non-singular) if there exists a square matrix \(\mathbf{B}\) such that \[AB=BA=I\] Then \(A^{-1}=B\). For a \(2\times2\) matrix,

\[ A = \left(\begin{array}{cc} a & b \\ c & d \\ \end{array} \right) \]

\[ A^{-1}= \frac{1}{ad-bc} \left(\begin{array}{cc} d & -b \\ -c & a \\ \end{array} \right) \]
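A quick NumPy check of the \(2 \times 2\) formula against np.linalg.inv (entries are arbitrary; the formula requires \(ad-bc \neq 0\)):

```python
# 2x2 inverse via the closed-form formula vs. np.linalg.inv.
import numpy as np

a, b, c, d = 4.0, 7.0, 2.0, 6.0
A = np.array([[a, b],
              [c, d]])

det = a * d - b * c                                   # must be non-zero
A_inv_formula = (1.0 / det) * np.array([[d, -b],
                                        [-c, a]])
print(np.allclose(A_inv_formula, np.linalg.inv(A)))   # True
print(np.allclose(A @ A_inv_formula, np.eye(2)))      # True
```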

For a partitioned matrix,

\[ \left[\begin{array} {cc} A & B \\ C & D \\ \end{array} \right]^{-1} = \left[\begin{array} {cc} \mathbf{(A-BD^{-1}C)^{-1}} & \mathbf{-(A-BD^{-1}C)^{-1}BD^{-1}} \\ \mathbf{-D^{-1}C(A-BD^{-1}C)^{-1}} & \mathbf{D^{-1}+D^{-1}C(A-BD^{-1}C)^{-1}BD^{-1}} \end{array} \right] \]
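A numerical sketch of the partitioned-inverse formula with NumPy (block sizes and entries are arbitrary; D and the Schur complement \(A - BD^{-1}C\) must be invertible for the formula to apply):

```python
# Block (partitioned) inverse checked against the inverse of the assembled matrix.
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(2, 2))
B = rng.normal(size=(2, 3))
C = rng.normal(size=(3, 2))
D = rng.normal(size=(3, 3)) + 5 * np.eye(3)    # keep D well-conditioned

Dinv = np.linalg.inv(D)
S = np.linalg.inv(A - B @ Dinv @ C)            # (A - B D^{-1} C)^{-1}

top = np.hstack([S, -S @ B @ Dinv])
bottom = np.hstack([-Dinv @ C @ S, Dinv + Dinv @ C @ S @ B @ Dinv])
block_inv = np.vstack([top, bottom])

M = np.block([[A, B], [C, D]])
print(np.allclose(block_inv, np.linalg.inv(M)))  # True
```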

Properties of a non-singular square matrix \(\mathbf{A}\)

  • \(\mathbf{(A^{-1})^{-1}=A}\)
  • for a non-zero scalar b, \(\mathbf{(bA)^{-1}=b^{-1}A^{-1}}\)
  • for a non-singular matrix \(\mathbf{B}\) of conformable dimension, \(\mathbf{(BA)^{-1}=A^{-1}B^{-1}}\)
  • \(\mathbf{(A^{-1})'=(A')^{-1}}\)
  • Never notate \(\mathbf{1/A}\)

2.1.3 Definiteness

A symmetric square \(k \times k\) matrix, \(\mathbf{A}\), is Positive Semi-Definite (PSD) if for any non-zero \(k \times 1\) vector \(\mathbf{x}\), \[\mathbf{x'Ax \geq 0 }\] It is Positive Definite (PD) if the inequality is strict.

A symmetric square \(k \times k\) matrix, \(\mathbf{A}\), is Negative Semi-Definite (NSD) if for any non-zero \(k \times 1\) vector \(\mathbf{x}\), \[\mathbf{x'Ax \leq 0 }\] It is Negative Definite (ND) if the inequality is strict.

\(\mathbf{A}\) is indefinite if it is neither positive semi-definite nor negative semi-definite.

The identity matrix is positive definite

Example Let \(\mathbf{x} =(x_1 \; x_2)'\) be a non-zero vector; then for the \(2 \times 2\) identity matrix,

\[ \begin{aligned} \mathbf{x'Ix} &= (x_1 x_2) \left(\begin{array} {cc} 1 & 0 \\ 0 & 1 \\ \end{array} \right) \left(\begin{array}{c} x_1 \\ x_2 \\ \end{array} \right) \\ &= (x_1 x_2) \left(\begin{array} {c} x_1 \\ x_2 \\ \end{array} \right) \\ &= x_1^2 + x_2^2 >0 \end{aligned} \]

Definiteness gives us the ability to compare matrices: we say \(\mathbf{A} \geq \mathbf{B}\) when \(\mathbf{A-B}\) is PSD.

This property also helps us show efficiency (whether the variance-covariance matrix of one estimator is smaller than that of another).

Properties

  • any variance matrix is PSD
  • a matrix \(\mathbf{A}\) is PSD if and only if there exists a matrix \(\mathbf{B}\) such that \(\mathbf{A=B'B}\)
  • if \(\mathbf{A}\) is PSD, then \(\mathbf{B'AB}\) is PSD
  • if \(\mathbf{A}\) and \(\mathbf{C}\) are non-singular, then \(\mathbf{A-C}\) is PSD if and only if \(\mathbf{C^{-1}-A^{-1}}\) is PSD
  • if \(\mathbf{A}\) is PD (ND) then \(A^{-1}\) is PD (ND)

Note

  • Indefinite \(\mathbf{A}\) is neither PSD nor NSD. There is no comparable concept for scalars.
  • If a square matrix is PSD and invertible then it is PD

Examples (an eigenvalue check of each matrix follows the list):

  1. Invertible / Indefinite

\[ \left[ \begin{array} {cc} -1 & 0 \\ 0 & 10 \end{array} \right] \]

  2. Non-invertible / Indefinite

\[ \left[ \begin{array} {cc} 0 & 1 \\ 0 & 0 \end{array} \right] \]

  3. Invertible / PSD

\[ \left[ \begin{array} {cc} 1 & 0 \\ 0 & 1 \end{array} \right] \]

  4. Non-invertible / PSD

\[ \left[ \begin{array} {cc} 0 & 0 \\ 0 & 1 \end{array} \right] \]
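A NumPy check of each example via eigenvalues and determinants (for a symmetric matrix, PSD means all eigenvalues are \(\geq 0\); the quadratic form \(x'Mx\) depends only on the symmetric part of \(M\)):

```python
# Classify the four example matrices by eigenvalues (definiteness) and determinant (invertibility).
import numpy as np

examples = {
    "1. invertible / indefinite":     np.array([[-1.0, 0.0], [0.0, 10.0]]),
    "2. non-invertible / indefinite": np.array([[0.0, 1.0], [0.0, 0.0]]),
    "3. invertible / PSD (in fact PD)": np.array([[1.0, 0.0], [0.0, 1.0]]),
    "4. non-invertible / PSD":        np.array([[0.0, 0.0], [0.0, 1.0]]),
}

for label, M in examples.items():
    sym = (M + M.T) / 2                      # symmetric part drives x'Mx
    eigs = np.linalg.eigvalsh(sym)
    invertible = not np.isclose(np.linalg.det(M), 0.0)
    print(f"{label}: eigenvalues of symmetric part = {np.round(eigs, 3)}, "
          f"invertible = {invertible}")
```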

2.1.4 Matrix Calculus

\(y=f(x_1,x_2,...,x_k)=f(x)\) where \(x\) is a \(k \times 1\) vector.

The Gradient (first order derivative with respect to a vector) is,

\[ \frac{\partial{f(x)}}{\partial{x}}= \left(\begin{array}{c} \frac{\partial{f(x)}}{\partial{x_1}} \\ \frac{\partial{f(x)}}{\partial{x_2}} \\ \vdots \\ \frac{\partial{f(x)}}{\partial{x_k}} \end{array} \right) \]

The Hessian (second order derivative with respect to a vector) is,

\[ \frac{\partial^2{f(x)}}{\partial{x}\partial{x'}}= \left(\begin{array} {cccc} \frac{\partial^2{f(x)}}{\partial{x_1}\partial{x_1}} & \frac{\partial^2{f(x)}}{\partial{x_1}\partial{x_2}} & \cdots & \frac{\partial^2{f(x)}}{\partial{x_1}\partial{x_k}} \\ \frac{\partial^2{f(x)}}{\partial{x_2}\partial{x_1}} & \frac{\partial^2{f(x)}}{\partial{x_2}\partial{x_2}} & \cdots & \frac{\partial^2{f(x)}}{\partial{x_2}\partial{x_k}} \\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial^2{f(x)}}{\partial{x_k}\partial{x_1}} & \frac{\partial^2{f(x)}}{\partial{x_k}\partial{x_2}} & \cdots & \frac{\partial^2{f(x)}}{\partial{x_k}\partial{x_k}} \end{array} \right) \]

Define the derivative of \(f(\mathbf{X})\) with respect to \(\mathbf{X}_{(n \times p)}\) as the matrix

\[ \frac{\partial f(\mathbf{X})}{\partial \mathbf{X}} = (\frac{\partial f(\mathbf{X})}{\partial x_{ij}}) \]

Define \(\mathbf{a}\) to be a vector and \(\mathbf{A}\) to be a matrix which does not depend upon \(\mathbf{y}\). Then

\[ \frac{\partial \mathbf{a'y}}{\partial \mathbf{y}} = \mathbf{a} \]

\[ \frac{\partial \mathbf{y'y}}{\partial \mathbf{y}} = 2\mathbf{y} \]

\[ \frac{\partial \mathbf{y'Ay}}{\partial \mathbf{y}} = \mathbf{(A + A')y} \]
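These three rules can be verified with a central-difference approximation of the gradient; the helper num_grad below is a name introduced only for this sketch, and a, A, y are arbitrary:

```python
# Finite-difference check of d(a'y)/dy = a, d(y'y)/dy = 2y, d(y'Ay)/dy = (A + A')y.
import numpy as np

rng = np.random.default_rng(3)
k = 4
a = rng.normal(size=k)
A = rng.normal(size=(k, k))
y = rng.normal(size=k)

def num_grad(f, y, h=1e-6):
    """Central-difference approximation of the gradient of f at y."""
    g = np.zeros_like(y)
    for i in range(len(y)):
        e = np.zeros_like(y); e[i] = h
        g[i] = (f(y + e) - f(y - e)) / (2 * h)
    return g

print(np.allclose(num_grad(lambda v: a @ v, y), a))                  # d(a'y)/dy = a
print(np.allclose(num_grad(lambda v: v @ v, y), 2 * y))              # d(y'y)/dy = 2y
print(np.allclose(num_grad(lambda v: v @ A @ v, y), (A + A.T) @ y))  # d(y'Ay)/dy = (A+A')y
```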

If \(\mathbf{X}\) is a symmetric matrix then

\[ \frac{\partial |\mathbf{X}|}{\partial x_{ij}} = \begin{cases} X_{ii}, & i = j \\ 2X_{ij}, & i \neq j \end{cases} \]

where \(X_{ij}\) is the \((i,j)\)-th cofactor of \(\mathbf{X}\)
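A finite-difference sketch of this rule (the symmetric matrix X below is arbitrary; for \(i \neq j\), the entries \(x_{ij}\) and \(x_{ji}\) are perturbed together to keep X symmetric, which is where the factor of 2 comes from):

```python
# Derivative of |X| with respect to the entries of a symmetric X, via cofactors.
import numpy as np

X = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 1.0],
              [0.5, 1.0, 2.0]])

def cofactor(M, i, j):
    minor = np.delete(np.delete(M, i, axis=0), j, axis=1)
    return (-1) ** (i + j) * np.linalg.det(minor)

def det_deriv_fd(M, i, j, h=1e-6):
    Mp, Mm = M.copy(), M.copy()
    Mp[i, j] += h; Mm[i, j] -= h
    if i != j:                      # keep X symmetric: x_ij and x_ji move together
        Mp[j, i] += h; Mm[j, i] -= h
    return (np.linalg.det(Mp) - np.linalg.det(Mm)) / (2 * h)

print(np.isclose(det_deriv_fd(X, 0, 0), cofactor(X, 0, 0)))      # diagonal: X_ii
print(np.isclose(det_deriv_fd(X, 0, 1), 2 * cofactor(X, 0, 1)))  # off-diagonal: 2 X_ij
```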

If \(\mathbf{X}\) is symmetric and \(\mathbf{A}\) is a matrix which does not depend upon \(\mathbf{X}\) then

\[ \frac{\partial tr \mathbf{XA}}{\partial \mathbf{X}} = \mathbf{A} + \mathbf{A}' - diag(\mathbf{A}) \]

If \(\mathbf{X}\) is symmetric and we let \(\mathbf{J}_{ij}\) be a matrix which has a 1 in the \((i,j)\)-th position and 0s elsewhere, then

\[ \frac{\partial \mathbf{X}^{-1}}{\partial x_{ij}} = \begin{cases} - \mathbf{X}^{-1}\mathbf{J}_{ii} \mathbf{X}^{-1}, & i = j \\ - \mathbf{X}^{-1}(\mathbf{J}_{ij} + \mathbf{J}_{ji}) \mathbf{X}^{-1}, & i \neq j \end{cases} \]
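The same finite-difference idea checks this rule (the helpers J and inv_deriv_fd are names introduced only for this sketch; X is the same arbitrary symmetric matrix as above):

```python
# Derivative of X^{-1} with respect to x_ij for symmetric X, using J_ij matrices.
import numpy as np

X = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 1.0],
              [0.5, 1.0, 2.0]])
Xinv = np.linalg.inv(X)

def J(i, j, n=3):
    M = np.zeros((n, n)); M[i, j] = 1.0
    return M

def inv_deriv_fd(M, i, j, h=1e-6):
    Mp, Mm = M.copy(), M.copy()
    Mp[i, j] += h; Mm[i, j] -= h
    if i != j:                      # perturb symmetrically
        Mp[j, i] += h; Mm[j, i] -= h
    return (np.linalg.inv(Mp) - np.linalg.inv(Mm)) / (2 * h)

print(np.allclose(inv_deriv_fd(X, 0, 0), -Xinv @ J(0, 0) @ Xinv))              # i = j
print(np.allclose(inv_deriv_fd(X, 0, 1), -Xinv @ (J(0, 1) + J(1, 0)) @ Xinv))  # i != j
```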

2.1.5 Optimization

Scalar Optimization vs. Vector Optimization (a small worked example follows this comparison)

First Order Condition

  • Scalar: \(\frac{\partial{f(x_0)}}{\partial{x}}=0\)
  • Vector: \(\frac{\partial{f(x_0)}}{\partial{x}}=\mathbf{0}\), a \(k \times 1\) vector of zeros

Second Order Condition

  • Convex \(\rightarrow\) Min. Scalar: \(\frac{\partial^2{f(x_0)}}{\partial{x^2}} > 0\). Vector: \(\frac{\partial^2{f(x_0)}}{\partial{x}\partial{x'}}>0\), i.e., the Hessian is positive definite.
  • Concave \(\rightarrow\) Max. Scalar: \(\frac{\partial^2{f(x_0)}}{\partial{x^2}} < 0\). Vector: \(\frac{\partial^2{f(x_0)}}{\partial{x}\partial{x'}}<0\), i.e., the Hessian is negative definite.
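A small quadratic example tying the conditions together: for \(f(x) = x'Ax + b'x\) with symmetric positive definite \(A\), the gradient is \(2Ax + b\) and the Hessian is \(2A\), so the FOC gives \(x_0 = -\frac{1}{2}A^{-1}b\) and the PD Hessian confirms a minimum (A and b below are arbitrary illustrations):

```python
# FOC and SOC for an unconstrained quadratic objective.
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])          # symmetric, positive definite
b = np.array([1.0, -4.0])

f = lambda x: x @ A @ x + b @ x

x0 = -0.5 * np.linalg.solve(A, b)   # solves the FOC: 2 A x + b = 0
hessian = 2 * A

print("critical point:", x0)
print("Hessian eigenvalues:", np.linalg.eigvalsh(hessian))  # all > 0 -> minimum
# f at the critical point is no larger than f at a nearby point
print(f(x0) <= f(x0 + np.array([0.1, -0.2])))               # True
```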