2.1 Matrix Theory
Matrix \(A\) is the original matrix: a \(2 \times 2\) matrix with elements \(a_{ij}\), where \(i\) indexes the row and \(j\) indexes the column.
\[ A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \] \(A'\) is the transpose of \(A\). The transpose of a matrix flips its rows and columns.
\[ A' = \begin{bmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \end{bmatrix} \]
The following fundamental properties and rules of matrices are essential for understanding operations in linear algebra:
\[ \begin{aligned} \mathbf{(ABC)'} & = \mathbf{C'B'A'} \quad &\text{(Transpose reverses order in a product)} \\ \mathbf{A(B+C)} & = \mathbf{AB + AC} \quad &\text{(Distributive property)} \\ \mathbf{AB} & \neq \mathbf{BA} \quad &\text{(Multiplication is not commutative in general)} \\ \mathbf{(A')'} & = \mathbf{A} \quad &\text{(Double transpose is the original matrix)} \\ \mathbf{(A+B)'} & = \mathbf{A' + B'} \quad &\text{(Transpose of a sum is the sum of transposes)} \\ \mathbf{(AB)'} & = \mathbf{B'A'} \quad &\text{(Transpose reverses order in a product)} \\ \mathbf{(AB)^{-1}} & = \mathbf{B^{-1}A^{-1}} \quad &\text{(Inverse reverses order in a product, for invertible } \mathbf{A}, \mathbf{B}\text{)} \\ \mathbf{A+B} & = \mathbf{B + A} \quad &\text{(Addition is commutative)} \\ \mathbf{AA^{-1}} & = \mathbf{I} \quad &\text{(Matrix times its inverse is identity)} \end{aligned} \] These properties are critical in solving systems of equations, optimizing models, and performing data transformations.
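As a quick numerical sanity check of these rules, the minimal sketch below (assuming NumPy is available; the matrices are arbitrary random examples, not part of the text) verifies a few of the identities:

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C = (rng.normal(size=(3, 3)) for _ in range(3))  # arbitrary square matrices

# Transpose reverses order in a product: (ABC)' = C'B'A'
assert np.allclose((A @ B @ C).T, C.T @ B.T @ A.T)

# Distributive property: A(B + C) = AB + AC
assert np.allclose(A @ (B + C), A @ B + A @ C)

# Multiplication is generally not commutative: AB != BA
print("AB == BA?", np.allclose(A @ B, B @ A))  # almost surely False

# Inverse reverses order in a product: (AB)^{-1} = B^{-1} A^{-1}
assert np.allclose(np.linalg.inv(A @ B), np.linalg.inv(B) @ np.linalg.inv(A))
```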
If a matrix \(\mathbf{A}\) has an inverse, it is called invertible. If \(\mathbf{A}\) does not have an inverse, it is referred to as singular.
The product of a \(2 \times 3\) matrix \(\mathbf{A}\) and a \(3 \times 3\) matrix \(\mathbf{B}\) is computed as:
\[ \begin{aligned} \mathbf{AB} &= \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} \begin{bmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{bmatrix} \\ &= \begin{bmatrix} a_{11}b_{11}+a_{12}b_{21}+a_{13}b_{31} & \sum_{i=1}^{3}a_{1i}b_{i2} & \sum_{i=1}^{3}a_{1i}b_{i3} \\ \sum_{i=1}^{3}a_{2i}b_{i1} & \sum_{i=1}^{3}a_{2i}b_{i2} & \sum_{i=1}^{3}a_{2i}b_{i3} \end{bmatrix} \end{aligned} \]
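This element-wise summation is exactly what a matrix-product routine computes: entry \((i, j)\) of \(\mathbf{AB}\) is \(\sum_k a_{ik} b_{kj}\). A small sketch (assuming NumPy; the numbers are arbitrary):

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.]])           # 2 x 3
B = np.arange(1., 10.).reshape(3, 3)   # 3 x 3

AB = A @ B  # built-in matrix product

# Entry (i, j) equals the sum over k of a_ik * b_kj
manual = np.array([[sum(A[i, k] * B[k, j] for k in range(3))
                    for j in range(3)] for i in range(2)])
assert np.allclose(AB, manual)
print(AB)
```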
Quadratic Form
Let \(\mathbf{a}\) be a \(3 \times 1\) vector and \(\mathbf{B}\) a \(3 \times 3\) matrix. The quadratic form in \(\mathbf{B}\) is given by:
\[ \mathbf{a'Ba} = \sum_{i=1}^{3}\sum_{j=1}^{3}a_i b_{ij} a_{j} \]
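The quadratic form can be evaluated either by the double sum above or directly as \(\mathbf{a'Ba}\); a minimal NumPy sketch with an arbitrary vector and matrix:

```python
import numpy as np

a = np.array([1.0, -2.0, 0.5])        # 3 x 1 vector
B = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 1.0]])       # 3 x 3 matrix

double_sum = sum(a[i] * B[i, j] * a[j] for i in range(3) for j in range(3))
quad_form = a @ B @ a                 # a'Ba
assert np.isclose(double_sum, quad_form)
print(quad_form)
```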
Length of a Vector
The length (or 2-norm) of a vector \(\mathbf{a}\), denoted as \(||\mathbf{a}||\), is defined as the square root of the inner product of the vector with itself:
\[ ||\mathbf{a}|| = \sqrt{\mathbf{a'a}} \]
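Equivalently, in code (a sketch assuming NumPy), the 2-norm is the square root of the inner product \(\mathbf{a'a}\), which matches the built-in `np.linalg.norm`:

```python
import numpy as np

a = np.array([3.0, 4.0])
length = np.sqrt(a @ a)                       # sqrt(a'a)
assert np.isclose(length, np.linalg.norm(a))  # 5.0
print(length)
```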
2.1.1 Rank of a Matrix
The rank of a matrix refers to:
- The dimension of the space spanned by its columns (or rows).
- The number of linearly independent columns or rows.
For an \(n \times k\) matrix \(\mathbf{A}\) and a \(k \times k\) matrix \(\mathbf{B}\), the following properties hold:
- \(\text{rank}(\mathbf{A}) \leq \min(n, k)\)
- \(\text{rank}(\mathbf{A}) = \text{rank}(\mathbf{A'}) = \text{rank}(\mathbf{A'A}) = \text{rank}(\mathbf{AA'})\)
- \(\text{rank}(\mathbf{AB}) \leq \min(\text{rank}(\mathbf{A}), \text{rank}(\mathbf{B}))\)
- \(\mathbf{B}\) is invertible (non-singular) if and only if \(\text{rank}(\mathbf{B}) = k\).
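These rank properties can be checked numerically with `np.linalg.matrix_rank`; the sketch below assumes NumPy, and the matrices are arbitrary illustrations chosen so that \(\mathbf{A}\) has one linearly dependent column:

```python
import numpy as np

# A is 4 x 3 with only 2 linearly independent columns (third = first + second)
A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [2., 1., 3.],
              [1., 1., 2.]])
B = np.array([[1., 2., 0.],
              [0., 1., 0.],
              [0., 0., 3.]])          # 3 x 3, full rank

rank = np.linalg.matrix_rank
print(rank(A))                        # 2 <= min(4, 3)
assert rank(A) == rank(A.T) == rank(A.T @ A) == rank(A @ A.T)
assert rank(A @ B) <= min(rank(A), rank(B))
assert rank(B) == 3                   # B has full rank, hence is invertible
```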
2.1.2 Inverse of a Matrix
In scalar algebra, if \(a = 0\), then \(1/a\) does not exist.
In matrix algebra, a matrix is invertible if it is non-singular, meaning it has a non-zero determinant and its inverse exists. A square matrix \(\mathbf{A}\) is invertible if there exists another square matrix \(\mathbf{B}\) such that:
\[ \mathbf{AB} = \mathbf{I} \quad \text{(Identity Matrix)}. \]
In this case, \(\mathbf{A}^{-1} = \mathbf{B}\).
For a \(2 \times 2\) matrix:
\[ \mathbf{A} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \]
The inverse is:
\[ \mathbf{A}^{-1} = \frac{1}{ad-bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \]
This inverse exists only if \(ad - bc \neq 0\), where \(ad - bc\) is the determinant of \(\mathbf{A}\).
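The closed-form \(2 \times 2\) inverse can be compared against a general-purpose routine; a minimal sketch assuming NumPy, which also verifies \(\mathbf{AA^{-1}} = \mathbf{I}\) and \((\mathbf{A}^{-1})' = (\mathbf{A}')^{-1}\) on an arbitrary example:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [5.0, 3.0]])
det = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]    # ad - bc
assert det != 0                                # invertibility condition

A_inv = (1.0 / det) * np.array([[ A[1, 1], -A[0, 1]],
                                [-A[1, 0],  A[0, 0]]])
assert np.allclose(A_inv, np.linalg.inv(A))    # matches the general routine
assert np.allclose(A @ A_inv, np.eye(2))       # AA^{-1} = I
assert np.allclose(np.linalg.inv(A).T, np.linalg.inv(A.T))  # (A^{-1})' = (A')^{-1}
```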
For a partitioned block matrix:
\[ \begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} = \begin{bmatrix} \mathbf{(A-BD^{-1}C)^{-1}} & \mathbf{-(A-BD^{-1}C)^{-1}BD^{-1}} \\ \mathbf{-D^{-1}C(A-BD^{-1}C)^{-1}} & \mathbf{D^{-1}+D^{-1}C(A-BD^{-1}C)^{-1}BD^{-1}} \end{bmatrix} \]
This formula assumes that \(\mathbf{D}\) and \(\mathbf{A - BD^{-1}C}\) are invertible.
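The partitioned-inverse formula can be verified numerically; the sketch below (assuming NumPy, with arbitrary well-conditioned blocks) assembles the four blocks and compares the result with inverting the full matrix directly:

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.eye(2) * 3 + rng.normal(scale=0.1, size=(2, 2))
B = rng.normal(size=(2, 2))
C = rng.normal(size=(2, 2))
D = np.eye(2) * 4 + rng.normal(scale=0.1, size=(2, 2))

M = np.block([[A, B], [C, D]])
D_inv = np.linalg.inv(D)
S = np.linalg.inv(A - B @ D_inv @ C)     # (A - B D^{-1} C)^{-1}

M_inv_blocks = np.block([
    [S,                 -S @ B @ D_inv],
    [-D_inv @ C @ S,     D_inv + D_inv @ C @ S @ B @ D_inv],
])
assert np.allclose(M_inv_blocks, np.linalg.inv(M))
```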
Properties of the Inverse for Non-Singular Matrices
- \(\mathbf{(A^{-1})^{-1}} = \mathbf{A}\)
- For a non-zero scalar \(b\), \(\mathbf{(bA)^{-1} = b^{-1}A^{-1}}\)
- For a non-singular matrix \(\mathbf{B}\) conformable with \(\mathbf{A}\), \(\mathbf{(BA)^{-1} = A^{-1}B^{-1}}\).
- \(\mathbf{(A^{-1})' = (A')^{-1}}\) (the transpose of the inverse equals the inverse of the transpose).
- Never notate \(\mathbf{1/A}\); use \(\mathbf{A^{-1}}\) instead.
Notes:
- The determinant of a matrix determines whether it is invertible. For square matrices, a determinant of \(0\) means the matrix is singular and has no inverse.
- Always verify the conditions for invertibility, particularly when dealing with partitioned or block matrices.
2.1.3 Definiteness of a Matrix
A symmetric square \(k \times k\) matrix \(\mathbf{A}\) is classified based on the following conditions:
Positive Semi-Definite (PSD): \(\mathbf{A}\) is PSD if, for any \(k \times 1\) vector \(\mathbf{x}\): \[ \mathbf{x'Ax \geq 0}. \] If the inequality is strict, \(\mathbf{x'Ax} > 0\) for every non-zero \(\mathbf{x}\), then \(\mathbf{A}\) is positive definite (PD).
Negative Semi-Definite (NSD): \(\mathbf{A}\) is NSD if, for any \(k \times 1\) vector \(\mathbf{x}\): \[ \mathbf{x'Ax \leq 0}. \] If the inequality is strict for every non-zero \(\mathbf{x}\), then \(\mathbf{A}\) is negative definite (ND).
Indefinite: \(\mathbf{A}\) is indefinite if it is neither PSD nor NSD.
The identity matrix is always positive definite (PD).
Example
Let \(\mathbf{x} = (x_1, x_2)'\), and consider a \(2 \times 2\) identity matrix \(\mathbf{I}\):
\[ \begin{aligned} \mathbf{x'Ix} &= (x_1, x_2) \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \\ &= (x_1, x_2) \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \\ &= x_1^2 + x_2^2 \geq 0. \end{aligned} \]
Thus, \(\mathbf{I}\) is PD because \(\mathbf{x'Ix} > 0\) for all non-zero \(\mathbf{x}\).
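In practice, the definiteness of a symmetric matrix is conveniently checked through the signs of its eigenvalues (all positive implies PD, all non-negative implies PSD, and so on). A minimal sketch assuming NumPy, with `classify` being a hypothetical helper written for this illustration:

```python
import numpy as np

def classify(A, tol=1e-10):
    """Classify a symmetric matrix by the signs of its eigenvalues."""
    eig = np.linalg.eigvalsh(A)          # eigenvalues of a symmetric matrix
    if np.all(eig > tol):
        return "PD"
    if np.all(eig >= -tol):
        return "PSD"
    if np.all(eig < -tol):
        return "ND"
    if np.all(eig <= tol):
        return "NSD"
    return "indefinite"

print(classify(np.eye(2)))               # 'PD' -- matches x'Ix = x1^2 + x2^2 > 0
```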
Properties of Definiteness
- Any variance-covariance matrix is PSD.
- A matrix \(\mathbf{A}\) is PSD if and only if there exists a matrix \(\mathbf{B}\) such that: \[ \mathbf{A = B'B}. \]
- If \(\mathbf{A}\) is PSD, then \(\mathbf{B'AB}\) is also PSD for any conformable matrix \(\mathbf{B}\).
- If \(\mathbf{A}\) and \(\mathbf{C}\) are positive definite, then \(\mathbf{A - C}\) is PSD if and only if \(\mathbf{C^{-1} - A^{-1}}\) is PSD.
- If \(\mathbf{A}\) is PD (or ND), then \(\mathbf{A^{-1}}\) is also PD (or ND).
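For instance, any matrix of the form \(\mathbf{B'B}\) is PSD, because \(\mathbf{x'B'Bx} = ||\mathbf{Bx}||^2 \geq 0\) for every \(\mathbf{x}\). A quick NumPy check with an arbitrary \(\mathbf{B}\):

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.normal(size=(2, 3))              # arbitrary, not even square
A = B.T @ B                              # B'B is always PSD

eig = np.linalg.eigvalsh(A)
assert np.all(eig >= -1e-10)             # all eigenvalues non-negative => PSD

# x'Ax = ||Bx||^2 >= 0 for any x
x = rng.normal(size=3)
assert x @ A @ x >= 0
```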
Notes
- An indefinite matrix \(\mathbf{A}\) is neither PSD nor NSD. This concept does not have a direct counterpart in scalar algebra.
- If a square matrix is PSD and invertible, then it is PD.
Examples of Definiteness
Invertible / Indefinite: \[ \begin{bmatrix} -1 & 0 \\ 0 & 10 \end{bmatrix} \]
Non-Invertible / Indefinite: \[ \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \]
Invertible / PSD (in fact PD): \[ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \]
Non-Invertible / PSD: \[ \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \]
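These four examples can be checked mechanically: invertibility via the determinant, and definiteness via the quadratic form, which depends only on the symmetric part \((\mathbf{A} + \mathbf{A'})/2\). A sketch assuming NumPy:

```python
import numpy as np

examples = {
    "invertible / indefinite":     np.array([[-1., 0.], [0., 10.]]),
    "non-invertible / indefinite": np.array([[ 0., 1.], [0.,  0.]]),
    "invertible / PSD":            np.eye(2),
    "non-invertible / PSD":        np.array([[ 0., 0.], [0.,  1.]]),
}

for label, A in examples.items():
    invertible = not np.isclose(np.linalg.det(A), 0.0)
    # The quadratic form x'Ax depends only on the symmetric part (A + A') / 2
    eig = np.linalg.eigvalsh((A + A.T) / 2)
    psd = np.all(eig >= -1e-10)
    nsd = np.all(eig <= 1e-10)
    definiteness = "PSD" if psd else "NSD" if nsd else "indefinite"
    print(f"{label}: invertible={invertible}, definiteness={definiteness}")
```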
2.1.4 Matrix Calculus
Consider a scalar function \(y = f(x_1, x_2, \dots, x_k) = f(x)\), where \(x\) is a \(k \times 1\) vector.
2.1.4.1 Gradient (First-Order Derivative)
The gradient, or the first-order derivative of \(f(x)\) with respect to the vector \(x\), is given by:
\[ \frac{\partial f(x)}{\partial x} = \begin{bmatrix} \frac{\partial f(x)}{\partial x_1} \\ \frac{\partial f(x)}{\partial x_2} \\ \vdots \\ \frac{\partial f(x)}{\partial x_k} \end{bmatrix} \]
2.1.4.2 Hessian (Second-Order Derivative)
The Hessian, or the second-order derivative of \(f(x)\) with respect to \(x\), is a symmetric matrix defined as:
\[ \frac{\partial^2 f(x)}{\partial x \partial x'} = \begin{bmatrix} \frac{\partial^2 f(x)}{\partial x_1^2} & \frac{\partial^2 f(x)}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f(x)}{\partial x_1 \partial x_k} \\ \frac{\partial^2 f(x)}{\partial x_2 \partial x_1} & \frac{\partial^2 f(x)}{\partial x_2^2} & \cdots & \frac{\partial^2 f(x)}{\partial x_2 \partial x_k} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f(x)}{\partial x_k \partial x_1} & \frac{\partial^2 f(x)}{\partial x_k \partial x_2} & \cdots & \frac{\partial^2 f(x)}{\partial x_k^2} \end{bmatrix} \]
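As a concrete illustration, for \(f(x) = x_1^2 + 3x_1x_2 + 2x_2^2\) the gradient is \((2x_1 + 3x_2, \; 3x_1 + 4x_2)'\) and the Hessian is the constant matrix \(\begin{bmatrix} 2 & 3 \\ 3 & 4 \end{bmatrix}\). The sketch below (assuming NumPy; the function and evaluation point are arbitrary) cross-checks both against central finite differences:

```python
import numpy as np

def f(x):
    return x[0]**2 + 3*x[0]*x[1] + 2*x[1]**2

def grad(x):
    # Analytic gradient: vector of partial derivatives
    return np.array([2*x[0] + 3*x[1], 3*x[0] + 4*x[1]])

hessian = np.array([[2., 3.],
                    [3., 4.]])          # constant for a quadratic f

x0, h = np.array([1.0, -2.0]), 1e-6

# Finite-difference check of the gradient
fd_grad = np.array([(f(x0 + h*e) - f(x0 - h*e)) / (2*h) for e in np.eye(2)])
assert np.allclose(fd_grad, grad(x0), atol=1e-4)

# Finite-difference check of the Hessian (differentiate the gradient)
fd_hess = np.array([(grad(x0 + h*e) - grad(x0 - h*e)) / (2*h) for e in np.eye(2)])
assert np.allclose(fd_hess, hessian, atol=1e-4)
```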
2.1.4.3 Derivative of a Scalar Function with Respect to a Matrix
Let \(f(\mathbf{X})\) be a scalar function, where \(\mathbf{X}\) is an \(n \times p\) matrix. The derivative is:
\[ \frac{\partial f(\mathbf{X})}{\partial \mathbf{X}} = \begin{bmatrix} \frac{\partial f(\mathbf{X})}{\partial x_{11}} & \cdots & \frac{\partial f(\mathbf{X})}{\partial x_{1p}} \\ \vdots & \ddots & \vdots \\ \frac{\partial f(\mathbf{X})}{\partial x_{n1}} & \cdots & \frac{\partial f(\mathbf{X})}{\partial x_{np}} \end{bmatrix} \]
2.1.4.4 Common Matrix Derivatives
- If \(\mathbf{a}\) is a vector and \(\mathbf{A}\) is a matrix independent of \(\mathbf{y}\):
- \(\frac{\partial \mathbf{a'y}}{\partial \mathbf{y}} = \mathbf{a}\)
- \(\frac{\partial \mathbf{y'y}}{\partial \mathbf{y}} = 2\mathbf{y}\)
- \(\frac{\partial \mathbf{y'Ay}}{\partial \mathbf{y}} = (\mathbf{A} + \mathbf{A'})\mathbf{y}\)
- If \(\mathbf{X}\) is symmetric:
- \(\frac{\partial |\mathbf{X}|}{\partial x_{ij}} = \begin{cases} X_{ii}, & i = j \\ 2X_{ij}, & i \neq j \end{cases}\) where \(X_{ij}\) is the \((i,j)\)-th cofactor of \(\mathbf{X}\) (the off-diagonal case picks up a factor of 2 because \(x_{ij} = x_{ji}\) appear in both positions).
- If \(\mathbf{X}\) is symmetric and \(\mathbf{A}\) is a matrix independent of \(\mathbf{X}\):
- \(\frac{\partial \text{tr}(\mathbf{XA})}{\partial \mathbf{X}} = \mathbf{A} + \mathbf{A'} - \text{diag}(\mathbf{A})\).
- If \(\mathbf{X}\) is symmetric, let \(\mathbf{J}_{ij}\) be a matrix with 1 at the \((i,j)\)-th position and 0 elsewhere:
- \(\frac{\partial \mathbf{X}^{-1}}{\partial x_{ij}} = \begin{cases} -\mathbf{X}^{-1}\mathbf{J}_{ii}\mathbf{X}^{-1}, & i = j \\ -\mathbf{X}^{-1}(\mathbf{J}_{ij} + \mathbf{J}_{ji})\mathbf{X}^{-1}, & i \neq j \end{cases}.\)
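The vector-derivative identities in the first group above (\(\partial \mathbf{a'y}/\partial \mathbf{y} = \mathbf{a}\), \(\partial \mathbf{y'y}/\partial \mathbf{y} = 2\mathbf{y}\), \(\partial \mathbf{y'Ay}/\partial \mathbf{y} = (\mathbf{A} + \mathbf{A'})\mathbf{y}\)) can be verified with finite differences; a sketch assuming NumPy, with `num_grad` a hypothetical helper and arbitrary random inputs:

```python
import numpy as np

rng = np.random.default_rng(3)
k = 4
a = rng.normal(size=k)
A = rng.normal(size=(k, k))
y = rng.normal(size=k)
h = 1e-6

def num_grad(f, y):
    """Central finite-difference gradient of a scalar function of a vector."""
    return np.array([(f(y + h*e) - f(y - h*e)) / (2*h) for e in np.eye(len(y))])

assert np.allclose(num_grad(lambda v: a @ v, y), a, atol=1e-4)                  # d(a'y)/dy = a
assert np.allclose(num_grad(lambda v: v @ v, y), 2*y, atol=1e-4)                # d(y'y)/dy = 2y
assert np.allclose(num_grad(lambda v: v @ A @ v, y), (A + A.T) @ y, atol=1e-4)  # d(y'Ay)/dy = (A+A')y
```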
2.1.5 Optimization in Scalar and Vector Spaces
Optimization is the process of finding the minimum or maximum of a function. The conditions for optimization differ depending on whether the function involves a scalar or a vector. Below is a comparison of scalar and vector optimization:
| Condition | Scalar Optimization | Vector Optimization |
|---|---|---|
| First-Order Condition | \(\frac{\partial f(x_0)}{\partial x} = 0\) | \(\frac{\partial f(x_0)}{\partial x} = \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix}\) |
| Second-Order Condition (for convex functions, this implies a minimum) | \(\frac{\partial^2 f(x_0)}{\partial x^2} > 0\) | \(\frac{\partial^2 f(x_0)}{\partial x \partial x'}\) is positive definite |
| Second-Order Condition (for concave functions, this implies a maximum) | \(\frac{\partial^2 f(x_0)}{\partial x^2} < 0\) | \(\frac{\partial^2 f(x_0)}{\partial x \partial x'}\) is negative definite |
Key Concepts

- First-Order Condition:
  - The first-order derivative of the function must equal zero at a critical point. This holds for both scalar and vector functions:
    - In the scalar case, \(\frac{\partial f(x)}{\partial x} = 0\) identifies critical points.
    - In the vector case, \(\frac{\partial f(x)}{\partial x}\) is a gradient vector, and the condition is satisfied when all elements of the gradient are zero.
- Second-Order Condition:
  - The second-order derivative determines whether the critical point is a minimum, maximum, or saddle point:
    - For scalar functions, \(\frac{\partial^2 f(x)}{\partial x^2} > 0\) implies a local minimum, while \(\frac{\partial^2 f(x)}{\partial x^2} < 0\) implies a local maximum.
    - For vector functions, the Hessian matrix \(\frac{\partial^2 f(x)}{\partial x \partial x'}\) must be:
      - Positive Definite: for a minimum (convex function).
      - Negative Definite: for a maximum (concave function).
      - Indefinite: for a saddle point (neither minimum nor maximum).
- Convex and Concave Functions:
  - A function \(f(x)\) is:
    - Convex if \(\frac{\partial^2 f(x)}{\partial x^2} > 0\) or the Hessian \(\frac{\partial^2 f(x)}{\partial x \partial x'}\) is positive definite.
    - Concave if \(\frac{\partial^2 f(x)}{\partial x^2} < 0\) or the Hessian is negative definite.
  - Convexity ensures global optimization for minimization problems, while concavity ensures global optimization for maximization problems.
- Hessian Matrix:
  - In vector optimization, the Hessian \(\frac{\partial^2 f(x)}{\partial x \partial x'}\) plays a crucial role in determining the nature of critical points:
    - Positive definite Hessian: all eigenvalues are positive.
    - Negative definite Hessian: all eigenvalues are negative.
    - Indefinite Hessian: eigenvalues have mixed signs.
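To tie these conditions together, consider minimizing the quadratic \(f(x) = \mathbf{x'Ax} - 2\mathbf{b'x}\) with \(\mathbf{A}\) positive definite: the first-order condition \(2\mathbf{Ax} - 2\mathbf{b} = \mathbf{0}\) gives \(x^* = \mathbf{A}^{-1}\mathbf{b}\), and the Hessian \(2\mathbf{A}\) is positive definite, so \(x^*\) is a global minimum. A minimal sketch assuming NumPy (the particular \(\mathbf{A}\) and \(\mathbf{b}\) are arbitrary):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])              # symmetric, positive definite
b = np.array([1.0, 4.0])

def f(x):
    return x @ A @ x - 2 * b @ x

# First-order condition: gradient 2Ax - 2b = 0  =>  x* = A^{-1} b
x_star = np.linalg.solve(A, b)

# Second-order condition: the Hessian 2A must be positive definite
assert np.all(np.linalg.eigvalsh(2 * A) > 0)

# x* beats nearby random points, consistent with a global minimum
rng = np.random.default_rng(4)
assert all(f(x_star) <= f(x_star + rng.normal(scale=0.5, size=2)) for _ in range(100))
```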