A Matrix calculus
Matrix calculus is a notation for doing multivariable calculus over vectors and matrices. It collects the partial derivatives of a function with respect to many variables, or of a multi-component function with respect to one variable, into vectors and matrices that can be handled as single objects. It is an essential tool in machine learning, statistics, and optimization.
Note: matrix calculus is not to be confused with vector calculus.
First, we need to distinguish between a scalar \(y\) (lowercase), a vector \(\mathbf{y}\) (bold lowercase), and a matrix \(\mathbf{Y}\) (bold uppercase).
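The six kinds of derivatives that can be most neatly organized in matrix form are collected in the following table. Columns give the type of the quantity being differentiated, and rows give the type of the variable it is differentiated with respect to.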
| | Scalar | Vector | Matrix |
|---|---|---|---|
| Scalar | \(\frac{\partial y}{\partial x}\) | \(\frac{\partial \mathbf{y}}{\partial x}\) | \(\frac{\partial \mathbf{Y}}{\partial x}\) |
| Vector | \(\frac{\partial y}{\partial \mathbf{x}}\) | \(\frac{\partial \mathbf{y}}{\partial \mathbf{x}}\) | |
| Matrix | \(\frac{\partial y}{\partial \mathbf{X}}\) | | |
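
As a quick orientation, the shapes of the six derivatives can be written out explicitly. This is a sketch under assumed dimensions that the table itself does not state: \(\mathbf{y}\in\mathbb{R}^{m}\), \(\mathbf{x}\in\mathbb{R}^{n}\), \(\mathbf{Y}\in\mathbb{R}^{m\times n}\), \(\mathbf{X}\in\mathbb{R}^{p\times q}\), with the layout used in the definitions in this section.

\[
\begin{aligned}
\frac{\partial y}{\partial x} &: 1\times 1, &
\frac{\partial \mathbf{y}}{\partial x} &: m\times 1, &
\frac{\partial \mathbf{Y}}{\partial x} &: m\times n,\\
\frac{\partial y}{\partial \mathbf{x}} &: 1\times n, &
\frac{\partial \mathbf{y}}{\partial \mathbf{x}} &: m\times n, & &\\
\frac{\partial y}{\partial \mathbf{X}} &: q\times p. & & & &
\end{aligned}
\]

For instance, with the illustrative function \(y = x_{1}^{2} + 3x_{2}\) (so \(n = 2\)), the scalar-by-vector derivative is the \(1\times 2\) row

\[
\frac{\partial y}{\partial \mathbf{x}} = \begin{bmatrix} 2x_{1} & 3 \end{bmatrix}.
\]
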
- Derivatives with vectors
  - Vector-by-scalar \[ {\frac {\partial \mathbf {y} }{\partial x}}={\begin{bmatrix}{\frac {\partial y_{1}}{\partial x}}\\{\frac {\partial y_{2}}{\partial x}}\\\vdots \\{\frac {\partial y_{m}}{\partial x}}\\\end{bmatrix}}. \]
  - Scalar-by-vector \[ \displaystyle {\frac {\partial y}{\partial \mathbf {x} }}={\begin{bmatrix}{\dfrac {\partial y}{\partial x_{1}}}&{\dfrac {\partial y}{\partial x_{2}}}&\cdots &{\dfrac {\partial y}{\partial x_{n}}}\end{bmatrix}} \]
  - Vector-by-vector \[ {\frac {\partial \mathbf {y} }{\partial \mathbf {x} }}={\begin{bmatrix}{\frac {\partial y_{1}}{\partial x_{1}}}&{\frac {\partial y_{1}}{\partial x_{2}}}&\cdots &{\frac {\partial y_{1}}{\partial x_{n}}}\\{\frac {\partial y_{2}}{\partial x_{1}}}&{\frac {\partial y_{2}}{\partial x_{2}}}&\cdots &{\frac {\partial y_{2}}{\partial x_{n}}}\\\vdots &\vdots &\ddots &\vdots \\{\frac {\partial y_{m}}{\partial x_{1}}}&{\frac {\partial y_{m}}{\partial x_{2}}}&\cdots &{\frac {\partial y_{m}}{\partial x_{n}}}\\\end{bmatrix}} \]
- Derivatives with matrices
  - Matrix-by-scalar \[ {\frac {\partial \mathbf {Y} }{\partial x}}={\begin{bmatrix}{\frac {\partial y_{11}}{\partial x}}&{\frac {\partial y_{12}}{\partial x}}&\cdots &{\frac {\partial y_{1n}}{\partial x}}\\{\frac {\partial y_{21}}{\partial x}}&{\frac {\partial y_{22}}{\partial x}}&\cdots &{\frac {\partial y_{2n}}{\partial x}}\\\vdots &\vdots &\ddots &\vdots \\{\frac {\partial y_{m1}}{\partial x}}&{\frac {\partial y_{m2}}{\partial x}}&\cdots &{\frac {\partial y_{mn}}{\partial x}}\\\end{bmatrix}} \]
  - Scalar-by-matrix \[ \displaystyle {\frac {\partial y}{\partial \mathbf {X} }}={\begin{bmatrix}{\frac {\partial y}{\partial x_{11}}}&{\frac {\partial y}{\partial x_{21}}}&\cdots &{\frac {\partial y}{\partial x_{p1}}}\\{\frac {\partial y}{\partial x_{12}}}&{\frac {\partial y}{\partial x_{22}}}&\cdots &{\frac {\partial y}{\partial x_{p2}}}\\\vdots &\vdots &\ddots &\vdots \\{\frac {\partial y}{\partial x_{1q}}}&{\frac {\partial y}{\partial x_{2q}}}&\cdots &{\frac {\partial y}{\partial x_{pq}}}\\\end{bmatrix}} \]
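
The layouts above can be checked numerically. The following is a minimal sketch, not a reference implementation: it approximates each derivative with central finite differences (a numerical stand-in, not something the definitions prescribe) and uses plain Python lists so that it has no dependencies. All function names, the step size `h`, and the test functions are hypothetical choices made for illustration.

```python
# Hypothetical helpers: each returns a nested list whose shape matches
# the corresponding definition above. Derivatives are approximated by
# central differences with step h (an assumed, illustrative value).

def vector_by_scalar(f, x, h=1e-6):
    """dy/dx for y = f(x) with m components and scalar x: m entries."""
    up, down = f(x + h), f(x - h)
    return [(u - d) / (2 * h) for u, d in zip(up, down)]


def vector_by_vector(f, x, h=1e-6):
    """dy/dx for m outputs and n inputs: m rows by n columns."""
    m, n = len(f(x)), len(x)
    jac = [[0.0] * n for _ in range(m)]
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        up, down = f(xp), f(xm)
        for i in range(m):
            jac[i][j] = (up[i] - down[i]) / (2 * h)
    return jac


def scalar_by_vector(f, x, h=1e-6):
    """dy/dx for scalar y and n inputs: a single row of n entries."""
    return vector_by_vector(lambda v: [f(v)], x, h)[0]


def matrix_by_scalar(F, x, h=1e-6):
    """dY/dx for an m-by-n matrix Y = F(x): same m-by-n shape."""
    up, down = F(x + h), F(x - h)
    return [[(u - d) / (2 * h) for u, d in zip(ru, rd)]
            for ru, rd in zip(up, down)]


def scalar_by_matrix(f, X, h=1e-6):
    """dy/dX for scalar y and p-by-q X: q rows by p columns,
    with entry [j][i] holding dy/dx_ij as in the definition above."""
    p, q = len(X), len(X[0])
    out = [[0.0] * p for _ in range(q)]
    for i in range(p):
        for j in range(q):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            out[j][i] = (f(Xp) - f(Xm)) / (2 * h)
    return out


# Illustrative use: y = (x1 * x2, x1 + x2^2) evaluated at x = (2, 3).
# The analytic Jacobian is [[x2, x1], [1, 2 * x2]] = [[3, 2], [1, 6]];
# the numerical result agrees only up to rounding error.
J = vector_by_vector(lambda x: [x[0] * x[1], x[0] + x[1] ** 2], [2.0, 3.0])

# Illustrative use: y = sum_ij w_ij * x_ij for a 2-by-3 matrix X.
# In the layout above, dy/dX is the 3-by-2 transpose of the weights.
W = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
G = scalar_by_matrix(
    lambda X: sum(W[i][j] * X[i][j] for i in range(2) for j in range(3)),
    [[0.5, 1.0, 1.5], [2.0, 2.5, 3.0]],
)
```

The point to notice is the shape of `G`: because the scalar-by-matrix definition above places \(\partial y/\partial x_{ij}\) in row \(j\) and column \(i\), a \(2\times 3\) input yields a \(3\times 2\) result.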