6.5 Useful matrix information in statistics
In statistics, we often work with vectors and matrices. The vector of response variables is often written as \[ \textbf{y} = \left[ \begin{array}{c} y_1\\ y_2\\ y_3\\ \vdots\\ y_n \end{array} \right]. \] The values \(y_i\) are often assumed to have been generated from a process with means \(\mu_i\), so we can collect the means in a vector \[ \boldsymbol{\mu} = \left[ \begin{array}{c} \mu_1\\ \mu_2\\ \mu_3\\ \vdots\\ \mu_n \end{array} \right], \] and write \(E[\textbf{y}] = \boldsymbol{\mu}\).
For a conformable \(m\times n\) matrix \({C}\) of constants (so that the product \(C\textbf{y}\) is defined), \(E[C \textbf{y}] = C\boldsymbol{\mu}\). This is the matrix equivalent of \(E[cX] = c E[X] = c\mu\).
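As a quick check of this identity, the following sketch enumerates a small discrete distribution exactly, so no simulation error is involved. The distributions, the matrix `C`, and the helper `matvec` are illustrative assumptions, not part of the text above.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical example: y = (y1, y2) with independent components,
# each taking two values with known probabilities (value -> probability).
y1_dist = {0: F(1, 2), 2: F(1, 2)}
y2_dist = {1: F(1, 4), 5: F(3, 4)}

# Mean vector mu, computed directly from the two distributions.
mu = [sum(v * p for v, p in d.items()) for d in (y1_dist, y2_dist)]

# An arbitrary constant 3 x 2 matrix C (so m = 3, n = 2).
C = [[1, 2],
     [0, -1],
     [3, 1]]

def matvec(A, v):
    """Multiply matrix A (a list of rows) by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

# E[C y]: sum C y over every outcome, weighted by its probability.
E_Cy = [F(0)] * len(C)
for (a, pa), (b, pb) in product(y1_dist.items(), y2_dist.items()):
    Cy = matvec(C, [a, b])
    E_Cy = [e + pa * pb * c for e, c in zip(E_Cy, Cy)]

C_mu = matvec(C, mu)
print(E_Cy == C_mu)
```

Here \(\boldsymbol{\mu} = (1, 4)^T\), and both sides equal \((9, -4, 7)^T\). Exact fractions are used so that the comparison is an equality rather than an approximate match.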
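Similarly, \(\text{var}[C\textbf{y}] = C\text{var}[\textbf{y}]C^T\), where \(\text{var}[\textbf{y}]\) is the \(n\times n\) variance-covariance matrix of \(\textbf{y}\). This is the matrix equivalent of stating \(\text{var}[c X] = c^2 \text{var}[X]\).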
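The variance identity can be checked in the same exact way. In the sketch below, the joint distribution, the matrix `C`, and all helper functions are hypothetical choices made only for illustration; the two components of \(\textbf{y}\) are deliberately correlated so that the off-diagonal terms matter.

```python
from fractions import Fraction as F

# Hypothetical joint distribution for y = (y1, y2): outcome -> probability.
joint = {(0, 0): F(1, 2), (1, 2): F(1, 4), (2, 1): F(1, 4)}

# An arbitrary constant 2 x 2 matrix C.
C = [[1, 1],
     [2, -1]]

def matvec(A, v):
    """Multiply matrix A (a list of rows) by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def cov(dist):
    """Exact variance-covariance matrix of a discrete random vector."""
    n = len(next(iter(dist)))
    mean = [sum(p * v[i] for v, p in dist.items()) for i in range(n)]
    return [[sum(p * (v[i] - mean[i]) * (v[j] - mean[j])
                 for v, p in dist.items())
             for j in range(n)] for i in range(n)]

# Distribution of C y: push each outcome through C, accumulating
# probability in case two outcomes map to the same point.
dist_Cy = {}
for v, p in joint.items():
    key = tuple(matvec(C, v))
    dist_Cy[key] = dist_Cy.get(key, 0) + p

lhs = cov(dist_Cy)                                   # var[C y]
rhs = matmul(matmul(C, cov(joint)), transpose(C))    # C var[y] C^T
print(lhs == rhs)
```

For this example, \(\text{var}[\textbf{y}]\) has diagonal entries \(11/16\) and off-diagonal entries \(7/16\), and both sides of the identity have rows \((9/4,\ 9/8)\) and \((9/8,\ 27/16)\).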
A further useful result: consider an unknown \(n\times 1\) vector \(\textbf{z}\) and a symmetric \(n\times n\) matrix \(M\) of constants. Then differentiating \(S = \textbf{z}^T M \textbf{z}\) with respect to \(\textbf{z}\) gives \[ \frac{dS}{d\textbf{z}} = 2M\textbf{z}. \] This is the matrix equivalent of differentiating \(y = a x^2\) and getting \(2ax\). (For a general square matrix \(M\), the derivative is \((M + M^T)\textbf{z}\), which reduces to \(2M\textbf{z}\) when \(M = M^T\).)
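The derivative can be verified numerically with central differences. This is a minimal sketch: the matrices, the vector `z`, the step size `h`, and the helper names are all assumptions chosen for illustration. It also checks a non-symmetric matrix, for which the gradient is \((M + M^T)\textbf{z}\) rather than \(2M\textbf{z}\).

```python
def quad_form(M, z):
    """Return S = z^T M z for a square matrix M (list of rows) and vector z."""
    n = len(z)
    return sum(z[i] * M[i][j] * z[j] for i in range(n) for j in range(n))

def matvec(A, v):
    """Multiply matrix A (a list of rows) by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def numerical_gradient(f, z, h=1e-4):
    """Central-difference approximation to the gradient of f at z."""
    grad = []
    for k in range(len(z)):
        z_plus, z_minus = list(z), list(z)
        z_plus[k] += h
        z_minus[k] -= h
        grad.append((f(z_plus) - f(z_minus)) / (2 * h))
    return grad

z = [1.0, -2.0, 0.5]

# Symmetric case: the gradient should match 2 M z.
M_sym = [[2.0, 1.0, 0.0],
         [1.0, 3.0, -1.0],
         [0.0, -1.0, 4.0]]
analytic = [2 * g for g in matvec(M_sym, z)]
numeric = numerical_gradient(lambda v: quad_form(M_sym, v), z)
close_sym = all(abs(a - b) < 1e-6 for a, b in zip(analytic, numeric))

# Non-symmetric case: the gradient should match (M + M^T) z.
M_gen = [[2.0, 5.0, 0.0],
         [1.0, 3.0, -1.0],
         [0.0, 2.0, 4.0]]
M_gen_T = [list(col) for col in zip(*M_gen)]
analytic_gen = [a + b for a, b in zip(matvec(M_gen, z), matvec(M_gen_T, z))]
numeric_gen = numerical_gradient(lambda v: quad_form(M_gen, v), z)
close_gen = all(abs(a - b) < 1e-6 for a, b in zip(analytic_gen, numeric_gen))

print(close_sym, close_gen)
```

Because \(S\) is quadratic in \(\textbf{z}\), central differences agree with the analytic gradient up to floating-point rounding, so a loose tolerance is enough for the comparison.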