3.6 Multivariate Probability Distributions Using Matrix Algebra

In this section, we show how matrix algebra simplifies many of the messy expressions involving expectations and covariances between multiple random variables, and how certain multivariate probability distributions (e.g., the multivariate normal distribution) can be expressed compactly using matrix algebra.

3.6.1 Random vectors

Let \(X_{1},\ldots,X_{n}\) denote \(n\) random variables. For \(i=1,\ldots,n\) let \(\mu_{i}=E[X_{i}]\) and \(\sigma_{i}^{2}=\mathrm{var}(X_{i})\), and let \(\sigma_{ij}=\mathrm{cov}(X_{i},X_{j})\) for \(i\neq j\). Define the \(n\times1\) random vector \(\mathbf{X}=(X_{1},\ldots,X_{n})^{\prime}\). Associated with \(\mathbf{X}\) is the \(n\times1\) vector of expected values: \[\begin{equation} \underset{n\times1}{\mu}=E[\mathbf{X}]=\left(\begin{array}{c} E[X_{1}]\\ \vdots\\ E[X_{n}] \end{array}\right)=\left(\begin{array}{c} \mu_{1}\\ \vdots\\ \mu_{n} \end{array}\right) \tag{3.3}. \end{equation}\]

3.6.2 Covariance matrix

The covariance matrix, \(\Sigma\), summarizes the variances and covariances of the elements of the random vector \(\mathbf{X}\). In general, the covariance matrix of a random vector \(\mathbf{X}\) (sometimes called the variance of the vector \(\mathbf{X}\)) with mean
vector \(\mu\) is defined as:

\[\begin{equation} \mathrm{cov}(\mathbf{X})=E[(\mathbf{X}-\mu)(\mathbf{X}-\mu)^{\prime}]=\Sigma. \tag{3.4} \end{equation}\]

If \(\mathbf{X}\) has \(n\) elements then \(\Sigma\) will be the symmetric and positive semi-definite \(n\times n\) matrix,

\[\begin{equation} \underset{n\times n}{\Sigma}=\left(\begin{array}{cccc} \sigma_{1}^{2} & \sigma_{12} & \cdots & \sigma_{1n}\\ \sigma_{12} & \sigma_{2}^{2} & \cdots & \sigma_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ \sigma_{1n} & \sigma_{2n} & \cdots & \sigma_{n}^{2} \end{array}\right) \tag{3.5}. \end{equation}\]

For the case \(n=2\), we have by direct calculation \[\begin{align*} E[(\mathbf{X}-\mu)(\mathbf{X}-\mu)^{\prime}] & =E\left[\left(\begin{array}{l} X_{1}-\mu_{1}\\ X_{2}-\mu_{2} \end{array}\right)\cdot\left(X_{1}-\mu_{1},X_{2}-\mu_{2}\right)\right]\\ & =E\left[\left(\begin{array}{ll} (X_{1}-\mu_{1})^{2} & (X_{1}-\mu_{1})(X_{2}-\mu_{2})\\ (X_{2}-\mu_{2})(X_{1}-\mu_{1}) & (X_{2}-\mu_{2})^{2} \end{array}\right)\right]\\ & =\left(\begin{array}{ll} E[(X_{1}-\mu_{1})^{2}] & E[(X_{1}-\mu_{1})(X_{2}-\mu_{2})]\\ E[(X_{2}-\mu_{2})(X_{1}-\mu_{1})] & E[(X_{2}-\mu_{2})^{2}] \end{array}\right)\\ & =\left(\begin{array}{ll} \mathrm{var}(X_{1}) & \mathrm{cov}(X_{1},X_{2})\\ \mathrm{cov}(X_{2},X_{1}) & \mathrm{var}(X_{2}) \end{array}\right)=\left(\begin{array}{ll} \sigma_{1}^{2} & \sigma_{12}\\ \sigma_{12} & \sigma_{2}^{2} \end{array}\right)=\Sigma. \end{align*}\]

If \(\rho_{12}\ne\pm1\), where \(\rho_{12}=\sigma_{12}/(\sigma_{1}\sigma_{2})\) is the correlation between \(X_{1}\) and \(X_{2}\), then \(\mathrm{det}(\Sigma)\ne0\), \(\Sigma\) has rank 2, and \(\Sigma\) is positive definite. However, if \(\rho_{12}=\pm1\) then \(\mathrm{det}(\Sigma)=0\), \(\Sigma\) has rank 1, and \(\Sigma\) is positive semi-definite.
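These rank conditions are easy to verify numerically. Here is a quick check in R (a minimal sketch; the values \(\sigma_{1}=1\) and \(\sigma_{2}=2\) are illustrative):

sigma.1 = 1
sigma.2 = 2
# full-rank case: rho.12 = 0.5
Sigma.pd = matrix(c(sigma.1^2, 0.5*sigma.1*sigma.2,
                    0.5*sigma.1*sigma.2, sigma.2^2), 2, 2)
det(Sigma.pd)
## [1] 3
# degenerate case: rho.12 = 1
Sigma.psd = matrix(c(sigma.1^2, 1*sigma.1*sigma.2,
                     1*sigma.1*sigma.2, sigma.2^2), 2, 2)
det(Sigma.psd)
## [1] 0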

3.6.3 Correlation matrix

The correlation matrix, \(\mathbf{C}\), summarizes all pairwise correlations between the elements of the \(n\times1\) random vector \(\mathbf{X}\) and is given by

\[\begin{equation} \mathbf{C}=\left(\begin{array}{cccc} 1 & \rho_{12} & \cdots & \rho_{1n}\\ \rho_{12} & 1 & \cdots & \rho_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ \rho_{1n} & \rho_{2n} & \cdots & 1 \end{array}\right) \tag{2.55} \end{equation}\]

Let \(\Sigma\) denote the covariance matrix of \(\mathbf{X}\). Then the correlation matrix \(\mathbf{C}\) can be computed using \[\begin{equation} \mathbf{C}=\mathbf{D}^{-1}\Sigma\mathbf{D}^{-1},\tag{3.6} \end{equation}\] where \(\mathbf{D}\) is an \(n\times n\) diagonal matrix with the standard deviations of the elements of \(\mathbf{X}\) along the main diagonal: \[ \mathbf{D}=\left(\begin{array}{cccc} \sigma_{1} & 0 & \cdots & 0\\ 0 & \sigma_{2} & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \sigma_{n} \end{array}\right). \]

Example 2.28 (Computing correlation matrix from covariance matrix in R)

In R the cov2cor() function can be used to compute the correlation matrix given the covariance matrix using (3.6):

sigma.1 = 1
sigma.2 = 2
rho.12 = 0.5
sigma.12 = rho.12*sigma.1*sigma.2
Sigma = matrix(c(sigma.1^2, sigma.12, sigma.12, sigma.2^2), 
               2, 2, byrow=TRUE)
Sigma
##      [,1] [,2]
## [1,]    1    1
## [2,]    1    4
cov2cor(Sigma)
##      [,1] [,2]
## [1,]  1.0  0.5
## [2,]  0.5  1.0
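The same result can be computed directly from (3.6) by constructing the diagonal matrix \(\mathbf{D}\) of standard deviations:

D = diag(c(sigma.1, sigma.2))
solve(D) %*% Sigma %*% solve(D)
##      [,1] [,2]
## [1,]  1.0  0.5
## [2,]  0.5  1.0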

\(\blacksquare\)

3.6.4 Variance of a linear combination of a random vector

Consider the \(n\times1\) random vector \(\mathbf{X}\) with mean vector \(\mu\) and covariance matrix \(\Sigma\). Let \(\mathbf{a}=(a_{1},\ldots,a_{n})^{\prime}\) be an \(n\times1\) vector of constants and consider the random variable \(Y=\mathbf{a}^{\prime}\mathbf{X}=a_{1}X_{1}+\cdots+a_{n}X_{n}\). The expected value of \(Y\) is:

\[\begin{equation} \mu_{Y}=E[Y]=E[\mathbf{a}^{\prime}\mathbf{X}]=\mathbf{a}^{\prime}E[\mathbf{X}]= \mathbf{a}^{\prime}\mu \tag{2.56}. \end{equation}\]

The variance of \(Y\) is \[ \mathrm{var}(Y)=\mathrm{var}(\mathbf{a}^{\prime}\mathbf{X})=E[(\mathbf{a}^{\prime}\mathbf{X}-\mathbf{a}^{\prime}\mu)^{2}]=E[(\mathbf{a}^{\prime}(\mathbf{X}-\mu))^{2}] \] since \(Y=\mathbf{a}^{\prime}\mathbf{X}\) is a scalar. Now we use a trick from matrix algebra. If \(z\) is a scalar (think of \(z=2\)) then \(z^{\prime}z=z\cdot z^{\prime}=z^{2}\). Let \(z=\mathbf{a}^{\prime}(\mathbf{X}-\mu)\) and so \(z\cdot z^{\prime}=\mathbf{a}^{\prime}(\mathbf{X}-\mu)(\mathbf{X}-\mu)^{\prime}\mathbf{a}\). Then, \[\begin{align} \mathrm{var}(Y) & =E[z^{2}]=E[z\cdot z^{\prime}]\\ & =E[\mathbf{a}^{\prime}(\mathbf{X}-\mu)(\mathbf{X}-\mu)^{\prime}\mathbf{a}] \nonumber \\ & =\mathbf{a}^{\prime}E[(\mathbf{X}-\mu)(\mathbf{X}-\mu)^{\prime}]\mathbf{a} \nonumber \\ & =\mathbf{a}^{\prime}\mathrm{cov}(\mathbf{X})\mathbf{a}=\mathbf{a}^{\prime}\Sigma\mathbf{a} \tag{2.57}. \end{align}\]
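As a numerical illustration (a minimal sketch; the weight vector \(\mathbf{a}=(0.5,0.5)^{\prime}\) is an arbitrary choice, and \(\mu\) and \(\Sigma\) are the illustrative bivariate values used in the earlier examples):

mu.vec = c(1, 1)
Sigma = matrix(c(1, 1, 1, 4), 2, 2)
a.vec = c(0.5, 0.5)              # illustrative weights
t(a.vec) %*% mu.vec              # mu_Y = a'mu
##      [,1]
## [1,]    1
t(a.vec) %*% Sigma %*% a.vec     # var(Y) = a'Sigma a
##      [,1]
## [1,] 1.75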

3.6.5 Covariance between two linear combinations of a random vector

Consider the \(n\times1\) random vector \(\mathbf{X}\) with mean vector \(\mu\) and covariance matrix \(\Sigma\). Let \(\mathbf{a}=(a_{1},\ldots,a_{n})^{\prime}\) and \(\mathbf{b}=(b_{1},\ldots,b_{n})^{\prime}\) be \(n\times1\) vectors of constants, and consider the random variables \(Y=\mathbf{a}^{\prime}\mathbf{X}=a_{1}X_{1}+\cdots+a_{n}X_{n}\) and \(Z=\mathbf{b}^{\prime}\mathbf{X}=b_{1}X_{1}+\cdots+b_{n}X_{n}\). From the definition of covariance we have: \[ \mathrm{cov}(Y,Z)=E[(Y-E[Y])(Z-E[Z])], \] which may be rewritten in matrix notation as: \[\begin{align} \mathrm{cov}(\mathbf{a}^{\prime}\mathbf{X},\mathbf{b}^{\prime}\mathbf{X}) & =E[(\mathbf{a}^{\prime}\mathbf{X}-\mathbf{a}^{\prime}\mu)(\mathbf{b}^{\prime}\mathbf{X}-\mathbf{b}^{\prime}\mu)]\nonumber \\ & =E[\mathbf{a}^{\prime}(\mathbf{X}-\mu)\mathbf{b}^{\prime}(\mathbf{X}-\mu)] \nonumber \\ & =E[\mathbf{a}^{\prime}(\mathbf{X}-\mu)(\mathbf{X}-\mu)^{\prime}\mathbf{b}] \nonumber \\ & =\mathbf{a}^{\prime}E[(\mathbf{X}-\mu)(\mathbf{X}-\mu)^{\prime}]\mathbf{b} \nonumber \\ & =\mathbf{a}^{\prime}\Sigma \mathbf{b} \tag{3.7}. \end{align}\] The third equality uses the scalar trick from the previous subsection: since \(\mathbf{b}^{\prime}(\mathbf{X}-\mu)\) is a scalar, it equals its transpose \((\mathbf{X}-\mu)^{\prime}\mathbf{b}\), so that \[ \mathbf{a}^{\prime}(\mathbf{X}-\mu)\mathbf{b}^{\prime}(\mathbf{X}-\mu)=\mathbf{a}^{\prime}(\mathbf{X}-\mu)(\mathbf{X}-\mu)^{\prime}\mathbf{b}. \]
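Continuing the numerical illustration from the previous subsection, with a second arbitrary weight vector \(\mathbf{b}=(0.2,0.8)^{\prime}\), (3.7) is a one-line computation:

b.vec = c(0.2, 0.8)              # illustrative second weight vector
t(a.vec) %*% Sigma %*% b.vec     # cov(Y,Z) = a'Sigma b
##      [,1]
## [1,]  2.2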

3.6.6 Variance of a vector of linear functions of a random vector

Consider the \(n\times1\) random vector \(\mathbf{X}\) with mean vector \(\mu\) and covariance matrix \(\Sigma\). Let \(\mathbf{A}\) be an \(m\times n\) matrix. Then the \(m\times1\) random vector \(\mathbf{Y}=\mathbf{AX}\) represents \(m\) linear functions of \(\mathbf{X}\). By the linearity of expectation, we have \(E[\mathbf{Y}]=\mathbf{A}E[\mathbf{X}]=\mathbf{A}\mu.\) The \(m\times m\) covariance matrix of \(\mathbf{Y}\) is given by \[\begin{eqnarray} \mathrm{var}(\mathbf{Y}) & = & E[(\mathbf{Y}-E[\mathbf{Y}])(\mathbf{Y}-E[\mathbf{Y}])^{\prime}] \nonumber \\ & = & E[(\mathbf{AX}-\mathbf{A}\mu)(\mathbf{AX}-\mathbf{A}\mu)^{\prime}] \nonumber \\ & = & E[\mathbf{A}(\mathbf{X}-\mu)(\mathbf{A}(\mathbf{X}-\mu))^{\prime}] \nonumber \\ & = & E[\mathbf{A}(\mathbf{X}-\mu)(\mathbf{X}-\mu)^{\prime}\mathbf{A}^{\prime}] \nonumber \\ & = & \mathbf{A}E[(\mathbf{X}-\mu)(\mathbf{X}-\mu)^{\prime}]\mathbf{A}^{\prime} \nonumber \\ & = & \mathbf{A}\Sigma\mathbf{A}^{\prime} \tag{3.8}. \end{eqnarray}\]
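As a short sketch of (3.8) (the matrix \(\mathbf{A}\) below is an arbitrary illustration whose rows define \(Y_{1}=X_{1}+X_{2}\) and \(Y_{2}=X_{1}-X_{2}\), with \(\Sigma\) as before):

A = matrix(c(1,  1,
             1, -1), 2, 2, byrow=TRUE)
A %*% Sigma %*% t(A)             # var(AX) = A Sigma A'
##      [,1] [,2]
## [1,]    7   -3
## [2,]   -3    3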

3.6.7 Multivariate normal distribution

Consider \(n\) random variables \(X_{1},\ldots,X_{n}\) and assume they are jointly normally distributed. Define the \(n\times1\) vectors \(\mathbf{X}=(X_{1},\ldots,X_{n})^{\prime}\), \(\mathbf{x}=(x_{1},\ldots,x_{n})^{\prime}\) and \(\mu=(\mu_{1},\ldots,\mu_{n})^{\prime}\), and the \(n\times n\) covariance matrix: \[ \Sigma=\left(\begin{array}{cccc} \sigma_{1}^{2} & \sigma_{12} & \cdots & \sigma_{1n}\\ \sigma_{12} & \sigma_{2}^{2} & \cdots & \sigma_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ \sigma_{1n} & \sigma_{2n} & \cdots & \sigma_{n}^{2} \end{array}\right). \] Then \(\mathbf{X}\sim N(\mu,\Sigma)\) means that the random vector \(\mathbf{X}\) has a multivariate normal distribution with mean vector \(\mu\) and covariance matrix \(\Sigma\). Provided \(\Sigma\) is positive definite (so that \(\Sigma^{-1}\) exists), the pdf of the multivariate normal distribution can be compactly expressed as: \[\begin{align*} f(\mathbf{x}) & =\frac{1}{(2\pi)^{n/2}\det(\Sigma)^{1/2}}e^{-\frac{1}{2}(\mathbf{x}-\mu)^{\prime}\Sigma^{-1}(\mathbf{x}-\mu)}\\ & =(2\pi)^{-n/2}\det(\Sigma)^{-1/2}e^{-\frac{1}{2}(\mathbf{x}-\mu)^{\prime}\Sigma^{-1}(\mathbf{x}-\mu)}. \end{align*}\]
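As a numerical check of the pdf formula (a minimal sketch; \(\mu\) and \(\Sigma\) are the illustrative bivariate values used above, and the density is evaluated at \(\mathbf{x}=\mu\), where the exponent is zero):

mu.vec = c(1, 1)
Sigma = matrix(c(1, 1, 1, 4), 2, 2)
x.vec = mu.vec                   # evaluate the pdf at the mean
quad.form = t(x.vec - mu.vec) %*% solve(Sigma) %*% (x.vec - mu.vec)
(2*pi)^(-2/2) * det(Sigma)^(-1/2) * exp(-0.5*quad.form)
##            [,1]
## [1,] 0.09188815
# the same value can be obtained with mvtnorm::dmvnorm(), if that
# package is installed:
# mvtnorm::dmvnorm(x.vec, mean = mu.vec, sigma = Sigma)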

Example 3.8 (Creating a multivariate normal random vector from a vector of independent standard normal random variables)

Let \(\mathbf{Z}=(Z_{1},\ldots,Z_{n})^{\prime}\) denote a vector of \(n\) iid standard normal random variables. Then \(\mathbf{Z}\sim N(\mathbf{0},\,\mathbf{I}_{n})\), where \(\mathbf{I}_{n}\) denotes the \(n\)-dimensional identity matrix. Given \(\mathbf{Z}\), we would like to create a random vector \(\mathbf{X}\) such that \(\mathbf{X}\sim N(\mu,\,\Sigma)\). Let \(\Sigma^{1/2}\) denote the upper triangular Cholesky factor of \(\Sigma\), so that \(\Sigma=\Sigma^{1/2 \prime}\Sigma^{1/2}\). Define \(\mathbf{X}\) using \[ \mathbf{X}=\mu+\Sigma^{1/2 \prime}\mathbf{Z}. \] Then, using (3.8) with \(\mathbf{A}=\Sigma^{1/2 \prime}\), \[\begin{eqnarray*} E[\mathbf{X}] & = & \mu+\Sigma^{1/2 \prime}E[\mathbf{Z}]=\mu,\\ \mathrm{var}(\mathbf{X}) & = & \mathrm{var}\left(\Sigma^{1/2 \prime}\mathbf{Z}\right)=\Sigma^{1/2 \prime}\mathrm{var}(\mathbf{Z})\Sigma^{1/2}=\Sigma^{1/2 \prime}\mathbf{I}_{n}\Sigma^{1/2}=\Sigma. \end{eqnarray*}\] Moreover, \(\mathbf{X}\) is a linear function of the jointly normal vector \(\mathbf{Z}\) and so is itself normally distributed. Hence, \(\mathbf{X}\sim N(\mu,\,\Sigma).\)

\(\blacksquare\)

Example 2.30 (Simulating multivariate normal random vectors in R)

Here, we illustrate how to simulate \(\mathbf{X} \sim N(\mu, \Sigma)\) where \(\mu = (1,1)^{\prime}\) and \[ \Sigma = \left(\begin{array}{ll} 1 & 1\\ 1 & 4 \end{array}\right). \] First, specify the mean vector and covariance matrix for the bivariate normal distribution:

mu.vec = c(1,1)
sigma.1 = 1
sigma.2 = 2
rho.12 = 0.5
sigma.12 = rho.12*sigma.1*sigma.2
Sigma = matrix(c(sigma.1^2, sigma.12, sigma.12, sigma.2^2), 
               2, 2, byrow=TRUE)
mu.vec
## [1] 1 1
Sigma
##      [,1] [,2]
## [1,]    1    1
## [2,]    1    4

Second, compute the Cholesky factorization of \(\Sigma\):

Sigma.5 = chol(Sigma)
Sigma.5
##      [,1]     [,2]
## [1,]    1 1.000000
## [2,]    0 1.732051
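As a sanity check, \(\Sigma^{1/2 \prime}\Sigma^{1/2}\) recovers \(\Sigma\):

t(Sigma.5) %*% Sigma.5
##      [,1] [,2]
## [1,]    1    1
## [2,]    1    4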

Third, simulate \(\mathbf{Z} \sim N(\mathbf{0}, \mathbf{I}_{2})\):

n = 2
set.seed(123)
Z = rnorm(n)

Fourth, compute \(\mathbf{X} = \mu + \Sigma^{1/2 \prime}\mathbf{Z}\):

X = mu.vec + t(Sigma.5)%*%Z
X
##            [,1]
## [1,] 0.43952435
## [2,] 0.04084525
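In practice, one simulates many draws rather than a single vector. A minimal sketch (the sample size of 10,000 is arbitrary) applies the same transformation to a matrix whose columns are independent \(N(\mathbf{0},\,\mathbf{I}_{2})\) vectors; alternatively, the rmvnorm() function from the mvtnorm package, if installed, performs this simulation directly:

n.sim = 10000
set.seed(123)
Z.mat = matrix(rnorm(2*n.sim), 2, n.sim)  # columns are iid N(0, I_2) draws
X.mat = mu.vec + t(Sigma.5) %*% Z.mat     # columns are iid N(mu, Sigma) draws
rowMeans(X.mat)                           # should be close to mu.vec
cov(t(X.mat))                             # should be close to Sigma
# equivalently, if mvtnorm is installed:
# X.mat2 = mvtnorm::rmvnorm(n.sim, mean = mu.vec, sigma = Sigma)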

\(\blacksquare\)