Chapter 6 Multivariate Normal Distributions

To date, we have studied a number of standard probability distributions: uniform, Bernoulli, binomial, geometric, negative binomial, Poisson, exponential, gamma and normal. Each of these governs exactly one quantity. However, throughout this course we have also studied joint probability distributions, that is, the study of multiple quantities simultaneously. This raises the question: are there any standard probability distributions in higher dimensions?

6.1 Matrices: symmetry, eigenvalues and positive definiteness

Throughout this chapter, we will use the language of matrices freely. As a point of notation, we will represent all matrices with variable names in a bold font, for example \(\mathbf{A}\). A vector will be thought of as a matrix in this sense: it is a matrix with either \(1\) row or \(1\) column.

Further, when we use the notation \(\mathbf{A} = (a_{ij})\), the entry of \(\mathbf{A}\) in the \(i^{th}\) row and \(j^{th}\) column is labelled \(a_{ij}\).

We consolidate our knowledge on the definition of symmetric matrices, the determinant of an \(n \times n\) matrix, the definitions of eigenvalues and eigenvectors, and what it means for matrices to be positive definite.

Definition 6.1.1. Let \(\mathbf{A}\) be an \(n \times n\) matrix. Then \(\mathbf{A}\) is said to be symmetric if \(\mathbf{A} = \mathbf{A}^{T}\).

Definition 6.1.1 is equivalent to \(\mathbf{A}=(a_{ij})\) satisfying \(a_{ij}=a_{ji}\) for all \(i,j\) with \(1\leq i,j \leq n\).

Are the following matrices symmetric? \[1. \quad \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}; \hspace{40pt} 2. \quad \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix};\] \[3. \quad \begin{pmatrix} 0 & 3 & 4 \\ -3 & 0 & 7 \\ -4 & -7 & 0 \end{pmatrix}; \hspace{40pt} 4. \quad \begin{pmatrix} 9 & 2 & 3 \\ 2 & -1 & -8 \\ 3 & -8 & 0 \end{pmatrix}.\]



The answers are

  1. Yes;

  2. Yes;

  3. No;

  4. Yes.

Definition 6.1.3. Let \(\mathbf{A}\) be an \(n \times n\) matrix. A scalar \(\lambda\) is called an eigenvalue of \(\mathbf{A}\) if there is a non-zero vector \(\mathbf{v}\) such that \[\mathbf{Av} = \lambda \mathbf{v}.\] The vector \(\mathbf{v}\) corresponding to \(\lambda\) is called an eigenvector of \(\mathbf{A}\).

It was shown that an eigenvalue \(\lambda\) of an \(n \times n\) matrix \(\mathbf{A}\) satisfies the characteristic equation \[\det(\mathbf{A} - \lambda \mathbf{I_n})=0.\] Recall that the determinant of any \(n \times n\) matrix \(\mathbf{M} = (m_{ij})\) is calculated by fixing some column \(j\) with \(1\leq j \leq n\), and finding \[\det (\mathbf{M}) = \sum\limits_{i=1}^{n}(-1)^{i+j}m_{ij}M_{ij},\] where \(M_{ij}\) is the \((i,j)\) minor of \(\mathbf{M}\), that is, the determinant of the \((n-1) \times (n-1)\) matrix obtained by deleting the \(i^{th}\) row and \(j^{th}\) column of \(\mathbf{M}\).

What are the eigenvalues and eigenvectors of \(\mathbf{A} = \begin{pmatrix} 5 & 2 & 0 \\ 2 & 5 & 0 \\ -3 & 4 & 6 \end{pmatrix}\)?



Note that \(\mathbf{A}\) is a \(3\times 3\) matrix, so in the notation of Definition 6.1.3, \(n=3\). Calculate \[\begin{align*} A - \lambda I_3 &= \begin{pmatrix} 5 & 2 & 0 \\ 2 & 5 & 0 \\ -3 & 4 & 6 \end{pmatrix} - \lambda \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \\[9pt] &= \begin{pmatrix} 5-\lambda & 2 & 0 \\ 2 & 5-\lambda & 0 \\ -3 & 4 & 6-\lambda \end{pmatrix}. \end{align*}\]

So \[\begin{align*} \det (A - \lambda I_3) &= 0 \\[9pt] \det \begin{pmatrix} 5-\lambda & 2 & 0 \\ 2 & 5-\lambda & 0 \\ -3 & 4 & 6-\lambda \end{pmatrix} &=0 \\[9pt] (5-\lambda) \big[ (5-\lambda)(6-\lambda) - 0 \big] - 2 \big[ 2(6-\lambda) -0 \big] &=0 \\[9pt] (5-\lambda) (5-\lambda)(6-\lambda) - 4(6-\lambda) &= 0 \\[9pt] (6-\lambda) \big[ (5-\lambda)(5-\lambda) - 4\big] &=0 \\[9pt] (6-\lambda) (\lambda^2 -10 \lambda +21) &=0 \\[9pt] (6-\lambda) (\lambda-3)(\lambda-7) &=0 \end{align*}\]

So \(\lambda_1=3, \lambda_2 =6, \lambda_3 =7\) are the eigenvalues.

For \(\lambda_1=3\), we look for an eigenvector \(v_1 = \begin{pmatrix} x \\ y \\ z \end{pmatrix}\) satisfying \[\begin{align*} &(A - 3I_3) v_1 = \mathbf{0} \\[9pt] \iff \hspace{10pt} &\begin{pmatrix} 2 & 2 & 0 \\ 2 & 2 & 0 \\ -3 & 4 & 3 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \\[9pt] \iff \hspace{10pt} &\begin{cases} 2x+2y=0 \hspace{100pt} (1)\\ 2x+2y=0 \\ -3x + 4y +3z=0 \hspace{70pt} (2) \end{cases} \end{align*}\]

Equation (1) rearranges to \(y=-x\). Substituting this into equation (2), we obtain \[\begin{align*} -3x-4x+3z &= 0 \\[9pt] 3z &= 7x \end{align*}\]

Taking \(x=3\) gives \(y=-3\) and \(z=7\), so we have an eigenvector \[v_1 = \begin{pmatrix} 3 \\ -3 \\ 7 \end{pmatrix}.\]

For \(\lambda_2 = 6\), we are looking for an eigenvector \(v_2 = \begin{pmatrix} x \\ y \\ z \end{pmatrix}\) satisfying \[\begin{align*} &(A - 6I_3) v_2 = \mathbf{0} \\[9pt] \iff \hspace{10pt} &\begin{pmatrix} -1 & 2 & 0 \\ 2 & -1 & 0 \\ -3 & 4 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \\[9pt] \iff \hspace{10pt} &\begin{cases} -x+2y=0 \\ 2x-y=0 \\ -3x + 4y=0 \end{cases} \end{align*}\]

Solving these three equations simultaneously, the first two force \(x=y=0\) while \(z\) is free, giving \[v_2 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.\]

For \(\lambda_3 = 7\), we are looking for an eigenvector \(v_3 = \begin{pmatrix} x \\ y \\ z \end{pmatrix}\) satisfying \[\begin{align*} &(A - 7I_3) v_3 = \mathbf{0} \\[9pt] \iff \hspace{10pt} &\begin{pmatrix} -2 & 2 & 0 \\ 2 & -2 & 0 \\ -3 & 4 & -1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \\[9pt] \iff \hspace{10pt} &\begin{cases} -2x+2y=0 \\ 2x-2y=0 \\ -3x + 4y-z=0 \end{cases} \end{align*}\]

The first two equations give \(y=x\), and the third then gives \(z=x\), so taking \(x=1\) we obtain \[v_3 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}.\]

Therefore \(\mathbf{A}\) has eigenvalues \(3,6,7\) with corresponding eigenvectors given respectively by \(\begin{pmatrix} 3 \\ -3 \\ 7 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}\).

A symmetric \(n \times n\) matrix \(\mathbf{A}\) is positive definite if the quantity \(\mathbf{z}^{T} \mathbf{A z}\) is positive for all non-zero column vectors \(\mathbf{z}\) of length \(n\).

Note that \(\mathbf{Az}\) is the transformation of \(\mathbf{z}\) under \(\mathbf{A}\). It follows that \(\mathbf{z}^{T} \mathbf{A z}\) is the dot product of \(\mathbf{z}\) and \(\mathbf{Az}\). This being positive is equivalent to the angle between \(\mathbf{z}\) and \(\mathbf{Az}\) being less than \(\frac{\pi}{2}\). Informally, then, a positive definite matrix \(\mathbf{A}\) transforms every vector \(\mathbf{z}\) in such a way that it still points in the same general direction.

Determine whether the matrix \(\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\) is positive definite.

For any vector \(\mathbf{z} = \begin{pmatrix} x \\ y \end{pmatrix}\), we have \[\mathbf{z}^T \mathbf{Az} = \begin{pmatrix} x & y \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x & y \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = x^2 + y^2.\] For all real numbers \(x,y\), we have \(x^2 + y^2 \geq 0\), with equality only when \(x=y=0\). It follows that \(\mathbf{z}^T \mathbf{Az} > 0\) for all non-zero \(\mathbf{z}\), so the matrix is positive definite.

Determine whether the matrix \(\begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}\) is positive definite.

Set \(\mathbf{z} = \begin{pmatrix} -1 \\ 1 \end{pmatrix}\). We have \[\mathbf{z}^T \mathbf{Az} = \begin{pmatrix} -1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} -1 \\ 1 \end{pmatrix} = \begin{pmatrix} -1 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ -1 \end{pmatrix} = -2 <0.\] Therefore \(\begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}\) is not positive definite.
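
For larger matrices, a practical test is the standard eigenvalue criterion: a symmetric matrix is positive definite if and only if all of its eigenvalues are positive. A quick numerical sketch of this in R (R's use for matrix calculations is covered at the end of this section) for the two matrices above:

A1 <- matrix(c(1, 0, 0, 1), nrow=2)   # the identity matrix above
A2 <- matrix(c(1, 2, 2, 1), nrow=2)

eigen(A1, symmetric=TRUE)$values   # 1, 1: all positive, so positive definite
eigen(A2, symmetric=TRUE)$values   # 3, -1: a negative eigenvalue, so not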

Matrix calculations can be performed in R. To create a matrix variable you can use the matrix function. As input the function takes a vector of entries, below c(2, 4, 3, 1, 5, 7), and a parameter nrow giving the number of rows into which the entries should be arranged (by default the entries fill the matrix column by column).

A = matrix(c(2, 4, 3, 1, 5, 7), nrow=2)
A

R also allows for algebraic manipulations of matrices, such as matrix addition (+), matrix multiplication (%*%) and scalar multiplication.

B = matrix(c(3, 7, 4, 9), nrow=2)
C = matrix(c(6, 2, 5, 8), nrow=2)
B+C
B %*% C   # matrix multiplication; note B*C gives the element-wise product
2*B

Finally R allows for easy calculation of eigenvalues and eigenvectors of a square matrix.

D = matrix(c(13, -4, 2, -4, 11, -2, 2, -2, 8), nrow=3)
ev <- eigen(D)
values <- ev$values 
vectors <- ev$vectors

values
vectors
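
As a sanity check, feeding the matrix from the worked eigenvalue example earlier in this section into eigen recovers the values computed by hand. Note that eigen returns the eigenvalues in decreasing order and scales each eigenvector to unit length, so the columns of vectors are scalar multiples of the eigenvectors found above.

A <- matrix(c(5, 2, -3, 2, 5, 4, 0, 0, 6), nrow=3)
eigen(A)$values    # 7, 6, 3
eigen(A)$vectors   # columns proportional to (1,1,1), (0,0,1), (3,-3,7)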


6.2 Definition of Multivariate Normal Distribution

The normal distribution that we saw in Section 1.3 generalises to higher dimensions. Let \(n \in \mathbb{Z}_{\geq 1}\) represent the dimension we are interested in.

Informally, a multivariate normal distribution can be defined as follows. Consider \(n\) independent standard normal random variables \(Z_1, Z_2, \ldots, Z_n\), that is, \(Z_i \sim N(0,1)\) for all \(i\) with \(1 \leq i \leq n\), and form the vector of random variables \(\mathbf{Z} = \begin{pmatrix} Z_1 & Z_2 & \cdots & Z_n \end{pmatrix}^{T}\). Then for any \(n \times n\) non-singular matrix \(\mathbf{A}\) and any vector \(\mathbf{\mu} = \begin{pmatrix} \mu_1 & \mu_2 & \cdots & \mu_n \end{pmatrix}^T\), define \(\mathbf{X} = \mathbf{AZ}+\mathbf{\mu}\). The vector of random variables \(\mathbf{X} = \begin{pmatrix} X_1 & X_2 & \cdots & X_n \end{pmatrix}^T\) then follows a multivariate normal distribution.

This says that every component \(X_i\) is a linear combination of some fixed collection of independent standard normal random variables, plus a translation.
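
As a minimal sketch of this construction in R, for \(n=2\) with an arbitrary choice of non-singular \(\mathbf{A}\) and \(\mathbf{\mu}\) (both chosen purely for illustration):

set.seed(1)
A  <- matrix(c(1, 0.5, 0, 2), nrow=2)   # any non-singular matrix
mu <- c(1, -1)
Z  <- matrix(rnorm(2*5000), nrow=2)     # each column is a draw of (Z1, Z2)
X  <- A %*% Z + mu                      # mu is recycled down each column
rowMeans(X)                             # approximately mu
cov(t(X))                               # approximately A %*% t(A)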

Formally we define the multivariate normal distribution by the joint probability density function of \(X_1, X_2, \ldots, X_n\).

Let \(\mathbf{\mu} = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{pmatrix}\) be a vector in \(\mathbb{R}^n\) and let \(\mathbf{\Sigma} = (\sigma_{ij})\) be an \(n \times n\) real, symmetric, positive definite matrix (equivalently, a real symmetric matrix all of whose eigenvalues are positive).

Definition 6.2.1. A vector of random variables \(\mathbf{X} = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix}\) is said to have an \(\mathit{n}\)-dimensional normal distribution with parameters \(\mathbf{\mu}\) and \(\mathbf{\Sigma}\), denoted \(N_n(\mathbf{\mu},\mathbf{\Sigma})\), if the joint PDF of \(\mathbf{X}\) is given by \[ f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^n\cdot \det(\mathbf{\Sigma})}} \exp \left\{ -\frac{1}{2} (\mathbf{x}-\mathbf{\mu})^T \mathbf{\Sigma}^{-1} (\mathbf{x}-\mathbf{\mu}) \right\}.\]

The PDF outlined in the formal Definition 6.2.1 can be derived from the informal construction of \(\mathbf{X}\) at the beginning of the section by taking \(\mathbf{\Sigma} = \mathbf{AA}^T\).
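
A sketch of this derivation, using the change-of-variables formula for PDFs: since \(Z_1,\dots,Z_n\) are independent standard normals, \(f_{\mathbf{Z}}(\mathbf{z}) = (2\pi)^{-n/2}\exp\{-\frac{1}{2}\mathbf{z}^T\mathbf{z}\}\), and \(\mathbf{X} = \mathbf{AZ}+\mathbf{\mu}\) gives \[ f_{\mathbf{X}}(\mathbf{x}) = f_{\mathbf{Z}}\big(\mathbf{A}^{-1}(\mathbf{x}-\mathbf{\mu})\big)\,\big\lvert \det(\mathbf{A}^{-1}) \big\rvert = \frac{1}{\sqrt{(2\pi)^n}\,\lvert\det \mathbf{A}\rvert} \exp \left\{ -\frac{1}{2}(\mathbf{x}-\mathbf{\mu})^T(\mathbf{A}^{-1})^T\mathbf{A}^{-1}(\mathbf{x}-\mathbf{\mu}) \right\}. \] Taking \(\mathbf{\Sigma} = \mathbf{AA}^T\) gives \(\mathbf{\Sigma}^{-1} = (\mathbf{A}^{-1})^T\mathbf{A}^{-1}\) and \(\det(\mathbf{\Sigma}) = \det(\mathbf{A})^2\), recovering the PDF of Definition 6.2.1.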

Note for \(n=1\), Definition 6.2.1 simplifies to give the PDF of the one-dimensional normal distribution.
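
Explicitly, for \(n=1\) write \(\mathbf{\Sigma} = (\sigma^2)\) and \(\mathbf{\mu} = (\mu)\), so that \(\det(\mathbf{\Sigma}) = \sigma^2\) and \(\mathbf{\Sigma}^{-1} = (1/\sigma^2)\), and the PDF becomes \[ f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp \left\{ -\frac{(x-\mu)^2}{2\sigma^2} \right\}. \]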

The multivariate normal distribution has the following important properties:

  • The expectation of \(X_i\) is given by the \(i^{th}\) entry of \(\mathbf{\mu}\), that is, \(E[X_i]=\mu_i\) for all \(i\). Succinctly, the vector \(\mathbf{\mu}\) is the expectation vector of \(\mathbf{X}\);

  • The covariance \(\text{Cov}(X_i,X_j)\), reducing to \(\text{Var}(X_i)\) when \(i=j\), is given by the element in the \(i^{th}\) row and \(j^{th}\) column of \(\mathbf{\Sigma}\), that is, \(\sigma_{ij} = \text{Cov}(X_i,X_j)\) for all \(i\) and \(j\). Succinctly, the matrix \(\mathbf{\Sigma}\) is the variance-covariance matrix of \(\mathbf{X}\). It follows that if \(X_1,X_2,\dots,X_n\) are independent then \(\sigma_{ij}=0\) for all \(i \neq j\). Conversely, for a multivariate normal vector, \(\sigma_{ij}=0\) for all \(i \neq j\) implies that \(X_1,X_2,\dots,X_n\) are independent; we use this fact in Section 6.4.

These two points indicate why \(\mathbf{\mu}\) and \(\mathbf{\Sigma}\) are the two input parameters for the multivariate normal distribution.
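
These two properties can be checked quickly by simulation; the sketch below uses the rmnorm sampler from the mnormt package (used again in the next section), with parameter values chosen purely for illustration.

library(mnormt)

set.seed(42)
mu    <- c(1, 2)
Sigma <- matrix(c(2, -1, -1, 2), nrow=2)

X <- rmnorm(n=10000, mean=mu, varcov=Sigma)   # 10000 draws, one per row
colMeans(X)   # approximately mu
cov(X)        # approximately Sigma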

6.3 Two-Dimensional Normal Distribution

In this section we study the \(2\)-dimensional normal distribution, often referred to as the bivariate normal distribution.

When \(n=2\), or another suitably small value, we can explicitly label the elements of \(\mathbf{\mu}\) and \(\mathbf{\Sigma}\) as \(\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}\) and \(\begin{pmatrix} \sigma_1^2 & \sigma_1 \sigma_2 \rho \\ \sigma_1 \sigma_2 \rho & \sigma_2^2 \end{pmatrix}\) respectively. Here \(\sigma_1\) is the standard deviation of \(X_1\), \(\sigma_2\) is the standard deviation of \(X_2\) and \(\rho\) is the correlation coefficient \(\rho(X_1,X_2)\).

Why are the upper right and lower left entries of \(\mathbf{\Sigma}\) equal to \(\sigma_1 \sigma_2 \rho\)?

Show that \(\Sigma\) is non-singular if and only if \(\lvert \rho \rvert <1\) and \(\sigma_1\sigma_2 \neq 0\).

Substitution of these values into Definition 6.2.1 gives \[f_{X_1,X_2}(x_1,x_2) = \frac{1}{2 \pi \sigma_1 \sigma_2 \sqrt{1-\rho^2}} \exp \left\{ - \frac{1}{2(1-\rho^2)} \left[ \left( \frac{x_1-\mu_1}{\sigma_1} \right)^2 -2\rho \left( \frac{x_1-\mu_1}{\sigma_1} \right)\left( \frac{x_2-\mu_2}{\sigma_2} \right) + \left( \frac{x_2-\mu_2}{\sigma_2} \right)^2 \right] \right\}.\]
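
As a check of the algebra, the sketch below evaluates this explicit formula at one point and compares it against mnormt's dmnorm; the parameter values, matching the contour plot example that follows, are \(\mu_1=\mu_2=0\), \(\sigma_1=\sigma_2=\sqrt{2}\) and \(\rho=-\frac{1}{2}\).

library(mnormt)

mu1 <- 0; mu2 <- 0
sig1 <- sqrt(2); sig2 <- sqrt(2); rho <- -1/2

f_explicit <- function(x1, x2) {
  q <- ((x1-mu1)/sig1)^2 - 2*rho*((x1-mu1)/sig1)*((x2-mu2)/sig2) + ((x2-mu2)/sig2)^2
  1/(2*pi*sig1*sig2*sqrt(1-rho^2)) * exp(-q/(2*(1-rho^2)))
}

Sigma <- matrix(c(sig1^2, rho*sig1*sig2, rho*sig1*sig2, sig2^2), nrow=2)
f_explicit(1, -0.5)                      # explicit formula
dmnorm(c(1, -0.5), c(mu1, mu2), Sigma)   # should agree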

The following code creates a contour plot of the PDF above with \(\mathbf{\mu} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}\) and \(\mathbf{\Sigma} = \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}\).

library(mnormt)

#create bivariate normal distribution
x     <- seq(-3, 3, 0.1) 
y     <- seq(-3, 3, 0.1)
mu    <- c(0, 0)
sigma <- matrix(c(2, -1, -1, 2), nrow=2)
f     <- function(x, y) dmnorm(cbind(x, y), mu, sigma)
z     <- outer(x, y, f)

#create contour plot
contour(x, y, z)

A sample of size \(200\) can be drawn from a bivariate normal distribution using the following R code.

library(plotly)
library(mixtools)

# Set up means, standard deviations, correlations
mu1 <- 1;  mu2 <- 2
sig1 <- 1;  sig2 <- 1;  rho <- 0;

# Construct mean vector and (co)variance matrix
mu <- c(mu1,mu2)
Sigma <- matrix(c(sig1^2,rho*sig1*sig2,rho*sig1*sig2,sig2^2), ncol=2)

#plot all realisations
X=rmvnorm(n=200, mu, Sigma)
plot(X)

In the above code, setting mu as a vector of length \(n\) and Sigma as an \(n \times n\) matrix before inputting them into rmvnorm allows one to simulate an \(n\)-dimensional normal distribution.

6.4 Properties of Multivariate Normal Distribution

The following theorem states that any linear transformation of a multivariate normal random vector is itself multivariate normally distributed.

Theorem 6.4.1. Let \(\mathbf{X} \sim N_n(\mathbf{\mu},\mathbf{\Sigma})\) and consider some \(p \times n\) matrix \(\mathbf{D}\) of full rank \(p\), where \(p \leq n\). Define a new random vector \(\mathbf{Z} = \mathbf{D}\mathbf{X}\). Then \(\mathbf{Z} \sim N_p (\mathbf{D} \mathbf{\mu}, \mathbf{D} \mathbf{\Sigma} \mathbf{D}^T)\).

As a corollary of this theorem, we can easily calculate the marginal distributions of a multivariate normal distribution.

The marginal distribution of each component \(X_i\) is a one-dimensional normal distribution.

This is a direct consequence of Theorem 6.4.1: setting \(\mathbf{D}=(0,\dots,0,1,0,\dots,0)\), where the \(1\) is in the \(i^{th}\) position, gives \(X_i \sim N(\mu_i, \sigma_{ii})\).

Suppose \(\mathbf{X}=(X_1,X_2,X_3)^T \sim N_3(\mathbf{0},\mathbf{\Sigma})\), where \[\mathbf{\Sigma} = \begin{pmatrix} 2 & 1 & 0 \\ 1 & 4 & 0 \\ 0 & 0 & 5 \end{pmatrix}.\] Find the distribution of \(Z=X_1+X_2\).



Writing \(Z = X_1+X_2\) in the form \(\mathbf{DX}\) requires \(\mathbf{D} = \begin{pmatrix} 1 & 1 & 0 \end{pmatrix}\). By Theorem 6.4.1, \[Z \sim N(\mathbf{D0}, \mathbf{D \Sigma D}^T).\] Calculate \(\mathbf{D0}=\mathbf{0}\) and \[\begin{align*} \mathbf{D} \mathbf{\Sigma} \mathbf{D}^T &= \begin{pmatrix} 1 & 1 & 0 \end{pmatrix} \begin{pmatrix} 2 & 1 & 0 \\ 1 & 4 & 0 \\ 0 & 0 & 5 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} \\[5pt] &= \begin{pmatrix} 3 & 5 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} \\[5pt] &= 8. \end{align*}\] Therefore, \(Z \sim N(0,8)\).
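
A quick simulation check of this result, using mixtools::rmvnorm as in Section 6.3:

library(mixtools)

set.seed(7)
Sigma <- matrix(c(2, 1, 0, 1, 4, 0, 0, 0, 5), nrow=3)
X <- rmvnorm(n=10000, mu=rep(0, 3), sigma=Sigma)
Z <- X[,1] + X[,2]
mean(Z)   # approximately 0
var(Z)    # approximately 8
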
Suppose \(\mathbf{X}=(X_1,X_2,X_3)^T \sim N_3(\mathbf{0},\mathbf{\Sigma})\), where \[\mathbf{\Sigma} = \begin{pmatrix} 2 & 1 & 0 \\ 1 & 4 & 0 \\ 0 & 0 & 5 \end{pmatrix}.\] Determine the constant \(c\) such that \(Z_1 = 2X_1 + cX_2\) and \(Z_2 = 2X_1 + cX_3\) are independent.



Writing \(\mathbf{Z} = \begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix} = \begin{pmatrix} 2X_1 + cX_2 \\ 2X_1 + cX_3 \end{pmatrix}\) in the form \(\mathbf{DX}\) requires

\[\mathbf{D} = \begin{pmatrix} 2 & c & 0 \\ 2 & 0 & c \end{pmatrix}.\]

By the properties of a multivariate normal distribution,

\[\mathbf{Z} \sim N_2(\mathbf{D0}, \mathbf{D \Sigma D}^T).\]

Calculate \(\mathbf{D0} = \mathbf{0}\) and

\[\begin{align*} \mathbf{D \Sigma D}^T &= \begin{pmatrix} 2 & c & 0 \\ 2 & 0 & c \end{pmatrix} \begin{pmatrix} 2 & 1 & 0 \\ 1 & 4 & 0 \\ 0 & 0 & 5 \end{pmatrix} \begin{pmatrix} 2 & 2 \\ c & 0 \\ 0 & c \end{pmatrix} \\[5pt] &= \begin{pmatrix} 4+c & 2+4c & 0 \\ 4 & 2 & 5c \end{pmatrix} \begin{pmatrix} 2 & 2 \\ c & 0 \\ 0 & c \end{pmatrix} \\[5pt] &= \begin{pmatrix} 8+4c+4c^2 & 8+2c \\ 8+2c & 8+5c^2 \end{pmatrix}. \end{align*}\]

For \(Z_1\) and \(Z_2\) to be independent it is necessary that \(\text{Cov}(Z_1,Z_2) = 8+2c = 0\), that is, \(c=-4\). Since \(\mathbf{Z}\) is bivariate normal, zero covariance also suffices for independence, so \(c=-4\) is the required constant.
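
The same simulation approach confirms that with \(c=-4\) the sample covariance of \(Z_1\) and \(Z_2\) is approximately zero:

library(mixtools)

set.seed(8)
Sigma <- matrix(c(2, 1, 0, 1, 4, 0, 0, 0, 5), nrow=3)
X  <- rmvnorm(n=10000, mu=rep(0, 3), sigma=Sigma)
Z1 <- 2*X[,1] - 4*X[,2]
Z2 <- 2*X[,1] - 4*X[,3]
cov(Z1, Z2)   # approximately 0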