Chapter 6 Multivariate Normal Distributions

To date, we have studied a number of standard probability distributions: uniform, Bernoulli, binomial, geometric, negative binomial, Poisson, exponential, gamma and normal. These are all random variables that govern exactly one quantity. However, throughout this course we have also studied joint probability distributions, that is, the study of multiple quantities simultaneously. This raises the question: are there standard probability distributions in higher dimensions?

6.1 Matrices: symmetry, eigenvalues and positive definiteness

Throughout this chapter, we will use the language of matrices freely. As a point of notation, we will represent all matrices with variable names in a bold font, for example A. A vector will be thought of as a matrix in this sense: it is a matrix with either 1 row or 1 column.

Further, when we use the notation $A=(a_{ij})$, this labels the entry of $A$ in the $i$th row and $j$th column by $a_{ij}$.

We consolidate our knowledge on the definition of symmetric matrices, the determinant of an n×n matrix, the definitions of eigenvalues and eigenvectors, and what it means for matrices to be positive definite.

Definition 6.1.1. Let $A$ be an $n\times n$ matrix. Then $A$ is said to be symmetric if $A=A^T$.

Definition 6.1.1 is equivalent to $A=(a_{ij})$ satisfying $a_{ij}=a_{ji}$ for all $i,j$ with $1\le i,j\le n$.
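The condition $A=A^T$ can be checked directly in R; the matrices below are illustrative examples (an identity matrix and a skew-symmetric matrix), not taken from the exercise that follows.

```r
# A matrix is symmetric exactly when it equals its transpose
A <- matrix(c(1, 0, 0, 1), nrow=2)                      # 2x2 identity
B <- matrix(c(0, 3, 4, -3, 0, 7, -4, -7, 0), nrow=3)    # skew-symmetric example
isSymmetric(A)    # TRUE, since A = t(A)
all(B == t(B))    # FALSE
```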

Are the following matrices symmetric?
$$1.\ \begin{pmatrix}1&0\\0&1\end{pmatrix};\qquad 2.\ \begin{pmatrix}1&0\\0&-1\end{pmatrix};\qquad 3.\ \begin{pmatrix}0&3&4\\-3&0&7\\-4&-7&0\end{pmatrix};\qquad 4.\ \begin{pmatrix}9&2&3\\2&1&8\\3&8&0\end{pmatrix}.$$



The answers are

  1. Yes;

  2. Yes;

  3. No;

  4. Yes.

Definition 6.1.3. Let $A$ be an $n\times n$ matrix. A scalar $\lambda$ is called an eigenvalue of $A$ if there is a non-zero vector $v$ such that $Av=\lambda v$. The vector $v$ corresponding to $\lambda$ is called an eigenvector of $A$.

It was shown that an eigenvalue $\lambda$ of an $n\times n$ matrix $A$ satisfies the characteristic polynomial: $\det(A-\lambda I_n)=0$. Recall that the determinant of any $n\times n$ matrix $M=(m_{ij})$ is calculated by fixing some value $j$ with $1\le j\le n$ and computing $$\det(M)=\sum_{i=1}^{n}(-1)^{i+j}m_{ij}M_{ij},$$ where $M_{ij}$ is the minor of $M$, that is, the determinant of the $(n-1)\times(n-1)$ matrix obtained by deleting the $i$th row and $j$th column of $M$.
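The cofactor expansion can be checked numerically against R's built-in `det`; the matrix below is an arbitrary $3\times 3$ example.

```r
M <- matrix(c(5, 2, -3, 2, 5, 4, 0, 0, 6), nrow=3)
# Minor M_ij: determinant of M with row i and column j deleted
minor <- function(M, i, j) det(M[-i, -j, drop=FALSE])
# Cofactor expansion along the first column (j = 1)
d <- 0
for (i in 1:3) d <- d + (-1)^(i + 1) * M[i, 1] * minor(M, i, 1)
d         # 126
det(M)    # 126, agreeing with the expansion
```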

What are the eigenvalues and eigenvectors of $A=\begin{pmatrix}5&2&0\\2&5&0\\-3&4&6\end{pmatrix}$?



Note that $A$ is a $3\times 3$ matrix, so in the notation of Definition 6.1.3, $n=3$. Calculate $$A-\lambda I_3=\begin{pmatrix}5&2&0\\2&5&0\\-3&4&6\end{pmatrix}-\lambda\begin{pmatrix}1&0&0\\0&1&0\\0&0&1\end{pmatrix}=\begin{pmatrix}5-\lambda&2&0\\2&5-\lambda&0\\-3&4&6-\lambda\end{pmatrix}.$$

So
$$\begin{aligned}
\det(A-\lambda I_3)=0 &\Rightarrow \det\begin{pmatrix}5-\lambda&2&0\\2&5-\lambda&0\\-3&4&6-\lambda\end{pmatrix}=0\\
&\Rightarrow (5-\lambda)\left[(5-\lambda)(6-\lambda)-0\right]-2\left[2(6-\lambda)-0\right]=0\\
&\Rightarrow (5-\lambda)(5-\lambda)(6-\lambda)-4(6-\lambda)=0\\
&\Rightarrow (6-\lambda)\left[(5-\lambda)(5-\lambda)-4\right]=0\\
&\Rightarrow (6-\lambda)(\lambda^2-10\lambda+21)=0\\
&\Rightarrow (6-\lambda)(\lambda-3)(\lambda-7)=0.
\end{aligned}$$

So $\lambda_1=3$, $\lambda_2=6$, $\lambda_3=7$ are the eigenvalues.

For $\lambda_1=3$, we are looking for an eigenvector $v_1=\begin{pmatrix}x\\y\\z\end{pmatrix}$ satisfying $$(A-3I_3)v_1=0 \;\Rightarrow\; \begin{pmatrix}2&2&0\\2&2&0\\-3&4&3\end{pmatrix}\begin{pmatrix}x\\y\\z\end{pmatrix}=\begin{pmatrix}0\\0\\0\end{pmatrix} \;\Rightarrow\; \begin{cases}2x+2y=0 &(1)\\2x+2y=0 &\\-3x+4y+3z=0 &(2)\end{cases}$$

Equation (1) rearranges to $y=-x$. Substituting this into equation (2), we obtain $-3x-4x+3z=0$, so $3z=7x$.

So clearly $x=3$, $y=-3$ and $z=7$ is a solution. So we have an eigenvector $v_1=\begin{pmatrix}3\\-3\\7\end{pmatrix}$.

For $\lambda_2=6$, we are looking for an eigenvector $v_2=\begin{pmatrix}x\\y\\z\end{pmatrix}$ satisfying $$(A-6I_3)v_2=0 \;\Rightarrow\; \begin{pmatrix}-1&2&0\\2&-1&0\\-3&4&0\end{pmatrix}\begin{pmatrix}x\\y\\z\end{pmatrix}=\begin{pmatrix}0\\0\\0\end{pmatrix} \;\Rightarrow\; \begin{cases}-x+2y=0\\2x-y=0\\-3x+4y=0\end{cases}$$

Similarly to the previous eigenvalue, we solve these three equations simultaneously, giving $v_2=\begin{pmatrix}0\\0\\1\end{pmatrix}$.

For $\lambda_3=7$, we are looking for an eigenvector $v_3=\begin{pmatrix}x\\y\\z\end{pmatrix}$ satisfying $$(A-7I_3)v_3=0 \;\Rightarrow\; \begin{pmatrix}-2&2&0\\2&-2&0\\-3&4&-1\end{pmatrix}\begin{pmatrix}x\\y\\z\end{pmatrix}=\begin{pmatrix}0\\0\\0\end{pmatrix} \;\Rightarrow\; \begin{cases}-2x+2y=0\\2x-2y=0\\-3x+4y-z=0\end{cases}$$

Similarly to the first eigenvalue, we solve these three equations simultaneously, giving $v_3=\begin{pmatrix}1\\1\\1\end{pmatrix}$.

Therefore $A$ has eigenvalues $3,6,7$ with corresponding eigenvectors given respectively by $$\begin{pmatrix}3\\-3\\7\end{pmatrix},\quad\begin{pmatrix}0\\0\\1\end{pmatrix},\quad\begin{pmatrix}1\\1\\1\end{pmatrix}.$$
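The hand calculation can be confirmed with R's `eigen` function. Note that `eigen` returns eigenvalues in decreasing order and rescales each eigenvector to unit length, so the returned vectors agree with ours only up to scaling.

```r
A <- matrix(c(5, 2, -3, 2, 5, 4, 0, 0, 6), nrow=3)  # columns of A, filled column-wise
ev <- eigen(A)
ev$values          # 7 6 3, in decreasing order
# Verify A v = lambda v for the hand-computed eigenvector for lambda = 3
v1 <- c(3, -3, 7)
A %*% v1           # equals 3 * v1
```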

A symmetric $n\times n$ matrix $A$ is positive definite if the quantity $z^TAz$ is positive for all non-zero column vectors $z$ of length $n$.

Note that $Az$ is the transformation of $z$ under $A$. It follows that $z^TAz$ is the dot product of $z$ and $Az$. This being positive is equivalent to the angle between $z$ and $Az$ being less than $\pi/2$. Therefore a positive definite matrix $A$ is, informally, a matrix that transforms all vectors $z$ in such a way that they still point in the same general direction.

Determine whether the matrix $\begin{pmatrix}1&0\\0&1\end{pmatrix}$ is positive definite.

For any vector $z=\begin{pmatrix}x\\y\end{pmatrix}$, we have $$z^TAz=\begin{pmatrix}x&y\end{pmatrix}\begin{pmatrix}1&0\\0&1\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}=\begin{pmatrix}x&y\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}=x^2+y^2.$$ Note that for all real numbers $x,y$ with $(x,y)\neq(0,0)$, we have $x^2+y^2>0$. It follows that $z^TAz>0$ for all non-zero $z$, so the matrix is positive definite.
Determine whether the matrix $\begin{pmatrix}1&2\\2&1\end{pmatrix}$ is positive definite.

Set $z=\begin{pmatrix}1\\-1\end{pmatrix}$. We have $$z^TAz=\begin{pmatrix}1&-1\end{pmatrix}\begin{pmatrix}1&2\\2&1\end{pmatrix}\begin{pmatrix}1\\-1\end{pmatrix}=\begin{pmatrix}1&-1\end{pmatrix}\begin{pmatrix}-1\\1\end{pmatrix}=-2<0.$$ Therefore $\begin{pmatrix}1&2\\2&1\end{pmatrix}$ is not positive definite.
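For a symmetric matrix, positive definiteness is equivalent to all of its eigenvalues being positive, which gives a convenient numerical test. A sketch, applied to the two example matrices above:

```r
# Positive definite (for symmetric M) <=> all eigenvalues of M are positive
is_pos_def <- function(M) isSymmetric(M) && all(eigen(M, symmetric=TRUE)$values > 0)

is_pos_def(matrix(c(1, 0, 0, 1), nrow=2))   # TRUE
is_pos_def(matrix(c(1, 2, 2, 1), nrow=2))   # FALSE: eigenvalues are 3 and -1
```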

Matrix calculations can be performed in R. To create a matrix variable you can use the `matrix` function. As input the function takes a vector of entries, below this is `c(2, 4, 3, 1, 5, 7)`, and a parameter `nrow` representing the number of rows the entries should be sorted into (by default the entries fill the matrix column by column).

A = matrix(  c(2, 4, 3, 1, 5, 7), nrow=2) 
A

R also allows for algebraic manipulations of matrices such as matrix addition, matrix multiplication and scalar multiplication. Note that `*` multiplies two matrices element by element; true matrix multiplication uses the operator `%*%`.

B = matrix(  c(3, 7, 4, 9), nrow=2) 
C = matrix(  c(6, 2, 5, 8), nrow=2) 
B+C        # matrix addition
B %*% C    # matrix multiplication
B*C        # element-wise multiplication
2*B        # scalar multiplication

Finally R allows for easy calculation of eigenvalues and eigenvectors of a square matrix.

D = matrix(  c(13, -4, 2, -4, 11, -2, 2, -2, 8), nrow=3)
ev <- eigen(D)
values <- ev$values 
vectors <- ev$vectors

values
vectors


6.2 Definition of Multivariate Normal Distribution

The normal distribution that we saw in Section 1.3 generalises to higher dimensions. Let $n\in\mathbb{Z}_{\ge 1}$ represent the dimension we are interested in.

Informally, a multivariate normal distribution can be defined as follows. Consider $n$ standard normal distributions $Z_1,Z_2,\ldots,Z_n$, that is, $Z_i\sim N(0,1)$ for all $i$ with $1\le i\le n$. From this, form a vector of random variables $Z=(Z_1\ Z_2\ \cdots\ Z_n)^T$. Then for any $n\times n$ non-singular matrix $A$ and any vector $\mu=(\mu_1\ \mu_2\ \cdots\ \mu_n)^T$, define $X=AZ+\mu$. Then the vector of random variables $X=(X_1\ X_2\ \cdots\ X_n)^T$ is distributed by a multivariate normal distribution.

This is saying that every component $X_i$ is a linear combination of some fixed collection of standard normal random variables, plus a constant shift.

Formally we define the multivariate normal distribution by the joint probability density function of X1,X2,,Xn.

Let $\mu=(\mu_1\ \mu_2\ \cdots\ \mu_n)^T$ and let $\Sigma=(\sigma_{ij})$ be an $n\times n$ real, symmetric, positive definite matrix, all of whose eigenvalues are positive.

Definition 6.2.1. A vector of random variables $X=(X_1\ X_2\ \cdots\ X_n)^T$ is said to have an $n$-dimensional normal distribution with parameters $\mu$ and $\Sigma$, denoted $N_n(\mu,\Sigma)$, if the joint PDF of $X$ is given by $$f_X(x)=\frac{1}{\sqrt{(2\pi)^n\det(\Sigma)}}\exp\left\{-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right\}.$$

The PDF outlined in the formal Definition 6.2.1 can be derived from the informal construction of X at the beginning of the section by taking Σ=AAT.
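The relationship $\Sigma=AA^T$ can be illustrated numerically: for any non-singular $A$, the product $AA^T$ is symmetric and positive definite, so it is a valid parameter matrix. A sketch with an arbitrary choice of $A$:

```r
A <- matrix(c(1, 2, 0, 1), nrow=2)    # illustrative non-singular matrix: det = 1
Sigma <- A %*% t(A)                   # candidate variance-covariance matrix
Sigma
isSymmetric(Sigma)                    # TRUE: AA^T is always symmetric
eigen(Sigma, symmetric=TRUE)$values   # both positive, so Sigma is positive definite
```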

Note for n=1, Definition 6.2.1 simplifies to give the PDF of the one-dimensional normal distribution.
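To see this reduction concretely, the general density of Definition 6.2.1 can be coded directly and compared with R's `dnorm`; the function below is a sketch of the formula, not a library routine, and the parameter values are illustrative.

```r
# General n-dimensional normal PDF from Definition 6.2.1
dmvn <- function(x, mu, Sigma) {
  n <- length(mu)
  q <- t(x - mu) %*% solve(Sigma) %*% (x - mu)   # quadratic form (x-mu)^T Sigma^-1 (x-mu)
  as.numeric(exp(-q / 2) / sqrt((2 * pi)^n * det(Sigma)))
}

# With n = 1, mu = 2 and Sigma = (sigma^2) = (9), this matches the N(2, 9) density
dmvn(1.5, mu=2, Sigma=matrix(9))
dnorm(1.5, mean=2, sd=3)    # same value
```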

The multivariate normal distribution has the following important properties:

  • The expectation of Xi is given by the ith entry of μ, that is, E[Xi]=μi for all i. Succinctly, the vector μ is the expectation vector of X;

  • The covariance Cov(Xi,Xj), reducing to Var(Xi) when i=j, is given by the element in the ith row and jth column of Σ, that is, σij=Cov(Xi,Xj) for all i and j. Succinctly, the matrix Σ is the variance-covariance matrix of X. It follows that if X1,X2,,Xn are independent then σij=0 for all ij;

These two points indicate why μ and Σ are the two input parameters for the multivariate normal distribution.
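These properties can also be seen empirically by simulating the informal construction $X=AZ+\mu$ from the start of the section; the matrix $A$ and vector $\mu$ below are illustrative choices, and only base R is used.

```r
set.seed(1)
A  <- matrix(c(1, 2, 0, 1), nrow=2)       # illustrative non-singular A
mu <- c(1, -1)                            # illustrative mean vector
Z  <- matrix(rnorm(2 * 100000), nrow=2)   # each column is one draw of (Z1, Z2)^T
X  <- A %*% Z + mu                        # each column is one draw of X

rowMeans(X)    # approximately mu = (1, -1): the expectation vector
cov(t(X))      # approximately Sigma = A %*% t(A): the variance-covariance matrix
```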

6.3 Two-Dimensional Normal Distribution

In this section we study the 2-dimensional normal distribution, often referred to as the bivariate normal distribution.

When $n=2$, or another suitably small value, we can explicitly label the elements of $\mu$ and $\Sigma$ as $$\mu=\begin{pmatrix}\mu_1\\\mu_2\end{pmatrix}\quad\text{and}\quad\Sigma=\begin{pmatrix}\sigma_1^2&\rho\sigma_1\sigma_2\\\rho\sigma_1\sigma_2&\sigma_2^2\end{pmatrix}$$ respectively. Here $\sigma_1$ is the standard deviation of $X_1$, $\sigma_2$ is the standard deviation of $X_2$ and $\rho$ is the correlation coefficient $\rho(X_1,X_2)$.

Why are the upper right and lower left entries of $\Sigma$ equal to $\sigma_1\sigma_2\rho$?

Show that $\Sigma$ is non-singular if and only if $|\rho|<1$ and $\sigma_1\sigma_2\neq 0$.

Substitution of these values into Definition 6.2.1 gives $$f_{X_1,X_2}(x_1,x_2)=\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x_1-\mu_1}{\sigma_1}\right)^2-2\rho\left(\frac{x_1-\mu_1}{\sigma_1}\right)\left(\frac{x_2-\mu_2}{\sigma_2}\right)+\left(\frac{x_2-\mu_2}{\sigma_2}\right)^2\right]\right\}.$$
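As a sanity check, the expanded bivariate formula can be evaluated and compared with the matrix form of Definition 6.2.1; the parameter values below are illustrative, and the evaluation point $(1, -0.5)$ is arbitrary.

```r
mu1 <- 0; mu2 <- 0; s1 <- sqrt(2); s2 <- sqrt(2); rho <- -1/2

# Expanded bivariate form
f_biv <- function(x1, x2) {
  q <- ((x1 - mu1)/s1)^2 - 2*rho*((x1 - mu1)/s1)*((x2 - mu2)/s2) + ((x2 - mu2)/s2)^2
  exp(-q / (2 * (1 - rho^2))) / (2 * pi * s1 * s2 * sqrt(1 - rho^2))
}

# Matrix form of Definition 6.2.1 with the same parameters
Sigma <- matrix(c(s1^2, rho*s1*s2, rho*s1*s2, s2^2), nrow=2)
x <- c(1, -0.5); mu <- c(mu1, mu2)
f_mat <- exp(-t(x - mu) %*% solve(Sigma) %*% (x - mu) / 2) / sqrt((2*pi)^2 * det(Sigma))

f_biv(1, -0.5); as.numeric(f_mat)    # the two values agree
```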

The following code creates a contour plot of the PDF above with $\mu=\begin{pmatrix}0\\0\end{pmatrix}$ and $\Sigma=\begin{pmatrix}2&-1\\-1&2\end{pmatrix}$.

library(mnormt)

#create bivariate normal distribution
x     <- seq(-3, 3, 0.1) 
y     <- seq(-3, 3, 0.1)
mu    <- c(0, 0)
sigma <- matrix(c(2, -1, -1, 2), nrow=2)
f     <- function(x, y) dmnorm(cbind(x, y), mu, sigma)
z     <- outer(x, y, f)

#create contour plot
contour(x, y, z)

A sample of size 200 can be observed from a bivariate normal distribution using the following R code.

library(plotly)
library(mixtools)

# Set up means, standard deviations, correlations
mu1 <- 1;  mu2 <- 2
sig1 <- 1;  sig2 <- 1;  rho <- 0;

# Construct mean vector and (co)variance matrix
mu <- c(mu1,mu2)
Sigma <- matrix(c(sig1^2,rho*sig1*sig2,rho*sig1*sig2,sig2^2), ncol=2)

#plot all realisations
X=rmvnorm(n=200, mu, Sigma)
plot(X)

In the above code, setting `mu` as a vector of length $n$ and `Sigma` as an $n\times n$ matrix before inputting them into `rmvnorm` allows one to simulate an $n$-dimensional normal distribution.

6.4 Properties of Multivariate Normal Distribution

The following theorem states that the transformation of a multivariate normal distribution is itself a multivariate normal distribution.

Theorem 6.4.1. Let $X\sim N_n(\mu,\Sigma)$ and consider some $p\times n$ matrix $D$. Define a new random vector $Z=DX$. Then $Z\sim N_p(D\mu, D\Sigma D^T)$.

As a corollary of this theorem, we can easily calculate the marginal distributions of a multivariate normal distribution.

The marginal distribution of each component Xi is a one-dimensional normal distribution.

This is a direct consequence of Theorem 6.4.1 by setting D=(0,,0,1,0,,0) where the 1 is in the ith position.

Suppose $X=(X_1,X_2,X_3)^T\sim N_3(0,\Sigma)$, where $$\Sigma=\begin{pmatrix}2&1&0\\1&4&0\\0&0&5\end{pmatrix}.$$ Find the distribution of $Z=X_1+X_2$.



Writing $Z=X_1+X_2$ in the form $DX$ requires $D=\begin{pmatrix}1&1&0\end{pmatrix}$. By the properties of a multivariate normal distribution, $Z\sim N(D0, D\Sigma D^T)$. Calculate $D0=0$ and $$D\Sigma D^T=\begin{pmatrix}1&1&0\end{pmatrix}\begin{pmatrix}2&1&0\\1&4&0\\0&0&5\end{pmatrix}\begin{pmatrix}1\\1\\0\end{pmatrix}=\begin{pmatrix}3&5&0\end{pmatrix}\begin{pmatrix}1\\1\\0\end{pmatrix}=8.$$ Therefore, $Z\sim N(0,8)$.
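The same matrix arithmetic can be reproduced in R:

```r
Sigma <- matrix(c(2, 1, 0, 1, 4, 0, 0, 0, 5), nrow=3)
D <- matrix(c(1, 1, 0), nrow=1)    # D as a 1x3 row vector
D %*% Sigma %*% t(D)               # 8, the variance of Z
```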
Suppose $X=(X_1,X_2,X_3)^T\sim N_3(0,\Sigma)$, where $$\Sigma=\begin{pmatrix}2&1&0\\1&4&0\\0&0&5\end{pmatrix}.$$ Determine the constant $c$ such that $Z_1=2X_1+cX_2$ and $Z_2=2X_1+cX_3$ are independent.



Writing $Z=\begin{pmatrix}Z_1\\Z_2\end{pmatrix}=\begin{pmatrix}2X_1+cX_2\\2X_1+cX_3\end{pmatrix}$ in the form $DX$ requires

$$D=\begin{pmatrix}2&c&0\\2&0&c\end{pmatrix}.$$

By the properties of a multivariate normal distribution,

$$Z\sim N_2(D0, D\Sigma D^T).$$

Calculate D0=0 and

$$D\Sigma D^T=\begin{pmatrix}2&c&0\\2&0&c\end{pmatrix}\begin{pmatrix}2&1&0\\1&4&0\\0&0&5\end{pmatrix}\begin{pmatrix}2&2\\c&0\\0&c\end{pmatrix}=\begin{pmatrix}4+c&2+4c&0\\4&2&5c\end{pmatrix}\begin{pmatrix}2&2\\c&0\\0&c\end{pmatrix}=\begin{pmatrix}8+4c+4c^2&8+2c\\8+2c&8+5c^2\end{pmatrix}.$$

For $Z_1$ and $Z_2$ to be independent, necessarily $\mathrm{Cov}(Z_1,Z_2)=8+2c=0$. Therefore $c=-4$.
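Substituting the solution of $8+2c=0$ back into $D\Sigma D^T$ confirms that the off-diagonal entries vanish. A quick numerical check in R:

```r
Sigma <- matrix(c(2, 1, 0, 1, 4, 0, 0, 0, 5), nrow=3)
c_val <- -4    # the solution of 8 + 2c = 0
D <- matrix(c(2, c_val, 0, 2, 0, c_val), nrow=2, byrow=TRUE)
D %*% Sigma %*% t(D)    # off-diagonal entries are 0, so Cov(Z1, Z2) = 0
```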