Chapter 6 Multivariate Normal Distributions
To date, we have studied a number of standard probability distributions: uniform, Bernoulli, binomial, geometric, negative binomial, Poisson, exponential, gamma and normal. These are all random variables that govern exactly one quantity. However, throughout this course we have also studied joint probability distributions: that is, the study of multiple quantities simultaneously. This raises the question: are there any standard probability distributions in higher dimensions?
6.1 Matrices: symmetry, eigenvalues and positive definiteness
Throughout this chapter, we will use the language of matrices freely. As a point of notation, we will represent all matrices with variable names in a bold font, for example A. A vector will be thought of as a matrix in this sense: it is a matrix with either 1 row or 1 column.
Further, when we use the notation A = (aij), this labels the entry of A in the ith row and jth column by aij.
We consolidate our knowledge of the definition of symmetric matrices, the determinant of an n×n matrix, the definitions of eigenvalues and eigenvectors, and what it means for a matrix to be positive definite.
Definition 6.1.1. Let A be an n×n matrix. Then A is said to be symmetric if A = A^T.
Definition 6.1.1 is equivalent to A = (aij) satisfying aij = aji for all i, j with 1 ≤ i, j ≤ n.
Are the following matrices symmetric?
\[
1.\ \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}; \quad
2.\ \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}; \quad
3.\ \begin{pmatrix} 0 & 3 & 4 \\ -3 & 0 & 7 \\ -4 & -7 & 0 \end{pmatrix}; \quad
4.\ \begin{pmatrix} 9 & 2 & 3 \\ 2 & -1 & -8 \\ 3 & -8 & 0 \end{pmatrix}.
\]
The answers are
Yes;
Yes;
No;
Yes.
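The symmetry check in Definition 6.1.1 can also be carried out in R with the base function isSymmetric; a minimal sketch using matrices 3 and 4 from the exercise above:

```r
# Matrices 3 and 4 from the exercise, entered column by column
M3 <- matrix(c(0, -3, -4,  3, 0, -7,  4, 7, 0), nrow = 3)
M4 <- matrix(c(9, 2, 3,  2, -1, -8,  3, -8, 0), nrow = 3)
isSymmetric(M3)  # FALSE: M3 is skew-symmetric (M3 = -t(M3)), not symmetric
isSymmetric(M4)  # TRUE:  M4 = t(M4)
```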
Definition 6.1.3. Let A be an n×n matrix. A scalar λ is called an eigenvalue of A if there is a non-zero vector v such that Av = λv. The vector v corresponding to λ is called an eigenvector of A.
It was shown that an eigenvalue λ of an n×n matrix A satisfies the characteristic equation det(A − λI_n) = 0. Recall that the determinant of any n×n matrix M = (mij) is calculated by fixing some value j with 1 ≤ j ≤ n and computing
\[
\det(M) = \sum_{i=1}^{n} (-1)^{i+j} m_{ij} M_{ij},
\]
where M_{ij} is the (i, j) minor of M, that is, the determinant of the (n−1)×(n−1) matrix obtained by deleting the ith row and jth column of M.
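As a sketch of the determinant formula above, the cofactor expansion along the first column can be coded directly in R and compared with the built-in det function (illustrative only; det is far more efficient):

```r
# Laplace (cofactor) expansion of det(M) along the first column (j = 1)
cofactor_det <- function(M) {
  n <- nrow(M)
  if (n == 1) return(M[1, 1])
  total <- 0
  for (i in 1:n) {
    minor <- M[-i, -1, drop = FALSE]             # delete row i and column 1
    total <- total + (-1)^(i + 1) * M[i, 1] * cofactor_det(minor)
  }
  total
}
M <- matrix(c(5, 2, -3, 2, 5, 4, 0, 0, 6), nrow = 3)
cofactor_det(M)  # agrees with det(M)
```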
What are the eigenvalues and eigenvectors of
\[
A = \begin{pmatrix} 5 & 2 & 0 \\ 2 & 5 & 0 \\ -3 & 4 & 6 \end{pmatrix}?
\]
Note that A is a 3×3 matrix, so in the notation of Definition 6.1.3, n=3. Calculate
\[
A - \lambda I_3 = \begin{pmatrix} 5 & 2 & 0 \\ 2 & 5 & 0 \\ -3 & 4 & 6 \end{pmatrix} - \lambda\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 5-\lambda & 2 & 0 \\ 2 & 5-\lambda & 0 \\ -3 & 4 & 6-\lambda \end{pmatrix}.
\]
So
\begin{align*}
\det(A - \lambda I_3) &= 0 \\
\det\begin{pmatrix} 5-\lambda & 2 & 0 \\ 2 & 5-\lambda & 0 \\ -3 & 4 & 6-\lambda \end{pmatrix} &= 0 \\
(5-\lambda)\left[(5-\lambda)(6-\lambda) - 0\right] - 2\left[2(6-\lambda) - 0\right] &= 0 \\
(5-\lambda)(5-\lambda)(6-\lambda) - 4(6-\lambda) &= 0 \\
(6-\lambda)\left[(5-\lambda)(5-\lambda) - 4\right] &= 0 \\
(6-\lambda)(\lambda^2 - 10\lambda + 21) &= 0 \\
(6-\lambda)(\lambda - 3)(\lambda - 7) &= 0
\end{align*}
So λ1=3,λ2=6,λ3=7 are the eigenvalues.
For λ1 = 3, we look for an eigenvector v1 = (x, y, z)^T satisfying
\[
(A - 3I_3)v_1 = 0 \iff \begin{pmatrix} 2 & 2 & 0 \\ 2 & 2 & 0 \\ -3 & 4 & 3 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \iff \begin{cases} 2x + 2y = 0 \quad (1) \\ 2x + 2y = 0 \\ -3x + 4y + 3z = 0 \quad (2) \end{cases}
\]
Equation (1) rearranges to y = −x. Substituting this into equation (2), we obtain
\[
-3x - 4x + 3z = 0 \implies 3z = 7x.
\]
So clearly x = 3, y = −3 and z = 7 is a solution, and we have an eigenvector
\[
v_1 = \begin{pmatrix} 3 \\ -3 \\ 7 \end{pmatrix}.
\]
For λ2 = 6, we look for an eigenvector v2 = (x, y, z)^T satisfying
\[
(A - 6I_3)v_2 = 0 \iff \begin{pmatrix} -1 & 2 & 0 \\ 2 & -1 & 0 \\ -3 & 4 & 0 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \iff \begin{cases} -x + 2y = 0 \\ 2x - y = 0 \\ -3x + 4y = 0 \end{cases}
\]
Similarly to the previous eigenvalue, we solve these three equations simultaneously, giving
\[
v_2 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.
\]
For λ3 = 7, we look for an eigenvector v3 = (x, y, z)^T satisfying
\[
(A - 7I_3)v_3 = 0 \iff \begin{pmatrix} -2 & 2 & 0 \\ 2 & -2 & 0 \\ -3 & 4 & -1 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \iff \begin{cases} -2x + 2y = 0 \\ 2x - 2y = 0 \\ -3x + 4y - z = 0 \end{cases}
\]
Similarly to the first eigenvalue, we solve these three equations simultaneously, giving
\[
v_3 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}.
\]
Therefore A has eigenvalues 3, 6, 7 with corresponding eigenvectors given respectively by
\[
\begin{pmatrix} 3 \\ -3 \\ 7 \end{pmatrix}, \quad \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}, \quad \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}.
\]

A symmetric n×n matrix A is positive definite if the quantity z^T A z is positive for all non-zero column vectors z of length n.
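The worked example can be checked numerically with R's eigen function (a sketch; note that eigen returns eigenvectors normalised to unit length, so they are scalar multiples of the ones found above):

```r
# The matrix A from the worked example, entered column by column
A <- matrix(c(5, 2, -3, 2, 5, 4, 0, 0, 6), nrow = 3)
ev <- eigen(A)
sort(ev$values)  # 3, 6, 7, matching the hand calculation
```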
Note that Az is the transformation of z under A. It follows that z^T A z is the dot product of z and Az. This being positive is equivalent to the angle between z and Az being less than π/2. Therefore a positive definite matrix A is, informally, a matrix that transforms every vector z in such a way that it still points in the same general direction.
Determine whether the matrix \(\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\) is positive definite.
For any non-zero vector z = (x, y)^T, we have
\[
z^T A z = \begin{pmatrix} x & y \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x & y \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = x^2 + y^2.
\]
Since z is non-zero, at least one of x, y is non-zero, and so x^2 + y^2 > 0. It follows that z^T A z > 0 for all non-zero z, so the matrix is positive definite.
Determine whether the matrix \(\begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}\) is positive definite.
Set z = (−1, 1)^T. We have
\[
z^T A z = \begin{pmatrix} -1 & 1 \end{pmatrix}\begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}\begin{pmatrix} -1 \\ 1 \end{pmatrix} = \begin{pmatrix} -1 & 1 \end{pmatrix}\begin{pmatrix} 1 \\ -1 \end{pmatrix} = -2 < 0.
\]
Therefore \(\begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}\) is not positive definite.
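For a symmetric matrix, positive definiteness is equivalent to all of its eigenvalues being positive, which gives a quick numerical check in R (a sketch using the two matrices from the examples above):

```r
# A symmetric matrix is positive definite exactly when all of its
# eigenvalues are positive
is_pos_def <- function(M) all(eigen(M, symmetric = TRUE)$values > 0)
is_pos_def(matrix(c(1, 0, 0, 1), nrow = 2))  # TRUE:  eigenvalues 1, 1
is_pos_def(matrix(c(1, 2, 2, 1), nrow = 2))  # FALSE: eigenvalues 3, -1
```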
Matrix calculations can be performed in R. To create a matrix variable you can use the matrix function. As input the function takes a vector of entries, for example `c(2, 4, 3, 1, 5, 7)`, and a parameter nrow representing the number of rows the entries should be sorted into.
R also allows for algebraic manipulations of matrices such as matrix addition, matrix multiplication and scalar multiplication.
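A minimal sketch of these operations, using the entries c(2, 4, 3, 1, 5, 7) mentioned above:

```r
A <- matrix(c(2, 4, 3, 1, 5, 7), nrow = 2)  # 2x3 matrix, filled column by column
B <- matrix(1:6, nrow = 2)                  # another 2x3 matrix
A + B        # matrix addition (entrywise)
3 * A        # scalar multiplication
A %*% t(B)   # matrix multiplication: (2x3) times (3x2) gives a 2x2 matrix
```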
Finally R allows for easy calculation of eigenvalues and eigenvectors of a square matrix.
D <- matrix(c(13, -4, 2, -4, 11, -2, 2, -2, 8), nrow=3)
ev <- eigen(D)
values <- ev$values    # eigenvalues, in decreasing order
vectors <- ev$vectors  # corresponding eigenvectors, stored as columns
values
vectors
6.2 Definition of Multivariate Normal Distribution
The normal distribution that we saw in Section 1.3 generalises to higher dimensions. Let n∈Z≥1 represent the dimension we are interested in.
Informally, a multivariate normal distribution can be defined as follows. Consider n independent standard normal distributions Z1, Z2, …, Zn, that is, Zi ∼ N(0, 1) for all i with 1 ≤ i ≤ n. From these, form a vector of random variables Z = (Z1, Z2, …, Zn)^T. Then for any n×n non-singular matrix A and any vector μ = (μ1, μ2, …, μn)^T, define X = AZ + μ. The vector of random variables X = (X1, X2, …, Xn)^T is then distributed by a multivariate normal distribution.
This is saying that every component Xi is a linear combination of some fixed group of standard normal distributions, and a translation.
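This informal construction can be sketched directly in base R. The non-singular matrix A and vector mu below are arbitrary illustrative choices; with a large sample, the empirical mean and covariance of the rows of X approach μ and AA^T respectively:

```r
set.seed(1)
m  <- 100000                                 # number of simulated observations
A  <- matrix(c(1, 0.5, 0, 1), nrow = 2)      # an arbitrary non-singular matrix
mu <- c(1, -2)                               # an arbitrary translation vector
Z  <- matrix(rnorm(2 * m), ncol = 2)         # each row is (Z1, Z2), iid N(0,1)
X  <- Z %*% t(A) + rep(mu, each = m)         # each row is one draw of X = AZ + mu
colMeans(X)  # approximately mu
cov(X)       # approximately A %*% t(A)
```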
Formally we define the multivariate normal distribution by the joint probability density function of X1,X2,…,Xn.
Let
\[
\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{pmatrix}
\]
and Σ = (σij) be an n×n real, symmetric, positive definite matrix, all of whose eigenvalues are positive.
A vector of random variables
\[
X = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix}
\]
is said to have an n-dimensional normal distribution with parameters μ and Σ, denoted Nn(μ, Σ), if the joint PDF of X is given by
\[
f_X(x) = \frac{1}{\sqrt{(2\pi)^n \det(\Sigma)}} \exp\left\{ -\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu) \right\}.
\]
The PDF outlined in the formal Definition 6.2.1 can be derived from the informal construction of X at the beginning of the section by taking Σ=AAT.
Note for n=1, Definition 6.2.1 simplifies to give the PDF of the one-dimensional normal distribution.
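Indeed, for n = 1 we have Σ = (σ²), so det(Σ) = σ² and Σ⁻¹ = (1/σ²), and the PDF reduces to

```latex
f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}
         \exp\left\{ -\frac{(x-\mu)^2}{2\sigma^2} \right\},
```

which is the familiar PDF of N(μ, σ²).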
The multivariate normal distribution has the following important properties:
The expectation of Xi is given by the ith entry of μ, that is, E[Xi]=μi for all i. Succinctly, the vector μ is the expectation vector of X;
The covariance Cov(Xi,Xj), reducing to Var(Xi) when i=j, is given by the element in the ith row and jth column of Σ, that is, σij=Cov(Xi,Xj) for all i and j. Succinctly, the matrix Σ is the variance-covariance matrix of X. It follows that if X1,X2,…,Xn are independent then σij=0 for all i≠j;
These two points indicate why μ and Σ are the two input parameters for the multivariate normal distribution.
6.3 Two-Dimensional Normal Distribution
In this section we study the 2-dimensional normal distribution, often referred to as the bivariate normal distribution.
When n = 2, or another suitably small value, we can explicitly label the elements of μ and Σ as
\[
\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} \quad\text{and}\quad \Sigma = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}
\]
respectively. Here σ1 is the standard deviation of X1, σ2 is the standard deviation of X2, and ρ is the correlation coefficient ρ(X1, X2).
Why are the upper right and lower left entries of Σ equal to σ1σ2ρ?
Show that Σ is non-singular if and only if |ρ|<1 and σ1σ2≠0.
Substitution of these values into Definition 6.2.1 gives
\[
f_{X_1,X_2}(x_1,x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left\{ -\frac{1}{2(1-\rho^2)}\left[ \left(\frac{x_1-\mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{x_1-\mu_1}{\sigma_1}\right)\left(\frac{x_2-\mu_2}{\sigma_2}\right) + \left(\frac{x_2-\mu_2}{\sigma_2}\right)^2 \right] \right\}.
\]
The following code creates a contour plot of the PDF above with
\[
\mu = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \quad\text{and}\quad \Sigma = \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}.
\]
library(mnormt)
#create bivariate normal distribution
x <- seq(-3, 3, 0.1)
y <- seq(-3, 3, 0.1)
mu <- c(0, 0)
sigma <- matrix(c(2, -1, -1, 2), nrow=2)
f <- function(x, y) dmnorm(cbind(x, y), mu, sigma)
z <- outer(x, y, f)
#create contour plot
contour(x, y, z)
A sample of size 200 can be drawn from a bivariate normal distribution using the following R code.
library(plotly)
library(mixtools)
# Set up means, standard deviations, correlations
mu1 <- 1; mu2 <- 2
sig1 <- 1; sig2 <- 1; rho <- 0;
# Construct mean vector and (co)variance matrix
mu <- c(mu1,mu2)
Sigma <- matrix(c(sig1^2,rho*sig1*sig2,rho*sig1*sig2,sig2^2), ncol=2)
#plot all realisations
X <- rmvnorm(n=200, mu, Sigma)
plot(X)
In the above code, setting mu as a vector of length n and Sigma as an n×n matrix before inputting them into rmvnorm allows one to simulate an n-dimensional normal distribution.
6.4 Properties of Multivariate Normal Distribution
The following theorem states that the transformation of a multivariate normal distribution is itself a multivariate normal distribution.
Theorem 6.4.1. Let X ∼ Nn(μ, Σ) and consider some p×n matrix D. Define a new random vector Z = DX. Then Z ∼ Np(Dμ, DΣD^T).
As a corollary of this theorem, we can easily calculate the marginal distributions of a multivariate normal distribution.
The marginal distribution of each component Xi is a one-dimensional normal distribution.
This is a direct consequence of Theorem 6.4.1 by setting D=(0,…,0,1,0,…,0) where the 1 is in the ith position.
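As a numerical sketch of this corollary, picking out the ith component with such a D extracts the ith diagonal entry of Σ. Using the 3×3 matrix Σ from the example below:

```r
# Marginal of X2: D = (0, 1, 0) picks out the second component
Sigma <- matrix(c(2, 1, 0, 1, 4, 0, 0, 0, 5), nrow = 3)
D <- matrix(c(0, 1, 0), nrow = 1)
D %*% Sigma %*% t(D)  # 1x1 matrix with entry Var(X2) = 4, so X2 ~ N(0, 4)
```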
Suppose X = (X1, X2, X3)^T ∼ N3(0, Σ), where
\[
\Sigma = \begin{pmatrix} 2 & 1 & 0 \\ 1 & 4 & 0 \\ 0 & 0 & 5 \end{pmatrix}.
\]
Find the distribution of Z = X1 + X2.
Suppose X = (X1, X2, X3)^T ∼ N3(0, Σ), where
\[
\Sigma = \begin{pmatrix} 2 & 1 & 0 \\ 1 & 4 & 0 \\ 0 & 0 & 5 \end{pmatrix}.
\]
Determine the constant c such that Z1 = 2X1 + cX2 and Z2 = 2X1 + cX3 are independent.
Writing
\[
Z = \begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix} = \begin{pmatrix} 2X_1 + cX_2 \\ 2X_1 + cX_3 \end{pmatrix}
\]
in the form DX requires
\[
D = \begin{pmatrix} 2 & c & 0 \\ 2 & 0 & c \end{pmatrix}.
\]
By the properties of a multivariate normal distribution,
Z∼N2(D0,DΣDT).
Calculate D0 = 0 and
\begin{align*}
D\Sigma D^T &= \begin{pmatrix} 2 & c & 0 \\ 2 & 0 & c \end{pmatrix}\begin{pmatrix} 2 & 1 & 0 \\ 1 & 4 & 0 \\ 0 & 0 & 5 \end{pmatrix}\begin{pmatrix} 2 & 2 \\ c & 0 \\ 0 & c \end{pmatrix} \\
&= \begin{pmatrix} 4+c & 2+4c & 0 \\ 4 & 2 & 5c \end{pmatrix}\begin{pmatrix} 2 & 2 \\ c & 0 \\ 0 & c \end{pmatrix} \\
&= \begin{pmatrix} 8+4c+4c^2 & 8+2c \\ 8+2c & 8+5c^2 \end{pmatrix}.
\end{align*}
For Z1 and Z2 to be independent, we necessarily need Cov(Z1, Z2) = 8 + 2c = 0. Therefore c = −4. (Recall that for jointly normal random variables, zero covariance implies independence, so this condition is also sufficient.)
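We can verify this numerically in R: with c = −4, the off-diagonal entries of DΣD^T vanish, confirming the calculation above.

```r
Sigma <- matrix(c(2, 1, 0, 1, 4, 0, 0, 0, 5), nrow = 3)
cval  <- -4
D <- matrix(c(2, 2, cval, 0, 0, cval), nrow = 2)  # rows (2, c, 0) and (2, 0, c)
D %*% Sigma %*% t(D)  # diagonal entries 56 and 88; off-diagonal entries 0
```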