Chapter 2 Multivariate Normal Distributions

To date, we have studied a number of standard probability distributions: uniform, Bernoulli, binomial, geometric, negative binomial, Poisson, exponential, gamma, and normal. These are all random variables that govern exactly one quantity. This raises the question: are there standard probability distributions in higher dimensions?

2.1 Definition of Multivariate Normal Distribution

The normal distribution generalises to higher dimensions. Let $n \geq 1$ denote the dimension we are interested in.

Informally, a multivariate normal distribution can be defined as follows. Consider $n$ independent standard normal random variables $Z_1, Z_2, \dots, Z_n$, that is, $Z_i \sim N(0,1)$ for all $i$ with $1 \leq i \leq n$. From these, form a vector of random variables $\mathbf{Z} = (Z_1 \; Z_2 \; \cdots \; Z_n)^T$. Then for any $n \times n$ non-singular matrix $A$ and any vector $\boldsymbol{\mu} = (\mu_1 \; \mu_2 \; \cdots \; \mu_n)^T$, define $\mathbf{X} = A\mathbf{Z} + \boldsymbol{\mu}$. The vector of random variables $\mathbf{X} = (X_1 \; X_2 \; \cdots \; X_n)^T$ is then distributed by a multivariate normal distribution.

This says that every component $X_i$ is a linear combination of some fixed collection of standard normal random variables, plus a translation.

Formally, we define the multivariate normal distribution by the joint probability density function of $X_1, X_2, \dots, X_n$.

Let $\boldsymbol{\mu} = (\mu_1 \; \mu_2 \; \cdots \; \mu_n)^T$ and let $\Sigma = (\sigma_{ij})$ be an $n \times n$ real, symmetric, positive definite matrix (equivalently, a real symmetric matrix all of whose eigenvalues are positive).

Definition 2.1.1. A vector of random variables $\mathbf{X} = (X_1 \; X_2 \; \cdots \; X_n)^T$ is said to have an $n$-dimensional normal distribution with parameters $\boldsymbol{\mu}$ and $\Sigma$, denoted $N_n(\boldsymbol{\mu}, \Sigma)$, if the joint PDF of $\mathbf{X}$ is given by
\[
f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^n \det(\Sigma)}} \exp\left\{ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right\}.
\]

The PDF outlined in the formal Definition 2.1.1 can be derived from the informal construction of $\mathbf{X}$ at the beginning of the section by taking $\Sigma = AA^T$.
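The relationship $\Sigma = AA^T$ can be checked empirically. The following sketch (our own illustration, not from the notes; the matrix $A$ and sample size are arbitrary choices) simulates the construction $\mathbf{X} = A\mathbf{Z} + \boldsymbol{\mu}$ in base R and compares the sample mean and covariance with $\boldsymbol{\mu}$ and $AA^T$.

```r
# Our own illustration: empirically check that the construction
# X = A Z + mu has expectation mu and covariance A %*% t(A).
set.seed(1)
A  <- matrix(c(1, 0.5, 0, 2), nrow = 2)   # an arbitrary non-singular matrix
mu <- c(1, -1)

# 100000 realisations of X = A Z + mu, where Z has iid N(0,1) entries
Z <- matrix(rnorm(2 * 100000), nrow = 2)
X <- A %*% Z + mu

rowMeans(X)    # close to mu = (1, -1)
cov(t(X))      # close to A %*% t(A)
```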

Note that for $n = 1$, Definition 2.1.1 simplifies to the PDF of the one-dimensional normal distribution.

The multivariate normal distribution has the following important properties:

  • The expectation of $X_i$ is given by the $i$th entry of $\boldsymbol{\mu}$, that is, $E[X_i] = \mu_i$ for all $i$. Succinctly, the vector $\boldsymbol{\mu}$ is the expectation vector of $\mathbf{X}$;

  • The covariance $\mathrm{Cov}(X_i, X_j)$, reducing to $\mathrm{Var}(X_i)$ when $i = j$, is given by the element in the $i$th row and $j$th column of $\Sigma$, that is, $\sigma_{ij} = \mathrm{Cov}(X_i, X_j)$ for all $i$ and $j$. Succinctly, the matrix $\Sigma$ is the variance-covariance matrix of $\mathbf{X}$. It follows that if $X_1, X_2, \dots, X_n$ are independent then $\sigma_{ij} = 0$ for all $i \neq j$.

These two points indicate why $\boldsymbol{\mu}$ and $\Sigma$ are the two input parameters of the multivariate normal distribution.

Proposition 2.1.2. Consider $\mathbf{x} \sim N_p(\boldsymbol{\mu}_x, \Sigma_{x,x})$ and $\mathbf{y} \sim N_q(\boldsymbol{\mu}_y, \Sigma_{y,y})$. Then $\mathbf{x}$ and $\mathbf{y}$ are independent if and only if $\mathrm{Cov}(\mathbf{x}, \mathbf{y}) = 0_{p,q}$.

If $\mathbf{x}$ and $\mathbf{y}$ are independent then $\mathrm{Cov}(\mathbf{x}, \mathbf{y}) = 0_{p,q}$ by well-established results in statistics. It remains to show the converse.

Assume $\mathrm{Cov}(\mathbf{x}, \mathbf{y}) = 0_{p,q}$. Then
\[
\mathbf{z} = \begin{pmatrix} \mathbf{x} \\ \mathbf{y} \end{pmatrix} \sim N_{p+q}(\boldsymbol{\mu}, \Sigma), \quad \text{where } \boldsymbol{\mu} = \begin{pmatrix} \boldsymbol{\mu}_x \\ \boldsymbol{\mu}_y \end{pmatrix} \text{ and } \Sigma = \begin{pmatrix} \Sigma_{x,x} & 0_{p,q} \\ 0_{q,p} & \Sigma_{y,y} \end{pmatrix}.
\]
Now calculate that
\begin{align*}
f_{Y \mid X}(\mathbf{y} \mid \mathbf{x}) &= \frac{f_{X,Y}(\mathbf{x}, \mathbf{y})}{f_X(\mathbf{x})} \\
&= \frac{\frac{1}{\sqrt{(2\pi)^{p+q} \det(\Sigma)}} \exp\left\{ -\frac{1}{2} (\mathbf{z} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{z} - \boldsymbol{\mu}) \right\}}{\frac{1}{\sqrt{(2\pi)^p \det(\Sigma_{x,x})}} \exp\left\{ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_x)^T \Sigma_{x,x}^{-1} (\mathbf{x} - \boldsymbol{\mu}_x) \right\}} \\
&= \frac{\frac{1}{\sqrt{(2\pi)^{p+q} \det(\Sigma_{x,x}) \det(\Sigma_{y,y})}} \exp\left\{ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_x)^T \Sigma_{x,x}^{-1} (\mathbf{x} - \boldsymbol{\mu}_x) - \frac{1}{2} (\mathbf{y} - \boldsymbol{\mu}_y)^T \Sigma_{y,y}^{-1} (\mathbf{y} - \boldsymbol{\mu}_y) \right\}}{\frac{1}{\sqrt{(2\pi)^p \det(\Sigma_{x,x})}} \exp\left\{ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_x)^T \Sigma_{x,x}^{-1} (\mathbf{x} - \boldsymbol{\mu}_x) \right\}} \\
&= \frac{1}{\sqrt{(2\pi)^q \det(\Sigma_{y,y})}} \exp\left\{ -\frac{1}{2} (\mathbf{y} - \boldsymbol{\mu}_y)^T \Sigma_{y,y}^{-1} (\mathbf{y} - \boldsymbol{\mu}_y) \right\} = f_Y(\mathbf{y}).
\end{align*}
Therefore $\mathbf{x}$ and $\mathbf{y}$ are independent.

In practice, one often has observations that are assumed to come from a multivariate normal distribution whose parameters are not specified. The mean vector and the variance matrix can be estimated using the usual sample statistics.

Exercise 2.1. Consider $\mathbf{x}_1, \dots, \mathbf{x}_n \sim N_p(\boldsymbol{\mu}, \Sigma)$. Show that $\bar{\mathbf{x}} \sim N_p(\boldsymbol{\mu}, \frac{1}{n}\Sigma)$.

If $\mathbf{x}_1, \dots, \mathbf{x}_n$ is an independent and identically distributed random sample from $N_p(\boldsymbol{\mu}, \Sigma)$, then the sample mean $\bar{\mathbf{x}} = \frac{1}{n}\sum_{i=1}^n \mathbf{x}_i$ and the sample variance matrix $S = \frac{1}{n}\sum_{i=1}^n (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^T$ are independent.

Since $\mathbf{x}_1, \dots, \mathbf{x}_n \sim N_p(\boldsymbol{\mu}, \Sigma)$, from Exercise 2.1 we know $\bar{\mathbf{x}} \sim N_p(\boldsymbol{\mu}, \frac{1}{n}\Sigma)$. Set $\mathbf{y}_i = \mathbf{x}_i - \bar{\mathbf{x}}$. Then
\begin{align*}
\mathrm{Cov}(\bar{\mathbf{x}}, \mathbf{y}_i) &= \mathrm{Cov}(\bar{\mathbf{x}}, \mathbf{x}_i - \bar{\mathbf{x}}) = \mathrm{Cov}(\bar{\mathbf{x}}, \mathbf{x}_i) - \mathrm{Cov}(\bar{\mathbf{x}}, \bar{\mathbf{x}}) \\
&= \mathrm{Cov}\left( \frac{1}{n}\sum_{j=1}^n \mathbf{x}_j, \mathbf{x}_i \right) - \mathrm{Cov}(\bar{\mathbf{x}}, \bar{\mathbf{x}}) \\
&= \frac{1}{n}\sum_{j=1}^n E\left[ (\mathbf{x}_j - \boldsymbol{\mu})(\mathbf{x}_i - \boldsymbol{\mu})^T \right] - E\left[ (\bar{\mathbf{x}} - \boldsymbol{\mu})(\bar{\mathbf{x}} - \boldsymbol{\mu})^T \right] \\
&= \frac{1}{n}\Sigma - \frac{1}{n}\Sigma = 0_{p \times p}.
\end{align*}

Therefore $\bar{\mathbf{x}}$ and $\mathbf{y}_i$ are independent. Since
\[
S = \frac{1}{n}\sum_{i=1}^n (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^T = \frac{1}{n}\sum_{i=1}^n \mathbf{y}_i \mathbf{y}_i^T,
\]
it follows that $\bar{\mathbf{x}}$ and $S$ are independent.
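The claim of Exercise 2.1, $\bar{\mathbf{x}} \sim N_p(\boldsymbol{\mu}, \frac{1}{n}\Sigma)$, can also be checked by simulation. The following base R sketch (our own illustration; the parameters and sample sizes are arbitrary choices) repeatedly draws samples of size $n = 10$ and compares the covariance of the resulting sample means with $\frac{1}{n}\Sigma$.

```r
# Our own simulation sketch: the sample mean of n = 10 draws from
# N_2(mu, Sigma) should have covariance Sigma / n.
set.seed(2)
mu    <- c(0, 0)
Sigma <- matrix(c(2, 1, 1, 3), nrow = 2)
L     <- t(chol(Sigma))          # Sigma = L %*% t(L)
n     <- 10

# Draw one sample of size n and return its mean vector
sample_mean <- function() {
  x <- L %*% matrix(rnorm(2 * n), nrow = 2) + mu
  rowMeans(x)
}

means <- t(replicate(20000, sample_mean()))
cov(means)    # close to Sigma / n = (0.2, 0.1; 0.1, 0.3)
```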

2.2 Two-Dimensional Normal Distribution

In this section we study the $2$-dimensional normal distribution, often referred to as the bivariate normal distribution.

When $n = 2$, or another suitably small value, we can explicitly label the elements of $\boldsymbol{\mu}$ and $\Sigma$ as $\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}$ and $\begin{pmatrix} \sigma_1^2 & \sigma_1 \sigma_2 \rho \\ \sigma_1 \sigma_2 \rho & \sigma_2^2 \end{pmatrix}$ respectively. Here $\sigma_1$ is the standard deviation of $X_1$, $\sigma_2$ is the standard deviation of $X_2$, and $\rho$ is the correlation coefficient $\rho(X_1, X_2)$.

Why are the upper-right and lower-left entries of $\Sigma$ equal to $\sigma_1 \sigma_2 \rho$?

Show that $\Sigma$ is non-singular if and only if $|\rho| < 1$ and $\sigma_1 \sigma_2 \neq 0$.

Substitution of these values into Definition 2.1.1 gives
\[
f_{X_1, X_2}(x_1, x_2) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}} \exp\left\{ -\frac{1}{2(1 - \rho^2)} \left[ \left( \frac{x_1 - \mu_1}{\sigma_1} \right)^2 - 2\rho \left( \frac{x_1 - \mu_1}{\sigma_1} \right)\left( \frac{x_2 - \mu_2}{\sigma_2} \right) + \left( \frac{x_2 - \mu_2}{\sigma_2} \right)^2 \right] \right\}.
\]
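As a sanity check on this formula, the following sketch (our own code; the function name dbvnorm and the test values are ours) implements the bivariate PDF directly. When $\rho = 0$ the joint density must factorise into the product of two univariate normal densities, which gives a quick way to validate the implementation with base R's dnorm().

```r
# Direct implementation of the bivariate normal PDF above (our own sketch)
dbvnorm <- function(x1, x2, mu1, mu2, sig1, sig2, rho) {
  q <- ((x1 - mu1) / sig1)^2 -
       2 * rho * ((x1 - mu1) / sig1) * ((x2 - mu2) / sig2) +
       ((x2 - mu2) / sig2)^2
  1 / (2 * pi * sig1 * sig2 * sqrt(1 - rho^2)) * exp(-q / (2 * (1 - rho^2)))
}

# With rho = 0 the joint density factorises into two univariate densities
dbvnorm(0.3, -0.7, 0, 0, 1, 2, 0)
dnorm(0.3, 0, 1) * dnorm(-0.7, 0, 2)   # same value
```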

The following code creates a contour plot of the PDF above with $\boldsymbol{\mu} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$ and $\Sigma = \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}$.

library(mnormt)

#create bivariate normal distribution
x     <- seq(-3, 3, 0.1) 
y     <- seq(-3, 3, 0.1)
mu    <- c(0, 0)
sigma <- matrix(c(2, -1, -1, 2), nrow=2)
f     <- function(x, y) dmnorm(cbind(x, y), mu, sigma)
z     <- outer(x, y, f)

#create contour plot
contour(x, y, z)

A sample of size $200$ can be drawn from a bivariate normal distribution using the following R code.

library(plotly)
library(mixtools)

# Set up means, standard deviations, correlation
mu1 <- 1;  mu2 <- 2
sig1 <- 1;  sig2 <- 1;  rho <- 0

# Construct mean vector and (co)variance matrix
mu <- c(mu1, mu2)
Sigma <- matrix(c(sig1^2, rho*sig1*sig2, rho*sig1*sig2, sig2^2), ncol=2)

# Plot all realisations
X <- rmvnorm(n=200, mu, Sigma)
plot(X)

In the above code, setting mu as a vector of length $n$ and Sigma as an $n \times n$ matrix before passing them to rmvnorm allows one to simulate an $n$-dimensional normal distribution.
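If the mixtools package is not available, the informal construction from Section 2.1 gives a base R alternative (a sketch under our own choice of parameters, not from the notes): take $A$ to be a Cholesky factor of $\Sigma$, so that $\Sigma = AA^T$, and set $\mathbf{X} = A\mathbf{Z} + \boldsymbol{\mu}$. Here $n = 3$.

```r
# Our own base-R sketch: simulate from N_3(mu, Sigma) via the construction
# X = A Z + mu, with A a Cholesky factor of Sigma (so Sigma = A %*% t(A)).
set.seed(3)
mu    <- c(1, 0, -1)
Sigma <- matrix(c(4, 1, 0,
                  1, 3, 1,
                  0, 1, 2), nrow = 3)

A <- t(chol(Sigma))                       # Sigma = A %*% t(A)
Z <- matrix(rnorm(3 * 200), nrow = 3)     # 200 standard normal vectors
X <- t(A %*% Z + mu)                      # 200 x 3 matrix of realisations

colMeans(X)   # close to mu = (1, 0, -1)
```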

2.3 Properties of Multivariate Normal Distribution

The following theorem states that a linear transformation of a multivariate normal distribution is itself a multivariate normal distribution.

Theorem 2.3.1. Let $\mathbf{X} \sim N_n(\boldsymbol{\mu}, \Sigma)$ and consider some $p \times n$ matrix $D$. Define a new random vector $\mathbf{Z} = D\mathbf{X}$. Then $\mathbf{Z} \sim N_p(D\boldsymbol{\mu}, D\Sigma D^T)$.

As a corollary of this theorem, we can easily calculate the marginal distributions of a multivariate normal distribution.

Corollary 2.3.2. The marginal distribution of each component $X_i$ is a one-dimensional normal distribution.

This is a direct consequence of Theorem 2.3.1 by setting $D = (0, \dots, 0, 1, 0, \dots, 0)$ where the $1$ is in the $i$th position.
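In matrix form, this corollary amounts to a short calculation; the following sketch (with our own example numbers) uses the selector matrix $D = (0 \; 1 \; 0)$ to extract the marginal mean and variance of $X_2$.

```r
# Our own illustration of Corollary 2.3.2: the selector D = (0, 1, 0)
# picks out the marginal distribution of X2 from X ~ N_3(mu, Sigma).
mu    <- c(1, 2, 3)
Sigma <- matrix(c(2, 1, 0,
                  1, 4, 0,
                  0, 0, 5), nrow = 3)

D <- matrix(c(0, 1, 0), nrow = 1)
D %*% mu                 # marginal mean: 2
D %*% Sigma %*% t(D)     # marginal variance: 4, so X2 ~ N(2, 4)
```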

Suppose $\mathbf{X} = (X_1, X_2, X_3)^T \sim N_3(\mathbf{0}, \Sigma)$, where
\[
\Sigma = \begin{pmatrix} 2 & 1 & 0 \\ 1 & 4 & 0 \\ 0 & 0 & 5 \end{pmatrix}.
\]
Find the distribution of $Z = X_1 + X_2$.



Writing $Z = X_1 + X_2$ in the form $D\mathbf{X}$ requires $D = (1 \;\; 1 \;\; 0)$. By the properties of a multivariate normal distribution, $Z \sim N(D\mathbf{0}, D\Sigma D^T)$. Calculate $D\mathbf{0} = 0$ and
\[
D\Sigma D^T = \begin{pmatrix} 1 & 1 & 0 \end{pmatrix} \begin{pmatrix} 2 & 1 & 0 \\ 1 & 4 & 0 \\ 0 & 0 & 5 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 3 & 5 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} = 8.
\]
Therefore $Z \sim N(0, 8)$.
Suppose $\mathbf{X} = (X_1, X_2, X_3)^T \sim N_3(\mathbf{0}, \Sigma)$, where
\[
\Sigma = \begin{pmatrix} 2 & 1 & 0 \\ 1 & 4 & 0 \\ 0 & 0 & 5 \end{pmatrix}.
\]
Determine the constant $c$ such that $Z_1 = 2X_1 + cX_2$ and $Z_2 = 2X_1 + cX_3$ are independent.



Writing $\mathbf{Z} = \begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix} = \begin{pmatrix} 2X_1 + cX_2 \\ 2X_1 + cX_3 \end{pmatrix}$ in the form $D\mathbf{X}$ requires
\[
D = \begin{pmatrix} 2 & c & 0 \\ 2 & 0 & c \end{pmatrix}.
\]
By the properties of a multivariate normal distribution,
\[
\mathbf{Z} \sim N_2(D\mathbf{0}, D\Sigma D^T).
\]
Calculate $D\mathbf{0} = \mathbf{0}$ and
\[
D\Sigma D^T = \begin{pmatrix} 2 & c & 0 \\ 2 & 0 & c \end{pmatrix} \begin{pmatrix} 2 & 1 & 0 \\ 1 & 4 & 0 \\ 0 & 0 & 5 \end{pmatrix} \begin{pmatrix} 2 & 2 \\ c & 0 \\ 0 & c \end{pmatrix} = \begin{pmatrix} 4+c & 2+4c & 0 \\ 4 & 2 & 5c \end{pmatrix} \begin{pmatrix} 2 & 2 \\ c & 0 \\ 0 & c \end{pmatrix} = \begin{pmatrix} 8+4c+4c^2 & 8+2c \\ 8+2c & 8+5c^2 \end{pmatrix}.
\]
The independence of $Z_1$ and $Z_2$ is equivalent to $\mathrm{Cov}(Z_1, Z_2) = 8 + 2c = 0$ by Proposition 2.1.2. Therefore $c = -4$.
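The condition $\mathrm{Cov}(Z_1, Z_2) = 8 + 2c = 0$ can be verified numerically (a quick check of our own): with $c = -4$, the off-diagonal entries of $D\Sigma D^T$ vanish, so $Z_1$ and $Z_2$ are uncorrelated and hence independent.

```r
# Our own numerical check: with c = -4 (the solution of 8 + 2*c = 0),
# the off-diagonal entries of D Sigma t(D) are zero.
Sigma <- matrix(c(2, 1, 0,
                  1, 4, 0,
                  0, 0, 5), nrow = 3)
c0 <- -4                                   # solves 8 + 2*c = 0
D  <- matrix(c(2, c0, 0,
               2, 0, c0), nrow = 2, byrow = TRUE)

D %*% Sigma %*% t(D)   # off-diagonal entries equal 8 + 2*c0 = 0
```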

2.4 Inference Testing

Let $\mathbf{x}_1, \dots, \mathbf{x}_n$ be a random sample from $N_p(\boldsymbol{\mu}, \Sigma)$ where $\boldsymbol{\mu}$ is unknown but the covariance matrix $\Sigma$ is known. Suppose one wished to conduct a hypothesis test for the mean of the population from the sample. In particular, how can one conduct the hypothesis test
\[
H_0: \boldsymbol{\mu} = \mathbf{a} \quad \text{versus} \quad H_1: \boldsymbol{\mu} \neq \mathbf{a},
\]
where $\mathbf{a}$ is some specified fixed vector, to infer on the population mean $\boldsymbol{\mu}$?

Proposition 2.4.1. Let $\mathbf{x} \sim N_p(\boldsymbol{\mu}, \Sigma)$, where $\Sigma$ is positive definite. Then $(\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \sim \chi^2_p$.

Define $\mathbf{y} = \Sigma^{-\frac{1}{2}}(\mathbf{x} - \boldsymbol{\mu})$. Then by Theorem 2.3.1, $\mathbf{y} \sim N_p(\mathbf{0}, I_p)$, and so by Corollary 2.3.2 the components of $\mathbf{y}$ have independent univariate normal distributions with mean $0$ and variance $1$.

Then
\[
(\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) = \left( \Sigma^{-\frac{1}{2}}(\mathbf{x} - \boldsymbol{\mu}) \right)^T \left( \Sigma^{-\frac{1}{2}}(\mathbf{x} - \boldsymbol{\mu}) \right) = \mathbf{y}^T \mathbf{y} = \sum_{i=1}^p y_i^2.
\]
Since each $y_i \sim N(0,1)$ and the $y_i$ are independent, it follows from the definition of the chi-squared distribution that $(\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \sim \chi^2_p$.
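Proposition 2.4.1 can be illustrated by simulation (our own sketch; the parameters are arbitrary choices): the Mahalanobis quadratic form of draws from $N_p(\boldsymbol{\mu}, \Sigma)$ should behave like a $\chi^2_p$ variable, which has mean $p$.

```r
# Our own simulation sketch of Proposition 2.4.1 with p = 2: the quadratic
# form (x - mu)^T Sigma^{-1} (x - mu) should follow a chi-squared(2) law.
set.seed(4)
mu    <- c(1, 2)
Sigma <- matrix(c(2, 1, 1, 3), nrow = 2)
L     <- t(chol(Sigma))                              # Sigma = L %*% t(L)

x <- L %*% matrix(rnorm(2 * 50000), nrow = 2) + mu   # 50000 draws
d <- x - mu
q <- colSums(d * solve(Sigma, d))   # quadratic form for each draw

mean(q)   # close to p = 2, the mean of a chi-squared(2) distribution
```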

By Exercise 2.1, $\bar{\mathbf{x}} \sim N_p(\boldsymbol{\mu}, \frac{1}{n}\Sigma)$, so $n^{\frac{1}{2}}\bar{\mathbf{x}} \sim N_p(n^{\frac{1}{2}}\boldsymbol{\mu}, \Sigma)$. Define the test statistic
\[
\zeta^2 = n(\bar{\mathbf{x}} - \mathbf{a})^T \Sigma^{-1} (\bar{\mathbf{x}} - \mathbf{a}).
\]
Assuming the null hypothesis $H_0$ is true, the test statistic can be written
\[
\zeta^2 = n(\bar{\mathbf{x}} - \boldsymbol{\mu})^T \Sigma^{-1} (\bar{\mathbf{x}} - \boldsymbol{\mu}) = \left( n^{\frac{1}{2}}\bar{\mathbf{x}} - n^{\frac{1}{2}}\boldsymbol{\mu} \right)^T \Sigma^{-1} \left( n^{\frac{1}{2}}\bar{\mathbf{x}} - n^{\frac{1}{2}}\boldsymbol{\mu} \right),
\]
and so by Proposition 2.4.1 the test statistic $\zeta^2$ follows a $\chi^2_p$ distribution. Therefore at significance level $\alpha$, one should reject the null hypothesis $H_0$ if $\zeta^2 > \chi^2_{p,\alpha}$, where $\chi^2_{p,\alpha}$ is the upper $\alpha$ quantile of the $\chi^2_p$ distribution. The $p$-value is given by $p = P(\chi^2_p > \zeta^2)$. Furthermore, the $100(1-\alpha)\%$ confidence region for $\boldsymbol{\mu}$ is $\{\mathbf{a} : \zeta^2 \leq \chi^2_{p,\alpha}\}$, which is an ellipsoid.

Download the file weight-height_data.csv from the Moodle page. The data details the weights and heights of $5000$ adult males in stones and inches respectively. Assume that the data follows a $N_2(\boldsymbol{\mu}, \Sigma)$ distribution where
\[
\Sigma = \begin{pmatrix} 8.2 & 48.9 \\ 48.9 & 391.3 \end{pmatrix}.
\]
A recent article in a men's health magazine claims that men have an average height of $68.97$ inches and an average weight of $186.93$ stone. Use R to conduct a hypothesis test to determine whether you are $95\%$ sure that this claim is accurate.



Begin by loading the data into R:

df <- read.csv("weight-height_data.csv")
print(head(df))
##     Height   Weight
## 1 73.84702 241.8936
## 2 68.78190 162.3105
## 3 74.11011 212.7409
## 4 71.73098 220.0425
## 5 69.88180 206.3498
## 6 67.25302 152.2122

The entirety of the data can be visualised using the plot() function. Note that the data follows an ellipse shape, typical of a multivariate normal distribution.

plot(df[,1],df[,2],
     main="Scatter Plot of Height and Weight of Males",
     xlab="Height",
     ylab="Weight",
     cex=0.3,
     pch=19
     )

The question is equivalent to conducting the hypothesis test
\[
H_0: \boldsymbol{\mu} = \begin{pmatrix} 68.97 \\ 186.93 \end{pmatrix} \quad \text{versus} \quad H_1: \boldsymbol{\mu} \neq \begin{pmatrix} 68.97 \\ 186.93 \end{pmatrix}
\]
at the $5\%$ significance level. By the above theory, one should calculate the test statistic $\zeta^2$. This can be done with the following code. Note that the sample mean $\bar{\mathbf{x}}$ is calculated using the base R function mean() and stored in the variable sample_mean, while the inv() function used below belongs to the matlib library. The covariance matrix $\Sigma$ given in the question is stored in the variable cov_mat, and the proposed mean is stored in the variable a.

library(matlib)
sample_mean = c(mean(df[,1]), mean(df[,2]))
a = c(68.97,186.93)
cov_mat = matrix(c(8.2, 48.9, 48.9, 391.3), nrow=2, ncol=2)
zeta2 = 5000 * t(sample_mean - a) %*% inv(cov_mat) %*%  (sample_mean - a)
zeta2
##          [,1]
## [1,] 4.956178

The test statistic should finally be compared to the critical value $\chi^2_{2, 0.05}$, that is, the value $c$ such that $P(\chi^2_2 \leq c) = 0.95$. This can be computed using R's built-in quantile function for the chi-squared distribution, qchisq().

zeta2 < qchisq(0.95, 2)
##      [,1]
## [1,] TRUE

The output indicates that the test statistic is below the critical value, so we do not reject the null hypothesis. Indeed, the $p$-value of the observed test statistic can be calculated using the distribution function of the chi-squared distribution, given in R by pchisq().

1-pchisq(zeta2, 2)
##            [,1]
## [1,] 0.08390342

One might think that conducting $p$ separate hypothesis tests, one $z$-test for the mean of each of the univariate normal distributions governing each variable, would be equivalent to the above hypothesis test. However, this is not the case. Since the correlations between the variables are not taken into consideration, the $p$ separate univariate $z$-tests may fail to reject their null hypotheses while the multivariate test described above rejects, or vice versa.
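This discrepancy can be made concrete with made-up numbers (our own illustration, not the weight-height data): when the components are strongly correlated, both univariate $z$-tests can accept while the multivariate chi-squared test rejects.

```r
# Our own illustration: strongly correlated components where each
# univariate z-test accepts but the multivariate test rejects.
Sigma <- matrix(c(1, 0.9, 0.9, 1), nrow = 2)   # known covariance, n = 1
d     <- c(1.5, -1.5)                          # observed x - a

# Univariate z-tests: |z| = 1.5 < 1.96, so neither rejects at the 5% level
z <- d / sqrt(diag(Sigma))
abs(z) < qnorm(0.975)          # TRUE TRUE

# Multivariate test: zeta^2 = 45 far exceeds qchisq(0.95, 2) = 5.99
zeta2 <- t(d) %*% solve(Sigma) %*% d
zeta2 > qchisq(0.95, 2)        # TRUE
```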