Chapter 2 Multivariate Normal Distributions

To date, we have studied a number of standard probability distributions: uniform, Bernoulli, binomial, geometric, negative binomial, Poisson, exponential, gamma and normal. Each of these governs exactly one random quantity. This raises the question: are there standard probability distributions in higher dimensions?

2.1 Definition of Multivariate Normal Distribution

The normal distribution generalises to higher dimensions. Let $n \in \mathbb{Z}_{\geq 1}$ denote the dimension we are interested in.

Informally, a multivariate normal distribution can be defined as follows. Consider $n$ independent standard normal random variables $Z_1, Z_2, \dots, Z_n$, that is, $Z_i \sim N(0,1)$ for all $i$ with $1 \le i \le n$. From these, form the random vector $\mathbf{Z} = (Z_1, Z_2, \dots, Z_n)^T$. Then for any $n \times n$ non-singular matrix $A$ and any vector $\boldsymbol{\mu} = (\mu_1, \mu_2, \dots, \mu_n)^T$, define $\mathbf{X} = A\mathbf{Z} + \boldsymbol{\mu}$. The random vector $\mathbf{X} = (X_1, X_2, \dots, X_n)^T$ then follows a multivariate normal distribution.

This says that every component $X_i$ is a linear combination of a fixed collection of independent standard normal random variables, plus a translation.

Formally, we define the multivariate normal distribution by the joint probability density function of $X_1, X_2, \dots, X_n$.

Let $\boldsymbol{\mu} = (\mu_1, \mu_2, \dots, \mu_n)^T$ and let $\Sigma = (\sigma_{ij})$ be an $n \times n$ real, symmetric, positive definite matrix, that is, a symmetric matrix all of whose eigenvalues are positive.

A random vector $\mathbf{X} = (X_1, X_2, \dots, X_n)^T$ is said to have an $n$-dimensional normal distribution with parameters $\boldsymbol{\mu}$ and $\Sigma$, denoted $N_n(\boldsymbol{\mu}, \Sigma)$, if the joint PDF of $\mathbf{X}$ is given by
$$f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^n \det(\Sigma)}} \exp\left\{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})\right\}.$$

The PDF outlined in the formal Definition 2.1.1 can be derived from the informal construction of $\mathbf{X}$ at the beginning of the section by taking $\Sigma = AA^T$.
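As a quick sanity check of this relationship, the construction can be simulated in base R. The matrix $A$ and mean vector below are arbitrary illustrative choices: the empirical mean and covariance of many draws of $\mathbf{X} = A\mathbf{Z} + \boldsymbol{\mu}$ should be close to $\boldsymbol{\mu}$ and $AA^T$ respectively.

```r
set.seed(1)

A  <- matrix(c(1, 0.5, 0, 2), nrow = 2)  # an arbitrary non-singular matrix
mu <- c(1, -1)
m  <- 100000                             # number of simulated draws

Z <- matrix(rnorm(2 * m), nrow = 2)      # each column is a standard normal vector
X <- A %*% Z + mu                        # the construction X = AZ + mu

rowMeans(X)                              # should be close to mu
cov(t(X))                                # should be close to A %*% t(A)
```
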

Note that for $n = 1$, Definition 2.1.1 simplifies to the PDF of the one-dimensional normal distribution.

The multivariate normal distribution has the following important properties:

  • The expectation of $X_i$ is given by the $i$th entry of $\boldsymbol{\mu}$, that is, $E[X_i] = \mu_i$ for all $i$. Succinctly, the vector $\boldsymbol{\mu}$ is the expectation vector of $\mathbf{X}$;

  • The covariance $\mathrm{Cov}(X_i, X_j)$, which reduces to $\mathrm{Var}(X_i)$ when $i = j$, is given by the entry in the $i$th row and $j$th column of $\Sigma$, that is, $\sigma_{ij} = \mathrm{Cov}(X_i, X_j)$ for all $i$ and $j$. Succinctly, the matrix $\Sigma$ is the variance-covariance matrix of $\mathbf{X}$. It follows that if $X_1, X_2, \dots, X_n$ are independent then $\sigma_{ij} = 0$ for all $i \neq j$.

These two points indicate why μ and Σ are the two input parameters for the multivariate normal distribution.

Consider jointly normal random vectors $\mathbf{x} \sim N_p(\boldsymbol{\mu}_x, \Sigma_{x,x})$ and $\mathbf{y} \sim N_q(\boldsymbol{\mu}_y, \Sigma_{y,y})$. Then $\mathbf{x}$ and $\mathbf{y}$ are independent if and only if $\mathrm{Cov}(\mathbf{x}, \mathbf{y}) = 0_{p,q}$.

Clearly, if $\mathbf{x}$ and $\mathbf{y}$ are independent then $\mathrm{Cov}(\mathbf{x}, \mathbf{y}) = 0_{p,q}$ by well-established results in statistics. It remains to show the converse.

Assume $\mathrm{Cov}(\mathbf{x}, \mathbf{y}) = 0_{p,q}$. Then
$$\mathbf{z} = \begin{pmatrix} \mathbf{x} \\ \mathbf{y} \end{pmatrix} \sim N_{p+q}(\boldsymbol{\mu}, \Sigma), \quad \text{where} \quad \boldsymbol{\mu} = \begin{pmatrix} \boldsymbol{\mu}_x \\ \boldsymbol{\mu}_y \end{pmatrix} \text{ and } \Sigma = \begin{pmatrix} \Sigma_{x,x} & 0_{p,q} \\ 0_{q,p} & \Sigma_{y,y} \end{pmatrix}.$$
Since $\Sigma$ is block diagonal, $\det(\Sigma) = \det(\Sigma_{x,x})\det(\Sigma_{y,y})$ and $\Sigma^{-1} = \begin{pmatrix} \Sigma_{x,x}^{-1} & 0_{p,q} \\ 0_{q,p} & \Sigma_{y,y}^{-1} \end{pmatrix}$. Now calculate
$$\begin{aligned}
f_{\mathbf{Y}\mid\mathbf{X}}(\mathbf{y}\mid\mathbf{x}) &= \frac{f_{\mathbf{X},\mathbf{Y}}(\mathbf{x},\mathbf{y})}{f_{\mathbf{X}}(\mathbf{x})} \\
&= \frac{\dfrac{1}{\sqrt{(2\pi)^{p+q}\det(\Sigma)}}\exp\left\{-\tfrac{1}{2}(\mathbf{z}-\boldsymbol{\mu})^T\Sigma^{-1}(\mathbf{z}-\boldsymbol{\mu})\right\}}{\dfrac{1}{\sqrt{(2\pi)^{p}\det(\Sigma_{x,x})}}\exp\left\{-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_x)^T\Sigma_{x,x}^{-1}(\mathbf{x}-\boldsymbol{\mu}_x)\right\}} \\
&= \frac{\dfrac{1}{\sqrt{(2\pi)^{p+q}\det(\Sigma_{x,x})\det(\Sigma_{y,y})}}\exp\left\{-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_x)^T\Sigma_{x,x}^{-1}(\mathbf{x}-\boldsymbol{\mu}_x)-\tfrac{1}{2}(\mathbf{y}-\boldsymbol{\mu}_y)^T\Sigma_{y,y}^{-1}(\mathbf{y}-\boldsymbol{\mu}_y)\right\}}{\dfrac{1}{\sqrt{(2\pi)^{p}\det(\Sigma_{x,x})}}\exp\left\{-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_x)^T\Sigma_{x,x}^{-1}(\mathbf{x}-\boldsymbol{\mu}_x)\right\}} \\
&= \frac{1}{\sqrt{(2\pi)^{q}\det(\Sigma_{y,y})}}\exp\left\{-\tfrac{1}{2}(\mathbf{y}-\boldsymbol{\mu}_y)^T\Sigma_{y,y}^{-1}(\mathbf{y}-\boldsymbol{\mu}_y)\right\} \\
&= f_{\mathbf{Y}}(\mathbf{y}).
\end{aligned}$$
Therefore $\mathbf{x}$ and $\mathbf{y}$ are independent.

In practice, one often has observations that are assumed to come from a multivariate normal distribution whose parameters are not specified. The mean vector and the variance-covariance matrix can be estimated using the usual sample statistics.

Consider independent $\mathbf{x}_1, \dots, \mathbf{x}_n \sim N_p(\boldsymbol{\mu}, \Sigma)$. Show that $\bar{\mathbf{x}} \sim N_p(\boldsymbol{\mu}, \tfrac{1}{n}\Sigma)$.

If $\mathbf{x}_1, \dots, \mathbf{x}_n$ is an independent and identically distributed random sample from $N_p(\boldsymbol{\mu}, \Sigma)$, then the sample mean $\bar{\mathbf{x}} = \frac{1}{n}\sum_{i=1}^n \mathbf{x}_i$ and the sample variance matrix $S = \frac{1}{n}\sum_{i=1}^n (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^T$ are independent.

Since $\mathbf{x}_1, \dots, \mathbf{x}_n \sim N_p(\boldsymbol{\mu}, \Sigma)$, we know from Exercise 2.1 that $\bar{\mathbf{x}} \sim N_p(\boldsymbol{\mu}, \tfrac{1}{n}\Sigma)$. Set $\mathbf{y}_i = \mathbf{x}_i - \bar{\mathbf{x}}$. Then
$$\begin{aligned}
\mathrm{Cov}(\bar{\mathbf{x}}, \mathbf{y}_i) &= \mathrm{Cov}(\bar{\mathbf{x}}, \mathbf{x}_i - \bar{\mathbf{x}}) \\
&= \mathrm{Cov}(\bar{\mathbf{x}}, \mathbf{x}_i) - \mathrm{Cov}(\bar{\mathbf{x}}, \bar{\mathbf{x}}) \\
&= \mathrm{Cov}\left(\tfrac{1}{n}\textstyle\sum_{j=1}^n \mathbf{x}_j, \mathbf{x}_i\right) - \mathrm{Cov}(\bar{\mathbf{x}}, \bar{\mathbf{x}}) \\
&= \tfrac{1}{n}\textstyle\sum_{j=1}^n E\left[(\mathbf{x}_j - \boldsymbol{\mu})(\mathbf{x}_i - \boldsymbol{\mu})^T\right] - E\left[(\bar{\mathbf{x}} - \boldsymbol{\mu})(\bar{\mathbf{x}} - \boldsymbol{\mu})^T\right] \\
&= \tfrac{1}{n}\Sigma - \tfrac{1}{n}\Sigma = 0_{p \times p},
\end{aligned}$$
where the penultimate equality holds because, by independence, only the $j = i$ term in the sum is non-zero.

Therefore $\bar{\mathbf{x}}$ and $\mathbf{y}_i$ are independent for each $i$. Since $S = \frac{1}{n}\sum_{i=1}^n (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^T = \frac{1}{n}\sum_{i=1}^n \mathbf{y}_i \mathbf{y}_i^T$ is a function of the $\mathbf{y}_i$ alone, it follows that $\bar{\mathbf{x}}$ and $S$ are independent.
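The claim of Exercise 2.1 can be illustrated with a short base-R simulation (the parameter values below are illustrative): across many replicated samples of size $n$, the sample means should have covariance close to $\tfrac{1}{n}\Sigma$.

```r
set.seed(2)

p     <- 2
n     <- 10
reps  <- 20000
mu    <- c(0, 0)
Sigma <- matrix(c(2, 1, 1, 3), nrow = 2)
A     <- t(chol(Sigma))                  # so that A %*% t(A) = Sigma

# Each replication simulates a sample of size n and records its mean vector
xbar <- replicate(reps, {
  X <- A %*% matrix(rnorm(p * n), nrow = p) + mu   # n draws from N_p(mu, Sigma)
  rowMeans(X)
})

cov(t(xbar))                             # should be close to Sigma / n
```
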

2.2 Two-Dimensional Normal Distribution

In this section we study the 2-dimensional normal distribution, often referred to as the bivariate normal distribution.

When $n = 2$, or another suitably small value, we can explicitly label the entries of $\boldsymbol{\mu}$ and $\Sigma$ as
$$\boldsymbol{\mu} = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} \quad \text{and} \quad \Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_1\sigma_2\rho \\ \sigma_1\sigma_2\rho & \sigma_2^2 \end{pmatrix}$$
respectively. Here $\sigma_1$ is the standard deviation of $X_1$, $\sigma_2$ is the standard deviation of $X_2$ and $\rho$ is the correlation coefficient $\rho(X_1, X_2)$.

Why are the upper right and lower left entries of Σ equal to σ1σ2ρ?

Show that $\Sigma$ is non-singular if and only if $|\rho| < 1$ and $\sigma_1\sigma_2 \neq 0$.

Substitution of these values into Definition 2.1.1 gives
$$f_{X_1,X_2}(x_1,x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x_1-\mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{x_1-\mu_1}{\sigma_1}\right)\left(\frac{x_2-\mu_2}{\sigma_2}\right) + \left(\frac{x_2-\mu_2}{\sigma_2}\right)^2\right]\right\}.$$
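As a numerical sanity check of the explicit bivariate density (base R only, with illustrative parameter values), a Riemann sum of the formula over a sufficiently wide grid should be close to 1:

```r
# The explicit bivariate normal density from the formula above
f <- function(x1, x2, mu1 = 0, mu2 = 0, s1 = 1, s2 = 2, rho = 0.5) {
  q <- ((x1 - mu1)/s1)^2 -
       2*rho*((x1 - mu1)/s1)*((x2 - mu2)/s2) +
       ((x2 - mu2)/s2)^2
  1/(2*pi*s1*s2*sqrt(1 - rho^2)) * exp(-q/(2*(1 - rho^2)))
}

h  <- 0.05                        # grid spacing
x1 <- seq(-8, 8, h)               # wide enough to capture the mass
x2 <- seq(-12, 12, h)
sum(outer(x1, x2, f)) * h^2       # Riemann sum, should be close to 1
```
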

The following code creates a contour plot of the PDF above with $\boldsymbol{\mu} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$ and $\Sigma = \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}$.

library(mnormt)

#create bivariate normal distribution
x     <- seq(-3, 3, 0.1) 
y     <- seq(-3, 3, 0.1)
mu    <- c(0, 0)
sigma <- matrix(c(2, -1, -1, 2), nrow=2)
f     <- function(x, y) dmnorm(cbind(x, y), mu, sigma)
z     <- outer(x, y, f)

#create contour plot
contour(x, y, z)

A sample of size 200 can be drawn from a bivariate normal distribution using the following R code.

library(mixtools)

# Set up means, standard deviations and correlation
mu1 <- 1;  mu2 <- 2
sig1 <- 1;  sig2 <- 1;  rho <- 0

# Construct mean vector and (co)variance matrix
mu    <- c(mu1, mu2)
Sigma <- matrix(c(sig1^2, rho*sig1*sig2, rho*sig1*sig2, sig2^2), ncol=2)

# Plot all realisations
X <- rmvnorm(n=200, mu, Sigma)
plot(X)

In the above code, setting mu as a vector of length $n$ and Sigma as an $n \times n$ matrix before passing them to rmvnorm allows one to simulate an $n$-dimensional normal distribution.

2.3 Properties of Multivariate Normal Distribution

The following theorem states that the transformation of a multivariate normal distribution is itself a multivariate normal distribution.

Let $\mathbf{X} \sim N_n(\boldsymbol{\mu}, \Sigma)$ and consider some $p \times n$ matrix $D$. Define a new random vector $\mathbf{Z} = D\mathbf{X}$. Then $\mathbf{Z} \sim N_p(D\boldsymbol{\mu}, D\Sigma D^T)$.

As a corollary of this theorem, we can easily calculate the marginal distributions of a multivariate normal distribution.

The marginal distribution of each component Xi is a one-dimensional normal distribution.

This is a direct consequence of Theorem 2.3.1 by setting D=(0,,0,1,0,,0) where the 1 is in the ith position.
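The selector matrix $D$ used in this proof is easy to check numerically. With an illustrative $\Sigma$ (the one below is hypothetical), $D\Sigma D^T$ recovers exactly the $(i,i)$ entry of $\Sigma$, the variance of the $i$th marginal:

```r
# Illustrative covariance matrix
Sigma <- matrix(c(2, 1, 0,
                  1, 4, 0,
                  0, 0, 5), nrow = 3, byrow = TRUE)

i <- 2
D <- matrix(0, nrow = 1, ncol = 3)
D[1, i] <- 1                          # selector row picking off component i

drop(D %*% Sigma %*% t(D))            # equals Sigma[i, i]
```
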

Suppose $\mathbf{X} = (X_1, X_2, X_3)^T \sim N_3(\mathbf{0}, \Sigma)$, where
$$\Sigma = \begin{pmatrix} 2 & 1 & 0 \\ 1 & 4 & 0 \\ 0 & 0 & 5 \end{pmatrix}.$$
Find the distribution of $Z = X_1 + X_2$.



Writing $Z = X_1 + X_2$ in the form $D\mathbf{X}$ requires $D = (1\ 1\ 0)$. By the properties of a multivariate normal distribution, $Z \sim N(D\mathbf{0}, D\Sigma D^T)$. Calculate $D\mathbf{0} = 0$ and
$$D\Sigma D^T = (1\ 1\ 0)\begin{pmatrix} 2 & 1 & 0 \\ 1 & 4 & 0 \\ 0 & 0 & 5 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} = (3\ 5\ 0)\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} = 8.$$
Therefore $Z \sim N(0, 8)$.
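This worked example can be checked in base R, both exactly (via the matrix product) and by simulation (the Cholesky factor is used to generate draws from $N_3(\mathbf{0}, \Sigma)$):

```r
set.seed(3)
Sigma <- matrix(c(2, 1, 0,
                  1, 4, 0,
                  0, 0, 5), nrow = 3, byrow = TRUE)

# Exact calculation: D Sigma D^T = 8
D <- matrix(c(1, 1, 0), nrow = 1)
drop(D %*% Sigma %*% t(D))

# Simulation check: Var(X1 + X2) should be close to 8
A <- t(chol(Sigma))                             # so that A %*% t(A) = Sigma
X <- A %*% matrix(rnorm(3 * 100000), nrow = 3)  # draws from N_3(0, Sigma)
Z <- X[1, ] + X[2, ]
c(mean(Z), var(Z))                              # should be close to (0, 8)
```
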
Suppose $\mathbf{X} = (X_1, X_2, X_3)^T \sim N_3(\mathbf{0}, \Sigma)$, where
$$\Sigma = \begin{pmatrix} 2 & 1 & 0 \\ 1 & 4 & 0 \\ 0 & 0 & 5 \end{pmatrix}.$$
Determine the constant $c$ such that $Z_1 = 2X_1 + cX_2$ and $Z_2 = 2X_1 + cX_3$ are independent.



Writing $\mathbf{Z} = \begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix} = \begin{pmatrix} 2X_1 + cX_2 \\ 2X_1 + cX_3 \end{pmatrix}$ in the form $D\mathbf{X}$ requires

$$D = \begin{pmatrix} 2 & c & 0 \\ 2 & 0 & c \end{pmatrix}.$$

By the properties of a multivariate normal distribution,

$$\mathbf{Z} \sim N_2(D\mathbf{0}, D\Sigma D^T).$$

Calculate $D\mathbf{0} = \mathbf{0}$ and

$$D\Sigma D^T = \begin{pmatrix} 2 & c & 0 \\ 2 & 0 & c \end{pmatrix}\begin{pmatrix} 2 & 1 & 0 \\ 1 & 4 & 0 \\ 0 & 0 & 5 \end{pmatrix}\begin{pmatrix} 2 & 2 \\ c & 0 \\ 0 & c \end{pmatrix} = \begin{pmatrix} 4+c & 2+4c & 0 \\ 4 & 2 & 5c \end{pmatrix}\begin{pmatrix} 2 & 2 \\ c & 0 \\ 0 & c \end{pmatrix} = \begin{pmatrix} 8+4c+4c^2 & 8+2c \\ 8+2c & 8+5c^2 \end{pmatrix}.$$

The independence of $Z_1$ and $Z_2$ is equivalent to $\mathrm{Cov}(Z_1, Z_2) = 8 + 2c = 0$ by Proposition 2.1.2. Therefore $c = -4$.
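A direct check in base R that $c = -4$ makes the off-diagonal entry of $D\Sigma D^T$ vanish:

```r
Sigma <- matrix(c(2, 1, 0,
                  1, 4, 0,
                  0, 0, 5), nrow = 3, byrow = TRUE)
c_val <- -4
D <- matrix(c(2, c_val, 0,
              2, 0,     c_val), nrow = 2, byrow = TRUE)

D %*% Sigma %*% t(D)    # off-diagonal entries are 0, so Cov(Z1, Z2) = 0
```
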

2.4 Inference Testing

Let $\mathbf{x}_1, \dots, \mathbf{x}_n$ be a random sample from $N_p(\boldsymbol{\mu}, \Sigma)$ where $\boldsymbol{\mu}$ is unknown but the covariance matrix $\Sigma$ is known. Suppose one wishes to conduct a hypothesis test for the mean of the population from the sample. In particular, how can one conduct the hypothesis test
$$H_0 : \boldsymbol{\mu} = \mathbf{a} \quad \text{versus} \quad H_1 : \boldsymbol{\mu} \neq \mathbf{a},$$
where $\mathbf{a}$ is some specified fixed vector, to infer on the population mean $\boldsymbol{\mu}$?

Let $\mathbf{x} \sim N_p(\boldsymbol{\mu}, \Sigma)$, where $\Sigma$ is positive definite. Then $(\mathbf{x}-\boldsymbol{\mu})^T\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu}) \sim \chi^2_p$.

Define $\mathbf{y} = \Sigma^{-1/2}(\mathbf{x}-\boldsymbol{\mu})$. Then by Theorem 2.3.1, $\mathbf{y} \sim N_p(\mathbf{0}, I_p)$, and so by Corollary 2.3.2 the components of $\mathbf{y}$ are independent univariate normal random variables with mean 0 and variance 1.

Then
$$(\mathbf{x}-\boldsymbol{\mu})^T\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu}) = \left(\Sigma^{-1/2}(\mathbf{x}-\boldsymbol{\mu})\right)^T\left(\Sigma^{-1/2}(\mathbf{x}-\boldsymbol{\mu})\right) = \mathbf{y}^T\mathbf{y} = \sum_{i=1}^p y_i^2.$$
Since each $y_i \sim N(0,1)$ and the $y_i$ are independent, it follows from the definition of the chi-squared distribution that $(\mathbf{x}-\boldsymbol{\mu})^T\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu}) \sim \chi^2_p$.
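Proposition 2.4.1 can be illustrated by a base-R Monte Carlo sketch (the parameter values are illustrative): the simulated quadratic forms should have mean and variance close to those of a $\chi^2_2$ distribution, namely 2 and 4.

```r
set.seed(4)
p     <- 2
mu    <- c(1, 2)
Sigma <- matrix(c(2, 1, 1, 3), nrow = 2)
A     <- t(chol(Sigma))                 # so that A %*% t(A) = Sigma
Sinv  <- solve(Sigma)

# Simulate the quadratic form (x - mu)^T Sigma^{-1} (x - mu) many times
q <- replicate(20000, {
  x <- drop(A %*% rnorm(p)) + mu        # one draw from N_p(mu, Sigma)
  drop(t(x - mu) %*% Sinv %*% (x - mu))
})

c(mean(q), var(q))                      # chi^2_2 has mean 2 and variance 4
```
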

By Exercise 2.1, $\bar{\mathbf{x}} \sim N_p(\boldsymbol{\mu}, \tfrac{1}{n}\Sigma)$, and hence $n^{1/2}\bar{\mathbf{x}} \sim N_p(n^{1/2}\boldsymbol{\mu}, \Sigma)$. Define the test statistic
$$\zeta^2 = n(\bar{\mathbf{x}} - \mathbf{a})^T\Sigma^{-1}(\bar{\mathbf{x}} - \mathbf{a}).$$
Assuming the null hypothesis $H_0$ is true, the test statistic can be written
$$\zeta^2 = n(\bar{\mathbf{x}} - \boldsymbol{\mu})^T\Sigma^{-1}(\bar{\mathbf{x}} - \boldsymbol{\mu}) = (n^{1/2}\bar{\mathbf{x}} - n^{1/2}\boldsymbol{\mu})^T\Sigma^{-1}(n^{1/2}\bar{\mathbf{x}} - n^{1/2}\boldsymbol{\mu}),$$
and so by Proposition 2.4.1 the test statistic $\zeta^2$ follows a $\chi^2_p$ distribution. Therefore at significance level $\alpha$, one should reject the null hypothesis $H_0$ if $\zeta^2 > \chi^2_{p,\alpha}$, where $\chi^2_{p,\alpha}$ is the upper $\alpha$ quantile of the $\chi^2_p$ distribution. The $p$-value is given by $p = P(\chi^2_p > \zeta^2)$. Furthermore, the $100(1-\alpha)\%$ confidence region for $\boldsymbol{\mu}$ is $\{\mathbf{a} : \zeta^2 \le \chi^2_{p,\alpha}\}$, which is an ellipsoid.

Download the file weight-height_data.csv from the Moodle page. The data details the weights and heights of 5000 adult males in stones and inches respectively. Assume that the data follow a $N_2(\boldsymbol{\mu}, \Sigma)$ distribution where
$$\Sigma = \begin{pmatrix} 8.2 & 48.9 \\ 48.9 & 391.3 \end{pmatrix}.$$
A recent article in a men's health magazine claims that men have an average height of 68.97 inches and an average weight of 186.93 stone. Use R to conduct a hypothesis test at the 5% significance level to determine whether this claim is consistent with the data.



Begin by loading the data into R:

df <- read.csv("weight-height_data.csv")
print(head(df))
##     Height   Weight
## 1 73.84702 241.8936
## 2 68.78190 162.3105
## 3 74.11011 212.7409
## 4 71.73098 220.0425
## 5 69.88180 206.3498
## 6 67.25302 152.2122

The entirety of the data can be visualised using the plot() function. Note that the data form an elliptical cloud, typical of a multivariate normal distribution.

plot(df[,1],df[,2],
     main="Scatter Plot of Height and Weight of Males",
     xlab="Height",
     ylab="Weight",
     cex=0.3,
     pch=19
     )

The question is equivalent to conducting the hypothesis test
$$H_0 : \boldsymbol{\mu} = \begin{pmatrix} 68.97 \\ 186.93 \end{pmatrix} \quad \text{versus} \quad H_1 : \boldsymbol{\mu} \neq \begin{pmatrix} 68.97 \\ 186.93 \end{pmatrix}$$
at the 5% significance level. By the above theory, one should calculate the test statistic $\zeta^2$. This can be completed by the following code. Note that the sample mean $\bar{\mathbf{x}}$ is calculated using the base R function mean() and stored in the variable sample_mean; the matlib library is loaded for its matrix inverse function inv(). The covariance matrix $\Sigma$ given in the question is stored in the variable cov_mat, and the proposed mean is stored in the variable a.

library(matlib)
sample_mean = c(mean(df[,1]), mean(df[,2]))
a = c(68.97,186.93)
cov_mat = matrix(c(8.2, 48.9, 48.9, 391.3), nrow=2, ncol=2)
zeta2 = 5000 * t(sample_mean - a) %*% inv(cov_mat) %*%  (sample_mean - a)
zeta2
##          [,1]
## [1,] 4.956178

The test statistic should finally be compared to the critical value $c = \chi^2_{2,0.05}$, that is, the value such that $P(\chi^2_2 \le c) = 0.95$. This can be computed using qchisq(), R's built-in quantile function for the chi-squared distribution.

zeta2 < qchisq(0.95, 2)
##      [,1]
## [1,] TRUE

The output indicates that the test statistic falls below the critical value, and so we do not reject the null hypothesis. Indeed, the $p$-value of the observed test statistic can be calculated using the distribution function of the chi-squared distribution, which in R is pchisq().

1-pchisq(zeta2, 2)
##            [,1]
## [1,] 0.08390342

One might think that conducting $p$ separate hypothesis tests, one $z$-test for the mean of each of the univariate normal distributions governing each variable, would be equivalent to the above multivariate hypothesis test. However, this is not the case. Since the correlations between the variables are not taken into consideration, the $p$ separate univariate $z$-tests may all fail to reject their null hypotheses while the multivariate test described above rejects, or vice versa.
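A hypothetical base-R sketch of this disagreement (the numbers below are illustrative, not from the weight-height data): with strong positive correlation, a sample mean that is only mildly off in each coordinate, but in opposite directions, is very unusual jointly, so the multivariate test rejects while neither univariate $z$-test does.

```r
n     <- 100
a     <- c(0, 0)
Sigma <- matrix(c(1, 0.9, 0.9, 1), nrow = 2)
xbar  <- c(0.15, -0.15)                    # mild, opposite-sign deviations

# Univariate z-statistics, one per coordinate
z <- sqrt(n) * (xbar - a) / sqrt(diag(Sigma))
abs(z) > qnorm(0.975)                      # neither rejects at the 5% level

# Multivariate test statistic from Section 2.4
zeta2 <- n * t(xbar - a) %*% solve(Sigma) %*% (xbar - a)
zeta2 > qchisq(0.95, 2)                    # the joint test rejects
```
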