Chapter 2 Multivariate Normal Distributions
To date, we have studied a number of standard probability distributions: uniform, Bernoulli, binomial, geometric, negative binomial, Poisson, exponential, gamma and normal. Each of these governs exactly one random quantity. This raises the question: are there standard probability distributions in higher dimensions?
2.1 Definition of Multivariate Normal Distribution
The normal distribution generalises to higher dimensions. Let $n \in \mathbb{Z}_{\geq 1}$ represent the dimension we are interested in.
Informally, a multivariate normal distribution can be defined as follows. Consider $n$ independent standard normal random variables $Z_1, Z_2, \ldots, Z_n$, that is, $Z_i \sim N(0,1)$ for all $i$ with $1 \leq i \leq n$. From these, form a vector of random variables $Z = (Z_1 \; Z_2 \; \cdots \; Z_n)^T$. Then for any $n \times n$ non-singular matrix $A$ and any vector $\mu = (\mu_1 \; \mu_2 \; \cdots \; \mu_n)^T$, define $X = AZ + \mu$. The vector of random variables $X = (X_1 \; X_2 \; \cdots \; X_n)^T$ is then distributed by a multivariate normal distribution.
This says that every component $X_i$ is a linear combination of some fixed collection of independent standard normal random variables, plus a translation.
Formally, we define the multivariate normal distribution by the joint probability density function of $X_1, X_2, \ldots, X_n$.
Let $\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{pmatrix}$ and let $\Sigma = (\sigma_{ij})$ be an $n \times n$ real, symmetric, positive definite matrix; equivalently, a real symmetric matrix all of whose eigenvalues are positive.
A vector of random variables $X = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix}$ is said to have an $n$-dimensional normal distribution with parameters $\mu$ and $\Sigma$, denoted $N_n(\mu, \Sigma)$, if the joint PDF of $X$ is given by $$f_X(x) = \frac{1}{\sqrt{(2\pi)^n \det(\Sigma)}} \exp\left\{-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right\}.$$
The PDF outlined in the formal Definition 2.1.1 can be derived from the informal construction of $X$ at the beginning of the section by taking $\Sigma = AA^T$.
Note that for $n = 1$, Definition 2.1.1 simplifies to the PDF of the one-dimensional normal distribution.
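The informal construction can be checked empirically in R. The following sketch (base R only; the matrix $A$ and vector $\mu$ below are arbitrary choices for illustration) transforms standard normal draws by $X = AZ + \mu$ and compares the sample moments with $\mu$ and $AA^T$:

```r
set.seed(1)
A  <- matrix(c(1, 0.5, 0, 2), nrow = 2)   # any non-singular 2x2 matrix
mu <- c(1, -1)

# 10000 realisations of Z = (Z1, Z2)^T with independent N(0,1) components
Z <- matrix(rnorm(2 * 10000), nrow = 2)
X <- A %*% Z + mu                         # each column is one realisation of X

rowMeans(X)     # approximately mu = (1, -1)
cov(t(X))       # approximately Sigma = A A^T
A %*% t(A)      # the target covariance matrix
```

With enough draws, the sample mean vector and sample covariance matrix settle close to $\mu$ and $AA^T$, matching the remark above that $\Sigma = AA^T$.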
The multivariate normal distribution has the following important properties:
The expectation of $X_i$ is given by the $i$th entry of $\mu$, that is, $E[X_i] = \mu_i$ for all $i$. Succinctly, the vector $\mu$ is the expectation vector of $X$;
The covariance $\mathrm{Cov}(X_i, X_j)$, reducing to $\mathrm{Var}(X_i)$ when $i = j$, is given by the entry in the $i$th row and $j$th column of $\Sigma$, that is, $\sigma_{ij} = \mathrm{Cov}(X_i, X_j)$ for all $i$ and $j$. Succinctly, the matrix $\Sigma$ is the variance-covariance matrix of $X$. It follows that if $X_1, X_2, \ldots, X_n$ are independent then $\sigma_{ij} = 0$ for all $i \neq j$;
These two points indicate why $\mu$ and $\Sigma$ are the two input parameters of the multivariate normal distribution.
Consider $x \sim N_p(\mu_x, \Sigma_{x,x})$ and $y \sim N_q(\mu_y, \Sigma_{y,y})$, jointly normally distributed. Then $x$ and $y$ are independent if and only if $\mathrm{Cov}(x, y) = 0_{p,q}$.
Clearly, if $x$ and $y$ are independent then $\mathrm{Cov}(x, y) = 0_{p,q}$ by well-established results in statistics. It remains to show the converse.
Assume $\mathrm{Cov}(x, y) = 0_{p,q}$. Then
$$z = \begin{pmatrix} x \\ y \end{pmatrix} \sim N_{p+q}(\mu, \Sigma), \quad \text{where } \mu = \begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix} \text{ and } \Sigma = \begin{pmatrix} \Sigma_{x,x} & 0_{p,q} \\ 0_{q,p} & \Sigma_{y,y} \end{pmatrix}.$$
Now calculate
$$\begin{aligned}
f_{Y \mid X}(y \mid x) &= \frac{f_{X,Y}(x, y)}{f_X(x)} \\
&= \frac{\dfrac{1}{\sqrt{(2\pi)^{p+q}\det(\Sigma)}}\exp\left\{-\tfrac{1}{2}(z - \mu)^T \Sigma^{-1} (z - \mu)\right\}}{\dfrac{1}{\sqrt{(2\pi)^{p}\det(\Sigma_{x,x})}}\exp\left\{-\tfrac{1}{2}(x - \mu_x)^T \Sigma_{x,x}^{-1} (x - \mu_x)\right\}} \\
&= \frac{\dfrac{1}{\sqrt{(2\pi)^{p+q}\det(\Sigma_{x,x})\det(\Sigma_{y,y})}}\exp\left\{-\tfrac{1}{2}(x - \mu_x)^T \Sigma_{x,x}^{-1} (x - \mu_x) - \tfrac{1}{2}(y - \mu_y)^T \Sigma_{y,y}^{-1} (y - \mu_y)\right\}}{\dfrac{1}{\sqrt{(2\pi)^{p}\det(\Sigma_{x,x})}}\exp\left\{-\tfrac{1}{2}(x - \mu_x)^T \Sigma_{x,x}^{-1} (x - \mu_x)\right\}} \\
&= \frac{1}{\sqrt{(2\pi)^{q}\det(\Sigma_{y,y})}}\exp\left\{-\tfrac{1}{2}(y - \mu_y)^T \Sigma_{y,y}^{-1} (y - \mu_y)\right\} = f_Y(y).
\end{aligned}$$
Since the conditional density of $y$ given $x$ does not depend on $x$, it follows that $x$ and $y$ are independent.
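The key step in the proof, that a block-diagonal $\Sigma$ makes the joint density factorise into the product of the marginal densities, can be spot-checked numerically. The sketch below uses mnormt (the package used for density evaluation later in the chapter) with arbitrary illustrative parameters: $x$ two-dimensional, $y$ one-dimensional, zero cross-covariance.

```r
library(mnormt)

# Arbitrary illustrative parameters with a block-diagonal covariance matrix
mu    <- c(1, 2, -1)
Sigma <- diag(c(2, 3, 1))      # all cross-covariances are zero
z     <- c(0.5, 1.5, -0.2)     # an arbitrary evaluation point

joint  <- dmnorm(z, mu, Sigma)                               # f_{X,Y}(x, y)
marg_x <- dmnorm(z[1:2], mu[1:2], Sigma[1:2, 1:2])           # f_X(x)
marg_y <- dnorm(z[3], mean = mu[3], sd = sqrt(Sigma[3, 3]))  # f_Y(y)

all.equal(joint, marg_x * marg_y)   # TRUE: the joint density factorises
```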
In practice, one often has observations assumed to come from a multivariate normal distribution whose parameters are not specified. The mean vector and the variance-covariance matrix can be estimated using the usual sample statistics.
Consider an independent sample $x_1, \ldots, x_n \sim N_p(\mu, \Sigma)$. Show that $\bar{x} \sim N_p\left(\mu, \frac{1}{n}\Sigma\right)$.
If $x_1, \ldots, x_n$ is an independent and identically distributed random sample from $N_p(\mu, \Sigma)$, then the sample mean $$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$ and the sample variance matrix $$S = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^T$$ are independent.
Since $x_1, \ldots, x_n \sim N_p(\mu, \Sigma)$, from Exercise 2.1 we know $\bar{x} \sim N_p\left(\mu, \frac{1}{n}\Sigma\right)$. Set $y_i = x_i - \bar{x}$. Then
$$\begin{aligned}
\mathrm{Cov}(\bar{x}, y_i) &= \mathrm{Cov}(\bar{x}, x_i - \bar{x}) = \mathrm{Cov}(\bar{x}, x_i) - \mathrm{Cov}(\bar{x}, \bar{x}) \\
&= \mathrm{Cov}\left(\frac{1}{n}\sum_{j=1}^{n} x_j, \, x_i\right) - \mathrm{Cov}(\bar{x}, \bar{x}) \\
&= \frac{1}{n}\sum_{j=1}^{n} E\left[(x_j - \mu)(x_i - \mu)^T\right] - E\left[(\bar{x} - \mu)(\bar{x} - \mu)^T\right] \\
&= \frac{1}{n}\Sigma - \frac{1}{n}\Sigma = 0_{p \times p}.
\end{aligned}$$
Therefore $\bar{x}$ and $y_i$ are independent. Since $$S = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^T = \frac{1}{n}\sum_{i=1}^{n} y_i y_i^T,$$ it follows that $\bar{x}$ and $S$ are independent.
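The distribution of the sample mean can be illustrated by simulation. The sketch below (arbitrary $\mu$ and $\Sigma$; rmnorm from the mnormt package used elsewhere in the chapter) draws many samples of size $n$ and compares the covariance of the resulting sample means with $\frac{1}{n}\Sigma$:

```r
library(mnormt)
set.seed(2)

mu    <- c(0, 0)
Sigma <- matrix(c(2, -1, -1, 2), nrow = 2)
n     <- 50

# Sample mean of each of 5000 independent samples of size n
xbars <- t(replicate(5000, colMeans(rmnorm(n, mu, Sigma))))

cov(xbars)    # approximately Sigma / n
Sigma / n
```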
2.2 Two-Dimensional Normal Distribution
In this section we study the $2$-dimensional normal distribution, often referred to as the bivariate normal distribution.
When $n = 2$, or another suitably small value, we can explicitly label the entries of $\mu$ and $\Sigma$ as $\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}$ and $\begin{pmatrix} \sigma_1^2 & \sigma_1\sigma_2\rho \\ \sigma_1\sigma_2\rho & \sigma_2^2 \end{pmatrix}$ respectively. Here $\sigma_1$ is the standard deviation of $X_1$, $\sigma_2$ is the standard deviation of $X_2$, and $\rho$ is the correlation coefficient $\rho(X_1, X_2)$.
Why are the upper-right and lower-left entries of $\Sigma$ equal to $\sigma_1\sigma_2\rho$?
Show that $\Sigma$ is non-singular if and only if $|\rho| < 1$ and $\sigma_1\sigma_2 \neq 0$.
Substituting these values into Definition 2.1.1 gives
$$f_{X_1,X_2}(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x_1-\mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{x_1-\mu_1}{\sigma_1}\right)\left(\frac{x_2-\mu_2}{\sigma_2}\right) + \left(\frac{x_2-\mu_2}{\sigma_2}\right)^2\right]\right\}.$$
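As a sanity check, the explicit bivariate formula can be compared with the density computed by mnormt's dmnorm at an arbitrary point (all parameter values below are illustrative):

```r
library(mnormt)

mu1 <- 0; mu2 <- 0; s1 <- 1; s2 <- 2; rho <- 0.5
Sigma <- matrix(c(s1^2, rho * s1 * s2, rho * s1 * s2, s2^2), nrow = 2)
x1 <- 0.3; x2 <- -0.4

# The bracketed quadratic form from the formula above
q <- ((x1 - mu1) / s1)^2 -
     2 * rho * ((x1 - mu1) / s1) * ((x2 - mu2) / s2) +
     ((x2 - mu2) / s2)^2
f_formula <- exp(-q / (2 * (1 - rho^2))) / (2 * pi * s1 * s2 * sqrt(1 - rho^2))

f_dmnorm <- dmnorm(c(x1, x2), c(mu1, mu2), Sigma)
all.equal(f_formula, f_dmnorm)   # TRUE: the two densities agree
```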
The following code creates a contour plot of the PDF above with $\mu = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$ and $\Sigma = \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}$.
library(mnormt)
# Evaluate the bivariate normal PDF on a grid
x <- seq(-3, 3, 0.1)
y <- seq(-3, 3, 0.1)
mu <- c(0, 0)
sigma <- matrix(c(2, -1, -1, 2), nrow=2)
f <- function(x, y) dmnorm(cbind(x, y), mu, sigma)
z <- outer(x, y, f)
# Create contour plot
contour(x, y, z)
A sample of size $200$ can be drawn from a bivariate normal distribution using the following R code.
library(mixtools)   # provides rmvnorm
# Set up means, standard deviations, correlation
mu1 <- 1; mu2 <- 2
sig1 <- 1; sig2 <- 1; rho <- 0
# Construct mean vector and (co)variance matrix
mu <- c(mu1, mu2)
Sigma <- matrix(c(sig1^2, rho*sig1*sig2, rho*sig1*sig2, sig2^2), ncol=2)
# Draw and plot all 200 realisations
X <- rmvnorm(n=200, mu, Sigma)
plot(X)
In the above code, setting mu as a vector of length $n$ and Sigma as an $n \times n$ matrix before passing them to rmvnorm allows one to simulate an $n$-dimensional normal distribution.
2.3 Properties of Multivariate Normal Distribution
The following theorem states that a linear transformation of a multivariate normal distribution is itself a multivariate normal distribution.
Let $X \sim N_n(\mu, \Sigma)$ and consider a $p \times n$ matrix $D$ of full row rank $p$, so that $D\Sigma D^T$ is non-singular. Define a new random vector $Z = DX$. Then $Z \sim N_p(D\mu, D\Sigma D^T)$.
As a corollary of this theorem, we can easily calculate the marginal distributions of a multivariate normal distribution.
The marginal distribution of each component $X_i$ is a one-dimensional normal distribution.
This is a direct consequence of Theorem 2.3.1, setting $D = (0, \ldots, 0, 1, 0, \ldots, 0)$ where the $1$ is in the $i$th position.
Suppose $X = (X_1, X_2, X_3)^T \sim N_3(0, \Sigma)$, where $$\Sigma = \begin{pmatrix} 2 & 1 & 0 \\ 1 & 4 & 0 \\ 0 & 0 & 5 \end{pmatrix}.$$ Find the distribution of $Z = X_1 + X_2$.
Suppose $X = (X_1, X_2, X_3)^T \sim N_3(0, \Sigma)$, where $$\Sigma = \begin{pmatrix} 2 & 1 & 0 \\ 1 & 4 & 0 \\ 0 & 0 & 5 \end{pmatrix}.$$ Determine the constant $c$ such that $Z_1 = 2X_1 + cX_2$ and $Z_2 = 2X_1 + cX_3$ are independent.
Writing $Z = \begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix} = \begin{pmatrix} 2X_1 + cX_2 \\ 2X_1 + cX_3 \end{pmatrix}$ in the form $DX$ requires
$$D = \begin{pmatrix} 2 & c & 0 \\ 2 & 0 & c \end{pmatrix}.$$
By the properties of a multivariate normal distribution,
$$Z \sim N_2(D0, D\Sigma D^T).$$
Calculate $D0 = 0$ and
$$D\Sigma D^T = \begin{pmatrix} 2 & c & 0 \\ 2 & 0 & c \end{pmatrix} \begin{pmatrix} 2 & 1 & 0 \\ 1 & 4 & 0 \\ 0 & 0 & 5 \end{pmatrix} \begin{pmatrix} 2 & 2 \\ c & 0 \\ 0 & c \end{pmatrix} = \begin{pmatrix} 4+c & 2+4c & 0 \\ 4 & 2 & 5c \end{pmatrix} \begin{pmatrix} 2 & 2 \\ c & 0 \\ 0 & c \end{pmatrix} = \begin{pmatrix} 8+4c+4c^2 & 8+2c \\ 8+2c & 8+5c^2 \end{pmatrix}.$$
The independence of $Z_1$ and $Z_2$ is equivalent to $\mathrm{Cov}(Z_1, Z_2) = 8 + 2c = 0$ by Proposition 2.1.2. Therefore $c = -4$.

2.4 Inference Testing
Let $x_1, \ldots, x_n$ be a random sample from $N_p(\mu, \Sigma)$, where $\mu$ is unknown but the covariance matrix $\Sigma$ is known. Suppose one wishes to test a hypothesis about the population mean from the sample. In particular, how can one conduct the hypothesis test $$H_0: \mu = a \quad \text{versus} \quad H_1: \mu \neq a,$$ where $a$ is some specified fixed vector, to infer on the population mean $\mu$?
Let $x \sim N_p(\mu, \Sigma)$ with $\Sigma$ positive definite. Then $$(x - \mu)^T \Sigma^{-1} (x - \mu) \sim \chi^2_p.$$
Define $y = \Sigma^{-\frac{1}{2}}(x - \mu)$. Then by Theorem 2.3.1, $y \sim N_p(0, I_p)$, and so by Corollary 2.3.2 the components of $y$ have independent univariate normal distributions with mean $0$ and variance $1$.
Then $$(x - \mu)^T \Sigma^{-1} (x - \mu) = \left(\Sigma^{-\frac{1}{2}}(x - \mu)\right)^T \left(\Sigma^{-\frac{1}{2}}(x - \mu)\right) = y^T y = \sum_{i=1}^{p} y_i^2.$$ Since the $y_i \sim N(0,1)$ are independent, it follows from the definition of the chi-squared distribution that $(x - \mu)^T \Sigma^{-1} (x - \mu) \sim \chi^2_p$.
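Proposition 2.4.1 can be illustrated by simulation; the sketch below (arbitrary illustrative $\mu$ and $\Sigma$) computes the quadratic form for many draws and checks that it behaves like a $\chi^2_2$ sample:

```r
library(mnormt)
set.seed(3)

mu    <- c(1, 2)
Sigma <- matrix(c(2, 1, 1, 3), nrow = 2)

x <- rmnorm(10000, mu, Sigma)              # 10000 draws from N_2(mu, Sigma)
d <- sweep(x, 2, mu)                       # centre each row at mu
q <- rowSums((d %*% solve(Sigma)) * d)     # (x - mu)^T Sigma^{-1} (x - mu)

mean(q)                            # approximately 2, the mean of chi^2_2
mean(q > qchisq(0.95, df = 2))     # approximately 0.05
```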
By Exercise 2.1, $$\bar{x} \sim N_p\left(\mu, \tfrac{1}{n}\Sigma\right) \implies n^{\frac{1}{2}}\bar{x} \sim N_p\left(n^{\frac{1}{2}}\mu, \Sigma\right).$$ Define the test statistic $$\zeta^2 = n(\bar{x} - a)^T \Sigma^{-1} (\bar{x} - a).$$ Assuming the null hypothesis $H_0$ is true, the test statistic can be written $$\zeta^2 = n(\bar{x} - \mu)^T \Sigma^{-1} (\bar{x} - \mu) = \left(n^{\frac{1}{2}}\bar{x} - n^{\frac{1}{2}}\mu\right)^T \Sigma^{-1} \left(n^{\frac{1}{2}}\bar{x} - n^{\frac{1}{2}}\mu\right),$$ and so by Proposition 2.4.1 the test statistic $\zeta^2$ follows a $\chi^2_p$ distribution. Therefore, at significance level $\alpha$, one should reject the null hypothesis $H_0$ if $\zeta^2 > \chi^2_{p,\alpha}$, where $\chi^2_{p,\alpha}$ is the upper $\alpha$ quantile of the $\chi^2_p$ distribution. The $p$-value is given by $p = P(\chi^2_p > \zeta^2)$. Furthermore, the $100(1-\alpha)\%$ confidence region for $\mu$ is $\{a : \zeta^2 \leq \chi^2_{p,\alpha}\}$, which is an ellipsoid.
Download the file weight-height_data.csv from the Moodle page. The data details the weights and heights of $5000$ adult males in stones and inches respectively. Assume that the data follows a $N_2(\mu, \Sigma)$ distribution where $$\Sigma = \begin{pmatrix} 8.2 & 48.9 \\ 48.9 & 391.3 \end{pmatrix}.$$ A recent article in a men's health magazine claims that men have an average height of $68.97$ inches and an average weight of $186.93$ stone. Use R to conduct a hypothesis test to determine whether you are $95\%$ sure that this claim is accurate.
Begin by loading the data into R:
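Assuming the file has been saved to the working directory, it can be read and inspected as follows (df is the variable name used in the rest of the example):

```r
df <- read.csv("weight-height_data.csv")
head(df)
```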
## Height Weight
## 1 73.84702 241.8936
## 2 68.78190 162.3105
## 3 74.11011 212.7409
## 4 71.73098 220.0425
## 5 69.88180 206.3498
## 6 67.25302 152.2122
The entirety of the data can be visualised using the plot() function. Note that the data forms an elliptical cloud, typical of a multivariate normal distribution.
plot(df[,1],df[,2],
main="Scatter Plot of Height and Weight of Males",
xlab="Height",
ylab="Weight",
cex=0.3,
pch=19
)
The question is equivalent to conducting the hypothesis test $$H_0: \mu = \begin{pmatrix} 68.97 \\ 186.93 \end{pmatrix} \quad \text{versus} \quad H_1: \mu \neq \begin{pmatrix} 68.97 \\ 186.93 \end{pmatrix}$$ at the $5\%$ significance level. By the above theory, one should calculate the test statistic $\zeta^2$. This can be completed by the following code. Note that the sample mean $\bar{x}$ is calculated using the base R mean() function and stored in the variable sample_mean; the matlib library is loaded for its inv() function, which inverts the covariance matrix. The covariance matrix $\Sigma$ given in the question is stored in the variable cov_mat, and the proposed mean is stored in the variable a.
library(matlib)
sample_mean = c(mean(df[,1]), mean(df[,2]))
a = c(68.97,186.93)
cov_mat = matrix(c(8.2, 48.9, 48.9, 391.3), nrow=2, ncol=2)
zeta2 = 5000 * t(sample_mean - a) %*% inv(cov_mat) %*% (sample_mean - a)
zeta2
## [,1]
## [1,] 4.956178
The test statistic should finally be compared to the critical value $c = \chi^2_{2,0.05}$, that is, the value such that $P(\chi^2_2 \leq c) = 0.95$ (the upper $0.05$ quantile, in the notation above). This can be computed using the built-in quantile function for the chi-squared distribution in R, qchisq().
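Concretely, the comparison can be carried out as follows (the hard-coded 4.956178 is the value of zeta2 printed above):

```r
crit <- qchisq(0.95, df = 2)   # critical value c with P(chi^2_2 <= c) = 0.95
crit                           # 5.991465
4.956178 < crit                # TRUE: the observed zeta^2 is below the critical value
```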
## [,1]
## [1,] TRUE
The output indicates that the test statistic is below the critical value, and so we do not reject the null hypothesis. Indeed, the $p$-value of the observed test statistic can be calculated using the distribution function of the chi-squared distribution, given in R by pchisq().
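Continuing with the observed value of the test statistic, the $p$-value is the upper tail probability:

```r
1 - pchisq(4.956178, df = 2)   # approximately 0.0839, the p-value
```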
## [,1]
## [1,] 0.08390342
One might think that conducting $p$ separate hypothesis tests, one $z$-test for the mean of each of the univariate normal distributions governing each variable, would be equivalent to the above hypothesis test. However, this is not the case. Since the correlations between the variables are not taken into consideration, some of the $p$ separate univariate $z$-tests may indicate a rejection of their null hypotheses while the multivariate test described above does not, or vice versa.