Chapter 59: multivariate normal distribution
Definition 40.1 probability density function (PDF) of normal distribution (= Gaussian distribution)
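$$
\mathcal{N}\left(x\mid\mu,\sigma^{2}\right)=\dfrac{1}{\sqrt{2\pi}\,\sigma}\exp\left\{ -\dfrac{\left(x-\mu\right)^{2}}{2\sigma^{2}}\right\}
$$
If a continuous random variable X follows a normal distribution with mean μ and variance σ², then
$$
\begin{aligned}
X\sim n\left(\mu,\sigma^{2}\right)=\mathcal{N}\left(\mu,\sigma^{2}\right)\Leftrightarrow f_{X}\left(x\right) & =\dfrac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}=\dfrac{e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}}{\sigma\sqrt{2\pi}}=n\left(x\mid\mu,\sigma^{2}\right)\\
 & =\dfrac{1}{\sqrt{2\pi}\,\sigma}\exp\left\{ -\dfrac{\left(x-\mu\right)^{2}}{2\sigma^{2}}\right\} =\mathcal{N}\left(x\mid\mu,\sigma^{2}\right)
\end{aligned}
$$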
A random variable X can be standardized by subtracting the mean μ and dividing by the standard deviation σ, resulting in the standardized random variable Z
$$
Z=\dfrac{X-\mu}{\sigma}\quad\text{or}\quad z=\dfrac{x-\mu}{\sigma}
$$
The standardized random variable Z follows the standard normal distribution
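$$
Z\sim n\left(0,1^{2}\right)=\mathcal{N}\left(0,1^{2}\right)\Leftrightarrow f_{Z}\left(z\right)=\dfrac{e^{-\frac{1}{2}\left(\frac{z-0}{1}\right)^{2}}}{1\cdot\sqrt{2\pi}}=\dfrac{1}{\sqrt{2\pi}}\exp\left\{ -\dfrac{z^{2}}{2}\right\} =n\left(z\mid0,1^{2}\right)=\mathcal{N}\left(z\mid0,1^{2}\right)
$$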
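The standardization above can be checked numerically. The following is a minimal sketch using only Python's standard `math` module; the function names `normal_pdf` and `standard_normal_pdf` and the parameter values are illustrative assumptions, not part of the text. It confirms that the density of X at x equals the standard normal density at z = (x − μ)/σ divided by σ.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density N(x | mu, sigma^2) of the univariate normal distribution."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def standard_normal_pdf(z):
    """Density N(z | 0, 1) of the standard normal distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

# Hypothetical parameters chosen only for illustration.
mu, sigma, x = 1.5, 2.0, 3.0
z = (x - mu) / sigma  # standardization: z = (x - mu) / sigma

# One-dimensional change of variables: f_X(x) = f_Z(z) / sigma.
lhs = normal_pdf(x, mu, sigma)
rhs = standard_normal_pdf(z) / sigma
print(lhs, rhs)
```

The factor 1/σ is the one-dimensional counterpart of the Jacobian factor that appears in the multivariate derivation.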
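To generalize from univariate random variables to multivariate random vectors, a random vector
$$
\boldsymbol{Z}=\left\langle Z_{1},Z_{2},\cdots,Z_{p}\right\rangle =\begin{bmatrix}Z_{1} & Z_{2} & \cdots & Z_{p}\end{bmatrix}^{\intercal}=\begin{bmatrix}Z_{1}\\ Z_{2}\\ \vdots\\ Z_{p}\end{bmatrix}
$$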
with p random variable components is said to follow the standard multivariate normal distribution if and only if its joint PDF is given by
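$$
f_{\boldsymbol{Z}}\left(\boldsymbol{z}\right)=\dfrac{1}{\left(2\pi\right)^{p/2}}\exp\left\{ -\dfrac{\boldsymbol{z}^{\intercal}\boldsymbol{z}}{2}\right\} =\dfrac{1}{\left(2\pi\right)^{p/2}}\exp\left\{ -\dfrac{\boldsymbol{z}\cdot\boldsymbol{z}}{2}\right\} \tag{59.1}
$$
Equation (59.1) can be rewritten as follows:
$$
\begin{aligned}
f_{\boldsymbol{Z}}\left(\boldsymbol{z}\right) & =\underbrace{\dfrac{1}{\sqrt{2\pi}}\dfrac{1}{\sqrt{2\pi}}\cdots\dfrac{1}{\sqrt{2\pi}}}_{p\text{ times}}\exp\left\{ -\dfrac{z_{1}^{2}}{2}-\dfrac{z_{2}^{2}}{2}-\cdots-\dfrac{z_{p}^{2}}{2}\right\} \\
 & =\dfrac{1}{\sqrt{2\pi}}\exp\left\{ -\dfrac{z_{1}^{2}}{2}\right\} \dfrac{1}{\sqrt{2\pi}}\exp\left\{ -\dfrac{z_{2}^{2}}{2}\right\} \cdots\dfrac{1}{\sqrt{2\pi}}\exp\left\{ -\dfrac{z_{p}^{2}}{2}\right\} \\
 & =f\left(z_{1}\right)f\left(z_{2}\right)\cdots f\left(z_{p}\right)
\end{aligned}
$$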
where
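$$
f_{Z_{i}}\left(z_{i}\right)=\dfrac{1}{\sqrt{2\pi}}\exp\left\{ -\dfrac{z_{i}^{2}}{2}\right\} =f\left(z_{i}\right)\Rightarrow Z_{i}\sim\mathcal{N}\left(0,1^{2}\right)=n\left(0,1^{2}\right)
$$
because integrating the joint PDF over all the other components gives the marginal PDF of Z_i, and each remaining integral equals 1 since f is itself a probability density:
$$
\begin{aligned}
f_{Z_{i}}\left(z_{i}\right) & =\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}f_{\boldsymbol{Z}}\left(z_{1},\dots,z_{i-1},z_{i},z_{i+1},\dots,z_{p}\right)dz_{1}\cdots dz_{i-1}\,dz_{i+1}\cdots dz_{p}\\
 & =\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}f\left(z_{1}\right)\cdots f\left(z_{i-1}\right)f\left(z_{i}\right)f\left(z_{i+1}\right)\cdots f\left(z_{p}\right)dz_{1}\cdots dz_{i-1}\,dz_{i+1}\cdots dz_{p}\\
 & =f\left(z_{i}\right)\int_{-\infty}^{\infty}f\left(z_{1}\right)dz_{1}\cdots\int_{-\infty}^{\infty}f\left(z_{i-1}\right)dz_{i-1}\int_{-\infty}^{\infty}f\left(z_{i+1}\right)dz_{i+1}\cdots\int_{-\infty}^{\infty}f\left(z_{p}\right)dz_{p}\\
 & =f\left(z_{i}\right)\cdot1\cdots1\\
 & =\dfrac{1}{\sqrt{2\pi}}\exp\left\{ -\dfrac{z_{i}^{2}}{2}\right\}
\end{aligned}
$$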
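The factorization of the standard multivariate normal PDF into a product of univariate standard normal PDFs can be verified at any point. The sketch below assumes a hypothetical point z with p = 3 and uses illustrative function names; it evaluates the joint formula and the product of marginals and prints both.

```python
import math

def standard_normal_pdf(z):
    """Univariate standard normal density f(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def standard_mvn_pdf(z):
    """Joint density (2*pi)^(-p/2) * exp(-z.z / 2) for a list z of length p."""
    p = len(z)
    squared_norm = sum(zi * zi for zi in z)  # z . z
    return (2.0 * math.pi) ** (-p / 2.0) * math.exp(-0.5 * squared_norm)

z = [0.3, -1.2, 2.0]  # arbitrary illustrative point, p = 3

joint = standard_mvn_pdf(z)
product = math.prod(standard_normal_pdf(zi) for zi in z)
print(joint, product)
```

Both numbers agree up to floating-point rounding, which is the statement that the components of Z are independent standard normal variables.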
Definition 18.1 covariance matrix of a random vector
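$$
\mathrm{C}\left[\boldsymbol{X}\right]=\mathrm{Cov}\left[\boldsymbol{X}\right]=\mathrm{V}\left[\boldsymbol{X}\right]=\mathrm{E}\left[\left[\boldsymbol{X}-\mathrm{E}\left(\boldsymbol{X}\right)\right]\left[\boldsymbol{X}-\mathrm{E}\left(\boldsymbol{X}\right)\right]^{\intercal}\right]
$$
$$
\boldsymbol{X}=\left\langle X_{1},X_{2},\cdots,X_{p}\right\rangle =\begin{bmatrix}X_{1} & X_{2} & \cdots & X_{p}\end{bmatrix}^{\intercal}=\begin{bmatrix}X_{1}\\ X_{2}\\ \vdots\\ X_{p}\end{bmatrix}
$$
$$
\mathrm{E}\left[\boldsymbol{X}\right]=\left\langle \mathrm{E}\left[X_{1}\right],\mathrm{E}\left[X_{2}\right],\cdots,\mathrm{E}\left[X_{p}\right]\right\rangle =\begin{bmatrix}\mathrm{E}\left[X_{1}\right] & \mathrm{E}\left[X_{2}\right] & \cdots & \mathrm{E}\left[X_{p}\right]\end{bmatrix}^{\intercal}=\begin{bmatrix}\mathrm{E}\left[X_{1}\right]\\ \mathrm{E}\left[X_{2}\right]\\ \vdots\\ \mathrm{E}\left[X_{p}\right]\end{bmatrix}
$$
$$
\boldsymbol{X}-\mathrm{E}\left[\boldsymbol{X}\right]=\begin{bmatrix}X_{1}-\mathrm{E}\left[X_{1}\right]\\ X_{2}-\mathrm{E}\left[X_{2}\right]\\ \vdots\\ X_{p}-\mathrm{E}\left[X_{p}\right]\end{bmatrix}
$$
$$
\begin{aligned}
 & \left[\boldsymbol{X}-\mathrm{E}\left(\boldsymbol{X}\right)\right]\left[\boldsymbol{X}-\mathrm{E}\left(\boldsymbol{X}\right)\right]^{\intercal}\\
= & \begin{bmatrix}X_{1}-\mathrm{E}\left[X_{1}\right]\\ X_{2}-\mathrm{E}\left[X_{2}\right]\\ \vdots\\ X_{p}-\mathrm{E}\left[X_{p}\right]\end{bmatrix}\begin{bmatrix}X_{1}-\mathrm{E}\left[X_{1}\right] & X_{2}-\mathrm{E}\left[X_{2}\right] & \cdots & X_{p}-\mathrm{E}\left[X_{p}\right]\end{bmatrix}\\
= & \begin{bmatrix}\left(X_{1}-\mathrm{E}\left[X_{1}\right]\right)\left(X_{1}-\mathrm{E}\left[X_{1}\right]\right) & \left(X_{1}-\mathrm{E}\left[X_{1}\right]\right)\left(X_{2}-\mathrm{E}\left[X_{2}\right]\right) & \cdots & \left(X_{1}-\mathrm{E}\left[X_{1}\right]\right)\left(X_{p}-\mathrm{E}\left[X_{p}\right]\right)\\
\left(X_{2}-\mathrm{E}\left[X_{2}\right]\right)\left(X_{1}-\mathrm{E}\left[X_{1}\right]\right) & \left(X_{2}-\mathrm{E}\left[X_{2}\right]\right)\left(X_{2}-\mathrm{E}\left[X_{2}\right]\right) & \cdots & \left(X_{2}-\mathrm{E}\left[X_{2}\right]\right)\left(X_{p}-\mathrm{E}\left[X_{p}\right]\right)\\
\vdots & \vdots & \ddots & \vdots\\
\left(X_{p}-\mathrm{E}\left[X_{p}\right]\right)\left(X_{1}-\mathrm{E}\left[X_{1}\right]\right) & \left(X_{p}-\mathrm{E}\left[X_{p}\right]\right)\left(X_{2}-\mathrm{E}\left[X_{2}\right]\right) & \cdots & \left(X_{p}-\mathrm{E}\left[X_{p}\right]\right)\left(X_{p}-\mathrm{E}\left[X_{p}\right]\right)
\end{bmatrix}\\
= & \begin{bmatrix}\left(X_{1}-\mathrm{E}\left[X_{1}\right]\right)^{2} & \cdots & \left(X_{1}-\mathrm{E}\left[X_{1}\right]\right)\left(X_{p}-\mathrm{E}\left[X_{p}\right]\right)\\
\vdots & \ddots & \vdots\\
\left(X_{p}-\mathrm{E}\left[X_{p}\right]\right)\left(X_{1}-\mathrm{E}\left[X_{1}\right]\right) & \cdots & \left(X_{p}-\mathrm{E}\left[X_{p}\right]\right)^{2}
\end{bmatrix}
\end{aligned}
$$
$$
\begin{aligned}
 & \mathrm{E}\left[\left[\boldsymbol{X}-\mathrm{E}\left(\boldsymbol{X}\right)\right]\left[\boldsymbol{X}-\mathrm{E}\left(\boldsymbol{X}\right)\right]^{\intercal}\right]\\
= & \mathrm{E}\begin{bmatrix}\left(X_{1}-\mathrm{E}\left[X_{1}\right]\right)^{2} & \cdots & \left(X_{1}-\mathrm{E}\left[X_{1}\right]\right)\left(X_{p}-\mathrm{E}\left[X_{p}\right]\right)\\
\vdots & \ddots & \vdots\\
\left(X_{p}-\mathrm{E}\left[X_{p}\right]\right)\left(X_{1}-\mathrm{E}\left[X_{1}\right]\right) & \cdots & \left(X_{p}-\mathrm{E}\left[X_{p}\right]\right)^{2}
\end{bmatrix}\\
= & \begin{bmatrix}\mathrm{E}\left[\left(X_{1}-\mathrm{E}\left[X_{1}\right]\right)^{2}\right] & \cdots & \mathrm{E}\left[\left(X_{1}-\mathrm{E}\left[X_{1}\right]\right)\left(X_{p}-\mathrm{E}\left[X_{p}\right]\right)\right]\\
\vdots & \ddots & \vdots\\
\mathrm{E}\left[\left(X_{p}-\mathrm{E}\left[X_{p}\right]\right)\left(X_{1}-\mathrm{E}\left[X_{1}\right]\right)\right] & \cdots & \mathrm{E}\left[\left(X_{p}-\mathrm{E}\left[X_{p}\right]\right)^{2}\right]
\end{bmatrix}\\
= & \begin{bmatrix}\mathrm{V}\left(X_{1},X_{1}\right) & \mathrm{V}\left(X_{1},X_{2}\right) & \cdots & \mathrm{V}\left(X_{1},X_{p}\right)\\
\mathrm{V}\left(X_{2},X_{1}\right) & \mathrm{V}\left(X_{2},X_{2}\right) & \cdots & \mathrm{V}\left(X_{2},X_{p}\right)\\
\vdots & \vdots & \ddots & \vdots\\
\mathrm{V}\left(X_{p},X_{1}\right) & \mathrm{V}\left(X_{p},X_{2}\right) & \cdots & \mathrm{V}\left(X_{p},X_{p}\right)
\end{bmatrix}=\begin{bmatrix}\mathrm{V}\left(X_{1}\right) & \mathrm{V}\left(X_{1},X_{2}\right) & \cdots & \mathrm{V}\left(X_{1},X_{p}\right)\\
\mathrm{V}\left(X_{2},X_{1}\right) & \mathrm{V}\left(X_{2}\right) & \cdots & \mathrm{V}\left(X_{2},X_{p}\right)\\
\vdots & \vdots & \ddots & \vdots\\
\mathrm{V}\left(X_{p},X_{1}\right) & \mathrm{V}\left(X_{p},X_{2}\right) & \cdots & \mathrm{V}\left(X_{p}\right)
\end{bmatrix}\\
= & \begin{bmatrix}\mathrm{V}\left(X_{1}\right) & \mathrm{C}\left(X_{1},X_{2}\right) & \cdots & \mathrm{C}\left(X_{1},X_{p}\right)\\
\mathrm{C}\left(X_{2},X_{1}\right) & \mathrm{V}\left(X_{2}\right) & \cdots & \mathrm{C}\left(X_{2},X_{p}\right)\\
\vdots & \vdots & \ddots & \vdots\\
\mathrm{C}\left(X_{p},X_{1}\right) & \mathrm{C}\left(X_{p},X_{2}\right) & \cdots & \mathrm{V}\left(X_{p}\right)
\end{bmatrix}=\begin{bmatrix}\sigma_{1}^{2} & \sigma_{12} & \cdots & \sigma_{1p}\\
\sigma_{21} & \sigma_{2}^{2} & \cdots & \sigma_{2p}\\
\vdots & \vdots & \ddots & \vdots\\
\sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{p}^{2}
\end{bmatrix}\\
= & \begin{bmatrix}\sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p}\\
\sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p}\\
\vdots & \vdots & \ddots & \vdots\\
\sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp}
\end{bmatrix}=\left[\sigma_{ij}\right]_{p\times p}=\boldsymbol{\Sigma}
\end{aligned}
$$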
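$$
\boldsymbol{X}\sim\mathcal{D}\left(\boldsymbol{\mu},\boldsymbol{\Sigma}\right)=d\left(\boldsymbol{\mu}_{\boldsymbol{X}},\boldsymbol{\Sigma}_{\boldsymbol{X}}\right)=d\left(\mathrm{E}\left[\boldsymbol{X}\right],\mathrm{C}\left[\boldsymbol{X}\right]\right)=d\left(\mathrm{E}\left[\boldsymbol{X}\right],\mathrm{V}\left[\boldsymbol{X}\right]\right)
$$
$$
\boldsymbol{Z}\sim\mathcal{N}\left(\boldsymbol{\mu}_{\boldsymbol{Z}},\boldsymbol{\Sigma}_{\boldsymbol{Z}}\right)=n\left(\mathrm{E}\left[\boldsymbol{Z}\right],\mathrm{V}\left[\boldsymbol{Z}\right]\right)
$$
$$
\begin{aligned}
\mathrm{E}\left[\boldsymbol{Z}\right]=\begin{bmatrix}\mathrm{E}\left[Z_{1}\right]\\ \mathrm{E}\left[Z_{2}\right]\\ \vdots\\ \mathrm{E}\left[Z_{p}\right]\end{bmatrix}=\left[\mathrm{E}\left[Z_{i}\right]\right]_{p\times1} & \Rightarrow\mathrm{E}\left[Z_{i}\right]=\int_{-\infty}^{\infty}z_{i}f_{Z_{i}}\left(z_{i}\right)dz_{i}=\int_{-\infty}^{\infty}z_{i}\dfrac{e^{-\frac{1}{2}z_{i}^{2}}}{\sqrt{2\pi}}dz_{i}=0\\
 & \Rightarrow\mathrm{E}\left[\boldsymbol{Z}\right]=\boldsymbol{0}\\
 & \Rightarrow\boldsymbol{Z}\sim\mathcal{N}\left(\boldsymbol{\mu}_{\boldsymbol{Z}}=\boldsymbol{0},\boldsymbol{\Sigma}_{\boldsymbol{Z}}\right)=n\left(\boldsymbol{0},\mathrm{V}\left[\boldsymbol{Z}\right]\right)
\end{aligned}
$$
$$
\mathrm{V}\left(Z_{i}\right)=\int_{-\infty}^{\infty}\left(z_{i}-\mu_{Z_{i}}\right)^{2}f_{Z_{i}}\left(z_{i}\right)dz_{i}=\int_{-\infty}^{\infty}\left(z_{i}-0\right)^{2}\dfrac{e^{-\frac{1}{2}z_{i}^{2}}}{\sqrt{2\pi}}dz_{i}=1
$$
$$
\mathrm{V}\left(Z_{i},Z_{j}\right)\overset{i\ne j\,\Rightarrow\,Z_{i},Z_{j}\text{ are independent}}{=}0
$$
$$
\begin{aligned}
\mathrm{V}\left[\boldsymbol{Z}\right] & =\begin{bmatrix}\mathrm{V}\left(Z_{1}\right) & \mathrm{V}\left(Z_{1},Z_{2}\right) & \cdots & \mathrm{V}\left(Z_{1},Z_{p}\right)\\
\mathrm{V}\left(Z_{2},Z_{1}\right) & \mathrm{V}\left(Z_{2}\right) & \cdots & \mathrm{V}\left(Z_{2},Z_{p}\right)\\
\vdots & \vdots & \ddots & \vdots\\
\mathrm{V}\left(Z_{p},Z_{1}\right) & \mathrm{V}\left(Z_{p},Z_{2}\right) & \cdots & \mathrm{V}\left(Z_{p}\right)
\end{bmatrix}=\begin{bmatrix}\sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p}\\
\sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p}\\
\vdots & \vdots & \ddots & \vdots\\
\sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp}
\end{bmatrix}=\begin{bmatrix}\sigma_{1}^{2} & \sigma_{12} & \cdots & \sigma_{1p}\\
\sigma_{21} & \sigma_{2}^{2} & \cdots & \sigma_{2p}\\
\vdots & \vdots & \ddots & \vdots\\
\sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{p}^{2}
\end{bmatrix}\\
 & =\begin{bmatrix}1 & 0 & \cdots & 0\\
0 & 1 & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & 1
\end{bmatrix}=\boldsymbol{I}_{p\times p}=\boldsymbol{I}_{p}=\boldsymbol{I}
\end{aligned}
$$
$$
\boldsymbol{Z}\sim\mathcal{N}\left(\boldsymbol{\mu}_{\boldsymbol{Z}},\boldsymbol{\Sigma}_{\boldsymbol{Z}}\right)=n\left(\mathrm{E}\left[\boldsymbol{Z}\right],\mathrm{V}\left[\boldsymbol{Z}\right]\right)=\mathcal{N}\left(\boldsymbol{0},\boldsymbol{I}\right)\Leftrightarrow\begin{cases}
\boldsymbol{\mu}_{\boldsymbol{Z}}=\mathrm{E}\left[\boldsymbol{Z}\right]=\boldsymbol{0}=\left[0\right]_{p}=\left[0\right]_{p\times1}\\
\boldsymbol{\Sigma}_{\boldsymbol{Z}}=\mathrm{V}\left[\boldsymbol{Z}\right]=\boldsymbol{I}=\boldsymbol{I}_{p}=\boldsymbol{I}_{p\times p}
\end{cases}
$$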
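$$
\begin{aligned}
\boldsymbol{Z}=\begin{bmatrix}Z_{1}\\ Z_{2}\\ \vdots\\ Z_{p}\end{bmatrix}=\begin{bmatrix}\dfrac{X_{1}-\mu_{1}}{\sigma_{1}}\\ \dfrac{X_{2}-\mu_{2}}{\sigma_{2}}\\ \vdots\\ \dfrac{X_{p}-\mu_{p}}{\sigma_{p}}\end{bmatrix} & =\begin{bmatrix}\dfrac{1}{\sigma_{1}} & 0 & \cdots & 0\\
0 & \dfrac{1}{\sigma_{2}} & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & \dfrac{1}{\sigma_{p}}
\end{bmatrix}\begin{bmatrix}X_{1}-\mu_{1}\\ X_{2}-\mu_{2}\\ \vdots\\ X_{p}-\mu_{p}\end{bmatrix}\\
 & =\begin{bmatrix}\dfrac{1}{\sigma_{1}} & 0 & \cdots & 0\\
0 & \dfrac{1}{\sigma_{2}} & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & \dfrac{1}{\sigma_{p}}
\end{bmatrix}\left(\begin{bmatrix}X_{1}\\ X_{2}\\ \vdots\\ X_{p}\end{bmatrix}-\begin{bmatrix}\mu_{1}\\ \mu_{2}\\ \vdots\\ \mu_{p}\end{bmatrix}\right)\\
 & =\boldsymbol{B}^{-1}\left(\boldsymbol{X}-\boldsymbol{\mu}\right)\Rightarrow\boldsymbol{X}=\boldsymbol{B}\boldsymbol{Z}+\boldsymbol{\mu}
\end{aligned}
$$
$$
\boldsymbol{X}=\boldsymbol{B}\boldsymbol{Z}+\boldsymbol{\mu}=T\left(\boldsymbol{Z}\right)
$$
$$
\boldsymbol{\Sigma}=\boldsymbol{\Sigma}_{\boldsymbol{X}}=\mathrm{V}\left[\boldsymbol{X}\right]=\mathrm{V}\left[\boldsymbol{B}\boldsymbol{Z}+\boldsymbol{\mu}\right]=\boldsymbol{B}\mathrm{V}\left[\boldsymbol{Z}\right]\boldsymbol{B}^{\intercal}=\boldsymbol{B}\boldsymbol{I}\boldsymbol{B}^{\intercal}=\boldsymbol{B}\boldsymbol{B}^{\intercal}
$$
In the componentwise standardization written out here, B is the diagonal matrix of standard deviations. The relations X = BZ + μ and Σ = BB⊺ only require B to be an invertible p×p matrix, so they hold equally for a general invertible B, for which Σ need not be diagonal.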
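The relation Σ = BB⊺ can be illustrated by simulation. The following sketch is a Monte Carlo check under stated assumptions: a hypothetical non-diagonal 2×2 matrix `B`, a hypothetical mean vector `mu`, a fixed random seed, and only Python's standard `random` module. It draws Z with independent standard normal components, forms X = BZ + μ, and compares the sample covariance of X with BB⊺.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

# Hypothetical 2x2 example: B and mu are chosen only for illustration.
B = [[2.0, 0.0],
     [1.0, 1.0]]
mu = [1.0, -1.0]

# Sigma = B B^T computed entry by entry.
sigma = [[sum(B[i][k] * B[j][k] for k in range(2)) for j in range(2)]
         for i in range(2)]

n = 100_000
samples = []
for _ in range(n):
    z = [random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)]  # Z ~ N(0, I)
    x = [sum(B[i][k] * z[k] for k in range(2)) + mu[i] for i in range(2)]
    samples.append(x)

# Sample mean vector and sample covariance matrix of X.
mean = [sum(x[i] for x in samples) / n for i in range(2)]
cov = [[sum((x[i] - mean[i]) * (x[j] - mean[j]) for x in samples) / (n - 1)
        for j in range(2)] for i in range(2)]

print("B B^T      =", sigma)
print("sample cov =", cov)
print("sample mean =", mean)
```

The sample covariance only approximates BB⊺, with an error that shrinks as the number of samples grows.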
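Consider two infinitesimal p-dimensional parallelepiped volumes, one in each of the two different $\mathbb{R}^{p}$ spaces (the space of X and the space of Z):
$$
V_{x}=\left[x_{1},x_{1}+dx_{1}\right]\times\left[x_{2},x_{2}+dx_{2}\right]\times\cdots\times\left[x_{p},x_{p}+dx_{p}\right]
$$
and
$$
V_{z}=\left[z_{1},z_{1}+dz_{1}\right]\times\left[z_{2},z_{2}+dz_{2}\right]\times\cdots\times\left[z_{p},z_{p}+dz_{p}\right]
$$
Their relationship under the linear transformation T is
$$
V_{x}=T\left(V_{z}\right)=\left[T\left(z_{1}\right),T\left(z_{1}\right)+T\left(dz_{1}\right)\right]\times\left[T\left(z_{2}\right),T\left(z_{2}\right)+T\left(dz_{2}\right)\right]\times\cdots\times\left[T\left(z_{p}\right),T\left(z_{p}\right)+T\left(dz_{p}\right)\right]
$$
and
$$
dx_{i}=\sum_{j}\dfrac{\partial x_{i}}{\partial z_{j}}dz_{j}
$$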
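For example, in 2 dimensions,
$$
\begin{bmatrix}dx_{1}\\ dx_{2}\end{bmatrix}=\begin{bmatrix}\dfrac{\partial x_{1}}{\partial z_{1}} & \dfrac{\partial x_{1}}{\partial z_{2}}\\
\dfrac{\partial x_{2}}{\partial z_{1}} & \dfrac{\partial x_{2}}{\partial z_{2}}
\end{bmatrix}\begin{bmatrix}dz_{1}\\ dz_{2}\end{bmatrix}
$$
The two infinitesimal vectors along the coordinate directions of the space of Z are transformed into the space of X as
$$
T\left(dz_{1}\right)=\begin{bmatrix}\dfrac{\partial x_{1}}{\partial z_{1}} & \dfrac{\partial x_{1}}{\partial z_{2}}\\
\dfrac{\partial x_{2}}{\partial z_{1}} & \dfrac{\partial x_{2}}{\partial z_{2}}
\end{bmatrix}\begin{bmatrix}dz_{1}\\ 0\end{bmatrix}=\begin{bmatrix}\dfrac{\partial x_{1}}{\partial z_{1}}dz_{1}\\
\dfrac{\partial x_{2}}{\partial z_{1}}dz_{1}
\end{bmatrix}
$$
and
$$
T\left(dz_{2}\right)=\begin{bmatrix}\dfrac{\partial x_{1}}{\partial z_{1}} & \dfrac{\partial x_{1}}{\partial z_{2}}\\
\dfrac{\partial x_{2}}{\partial z_{1}} & \dfrac{\partial x_{2}}{\partial z_{2}}
\end{bmatrix}\begin{bmatrix}0\\ dz_{2}\end{bmatrix}=\begin{bmatrix}\dfrac{\partial x_{1}}{\partial z_{2}}dz_{2}\\
\dfrac{\partial x_{2}}{\partial z_{2}}dz_{2}
\end{bmatrix}
$$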
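The corresponding area (volume) in the space of X is
$$
\begin{aligned}
\int_{A_{x}}dA_{x}=\int_{A_{x}}dx_{1}dx_{2}=\int_{T\left(A_{z}\right)}dx_{1}dx_{2} & =\int_{A_{z}}\left|\begin{bmatrix}T\left(dz_{1}\right) & T\left(dz_{2}\right)\end{bmatrix}\right|=\int_{A_{z}}\begin{vmatrix}\dfrac{\partial x_{1}}{\partial z_{1}}dz_{1} & \dfrac{\partial x_{1}}{\partial z_{2}}dz_{2}\\
\dfrac{\partial x_{2}}{\partial z_{1}}dz_{1} & \dfrac{\partial x_{2}}{\partial z_{2}}dz_{2}
\end{vmatrix}\\
 & =\int_{A_{z}}\begin{vmatrix}\dfrac{\partial x_{1}}{\partial z_{1}} & \dfrac{\partial x_{1}}{\partial z_{2}}\\
\dfrac{\partial x_{2}}{\partial z_{1}} & \dfrac{\partial x_{2}}{\partial z_{2}}
\end{vmatrix}dz_{1}dz_{2}=\int_{A_{z}}\left|\boldsymbol{J}\right|dA_{z}
\end{aligned}
$$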
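Generalizing to volumes in p dimensions,
$$
\begin{aligned}
\int_{V_{x}}dV_{x}=\int_{V_{x}}dx_{1}dx_{2}\cdots dx_{p}=\int_{T\left(V_{z}\right)}dx_{1}dx_{2}\cdots dx_{p} & =\int_{V_{z}}\left|\begin{bmatrix}T\left(dz_{1}\right) & T\left(dz_{2}\right) & \cdots & T\left(dz_{p}\right)\end{bmatrix}\right|=\int_{V_{z}}\left|\left[\dfrac{\partial x_{i}}{\partial z_{j}}dz_{j}\right]_{p\times p}\right|\\
 & =\int_{V_{z}}\left|\left[\dfrac{\partial x_{i}}{\partial z_{j}}\right]_{p\times p}\right|dz_{1}dz_{2}\cdots dz_{p}=\int_{V_{z}}\left|\boldsymbol{J}\right|dV_{z}
\end{aligned}
$$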
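i.e.
$$
\int_{V_{x}}dV_{x}=\int_{V_{z}}\left|\boldsymbol{J}\right|dV_{z}
$$
where J is the Jacobian matrix
$$
\boldsymbol{J}=\left[\dfrac{\partial x_{i}}{\partial z_{j}}\right]_{p\times p}=\dfrac{\partial\boldsymbol{x}}{\partial\boldsymbol{z}}
$$
and |J| is the Jacobian determinant, or simply the Jacobian (as a volume scaling factor it is taken in absolute value):
$$
\left|\boldsymbol{J}\right|=\left|\dfrac{\partial x_{i}}{\partial z_{j}}\right|_{p\times p}=\left|\dfrac{\partial\boldsymbol{x}}{\partial\boldsymbol{z}}\right|
$$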
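The role of the Jacobian as an area scaling factor can be seen on a concrete linear map in 2 dimensions. The sketch below assumes a hypothetical matrix `B` and offset `mu` for the map x = Bz + μ (so that J = B), maps the unit square of the z-plane into the x-plane, and compares the area of the image, computed with the shoelace formula, with the absolute value of the determinant of J times the original area.

```python
def shoelace_area(vertices):
    """Unsigned area of a simple polygon given as a list of (x, y) vertices."""
    total = 0.0
    n = len(vertices)
    for k in range(n):
        x0, y0 = vertices[k]
        x1, y1 = vertices[(k + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0

# Hypothetical linear map x = B z + mu, so the Jacobian matrix is J = B.
B = [[2.0, 1.0],
     [0.5, 3.0]]
mu = [1.0, -2.0]

def T(z):
    """Apply x = B z + mu to a point z = (z1, z2)."""
    return (B[0][0] * z[0] + B[0][1] * z[1] + mu[0],
            B[1][0] * z[0] + B[1][1] * z[1] + mu[1])

unit_square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
image = [T(z) for z in unit_square]  # a parallelogram in the x-plane

det_J = B[0][0] * B[1][1] - B[0][1] * B[1][0]
area_z = shoelace_area(unit_square)
area_x = shoelace_area(image)
print(area_x, abs(det_J) * area_z)  # both are 5.5
```

The translation by μ does not affect the area, which is why only B enters the Jacobian.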
The probability of the same event should be invariant under transformation.
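$$
\int_{V_{x}}f_{\boldsymbol{X}}\left(\boldsymbol{x}\right)dV_{x}=\int_{V_{x}}f_{\boldsymbol{X}}\left(\boldsymbol{x}\right)dx_{1}dx_{2}\cdots dx_{p}=\int_{T\left(V_{z}\right)}f_{\boldsymbol{X}}\left(\boldsymbol{x}\right)dx_{1}dx_{2}\cdots dx_{p}=\int_{V_{z}}f_{\boldsymbol{Z}}\left(\boldsymbol{z}\right)dV_{z}=\int_{V_{z}}f_{\boldsymbol{Z}}\left(\boldsymbol{z}\right)dz_{1}dz_{2}\cdots dz_{p}
$$
i.e.
$$
\int_{V_{x}}f_{\boldsymbol{X}}\left(\boldsymbol{x}\right)dV_{x}=\int_{V_{z}}f_{\boldsymbol{Z}}\left(\boldsymbol{z}\right)dV_{z}
$$
Together with the volume relation, this gives the two identities
$$
\begin{cases}
\displaystyle\int_{V_{x}}f_{\boldsymbol{X}}\left(\boldsymbol{x}\right)dV_{x}=\int_{V_{z}}f_{\boldsymbol{Z}}\left(\boldsymbol{z}\right)dV_{z}\\
\displaystyle\int_{V_{x}}dV_{x}=\int_{V_{z}}\left|\boldsymbol{J}\right|dV_{z}
\end{cases}
$$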
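$$
\begin{aligned}
\boldsymbol{Z} & =\boldsymbol{B}^{-1}\left(\boldsymbol{X}-\boldsymbol{\mu}\right)\\
\boldsymbol{z} & =\boldsymbol{B}^{-1}\left(\boldsymbol{x}-\boldsymbol{\mu}\right)\\
\boldsymbol{X} & =\boldsymbol{B}\boldsymbol{Z}+\boldsymbol{\mu}\\
\boldsymbol{x} & =\boldsymbol{B}\boldsymbol{z}+\boldsymbol{\mu}\\
\boldsymbol{J} & =\left[\dfrac{\partial x_{i}}{\partial z_{j}}\right]_{p\times p}=\dfrac{\partial\boldsymbol{x}}{\partial\boldsymbol{z}}=\boldsymbol{B}\\
\left|\boldsymbol{J}\right| & =\left|\dfrac{\partial x_{i}}{\partial z_{j}}\right|_{p\times p}=\left|\dfrac{\partial\boldsymbol{x}}{\partial\boldsymbol{z}}\right|=\left|\boldsymbol{B}\right|
\end{aligned}
$$
Using the invariance of probability first and the volume relation second,
$$
\begin{aligned}
\int_{V_{z}}f_{\boldsymbol{Z}}\left(\boldsymbol{z}\right)dV_{z} & =\int_{V_{x}}f_{\boldsymbol{X}}\left(\boldsymbol{x}\right)dV_{x}=\int_{V_{x}}dV_{x}\,f_{\boldsymbol{X}}\left(\boldsymbol{x}\right)\\
 & =\int_{V_{z}}\left|\boldsymbol{J}\right|dV_{z}\,f_{\boldsymbol{X}}\left(\boldsymbol{x}\left(\boldsymbol{z}\right)\right)=\int_{V_{z}}f_{\boldsymbol{X}}\left(\boldsymbol{x}\left(\boldsymbol{z}\right)\right)\left|\boldsymbol{J}\right|dV_{z}\\
 & \Downarrow\\
f_{\boldsymbol{Z}}\left(\boldsymbol{z}\right) & =f_{\boldsymbol{X}}\left(\boldsymbol{x}\left(\boldsymbol{z}\right)\right)\left|\boldsymbol{J}\right|\\
 & \Downarrow\\
f_{\boldsymbol{X}}\left(\boldsymbol{x}\left(\boldsymbol{z}\right)\right) & =\left|\boldsymbol{J}\right|^{-1}f_{\boldsymbol{Z}}\left(\boldsymbol{z}\right)=\left|\boldsymbol{J}\right|^{-1}\dfrac{1}{\left(2\pi\right)^{p/2}}\exp\left\{ -\dfrac{\boldsymbol{z}^{\intercal}\boldsymbol{z}}{2}\right\} \\
 & \Downarrow\\
f_{\boldsymbol{X}}\left(\boldsymbol{x}\right) & =\left|\boldsymbol{J}\right|^{-1}f_{\boldsymbol{Z}}\left(\boldsymbol{z}\left(\boldsymbol{x}\right)\right)=\left|\boldsymbol{B}\right|^{-1}f_{\boldsymbol{Z}}\left(\boldsymbol{B}^{-1}\left(\boldsymbol{x}-\boldsymbol{\mu}\right)\right)\\
 & =\left|\boldsymbol{B}\right|^{-1}\left(2\pi\right)^{-p/2}\exp\left\{ -\dfrac{1}{2}\left[\boldsymbol{B}^{-1}\left(\boldsymbol{x}-\boldsymbol{\mu}\right)\right]^{\intercal}\left[\boldsymbol{B}^{-1}\left(\boldsymbol{x}-\boldsymbol{\mu}\right)\right]\right\} \\
 & =\left|\boldsymbol{B}\right|^{-1/2}\left|\boldsymbol{B}\right|^{-1/2}\left(2\pi\right)^{-p/2}\exp\left\{ -\dfrac{1}{2}\left(\boldsymbol{x}-\boldsymbol{\mu}\right)^{\intercal}\left(\boldsymbol{B}^{-1}\right)^{\intercal}\boldsymbol{B}^{-1}\left(\boldsymbol{x}-\boldsymbol{\mu}\right)\right\} \\
 & =\left|\boldsymbol{B}\right|^{-1/2}\left|\boldsymbol{B}^{\intercal}\right|^{-1/2}\left(2\pi\right)^{-p/2}\exp\left\{ -\dfrac{1}{2}\left(\boldsymbol{x}-\boldsymbol{\mu}\right)^{\intercal}\left(\boldsymbol{B}^{\intercal}\right)^{-1}\boldsymbol{B}^{-1}\left(\boldsymbol{x}-\boldsymbol{\mu}\right)\right\} \\
 & =\left|\boldsymbol{B}\boldsymbol{B}^{\intercal}\right|^{-1/2}\left(2\pi\right)^{-p/2}\exp\left\{ -\dfrac{1}{2}\left(\boldsymbol{x}-\boldsymbol{\mu}\right)^{\intercal}\left(\boldsymbol{B}\boldsymbol{B}^{\intercal}\right)^{-1}\left(\boldsymbol{x}-\boldsymbol{\mu}\right)\right\} \\
 & =\left|\boldsymbol{\Sigma}\right|^{-1/2}\left(2\pi\right)^{-p/2}\exp\left\{ -\dfrac{1}{2}\left(\boldsymbol{x}-\boldsymbol{\mu}\right)^{\intercal}\boldsymbol{\Sigma}^{-1}\left(\boldsymbol{x}-\boldsymbol{\mu}\right)\right\} \\
 & =\left(\left|\boldsymbol{\Sigma}\right|\left(2\pi\right)^{p}\right)^{-1/2}\exp\left\{ -\dfrac{1}{2}\left(\boldsymbol{x}-\boldsymbol{\mu}\right)^{\intercal}\boldsymbol{\Sigma}^{-1}\left(\boldsymbol{x}-\boldsymbol{\mu}\right)\right\}
\end{aligned}
$$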
Definition 18.2 probability density function (PDF) of multivariate normal distribution (= multivariate Gaussian distribution)
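$$
\mathcal{N}\left(\boldsymbol{x}\mid\boldsymbol{\mu},\boldsymbol{\Sigma}\right)=f_{\boldsymbol{X}}\left(\boldsymbol{x}\right)=\left(\left|\boldsymbol{\Sigma}\right|\left(2\pi\right)^{p}\right)^{-1/2}\exp\left\{ -\dfrac{1}{2}\left(\boldsymbol{x}-\boldsymbol{\mu}\right)^{\intercal}\boldsymbol{\Sigma}^{-1}\left(\boldsymbol{x}-\boldsymbol{\mu}\right)\right\}
$$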
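As a sanity check on this definition, the sketch below evaluates the density for p = 2 in two ways: directly from Σ, and through the change of variables from the standard normal density using a matrix B with Σ = BB⊺. The matrices, the evaluation point, and the function names are hypothetical choices made for illustration; the 2×2 inverse and determinant are written out by hand so that only the standard `math` module is needed.

```python
import math

def mvn_pdf_2d(x, mu, sigma):
    """N(x | mu, Sigma) for p = 2, using the explicit 2x2 inverse and determinant."""
    det = sigma[0][0] * sigma[1][1] - sigma[0][1] * sigma[1][0]
    d0, d1 = x[0] - mu[0], x[1] - mu[1]
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu), with Sigma^{-1} = adj(Sigma) / det.
    quad = (sigma[1][1] * d0 * d0
            - (sigma[0][1] + sigma[1][0]) * d0 * d1
            + sigma[0][0] * d1 * d1) / det
    return math.exp(-0.5 * quad) / math.sqrt(det * (2.0 * math.pi) ** 2)

def pdf_via_change_of_variables(x, mu, B):
    """|B|^{-1} f_Z(B^{-1}(x - mu)), with f_Z the standard bivariate normal density."""
    det_B = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    d0, d1 = x[0] - mu[0], x[1] - mu[1]
    # z = B^{-1} (x - mu), using the explicit 2x2 inverse of B.
    z0 = (B[1][1] * d0 - B[0][1] * d1) / det_B
    z1 = (-B[1][0] * d0 + B[0][0] * d1) / det_B
    f_z = math.exp(-0.5 * (z0 * z0 + z1 * z1)) / (2.0 * math.pi)
    return f_z / abs(det_B)

# Hypothetical example values.
B = [[2.0, 0.0], [1.0, 1.0]]
mu = [1.0, -1.0]
sigma = [[4.0, 2.0], [2.0, 2.0]]  # equals B B^T for the B above
x = [0.5, 0.25]

direct = mvn_pdf_2d(x, mu, sigma)
transformed = pdf_via_change_of_variables(x, mu, B)
print(direct, transformed)
```

The two values coincide, which mirrors the last steps of the derivation where |B|⁻¹ and (BB⊺)⁻¹ are rewritten in terms of Σ.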
Definition 18.3 correlation coefficient
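$$
\rho_{ij}=\dfrac{\sigma_{ij}}{\sqrt{\sigma_{ii}}\sqrt{\sigma_{jj}}}=\dfrac{\sigma_{ij}}{\sqrt{\sigma_{i}^{2}}\sqrt{\sigma_{j}^{2}}}=\dfrac{\sigma_{ij}}{\sigma_{i}\sigma_{j}}=\dfrac{\mathrm{V}\left(X_{i},X_{j}\right)}{\sqrt{\mathrm{V}\left(X_{i}\right)}\sqrt{\mathrm{V}\left(X_{j}\right)}}=\mathrm{R}\left(X_{i},X_{j}\right)
$$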
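The definition can be applied entry by entry to a covariance matrix. The following sketch assumes a hypothetical 2×2 covariance matrix and an illustrative helper name `correlation_matrix`; it divides each covariance by the product of the corresponding standard deviations.

```python
import math

def correlation_matrix(sigma):
    """Return [rho_ij] with rho_ij = sigma_ij / (sqrt(sigma_ii) * sqrt(sigma_jj))."""
    p = len(sigma)
    return [[sigma[i][j] / (math.sqrt(sigma[i][i]) * math.sqrt(sigma[j][j]))
             for j in range(p)] for i in range(p)]

# Hypothetical covariance matrix: variances 4 and 2, covariance 2.
sigma = [[4.0, 2.0],
         [2.0, 2.0]]

rho = correlation_matrix(sigma)
print(rho)  # diagonal entries are 1 up to rounding; off-diagonal is 1/sqrt(2)
```

Here the off-diagonal entry is 2 / (2 · √2) = 1/√2, and the matrix is symmetric because the covariance matrix is.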
59.1 bivariate normal distribution
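The case p = 2 is the bivariate normal distribution.
$$
\boldsymbol{\Sigma}=\left[\sigma_{ij}\right]_{2\times2}=\begin{bmatrix}\sigma_{11} & \sigma_{12}\\
\sigma_{21} & \sigma_{22}
\end{bmatrix}=\begin{bmatrix}\sigma_{1}^{2} & \sigma_{12}\\
\sigma_{21} & \sigma_{2}^{2}
\end{bmatrix}=\begin{bmatrix}\sigma_{1}^{2} & \sigma_{1}\sigma_{2}\rho_{12}\\
\sigma_{2}\sigma_{1}\rho_{21} & \sigma_{2}^{2}
\end{bmatrix}=\begin{bmatrix}\sigma_{1}^{2} & \sigma_{1}\sigma_{2}\rho\\
\sigma_{2}\sigma_{1}\rho & \sigma_{2}^{2}
\end{bmatrix}
$$
$$
\rho_{12}=\rho=\rho_{21}
$$
$$
\left|\boldsymbol{\Sigma}\right|=\begin{vmatrix}\sigma_{1}^{2} & \sigma_{1}\sigma_{2}\rho_{12}\\
\sigma_{2}\sigma_{1}\rho_{21} & \sigma_{2}^{2}
\end{vmatrix}=\sigma_{1}^{2}\sigma_{2}^{2}\left(1-\rho_{12}\rho_{21}\right)=\sigma_{1}^{2}\sigma_{2}^{2}\left(1-\rho^{2}\right)
$$
$$
\begin{bmatrix}a & b\\
c & d
\end{bmatrix}^{-1}=\dfrac{1}{\begin{vmatrix}a & b\\
c & d
\end{vmatrix}}\begin{bmatrix}d & -b\\
-c & a
\end{bmatrix}
$$
$$
\boldsymbol{\Sigma}^{-1}=\dfrac{1}{\left|\boldsymbol{\Sigma}\right|}\begin{bmatrix}\sigma_{2}^{2} & -\sigma_{1}\sigma_{2}\rho\\
-\sigma_{2}\sigma_{1}\rho & \sigma_{1}^{2}
\end{bmatrix}=\dfrac{1}{\sigma_{1}^{2}\sigma_{2}^{2}\left(1-\rho^{2}\right)}\begin{bmatrix}\sigma_{2}^{2} & -\sigma_{1}\sigma_{2}\rho\\
-\sigma_{2}\sigma_{1}\rho & \sigma_{1}^{2}
\end{bmatrix}=\dfrac{1}{\left(1-\rho^{2}\right)}\begin{bmatrix}\dfrac{1}{\sigma_{1}^{2}} & \dfrac{-\rho}{\sigma_{1}\sigma_{2}}\\
\dfrac{-\rho}{\sigma_{2}\sigma_{1}} & \dfrac{1}{\sigma_{2}^{2}}
\end{bmatrix}
$$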
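The closed form of the inverse covariance matrix can be checked by multiplying it with the covariance matrix. The sketch below assumes hypothetical values for the two standard deviations and the correlation coefficient, builds both matrices from the formulas above, and prints their product, which should be the identity matrix up to rounding.

```python
# Hypothetical parameters: standard deviations and correlation coefficient.
sigma1, sigma2, rho = 2.0, 1.5, 0.6

# Covariance matrix of the bivariate normal distribution.
Sigma = [[sigma1 ** 2, rho * sigma1 * sigma2],
         [rho * sigma1 * sigma2, sigma2 ** 2]]

# Closed-form inverse: 1 / (1 - rho^2) times the scaled 2x2 matrix.
c = 1.0 / (1.0 - rho ** 2)
Sigma_inv = [[c / sigma1 ** 2, -c * rho / (sigma1 * sigma2)],
             [-c * rho / (sigma1 * sigma2), c / sigma2 ** 2]]

# Matrix product Sigma * Sigma_inv, entry by entry.
product = [[sum(Sigma[i][k] * Sigma_inv[k][j] for k in range(2))
            for j in range(2)] for i in range(2)]

# Determinant computed directly and via the closed form.
det_direct = Sigma[0][0] * Sigma[1][1] - Sigma[0][1] * Sigma[1][0]
det_formula = sigma1 ** 2 * sigma2 ** 2 * (1.0 - rho ** 2)
print(product)
print(det_direct, det_formula)
```

The formula requires |ρ| < 1; at ρ = ±1 the determinant vanishes and the inverse does not exist.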
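$$
\begin{aligned}
 & \mathcal{N}\left(\boldsymbol{x}=\begin{bmatrix}x_{1}\\ x_{2}\end{bmatrix}\middle|\boldsymbol{\mu}=\begin{bmatrix}\mu_{1}\\ \mu_{2}\end{bmatrix},\boldsymbol{\Sigma}=\begin{bmatrix}\sigma_{1}^{2} & \sigma_{1}\sigma_{2}\rho\\
\sigma_{2}\sigma_{1}\rho & \sigma_{2}^{2}
\end{bmatrix}\right)\\
= & \left(\left|\boldsymbol{\Sigma}\right|\left(2\pi\right)^{p=2}\right)^{-1/2}\exp\left\{ -\dfrac{1}{2}\left(\boldsymbol{x}-\boldsymbol{\mu}\right)^{\intercal}\boldsymbol{\Sigma}^{-1}\left(\boldsymbol{x}-\boldsymbol{\mu}\right)\right\} \\
= & \left(\sigma_{1}^{2}\sigma_{2}^{2}\left(1-\rho^{2}\right)\left(2\pi\right)^{2}\right)^{-1/2}\exp\left\{ -\dfrac{1}{2}\begin{bmatrix}x_{1}-\mu_{1}\\ x_{2}-\mu_{2}\end{bmatrix}^{\intercal}\boldsymbol{\Sigma}^{-1}\begin{bmatrix}x_{1}-\mu_{1}\\ x_{2}-\mu_{2}\end{bmatrix}\right\} \\
= & \dfrac{1}{2\pi\sigma_{1}\sigma_{2}\sqrt{1-\rho^{2}}}\exp\left\{ -\dfrac{1}{2}\begin{bmatrix}x_{1}-\mu_{1}\\ x_{2}-\mu_{2}\end{bmatrix}^{\intercal}\dfrac{1}{\left(1-\rho^{2}\right)}\begin{bmatrix}\dfrac{1}{\sigma_{1}^{2}} & \dfrac{-\rho}{\sigma_{1}\sigma_{2}}\\
\dfrac{-\rho}{\sigma_{2}\sigma_{1}} & \dfrac{1}{\sigma_{2}^{2}}
\end{bmatrix}\begin{bmatrix}x_{1}-\mu_{1}\\ x_{2}-\mu_{2}\end{bmatrix}\right\} \\
= & \dfrac{1}{2\pi\sigma_{1}\sigma_{2}\sqrt{1-\rho^{2}}}\exp\left\{ \dfrac{-1}{2\left(1-\rho^{2}\right)}\begin{bmatrix}x_{1}-\mu_{1}\\ x_{2}-\mu_{2}\end{bmatrix}^{\intercal}\begin{bmatrix}\dfrac{1}{\sigma_{1}^{2}} & \dfrac{-\rho}{\sigma_{1}\sigma_{2}}\\
\dfrac{-\rho}{\sigma_{2}\sigma_{1}} & \dfrac{1}{\sigma_{2}^{2}}
\end{bmatrix}\begin{bmatrix}x_{1}-\mu_{1}\\ x_{2}-\mu_{2}\end{bmatrix}\right\} \\
= & \dfrac{1}{2\pi\sigma_{1}\sigma_{2}\sqrt{1-\rho^{2}}}\exp\left\{ \dfrac{-1}{2\left(1-\rho^{2}\right)}\left[\left(\dfrac{x_{1}-\mu_{1}}{\sigma_{1}}\right)^{2}-2\rho\left(\dfrac{x_{1}-\mu_{1}}{\sigma_{1}}\right)\left(\dfrac{x_{2}-\mu_{2}}{\sigma_{2}}\right)+\left(\dfrac{x_{2}-\mu_{2}}{\sigma_{2}}\right)^{2}\right]\right\}
\end{aligned}
$$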
Definition 18.4 probability density function (PDF) of bivariate normal distribution (= bivariate Gaussian distribution)
$$
\begin{aligned} & \mathcal{N}\left(\begin{bmatrix}x_{1}\\ x_{2} \end{bmatrix} \middle| \begin{bmatrix}\mu_{1}\\ \mu_{2} \end{bmatrix},\begin{bmatrix}\sigma_{1}^{2} & \sigma_{1}\sigma_{2}\rho\\ \sigma_{2}\sigma_{1}\rho & \sigma_{2}^{2} \end{bmatrix}\right)\\ = & \dfrac{1}{2\pi\sigma_{1}\sigma_{2}\sqrt{1-\rho^{2}}}\exp\left\{ \frac{-1}{2\left(1-\rho^{2}\right)}\left[\left(\dfrac{x_{1}-\mu_{1}}{\sigma_{1}}\right)^{2}-2\rho\left(\dfrac{x_{1}-\mu_{1}}{\sigma_{1}}\right)\left(\dfrac{x_{2}-\mu_{2}}{\sigma_{2}}\right)+\left(\dfrac{x_{2}-\mu_{2}}{\sigma_{2}}\right)^{2}\right]\right\} \end{aligned}
$$
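The closed-form bivariate density is straightforward to evaluate. The sketch below implements it with the standard `math` module; the function names and the parameter values are hypothetical and chosen only for illustration. It also checks the special case ρ = 0, where the bivariate density reduces to the product of two univariate normal densities.

```python
import math

def bivariate_normal_pdf(x1, x2, mu1, mu2, sigma1, sigma2, rho):
    """Closed-form density of the bivariate normal distribution (requires |rho| < 1)."""
    u = (x1 - mu1) / sigma1  # standardized first coordinate
    v = (x2 - mu2) / sigma2  # standardized second coordinate
    q = (u * u - 2.0 * rho * u * v + v * v) / (1.0 - rho * rho)
    norm = 2.0 * math.pi * sigma1 * sigma2 * math.sqrt(1.0 - rho * rho)
    return math.exp(-0.5 * q) / norm

def normal_pdf(x, mu, sigma):
    """Univariate normal density N(x | mu, sigma^2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Hypothetical parameters: variances 4 and 2, covariance 2, so rho = 1/sqrt(2).
value = bivariate_normal_pdf(0.5, 0.25, 1.0, -1.0,
                             2.0, math.sqrt(2.0), 1.0 / math.sqrt(2.0))

# With rho = 0 the joint density factorizes into the two marginals.
uncorrelated = bivariate_normal_pdf(0.5, 0.25, 1.0, -1.0, 2.0, 1.5, 0.0)
factorized = normal_pdf(0.5, 1.0, 2.0) * normal_pdf(0.25, -1.0, 1.5)
print(value)
print(uncorrelated, factorized)
```

For jointly normal components, zero correlation therefore implies independence, which is not true for arbitrary distributions.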