1.4 Multivariate Random Variables
In the previous section we discussed random variables in one dimension. This section introduces the concept of multivariate random variables.
1.4.3 Families of multivariate distributions
1.4.3.1 Trinomial distribution
Let n be the number of trials, and X_1, X_2, X_3 be the counts of outcomes in each of three categories, with probabilities p_1, p_2, p_3 respectively. The probability mass function is given by:

P(X_1 = x_1, X_2 = x_2, X_3 = x_3) = \frac{n!}{x_1!\, x_2!\, x_3!}\, p_1^{x_1} p_2^{x_2} p_3^{x_3}

where x_1 + x_2 + x_3 = n and p_1 + p_2 + p_3 = 1.
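As a numerical check, the closed form above can be evaluated directly and compared against SciPy's multinomial distribution (the trinomial is the three-category special case). The numbers n = 10 and p = (0.2, 0.3, 0.5) below are hypothetical values chosen for illustration.

```python
import math
from scipy.stats import multinomial

# Hypothetical example: 10 trials, three categories with these probabilities
n = 10
p = [0.2, 0.3, 0.5]
x = [2, 3, 5]  # counts per category; must sum to n

# Closed form: n!/(x1! x2! x3!) * p1^x1 * p2^x2 * p3^x3
coef = math.factorial(n) // (
    math.factorial(x[0]) * math.factorial(x[1]) * math.factorial(x[2])
)
pmf_manual = coef * p[0] ** x[0] * p[1] ** x[1] * p[2] ** x[2]

# SciPy's multinomial PMF should agree
pmf_scipy = multinomial.pmf(x, n=n, p=p)
```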
1.4.3.2 Bivariate hypergeometric distribution
Let N be the population size, and let K_1 and K_2 be the numbers of items of type 1 and type 2 respectively. Let n be the sample size. The probability mass function for X_1 and X_2, the numbers of items of type 1 and type 2 in the sample, is:
P(X_1 = x_1, X_2 = x_2) = \frac{\binom{K_1}{x_1} \binom{K_2}{x_2} \binom{N - K_1 - K_2}{n - x_1 - x_2}}{\binom{N}{n}}
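The PMF above can be computed from binomial coefficients, or via SciPy's `multivariate_hypergeom` (available in SciPy 1.6+) by treating the remaining N − K_1 − K_2 items as a third category. The population and sample sizes below are hypothetical.

```python
from math import comb
from scipy.stats import multivariate_hypergeom

# Hypothetical population: N = 20 items, 7 of type 1, 5 of type 2
N, K1, K2 = 20, 7, 5
n = 6          # sample size
x1, x2 = 2, 1  # counts of type 1 and type 2 in the sample

# Closed form: C(K1,x1) C(K2,x2) C(N-K1-K2, n-x1-x2) / C(N,n)
pmf_manual = (
    comb(K1, x1) * comb(K2, x2) * comb(N - K1 - K2, n - x1 - x2)
) / comb(N, n)

# SciPy formulation: the "other" items form an explicit third category
pmf_scipy = multivariate_hypergeom.pmf(
    [x1, x2, n - x1 - x2], m=[K1, K2, N - K1 - K2], n=n
)
```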
1.4.3.3 Multivariate normal distribution
The probability density function of a k-dimensional multivariate normal distribution with mean vector \boldsymbol{\mu} and covariance matrix \boldsymbol{\Sigma} is:
f(\mathbf{x}) = \frac{1}{(2\pi)^{k/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\left(-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})\right)
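The density can be evaluated term by term from this formula and checked against `scipy.stats.multivariate_normal`. The mean vector and covariance matrix below are hypothetical values for a 2-dimensional case.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical 2-dimensional parameters
mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([0.5, 0.5])

# Evaluate the density directly from the formula
k = len(mu)
diff = x - mu
quad = diff @ np.linalg.inv(Sigma) @ diff  # (x-mu)^T Sigma^{-1} (x-mu)
pdf_manual = np.exp(-0.5 * quad) / (
    (2 * np.pi) ** (k / 2) * np.sqrt(np.linalg.det(Sigma))
)

# SciPy's implementation should agree
pdf_scipy = multivariate_normal.pdf(x, mean=mu, cov=Sigma)
```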
1.4.3.4 Wishart distribution
Let \mathbf{X}_1, \ldots, \mathbf{X}_n be independent random vectors from a p-variate normal distribution N_p(\mathbf{0}, \boldsymbol{\Sigma}). Then the Wishart distribution W_p(n, \boldsymbol{\Sigma}) is the distribution of the random matrix:
\mathbf{S} = \sum_{i=1}^n \mathbf{X}_i \mathbf{X}_i^T
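A sketch of this construction: draw n independent N_p(0, Σ) vectors, form the sum of outer products S, and note that SciPy's `wishart` object for W_p(n, Σ) has mean nΣ, consistent with E[X_i X_i^T] = Σ. The Σ and n below are hypothetical.

```python
import numpy as np
from scipy.stats import wishart

rng = np.random.default_rng(0)

# Hypothetical 2x2 covariance matrix and degrees of freedom
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
n = 5

# Draw n iid N_2(0, Sigma) vectors; rows of X are the X_i
X = rng.multivariate_normal(np.zeros(2), Sigma, size=n)

# S = sum_i X_i X_i^T, written as a single matrix product
S = X.T @ X

# SciPy's Wishart W_p(n, Sigma); its mean is n * Sigma
W = wishart(df=n, scale=Sigma)
```

One draw of S is a single realization of W_2(5, Σ); it is symmetric and positive semi-definite by construction.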
1.4.3.5 Wilks’ lambda distribution
Wilks’ lambda (\Lambda) is used in multivariate analysis of variance (MANOVA). It is defined as the ratio of the determinant of the error sum of squares and cross-products matrix to the determinant of the total sum of squares and cross-products matrix:
\Lambda = \frac{|\mathbf{E}|}{|\mathbf{E} + \mathbf{H}|}
where \mathbf{E} is the error sum of squares and cross-products matrix and \mathbf{H} is the hypothesis sum of squares and cross-products matrix.
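Given E and H, the statistic is a ratio of two determinants. The matrices below are hypothetical sum-of-squares matrices chosen for illustration; since E and E + H are positive definite here, Λ falls in (0, 1], with small values indicating a large hypothesis effect.

```python
import numpy as np

# Hypothetical error (E) and hypothesis (H) SSCP matrices
E = np.array([[10.0, 2.0],
              [2.0, 8.0]])
H = np.array([[4.0, 1.0],
              [1.0, 3.0]])

# Wilks' lambda: |E| / |E + H|
Lam = np.linalg.det(E) / np.linalg.det(E + H)
```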
1.4.3.6 Hotelling’s T^2-distribution
Hotelling’s T^2 is a multivariate generalization of the squared Student’s t-statistic. It is defined as:
T^2 = n(\mathbf{\bar{x}} - \boldsymbol{\mu}_0)^T \mathbf{S}^{-1} (\mathbf{\bar{x}} - \boldsymbol{\mu}_0)
where \mathbf{\bar{x}} is the sample mean vector, \boldsymbol{\mu}_0 is the hypothesized population mean vector, \mathbf{S} is the sample covariance matrix, and n is the sample size.
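Computing T^2 from a sample is a direct translation of the formula. The data below are simulated from a hypothetical bivariate normal; the hypothesized mean μ_0 = 0 is also an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sample: n = 30 bivariate observations
X = rng.multivariate_normal([0.1, -0.2], np.eye(2), size=30)
mu0 = np.array([0.0, 0.0])  # hypothesized mean vector

n, p = X.shape
xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False)  # unbiased sample covariance matrix

# T^2 = n (xbar - mu0)^T S^{-1} (xbar - mu0)
diff = xbar - mu0
T2 = n * diff @ np.linalg.inv(S) @ diff
```

Under the null hypothesis, (n − p) / (p(n − 1)) · T^2 follows an F-distribution with (p, n − p) degrees of freedom, which is how the statistic is used for testing in practice.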