1.4 Multivariate Random Variables
The previous sections discussed random variables in one dimension. This section introduces the concept of multivariate random variables.
1.4.1 Joint distributions
1.4.1.2 Sum of two independent random variables
Convolution formula:
If \(X\) and \(Y\) are independent random variables with probability density functions \(f_X(x)\) and \(f_Y(y)\), respectively, then the probability density function of \(Z = X + Y\) is given by:
\[f_Z(z) = \int_{-\infty}^{\infty} f_X(x) f_Y(z-x) \, dx\]
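As a quick numerical check, the convolution integral can be evaluated directly. The sketch below (assuming NumPy and SciPy are available; the choice of distributions is illustrative) takes \(X\) and \(Y\) to be independent standard normal variables, so \(Z = X + Y\) should follow \(N(0, 2)\):

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

# Densities of two independent standard normal variables (illustrative choice).
f_X = stats.norm(0, 1).pdf
f_Y = stats.norm(0, 1).pdf

def f_Z(z):
    # Convolution formula: integrate f_X(x) * f_Y(z - x) over the real line.
    return quad(lambda x: f_X(x) * f_Y(z - x), -np.inf, np.inf)[0]

# The sum of two independent N(0, 1) variables is N(0, 2), so the
# numerical convolution should match the N(0, sqrt(2)) density.
for z in (0.0, 1.0, 2.5):
    print(f_Z(z), stats.norm(0, np.sqrt(2)).pdf(z))
```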
1.4.3 Families of multivariate distributions
1.4.3.1 Trinomial distribution
Let \(n\) be the number of independent trials, and let \(X_1\), \(X_2\), \(X_3\) be the numbers of outcomes in each of three categories, with probabilities \(p_1\), \(p_2\), \(p_3\), respectively. The probability mass function is given by:
\[P(X_1 = x_1, X_2 = x_2, X_3 = x_3) = \frac{n!}{x_1! x_2! x_3!} p_1^{x_1} p_2^{x_2} p_3^{x_3}\]
where \(x_1 + x_2 + x_3 = n\) and \(p_1 + p_2 + p_3 = 1\).
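For a concrete check, the pmf can be evaluated directly from the formula and compared with SciPy's multinomial distribution, since a trinomial is simply a multinomial with three categories; the parameter values below are illustrative:

```python
import numpy as np
from math import factorial
from scipy import stats

n, p = 10, [0.2, 0.3, 0.5]   # illustrative parameters
x = [2, 3, 5]                # must satisfy x1 + x2 + x3 = n

# Direct evaluation of the trinomial pmf.
direct = (factorial(n) / np.prod([factorial(k) for k in x])
          * np.prod([pi**k for pi, k in zip(p, x)]))

# Same value via scipy's multinomial distribution.
print(direct, stats.multinomial(n, p).pmf(x))
```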
1.4.3.2 Bivariate hypergeometric distribution
Let \(N\) be the population size, and let \(K_1\) and \(K_2\) be the numbers of items of type 1 and type 2, respectively. Let \(n\) be the sample size. The probability mass function of \(X_1\) and \(X_2\), the numbers of items of type 1 and type 2 in the sample, is:
\[P(X_1 = x_1, X_2 = x_2) = \frac{\binom{K_1}{x_1} \binom{K_2}{x_2} \binom{N - K_1 - K_2}{n - x_1 - x_2}}{\binom{N}{n}}\]
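A minimal sketch of this pmf, comparing the formula against SciPy's `multivariate_hypergeom` (available in SciPy 1.6 and later), where a third category collects the remaining \(N - K_1 - K_2\) items; the parameter values are illustrative:

```python
from math import comb
from scipy import stats

N, K1, K2, n = 50, 15, 10, 12   # illustrative parameters
x1, x2 = 4, 3

# Direct evaluation of the bivariate hypergeometric pmf.
direct = (comb(K1, x1) * comb(K2, x2) * comb(N - K1 - K2, n - x1 - x2)
          / comb(N, n))

# Same value via the multivariate hypergeometric with an explicit
# "other" category of size N - K1 - K2.
mvh = stats.multivariate_hypergeom(m=[K1, K2, N - K1 - K2], n=n)
print(direct, mvh.pmf([x1, x2, n - x1 - x2]))
```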
1.4.3.3 Multivariate normal distribution
The probability density function of a \(k\)-dimensional multivariate normal distribution with mean vector \(\boldsymbol{\mu}\) and covariance matrix \(\boldsymbol{\Sigma}\) is:
\[f(\mathbf{x}) = \frac{1}{(2\pi)^{k/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\left(-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})\right)\]
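The density can be evaluated directly from this formula and checked against `scipy.stats.multivariate_normal`; the sketch below uses an illustrative two-dimensional example:

```python
import numpy as np
from scipy import stats

mu = np.array([0.0, 1.0])                    # illustrative mean vector
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])   # illustrative covariance matrix
x = np.array([0.5, 0.5])
k = len(mu)

# Direct evaluation of the multivariate normal density.
d = x - mu
direct = (np.exp(-0.5 * d @ np.linalg.inv(Sigma) @ d)
          / ((2 * np.pi) ** (k / 2) * np.sqrt(np.linalg.det(Sigma))))

# Same value via scipy.
print(direct, stats.multivariate_normal(mu, Sigma).pdf(x))
```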
1.4.3.4 Wishart distribution
Let \(\mathbf{X}_1, \ldots, \mathbf{X}_n\) be independent random vectors from a \(p\)-variate normal distribution \(N_p(\mathbf{0}, \boldsymbol{\Sigma})\). Then the Wishart distribution \(W_p(n, \boldsymbol{\Sigma})\) is the distribution of the random matrix:
\[\mathbf{S} = \sum_{i=1}^n \mathbf{X}_i \mathbf{X}_i^T\]
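The sketch below (illustrative parameter values) builds one such matrix \(\mathbf{S}\) from simulated normal vectors and uses a Monte Carlo average of `scipy.stats.wishart` draws to check the known mean \(E[\mathbf{S}] = n\boldsymbol{\Sigma}\):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
p, n = 2, 5                                  # illustrative dimensions
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])

# One Wishart draw built explicitly: rows of X are the vectors X_i,
# so X.T @ X equals the sum of the outer products X_i X_i^T.
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
S = X.T @ X
print(S)

# Monte Carlo check: E[S] = n * Sigma, matching scipy's Wishart.
draws = stats.wishart(df=n, scale=Sigma).rvs(size=2000, random_state=1)
print(draws.mean(axis=0))   # approximately n * Sigma
print(n * Sigma)
```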
1.4.3.5 Wilks’ lambda distribution
Wilks’ lambda (\(\Lambda\)) is used in multivariate analysis of variance (MANOVA). It is defined as the ratio of the determinant of the error sum of squares and cross-products matrix to the determinant of the total sum of squares and cross-products matrix:
\[\Lambda = \frac{|\mathbf{E}|}{|\mathbf{E} + \mathbf{H}|}\]
where \(\mathbf{E}\) is the error sum of squares and cross-products matrix and \(\mathbf{H}\) is the hypothesis sum of squares and cross-products matrix.
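A minimal sketch of the computation, using simulated data for a one-way MANOVA with three groups and two response variables (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Three groups of 10 observations on two response variables,
# with slightly different group means (illustrative data).
groups = [rng.normal(loc=m, size=(10, 2)) for m in (0.0, 0.2, 0.5)]
grand_mean = np.vstack(groups).mean(axis=0)

# E: within-group (error) SSCP matrix.
E = sum((g - g.mean(axis=0)).T @ (g - g.mean(axis=0)) for g in groups)

# H: between-group (hypothesis) SSCP matrix.
H = sum(len(g) * np.outer(g.mean(axis=0) - grand_mean,
                          g.mean(axis=0) - grand_mean) for g in groups)

wilks_lambda = np.linalg.det(E) / np.linalg.det(E + H)
print(wilks_lambda)   # values near 0 indicate strong group separation
```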
1.4.3.6 Hotelling’s \(T^2\)-distribution
Hotelling’s \(T^2\) statistic is a generalization of the squared \(t\)-statistic to multivariate data. It is defined as:
\[T^2 = n(\mathbf{\bar{x}} - \boldsymbol{\mu}_0)^T \mathbf{S}^{-1} (\mathbf{\bar{x}} - \boldsymbol{\mu}_0)\]
where \(\mathbf{\bar{x}}\) is the sample mean vector, \(\boldsymbol{\mu}_0\) is the hypothesized population mean vector, \(\mathbf{S}\) is the sample covariance matrix, and \(n\) is the sample size.
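A short sketch computing \(T^2\) from simulated data (illustrative parameters); it also reports the familiar \(F\)-transformation, since under \(H_0\) the quantity \(\frac{n-p}{p(n-1)} T^2\) follows an \(F(p, n-p)\) distribution:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated sample of n bivariate observations (illustrative data);
# the true mean differs slightly from the hypothesized mu0.
n, mu0 = 30, np.array([0.0, 0.0])
X = rng.multivariate_normal([0.3, 0.1], np.eye(2), size=n)

xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False)   # sample covariance (divides by n - 1)
d = xbar - mu0
T2 = n * d @ np.linalg.inv(S) @ d

# Under H0, (n - p) / (p * (n - 1)) * T^2 ~ F(p, n - p).
p = X.shape[1]
F_stat = (n - p) / (p * (n - 1)) * T2
print(T2, F_stat)
```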