2.3 Multivariate Distributions

Multivariate distributions are used to characterize the joint distribution of a collection of N random variables X_{1},X_{2},\ldots,X_{N} for N>1. The mathematical formulation of this joint distribution can be quite complex and typically makes use of matrix algebra. Here, we summarize some basic properties of multivariate distributions without the use of matrix algebra. In chapter 3, we show how matrix algebra can greatly simplify the description of multivariate distributions.

2.3.1 Discrete random variables

Let X_{1},X_{2},\ldots,X_{N} be N discrete random variables with sample spaces S_{X_{1}},S_{X_{2}},\ldots,S_{X_{N}}. The likelihood that these random variables take values in the joint sample space S_{X_{1}}\times S_{X_{2}}\times\cdots\times S_{X_{N}} is given by the joint probability function: p(x_{1},x_{2},\ldots,x_{N})=\Pr(X_{1}=x_{1},X_{2}=x_{2},\ldots,X_{N}=x_{N}). For N>2 it is not easy to represent the joint probabilities in a table like Table 2.3 or to visualize the distribution.

Marginal distributions for each variable X_{i} can be derived from the joint distribution as in (2.26) by summing the joint probabilities over the other variables j\neq i. For example, p(x_{1})=\sum_{x_{2}\in S_{X_{2}},\ldots,x_{N}\in S_{X_{N}}}p(x_{1},x_{2},\ldots,x_{N}). With N random variables, there are numerous conditional distributions that can be formed. For example, the distribution of X_{1} given X_{2}=x_{2},\ldots,X_{N}=x_{N} is determined using: \Pr(X_{1}=x_{1}|X_{2}=x_{2},\ldots,X_{N}=x_{N})=\frac{\Pr(X_{1}=x_{1},X_{2}=x_{2},\ldots,X_{N}=x_{N})}{\Pr(X_{2}=x_{2},\ldots,X_{N}=x_{N})}. Similarly, the joint distribution of X_{1} and X_{2} given X_{3}=x_{3},\ldots,X_{N}=x_{N} is given by: \begin{align*} \Pr(X_{1}=x_{1},X_{2}=x_{2}|X_{3}=x_{3},\ldots,X_{N}=x_{N}) \\ =\frac{\Pr(X_{1}=x_{1},X_{2}=x_{2},\ldots,X_{N}=x_{N})}{\Pr(X_{3}=x_{3},\ldots,X_{N}=x_{N})}. \end{align*}
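To make these summations concrete, the following Python sketch computes a marginal and a conditional distribution from a hypothetical joint pmf for N=3 binary random variables, stored as a 3-dimensional array (the probabilities themselves are illustrative, not from the text).

```python
# Minimal sketch: marginal and conditional pmfs from a hypothetical joint pmf
# of three discrete random variables, each with sample space {0, 1}.
import numpy as np

rng = np.random.default_rng(0)
p = rng.random((2, 2, 2))
p /= p.sum()                          # normalize so the joint pmf sums to 1

# Marginal pmf of X1: sum the joint pmf over x2 and x3.
p_x1 = p.sum(axis=(1, 2))

# Conditional pmf of X1 given X2 = 1, X3 = 0:
# Pr(X1 = x1 | X2 = 1, X3 = 0) = p(x1, 1, 0) / Pr(X2 = 1, X3 = 0).
p_x1_given = p[:, 1, 0] / p[:, 1, 0].sum()

print(p_x1, p_x1.sum())               # marginal pmf sums to 1
print(p_x1_given, p_x1_given.sum())   # conditional pmf sums to 1
```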

2.3.2 Continuous random variables

Let X_{1},X_{2},\ldots,X_{N} be N continuous random variables each taking values on the real line. The joint pdf is a function f(x_{1},x_{2},\ldots,x_{N})\geq0 such that: \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}f(x_{1},x_{2},\ldots,x_{N})~dx_{1}~dx_{2}\cdots~dx_{N}=1. Joint probabilities of x_{11}\leq X_{1}\leq x_{12},x_{21}\leq X_{2}\leq x_{22},\ldots,x_{N1}\leq X_{N}\leq x_{N2} are computed by evaluating the integral: \begin{equation} \int_{x_{11}}^{x_{12}}\int_{x_{21}}^{x_{22}}\cdots\int_{x_{N1}}^{x_{N2}}f(x_{1},x_{2},\ldots,x_{N})~dx_{1}~dx_{2}\cdots~dx_{N}.\tag{2.53} \end{equation} For most multivariate distributions, the integral in (2.53) cannot be solved analytically and must be approximated numerically.
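As a minimal sketch of such a numerical approximation, the Python code below evaluates a probability of the form (2.53) for a bivariate normal joint pdf; the mean vector, covariance matrix, and integration limits are illustrative choices, not values from the text.

```python
# Approximate Pr(-1 <= X1 <= 1, 0 <= X2 <= 2) for an illustrative bivariate
# normal distribution by numerically integrating the joint pdf.
import numpy as np
from scipy import integrate
from scipy.stats import multivariate_normal

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])
f = multivariate_normal(mean=mu, cov=Sigma)

# dblquad integrates func(y, x): inner variable (x2) first, outer (x1) second.
prob, err = integrate.dblquad(lambda x2, x1: f.pdf([x1, x2]),
                              -1.0, 1.0,    # limits for x1 (outer integral)
                              0.0, 2.0)     # limits for x2 (inner integral)
print(prob, err)
```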

The marginal pdf for x_{i} is found by integrating the joint pdf with respect to the other variables. For example, the marginal pdf for x_{1} is found by solving: f(x_{1})=\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}f(x_{1},x_{2},\ldots,x_{N})~dx_{2}\cdots~dx_{N}. Conditional pdfs for a single random variable or a collection of random variables are defined in an analogous way.
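The sketch below illustrates this marginalization numerically for the same illustrative bivariate normal used above: integrating the joint pdf over x_{2} at a fixed point x_{1} recovers the exact N(0,1) marginal pdf of X_{1}.

```python
# Minimal sketch: recover the marginal pdf of X1 at a point by integrating
# the joint pdf over x2 (illustrative parameters; the exact marginal of X1
# here is N(0, 1), so the two printed numbers should agree closely).
import numpy as np
from scipy import integrate
from scipy.stats import multivariate_normal, norm

f = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, 0.5], [0.5, 2.0]])

x1 = 0.3
f_x1, _ = integrate.quad(lambda x2: f.pdf([x1, x2]), -np.inf, np.inf)
print(f_x1, norm.pdf(x1, loc=0.0, scale=1.0))  # numerical vs exact marginal
```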

2.3.3 Independence

N random variables X_{1},X_{2},\ldots,X_{N} are (mutually) independent if their joint distribution factors into the product of all of the marginal distributions: \begin{align*} p(x_{1},x_{2},\ldots,x_{N}) & =p(x_{1})p(x_{2})\cdots p(x_{N})\textrm{ for }X_{i}\textrm{ discrete},\\ f\left(x_{1},x_{2},\ldots,x_{N}\right) & =f(x_{1})f(x_{2})\cdots f(x_{N})\textrm{ for }X_{i}\textrm{ continuous}. \end{align*} In addition, if N random variables are independent then any functions of these random variables are also independent.
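The factorization in the discrete case can be illustrated directly: under independence the joint pmf is just the outer product of the marginal pmfs. The marginal probabilities below are hypothetical.

```python
# Minimal sketch: build a joint pmf of three independent discrete random
# variables as the product of hypothetical marginal pmfs.
import numpy as np

p1 = np.array([0.2, 0.8])           # marginal pmf of X1
p2 = np.array([0.5, 0.3, 0.2])      # marginal pmf of X2
p3 = np.array([0.1, 0.9])           # marginal pmf of X3

# Joint pmf p(x1, x2, x3) = p(x1) p(x2) p(x3), built via broadcasting.
joint = p1[:, None, None] * p2[None, :, None] * p3[None, None, :]
print(joint.sum())                               # sums to 1
print(np.allclose(joint.sum(axis=(1, 2)), p1))   # marginalizing recovers p1
```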

2.3.4 Dependence concepts

In general, it is difficult to define dependence concepts for collections of more than two random variables. Dependence is therefore typically defined only between pairs of random variables. Hence, pairwise covariance and correlation remain the key dependence measures when dealing with more than two random variables.

For N random variables X_{1},X_{2},\ldots,X_{N}, with mean values \mu_{i}=E[X_{i}] and variances \sigma_{i}^{2}=\mathrm{var}(X_{i}), the pairwise covariances and correlations are defined as: \begin{align*} \mathrm{cov}(X_{i},X_{j}) & =\sigma_{ij}=E[(X_{i}-\mu_{i})(X_{j}-\mu_{j})],\\ \mathrm{cor}(X_{i},X_{j}) & =\rho_{ij}=\frac{\sigma_{ij}}{\sigma_{i}\sigma_{j}}, \end{align*} for i\neq j. There are N(N-1)/2 distinct pairwise covariances and correlations. Often, these values are summarized using matrix algebra in an N\times N covariance matrix \Sigma and an N\times N correlation matrix \mathbf{C}: \begin{align} \Sigma & =\left(\begin{array}{cccc} \sigma_{1}^{2} & \sigma_{12} & \cdots & \sigma_{1N}\\ \sigma_{12} & \sigma_{2}^{2} & \cdots & \sigma_{2N}\\ \vdots & \vdots & \ddots & \vdots\\ \sigma_{1N} & \sigma_{2N} & \cdots & \sigma_{N}^{2} \end{array}\right),\tag{2.54}\\ \mathbf{C} & =\left(\begin{array}{cccc} 1 & \rho_{12} & \cdots & \rho_{1N}\\ \rho_{12} & 1 & \cdots & \rho_{2N}\\ \vdots & \vdots & \ddots & \vdots\\ \rho_{1N} & \rho_{2N} & \cdots & 1 \end{array}\right).\tag{2.55} \end{align} These matrices are formally defined in chapter 3.
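A minimal sketch of how these matrices are estimated from data: simulate observations on N=3 variables from an illustrative covariance matrix and form the sample analogues of (2.54) and (2.55).

```python
# Minimal sketch: sample covariance matrix (2.54) and correlation matrix
# (2.55) from simulated data for N = 3 variables (illustrative parameters).
import numpy as np

rng = np.random.default_rng(123)
Sigma = np.array([[1.0, 0.3, 0.1],
                  [0.3, 2.0, 0.4],
                  [0.1, 0.4, 1.5]])
X = rng.multivariate_normal(mean=np.zeros(3), cov=Sigma, size=100_000)

Sigma_hat = np.cov(X, rowvar=False)       # sample covariance matrix
C_hat = np.corrcoef(X, rowvar=False)      # sample correlation matrix
print(np.round(Sigma_hat, 2))
print(np.round(C_hat, 2))
```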

2.3.5 Linear combinations of N random variables

Many of the results for manipulating a collection of random variables generalize in a straightforward way to the case of more than two random variables. The details of the generalizations are not important for our purposes. However, the following results will be used repeatedly throughout the book.

Proposition 2.10 Let X_{1},X_{2},\ldots,X_{N} denote a collection of N random variables (discrete or continuous) with means \mu_{i}, variances \sigma_{i}^{2} and covariances \sigma_{ij}. Define the new random variable Z as a linear combination: Z=a_{1}X_{1}+a_{2}X_{2}+\cdots+a_{N}X_{N} where a_{1},a_{2},\ldots,a_{N} are constants. Then the following results hold: \begin{align} \mu_{Z} & =E[Z]=a_{1}E[X_{1}]+a_{2}E[X_{2}]+\cdots+a_{N}E[X_{N}]\tag{2.56}\\ & =\sum_{i=1}^{N}a_{i}E[X_{i}]=\sum_{i=1}^{N}a_{i}\mu_{i}.\nonumber \end{align} \begin{align} \sigma_{Z}^{2} & =\mathrm{var}(Z)=a_{1}^{2}\sigma_{1}^{2}+a_{2}^{2}\sigma_{2}^{2}+\cdots+a_{N}^{2}\sigma_{N}^{2}\tag{2.57}\\ & +2a_{1}a_{2}\sigma_{12}+2a_{1}a_{3}\sigma_{13}+\cdots+2a_{1}a_{N}\sigma_{1N}\nonumber \\ & +2a_{2}a_{3}\sigma_{23}+2a_{2}a_{4}\sigma_{24}+\cdots+2a_{2}a_{N}\sigma_{2N}\nonumber \\ & +\cdots+\nonumber \\ & +2a_{N-1}a_{N}\sigma_{(N-1)N}\nonumber \\ & =\sum_{i=1}^{N}a_{i}^{2}\sigma_{i}^{2}+2\sum_{i=1}^{N}\sum_{j>i}a_{i}a_{j}\sigma_{ij}.\nonumber \end{align}

The derivation of these results is very similar to the bivariate case and so is omitted. In addition, if all of the X_{i} are normally distributed then Z is also normally distributed with mean \mu_{Z} and variance \sigma_{Z}^{2} as described above.
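A minimal numerical sketch of Proposition 2.10 for N=3, using illustrative weights, means, variances, and covariances (none of these numbers come from the text):

```python
# Minimal sketch: evaluate (2.56) and (2.57) for N = 3 with illustrative inputs.
import numpy as np

a = np.array([0.5, 0.3, 0.2])             # weights a_1, a_2, a_3
mu = np.array([0.05, 0.03, 0.01])         # means mu_i
Sigma = np.array([[0.10, 0.02, 0.01],     # variances on the diagonal,
                  [0.02, 0.05, 0.015],    # covariances off the diagonal
                  [0.01, 0.015, 0.08]])

# Mean of Z: sum_i a_i * mu_i, as in (2.56).
mu_Z = np.sum(a * mu)

# Variance of Z: sum_i a_i^2 sigma_i^2 + 2 sum_{i<j} a_i a_j sigma_ij,
# as in (2.57), written out with explicit loops.
var_Z = np.sum(a**2 * np.diag(Sigma))
N = len(a)
for i in range(N):
    for j in range(i + 1, N):
        var_Z += 2 * a[i] * a[j] * Sigma[i, j]

print(mu_Z, var_Z)
```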

The variance of a linear combination of N random variables contains N variance terms and N(N-1) covariance terms. For N=2,5,10 and 100 the number of covariance terms in \mathrm{var}(Z) is 2,\,20,\,90 and 9,900, respectively. Notice that when N is large there are many more covariance terms than variance terms in \mathrm{var}(Z).

The expression for \mathrm{var}(Z) is messy. It can be simplified using matrix algebra notation, as explained in detail in chapter 3. To preview, define the N\times1 vectors \mathbf{X}=(X_{1},\ldots,X_{N})^{\prime} and \mathbf{a}=(a_{1},\ldots,a_{N})^{\prime}. Then Z=\mathbf{a}^{\prime}\mathbf{X} and \mathrm{var}(Z)=\mathrm{var}(\mathbf{a}^{\prime}\mathbf{X})=\mathbf{a}^{\prime}\Sigma \mathbf{a}, where \Sigma is the N\times N covariance matrix, which is much more compact than (2.57).
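As a quick check of this preview, the quadratic form \mathbf{a}^{\prime}\Sigma\mathbf{a} reproduces the variance obtained from the double-sum formula in the sketch above, using the same illustrative inputs.

```python
# Minimal sketch: var(Z) via the matrix form a' Sigma a, using the same
# illustrative weights and covariance matrix as the previous sketch.
import numpy as np

a = np.array([0.5, 0.3, 0.2])
Sigma = np.array([[0.10, 0.02, 0.01],
                  [0.02, 0.05, 0.015],
                  [0.01, 0.015, 0.08]])

var_Z_matrix = a @ Sigma @ a     # a' Sigma a
print(var_Z_matrix)              # matches the double-sum result from (2.57)
```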

Example 2.53 (Square-root-of-time rule for multi-period continuously compounded returns)

Let r_{t} denote the continuously compounded monthly return on an asset at times t=1,\ldots,12. Assume that r_{1},\ldots,r_{12} are independent and identically distributed (iid) N(\mu,\sigma^{2}). Recall that the annual continuously compounded return is equal to the sum of twelve monthly continuously compounded returns: r_{A}=r(12)=\sum_{t=1}^{12}r_{t}. Since the monthly returns are independent and normally distributed, their sum, the annual return, is also normally distributed. The mean of r(12) is: \begin{align*} E[r(12)] & =E\left[\sum_{t=1}^{12}r_{t}\right]\\ & =\sum_{t=1}^{12}E[r_{t}]\textrm{ (by linearity of expectation)},\\ & =\sum_{t=1}^{12}\mu\textrm{ (by identical distributions)},\\ & =12\cdot\mu. \end{align*} Hence, the expected 12-month (annual) return is equal to 12 times the expected monthly return. The variance of r(12) is: \begin{align*} \mathrm{var}(r(12)) & =\mathrm{var}\left(\sum_{t=1}^{12}r_{t}\right)\\ & =\sum_{t=1}^{12}\mathrm{var}(r_{t})\textrm{ (by independence)},\\ & =\sum_{t=1}^{12}\sigma^{2}\textrm{ (by identical distributions)},\\ & =12\cdot\sigma^{2}, \end{align*} so that the annual variance is also equal to 12 times the monthly variance. Hence, the annual standard deviation is \sqrt{12} times the monthly standard deviation: \mathrm{sd}(r(12))=\sqrt{12}\sigma (this result is known as the square-root-of-time rule). Therefore, r(12)\sim N(12\mu,12\sigma^{2}).

\blacksquare
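The square-root-of-time rule is easy to confirm by simulation. The sketch below draws iid monthly returns with illustrative values of \mu and \sigma and checks that the simulated annual returns have mean close to 12\mu, variance close to 12\sigma^{2}, and standard deviation close to \sqrt{12}\sigma.

```python
# Minimal sketch: simulate iid N(mu, sigma^2) monthly returns and check the
# square-root-of-time rule for annual returns (illustrative mu and sigma).
import numpy as np

rng = np.random.default_rng(42)
mu, sigma = 0.01, 0.05                       # monthly mean and volatility
monthly = rng.normal(mu, sigma, size=(100_000, 12))
annual = monthly.sum(axis=1)                 # r(12) = sum of 12 monthly returns

print(annual.mean(), 12 * mu)                # ~ 12 * mu
print(annual.var(), 12 * sigma**2)           # ~ 12 * sigma^2
print(annual.std(), np.sqrt(12) * sigma)     # ~ sqrt(12) * sigma
```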

2.3.6 Covariance between linear combinations of random variables

Consider the linear combinations of two random variables: \begin{align*} Y & =aX_{1}+bX_{2},\\ Z & =cX_{3}+dX_{4}, \end{align*} where a,b,c and d are constants. The covariance between Y and Z is, \begin{align*} \mathrm{cov}(Y,Z) & =\mathrm{cov}(aX_{1}+bX_{2},cX_{3}+dX_{4})\\ & =E[((aX_{1}+bX_{2})-(a\mu_{1}+b\mu_{2}))((cX_{3}+dX_{4})-(c\mu_{3}+d\mu_{4}))]\\ & =E[(a(X_{1}-\mu_{1})+b(X_{2}-\mu_{2}))(c(X_{3}-\mu_{3})+d(X_{4}-\mu_{4}))]\\ & =acE[(X_{1}-\mu_{1})(X_{3}-\mu_{3})]+adE[(X_{1}-\mu_{1})(X_{4}-\mu_{4})]\\ & +bcE[(X_{2}-\mu_{2})(X_{3}-\mu_{3})]+bdE[(X_{2}-\mu_{2})(X_{4}-\mu_{4})]\\ & =ac\mathrm{cov}(X_{1},X_{3})+ad\mathrm{cov}(X_{1},X_{4})+bc\mathrm{cov}(X_{2},X_{3})+bd\mathrm{cov}(X_{2},X_{4}). \end{align*} Hence, the covariance between two linear combinations is the coefficient-weighted sum of the covariances between the underlying random variables; that is, covariance is additive (bilinear) for linear combinations of random variables. The result above extends in an obvious way to arbitrary linear combinations of random variables.
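A minimal simulation sketch of this result: draw (X_{1},X_{2},X_{3},X_{4}) from an illustrative four-variable normal distribution and compare the sample covariance of Y and Z with the formula ac\,\sigma_{13}+ad\,\sigma_{14}+bc\,\sigma_{23}+bd\,\sigma_{24}.

```python
# Minimal sketch: verify the covariance formula for Y = a*X1 + b*X2 and
# Z = c*X3 + d*X4 on simulated data (illustrative coefficients and
# covariance matrix for (X1, X2, X3, X4)).
import numpy as np

rng = np.random.default_rng(7)
Sigma = np.array([[1.0, 0.2, 0.3, 0.1],
                  [0.2, 1.5, 0.25, 0.05],
                  [0.3, 0.25, 2.0, 0.4],
                  [0.1, 0.05, 0.4, 1.2]])
X = rng.multivariate_normal(np.zeros(4), Sigma, size=500_000)
a, b, c, d = 0.6, 0.4, 0.7, 0.3

Y = a * X[:, 0] + b * X[:, 1]
Z = c * X[:, 2] + d * X[:, 3]

# Sample covariance vs. ac*sigma_13 + ad*sigma_14 + bc*sigma_23 + bd*sigma_24.
cov_sample = np.cov(Y, Z)[0, 1]
cov_formula = (a * c * Sigma[0, 2] + a * d * Sigma[0, 3]
               + b * c * Sigma[1, 2] + b * d * Sigma[1, 3])
print(cov_sample, cov_formula)
```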