2.3 Multivariate Distributions

Multivariate distributions are used to characterize the joint distribution of a collection of N random variables X_{1},X_{2},\ldots,X_{N} for N>1. The mathematical formulation of this joint distribution can be quite complex and typically makes use of matrix algebra. Here, we summarize some basic properties of multivariate distributions without the use of matrix algebra. In chapter 3, we show how matrix algebra can greatly simplify the description of multivariate distributions.

2.3.1 Discrete random variables

Let X_{1},X_{2},\ldots,X_{N} be N discrete random variables with sample spaces S_{X_{1}},S_{X_{2}},\ldots,S_{X_{N}}. The likelihood that these random variables take values in the joint sample space S_{X_{1}}\times S_{X_{2}}\times\cdots\times S_{X_{N}} is given by the joint probability function: p(x_{1},x_{2},\ldots,x_{N})=\Pr(X_{1}=x_{1},X_{2}=x_{2},\ldots,X_{N}=x_{N}). For N>2 it is not easy to represent the joint probabilities in a table like Table 2.3 or to visualize the distribution.

Marginal distributions for each variable X_{i} can be derived from the joint distribution as in (2.26) by summing the joint probabilities over the other variables j\neq i. For example, p(x_{1})=\sum_{x_{2}\in S_{X_{2}},\ldots,x_{N}\in S_{X_{N}}}p(x_{1},x_{2},\ldots,x_{N}). With N random variables, there are numerous conditional distributions that can be formed. For example, the distribution of X_{1} given X_{2}=x_{2},\ldots,X_{N}=x_{N} is determined using: \Pr(X_{1}=x_{1}|X_{2}=x_{2},\ldots,X_{N}=x_{N})=\frac{\Pr(X_{1}=x_{1},X_{2}=x_{2},\ldots,X_{N}=x_{N})}{\Pr(X_{2}=x_{2},\ldots,X_{N}=x_{N})}. Similarly, the joint distribution of X_{1} and X_{2} given X_{3}=x_{3},\ldots,X_{N}=x_{N} is given by: \begin{align*} \Pr(X_{1}=x_{1},X_{2}=x_{2}|X_{3}=x_{3},\ldots,X_{N}=x_{N}) \\ =\frac{\Pr(X_{1}=x_{1},X_{2}=x_{2},\ldots,X_{N}=x_{N})}{\Pr(X_{3}=x_{3},\ldots,X_{N}=x_{N})}. \end{align*}
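To make these summations concrete, the following Python sketch computes a marginal and a conditional distribution from a hypothetical joint pmf for N=3 binary random variables, stored as a 3-dimensional array (the probabilities themselves are illustrative, not from the text).

```python
# Minimal sketch: marginal and conditional pmfs from a hypothetical joint pmf
# of three discrete random variables, each with sample space {0, 1}.
import numpy as np

rng = np.random.default_rng(0)
p = rng.random((2, 2, 2))
p /= p.sum()                          # normalize so the joint pmf sums to 1

# Marginal pmf of X1: sum the joint pmf over x2 and x3.
p_x1 = p.sum(axis=(1, 2))

# Conditional pmf of X1 given X2 = 1, X3 = 0:
# Pr(X1 = x1 | X2 = 1, X3 = 0) = p(x1, 1, 0) / Pr(X2 = 1, X3 = 0).
p_x1_given = p[:, 1, 0] / p[:, 1, 0].sum()

print(p_x1, p_x1.sum())               # marginal pmf sums to 1
print(p_x1_given, p_x1_given.sum())   # conditional pmf sums to 1
```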

2.3.2 Continuous random variables

Let X_{1},X_{2},\ldots,X_{N} be N continuous random variables each taking values on the real line. The joint pdf is a function f(x_{1},x_{2},\ldots,x_{N})\geq0 such that: \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}f(x_{1},x_{2},\ldots,x_{N})~dx_{1}~dx_{2}\cdots~dx_{N}=1. Joint probabilities of x_{11}\leq X_{1}\leq x_{12},x_{21}\leq X_{2}\leq x_{22},\ldots,x_{N1}\leq X_{N}\leq x_{N2} are computed by evaluating the integral: \begin{equation} \int_{x_{11}}^{x_{12}}\int_{x_{21}}^{x_{22}}\cdots\int_{x_{N1}}^{x_{N2}}f(x_{1},x_{2},\ldots,x_{N})~dx_{1}~dx_{2}\cdots~dx_{N}.\tag{2.53} \end{equation} For most multivariate distributions, the integral in (2.53) cannot be solved analytically and must be approximated numerically.
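As a minimal sketch of such a numerical approximation, the Python code below evaluates a probability of the form (2.53) for a bivariate normal joint pdf; the mean vector, covariance matrix, and integration limits are illustrative choices, not values from the text.

```python
# Approximate Pr(-1 <= X1 <= 1, 0 <= X2 <= 2) for an illustrative bivariate
# normal distribution by numerically integrating the joint pdf.
import numpy as np
from scipy import integrate
from scipy.stats import multivariate_normal

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])
f = multivariate_normal(mean=mu, cov=Sigma)

# dblquad integrates func(y, x): inner variable (x2) first, outer (x1) second.
prob, err = integrate.dblquad(lambda x2, x1: f.pdf([x1, x2]),
                              -1.0, 1.0,    # limits for x1 (outer integral)
                              0.0, 2.0)     # limits for x2 (inner integral)
print(prob, err)
```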

The marginal pdf for x_{i} is found by integrating the joint pdf with respect to the other variables. For example, the marginal pdf for x_{1} is found by solving: f(x_{1})=\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}f(x_{1},x_{2},\ldots,x_{N})~dx_{2}\cdots~dx_{N}. Conditional pdfs for a single random variable or a collection of random variables are defined in an analogous way.
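The sketch below illustrates this marginalization numerically for the same illustrative bivariate normal used above: integrating the joint pdf over x_{2} at a fixed point x_{1} recovers the exact N(0,1) marginal pdf of X_{1}.

```python
# Minimal sketch: recover the marginal pdf of X1 at a point by integrating
# the joint pdf over x2 (illustrative parameters; the exact marginal of X1
# here is N(0, 1), so the two printed numbers should agree closely).
import numpy as np
from scipy import integrate
from scipy.stats import multivariate_normal, norm

f = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, 0.5], [0.5, 2.0]])

x1 = 0.3
f_x1, _ = integrate.quad(lambda x2: f.pdf([x1, x2]), -np.inf, np.inf)
print(f_x1, norm.pdf(x1, loc=0.0, scale=1.0))  # numerical vs exact marginal
```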

2.3.3 Independence

N random variables X_{1},X_{2},\ldots,X_{N} are (mutually) independent if their joint distribution factors into the product of all of the marginal distributions: \begin{align*} p(x_{1},x_{2},\ldots,x_{N}) & =p(x_{1})p(x_{2})\cdots p(x_{N})\textrm{ for }X_{i}\textrm{ discrete},\\ f\left(x_{1},x_{2},\ldots,x_{N}\right) & =f(x_{1})f(x_{2})\cdots f(x_{N})\textrm{ for }X_{i}\textrm{ continuous}. \end{align*} In addition, if N random variables are independent then any functions of these random variables are also independent.
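The factorization in the discrete case can be illustrated directly: under independence the joint pmf is just the outer product of the marginal pmfs. The marginal probabilities below are hypothetical.

```python
# Minimal sketch: build a joint pmf of three independent discrete random
# variables as the product of hypothetical marginal pmfs.
import numpy as np

p1 = np.array([0.2, 0.8])           # marginal pmf of X1
p2 = np.array([0.5, 0.3, 0.2])      # marginal pmf of X2
p3 = np.array([0.1, 0.9])           # marginal pmf of X3

# Joint pmf p(x1, x2, x3) = p(x1) p(x2) p(x3), built via broadcasting.
joint = p1[:, None, None] * p2[None, :, None] * p3[None, None, :]
print(joint.sum())                               # sums to 1
print(np.allclose(joint.sum(axis=(1, 2)), p1))   # marginalizing recovers p1
```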

2.3.4 Dependence concepts

In general, it is difficult to define dependence concepts for collections of more than two random variables. Dependence is therefore typically defined only between pairs of random variables. Hence, pairwise covariance and correlation remain the key dependence measures when dealing with more than two random variables.

For N random variables X_{1},X_{2},\ldots,X_{N}, with mean values \mu_{i}=E[X_{i}] and variances \sigma_{i}^{2}=\mathrm{var}(X_{i}), the pairwise covariances and correlations are defined as: \begin{align*} \mathrm{cov}(X_{i},X_{j}) & =\sigma_{ij}=E[(X_{i}-\mu_{i})(X_{j}-\mu_{j})],\\ \mathrm{cor}(X_{i},X_{j}) & =\rho_{ij}=\frac{\sigma_{ij}}{\sigma_{i}\sigma_{j}}, \end{align*} for i\neq j. There are N(N-1)/2 distinct pairwise covariances and correlations. Often, these values are summarized using matrix algebra in an N\times N covariance matrix \Sigma and an N\times N correlation matrix \mathbf{C}: \begin{align} \Sigma & =\left(\begin{array}{cccc} \sigma_{1}^{2} & \sigma_{12} & \cdots & \sigma_{1N}\\ \sigma_{12} & \sigma_{2}^{2} & \cdots & \sigma_{2N}\\ \vdots & \vdots & \ddots & \vdots\\ \sigma_{1N} & \sigma_{2N} & \cdots & \sigma_{N}^{2} \end{array}\right),\tag{2.54}\\ \mathbf{C} & =\left(\begin{array}{cccc} 1 & \rho_{12} & \cdots & \rho_{1N}\\ \rho_{12} & 1 & \cdots & \rho_{2N}\\ \vdots & \vdots & \ddots & \vdots\\ \rho_{1N} & \rho_{2N} & \cdots & 1 \end{array}\right).\tag{2.55} \end{align} These matrices are formally defined in chapter 3.
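A minimal sketch of how these matrices are estimated from data: simulate observations on N=3 variables from an illustrative covariance matrix and form the sample analogues of (2.54) and (2.55).

```python
# Minimal sketch: sample covariance matrix (2.54) and correlation matrix
# (2.55) from simulated data for N = 3 variables (illustrative parameters).
import numpy as np

rng = np.random.default_rng(123)
Sigma = np.array([[1.0, 0.3, 0.1],
                  [0.3, 2.0, 0.4],
                  [0.1, 0.4, 1.5]])
X = rng.multivariate_normal(mean=np.zeros(3), cov=Sigma, size=100_000)

Sigma_hat = np.cov(X, rowvar=False)       # sample covariance matrix
C_hat = np.corrcoef(X, rowvar=False)      # sample correlation matrix
print(np.round(Sigma_hat, 2))
print(np.round(C_hat, 2))
```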

2.3.5 Linear combinations of N random variables

Many of the results for manipulating a collection of random variables generalize in a straightforward way to the case of more than two random variables. The details of the generalizations are not important for our purposes. However, the following results will be used repeatedly throughout the book.

Proposition 2.10 Let X_{1},X_{2},\ldots,X_{N} denote a collection of N random variables (discrete or continuous) with means \mu_{i}, variances \sigma_{i}^{2} and covariances \sigma_{ij}. Define the new random variable Z as a linear combination: Z=a_{1}X_{1}+a_{2}X_{2}+\cdots+a_{N}X_{N} where a_{1},a_{2},\ldots,a_{N} are constants. Then the following results hold: \begin{align} \mu_{Z} & =E[Z]=a_{1}E[X_{1}]+a_{2}E[X_{2}]+\cdots+a_{N}E[X_{N}]\tag{2.56}\\ & =\sum_{i=1}^{N}a_{i}E[X_{i}]=\sum_{i=1}^{N}a_{i}\mu_{i}.\nonumber \end{align} \begin{align} \sigma_{Z}^{2} & =\mathrm{var}(Z)=a_{1}^{2}\sigma_{1}^{2}+a_{2}^{2}\sigma_{2}^{2}+\cdots+a_{N}^{2}\sigma_{N}^{2}\tag{2.57}\\ & +2a_{1}a_{2}\sigma_{12}+2a_{1}a_{3}\sigma_{13}+\cdots+2a_{1}a_{N}\sigma_{1N}\nonumber \\ & +2a_{2}a_{3}\sigma_{23}+2a_{2}a_{4}\sigma_{24}+\cdots+2a_{2}a_{N}\sigma_{2N}\nonumber \\ & +\cdots+\nonumber \\ & +2a_{N-1}a_{N}\sigma_{(N-1)N}\nonumber \\ & =\sum_{i=1}^{N}a_{i}^{2}\sigma_{i}^{2}+2\sum_{i=1}^{N}\sum_{j>i}a_{i}a_{j}\sigma_{ij}.\nonumber \end{align}

The derivation of these results is very similar to the bivariate case and so is omitted. In addition, if all of the X_{i} are normally distributed then Z is also normally distributed with mean \mu_{Z} and variance \sigma_{Z}^{2} as described above.
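A minimal numerical sketch of Proposition 2.10 for N=3, using illustrative weights, means, variances, and covariances (none of these numbers come from the text):

```python
# Minimal sketch: evaluate (2.56) and (2.57) for N = 3 with illustrative inputs.
import numpy as np

a = np.array([0.5, 0.3, 0.2])             # weights a_1, a_2, a_3
mu = np.array([0.05, 0.03, 0.01])         # means mu_i
Sigma = np.array([[0.10, 0.02, 0.01],     # variances on the diagonal,
                  [0.02, 0.05, 0.015],    # covariances off the diagonal
                  [0.01, 0.015, 0.08]])

# Mean of Z: sum_i a_i * mu_i, as in (2.56).
mu_Z = np.sum(a * mu)

# Variance of Z: sum_i a_i^2 sigma_i^2 + 2 sum_{i<j} a_i a_j sigma_ij,
# as in (2.57), written out with explicit loops.
var_Z = np.sum(a**2 * np.diag(Sigma))
N = len(a)
for i in range(N):
    for j in range(i + 1, N):
        var_Z += 2 * a[i] * a[j] * Sigma[i, j]

print(mu_Z, var_Z)
```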

The variance of a linear combination of N random variables contains N variance terms and N(N-1) covariance terms. For N=2,5,10 and 100 the number of covariance terms in \mathrm{var}(Z) is 2,\,20,\,90 and 9,900, respectively. Notice that when N is large there are many more covariance terms than variance terms in \mathrm{var}(Z).

The expression for \mathrm{var}(Z) is messy. It can be simplified using matrix algebra notation, as explained in detail in chapter 3. To preview, define the N\times1 vectors \mathbf{X}=(X_{1},\ldots,X_{N})^{\prime} and \mathbf{a}=(a_{1},\ldots,a_{N})^{\prime}. Then Z=\mathbf{a}^{\prime}\mathbf{X} and \mathrm{var}(Z)=\mathrm{var}(\mathbf{a}^{\prime}\mathbf{X})=\mathbf{a}^{\prime}\Sigma \mathbf{a}, where \Sigma is the N\times N covariance matrix, which is much more compact than (2.57).
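As a quick check of this preview, the quadratic form \mathbf{a}^{\prime}\Sigma\mathbf{a} reproduces the variance obtained from the double-sum formula in the sketch above, using the same illustrative inputs.

```python
# Minimal sketch: var(Z) via the matrix form a' Sigma a, using the same
# illustrative weights and covariance matrix as the previous sketch.
import numpy as np

a = np.array([0.5, 0.3, 0.2])
Sigma = np.array([[0.10, 0.02, 0.01],
                  [0.02, 0.05, 0.015],
                  [0.01, 0.015, 0.08]])

var_Z_matrix = a @ Sigma @ a     # a' Sigma a
print(var_Z_matrix)              # matches the double-sum result from (2.57)
```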

Example 2.53 (Square-root-of-time rule for multi-period continuously compounded returns)

Let r_{t} denote the continuously compounded monthly return on an asset at times t=1,\ldots,12. Assume that r_{1},\ldots,r_{12} are independent and identically distributed (iid) N(\mu,\sigma^{2}). Recall that the annual continuously compounded return is equal to the sum of twelve monthly continuously compounded returns: r_{A}=r(12)=\sum_{t=1}^{12}r_{t}. Since the monthly returns are independent and normally distributed, their sum, the annual return, is also normally distributed. The mean of r(12) is: \begin{align*} E[r(12)] & =E\left[\sum_{t=1}^{12}r_{t}\right]\\ & =\sum_{t=1}^{12}E[r_{t}]\textrm{ (by linearity of expectation)},\\ & =\sum_{t=1}^{12}\mu\textrm{ (by identical distributions)},\\ & =12\cdot\mu. \end{align*} Hence, the expected 12-month (annual) return is equal to 12 times the expected monthly return. The variance of r(12) is: \begin{align*} \mathrm{var}(r(12)) & =\mathrm{var}\left(\sum_{t=1}^{12}r_{t}\right)\\ & =\sum_{t=1}^{12}\mathrm{var}(r_{t})\textrm{ (by independence)},\\ & =\sum_{t=1}^{12}\sigma^{2}\textrm{ (by identical distributions)},\\ & =12\cdot\sigma^{2}, \end{align*} so that the annual variance is also equal to 12 times the monthly variance. Hence, the annual standard deviation is \sqrt{12} times the monthly standard deviation: \mathrm{sd}(r(12))=\sqrt{12}\sigma (this result is known as the square-root-of-time rule). Therefore, r(12)\sim N(12\mu,12\sigma^{2}).

\blacksquare
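The square-root-of-time rule is easy to confirm by simulation. The sketch below draws iid monthly returns with illustrative values of \mu and \sigma and checks that the simulated annual returns have mean close to 12\mu, variance close to 12\sigma^{2}, and standard deviation close to \sqrt{12}\sigma.

```python
# Minimal sketch: simulate iid N(mu, sigma^2) monthly returns and check the
# square-root-of-time rule for annual returns (illustrative mu and sigma).
import numpy as np

rng = np.random.default_rng(42)
mu, sigma = 0.01, 0.05                       # monthly mean and volatility
monthly = rng.normal(mu, sigma, size=(100_000, 12))
annual = monthly.sum(axis=1)                 # r(12) = sum of 12 monthly returns

print(annual.mean(), 12 * mu)                # ~ 12 * mu
print(annual.var(), 12 * sigma**2)           # ~ 12 * sigma^2
print(annual.std(), np.sqrt(12) * sigma)     # ~ sqrt(12) * sigma
```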

2.3.6 Covariance between linear combinations of random variables

Consider the linear combinations of two random variables: \begin{align*} Y & =aX_{1}+bX_{2},\\ Z & =cX_{3}+dX_{4}, \end{align*} where a,b,c and d are constants. The covariance between Y and Z is, \begin{align*} \mathrm{cov}(Y,Z) & =\mathrm{cov}(aX_{1}+bX_{2},cX_{3}+dX_{4})\\ & =E[((aX_{1}+bX_{2})-(a\mu_{1}+b\mu_{2}))((cX_{3}+dX_{4})-(c\mu_{3}+d\mu_{4}))]\\ & =E[(a(X_{1}-\mu_{1})+b(X_{2}-\mu_{2}))(c(X_{3}-\mu_{3})+d(X_{4}-\mu_{4}))]\\ & =acE[(X_{1}-\mu_{1})(X_{3}-\mu_{3})]+adE[(X_{1}-\mu_{1})(X_{4}-\mu_{4})]\\ & +bcE[(X_{2}-\mu_{2})(X_{3}-\mu_{3})]+bdE[(X_{2}-\mu_{2})(X_{4}-\mu_{4})]\\ & =ac\mathrm{cov}(X_{1},X_{3})+ad\mathrm{cov}(X_{1},X_{4})+bc\mathrm{cov}(X_{2},X_{3})+bd\mathrm{cov}(X_{2},X_{4}). \end{align*} Hence, the covariance between two linear combinations is the coefficient-weighted sum of the covariances between the underlying random variables; that is, covariance is additive (bilinear) for linear combinations of random variables. The result above extends in an obvious way to arbitrary linear combinations of random variables.
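A minimal simulation sketch of this result: draw (X_{1},X_{2},X_{3},X_{4}) from an illustrative four-variable normal distribution and compare the sample covariance of Y and Z with the formula ac\,\sigma_{13}+ad\,\sigma_{14}+bc\,\sigma_{23}+bd\,\sigma_{24}.

```python
# Minimal sketch: verify the covariance formula for Y = a*X1 + b*X2 and
# Z = c*X3 + d*X4 on simulated data (illustrative coefficients and
# covariance matrix for (X1, X2, X3, X4)).
import numpy as np

rng = np.random.default_rng(7)
Sigma = np.array([[1.0, 0.2, 0.3, 0.1],
                  [0.2, 1.5, 0.25, 0.05],
                  [0.3, 0.25, 2.0, 0.4],
                  [0.1, 0.05, 0.4, 1.2]])
X = rng.multivariate_normal(np.zeros(4), Sigma, size=500_000)
a, b, c, d = 0.6, 0.4, 0.7, 0.3

Y = a * X[:, 0] + b * X[:, 1]
Z = c * X[:, 2] + d * X[:, 3]

# Sample covariance vs. ac*sigma_13 + ad*sigma_14 + bc*sigma_23 + bd*sigma_24.
cov_sample = np.cov(Y, Z)[0, 1]
cov_formula = (a * c * Sigma[0, 2] + a * d * Sigma[0, 3]
               + b * c * Sigma[1, 2] + b * d * Sigma[1, 3])
print(cov_sample, cov_formula)
```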