# 8 📝 Bivariate Discrete Random Variables

## 8.1 Joint Probability Functions

**Definition 5.1 **Let \(X\) and \(Y\) be a pair of discrete random variables

Their **joint probability mass function** (joint PMF) expresses the
probability that simultaneously \(X\) takes on the specific
value \(x\) and \(Y\) takes on the specific value \(y\).

The joint PMF has two essential properties:

The value of the Joint PMF is always non-negative \[p_{X,Y}\left( x,y\right) \geq 0 \text{ for all possible pairs }\left(x,y\right)\]

The sum over all combinations of \(x\) and \(y\) values is equal to one \[\sum_{x}\sum_{y}\Pr ( \left\{ X=x\cap Y=y\right\}) =1\]

**Definition 8.1 (Marginal probability Mass functions)**The probability (mass) function of the

*discrete*random variable \(X\) is called its marginal probability (mass) function. It is obtained by summing the joint probabilities relating to pairs \((X,Y)\) over all possible values of \(Y\): \[\begin{equation*} p_{X}(x)=\sum_{y}p_{X,Y}(x,y). \end{equation*}\]

Similarly, the probability (mass) function of the discrete random variable \(Y\) is called its marginal probability (mass) function. It is obtained by summing the joint probabilities relating to pairs \((X,Y)\) over all possible values of \(X\): \[\begin{equation*} p_{Y}(y)=\sum_{x}p_{X,Y}(x,y). \end{equation*}\]

**Example 5.2 (caplets) **Two caplets are selected at random from a bottle containing three
aspirins, two sedatives and two placebo caplets. We are assuming that the caplets are well mixed and that each has an equal chance of being selected.

Let \(X\) and \(Y\) denote, respectively, the numbers of aspirin caplets, and the number of sedative caplets, included among the two caplets drawn from the bottle.

Number of sets of 2 caplets out of 7:

\[{7 \choose 2} = \frac{7!}{2!\times (7-2)!} = \frac{7!}{2! \times 5!} = \frac{6\times 7}{2} = 3\times 7 = 21\]

The Joint Probabilities can be found with Combinatorial Formulae:

- \(p_{X,Y}(0, 0) = {3 \choose 0}{2 \choose 0}{2 \choose 2} \left/ 21 \right. = 1/21\)
- \(p_{X,Y}(1, 0) = {3 \choose 1}{2 \choose 0}{2 \choose 1} \left/ 21 \right. = 6/21\)
- \(p_{X,Y}(2, 0) = {3 \choose 2}{2 \choose 0}{2 \choose 0} \left/ 21 \right. = 3/21\)
- \(p_{X,Y}(0, 1) = {3 \choose 0}{2 \choose 1}{2 \choose 1} \left/ 21 \right. = 4/21\)
- \(p_{X,Y}(1, 1) = {3 \choose 1}{2 \choose 1}{2 \choose 0} \left/ 21 \right. = 6/21\)
- \(p_{X,Y}(2, 1) = 0\) since \(2+1 = 3 > 2\)
- \(p_{X,Y}(0, 2) = {3 \choose 0}{2 \choose 2}{2 \choose 0} \left/ 21 \right. = 1/21\)
- \(p_{X,Y}(1, 2) = 0\) since \(1+2 = 3 > 2\)
- \(p_{X,Y}(2, 2) = 0\) since \(2+2 = 4 > 2\)

Notice that we can deduce the following Joint PMF: \[P(X=x \cap Y=y) = p(x, y) = \left\{ \begin{array}{ll} {3 \choose x}{2 \choose y}{2 \choose 2-x-y} \left/{7 \choose 2}\right. & x+y\leq 2\\ 0 &otherwise\end{array}\right.\]

Tabulating the *joint* probabilities as follows,
we can easily work out the marginal
probabilities

\(x\) | \(0\) | \(1\) | \(2\) | \(\Pr \left\{ Y=y\right\}\) | |
---|---|---|---|---|---|

\(y\) | |||||

\(0\) | \(1/21\) | \(6/21\) | \(3/21\) | \(10/21\) | |

\(1\) | \(4/21\) | \(6/21\) | \(0\) | \(10/21\) | |

\(2\) | \(1/21\) | \(0\) | \(0\) | \(1/21\) | |

\(\Pr\left\{X=x\right\}\) | \(6/21\) | \(12/21\) | \(3/21\) | \(1\) |

**Example 5.3 (Empirical Example) **Two production lines manufacture a certain type of item.
Suppose that the capacity (on any given day) is 5 items for *Line I* and 3
items for *Line II*.

Assume that the number of items actually produced by either production line varies from one day to the next.

Let \((X,Y)\) represent the 2-dimensional random variable yielding the number of items produced by and , respectively, on any one day.

In practical applications of this type the joint probability (mass) function \(\Pr(\{X=x\cap Y=y\})\) is unknown more often than not!

The joint probability (mass) function \(\Pr(\{X=x\cap Y=y\})\) for all possible values of \(x\) and \(y\) can be approximated however.

By the observing the long-run relative frequency with which different numbers of items are actually produced by either production line.

\(x\) | \(0\) | \(1\) | \(2\) | \(3\) | \(4\) | \(5\) | \(\Pr \left\{ Y=y\right\}\) | |
---|---|---|---|---|---|---|---|---|

\(y\) | ||||||||

\(0\) | \(0\) | \(0.01\) | \(0.03\) | \(0.05\) | \(0.07\) | \(0.09\) | \(0.25\) | |

\(1\) | \(0.01\) | \(0.02\) | \(0.04\) | \(0.05\) | \(0.06\) | \(0.08\) | \(0.26\) | |

\(2\) | \(0.01\) | \(0.03\) | \(0.05\) | \(0.05\) | \(0.05\) | \(0.06\) | \(0.25\) | |

\(3\) | \(0.01\) | \(0.02\) | \(0.04\) | \(0.06\) | \(0.06\) | \(0.05\) | \(0.24\) | |

${ X=x} $ | \(0.03\) | \(0.08\) | \(0.16\) | \(0.21\) | \(0.24\) | \(0.28\) | \(1\) |

## 8.2 Conditional Probability

Recall that the *conditional* probability mass function of the discrete

random variable \(Y\), **given** that the random variable \(X\) takes the
value \(x\), is given by:
\[\begin{equation*}
p_{Y|X}\left( y|x\right) =\frac{\Pr \left\{ X=x\cap Y=y\right\} }{%
P_{X}\left( X=x\right) }
\end{equation*}\]

Note this is a probability mass function for \(y,\) with \(x\) viewed as fixed. Similarly:

**Definition 8.2**The

*conditional*probability mass function of the

*discrete*random variable \(X\),

**given**that the random variable \(Y\) takes the value \(y\), is given by: \[\begin{equation*} p_{X|Y}\left( x|y\right) =\frac{\Pr \left\{ X=x\cap Y=y\right\}}{ P_{Y}\left( Y=y\right)} \end{equation*}\]

Note this is a probability mass function for \(x,\) with \(y\) viewed as fixed.

### 8.2.1 Independence

Two discrete random variables \(X\) and \(Y\) are

**independent**if \[\begin{eqnarray*} p_{X,Y}(x,y) &=&p_{X}(x)p_{Y}(y)\qquad \qquad \text{(discrete)} \\ %f_{X,Y}(x,y) &=&f_{X}(x)f_{Y}(y)\qquad \qquad \text{(continuous)} \end{eqnarray*}\] for values of \(x\) and \(y.\)Note that independence also implies that \[\begin{eqnarray*} p_{X|Y}(x|y) &=&p_{X}(x)\text{ and }p_{Y|X}(y|x)=p_{Y}(y)\qquad \text{ (discrete)} \\ %f_{X|Y}(x|y) &=&f_{X}(x)\text{ and }f_{Y|X}(y|x)=f_{Y}(y)\qquad \text{ %(continuous)} \end{eqnarray*}\] for values of \(x\) and \(y\).

**Example 1.1**

## 8.3 Expectations

**Definition 5.2**Let \(h(x,y)\) be a function of \(x\) and \(y\). We define the

**expected value**of \(h\left( X,Y\right)\) as \[\begin{equation*} E\left[ h\left( X,Y\right) \right] =\sum_{y}\sum_{x}h\left( x,y\right) p_{X,Y}\left( x,y\right) \end{equation*}\]

**Example 4.2**

**Definition 8.3**The

**conditional expectation**of \(h\left( X,Y\right)\) \(Y=y\) is defined as: \[\begin{equation*} E\left[ h\left( X,Y\right) |y\right] =\sum_{x}h\left( x,y\right) p_{X|Y}\left( x|y\right). \end{equation*}\]

Equivalently, the **conditional expectation }of \(h\left( X,Y\right)\) \(X=x\) is defined as: \[\begin{equation*} E\left[ h\left( X,Y\right) |x\right] =\sum_{y}h\left( x,y\right) p_{Y|X}\left( y|x\right). \end{equation*}\]

**Example 4.4**

### 8.3.1 Iterated Expectations

**Definition 8.4 (law of the iterated expectation)**The

*law of the iterated expectation*is often stated in the form: \[\begin{equation*} E[h(X,Y)]=E[E[h(X,Y)|Y]]=E[E[h(X,Y)|X]] \end{equation*}\]

This notation emphasises that whenever we write down \(E[\cdot]\) for an expectation we are taking that expectation with respect to the distribution implicit in the formulation of the argument.

The above formula is perhaps more easily understood using the more explicit notation: \[\begin{align*} E_{(X,Y)}[h(X,Y)]&=E_{(Y)}[E_{(X|Y)}[h(X,Y)]]\\ &=E_{(X)}[E_{(Y|X)}[h(X,Y)]] \end{align*}\]

This notation makes it clear what distribution is being used to evaluate the expectation, the joint, the marginal or the conditional.

## 8.4 Covariance and Correlation

**Definition 8.5**Let \(X\) and \(Y\) be two discrete random variables. The

**covariance} between \(X\) and**}\(Y\) is given by$ E$ when \[\begin{equation*} h\left(X,Y\right) = \left(X-E\left[ X\right] \right) \left( Y-E\left[ Y\right] \right), \end{equation*}\] i.e. \(Cov\left(X,Y\right) =E\left[ \left(X-E\left[ X\right] \right) \left(Y-E\left[ Y\right] \right) \right]\)

Alternative formula^{2} for \(Cov(X,Y)\) is
\[\begin{equation}
\boxed{Cov\left( X,Y\right) =E\left[ XY\right] -E\left[ X\right] E\left[ Y\right]\ .} \label{Cov}
\end{equation}\]

So, to compute the covariance from a table describing the joint behaviour of \(X\) and \(Y\), you have to:

- compute the joint expectation \(E[XY]\)—you get it making use of the joint probability;
- compute \(E[X]\) and \(E[Y]\)—you get using the marginal probability for \(X\) and \(Y\);
- combine these expected values as in formula ().

See example on page 13 for an illustrative computation.

### 8.4.1 Some Properties of Covariances

The Cauchy-Schwartz Inequality states \[(E\left[ XY\right])^2\leq E\left[ X^2\right]E\left[ Y^2\right],\] with equality if, and only if, \(\Pr(Y=cX)=1\) for some constant \(c\).

Let \(h(a)=E[(Y-aX)^2]\) where \(a\) is any number. Then \[0\leq h(a)=E[(Y-aX)^2]=E[X^2]a^2-2E[XY]a+E[Y^2]\,.\]

This is a quadratic in \(a\), and

- if \(h(a)>0\) the roots are real and \(4(E[XY])^2-4E[X^2]E[Y^2]<0\),
- if \(h(a)=0\) for some \(a=c\) then \(E[(Y-cX)^2]=0\), which implies that \(\Pr(Y-cX=0)=1\).

*Remark.*If two random variables are independent, their covariance is equal to zero. Note that the converse is not necessarily true: a zero covariance between two random variables does not imply that the variables are independent. This asymmetry

^{3}follows because **the covariance is a `measure’ of linear dependence.}

**Example 8.1**Let us consider two discrete random variable \(X\) and \(Y\), such that \[P(\{X=0\})=P(\{X=1\})=P(\{X=-1\})=\frac{1}{3},\] while \(Y=0\) if \(X\neq 0\) and \(Y=1\), if \(X=0\). So we have \(E[X]=0\) and \(XY=0\). This implies \[Cov(X,Y) = E[XY] -E[X]E[Y] =0,\] although \(X\) and \(Y\) are NOT independent: they are related in a nonlinear way.

Building on this remark, we have \(Cov(X,Y)>0\) if

large values of \(X\) tend to be associated with large values of \(Y\)

small values of \(X\) tend to be associated with small values of \(Y\)

\(Cov(X,Y)<0\) if

large values of \(X\) tend to be associated with values of \(Y\)

small values of \(X\) tend to be associated with values of \(Y\)

When \(Cov(X,Y)=0\), \(X\) and \(Y\) are said to be uncorrelated.

If \(X\) and \(Y\) are two random variables (either discrete or continuous) with \(Cov(X,Y) \neq 0\), then: \[\begin{equation} Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X,Y) \label{FullVar} \end{equation}\]

, where we read that in the case of independent random variables \(X\) and \(Y\) we have \[Var(X + Y) = Var(X) + Var(Y),\] which trivially follows from ()—indeed, for independent random variables, \(Cov(X,Y)\equiv 0\).

- The covariance depends upon the unit of measurement.

### 8.4.2 A remark

- If we scale \(X\) and \(Y\), the covariance changes: For \(a,b>0\)% \[\begin{equation*} Cov\left( aX,bY\right) =abCov\left( X,Y\right) \end{equation*}\]

Thus, we introduce the **correlation** between \(X\) and \(Y\) is
\[\begin{equation*}
corr\left( X,Y\right) =\frac{Cov\left( X,Y\right) }{\sqrt{Var\left( X\right)
Var\left( Y\right) }}
\end{equation*}\]

which depend upon the unit of measurement.