Chapter 4 Conditional distributions

Nike and Adidas have opened neighboring sports stores that are in direct competition with each other. Let \(S_{\text{Nike}}\) be the total sales in the Nike store, and let \(S_{\text{Adidas}}\) be the total sales in the Adidas store. Phil Knight, the co-founder of Nike, would like to know the sales of the Adidas store in order to make business decisions regarding his own store. However, Phil is not privy to this information; he knows only the exact sales of his own Nike store. Given the information about \(S_{\text{Nike}}\), what can be said about \(S_{\text{Adidas}}\)?

4.1 Conditional Probabilities and Discrete Conditional Distributions

The problem described above concerning sales in the Nike and Adidas stores is set in a continuous setting; that is, \(S_{\text{Nike}}\) and \(S_{\text{Adidas}}\) are both continuous random variables. We have encountered this type of conditional problem before in the context of probabilities of events. In this section, we recall that theory and extend it to discrete random variables; the continuous case is treated in Section 4.2.

Definition 4.1.1 (Conditional probability). Consider two events \(A\) and \(B\) such that \(P(B)>0\). The probability of event \(A\) happening given that \(B\) has already happened is called the conditional probability of \(A\) given \(B\), and is denoted \(P(A \mid B)\). The conditional probability can be calculated by the formula \[ P(A \mid B) = \frac{P(A \cap B)}{P(B)}.\]


Why does Definition 4.1.1 specify that \(P(B)>0\)? The reason has both a real-world interpretation and a mathematical one. The statement \(P(B)=0\) says that \(B\) is an impossible event. If \(B\) were an impossible event, then the assumption in Definition 4.1.1 that \(B\) has already happened would be a contradiction. Mathematically, if \(P(B)=0\), then we are unable to divide by \(P(B)\) in the formula \(P(A \mid B) = \frac{P(A \, \cap \, B)}{P(B)}\). Therefore we must specify that \(P(B)>0\).

Example 4.1.2. Consider the cafe from Example 3.1.2, where the owner has collected data to determine the joint probabilities of the temperature \(X\) in degrees Celsius during winter and the number of customers \(Y\) in the cafe each day. The joint probability table is

\[\begin{array}{c|ccc} p_{X,Y}(x,y) & y=15 & y=75 & y=150 \\ \hline x=0 & 0.07 & 0.11 & 0.01 \\ x=10 & 0.23 & 0.43 & 0.05 \\ x=20 & 0.04 & 0.05 & 0.01 \end{array}\]

The owner would like to decide how many staff to put on shift today. Given that the temperature is going to be \(20^\circ C\), what is the probability that there are \(75\) customers?



In mathematical language, the question asks us to calculate \(P(Y=75 \mid X=20)\). Using the formula of Definition 4.1.1, we know that \(P(Y=75 \mid X=20) = \frac{P(X=20 \, \cap \, Y=75)}{P(X=20)}\). First calculate the marginal probability

\[\begin{align*} P(X=20) &= P(X=20, Y=15) + P(X=20, Y=75) + P(X=20, Y=150) \\[3pt] &= 0.04 + 0.05 + 0.01 \\[3pt] &= 0.1 \end{align*}\]

Therefore

\[\begin{align*} P(Y=75 \mid X=20) &= \frac{P(X=20 \cap Y=75)}{P(X=20)} \\[3pt] &= \frac{0.05}{0.1} \\[3pt] &= \frac{1}{2} \end{align*}\]
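This arithmetic is easy to check by machine. Below is a minimal Python sketch, where the dictionary `joint` is simply the table above keyed by \((x,y)\) pairs:

```python
# Joint PMF of temperature X and customer count Y, keyed by (x, y),
# taken from the joint probability table above.
joint = {
    (0, 15): 0.07, (0, 75): 0.11, (0, 150): 0.01,
    (10, 15): 0.23, (10, 75): 0.43, (10, 150): 0.05,
    (20, 15): 0.04, (20, 75): 0.05, (20, 150): 0.01,
}

# Marginal P(X = 20): sum the joint PMF over all values of y.
p_x20 = sum(p for (x, y), p in joint.items() if x == 20)

# Conditional P(Y = 75 | X = 20) = P(X = 20, Y = 75) / P(X = 20).
print(p_x20)                    # approximately 0.1 (up to floating-point rounding)
print(joint[(20, 75)] / p_x20)  # approximately 0.5
```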


Can we extend this idea of conditional probability to the language of discrete random variables?

Recall that for a random variable \(X\), the probability \(P(X=x)\) is given by the evaluation of the probability mass function \(p_X(x)\) of \(X\). Similarly for two random variables \(X\) and \(Y\), the probability \(P(X=x, \, Y=y)\) is given by the evaluation of the joint probability mass function \(p_{X,Y}(x,y)\). Converting Definition 4.1.1 into these terms leads us to the following.

Definition 4.1.3 (Conditional PMF). Consider two discrete random variables \(X\) and \(Y\). Let \(p_{X,Y}(x,y)\) be the joint PMF of \(X\) and \(Y\), and \(p_Y(y)\) be the marginal PMF of \(Y\). The conditional probability mass function of \(X\) given \(Y\), denoted \(p_{X|Y}(x \mid y)\), is the function

\[p_{X|Y}(x|y) = \begin{cases} \frac{p_{X,Y}(x,y)}{p_Y(y)}, & \text{if } p_Y(y)>0, \\[3pt] 0, & \text{if } p_Y(y)=0. \end{cases}\]

The conditional PMF \(p_{Y|X}(y \mid x)\) is defined similarly.

The value \(p_{X|Y}(x|y)\) is equal to the probability \(P(X=x \mid Y=y)\) whenever \(p_Y(y)>0\).

Example 4.1.4. Calculate the conditional PMF of \(Y\) given \(X\) for the random variables given in Example 4.1.2.



Applying Definition 4.1.3, we calculate that

\[\begin{align*} p_{Y|X}(15|0) &= \frac{p_{X,Y}(0,15)}{p_X(0)} = \frac{0.07}{0.07 + 0.11 + 0.01} = 0.368, \\[5pt] p_{Y|X}(75|0) &= \frac{p_{X,Y}(0,75)}{p_X(0)} = \frac{0.11}{0.19} = 0.579, \\[5pt] p_{Y|X}(150|0) &= \frac{p_{X,Y}(0,150)}{p_X(0)} = \frac{0.01}{0.19} = 0.053, \\[5pt] p_{Y|X}(15|10) &= \frac{p_{X,Y}(10,15)}{p_X(10)} = \frac{0.23}{0.23 + 0.43 + 0.05} = 0.324, \\[5pt] p_{Y|X}(75|10) &= \frac{p_{X,Y}(10,75)}{p_X(10)} = \frac{0.43}{0.71} = 0.606, \\[5pt] p_{Y|X}(150|10) &= \frac{p_{X,Y}(10,150)}{p_X(10)} = \frac{0.05}{0.71} = 0.070, \\[5pt] p_{Y|X}(15|20) &= \frac{p_{X,Y}(20,15)}{p_X(20)} = \frac{0.04}{0.04 + 0.05 + 0.01} = 0.4, \\[5pt] p_{Y|X}(75|20) &= \frac{p_{X,Y}(20,75)}{p_X(20)} = \frac{0.05}{0.1} = 0.5, \\[5pt] p_{Y|X}(150|20) &= \frac{p_{X,Y}(20,150)}{p_X(20)} =\frac{0.01}{0.1} = 0.1, \end{align*}\]
where the values are rounded to three decimal places.
Note that the calculation of \(p_{Y|X}(75|20)\) is consistent with the solution of Example 4.1.2.


Let \(y\) be a fixed outcome of the random variable \(Y\) with \(p_Y(y)>0\). The conditional probabilities \(p_{X|Y}(x|y)\) summed over all values of \(x\) equal \(1\), since these exhaust the possible outcomes of \(X\). This can be verified for the solution to Example 4.1.4 (with the roles of \(X\) and \(Y\) exchanged), for example:

\[p_{Y|X}(15|0) + p_{Y|X}(75|0) + p_{Y|X}(150|0) = 0.368 + 0.579 + 0.053 = 1.\]

Check that the sums of the values \(p_{Y|X}(y \mid 10)\) and \(p_{Y|X}(y \mid 20)\) over all possible outcomes \(y\) also equal \(1\).
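Both the table of Example 4.1.4 and these normalization checks can be reproduced in a few lines. Here is a minimal numpy sketch, assuming rows are indexed by the outcomes of \(X\) and columns by the outcomes of \(Y\):

```python
import numpy as np

# Joint PMF: rows are x in {0, 10, 20}, columns are y in {15, 75, 150}.
joint = np.array([[0.07, 0.11, 0.01],
                  [0.23, 0.43, 0.05],
                  [0.04, 0.05, 0.01]])

p_x = joint.sum(axis=1)             # marginal PMF of X: [0.19, 0.71, 0.1]
p_y_given_x = joint / p_x[:, None]  # row for each x: p_{Y|X}(y|x)

print(p_y_given_x.round(3))         # matches Example 4.1.4
print(p_y_given_x.sum(axis=1))      # every row sums to 1
```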

Definition 4.1.5 (Conditional CDF). Consider two discrete random variables \(X\) and \(Y\). The conditional CDF of \(X\) given \(Y=y\) is \[F_{X|Y}(x|y) = \sum_{x' \leq x} p_{X|Y}(x'|y).\]

It follows from Definition 4.1.5 that \(F_{X|Y}(x|y) = P(X \leq x|Y=y)\).
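Continuing the numpy sketch above, the conditional CDF is just a cumulative sum of the conditional PMF along a row:

```python
# Conditional CDF of Y given X = 10: cumulative sum of row 1 of p_y_given_x.
F_y_given_10 = p_y_given_x[1].cumsum()
print(F_y_given_10)  # [F(15|10), F(75|10), F(150|10)]; the last entry is 1
```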

4.2 Continuous Conditional Distributions

This theory cannot be extended to the continuous case directly, since for a continuous random variable \(Y\) and any fixed value \(y\) one has \(P(Y=y)=0\), so the formula of Definition 4.1.1 would require division by zero. Instead, the conditional density is defined directly in terms of the joint and marginal densities.

Consider two continuous random variables \(X\) and \(Y\). Let \(f_{X,Y}\) be the joint p.d.f. of \(X\) and \(Y\), and \(f_Y(y)\) be the marginal p.d.f. of \(Y\).

The conditional probability density function of \(X\) given that \(Y=y\) is defined by

\[f_{X|Y}(x|y) = \begin{cases} \frac{f_{X,Y}(x,y)}{f_Y(y)}, & \text{if } f_Y(y)>0, \\[5pt] 0, & \text{otherwise.} \end{cases}\]

Consider two continuous random variables \(X\) and \(Y\). The conditional CDF of \(X\) given \(Y=y\) is defined by

\[ F_{X|Y}(x|y) = P(X \leq x|Y=y) = \int_{-\infty}^x f_{X|Y}(u|y) \,du.\]

Note that conditional p.d.f.’s are themselves probability density functions and thus have all the associated properties.

Consider two random variables \(X,Y\) whose joint p.d.f. is

\[ f_{X,Y}(x,y) = \begin{cases} 24x(1-x-y), & \text{if } x,y \geq 0 \text{ and } x+y \leq 1, \\[5pt] 0, & \text{otherwise.} \end{cases}\]

Find the conditional p.d.f. of \(X\) given \(Y=\frac{1}{2}\).

In Example 2.2.3, we found

\[f_Y(y) = \begin{cases} 4(1-y)^3, & 0 \leq y \leq 1, \\[3pt] 0, & \text{otherwise.} \end{cases}\]
Therefore, by the definition of the conditional p.d.f.,
\[\begin{align*} f_{X|Y}(x|y) &= \frac{f_{X,Y}(x,y)}{f_Y(y)} \\[5pt] &= \begin{cases} \frac{24x(1-x-y)}{4(1-y)^3}, & \text{if } x,y \geq 0 \text{ and } x+y \leq 1, \\[3pt] 0, & \text{otherwise.} \end{cases} \end{align*}\]
Setting \(y=\frac{1}{2}\), we obtain
\[\begin{align*} f_{X|Y}\left(x \bigg| \frac{1}{2}\right) &= \frac{f_{X,Y}(x,\frac{1}{2})}{f_Y(\frac{1}{2})} \\[5pt] &= \begin{cases} \frac{24x \left( \frac{1}{2}-x \right)}{4 \left( \frac{1}{2} \right)^3}, & \text{if } 0 \leq x \leq \frac{1}{2}, \\[3pt] 0, & \text{otherwise,} \end{cases} \\[5pt] &= \begin{cases} 48x \left( \frac{1}{2}-x \right) , & \text{if } 0 \leq x \leq \frac{1}{2}, \\[3pt] 0, & \text{otherwise.} \end{cases} \end{align*}\]
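The integrals in this example can be verified symbolically. Below is a minimal sketch using sympy (the variable names `f_xy`, `f_y` and `f_cond` are ours):

```python
import sympy as sp

x, y = sp.symbols('x y', positive=True)
f_xy = 24 * x * (1 - x - y)

# Marginal of Y: integrate out x over 0 <= x <= 1 - y.
f_y = sp.factor(sp.integrate(f_xy, (x, 0, 1 - y)))
print(f_y)  # equivalent to 4*(1 - y)**3

# Conditional pdf of X given Y = 1/2, and a check that it integrates to 1.
f_cond = sp.expand((f_xy / f_y).subs(y, sp.Rational(1, 2)))
print(f_cond)                                           # 24*x - 48*x**2, i.e. 48*x*(1/2 - x)
print(sp.integrate(f_cond, (x, 0, sp.Rational(1, 2))))  # 1
```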


4.3 Independence

Recall Definition 2.5.1, which states that two random variables \(X,Y\) are independent if \(f_{X,Y}(x,y) = f_{X}(x)f_{Y}(y)\) for all \(x,y \in \mathbb{R}\). It follows that for independent continuous random variables \(X\) and \(Y\), for any \(y\) such that \(f_Y(y)>0\):

\[\begin{align*} f_{X|Y}(x|y) &= \frac{f_{X,Y}(x,y)}{f_Y(y)} \\[5pt] &= \frac{f_X(x)f_Y(y)}{f_Y(y)} \\[5pt] &= f_X(x), \qquad \forall\,x\in\mathbb R. \end{align*}\]

This is to say that the conditional probability density function \(f_{X|Y}\) is equal to the marginal probability density function \(f_X\), and does not depend on \(y\) at all. This makes sense: if \(X\) and \(Y\) are independent, then knowing the outcome of \(Y\) provides no information about \(X\).
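For instance, suppose \(X\) and \(Y\) are independent exponential random variables with joint p.d.f. \(f_{X,Y}(x,y) = e^{-(x+y)}\) for \(x,y>0\). Then \(f_Y(y)=e^{-y}\), and for any \(y>0\),

\[f_{X|Y}(x|y) = \frac{e^{-(x+y)}}{e^{-y}} = e^{-x}, \qquad x>0,\]

which is exactly the marginal p.d.f. of \(X\), whatever the observed value of \(y\).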

4.4 Conditional Expectation

Let \(X\) and \(Y\) be random variables. One can ask for the expectation of \(X\), that is, the average value of \(X\) over infinitely many trials, given the outcome of \(Y\).

The conditional expectation of \(X\) given \(Y=y\) is defined by

\[E[X|Y=y] = \begin{cases} \sum\limits_x xp_{X|Y}(x|y), & \text{if $X$ is discrete,} \\[3pt] \int_{-\infty}^\infty xf_{X|Y}(x|y) \,dx, & \text{if $X$ is continuous.} \end{cases} \]
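For example, using the conditional PMF found in Example 4.1.4, the expected number of customers in the cafe on a \(20^\circ C\) day is

\[E[Y|X=20] = 15(0.4) + 75(0.5) + 150(0.1) = 58.5.\]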

Many of the properties of expectation such as linearity are inherited by conditional expectation.

Consider two continuous random variables \(X,Y\) whose joint p.d.f. is

\[ f_{X,Y}(x,y)= \begin{cases} \frac{1}{y} e^{-(\frac{x}{y}+y)}, & 0<x,y<\infty, \\[3pt] 0, & \text{otherwise.}\end{cases} \]

For some fixed value \(y>0\), find \(E[X|Y=y]\).

Calculating the marginal p.d.f. of \(Y\):

\[\begin{align*} f_Y(y) &= \int_{-\infty}^\infty f_{X,Y}(x,y) \,dx \\[3pt] &= \int_0^\infty \frac{1}{y}e^{-(\frac{x}{y}+y)} \,dx \\[3pt] &= e^{-y}\int_0^\infty \frac{1}{y}e^{-x/y} \,dx \\[3pt] &= e^{-y}, \end{align*}\]
since the remaining integrand is the p.d.f. of an exponential random variable with mean \(y\), and so integrates to \(1\).

Hence, for \(y>0\), the conditional probability density function is

\[\begin{align*} f_{X|Y}(x|y) &= \frac{f_{X,Y}(x,y)}{f_Y(y)} \\[5pt] &= \begin{cases} \frac{1}{y}e^{-x/y}, & \text{if } x>0,\\[3pt] 0, & \text{if } x \leq 0. \end{cases} \end{align*}\]

Therefore, by the definition of conditional expectation, the conditional expectation of \(X\) given \(Y=y\) is

\[\begin{align*} E[X|Y=y] &= \int_{-\infty}^\infty x f_{X|Y}(x|y) \,dx \\[3pt] &= \int_0^\infty\frac{x}{y}e^{-x/y} \,dx \\[3pt] &=y. \end{align*}\]
The final integral can be evaluated by integration by parts, or recognised as the mean of the exponential distribution with p.d.f. \(\frac{1}{y}e^{-x/y}\).
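As in Section 4.2, this computation can be verified symbolically; a minimal sympy sketch:

```python
import sympy as sp

x, y = sp.symbols('x y', positive=True)
f_xy = sp.exp(-(x / y + y)) / y

# Marginal of Y: integrate out x over (0, infinity).
f_y = sp.integrate(f_xy, (x, 0, sp.oo))
print(sp.simplify(f_y))  # exp(-y)

# Conditional expectation E[X | Y = y]: integral of x * f_{X|Y}(x|y).
E = sp.integrate(x * f_xy / f_y, (x, 0, sp.oo))
print(sp.simplify(E))    # y
```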

4.5 Conditional Variance