Chapter 4 Conditional Distributions

Nike and Adidas have opened up new neighboring sports stores which have found themselves in direct competition with each other. Let \(S_{\text{Nike}}\) be the total sales in the Nike store, and let \(S_{\text{Adidas}}\) be the total sales in the Adidas store. Phil Knight, the co-founder of Nike, would like to know the sales of the Adidas store in order to make business decisions regarding his own store. However, Phil is not privy to this information; he knows only the exact sales of the Nike store. Given the information about \(S_{\text{Nike}}\), what can be said about \(S_{\text{Adidas}}\)?

4.1 Conditional Probabilities and Discrete Conditional Distributions

The problem described above concerning sales in the Nike and Adidas stores is in a continuous setting; that is, \(S_{\text{Nike}}\) and \(S_{\text{Adidas}}\) are both continuous random variables. We have already encountered this type of conditional problem in the context of probabilities of events. In this section, we recall that theory and extend it to discrete random variables.

Definition 4.1.1. Consider two events \(A\) and \(B\) such that \(P(B)>0\). The probability of event \(A\) happening given that \(B\) has already happened is called the conditional probability of \(A\) given \(B\), and is denoted \(P(A \mid B)\). The conditional probability can be calculated by the formula \[ P(A \mid B) = \frac{P(A \cap B)}{P(B)}.\]

Why does Definition 4.1.1 specify that \(P(B)>0\)? The reason has both a real-world interpretation and a mathematical one. The statement that \(P(B)=0\) is interpreted as saying that \(B\) is an impossible event. If \(B\) were impossible, then the assumption in Definition 4.1.1 that \(B\) has already happened would be a contradiction. Mathematically, if \(P(B)=0\), then we are unable to divide by \(P(B)\) in the formula \(P(A \mid B) = \frac{P(A \, \cap \, B)}{P(B)}\). Therefore we must specify that \(P(B)>0\).

Example 4.1.2. Consider the cafe from Example 3.1.2, where the owner has collected data to determine the joint probabilities of the temperature \(X\) in degrees Celsius during winter and the number of customers \(Y\) in the cafe each day. The joint probability table is

\[\begin{array}{c|ccc}
p_{X,Y}(x,y) & y=15 & y=75 & y=150 \\ \hline
x=0 & 0.07 & 0.11 & 0.01 \\
x=10 & 0.23 & 0.43 & 0.05 \\
x=20 & 0.04 & 0.05 & 0.01
\end{array}\]

The owner would like to make a decision about how many staff are on shift today. Given that it is going to be \(20^\circ C\), what is the probability that there are \(75\) customers?



In mathematical language, the question is asking us to calculate \(P(Y=75 \mid X=20)\). Using the formula of Definition 4.1.1, we know that \(P(Y=75 \mid X=20) = \frac{P(X=20 \, \cap \, Y=75)}{P(X=20)}\). First we calculate

\[\begin{align*} P(X=20) &= P(X=20, Y=15) + P(X=20, Y=75) + P(X=20, Y=150) \\[3pt] &= 0.04 + 0.05 + 0.01 \\[3pt] &= 0.1 \end{align*}\]

Therefore

\[\begin{align*} P(Y=75 \mid X=20) &= \frac{P(X=20 \cap Y=75)}{P(X=20)} \\[3pt] &= \frac{0.05}{0.1} \\[3pt] &= \frac{1}{2} \end{align*}\]
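This arithmetic can be checked with a few lines of Python; the following is a minimal sketch (the variable names are our own):

```python
# P(X=20) is the sum of the joint probabilities in the X=20 row of the table
p_x20 = 0.04 + 0.05 + 0.01          # = 0.10
p_x20_and_y75 = 0.05                # joint probability P(X=20, Y=75)

# Conditional probability P(Y=75 | X=20); requires P(X=20) > 0
p_y75_given_x20 = p_x20_and_y75 / p_x20
print(p_y75_given_x20)              # 0.5
```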

Can we extend this idea of conditional probability to the language of discrete random variables?

Recall that for a random variable \(X\), the probability \(P(X=x)\) is given by the evaluation of the probability mass function \(p_X(x)\) of \(X\). Similarly for two random variables \(X\) and \(Y\), the probability \(P(X=x, \, Y=y)\) is given by the evaluation of the joint probability mass function \(p_{X,Y}(x,y)\). Converting Definition 4.1.1 into these terms leads us to the following.

Definition 4.1.3. Consider two discrete random variables \(X\) and \(Y\). Let \(p_{X,Y}(x,y)\) be the joint PMF of \(X\) and \(Y\), and \(p_Y(y)\) be the marginal PMF of \(Y\). The conditional probability mass function of \(X\) given that \(Y=y\), denoted \(p_{X|Y}(x \mid y)\), is the function

\[p_{X|Y}(x|y) = \begin{cases} \frac{p_{X,Y}(x,y)}{p_Y(y)}, & \text{if } p_Y(y)>0, \\[3pt] 0, & \text{if } p_Y(y)=0. \end{cases}\]

The conditional PMF \(p_{Y|X}(y \mid x)\) is defined similarly.

The value \(p_{X|Y}(x|y)\) is equal to the probability \(P(X=x \mid Y=y)\).

Example 4.1.4. Calculate the conditional PMF of \(Y\) given \(X\) for the random variables given in Example 4.1.2.



Applying Definition 4.1.3, we calculate that

\[\begin{align*} p_{Y|X}(15|0) &= \frac{p_{X,Y}(0,15)}{p_X(0)} = \frac{0.07}{0.07 + 0.11 + 0.01} \approx 0.368, \\[5pt] p_{Y|X}(75|0) &= \frac{p_{X,Y}(0,75)}{p_X(0)} = \frac{0.11}{0.19} \approx 0.579, \\[5pt] p_{Y|X}(150|0) &= \frac{p_{X,Y}(0,150)}{p_X(0)} = \frac{0.01}{0.19} \approx 0.053, \\[5pt] p_{Y|X}(15|10) &= \frac{p_{X,Y}(10,15)}{p_X(10)} = \frac{0.23}{0.23 + 0.43 + 0.05} \approx 0.324, \\[5pt] p_{Y|X}(75|10) &= \frac{p_{X,Y}(10,75)}{p_X(10)} = \frac{0.43}{0.71} \approx 0.606, \\[5pt] p_{Y|X}(150|10) &= \frac{p_{X,Y}(10,150)}{p_X(10)} = \frac{0.05}{0.71} \approx 0.070, \\[5pt] p_{Y|X}(15|20) &= \frac{p_{X,Y}(20,15)}{p_X(20)} = \frac{0.04}{0.04 + 0.05 + 0.01} = 0.4, \\[5pt] p_{Y|X}(75|20) &= \frac{p_{X,Y}(20,75)}{p_X(20)} = \frac{0.05}{0.1} = 0.5, \\[5pt] p_{Y|X}(150|20) &= \frac{p_{X,Y}(20,150)}{p_X(20)} =\frac{0.01}{0.1} = 0.1. \end{align*}\]
Note that the calculation of \(p_{Y|X}(75|20)\) is consistent with the solution of Example 4.1.2.
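These hand calculations can also be reproduced programmatically. The following minimal Python sketch (the dictionary layout and names are our own) builds \(p_{Y|X}\) from the joint table and confirms that each conditional PMF sums to \(1\):

```python
# Joint PMF p_{X,Y}(x, y): x = temperature, y = number of customers
joint = {
    (0, 15): 0.07, (0, 75): 0.11, (0, 150): 0.01,
    (10, 15): 0.23, (10, 75): 0.43, (10, 150): 0.05,
    (20, 15): 0.04, (20, 75): 0.05, (20, 150): 0.01,
}

xs = sorted({x for x, _ in joint})
ys = sorted({y for _, y in joint})

for x in xs:
    p_x = sum(joint[(x, y)] for y in ys)            # marginal p_X(x)
    cond = {y: joint[(x, y)] / p_x for y in ys}     # p_{Y|X}(y | x)
    print(x, cond, "sum =", round(sum(cond.values()), 10))
```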

Let \(y\) be some fixed outcome for the random variable \(Y\) with \(p_Y(y)>0\). The conditional probabilities \(p_{X|Y}(x|y)\), summed over all values of \(x\), equal \(1\), since these values form a complete set of possible outcomes for \(X\). This can be verified for the solution to Example 4.1.4; for example:

\[p_{Y|X}(15|0) + p_{Y|X}(75|0) + p_{Y|X}(150|0) = 0.368 + 0.579 + 0.053 = 1.\]

Check that \(p_{Y|X}(15 \mid 10) + p_{Y|X}(75 \mid 10) + p_{Y|X}(150 \mid 10)\) and \(p_{Y|X}(15 \mid 20) + p_{Y|X}(75 \mid 20) + p_{Y|X}(150 \mid 20)\) also both equal \(1\).

Definition 4.1.5. Consider two discrete random variables \(X\) and \(Y\). The conditional CDF of \(X\) given \(Y=y\) is \[F_{X|Y}(x|y) = \sum_{x' \leq x} p_{X|Y}(x'|y).\]

It follows from Definition 4.1.5 that \(F_{X|Y}(x|y) = P(X \leq x|Y=y)\).
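For instance, using the conditional PMF \(p_{Y|X}(\cdot \mid 20)\) from Example 4.1.4 (with the roles of \(X\) and \(Y\) swapped in Definition 4.1.5), the probability of at most \(75\) customers on a \(20^\circ C\) day is \[F_{Y|X}(75 \mid 20) = p_{Y|X}(15 \mid 20) + p_{Y|X}(75 \mid 20) = 0.4 + 0.5 = 0.9.\]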

4.2 Continuous Conditional Distributions

Consider the Nike versus Adidas example at the opening of the chapter. This is a conditional probability problem but now in the continuous setting. For two continuous random variables \(X\) and \(Y\), the formula \(P(Y=y \mid X=x) = \frac{P(X=x \, \cap \, Y=y)}{P(X=x)}\) breaks down because \(P(X=x) = 0\) for a fixed value \(x\) since \(X\) is continuous.

The theory of discrete random variables in Section 4.1 motivates the definition of conditional random variables in the continuous setting.

Definition 4.2.1. Consider two continuous random variables \(X\) and \(Y\). Let \(f_{X,Y}\) be the joint PDF of \(X\) and \(Y\), and \(f_Y(y)\) be the marginal PDF of \(Y\). The conditional PDF of \(X\) given that \(Y=y\) is defined by

\[f_{X|Y}(x|y) = \begin{cases} \frac{f_{X,Y}(x,y)}{f_Y(y)}, & \text{if } f_Y(y)>0, \\[5pt] 0, & \text{if } f_Y(y)=0. \end{cases}\]

This definition avoids the problem above because the density value \(f_Y(y)\) is not the probability \(P(Y=y)\): the denominator \(f_Y(y)\) can be strictly positive even though \(P(Y=y)=0\).

Example 4.2.2. Consider the two random variables \(X,Y\) from Example 2.1.6 governing scores in a game played between Abbie and Bertie. The joint PDF is

\[ f_{X,Y}(x,y) = \begin{cases} 24x(1-x-y), & \text{if } x,y \geq 0 \text{ and } x+y \leq 1, \\[5pt] 0, & \text{otherwise.} \end{cases}\]

Find the conditional PDF of Abbie’s score given that Bertie scored \(\frac{1}{2}\).



In mathematical language, the question asks us to calculate \(f_{X \mid Y}\left( x \mid \frac{1}{2} \right)\). In Example 2.1.8, we found

\[f_Y(y) = \begin{cases} 4(1-y)^3, & 0 \leq y \leq 1, \\[3pt] 0, & \text{otherwise.} \end{cases}\]
Therefore by Definition 4.2.1,
\[\begin{align*} f_{X|Y}(x|y) &= \frac{f_{X,Y}(x,y)}{f_Y(y)} \\[5pt] &= \begin{cases} \frac{24x(1-x-y)}{4(1-y)^3}, & \text{if } x,y \geq 0 \text{ and } x+y \leq 1, \\[3pt] 0, & \text{otherwise.} \end{cases} \\[5pt] &= \begin{cases} \frac{6x(1-x-y)}{(1-y)^3}, & \text{if } x,y \geq 0 \text{ and } x+y \leq 1, \\[3pt] 0, & \text{otherwise.} \end{cases} \end{align*}\]
Setting \(y=\frac{1}{2}\), we obtain
\[\begin{align*} f_{X|Y}\left(x \bigg| \frac{1}{2}\right) &= \frac{f_{X,Y}(x,\frac{1}{2})}{f_Y(\frac{1}{2})} \\[5pt] &= \begin{cases} \frac{6x \left( \frac{1}{2}-x \right)}{\left( \frac{1}{2} \right)^3}, & \text{if } 0 \leq x \leq \frac{1}{2}, \\[3pt] 0, & \text{otherwise,} \end{cases} \\[5pt] &= \begin{cases} 48x \left( \frac{1}{2}-x \right) , & \text{if } 0 \leq x \leq \frac{1}{2}, \\[3pt] 0, & \text{otherwise.} \end{cases} \end{align*}\]

Let \(y\) be some fixed outcome for the random variable \(Y\) with \(f_Y(y)>0\). It can be shown that \(\int_{\mathbb{R}} \, f_{X \mid Y} (x \mid y) \,dx =1\). This is the continuous analogue of the result for discrete random variables that \(\sum\limits_{x} p_{X \mid Y} (x \mid y) =1\) for a fixed value \(y\).
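For the conditional PDF found in Example 4.2.2 this can be checked symbolically; a minimal SymPy sketch (assuming SymPy is available) is:

```python
import sympy as sp

x, y = sp.symbols('x y')

# Conditional PDF from Example 4.2.2 on its support 0 <= x <= 1 - y, for a fixed 0 <= y < 1
f_cond = 6 * x * (1 - x - y) / (1 - y) ** 3

# Integrating over the support gives 1, whatever the fixed value of y
print(sp.simplify(sp.integrate(f_cond, (x, 0, 1 - y))))   # 1
```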

Let \(S\) be a subset of \(\mathbb{R}\). What is the probability that the random variable \(X\) belongs to \(S\) given that \(Y=y\)? This can be calculated using the conditional PDF \(f_{X|Y}(x|y)\):

\[P(X \in S \mid Y=y) = \int_S f_{X \mid Y}(x \mid y) \, dx.\]

This leads us to the following definition.

Definition 4.2.3. Consider two continuous random variables \(X\) and \(Y\). The conditional CDF of \(X\) given \(Y=y\) is

\[ F_{X|Y}(x|y) = \int_{-\infty}^x f_{X|Y}(u|y) \,du.\]

It follows from Definition 4.2.3 that \(F_{X|Y}(x|y) = P(X \leq x|Y=y)\).
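For instance, in Example 4.2.2 the probability that Abbie scores at most \(\frac{1}{4}\), given that Bertie scored \(\frac{1}{2}\), is \[F_{X|Y}\left(\tfrac{1}{4} \,\Big|\, \tfrac{1}{2}\right) = \int_{0}^{1/4} 48u\left(\tfrac{1}{2}-u\right) du = \left[ 12u^2 - 16u^3 \right]_0^{1/4} = \tfrac{3}{4} - \tfrac{1}{4} = \tfrac{1}{2}.\]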

4.3 Independence

Recall from Definition 2.3.1 that two random variables \(X,Y\) are independent if
\[f_{X,Y}(x,y) = f_{X}(x)f_{Y}(y) \qquad \text{for all } x,y.\]

Substituting this identity into Definition 4.2.1 gives an equivalent condition for two random variables to be independent. Specifically, for any \(y\) such that \(f_Y(y)>0\):

\[\begin{align*} f_{X|Y}(x|y) &= \frac{f_{X,Y}(x,y)}{f_Y(y)} \\[5pt] &= \frac{f_X(x)f_Y(y)}{f_Y(y)} \\[5pt] &= f_X(x), \qquad \forall\,x\in\mathbb R. \end{align*}\]


That is, the conditional probability density function \(f_{X|Y}(x \mid y)\) is equal to the probability density function \(f_X(x)\) and does not depend on \(y\) at all. This makes sense: since \(X\) and \(Y\) are independent, information about the outcome of \(Y\) is irrelevant to the outcome of \(X\).

Example 4.3.1. Consider again the game played by Abbie and Bertie in Example 4.2.2, the scores of which are governed by the random variables \(X\) and \(Y\). We saw in Example 2.3.3 that \(X\) and \(Y\) are dependent. Verify this using the new equivalent condition for independence.



In Example 4.2.2, we calculated that the conditional PDF \(f_{X \mid Y}(x \mid y)\) is given by

\[f_{X|Y}(x|y) = \begin{cases} \frac{6x(1-x-y)}{(1-y)^3}, & \text{if } x,y \geq 0 \text{ and } x+y \leq 1, \\[3pt] 0, & \text{otherwise.} \end{cases}\]

From Example 2.1.8:

\[f_{X}(x) = \begin{cases} 12x(1-x)^2, & \text{if } 0 \leq x \leq 1, \\ 0, & \text{otherwise.} \end{cases}\]
Clearly \(f_{X|Y}(x|y) \neq f_{X}(x)\), so \(X\) and \(Y\) are dependent.
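As a quick numerical illustration (the evaluation points here are chosen arbitrarily): \[f_{X|Y}(0.2 \mid 0.1) = \frac{6(0.2)(0.7)}{(0.9)^3} \approx 1.152, \qquad f_{X|Y}(0.2 \mid 0.5) = \frac{6(0.2)(0.3)}{(0.5)^3} = 2.88, \qquad f_X(0.2) = 12(0.2)(0.8)^2 = 1.536.\] The conditional density changes as \(y\) changes and does not agree with the marginal density, so \(X\) and \(Y\) cannot be independent.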

Note the result of Example 4.3.1 could have been deduced solely from the fact that the expression for \(f_{X|Y}(x|y)\) contains \(y\): without looking at \(f_{X}(x)\) explicitly we know it will contain no reference to \(y\) and so cannot be equal to \(f_{X|Y}(x|y)\).

Similarly, independence is equivalent to the condition that \(f_{Y|X}(y|x) = f_{Y}(y)\). This criterion is deduced in an analogous fashion to the identity \(f_{X|Y}(x|y) = f_{X}(x)\) above.

4.4 Conditional Expectation

In the Nike versus Adidas example at the opening of the chapter, two random variables were considered: \(S_{\text{Nike}}\) and \(S_{\text{Adidas}}\). One piece of information that we could give to Phil Knight is the expectation of \(S_{\text{Adidas}}\). However, this doesn't take into account all the information at our disposal: we know the value that the random variable \(S_{\text{Nike}}\) takes. The value we really want to calculate is the expectation of \(S_{\text{Adidas}}\) given the value of \(S_{\text{Nike}}\).

Posing this question in a general setting motivates the following definition:

Definition 4.4.1. The conditional expectation of \(X\) given \(Y=y\) is defined by \[\mathbb{E}[X|Y=y] = \begin{cases} \sum\limits_x xp_{X|Y}(x|y), & \text{if $X$ is discrete,} \\[7pt] \int_{-\infty}^\infty xf_{X|Y}(x|y) \,dx, & \text{if $X$ is continuous,} \end{cases} \] where either \(p_{X|Y}(x|y)\) is the conditional PMF of \(X\) given \(Y\), or \(f_{X|Y}(x|y)\) is the conditional PDF of \(X\) given \(Y\).

Note that this definition covers both discrete and continuous random variables.
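For example, using the conditional PMF \(p_{Y|X}\) computed in Example 4.1.4, the expected number of customers in the cafe on a \(20^\circ C\) day is \[\mathbb{E}[Y \mid X=20] = 15(0.4) + 75(0.5) + 150(0.1) = 58.5.\]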

Example 4.4.2. You are seeking to buy a car, and want to estimate the cost. You find an article online that has collected data on cars listed across various sites. The article models the cost of the car by a random variable \(X\) and the age of the car by a random variable \(Y\). The probability of finding a car with \(X=x\) and \(Y=y\) is governed by the joint PDF \[ f_{X,Y}(x,y)= \begin{cases} \frac{1}{y} e^{-(\frac{x}{y}+y)}, & 0<x,y<\infty, \\[3pt] 0, & \text{otherwise.}\end{cases} \]

For some fixed age of car \(y>0\), find the expected cost.



Mathematically the question is asking us to calculate \(E[X|Y=y]\).

In general it is enough to consider only the region on which \(f_{X,Y}(x,y)\) is non-zero.

First we want to calculate \(f_{X \mid Y}(x \mid y)\). To do so, first calculate the marginal distribution \(f_Y(y)\):

\[\begin{align*} f_Y(y) &= \int_{-\infty}^\infty f_{X,Y}(x,y) \,dx \\[3pt] &= \int_0^\infty \frac{1}{y}e^{-(\frac{x}{y}+y)} \,dx \\[3pt] &= \left[ - e^{-\frac{x}{y}-y} \right]_{0}^{\infty} \\[3pt] &= e^{-y} \end{align*}\]

Hence, for \(y>0\), the conditional probability density function is

\[\begin{align*} f_{X|Y}(x|y) &= \frac{f_{X,Y}(x,y)}{f_Y(y)} \\[5pt] &= \begin{cases} \frac{\frac{1}{y}e^{-\left(\frac{x}{y}+y\right)}}{e^{-y}} & \text{if } x>0,\\[3pt] 0, & \text{if } x \leq 0. \end{cases} \\[5pt] &= \begin{cases} \frac{1}{y}e^{-\frac{x}{y}} & \text{if } x>0,\\[3pt] 0, & \text{if } x \leq 0. \end{cases} \end{align*}\]

Therefore by Definition 4.4.1 the conditional expectation of \(X\) is

\[\begin{align*} E[X|Y=y] &= \int_{-\infty}^\infty x f_{X|Y}(x|y) \,dx \\[3pt] &= \int_0^\infty\frac{x}{y}e^{-\frac{x}{y}} \,dx \\[3pt] &= \left[ -x e^{-\frac{x}{y}} \right]_{0}^{\infty} + \int_{0}^{\infty} e^{-\frac{x}{y}} \,dx \\[3pt] &= 0 + \left[ -ye^{-\frac{x}{y}} \right]_{0}^{\infty} \\[3pt] &=y. \end{align*}\] where the third equality follows from integration by parts.
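This conditional expectation can also be verified symbolically; a minimal SymPy sketch (assuming SymPy is available, with the symbols declared positive so that the improper integral converges) is:

```python
import sympy as sp

x, y = sp.symbols('x y', positive=True)

# Conditional PDF of cost given age, from above: f_{X|Y}(x|y) = (1/y) e^{-x/y} for x > 0
f_cond = sp.exp(-x / y) / y

# Conditional expectation E[X | Y = y]
print(sp.integrate(x * f_cond, (x, 0, sp.oo)))   # y
```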

Critique the model that the article uses in Example 4.4.2.

Many of the nice properties of expectation such as linearity are inherited by conditional expectation.

Let \(a\) be a real number, and \(X,Y,Z\) be random variables. Each of the following rules pertaining to conditional expectation given \(Z=z\) holds in general:

\[\begin{align*} \mathbb{E}[aX \mid Z=z] &= a \mathbb{E}[X \mid Z=z], \\[5pt] \mathbb{E}[X+Y \mid Z=z] &= \mathbb{E}[X \mid Z=z] + \mathbb{E}[Y \mid Z=z]. \end{align*}\]

Similarly to the case of joint expectation in Definition 3.1.1, the definition of conditional expectation can be extended to functions of random variables:

Theorem 4.4.4. Let \(X,Y\) be continuous random variables, with conditional PDF \(f_{X \mid Y}\). Then, for a function \(g(X)\) of \(X\), we have \[\mathbb{E}[g(X)\mid Y=y] = \int_{-\infty}^{\infty} g(x) f_{X\mid Y}(x\mid y) \,dx.\]

Example 4.4.5. Consider the random variables \(X\) and \(Y\) from Example 4.4.2. Calculate \(\mathbb{E}[X^2 \mid Y=y]\).



In Example 4.4.2, we found

\[\begin{align*} f_{X|Y}(x|y) &= \frac{f_{X,Y}(x,y)}{f_Y(y)} \\[5pt] &= \begin{cases} \frac{\frac{1}{y}e^{-\left(\frac{x}{y}+y\right)}}{e^{-y}} & \text{if } x>0,\\[3pt] 0, & \text{if } x \leq 0. \end{cases} \\[5pt] &= \begin{cases} \frac{1}{y}e^{-\frac{x}{y}} & \text{if } x>0,\\[3pt] 0, & \text{if } x \leq 0. \end{cases} \end{align*}\]

Therefore applying Theorem 4.4.4

\[\begin{align*} \mathbb{E} \left[ X^2 \mid Y=y \right] &= \int_{-\infty}^{\infty} x^2 f_{X\mid Y}(x \mid y) \,dx \\[5pt] &= \int_{0}^{\infty} \frac{x^2}{y} e^{- \frac{x}{y}} \,dx \\[5pt] &= \left[ -x^2 e^{-\frac{x}{y}} \right]_0^\infty + \int_0^\infty 2x e^{-\frac{x}{y}} \,dx \\[5pt] &= 2 \int_0^\infty x e^{-\frac{x}{y}} \,dx \\[5pt] &= 2 \left[ -xye^{-\frac{x}{y}} \right]_0^\infty + 2 \int_0^\infty y e^{-\frac{x}{y}} \,dx \\[5pt] &= 2 \left[ -y^2 e^{-\frac{x}{y}} \right]_0^\infty \\[5pt] &= 2y^2. \end{align*}\]

Note that integration by parts has been used twice in the evaluation of the integral.
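The same kind of SymPy check as before confirms this second moment (again a sketch, under the same assumptions):

```python
import sympy as sp

x, y = sp.symbols('x y', positive=True)
f_cond = sp.exp(-x / y) / y                          # conditional PDF f_{X|Y}(x|y), x > 0

# Conditional second moment E[X^2 | Y = y]
print(sp.integrate(x**2 * f_cond, (x, 0, sp.oo)))    # 2*y**2
```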

4.5 Conditional Variance

Considering again the opening Nike versus Adidas example, it would be sensible to provide Phil Knight with the variance of \(S_{\text{Adidas}}\). However, calculating the variance of \(S_{\text{Adidas}}\) alone doesn't utilise our knowledge of the random variable \(S_{\text{Nike}}\). What we really want to calculate is the variance of \(S_{\text{Adidas}}\) given the value of \(S_{\text{Nike}}\).

Moving this idea to the general setting leads to the following definition:

Definition 4.5.1. The conditional variance of \(X\) given \(Y=y\) is defined by \[\text{Var}(X \mid Y=y) = \mathbb{E} \left[ \big(X-\mathbb{E}[X \mid Y=y] \big)^2 \mid Y=y \right].\]

Note that both of the expectations that appear on the right hand side of Definition 4.5.1 are conditional expectations. Calculating the conditional variance using this expression can be computationally difficult. The following proposition offers an alternative method by which to calculate conditional variance.

Proposition 4.5.2. An equivalent definition of conditional variance is \[\text{Var}(X \mid Y=y) = \mathbb{E} [X^2 \mid Y=y] - \left( \mathbb{E}[X \mid Y=y] \right)^2.\]
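To see why, expand the square in Definition 4.5.1 and use the linearity of conditional expectation, writing \(\mu = \mathbb{E}[X \mid Y=y]\) (a constant once \(y\) is fixed): \[\mathbb{E}\left[ (X-\mu)^2 \mid Y=y \right] = \mathbb{E}[X^2 \mid Y=y] - 2\mu\,\mathbb{E}[X \mid Y=y] + \mu^2 = \mathbb{E}[X^2 \mid Y=y] - \mu^2.\]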

The right hand side of Proposition 4.5.2 can be calculated using Theorem 4.4.4.

Consider again the random variables from Example 4.4.2. Calculate the conditional variance of \(X\) given \(Y\).



From Example 4.4.2 and Example 4.4.5, we know

\[\begin{align*} \mathbb{E}[X \mid Y=y] &= y, \\[3pt] \mathbb{E}[X^2 \mid Y=y] &= 2y^2. \end{align*}\]

Therefore by Proposition 4.5.2:

\[\begin{align*} \text{Var}(X \mid Y=y) &= \mathbb{E}[X^2 \mid Y=y ] - \left( \mathbb{E}[X \mid Y=y] \right)^2 \\[3pt] &= 2y^2 - y^2 \\[3pt] &= y^2. \end{align*}\]

There is an interesting result linking the variance of a random variable \(X\) to an expression in terms of the conditional expectation and variance of \(X\) given a second random variable \(Y\). This result is known as the Law of Total Variance:

Let \(X,Y\) be two random variables. Then \[\text{Var}(X) = \mathbb{E}\big[ \text{Var}(X \mid Y) \big] + \text{Var} \big( \mathbb{E}[X \mid Y] \big),\] where \(\text{Var}(X \mid Y)\) and \(\mathbb{E}[X \mid Y]\) denote the random variables obtained by evaluating \(\text{Var}(X \mid Y=y)\) and \(\mathbb{E}[X \mid Y=y]\) at the random outcome \(Y\).

The Law of Total Variance states that the variance of \(X\) decomposes as the sum of the expected variance of \(X\) given \(Y\) and the variance of the expectation of \(X\) given \(Y\). This is a particularly powerful result that will be explored further in MATH1058: Statistical Models and Methods.
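As a quick illustration, consider again the car-pricing model of Example 4.4.2. The marginal PDF \(f_Y(y)=e^{-y}\) for \(y>0\) is the PDF of an Exponential\((1)\) random variable, for which \(\mathbb{E}[Y^2]=2\) and \(\text{Var}(Y)=1\). Since \(\mathbb{E}[X \mid Y=y]=y\) and \(\text{Var}(X \mid Y=y)=y^2\), the Law of Total Variance gives \[\text{Var}(X) = \mathbb{E}\big[\text{Var}(X \mid Y)\big] + \text{Var}\big(\mathbb{E}[X \mid Y]\big) = \mathbb{E}[Y^2] + \text{Var}(Y) = 2 + 1 = 3.\]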