In this section, we consider further the joint behaviour of two random variables $X$ and $Y$; in particular, we study the conditional distribution of one random variable given the other. We start with discrete random variables and then move on to continuous random variables.
Conditional distribution
Recall that for any two events $E$ and $F$ such that $P(F)>0$, we defined in Section 4.6 that
$$ P(E \mid F) = \frac{P(E \cap F)}{P(F)}. $$
Can we extend this idea to random variables?
Conditional p.m.f.
If $X$ and $Y$ are discrete random variables, the conditional probability mass function of $X$ given $Y=y$ is
$$ p_{X|Y}(x|y) = P(X=x \mid Y=y) = \begin{cases} \dfrac{p_{X,Y}(x,y)}{p_Y(y)}, & \text{if } p_Y(y) > 0, \\ 0, & \text{otherwise,} \end{cases} $$
where $p_{X,Y}(x,y)$ is the joint p.m.f. of $X$ and $Y$, and $p_Y(y)$ is the marginal p.m.f. of $Y$.
Note that
- Conditional probabilities are non-negative:
$$ P(X=x \mid Y=y) = \frac{p_{X,Y}(x,y)}{p_Y(y)} \ge 0. $$
- The sum of conditional probabilities over all values of $x$, for some fixed value of $y$, is 1:
$$ \sum_x P(X=x \mid Y=y) = \sum_x \frac{p_{X,Y}(x,y)}{p_Y(y)} = \frac{1}{p_Y(y)} \sum_x p_{X,Y}(x,y) = \frac{1}{p_Y(y)} \, p_Y(y) = 1. $$
This implies that $p_{X|Y}(x|y)$ is itself a p.m.f.
Conditional c.d.f. (discrete random variable)
If $X$ and $Y$ are discrete random variables, the conditional (cumulative) probability distribution function of $X$ given $Y=y$ is
$$ F_{X|Y}(x|y) = P(X \le x \mid Y=y) = \sum_{x' \le x} p_{X|Y}(x'|y). $$
Suppose the joint p.m.f. of X and Y is given by the following probability table.
|       | $y=0$ | $y=1$ | $y=2$ | $y=3$ |
|-------|-------|-------|-------|-------|
| $x=0$ | 0     | 1/42  | 2/42  | 3/42  |
| $x=1$ | 2/42  | 3/42  | 4/42  | 5/42  |
| $x=2$ | 4/42  | 5/42  | 6/42  | 7/42  |
Determine the conditional p.m.f. of Y given X=1.
$$ p_{Y|X}(y \mid x=1) = \frac{p_{X,Y}(1,y)}{p_X(1)} = \frac{p_{X,Y}(1,y)}{14/42}, $$
since $p_X(1) = \sum_y p_{X,Y}(1,y) = \frac{2+3+4+5}{42} = \frac{14}{42}$.
The conditional p.m.f. of Y given X=1 is therefore
$$ p_{Y|X}(y \mid x=1) = \begin{cases} \dfrac{2/42}{14/42} = \dfrac{2}{14}, & \text{if } y=0, \\[4pt] \dfrac{3/42}{14/42} = \dfrac{3}{14}, & \text{if } y=1, \\[4pt] \dfrac{4/42}{14/42} = \dfrac{4}{14}, & \text{if } y=2, \\[4pt] \dfrac{5/42}{14/42} = \dfrac{5}{14}, & \text{if } y=3. \end{cases} $$
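As a quick check, here is a minimal Python sketch (using numpy, with the probability table above hard-coded) that recovers this conditional p.m.f.:

```python
import numpy as np

# Joint p.m.f. from the table above: rows are x = 0, 1, 2; columns are y = 0, 1, 2, 3.
joint = np.array([[0, 1, 2, 3],
                  [2, 3, 4, 5],
                  [4, 5, 6, 7]]) / 42

p_x1 = joint[1, :].sum()      # marginal p.m.f. of X at x = 1: 14/42
cond = joint[1, :] / p_x1     # conditional p.m.f. of Y given X = 1
print(cond)                   # [2/14, 3/14, 4/14, 5/14]
print(cond.sum())             # 1.0, confirming it is a p.m.f.
```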
We cannot extend this idea to the continuous case directly since, for a continuous random variable $Y$ and any fixed value $y$, we have $P(Y=y)=0$.
Conditional p.d.f.
If $X$ and $Y$ have a joint p.d.f. $f_{X,Y}$, then the conditional probability density function of $X$, given that $Y=y$, is defined by
$$ f_{X|Y}(x|y) = \begin{cases} \dfrac{f_{X,Y}(x,y)}{f_Y(y)}, & \text{if } f_Y(y) > 0, \\ 0, & \text{otherwise.} \end{cases} $$
Conditional c.d.f. (continuous random variable)
Furthermore, we can define the conditional (cumulative) probability distribution function of $X$, given $Y=y$, as
$$ F_{X|Y}(x|y) = P(X \le x \mid Y=y) = \int_{-\infty}^{x} f_{X|Y}(u|y)\,du. $$
Suppose that the joint p.d.f. of $X$ and $Y$ is given by
$$ f_{X,Y}(x,y) = \begin{cases} 24x(1-x-y), & \text{if } x \ge 0,\ y \ge 0,\ x+y \le 1, \\ 0, & \text{otherwise.} \end{cases} $$
Find
- the conditional p.d.f. of $X$ given $Y=y$;
- the conditional p.d.f. of $X$ given $Y=\frac{1}{2}$.
- In Section 6.2, Example 6.2.3, we found
$$ f_Y(y) = \begin{cases} 4(1-y)^3, & 0 \le y \le 1, \\ 0, & \text{otherwise.} \end{cases} $$
Therefore,
$$ f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} = \begin{cases} \dfrac{24x(1-x-y)}{4(1-y)^3}, & \text{if } x \ge 0,\ y \ge 0,\ x+y \le 1, \\ 0, & \text{otherwise.} \end{cases} $$
- Therefore, setting $y = 1/2$,
$$ f_{X|Y}\!\left(x \,\middle|\, \tfrac{1}{2}\right) = \frac{f_{X,Y}(x, \tfrac{1}{2})}{f_Y(\tfrac{1}{2})} = \begin{cases} \dfrac{24x(1/2-x)}{4(1/2)^3} = 48x\left(\tfrac{1}{2}-x\right), & \text{if } 0 \le x \le \tfrac{1}{2}, \\ 0, & \text{otherwise.} \end{cases} $$
Note that conditional p.d.f.s are themselves p.d.f.s and have all the properties associated with p.d.f.s.
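For instance, here is a quick numerical sketch (using scipy.integrate.quad) checking that the conditional p.d.f. obtained above integrates to 1:

```python
from scipy.integrate import quad

# Conditional p.d.f. of X given Y = 1/2 from the example above.
f = lambda x: 48 * x * (0.5 - x)

total, _ = quad(f, 0, 0.5)   # integrate over the support [0, 1/2]
print(total)                 # 1.0 (up to numerical error), so it is a valid p.d.f.
```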
Conditional expectation
Conditional Expectation
The conditional expectation of $X$, given $Y=y$, is defined by
$$ E[X \mid Y=y] = \begin{cases} \displaystyle\sum_x x \, p_{X|Y}(x|y), & \text{if } X \text{ is discrete,} \\ \displaystyle\int_{-\infty}^{\infty} x \, f_{X|Y}(x|y)\,dx, & \text{if } X \text{ is continuous.} \end{cases} $$
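In the discrete case, the conditional expectation is just a probability-weighted sum. A minimal sketch (numpy, reusing the conditional p.m.f. of $Y$ given $X=1$ computed in the table example above):

```python
import numpy as np

y_vals = np.array([0, 1, 2, 3])
p_cond = np.array([2, 3, 4, 5]) / 14    # p_{Y|X}(y | x = 1) from the table example

e_y_given_x1 = np.sum(y_vals * p_cond)  # E[Y | X = 1]
print(e_y_given_x1)                     # 26/14 = 13/7 ≈ 1.857
```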
Since $f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)}$, it follows that $f_{X,Y}(x,y) = f_{X|Y}(x|y) \, f_Y(y)$. Consequently, we can reconstruct the joint p.d.f. (p.m.f.) if we are given either:
- the marginal distribution of $Y$ and the conditional distribution of $X$ given $Y=y$; or
- the marginal distribution of $X$ and the conditional distribution of $Y$ given $X=x$.
Suppose that the joint p.d.f. of $X$ and $Y$ is given by
$$ f_{X,Y}(x,y) = \begin{cases} e^{-(x/y + y)} \, y^{-1}, & 0 < x, y < \infty, \\ 0, & \text{otherwise.} \end{cases} $$
For $y > 0$, find
- $P(X > 1 \mid Y=y)$;
- $E[X \mid Y=y]$.
Attempt Example 12.2.2 and then watch Video 20 for the solutions.
Video 20: Conditional Distribution and Expectation
Solution to Example 12.2.2.
- For $y > 0$,
$$ f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dx = \int_0^{\infty} e^{-(x/y+y)} \, y^{-1}\,dx = y^{-1} e^{-y} \left[ -y \, e^{-x/y} \right]_0^{\infty} = e^{-y}. $$
That is, the marginal distribution of $Y$ is $Y \sim {\rm Exp}(1)$.
Hence, for $y > 0$,
$$ f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} = \begin{cases} e^{-x/y} \, y^{-1}, & \text{if } x > 0, \\ 0, & \text{if } x \le 0. \end{cases} $$
Therefore the conditional distribution of $X \mid Y=y$ is ${\rm Exp}(1/y)$.
Thus,
$$ P(X > 1 \mid Y=y) = \int_1^{\infty} f_{X|Y}(x|y)\,dx = \int_1^{\infty} e^{-x/y} \, y^{-1}\,dx = e^{-1/y}, $$
which is the probability that an ${\rm Exp}(1/y)$ random variable takes a value greater than 1.
- Furthermore,
$$ E[X \mid Y=y] = \int_{-\infty}^{\infty} x \, f_{X|Y}(x|y)\,dx = \int_0^{\infty} \frac{x}{y} \, e^{-x/y}\,dx = y. $$
This is as expected, since if $W \sim {\rm Exp}(1/\theta)$, then $E[W] = \theta$.
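As a sanity check, here is a short numerical sketch (using scipy, with $y = 2$ chosen arbitrarily) verifying both results:

```python
import numpy as np
from scipy.integrate import quad

y = 2.0                                # an arbitrary fixed value of y > 0
f_cond = lambda x: np.exp(-x / y) / y  # f_{X|Y}(x|y) for x > 0

prob, _ = quad(f_cond, 1, np.inf)                   # P(X > 1 | Y = y)
mean, _ = quad(lambda x: x * f_cond(x), 0, np.inf)  # E[X | Y = y]

print(prob, np.exp(-1 / y))  # both ≈ e^{-1/2} ≈ 0.6065
print(mean)                  # ≈ 2 = y
```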
In Example 12.2.2, we can write down the joint distribution of $X$ and $Y$ as
$$ Y \sim {\rm Exp}(1); \qquad X \mid Y=y \sim {\rm Exp}(1/y). $$
Many joint distributions are constructed in a similar manner: the marginal distribution of the first random variable is specified along with the conditional distribution of the second random variable given the first. Such constructions are particularly common in Bayesian statistics. They enable us to understand key properties of the distribution, such as conditional means, and also to simulate values from the joint distribution, as sketched below.
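For instance, a minimal simulation sketch (using numpy; the sample size is arbitrary) that draws from the joint distribution of Example 12.2.2 by first sampling $Y$ and then $X$ given $Y$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Step 1: draw Y ~ Exp(1); numpy's exponential() is parameterised by the mean.
y = rng.exponential(scale=1.0, size=n)

# Step 2: draw X | Y = y ~ Exp(1/y), i.e. an exponential with mean y.
x = rng.exponential(scale=y)

# (x, y) are now samples from the joint p.d.f. f_{X,Y}(x, y) = e^{-(x/y + y)} / y.
```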
The marginal distribution of $X$ does not take a nice form, with $f_X(x) \to \infty$ as $x \downarrow 0$; see Figure 12.1.
We consider the link between the conditional expectation of X given Y and the expectation of (the marginal distribution of) X.
Tower Property.
Let $X$ and $Y$ have a continuous bivariate distribution with joint p.d.f. $f_{X,Y}(x,y)$. Then
$$ E[X] = \int_{-\infty}^{\infty} E[X \mid Y=y] \, f_Y(y)\,dy. $$
Note that
$$ \begin{aligned} E[X] &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x \, f_{X,Y}(x,y)\,dx\,dy \\ &= \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{\infty} x \, f_{X|Y}(x|y) \, f_Y(y)\,dx \right] dy \\ &= \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{\infty} x \, f_{X|Y}(x|y)\,dx \right] f_Y(y)\,dy \\ &= \int_{-\infty}^{\infty} E[X \mid Y=y] \, f_Y(y)\,dy, \end{aligned} $$
as required.
Therefore, in Example 12.2.2,
$$ E[X] = \int_{-\infty}^{\infty} E[X \mid Y=y] \, f_Y(y)\,dy = \int_0^{\infty} y \, f_Y(y)\,dy = E[Y] = 1. $$
This is far simpler than trying to obtain the marginal distribution of X to compute E[X].
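A quick numerical sketch (scipy for the integral, numpy for a Monte Carlo check via the two-step sampler above) confirming this value:

```python
import numpy as np
from scipy.integrate import quad

# Tower property: E[X] = ∫ E[X|Y=y] f_Y(y) dy = ∫ y e^{-y} dy over (0, ∞).
val, _ = quad(lambda y: y * np.exp(-y), 0, np.inf)
print(val)  # 1.0

# Monte Carlo check: sample Y ~ Exp(1), then X | Y = y ~ Exp(1/y).
rng = np.random.default_rng(2)
y = rng.exponential(scale=1.0, size=10**6)
x = rng.exponential(scale=y)
print(x.mean())  # ≈ 1
```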
Independent random variables
Recall the definition of independence for random variables given in Section 6.4. If $X$ and $Y$ are independent continuous random variables, then for any $y$ such that $f_Y(y) > 0$:
$$ f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} = \frac{f_X(x) \, f_Y(y)}{f_Y(y)} = f_X(x), \quad \text{for all } x \in \mathbb{R}. $$
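The discrete analogue can be seen in a small illustrative sketch (with made-up marginal p.m.f.s): when the joint p.m.f. factorises, conditioning on $Y$ leaves the distribution of $X$ unchanged.

```python
import numpy as np

p_x = np.array([0.2, 0.5, 0.3])  # hypothetical marginal p.m.f. of X
p_y = np.array([0.4, 0.6])       # hypothetical marginal p.m.f. of Y

joint = np.outer(p_x, p_y)       # independence: p_{X,Y}(x,y) = p_X(x) p_Y(y)

# Conditional p.m.f. of X given the first value of Y.
cond = joint[:, 0] / joint[:, 0].sum()
print(np.allclose(cond, p_x))    # True: the conditional equals the marginal
```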