Chapter 12 Conditional Distribution and Conditional Expectation

In this section, we consider further the joint behaviour of two random variables X and Y, and in particular study the conditional distribution of one random variable given the other. We start with discrete random variables and then move on to continuous random variables.

12.1 Conditional distribution

Recall that for any two events E and F such that P(F)>0, we defined in Section 4.6 that $P(E|F) = \frac{P(E \cap F)}{P(F)}$.

Can we extend this idea to random variables?

Conditional p.m.f.

If X and Y are discrete random variables, the conditional probability mass function of X given Y=y is
\[
p_{X|Y}(x|y) = P(X=x|Y=y) = \begin{cases} \dfrac{p_{X,Y}(x,y)}{p_Y(y)}, & \text{if } p_Y(y) > 0, \\ 0, & \text{otherwise,} \end{cases}
\]
where $p_{X,Y}(x,y)$ is the joint p.m.f. of X and Y and $p_Y(y)$ is the marginal p.m.f. of Y.

Note that

  • Conditional probabilities are non-negative:
    \[
    P(X=x|Y=y) = \frac{p_{X,Y}(x,y)}{p_Y(y)} \geq 0.
    \]
  • The sum of conditional probabilities over all values of x for some fixed value of y is 1:
    \[
    \sum_x P(X=x|Y=y) = \sum_x \frac{p_{X,Y}(x,y)}{p_Y(y)} = \frac{1}{p_Y(y)} \sum_x p_{X,Y}(x,y) = \frac{1}{p_Y(y)} \, p_Y(y) = 1.
    \]
    This implies that P(X=x|Y=y) is itself a p.m.f.
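As a quick numerical illustration of these two properties, the following sketch (in Python, assuming NumPy is available; the joint p.m.f. used here is arbitrary and purely illustrative) conditions a joint p.m.f. on a value of Y and checks that the result is non-negative and sums to 1.

```python
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary 3x4 joint p.m.f. p_{X,Y}(x, y): rows index x, columns index y.
joint = rng.random((3, 4))
joint /= joint.sum()            # normalise so the entries sum to 1

p_Y = joint.sum(axis=0)         # marginal p.m.f. of Y (column sums)

y = 2                           # condition on the column with index 2
cond = joint[:, y] / p_Y[y]     # p_{X|Y}(x | y) = p_{X,Y}(x, y) / p_Y(y)

print(cond)                     # all entries are non-negative
print(cond.sum())               # and they sum to 1 (up to rounding)
```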

Conditional c.d.f. (discrete random variable)

If X and Y are discrete random variables, the conditional (cumulative) probability distribution function of X given Y=y is
\[
F_{X|Y}(x|y) = P(X \leq x \,|\, Y=y) = \sum_{x' \leq x} p_{X|Y}(x'|y).
\]


Suppose the joint p.m.f. of X and Y is given by the following probability table.

X \ Y   y=0     y=1     y=2     y=3
x=0     0       1/42    2/42    3/42
x=1     2/42    3/42    4/42    5/42
x=2     4/42    5/42    6/42    7/42

Determine the conditional p.m.f. of Y given X=1.

The marginal p.m.f. of X at $x=1$ is the row sum $p_X(1) = \frac{2+3+4+5}{42} = \frac{14}{42}$, so
\[
p_{Y|X}(y|x=1) = \frac{p_{X,Y}(x=1,y)}{p_X(x=1)} = \frac{p_{X,Y}(x=1,y)}{14/42}.
\]

The conditional p.m.f. of Y given X=1 is therefore

\[
p_{Y|X}(y|x=1) = \begin{cases} \dfrac{2/42}{14/42} = \dfrac{2}{14}, & \text{if } y=0, \\[4pt] \dfrac{3/42}{14/42} = \dfrac{3}{14}, & \text{if } y=1, \\[4pt] \dfrac{4/42}{14/42} = \dfrac{4}{14}, & \text{if } y=2, \\[4pt] \dfrac{5/42}{14/42} = \dfrac{5}{14}, & \text{if } y=3. \end{cases}
\]
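The same calculation can be checked numerically; the sketch below (Python with NumPy, an assumed choice of tools since no code accompanies the original example) encodes the probability table above and recovers the conditional p.m.f. of Y given X=1.

```python
import numpy as np

# Joint p.m.f. from the probability table: rows are x = 0, 1, 2; columns are y = 0, 1, 2, 3.
joint = np.array([[0, 1, 2, 3],
                  [2, 3, 4, 5],
                  [4, 5, 6, 7]]) / 42

p_X = joint.sum(axis=1)          # marginal p.m.f. of X (row sums); p_X(1) = 14/42
cond = joint[1, :] / p_X[1]      # p_{Y|X}(y | x = 1)

print(cond)                      # [2/14, 3/14, 4/14, 5/14] ≈ [0.143, 0.214, 0.286, 0.357]
print(cond.sum())                # 1.0
```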

We cannot extend this idea to the continuous case directly since, for a continuous random variable Y and any fixed value y, one has $P(Y=y)=0$.

Conditional p.d.f.

If X and Y have a joint p.d.f. $f_{X,Y}$, then the conditional probability density function of X, given that Y=y, is defined by
\[
f_{X|Y}(x|y) = \begin{cases} \dfrac{f_{X,Y}(x,y)}{f_Y(y)}, & \text{if } f_Y(y) > 0, \\ 0, & \text{otherwise.} \end{cases}
\]

Conditional c.d.f. (continuous random variable)

Furthermore, we can define the conditional (cumulative) probability distribution function of X, given Y=y, as
\[
F_{X|Y}(x|y) = P(X \leq x \,|\, Y=y) = \int_{-\infty}^{x} f_{X|Y}(u|y) \, du.
\]


Suppose that the joint p.d.f. of X and Y is given by
\[
f_{X,Y}(x,y) = \begin{cases} 24x(1-x-y), & \text{if } x \geq 0, \ y \geq 0, \ x+y \leq 1, \\ 0, & \text{otherwise.} \end{cases}
\]

Find

  1. the conditional p.d.f. of X given Y=y;
  2. the conditional p.d.f. of X given $Y=\frac{1}{2}$.
  1. In Section 6.2, Example 6.2.3, we found
    \[
    f_Y(y) = \begin{cases} 4(1-y)^3, & 0 \leq y \leq 1, \\ 0, & \text{otherwise.} \end{cases}
    \]
    Therefore,
    \[
    f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} = \begin{cases} \dfrac{24x(1-x-y)}{4(1-y)^3}, & \text{if } x \geq 0, \ y \geq 0, \ x+y \leq 1, \\ 0, & \text{otherwise.} \end{cases}
    \]

  2. Therefore, setting $y=\frac{1}{2}$,
    \[
    f_{X|Y}\!\left(x \,\middle|\, \tfrac{1}{2}\right) = \frac{f_{X,Y}\!\left(x, \tfrac{1}{2}\right)}{f_Y\!\left(\tfrac{1}{2}\right)} = \begin{cases} \dfrac{24x\left(\frac{1}{2}-x\right)}{4\left(\frac{1}{2}\right)^3} = 48x\left(\tfrac{1}{2}-x\right), & \text{if } 0 \leq x \leq \tfrac{1}{2}, \\ 0, & \text{otherwise.} \end{cases}
    \]


Note that conditional p.d.f.s are themselves p.d.f.s and have all the properties associated with p.d.f.s.
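For instance, the conditional p.d.f. $f_{X|Y}\!\left(x \,\middle|\, \tfrac{1}{2}\right) = 48x\left(\tfrac{1}{2}-x\right)$ found above integrates to 1 over $\left[0, \tfrac{1}{2}\right]$. A short numerical check of this, and of the corresponding conditional c.d.f., is sketched below in Python (SciPy is assumed to be available).

```python
from scipy.integrate import quad

# Conditional p.d.f. of X given Y = 1/2 from the example above.
f_cond = lambda x: 48 * x * (0.5 - x)

total, _ = quad(f_cond, 0, 0.5)          # should be 1: a conditional p.d.f. is a p.d.f.
F_quarter, _ = quad(f_cond, 0, 0.25)     # conditional c.d.f. F_{X|Y}(1/4 | 1/2)

print(total)       # 1.0
print(F_quarter)   # 0.5, by the symmetry of 48x(1/2 - x) about x = 1/4
```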

12.2 Conditional expectation

Conditional Expectation

The conditional expectation of X, given Y=y, is defined by
\[
E[X|Y=y] = \begin{cases} \displaystyle\sum_x x \, p_{X|Y}(x|y), & \text{if } X \text{ is discrete}, \\[6pt] \displaystyle\int_{-\infty}^{\infty} x \, f_{X|Y}(x|y) \, dx, & \text{if } X \text{ is continuous.} \end{cases}
\]
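Both branches of this definition can be applied to the two running examples; the sketch below is in Python (NumPy and SciPy are assumed tools, not part of the original notes): the discrete case computes $E[Y|X=1]$ from the probability table, and the continuous case computes $E[X|Y=\tfrac{1}{2}]$ using the conditional p.d.f. $48x\left(\tfrac{1}{2}-x\right)$.

```python
import numpy as np
from scipy.integrate import quad

# Discrete case: E[Y | X = 1] using p_{Y|X}(y | 1) = (2/14, 3/14, 4/14, 5/14) from the table example.
y_vals = np.array([0, 1, 2, 3])
p_cond = np.array([2, 3, 4, 5]) / 14
print(np.sum(y_vals * p_cond))            # 26/14 = 13/7 ≈ 1.857

# Continuous case: E[X | Y = 1/2] using f_{X|Y}(x | 1/2) = 48x(1/2 - x) on [0, 1/2].
mean, _ = quad(lambda x: x * 48 * x * (0.5 - x), 0, 0.5)
print(mean)                               # 0.25
```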

Since $f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)}$, we have $f_{X,Y}(x,y) = f_{X|Y}(x|y) \, f_Y(y)$. Consequently, we can reconstruct the joint p.d.f. (p.m.f.) if we are given either:

  • the conditional p.d.f. (p.m.f.) of X given Y=y and the marginal p.d.f. (p.m.f.) of Y;

  • the conditional p.d.f. (p.m.f.) of Y given X=x and the marginal p.d.f. (p.m.f.) of X.


Suppose that the joint p.d.f. of X and Y is given by
\[
f_{X,Y}(x,y) = \begin{cases} e^{-(x/y + y)} \, y^{-1}, & 0 < x, y < \infty, \\ 0, & \text{otherwise.} \end{cases}
\]

For y>0, find

  1. P(X>1|Y=y);
  2. E[X|Y=y].

Attempt Example 12.2.2 and then watch Video 20 for the solutions.

Video 20: Conditional Distribution and Expectation

Solution to Example 12.2.2.
  1. For y>0,
    \[
    f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x,y) \, dx = \int_0^{\infty} e^{-(x/y+y)} \, y^{-1} \, dx = y^{-1} e^{-y} \left[ -y e^{-x/y} \right]_0^{\infty} = e^{-y}.
    \]
    That is, the marginal distribution of Y is $Y \sim \mathrm{Exp}(1)$.
    Hence, for y>0,
    \[
    f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} = \begin{cases} e^{-x/y} \, y^{-1}, & \text{if } x > 0, \\ 0, & \text{if } x \leq 0. \end{cases}
    \]
    Therefore the conditional distribution of X|Y=y is $\mathrm{Exp}(1/y)$.
    Thus,
    \[
    P(X>1|Y=y) = \int_1^{\infty} f_{X|Y}(x|y) \, dx = \int_1^{\infty} e^{-x/y} \, y^{-1} \, dx = e^{-1/y},
    \]
    which is the probability that an $\mathrm{Exp}(1/y)$ random variable takes a value greater than 1.
  2. Furthermore,
    \[
    E[X|Y=y] = \int_{-\infty}^{\infty} x \, f_{X|Y}(x|y) \, dx = \int_0^{\infty} \frac{x}{y} \, e^{-x/y} \, dx = y.
    \]
    This is as expected, since if $W \sim \mathrm{Exp}(1/\theta)$, then $E[W] = \theta$.


In Example 12.2.2, we can write down the joint distribution of X and Y as
\[
Y \sim \mathrm{Exp}(1); \qquad X|Y=y \sim \mathrm{Exp}(1/y).
\]

Many joint distributions are constructed in a similar manner: the marginal distribution of the first random variable is specified along with the conditional distribution of the second random variable given the first. Such constructions are particularly common in Bayesian statistics. They enable us to understand key properties of the distribution, such as conditional means, and also to simulate values from the joint distribution, as illustrated below.
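The sketch below (Python with NumPy, an assumed choice of tools) simulates from the joint distribution of Example 12.2.2 by first drawing $Y \sim \mathrm{Exp}(1)$ and then, given $Y=y$, drawing $X \sim \mathrm{Exp}(1/y)$; it then checks the conditional results derived above by restricting attention to pairs with Y close to 2.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

# Sequential simulation: Y ~ Exp(1), then X | Y = y ~ Exp(1/y), i.e. exponential with mean y.
y = rng.exponential(scale=1.0, size=n)
x = rng.exponential(scale=y)

# Conditional checks near y = 2: P(X > 1 | Y = 2) = e^{-1/2} ≈ 0.607 and E[X | Y = 2] = 2.
near_2 = np.abs(y - 2) < 0.05
print(np.mean(x[near_2] > 1))   # ≈ 0.607
print(np.mean(x[near_2]))       # ≈ 2
```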

The marginal distribution of X does not take a nice form, with $f_X(x) \to \infty$ as $x \to 0$; see Figure 12.1.

Figure 12.1: Plot of the p.d.f. of X.


We consider the link between the conditional expectation of X given Y and the expectation of (the marginal distribution of) X.

Tower Property.

Let X and Y have a continuous bivariate distribution with joint p.d.f. $f_{X,Y}(x,y)$. Then
\[
E[X] = \int_{-\infty}^{\infty} E[X|Y=y] \, f_Y(y) \, dy.
\]
Note that
\[
\begin{aligned}
E[X] &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x \, f_{X,Y}(x,y) \, dx \, dy = \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{\infty} x \, f_{X|Y}(x|y) \, f_Y(y) \, dx \right] dy \\
&= \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{\infty} x \, f_{X|Y}(x|y) \, dx \right] f_Y(y) \, dy = \int_{-\infty}^{\infty} E[X|Y=y] \, f_Y(y) \, dy,
\end{aligned}
\]

as required.

Therefore, in Example 12.2.2,
\[
E[X] = \int_{-\infty}^{\infty} E[X|Y=y] \, f_Y(y) \, dy = \int_0^{\infty} y \, f_Y(y) \, dy = E[Y] = 1.
\]

This is far simpler than trying to obtain the marginal distribution of X to compute E[X].
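A Monte Carlo check of this value is also straightforward, re-using the sequential simulation scheme sketched earlier (again a sketch in Python, with NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.exponential(scale=1.0, size=1_000_000)   # Y ~ Exp(1)
x = rng.exponential(scale=y)                     # X | Y = y ~ Exp(1/y)

print(np.mean(x))   # ≈ 1, in agreement with E[X] = E[Y] = 1
```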

12.3 Independent random variables

Recall the definition of independence for random variables given in Section 6.4. If X and Y are independent continuous random variables, then for any y such that $f_Y(y)>0$:
\[
f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} = \frac{f_X(x) \, f_Y(y)}{f_Y(y)} = f_X(x), \qquad \text{for all } x \in \mathbb{R}.
\]

Student Exercise

Attempt the exercise below.


Suppose that the joint probability density function of X and Y is
\[
f_{X,Y}(x,y) = \begin{cases} 3y\left(x + \tfrac{1}{4}y\right), & 0 \leq x, y \leq 1, \\ 0, & \text{otherwise.} \end{cases}
\]

Find

  1. the conditional probability density function of X given Y=y, where $y \in (0,1]$,
  2. $E[X|Y=y]$, for $y \in (0,1]$,
  3. $P\!\left(X > \tfrac{1}{2} \,\middle|\, Y=1\right)$.
Solution to Exercise 12.1.
  1. For $y \in [0,1]$,
    \[
    f_Y(y) = \int_0^1 3y\left(x + \tfrac{1}{4}y\right) dx = \left[ 3y\left(\tfrac{1}{2}x^2 + \tfrac{1}{4}xy\right) \right]_0^1 = 3y\left(\tfrac{1}{2} + \tfrac{1}{4}y\right) = \tfrac{3}{4} y (2+y).
    \]
    Hence, for $y \in (0,1]$,
    \[
    f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} = \frac{3y\left(x + \frac{1}{4}y\right)}{\frac{3}{4}y(2+y)} = \frac{4x+y}{2+y}, \qquad 0 \leq x \leq 1,
    \]
    and $f_{X|Y}(x|y) = 0$ otherwise.
  2. For $y \in (0,1]$,
    \[
    E[X|Y=y] = \int_0^1 x \, f_{X|Y}(x|y) \, dx = \int_0^1 x \, \frac{4x+y}{2+y} \, dx = \frac{1}{2+y} \left[ \frac{4}{3}x^3 + \frac{1}{2}x^2 y \right]_0^1 = \frac{1}{2+y}\left( \frac{4}{3} + \frac{1}{2}y \right) = \frac{8+3y}{6(2+y)}.
    \]
  3. Fixing Y=1,
    \[
    P\!\left(X > \tfrac{1}{2} \,\middle|\, Y=1\right) = \int_{1/2}^1 f_{X|Y}(x|y=1) \, dx = \frac{1}{3} \int_{1/2}^1 (4x+1) \, dx = \frac{1}{3} \left[ 2x^2 + x \right]_{1/2}^1 = \frac{1}{3}\left( 2 + 1 - \frac{2}{4} - \frac{1}{2} \right) = \frac{2}{3}.
    \]
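The three parts of the solution can be double-checked numerically, working directly from the joint p.d.f.; a sketch in Python (SciPy assumed):

```python
from scipy.integrate import quad

f_joint = lambda x, y: 3 * y * (x + y / 4)                # joint p.d.f. on [0, 1]^2
f_Y = lambda y: quad(lambda x: f_joint(x, y), 0, 1)[0]    # marginal p.d.f. of Y
f_cond = lambda x, y: f_joint(x, y) / f_Y(y)              # conditional p.d.f. of X given Y = y

y = 0.5
print(f_cond(0.3, y), (4 * 0.3 + y) / (2 + y))            # part 1: matches (4x + y)/(2 + y)
print(quad(lambda x: x * f_cond(x, y), 0, 1)[0],
      (8 + 3 * y) / (6 * (2 + y)))                        # part 2: matches (8 + 3y)/(6(2 + y))
print(quad(lambda x: f_cond(x, 1), 0.5, 1)[0])            # part 3: 2/3
```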