Chapter 4 Independence and Conditional Statistics

4.1 Independence

Recall the definition of independent random variables (Definition 2.3.1), which states that two random variables X,Y are independent if
f_{X,Y}(x,y) = f_X(x)\, f_Y(y) \quad \text{for all } x, y.

Substituting this identity into Definition 3.2.1 gives an equivalent condition for two random variables to be independent. Specifically, for any y such that fY(y)>0:

f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} = \frac{f_X(x)\, f_Y(y)}{f_Y(y)} = f_X(x), \quad x \in \mathbb{R}.

This is to say that the conditional probability density function fX|Y(x|y) is equal to the probability density function fX(x), and does not depend on y at all. This makes sense: information about the outcome of Y is irrelevant to the outcome of X, since the two are independent.
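As a quick illustration of this identity, the following Python sketch uses sympy to check it symbolically for a hypothetical pair of independent random variables with joint PDF fX,Y(x,y) = e^(-x-y) for x, y > 0 (the product of two exponential densities; this joint PDF is not one of the examples in these notes).

import sympy as sp

x, y = sp.symbols("x y", positive=True)

# Hypothetical joint PDF that factorises as e^(-x) * e^(-y), so X and Y are independent.
f_joint = sp.exp(-x - y)

f_Y = sp.integrate(f_joint, (x, 0, sp.oo))   # marginal PDF of Y: e^(-y)
f_X = sp.integrate(f_joint, (y, 0, sp.oo))   # marginal PDF of X: e^(-x)

f_X_given_Y = sp.simplify(f_joint / f_Y)     # conditional PDF fX|Y(x|y)

print(f_X_given_Y)                      # exp(-x): no dependence on y
print(sp.simplify(f_X_given_Y - f_X))   # 0, confirming fX|Y(x|y) = fX(x)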

Consider again the game played by Annie and Bertie in Example 3.2.2, the scores of which are governed by the random variables X and Y. We saw in Example 2.3.3 that X and Y are dependent. Verify this using the new equivalent definition for independence.



In Example 3.2.2, we calculated that the conditional PDF fX|Y(x|y) is given by

f_{X|Y}(x|y) = \begin{cases} \frac{6x(1-x-y)}{(1-y)^3}, & \text{if } x, y \ge 0 \text{ and } x + y \le 1, \\ 0, & \text{otherwise.} \end{cases}

From Example 2.1.8:

f_X(x) = \begin{cases} 12x(1-x)^2, & \text{if } 0 \le x \le 1, \\ 0, & \text{otherwise.} \end{cases}
Clearly fX|Y(x|y) ≠ fX(x), so X and Y are dependent.

Note that the result of Example 4.1.1 could have been deduced solely from the fact that the expression for fX|Y(x|y) contains y: without looking at fX(x) explicitly, we know it will contain no reference to y and so cannot be equal to fX|Y(x|y).
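A small numerical sketch of the same check, using only the two formulas quoted above: evaluating fX|Y(x|y) at a fixed x for a few values of y shows that it changes with y, so it cannot equal fX(x). (The function names below are illustrative, not from the notes.)

def f_X_given_Y(x, y):
    # Conditional PDF fX|Y(x|y) = 6x(1-x-y)/(1-y)^3 on the region x, y >= 0, x + y <= 1.
    if x >= 0 and y >= 0 and x + y <= 1:
        return 6 * x * (1 - x - y) / (1 - y) ** 3
    return 0.0

def f_X(x):
    # Marginal PDF fX(x) = 12x(1-x)^2 for 0 <= x <= 1.
    return 12 * x * (1 - x) ** 2 if 0 <= x <= 1 else 0.0

x = 0.25
print("fX(x) =", f_X(x))                                   # 1.6875
for y in (0.0, 0.25, 0.5):
    print("y =", y, "fX|Y(x|y) =", f_X_given_Y(x, y))      # 1.125, 1.777..., 3.0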

Similarly, independence is equivalent to the condition fY|X(y|x) = fY(y). This criterion is deduced in an analogous fashion to the identity fX|Y(x|y) = fX(x) above.

4.2 Conditional Expectation

In the Nike versus Adidas example at the opening of the chapter, two random variables were considered: SNike and SAdidas. One piece of information that we could give to Phil Knight is the expectation of SAdidas. However, this doesn't take into account all the information at our disposal: we know the value that the random variable SNike takes. The quantity we really want to calculate is the expectation of SAdidas given the value of SNike.

Posing this question in a general setting motivates the following definition:

The conditional expectation of X given Y=y is defined by

E[X | Y=y] = \begin{cases} \sum_x x\, p_{X|Y}(x|y), & \text{if } X \text{ is discrete}, \\ \int_{-\infty}^{\infty} x\, f_{X|Y}(x|y)\, dx, & \text{if } X \text{ is continuous}, \end{cases}

where p_{X|Y}(x|y) is the conditional PMF of X given Y in the discrete case, and f_{X|Y}(x|y) is the conditional PDF of X given Y in the continuous case.

Note that this definition covers both discrete and continuous random variables.
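For the discrete branch of the definition, a minimal Python sketch with a small hypothetical joint PMF (not one of the examples in these notes) shows the computation: condition on Y=y by normalising the joint PMF with the marginal pY(y), then take the weighted sum.

# Hypothetical joint PMF p_{X,Y}(x, y), stored as {(x, y): probability}.
p_XY = {
    (0, 0): 0.10, (1, 0): 0.20, (2, 0): 0.10,
    (0, 1): 0.05, (1, 1): 0.25, (2, 1): 0.30,
}

def conditional_expectation_X(y):
    """E[X | Y = y] = sum over x of x * pX|Y(x|y), with pX|Y(x|y) = pX,Y(x,y) / pY(y)."""
    p_Y = sum(p for (_, yy), p in p_XY.items() if yy == y)              # marginal pY(y)
    return sum(x * p / p_Y for (x, yy), p in p_XY.items() if yy == y)

print(conditional_expectation_X(0))   # 1.0
print(conditional_expectation_X(1))   # 1.4166...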

You are seeking to buy a car, and want to estimate the cost. You find an article online that has collected data on cars listed across various sites. The article models the cost of the car by a random variable X and the age of the car by a random variable Y. The probability of finding a car with X=x and Y=y is governed by the joint PDF

f_{X,Y}(x,y) = \begin{cases} \frac{1}{y} e^{-(x/y + y)}, & 0 < x, y < \infty, \\ 0, & \text{otherwise.} \end{cases}

For some fixed age of car y>0, find the expected cost.



Mathematically the question is asking us to calculate E[X|Y=y].

In general it is enough to consider only the region on which fX,Y(x,y) is non-zero.

First we want to calculate fX|Y(x|y). To do so, we first calculate the marginal distribution fY(y):

f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\, dx = \int_0^{\infty} \frac{1}{y} e^{-(x/y + y)}\, dx = \left[ -e^{-x/y - y} \right]_0^{\infty} = e^{-y}.

Hence, for y>0, the conditional probability density function is

f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} = \begin{cases} \frac{(1/y)\, e^{-(x/y + y)}}{e^{-y}}, & \text{if } x > 0, \\ 0, & \text{if } x \le 0, \end{cases} = \begin{cases} \frac{1}{y} e^{-x/y}, & \text{if } x > 0, \\ 0, & \text{if } x \le 0. \end{cases}

Therefore by Definition 4.2.1 the conditional expectation of X is

E[X | Y=y] = \int_{-\infty}^{\infty} x\, f_{X|Y}(x|y)\, dx = \int_0^{\infty} \frac{x}{y} e^{-x/y}\, dx = \left[ -x e^{-x/y} \right]_0^{\infty} + \int_0^{\infty} e^{-x/y}\, dx = 0 + \left[ -y e^{-x/y} \right]_0^{\infty} = y,

where the third equality follows from integration by parts.
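As a sanity check on this calculation, the integral can also be evaluated numerically. The following sketch (using scipy, and assuming the conditional PDF derived above) confirms that E[X|Y=y] = y for a few values of y.

import numpy as np
from scipy.integrate import quad

def f_X_given_Y(x, y):
    # Conditional PDF fX|Y(x|y) = (1/y) e^(-x/y) for x > 0, from Example 4.2.2.
    return np.exp(-x / y) / y

for y in (0.5, 2.0, 7.0):
    expectation, _ = quad(lambda x: x * f_X_given_Y(x, y), 0, np.inf)
    print(y, expectation)   # the numerical value of E[X|Y=y] matches y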

Critique the model that the article uses in Example 4.2.2.

Many of the nice properties of expectation such as linearity are inherited by conditional expectation.

Let a be a real number, and let X,Y,Z be random variables. Each of the following rules pertaining to conditional expectation given Z=z holds in general:

E[aX | Z=z] = a\, E[X | Z=z], \qquad E[X+Y | Z=z] = E[X | Z=z] + E[Y | Z=z].

Similarly to the case of joint expectation in Definition 2.5.4, the definition of conditional expectation can be extended to functions of random variables:

Let X,Y be continuous random variables, with conditional PDF fX|Y. Then, for a function g(X) of X, we have

E[g(X) | Y=y] = \int_{-\infty}^{\infty} g(x)\, f_{X|Y}(x|y)\, dx.

Consider the random variables X and Y from Example 4.2.2. Calculate E[X^2 | Y=y].



In Example 4.2.2, we found

f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} = \begin{cases} \frac{(1/y)\, e^{-(x/y + y)}}{e^{-y}}, & \text{if } x > 0, \\ 0, & \text{if } x \le 0, \end{cases} = \begin{cases} \frac{1}{y} e^{-x/y}, & \text{if } x > 0, \\ 0, & \text{if } x \le 0. \end{cases}

Therefore applying Theorem 4.2.4

E[X^2 | Y=y] = \int_{-\infty}^{\infty} x^2\, f_{X|Y}(x|y)\, dx = \int_0^{\infty} \frac{x^2}{y} e^{-x/y}\, dx = \left[ -x^2 e^{-x/y} \right]_0^{\infty} + \int_0^{\infty} 2x\, e^{-x/y}\, dx = 2 \int_0^{\infty} x\, e^{-x/y}\, dx = 2\left[ -xy\, e^{-x/y} \right]_0^{\infty} + 2\int_0^{\infty} y\, e^{-x/y}\, dx = 2\left[ -y^2 e^{-x/y} \right]_0^{\infty} = 2y^2.

Note that integration by parts has been used twice in the evaluation of the integral.
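The same integral can also be checked symbolically; a short sympy sketch (assuming the conditional PDF from Example 4.2.2) reproduces the result 2y^2 without performing the integration by parts by hand.

import sympy as sp

x = sp.symbols("x", positive=True)
y = sp.symbols("y", positive=True)

# E[X^2 | Y = y] = integral of x^2 * (1/y) e^(-x/y) over x from 0 to infinity.
second_moment = sp.integrate(x**2 * sp.exp(-x / y) / y, (x, 0, sp.oo))
print(second_moment)   # 2*y**2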

4.3 Conditional Variance

Considering again the opening Nike versus Adidas example, it would be sensible to provide Phil Knight with the variance of SAdidas. However, calculating the variance of SAdidas doesn't utilise our knowledge of the SNike random variable. What we really want to calculate is the variance of SAdidas given the value of SNike.

Moving this idea to the general setting leads to the following definition:

The conditional variance of X given Y=y is defined by

Var(X | Y=y) = E[(X - E[X | Y=y])^2 | Y=y].

Note that both of the expectations that appear on the right hand side of Definition 4.3.1 are conditional expectations. Calculating the conditional variance using this expression can be computationally difficult. The following proposition offers an alternative method by which to calculate conditional variance.

An equivalent definition of conditional variance is

Var(X | Y=y) = E[X^2 | Y=y] - (E[X | Y=y])^2.

The right hand side of Proposition 4.3.2 can be calculated using Theorem 4.2.4.

Consider again the random variables from Example 4.2.2. Calculate the conditional variance of X given Y.



From Example 4.2.2 and Example 4.2.5, we know

E[X | Y=y] = y, \qquad E[X^2 | Y=y] = 2y^2.

Therefore by Proposition 4.3.2:

Var(X | Y=y) = E[X^2 | Y=y] - (E[X | Y=y])^2 = 2y^2 - y^2 = y^2.

There is an interesting result linking the variance of a random variable X to an expression in terms of the conditional expectations and variances of X given a second random variable Y. This result is known as the Law of Total Variance:

Let X,Y be two random variables. Then

Var(X) = E[Var(X | Y)] + Var(E[X | Y]),

where E[X | Y] and Var(X | Y) denote the random variables obtained by evaluating E[X | Y=y] and Var(X | Y=y) at the random outcome of Y.

The Law of Total Variance states that the variance of X decomposes as the sum of the expected variance of X given Y and the variance of the expectation of X given Y. This is a particularly powerful result that will be explored further in MATH1058: Statistical Models and Methods.
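A Monte Carlo sketch can make this decomposition concrete using the random variables from Example 4.2.2: there fY(y) = e^(-y) is the density of an exponential random variable with mean 1, and fX|Y(x|y) = (1/y)e^(-x/y) is the density of an exponential with mean y, so the earlier examples give E[X|Y=y] = y and Var(X|Y=y) = y^2. Simulating (Y, X) and comparing the two sides of the Law of Total Variance gives values that agree (both approximately 3). This is an illustrative check, not part of the notes.

import numpy as np

rng = np.random.default_rng(0)
n = 2_000_000

# Y has marginal density e^(-y) for y > 0, i.e. Y is exponential with mean 1.
Y = rng.exponential(scale=1.0, size=n)
# Given Y = y, X has density (1/y) e^(-x/y), i.e. X | Y = y is exponential with mean y.
X = rng.exponential(scale=Y)

lhs = np.var(X)                      # Var(X)
rhs = np.mean(Y**2) + np.var(Y)      # E[Var(X|Y)] + Var(E[X|Y]), using Var(X|Y=y) = y^2 and E[X|Y=y] = y
print(lhs, rhs)                      # both approximately 3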