## 3.3 Conditional expectation

### 3.3.1 Law of total expectation

**Theorem 2.3 (Law of Total Expectation (LTE))** \[ \textrm{E}(Y) = \textrm{E}(\textrm{E}(Y|X)) \]

**Example 2.63**
Suppose you construct a “random rectangle” as follows. The base \(X\) is a random variable with a Uniform(0, 1) distribution. The height \(Y\) is a random variable whose conditional distribution given \(X=x\) is Uniform(0, \(x\)). We are interested in \(\textrm{E}(Y)\), the expected value of the height of the rectangle.

- Explain how you could use the Uniform(0, 1) spinner to simulate an \((X, Y)\) pair.
- Explain how you could use simulation to approximate \(\textrm{E}(Y)\).
- Find \(\textrm{E}(Y|X=0.5)\).
- Find \(\textrm{E}(Y|X=0.2)\).
- Find \(\textrm{E}(Y|X=x)\) for a generic \(x\in(0, 1)\).
- Identify the random variable \(\textrm{E}(Y|X)\).
- Use LTE to find \(\textrm{E}(Y)\).
- Sketch a plot of the joint distribution of \((X, Y)\).
- Sketch a plot of the marginal distribution of \(Y\). Be sure to specify the possible values. Is it Uniform?
- What would you need to do to find \(\textrm{E}(Y)\) using the definition of expected value?

*Solution* to Example 2.63

- Spin the spinner twice and let \(U_1\) be the result of the first spin and \(U_2\) the result of the second. Let \(X=U_1\). Now consider an example. Given \(X=0.2\), we want \(Y\) to have a Uniform(0, 0.2) distribution. So we could take the result of the second spin (on a (0, 1) scale) and multiply by 0.2 to get a value on a (0, 0.2) scale. (Remember: linear rescaling only changes the possible values and not the shape of the distribution; see Section 2.6.4.) That is, conditional on \(X=0.2\), \(0.2U_2\) will have a Uniform(0, 0.2) distribution. Conditional on a general \(X=x\), \(xU_2\) will have a Uniform(0, \(x\)) distribution. So if we define the random variable \(Y\) as \(Y=XU_2\), then the conditional distribution of \(Y\) given \(X=x\) will be Uniform(0, \(x\)).
- Simulate many \((X, Y)\) pairs in the above manner, and find the average of the simulated \(Y\) values to approximate \(\textrm{E}(Y)\).
- The conditional distribution of \(Y\) given \(X=0.5\) is Uniform(0, 0.5) so \(\textrm{E}(Y|X=0.5)=0.5/2 = 0.25\).
- The conditional distribution of \(Y\) given \(X=0.2\) is Uniform(0, 0.2) so \(\textrm{E}(Y|X=0.2)=0.2/2 = 0.10\).
- For \(x\in(0, 1)\), the conditional distribution of \(Y\) given \(X=x\) is Uniform(0, \(x\)) so \(\textrm{E}(Y|X=x)=x/2\). Note that for any particular \(x\), \(\textrm{E}(Y|X=x)\) is a *number* (e.g., \(\textrm{E}(Y|X=0.2)= 0.10\)).
- \(\textrm{E}(Y|X)=X/2\). Recall that \(\textrm{E}(Y|X)\) is a random variable, and moreover a function of \(X\). From the previous part we can see that the function \(x\mapsto x/2\) maps each \(x\) to \(\textrm{E}(Y|X=x)\), so \(\textrm{E}(Y|X) = X/2\).
- Use LTE. Remember that non-random constants pop out of expected values. \[ \textrm{E}(Y) = \textrm{E}(\textrm{E}(Y|X)) = \textrm{E}(X/2) = \textrm{E}(X)/2 = (0.5)/2 = 0.25 \]
- The \(x\) values will be uniformly distributed between 0 and 1. For each \(x\), the \(y\) values will be uniformly distributed along the vertical strip between 0 and \(x\). The \((X, Y)\) pairs will lie in the triangular region, \(\{(x,y):0<y<x<1\}\), but since the vertical strips are shorter for smaller values of \(x\), the density will be higher when \(x\) is small than when \(x\) is large. The joint pdf is \[ f_{X,Y}(x,y) = f_{Y|X}(y|x)f_X(x) = \frac{1}{x}(1), \qquad 0<y<x<1 \]
- Unconditionally, \(Y\) can take any value between 0 and 1. Collapse the \(x\) values in the joint pdf. By looking at the horizontal strips in the joint pdf plot, we see that the density of \(Y\) will be higher when \(y\) is near 0, so the marginal distribution of \(Y\) is not Uniform. We can find the marginal pdf by integrating out the \(x\). For a fixed \(y\) in (0, 1) the joint density is positive only if \(x\) is in \((y, 1)\). \[ f_Y(y) = \int f_{X,Y}(x,y) dx = \int_y^1 \frac{1}{x} dx = -\log(y), \qquad 0<y<1 \]
- In order to find \(\textrm{E}(Y)\) using the definition of expected value, you would need to (1) find the joint pdf of \((X, Y)\), (2) integrate the joint pdf with respect to \(x\) to find the marginal pdf of \(Y\), and (3) then integrate \(\int y f_Y(y) dy\) to find \(\textrm{E}(Y)\). Integration by parts gives \[ \int_0^1 y \left(-\log(y)\right)dy= \left[-\frac{y^2}{2}\log(y) + \frac{y^2}{4}\right]_0^1 = \frac{1}{4} \] which agrees with the value found using LTE.
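
The spinner-based simulation described in the first two parts of the solution can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original example: the function name `simulate_pair`, the seed, the sample size, and the window width of 0.01 around \(x=0.5\) are all arbitrary choices, and `random.random()` stands in for one spin of the Uniform(0, 1) spinner.

```python
import random

random.seed(12345)  # arbitrary fixed seed so the run is reproducible


def simulate_pair():
    """Simulate one (X, Y) pair from two spins of a Uniform(0, 1) spinner."""
    u1 = random.random()  # first spin
    u2 = random.random()  # second spin
    x = u1                # base: X ~ Uniform(0, 1)
    y = x * u2            # height: given X = x, Y = x * U2 ~ Uniform(0, x)
    return x, y


n = 200_000  # number of simulated rectangles (illustrative choice)
pairs = [simulate_pair() for _ in range(n)]

# The average of the simulated Y values approximates E(Y).
mean_y = sum(y for _, y in pairs) / n

# The average of Y among pairs with X close to 0.5 approximates E(Y | X = 0.5).
near_half = [y for x, y in pairs if abs(x - 0.5) < 0.01]
mean_y_near_half = sum(near_half) / len(near_half)

print(f"estimate of E(Y):              {mean_y:.3f}")
print(f"estimate of E(Y | X near 0.5): {mean_y_near_half:.3f}")
```

Both estimates should land near 0.25, the first because \(\textrm{E}(Y)=0.25\) and the second because \(\textrm{E}(Y|X=0.5)=0.25\). The two values coincide only by accident of the chosen \(x\); repeating the second calculation near \(x=0.2\) should give a value near 0.10 instead.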

### 3.3.2 “Taking out what is known”

**Theorem 2.4 (Taking out what is known (TOWIK))** \[ \textrm{E}(g(X)Y|X) = g(X)\textrm{E}(Y|X) \]

TOWIK is a relationship between *random variables*.

Let \(x\) be a particular possible value of \(X\). Then \(g(x)\) is just a number. The conditional distribution of the random variable \(g(X)Y\) given \(X=x\) is the same as the conditional distribution of the random variable \(g(x)Y\) given \(X=x\). Therefore \(\textrm{E}(g(X)Y|X=x) = \textrm{E}(g(x)Y|X=x)= g(x)\textrm{E}(Y|X=x)\), where \(g(x)\) pops out of the expected value since it is just a number.

**Example 2.64**
(Continuing Example 2.63.) Suppose you construct a “random rectangle” as follows. The base \(X\) is a random variable with a Uniform(0, 1) distribution. The height \(Y\) is a random variable whose conditional distribution given \(X=x\) is Uniform(0, \(x\)). We are interested in \(\textrm{E}(XY)\), the expected value of the area of the rectangle.

- Explain how you could use simulation to approximate \(\textrm{E}(XY)\).
- Find \(\textrm{E}(XY|X=0.5)\).
- Find \(\textrm{E}(XY|X=0.2)\).
- Find \(\textrm{E}(XY|X=x)\) for a generic \(x\in(0, 1)\). How does \(\textrm{E}(XY|X=x)\) relate to \(\textrm{E}(Y|X=x)\)?
- Identify the random variable \(\textrm{E}(XY|X)\). How does \(\textrm{E}(XY|X)\) relate to \(\textrm{E}(Y|X)\)?
- Use LTE to find \(\textrm{E}(XY)\).
- Find \(\textrm{Cov}(X, Y)\). Does the sign of the covariance make sense?

*Solution* to Example 2.64

- Recall the solution of Example 2.63 for how to simulate an \((X, Y)\) pair. Simulate many \((X, Y)\) pairs, for each pair compute the product \(XY\), and find the average of the simulated \(XY\) values to approximate \(\textrm{E}(XY)\).
- The conditional distribution of \(XY\) given \(X=0.5\) is the same as the conditional distribution of \(0.5Y\) given \(X=0.5\). Conditional on \(X=0.5\) we treat \(X\) like the constant 0.5. So \(\textrm{E}(XY|X=0.5)=\textrm{E}(0.5Y|X=0.5) = 0.5\textrm{E}(Y|X=0.5) = 0.5(0.25)=0.125\).
- Similar to the previous part, \(\textrm{E}(XY|X=0.2)=\textrm{E}(0.2Y|X=0.2) = 0.2\textrm{E}(Y|X=0.2) = 0.2(0.1) = 0.02\). Notice that even after replacing \(X\) with 0.2 we can’t drop the conditioning, since the condition \(X=0.2\) changes the distribution of \(Y\); \(\textrm{E}(Y|X=0.2)=0.1\) is not the same as \(\textrm{E}(Y)=0.25\).
- Observe the pattern in the two previous parts and replace 0.5 and 0.2 with a generic \(x\): \(\textrm{E}(XY|X=x)=\textrm{E}(xY|X=x) = x\textrm{E}(Y|X=x)\). Conditioning on \(X=x\), we treat \(X\) as the non-random constant \(x\) and so it pops out of the expected value, just like 0.5 and 0.2 did. Recalling from Example 2.63 that \(\textrm{E}(Y|X=x)=x/2\) we have \(\textrm{E}(XY|X=x)=x\textrm{E}(Y|X=x)=x(x/2)=x^2/2\). Note that for any particular \(x\), \(\textrm{E}(XY|X=x)\) is a *number* (e.g., \(\textrm{E}(XY|X=0.2)= 0.2^2/2 = 0.02\)).
- \(\textrm{E}(XY|X)=X\textrm{E}(Y|X)\) and moreover \(\textrm{E}(XY|X)=X^2/2\). Recall that \(\textrm{E}(XY|X)\) is a random variable, and moreover a function of \(X\). From the previous part we can see that the function \(x\mapsto x^2/2\) maps each \(x\) to \(\textrm{E}(XY|X=x)\), so \(\textrm{E}(XY|X) = X^2/2\).

- Use LTE. Remember that non-random constants pop out of expected values. \[ \textrm{E}(XY) = \textrm{E}(\textrm{E}(XY|X)) = \textrm{E}(X\textrm{E}(Y|X)) = \textrm{E}(X^2/2) = \textrm{E}(X^2)/2 = (1/3)/2 = 1/6 \] \(\textrm{E}(X^2)=1/3\) follows either by LOTUS, \(\int_0^1 x^2(1)dx=1/3\), or since \(\textrm{E}(X^2) = \textrm{Var}(X) + (\textrm{E}(X))^2 = 1/12 + (1/2)^2=1/3\) where \(\textrm{E}(X)=1/2\) and \(\textrm{Var}(X)=1/12\) since \(X\) has a Uniform(0, 1) distribution.
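
The last part of Example 2.64 asks for \(\textrm{Cov}(X, Y)\). As a sketch, assuming the standard identity \(\textrm{Cov}(X, Y)=\textrm{E}(XY)-\textrm{E}(X)\textrm{E}(Y)\), the values found in Examples 2.63 and 2.64 give \[ \textrm{Cov}(X, Y) = \textrm{E}(XY) - \textrm{E}(X)\textrm{E}(Y) = \frac{1}{6} - \left(\frac{1}{2}\right)\left(\frac{1}{4}\right) = \frac{1}{24} \] The positive sign is plausible: a larger base \(x\) allows larger heights, since \(Y\) is Uniform(0, \(x\)) given \(X=x\). The simulation below checks both \(\textrm{E}(XY)\) and the covariance numerically. The variable names, seed, and sample size are illustrative choices, not part of the original example.

```python
import random

random.seed(2024)  # arbitrary fixed seed so the run is reproducible

n = 200_000  # number of simulated rectangles (illustrative choice)
xs, ys = [], []
for _ in range(n):
    x = random.random()      # base: X ~ Uniform(0, 1)
    y = x * random.random()  # height: given X = x, Y ~ Uniform(0, x)
    xs.append(x)
    ys.append(y)

mean_x = sum(xs) / n
mean_y = sum(ys) / n

# The average of the simulated areas approximates E(XY).
mean_xy = sum(x * y for x, y in zip(xs, ys)) / n

# Plug-in estimate of Cov(X, Y) = E(XY) - E(X) E(Y).
cov_xy = mean_xy - mean_x * mean_y

print(f"estimate of E(XY):     {mean_xy:.4f}  (theory: {1/6:.4f})")
print(f"estimate of Cov(X, Y): {cov_xy:.4f}  (theory: {1/24:.4f})")
```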

A rectangle example like the one above illustrates the ideas behind the law of total expectation and taking out what is known. Suppose \(X\) represents the base of a rectangle and \(Y\) its height; the product \(XY\) represents the area of the rectangle. We can simulate a rectangle by simulating an \((X, Y)\) from the joint distribution, which might be specified by a marginal distribution of one variable and the conditional distribution of the other. After simulating many rectangles, we can compute the average height to estimate \(\textrm{E}(Y)\) and the average area to estimate \(\textrm{E}(XY)\).

To estimate \(\textrm{E}(Y)\) and \(\textrm{E}(XY)\) by conditioning on \(X\) and using the law of total expectation, we first sort and group the rectangles according to the value of their base \(X\).

- One group consists of all the rectangles with a base of \(X=1\). The heights of the rectangles in this group are distributed according to the conditional distribution of \(Y\) given \(X=1\). The average height of the rectangles in this group is \(\textrm{E}(Y|X=1)\). Since all rectangles in this group have a base of 1, the average area of rectangles in this group is \((1)\textrm{E}(Y|X=1)\).
- Similarly, the average height of the rectangles with base of \(X=2\) is \(\textrm{E}(Y|X=2)\) and the average area is \((2)\textrm{E}(Y|X=2)\).
- Generally, the average height of the rectangles with base of \(X=x\) is \(\textrm{E}(Y|X=x)\) and the average area is \((x)\textrm{E}(Y|X=x)\).

We now have the average height and average area of the rectangles in each group. But not all groups have the same number of rectangles. So when computing the overall average height and average area, groups with more rectangles get more weight.
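
The sort-and-group argument can be sketched numerically. The setup below is hypothetical and chosen only for illustration: the base \(X\) takes the values 1, 2, 3 with probabilities 0.5, 0.3, 0.2, and given \(X=x\) the height \(Y\) is Uniform(0, \(x\)). The code groups the simulated rectangles by base, computes each group's average height and average area, and then weights the group averages by the fraction of rectangles in each group.

```python
import random
from collections import defaultdict

random.seed(7)  # arbitrary fixed seed so the run is reproducible

# Hypothetical discrete distribution for the base X (assumption for illustration).
bases = [1, 2, 3]
probs = [0.5, 0.3, 0.2]

n = 100_000
heights_by_base = defaultdict(list)
for _ in range(n):
    x = random.choices(bases, weights=probs)[0]  # simulate the base
    y = random.uniform(0, x)                     # given X = x, Y ~ Uniform(0, x)
    heights_by_base[x].append(y)

# Weighted average of group averages: groups with more rectangles get more weight.
weighted_height = 0.0
weighted_area = 0.0
for x in bases:
    group = heights_by_base[x]
    share = len(group) / n                # fraction of rectangles with base x
    avg_height = sum(group) / len(group)  # approximates E(Y | X = x)
    avg_area = x * avg_height             # base is constant within the group
    weighted_height += share * avg_height
    weighted_area += share * avg_area

# Direct overall averages, ignoring the grouping.
direct_height = sum(sum(g) for g in heights_by_base.values()) / n
direct_area = sum(x * y for x, g in heights_by_base.items() for y in g) / n

print(f"average height: weighted {weighted_height:.4f}, direct {direct_height:.4f}")
print(f"average area:   weighted {weighted_area:.4f}, direct {direct_area:.4f}")
```

The weighted and direct averages agree up to floating-point rounding, which is the simulation analogue of \(\textrm{E}(Y)=\textrm{E}(\textrm{E}(Y|X))\) and \(\textrm{E}(XY)=\textrm{E}(X\textrm{E}(Y|X))\). Under the assumed distribution the theoretical values are \(\textrm{E}(Y)=0.85\) and \(\textrm{E}(XY)=1.75\).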