21 Expected Value

The distribution of a random variable specifies the possible values and the probability of any event that involves the random variable.
Characteristics of distributions based on long run averages can be defined as “expected” values.

Example 21.1 Recall the matching problem with $n = 4$: objects labeled 1, 2, 3, 4 are placed at random in spots labeled 1, 2, 3, 4, with spot 1 the correct spot for object 1, etc. Let the random variable $X$ count the number of objects that are put back in the correct spot. Let $P$ denote the probability measure corresponding to the assumption that the objects are equally likely to be placed in any spot, so that the 24 possible placements are equally likely.

The distribution of $X$ is displayed below.

x	P(X=x)
0	0.3750
1	0.3333
2	0.2500
4	0.0417

Describe two ways for simulating values of $X$ .
The table below displays 10 simulated values of $X$ . How could you use the results of this simulation to approximate the long run average value of $X$ ? How could you get a better approximation of the long run average?

Repetition	Value of X
1	0
2	1
3	0
4	0
5	2
6	0
7	1
8	1
9	4
10	2

Rather than adding the 10 values and dividing by 10, how could you simplify the calculation in the previous part?
The table below summarizes 24000 simulated values of $X$ . Approximate the long run average value of $X$ .

Value of X	Number of repetitions
0	8979
1	7993
2	6068
4	960
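The summary table can be averaged without listing all 24000 values: weight each distinct value by its count, add, and divide by the number of repetitions. A minimal Python sketch using the counts in the table above (the variable names are illustrative, not from the text):

```python
# Counts from the table summarizing 24000 simulated values of X.
counts = {0: 8979, 1: 7993, 2: 6068, 4: 960}

n_reps = sum(counts.values())  # total number of repetitions

# Average = sum of (value * count) / number of repetitions,
# which is the same as the sum of (value * relative frequency).
average = sum(x * c for x, c in counts.items()) / n_reps

print(n_reps, round(average, 4))
```

Dividing each count by `n_reps` first gives the relative frequencies, which approximate the probabilities in the distribution of $X$.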

Recall the distribution of $X$ . What would be the corresponding mathematical formula for the theoretical long run average value of $X$ ? This number is called the “expected value” of $X$ .
Is the expected value the most likely value of $X$ ?
Is the expected value of $X$ the “value that we would expect” on a single repetition of the phenomenon?
Explain in what sense the expected value is “expected”.
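One possible way to explore these questions is to simulate the matching problem directly and compare the simulated average with the probability-weighted average from the distribution of $X$. The sketch below is only an illustration: the function name `simulate_matches`, the seed, and the number of repetitions are arbitrary choices, and the probabilities are the table values written as fractions of the 24 placements.

```python
import random
from fractions import Fraction

def simulate_matches(n, rng):
    """Place n objects in n spots uniformly at random; count correct spots."""
    spots = list(range(n))
    rng.shuffle(spots)  # one uniformly random placement
    return sum(1 for obj, spot in enumerate(spots) if obj == spot)

rng = random.Random(21)  # arbitrary seed, for reproducibility only
n_reps = 100_000
simulated_average = sum(simulate_matches(4, rng) for _ in range(n_reps)) / n_reps

# Distribution of X from the table: 0.3750, 0.3333, 0.2500, 0.0417
# are 9/24, 8/24, 6/24, 1/24.
pmf = {0: Fraction(9, 24), 1: Fraction(8, 24), 2: Fraction(6, 24), 4: Fraction(1, 24)}

# Probability-weighted average of the possible values.
expected_value = sum(x * p for x, p in pmf.items())

print(simulated_average, expected_value)
```

The simulated average varies from run to run, while the probability-weighted average is a single fixed number.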

Example 21.2

Let $X$ be a random variable which has the Exponential(1) distribution. To motivate the computation of the expected value of a continuous random variable, we’ll first consider a discrete version of $X$ .

How could you use simulation to approximate the long run average value of $X$ ?
Suppose the values of $X$ are truncated¹ to integers. That is, 0.73 is recorded as 0, 1.15 is recorded as 1, 2.999 is recorded as 2, 3.001 is recorded as 3, etc. The following table summarizes 10000 simulated values of $X$ , truncated. Using just these values, how would you approximate the long run average value of $X$ ?

Truncated value of X	Number of repetitions
0	6302
1	2327
2	915
3	287
4	94
5	43
6	22
7	5
8	4
9	1

How could you approximate the probability that the truncated value of $X$ is 0? 1? 2? Suggest a formula for the (approximate) long run average value of $X$ . (Don’t worry if the approximation isn’t great; we’ll see how to improve it.)
Truncating to an integer turns out not to yield a great approximation of the long run average value of $X$ . How could we get a better approximation?
Suppose instead of truncating to an integer, we truncate to the first decimal place. For example, 0.73 is recorded as 0.7, 1.15 is recorded as 1.1, 2.999 is recorded as 2.9, 3.001 is recorded as 3.0, etc. Suggest a formula for the (approximate) long run average value of $X$ .
We can continue in this way, truncating to the second decimal place, then the third, and so on. Considering what happens in the limit, suggest a formula for the theoretical long run average value of $X$ .
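The effect of truncating to finer and finer grids can be checked numerically. The sketch below assumes the Exponential(1) interval probability $P(a \le X < b) = e^{-a} - e^{-b}$ and computes the probability-weighted average of the truncated values for several grid widths. The function name `truncated_mean` and the cutoff `upper` are illustrative choices, not part of the text.

```python
import math

def truncated_mean(h, upper=60.0):
    """Probability-weighted average of X truncated down to a multiple of h,
    for X ~ Exponential(1). The sum stops at `upper`, beyond which the
    remaining probability is negligible."""
    total = 0.0
    k = 0
    while k * h < upper:
        lo, hi = k * h, (k + 1) * h
        prob = math.exp(-lo) - math.exp(-hi)  # P(lo <= X < hi)
        total += lo * prob                    # truncated value times its probability
        k += 1
    return total

for h in (1, 0.1, 0.01, 0.001):
    print(h, round(truncated_mean(h), 4))
```

The printed values increase toward a limit as the grid width shrinks, which is what replacing the sum with an integral captures.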

The expected value (a.k.a. expectation a.k.a. mean), of a random variable $X$ defined on a probability space with measure $P$ , is a number denoted $E (X)$ representing the probability-weighted average value of $X$ . Expected value is defined as $\begin{aligned} \text{Discrete } X \text{ with pmf } p_{X}: & & E (X) & = \sum_{x} x \, p_{X} (x) \\ \text{Continuous } X \text{ with pdf } f_{X}: & & E (X) & = \int_{- \infty}^{\infty} x \, f_{X} (x) \, d x \end{aligned}$
Note well that $E (X)$ represents a single number.
The expected value is the “balance point” (center of gravity) of a distribution.
The expected value of a random variable $X$ is defined by the probability-weighted average according to the underlying probability measure. But the expected value can also be interpreted as the long-run average value, and so can be approximated via simulation.
Read the symbol $E (\cdot)$ as:
- Simulate lots of values of what’s inside $(\cdot)$
- Compute the average. This is a “usual” average; just sum all the simulated values and divide by the number of simulated values.
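The two steps above can be sketched in a few lines of Python. This is only an illustration, using the Exponential(1) distribution from Example 21.2; the seed and the number of repetitions are arbitrary.

```python
import random

rng = random.Random(3)  # arbitrary seed, for reproducibility only
n_reps = 100_000

# Step 1: simulate lots of values of what is inside E(.), here X ~ Exponential(1).
values = [rng.expovariate(1.0) for _ in range(n_reps)]

# Step 2: compute the usual average of the simulated values.
average = sum(values) / len(values)

print(average)
```

To approximate $E (g (X))$ for some function $g$, the same recipe applies with `g(x)` computed for each simulated value before averaging.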

Example 21.3

Let $X$ be a random variable which has the Exponential(1) distribution.

Donny Dont says $E (X) = \int_{0}^{\infty} e^{- x} d x = 1$ . Do you agree?
Compute $E (X)$ .
Compute $P (X = E (X))$ .
Compute $P (X \leq E (X))$ .
Find the median value (50th percentile) of $X$ . Is the median less than, greater than, or equal to the mean? Why does this make sense?
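After working the example by hand, one way to check the answers is numerical integration. The sketch below uses a simple midpoint rule; the helper `midpoint_integral` and the upper limit of 50 are assumptions made for illustration. It relies on the Exponential(1) cdf $1 - e^{-x}$, so the median solves $1 - e^{-m} = 0.5$.

```python
import math

def midpoint_integral(f, a, b, n=200_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    width = (b - a) / n
    return width * sum(f(a + (i + 0.5) * width) for i in range(n))

def pdf(x):
    return math.exp(-x)  # Exponential(1) pdf for x > 0

upper = 50.0  # the pdf is negligible beyond this point

total_prob = midpoint_integral(pdf, 0.0, upper)             # integral of the pdf alone
mean = midpoint_integral(lambda x: x * pdf(x), 0.0, upper)  # integral of x times the pdf
prob_below_mean = 1 - math.exp(-mean)                       # cdf evaluated at the mean
median = math.log(2)                                        # solves 1 - exp(-m) = 0.5

print(total_prob, mean, prob_below_mean, median)
```

Comparing `total_prob` with `mean` shows that two different integrals can produce the same number, which is worth keeping in mind when judging Donny's work.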

Example 21.4 Recall Example 15.5 in which we assume that $X$ , the number of home runs hit (in total by both teams) in a randomly selected Major League Baseball game, has a Poisson(2.3) distribution with pmf

$p_{X} (x) = e^{- 2.3} \frac{{2.3}^{x}}{x!}, x = 0, 1, 2, \dots$

Recall from Example 15.5 that $P (X \leq 13) = 0.9999998$ . Evaluate the pmf for $x = 0, 1, \dots, 13$ and use arithmetic to compute $E (X)$ . (This will technically only give an approximation, since there is non-zero probability that $X > 13$ , but the calculation will give you a concrete example before jumping to the next part.)
Use the pmf and infinite series to compute $E (X)$ .
Interpret $E (X)$ in context.
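The arithmetic in the first part can be carried out with a short script. This sketch evaluates the Poisson(2.3) pmf given above for $x = 0, 1, \dots, 13$ and forms the probability-weighted sum; the function name `poisson_pmf` is illustrative.

```python
import math

rate = 2.3  # Poisson parameter from the example

def poisson_pmf(x, mu):
    """Poisson pmf: exp(-mu) * mu^x / x!"""
    return math.exp(-mu) * mu**x / math.factorial(x)

xs = range(14)  # x = 0, 1, ..., 13

partial_prob = sum(poisson_pmf(x, rate) for x in xs)      # P(X <= 13)
partial_mean = sum(x * poisson_pmf(x, rate) for x in xs)  # truncated sum for E(X)

print(partial_prob, partial_mean)
```

Because the terms with $x > 13$ are omitted, `partial_mean` falls slightly below the value obtained from the full infinite series.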

21.1 “Law of the unconscious statistician” (LOTUS)

Example 21.5

Flip a coin 3 times and let $X$ be the number of flips that result in H, and let $Y = (X - 1.5)^{2}$ . (We will see later why we might be interested in such a transformation.)

Find the distribution of $Y$ .
Compute $E (Y)$ .
How could we have computed $E (Y)$ without first finding the distribution of $Y$ ?
Is $E ((X - 1.5)^{2})$ equal to $(E (X) - 1.5)^{2}$ ?

The “law of the unconscious statistician” (LOTUS) says that the expected value of a transformed random variable can be found without finding the distribution of the transformed random variable, simply by applying the probability weights of the original random variable to the transformed values. $\begin{aligned} \text{Discrete } X \text{ with pmf } p_{X}: & & E [g (X)] & = \sum_{x} g (x) \, p_{X} (x) \\ \text{Continuous } X \text{ with pdf } f_{X}: & & E [g (X)] & = \int_{- \infty}^{\infty} g (x) \, f_{X} (x) \, d x \end{aligned}$
LOTUS says we don’t have to first find the distribution of $Y = g (X)$ to find $E [g (X)]$ ; rather, we simply apply the transformation $g$ to each possible value $x$ of $X$ and then apply the corresponding weight for $x$ to $g (x)$ .
Whether in the short run or the long run, in general $\begin{aligned} \text{Average of } g (X) & \neq g (\text{Average of } X) \end{aligned}$
In terms of expected values, in general $\begin{aligned} E (g (X)) & \neq g (E (X)) \end{aligned}$ The left side $E (g (X))$ represents first transforming the $X$ values and then averaging the transformed values. The right side $g (E (X))$ represents first averaging the $X$ values and then plugging the average (a single number) into the transformation formula.
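The coin-flip setting of Example 21.5 gives a small concrete check of these statements. The sketch below assumes a fair coin, so that $X$ has a Binomial(3, 0.5) distribution; the variable names are illustrative. It computes $E (Y)$ two ways and compares the result with $g (E (X))$.

```python
from collections import defaultdict
from fractions import Fraction

# pmf of X = number of H in 3 flips, assuming a fair coin.
pmf_x = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def g(x):
    return (x - Fraction(3, 2)) ** 2  # the transformation Y = (X - 1.5)^2

# Way 1: first find the distribution of Y = g(X), then use the definition.
pmf_y = defaultdict(Fraction)
for x, p in pmf_x.items():
    pmf_y[g(x)] += p  # collect probability for x values that share a y value
e_y_from_dist = sum(y * p for y, p in pmf_y.items())

# Way 2 (LOTUS): apply the weights of X directly to the transformed values.
e_y_lotus = sum(g(x) * p for x, p in pmf_x.items())

# Plug the average of X into g, for comparison.
e_x = sum(x * p for x, p in pmf_x.items())
g_of_e_x = g(e_x)

print(e_y_from_dist, e_y_lotus, g_of_e_x)
```

The first two numbers agree with each other, and the third differs from both.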

Example 21.6

Let $X$ be a random variable with a Uniform(-1, 1) distribution and let $Y = X^{2}$ . Recall that in Example 18.7 we found the pdf of $Y$ : $f_{Y} (y) = \frac{1}{2 \sqrt{y}}, 0 < y < 1$ .

Find $E (X^{2})$ using the distribution of $Y$ and the definition of expected value. Remember: if we did not have the distribution of $Y$ , we would first have to derive it as in Example 18.7.
Describe how to use simulation to approximate $E (Y)$ , in a way that is analogous to the method in the previous part.
Find $E (X^{2})$ using LOTUS.
Describe how to use simulation to approximate $E (X^{2})$ , in a way that is analogous to the method in the previous part.
Is $E (X^{2})$ equal to $(E (X))^{2}$ ?
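The two simulation approaches can be sketched side by side. This is only an illustration: to simulate $Y$ directly it uses the inverse-cdf method, since integrating the pdf above gives $F_{Y} (y) = \sqrt{y}$, so $Y$ can be generated as $U^{2}$ with $U$ Uniform(0, 1). The seed and number of repetitions are arbitrary.

```python
import random

rng = random.Random(18)  # arbitrary seed, for reproducibility only
n_reps = 100_000

# Method 1: simulate Y from its own distribution, then average.
# F_Y(y) = sqrt(y) on (0, 1), so Y = U**2 with U ~ Uniform(0, 1).
y_values = [rng.random() ** 2 for _ in range(n_reps)]
avg_y = sum(y_values) / n_reps

# Method 2 (LOTUS): simulate X, square each value, then average the squares.
x_values = [rng.uniform(-1, 1) for _ in range(n_reps)]
avg_x_squared = sum(x * x for x in x_values) / n_reps

# For comparison: average the X values first, then square the average.
avg_x = sum(x_values) / n_reps
squared_avg_x = avg_x ** 2

print(avg_y, avg_x_squared, squared_avg_x)
```

The first two averages approximate the same quantity, while squaring the average of the `x_values` gives something quite different.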

Example 21.7

We want to find $E (X^{2})$ if $X$ has an Exponential(1) distribution. Donny Dont says: “I can just use LOTUS and replace $x$ with $x^{2}$ , so $E (X^{2})$ is $\int_{- \infty}^{\infty} x^{2} e^{- x^{2}} d x$ ”. Do you agree?
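One way to test Donny's claim is to evaluate both candidate integrals numerically. In the sketch below, LOTUS applies $g (x) = x^{2}$ to the values while keeping the Exponential(1) pdf $e^{- x}$, $x > 0$, as the weight. The helper `midpoint_integral` and the integration limits are assumptions made for illustration.

```python
import math

def midpoint_integral(f, a, b, n=200_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    width = (b - a) / n
    return width * sum(f(a + (i + 0.5) * width) for i in range(n))

# LOTUS: transform the values (x -> x^2) but keep the pdf of X, exp(-x) on x > 0.
lotus_value = midpoint_integral(lambda x: x**2 * math.exp(-x), 0.0, 60.0)

# Donny's integral: x is replaced by x^2 inside the pdf as well,
# and the integral runs over the whole real line (approximated by [-10, 10]).
donny_value = midpoint_integral(lambda x: x**2 * math.exp(-x**2), -10.0, 10.0)

print(lotus_value, donny_value)
```

The two results differ, so at most one of the integrals can equal $E (X^{2})$.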

¹ We could also round to the nearest integer. Whether we truncate or round won’t matter as we consider what happens in the limit.