18 Expected Value

The distribution of a random variable specifies the possible values and the probability of any event that involves the random variable.
Characteristics of distributions based on long run averages can be defined as “expected” values.

Example 18.1 Recall the matching problem with $n = 4$ : objects labeled 1, 2, 3, 4, are placed at random in spots labeled 1, 2, 3, 4, with spot 1 the correct spot for object 1, etc. Suppose the objects are equally likely to be placed in any spot, so that the $4! = 24$ possible placements are equally likely. Let the random variable $X$ count the number of objects that are placed in the correct spot. The distribution of $X$ is displayed below. For example, there are 6 placements which result in $X = 2$ matches (1243, 1432, 1324, 4231, 3214, 2134).

$x$	$p_{X} (x)$
0	9/24
1	8/24
2	6/24
4	1/24

The table below displays 10 simulated values of $X$ . How could you use the results of this simulation to approximate the long run average value of $X$ ? How could you get a better approximation of the long run average?

Repetition	Y
1	0
2	1
3	0
4	0
5	2
6	0
7	1
8	1
9	4
10	2

Rather than adding the 10 values and dividing by 10, how could you simplify the calculation in the previous part?
The table below summarizes 24000 simulated values of $X$ . Approximate the long run average value of $X$ .

Value of X	Number of repetitions
0	8979
1	7993
2	6068
4	960

Recall the distribution of $X$ . What would be the corresponding mathematical formula for the theoretical long run average value of $X$ ? This number is called the “expected value” of $X$ .
Is the expected value the most likely value of $X$ ?
Is the expected value of $X$ the “value that we would expect” on a single repetition of the phenomenon?
Explain in what sense the expected value is “expected”.

The expected value (a.k.a. expectation a.k.a. mean), of a random variable $X$ defined on a probability space with measure $P$ , is a number denoted $E (X)$ representing the probability-weighted average value¹ of $X$ . $\begin{aligned} Discrete X with pmf p_{X} : & E (X) & = \sum_{x} x p_{X} (x) \\ Continuous X with pdf f_{X} : & E (X) & = \int_{- \infty}^{\infty} x f_{X} (x) d x \end{aligned}$
Note well that $E (X)$ represents a single number.
The expected value is the “balance point” (center of gravity) of a distribution.
The expected value of a random variable $X$ is defined by the probability-weighted average according to the underlying probability measure. But the expected value can also be interpreted as the long-run average value, and so can be approximated via simulation.
Read the symbol $E (\cdot)$ as
- Simulate lots of values of what’s inside $(\cdot)$
- Compute the average. This is a “usual” average; just sum all the simulated values and divide by the number of simulated values.

Example 18.2 Model the waiting time, measured continuously in hours, from now until the next earthquake (of any magnitude) occurs in southern CA as a continuous random variable $X$ with an Exponential distribution with rate parameter 2.

Compute and interpret $E (X)$ .
Compute $P (X = E (X))$ .
Compute $P (X \leq E (X))$ .
Find the median value (50th percentile) of $X$ . Is the median less than, greater than, or equal to the mean? Why does this make sense?
What does the variable $Y = 60 X$ represent? What is $E (Y)$ ? What is the distribution of $Y$ ?

Example 18.3 Recall Example 14.3 in which we assume that $X$ , the number of home runs hit (in total by both teams) in a randomly selected Major League Baseball game, has a Poisson(2.3) distribution with pmf $p_{X} (x) = e^{- 2.3} \frac{{2.3}^{x}}{x!}, x = 0, 1, 2, \dots$

Recall from Example 14.3 that $P (X \leq 13) = 0.9999998$ . Evaluate the pmf for $x = 0, 1, \dots, 13$ and use arithmetic to compute $E (X)$ . (This will technically only give an approximation, since there is non-zero probability that $X > 13$ , but the calculation will give you a concrete example before jumping to the next part.)
Use the pmf and infinite series to compute $E (X)$ .
Interpret $E (X)$ in context.

18.1 “Law of the unconscious statistician” (LOTUS)

Example 18.4 Flip a coin 3 times and let $X$ be the number of flips that result in H, and let $Y = (X - 1.5)^{2}$ . (We will see later why we might be interested in such a transformation.)

Find the distribution of $Y$ .
Compute $E (Y)$ .
How could we have computed $E (Y)$ without first finding the distribution of $Y$ ?
Is $E ((X - 1.5)^{2})$ equal to $(E (X) - 1.5)^{2}$ ?

A function of a random variable is a random variable: if $X$ is a random variable and $g$ is a function then $Y = g (X)$ is a random variable.
Since $g (X)$ is a random variable it has a distribution. In general, the distribution of $g (X)$ will have a different shape than the distribution of $X$ .
The “law of the unconscious statistician” (LOTUS) says that the expected value of a transformed random variable can be found without finding the distribution of the transformed random variable, simply by applying the probability weights of the original random variable² to the transformed values. $\begin{aligned} Discrete X with pmf p_{X} : & E [g (X)] & = \sum_{x} g (x) p_{X} (x) \\ Continuous X with pdf f_{X} : & E [g (X)] & = \int_{- \infty}^{\infty} g (x) f_{X} (x) d x \end{aligned}$
LOTUS says we don’t have to first find the distribution of $Y = g (X)$ to find $E [g (X)]$ ; rather, we just simply apply the transformation $g$ to each possible value $x$ of $X$ and then apply the corresponding weight for $x$ to $g (x)$ .
Whether in the short run or the long run, in general $\begin{aligned} Average of g (X) & \neq g (Average of X) \end{aligned}$
In terms of expected values, in general $\begin{aligned} E (g (X)) & \neq g (E (X)) \end{aligned}$ The left side $E (g (X))$ is what we typically want and represents first transforming the $X$ values and then averaging the transformed values. (The right side $g (E (X))$ represents first averaging the $X$ values and then plugging the average (a single number) into the transformation formula, but this doesn’t yield a meaningful value.)
The exception is linear rescalings: If $X$ is a random variable and $a, b$ are non-random constants then $E (a X + b) = a E (X) + b$

Example 18.5 Consider a simple electrical circuit with just a single 1 ohm resistor. Suppose a random voltage $V$ is applied. We are interested in the power $X = V^{2}$ .

Assume that $V$ has a Uniform(0, 20) distribution. Use LOTUS to compute $E (X)$ .
Use simulation to approximate the distribution of $X$ ; is it Uniform?
It can be shown that $X$ has pdf $f_{X} (x) = (1 / 40) x^{- 1 / 2}, 0 < x < 400$ Compute $E (X)$ .
Now suppose we want to find $E (V^{2})$ if $V$ has an Exponential distribution with mean 10. Donny Dont says: “I can just use LOTUS and replace $x$ with $x^{2}$ , so $E (X^{2})$ is $\int_{- \infty}^{\infty} 10 x^{2} e^{- 10 x^{2}} d x$ ”. Do you agree? Explain.

Remember, do NOT confuse a random variable with its distribution.
- The random variable is the numerical quantity being measured
- The distribution is the long run pattern of variation of many observed values of the random variable

18.2 Linearity of expected value

Example 18.6 Let $X$ and $Y$ denote the resistances (ohms) of two randomly selected resistors, with, respectively, Uniform(135, 165) and Uniform(162, 198) marginal distributions. Suppose the resistors are connected in series so that the system resistance is $R = X + Y$ . Make an educated guess for $E (R)$ . Then run a simulation to see if it suggests that your guess is correct.

Linearity of expected value. For any two random variables $X$ and $Y$ , $\begin{aligned} E (X + Y) & = E (X) + E (Y) \end{aligned}$
That is, the expected value of the sum is the sum of expected values, regardless of how the random variables are related.
Therefore, you only need to know the marginal distributions of $X$ and $Y$ to find the expected value of their sum. (But keep in mind that the distribution of $X + Y$ will depend on the joint distribution of $X$ and $Y$ .)
Whether in the short run or the long run, $\begin{aligned} Average of X + Y & = Average of X + Average of Y \end{aligned}$ regardless of the joint distribution of $X$ and $Y$ .
A linear combination of two random variables $X$ and $Y$ is of the form $a X + b Y$ where $a$ and $b$ are non-random constants. Combining properties of linear rescaling with linearity of expected value yields the expected value of a linear combination. $E (a X + b Y) = a E (X) + b E (Y)$
Linearity of expected value extends naturally to more than two random variables.

Example 18.7 Recall the matching problem in Example 18.1. When $n = 4$ we derived the distribution of $X$ and used it to show that $E (X) = 1$ . Now consider a general $n$ : there are $n$ objects that are shuffled and placed uniformly at random in $n$ spots with one object per spot. Let $X$ be the number of matches. Now we’ll see how to find $E (X)$ without first finding the distribution of $X$ . The key is to use indicator (a.k.a., Bernoulli) random variables.

Let $I_{1}$ be the indicator that object 1 is placed correctly in spot 1; that is $I_{1} = 1$ is object 1 is placed in spot 1, and $I_{1} = 0$ otherwise. Find $E (I_{1})$ .
When $n = 4$ , find $E (I_{j})$ for $j = 1, 2, 3, 4$ .
What is the relationship between the random variables $X$ and $I_{1}, I_{2}, I_{3}, I_{4}$ ?
Use the previous parts to find $E (X)$ .
Now consider a general $n$ . Let $I_{j}$ be the indicator that object $j$ is placed correctly in spot $j$ , $j = 1, \dots, n$ . Find $E (I_{j})$ .
Find $E (X)$ . Be amazed.
Interpret $E (X)$ is context.

Random variables that only take two possible values, 0 and 1, are called indicator (or Bernoulli) random variables.
Indicators provide the bridge between events (sets) and random variables (functions). A realization of any event is either true or false; the event either happens or it doesn’t. An indicator random variable just translates “true” or “false” into numbers, 1 for “true” and 0 for “false”.
Indicators also provide a bridge between expected value and probability. If $I_{A}$ is the indicator of event $A$ , then $E (I_{A}) = P (A)$
Representing a count as a sum of indicator random variables is a very common and useful strategy, especially in problems that involve “find the expected number of…”
Let $A_{1}, A_{2}, \dots, A_{n}$ be a collection of $n$ events. Suppose event $i$ occurs with marginal probability $p_{i} = P (A_{i})$ . Let $N = I_{A_{i}} + I_{A_{2}} + \dots + I_{A_{n}}$ be the random variable which counts the number of the events in the collection which occur. Then the expected number of events that occur is the sum of the event probabilities. $E (N) = \sum_{i = 1}^{n} p_{i} .$ If each event has the same probability, $p_{i} \equiv p$ , then $E (N)$ is equal to $n p$ . These formulas for the expected number of events are true regardless of whether there is any association between the events (that is, regardless of whether the events are independent.)

Remember, the pmf or pdf is 0 outside the range of possible values, so when working problems the generic bounds of $(- \infty, \infty)$ should be replaced by the possible values of $X$ .↩︎
Remember, the pmf or pdf is 0 outside the range of possible values, so when working problems the generic bounds of $(- \infty, \infty)$ should be replaced by the possible values of $X$ .↩︎