# Chapter 4 Random variables

Chapter 3 developed a general framework for modeling random outcomes and events. This framework can be applied to any set of random outcomes, no matter how complex.

However, many of the random outcomes we are interested in are quantitative, that is, they can be described by a number. Quantitative outcomes are also called “random variables.” In addition to the basic tools of probability developed in Chapter 3, we have some extremely useful specialized tools for random variables. This chapter will develop these tools.

Chapter goals

In this chapter we will learn how to:

• Calculate and interpret the CDF and PDF of a random variable, or several random variables.
• Calculate and interpret the expected value of a discrete random variable from its PDF.
• Calculate and interpret the variance and standard deviation of a discrete random variable from its PDF.
• Work with common probability distributions including the Bernoulli, binomial, uniform and normal.

## 4.1 Introduction to random variables

A random variable is a number whose value depends on a random outcome. The idea here is that we are going to use a random variable to describe some (but not necessarily every) aspect of the outcome.

Random variables in roulette

Here are a few random variables we could define in a roulette game:

• The original outcome $$b$$.
• An indicator for whether a bet on red wins: $r = I(b \in Red)=\begin{cases}1 & b \in Red\\ 0 & b \notin Red \\ \end{cases}$
• The net payout from a $1 bet on red: $w_{red} = w_{red}(b) = \begin{cases} 1 & \textrm{ if } b \in Red \\ -1 & \textrm{ if } b \in Red^c \end{cases}$ That is, a player who bets $1 on red wins $1 if the ball lands on red and loses $1 if the ball lands anywhere else.
• The net payout from a $1 bet on 14: $w_{14} = w_{14}(b) = \begin{cases} 35 & \textrm{ if } b = 14 \\ -1 & \textrm{ if } b \neq 14 \end{cases}$ That is, a player who bets $1 on 14 wins $35 if the ball lands on 14 and loses $1 if the ball lands anywhere else.

All of these random variables are defined in terms of the underlying outcome, but we can also define random variables in terms of other random variables. For example, we could have defined $$w_{red}$$ as $$w_{red} = 2r-1$$.

A random variable is always a function of the original outcome, but for convenience, we usually leave its dependence on the original outcome implicit, and write it as if it were an ordinary variable.
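To make the "function of the outcome" idea concrete, here is a minimal Python sketch of the roulette random variables. The set `RED` is the usual colouring of a European wheel, which I am assuming here; the function names are just illustrative labels for $$r$$, $$w_{red}$$ and $$w_{14}$$.

```python
# Assumed: a standard European wheel with pockets 0..36 and the usual 18 red numbers.
RED = {1, 3, 5, 7, 9, 12, 14, 16, 18, 19, 21, 23, 25, 27, 30, 32, 34, 36}
OUTCOMES = range(37)


def r(b):
    """Indicator that a bet on red wins."""
    return 1 if b in RED else 0


def w_red(b):
    """Net payout from a $1 bet on red."""
    return 1 if b in RED else -1


def w_14(b):
    """Net payout from a $1 bet on 14."""
    return 35 if b == 14 else -1


# w_red can equivalently be written in terms of r.
assert all(w_red(b) == 2 * r(b) - 1 for b in OUTCOMES)

print(r(14), w_red(14), w_14(14))  # 14 is red, so this prints: 1 1 35
```

Each function takes the outcome `b` as its argument, which is exactly the dependence we usually leave implicit in the notation.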

### 4.1.1 Probability distributions

A random variable has its own sample space (normally $$\mathbb{R}$$) and probability distribution. This probability distribution can be derived from the probability distribution of the underlying outcome.

Probability distributions for roulette

• The probability distribution for $$b$$ is: $\Pr(b = 0) = 1/37 \approx 0.027$ $\Pr(b = 1) = 1/37 \approx 0.027$ $\vdots$ $\Pr(b = 36) = 1/37 \approx 0.027$ All other values of $$b$$ have probability zero.
• The probability distribution for $$w_{red}$$ is: $\Pr(w_{red} = 1) = \Pr(b \in Red) = 18/37 \approx 0.486$ $\Pr(w_{red} = -1) = \Pr(b \notin Red) = 19/37 \approx 0.514$ All other values of $$w_{red}$$ have probability zero.
• The probability distribution for $$w_{14}$$ is: $\Pr(w_{14} = 35) = \Pr(b = 14) = 1/37 \approx 0.027$ $\Pr(w_{14} = -1) = \Pr(b \neq 14) = 36/37 \approx 0.973$ All other values of $$w_{14}$$ have probability zero.
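The derivation above can be sketched in a few lines of Python: loop over the 37 equally likely outcomes and add each outcome's probability to the value the random variable takes. The `RED` set (the usual European wheel colouring) and the helper name `distribution` are my own assumptions for illustration.

```python
from collections import defaultdict
from fractions import Fraction

# Assumed: the usual 18 red numbers on a European wheel.
RED = {1, 3, 5, 7, 9, 12, 14, 16, 18, 19, 21, 23, 25, 27, 30, 32, 34, 36}
OUTCOMES = range(37)


def distribution(rv):
    """Map each value of rv to its probability, given equally likely outcomes."""
    dist = defaultdict(Fraction)  # missing values start at probability 0
    for b in OUTCOMES:
        dist[rv(b)] += Fraction(1, 37)
    return dict(dist)


dist_red = distribution(lambda b: 1 if b in RED else -1)
dist_14 = distribution(lambda b: 35 if b == 14 else -1)

print(dist_red)
print(dist_14)
```

Using `Fraction` keeps the probabilities exact, so the results can be compared directly with 18/37, 19/37, 1/37 and 36/37.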

Notice that these random variables are related to each other since they all depend on the same underlying outcome. Section 6.1 will explain how we can describe and analyze those relationships.

#### 4.1.1.1 The support

The support of a random variable $$x$$ is the smallest¹ set $$S_x \subset \mathbb{R}$$ such that $$\Pr(x \in S_x) = 1$$.

In plain language the support is the set of all values in the sample space that have some chance of actually happening.

The support in roulette

The support is just the set of values with non-zero probability:

• The support of $$b$$ is $$S_{b} = \{0,1,2,\ldots,36\}$$.
• The support of $$w_{red}$$ is $$S_{red} = \{-1,1\}$$.
• The support of $$w_{14}$$ is $$S_{14} = \{-1,35\}$$.

The random variables we have considered so far have discrete support. That is, the support is a set of isolated points each of which has a strictly positive probability. But not all random variables have a discrete support. That will complicate the math quite a bit, as we will need to use calculus.

### 4.1.2 The PDF and CDF

The PDF and CDF are both functions that allow us to describe the probability distribution of a random variable.

#### 4.1.2.1 The PDF of a discrete random variable

We can describe the probability distribution of a random variable with a function called its probability density function (PDF).

The PDF of a discrete random variable is defined as: $f_x(a) = \Pr(x = a)$ where $$a$$ is any number. By convention, we typically use a lower-case $$f$$ to represent a PDF, and we use the subscript when needed to clarify which specific random variable we are talking about.

The PDF in roulette

Our three random variables are all discrete, and each has its own PDF:

$f_b(a) = \Pr(b = a) = \begin{cases} 1/37 & a \in \{0,1,\ldots,36\} \\ 0 & a \notin \{0,1,\ldots,36\} \\ \end{cases}$

$f_{red}(a) = \Pr(w_{red} = a) = \begin{cases} 19/37 & a = -1 \\ 18/37 & a = 1 \\ 0 & a \notin \{-1,1\} \\ \end{cases}$

$f_{14}(a) = \Pr(w_{14} = a) = \begin{cases} 36/37 & a = -1 \\ 1/37 & a = 35 \\ 0 & a \notin \{-1,35\} \\ \end{cases}$

Figure 4.1 below shows these three PDFs.

Figure 4.1: PDFs for the roulette example

We can calculate any probability from the PDF by simple addition. That is: $\Pr(x \in A) = \sum_{s \in S_x} f_x(s)I(s \in A)$ where² $$A \subset \mathbb{R}$$ is any event defined for $$x$$.

Some event probabilities in roulette

Since the outcome in roulette is discrete, we can calculate any event probability by adding up the probabilities of the event’s outcomes.

The probability of the event $$b \leq 3$$ can be calculated: \begin{align} \Pr(b \leq 3) &= \sum_{s=0}^{36}f_b(s)I(s \leq 3) \\ &= f_b(0) + f_b(1) + f_b(2) + f_b(3) \\ &= 4/37 \end{align}

The probability of the event $$b \in Even$$ can be calculated: \begin{align} \Pr(b \in Even) &= \sum_{s=0}^{36}f_b(s)I(s \in Even) \\ &= f_b(2) + f_b(4) + \cdots + f_b(36) \\ &= 18/37 \end{align}
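The summation formula translates almost word for word into code. The sketch below is only an illustration: the helper `prob` and the lambda-based event definitions are my own naming, and the event $$Even$$ is taken to be $$\{2,4,\ldots,36\}$$ (zero is not counted as even in roulette).

```python
from fractions import Fraction

SUPPORT_B = range(37)


def f_b(a):
    """PDF of the roulette outcome b."""
    return Fraction(1, 37) if a in SUPPORT_B else Fraction(0)


def prob(pdf, support, event):
    """Pr(x in A): sum f_x(s) * I(s in A) over the support."""
    return sum(pdf(s) * int(event(s)) for s in support)


p_low = prob(f_b, SUPPORT_B, lambda s: s <= 3)
p_even = prob(f_b, SUPPORT_B, lambda s: s != 0 and s % 2 == 0)

print(p_low, p_even)  # prints: 4/37 18/37
```

Here `int(event(s))` plays the role of the indicator function $$I(s \in A)$$.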

The PDF of a discrete random variable has several general properties:

1. It is always between zero and one: $0 \leq f_x(a) \leq 1$ since it is a probability.
2. It sums up to one over the support: $\sum_{a \in S_x} f_x(a) = \Pr(x \in S_x) = 1$ since the support has probability one by definition.
3. It is strictly positive for all values in the support: $a \in S_x \implies f_x(a) > 0$ since the support is the smallest set that has probability one.

We can prove these, but I will skip that.

#### 4.1.2.2 The CDF

Another way to describe the probability distribution of a random variable is with a function called its cumulative distribution function (CDF). The CDF is a little less intuitive than the PDF, but it has the advantage that it always has the same definition whether or not the random variable is discrete.

The CDF of the random variable $$x$$ is the function $$F_x:\mathbb{R} \rightarrow [0,1]$$ defined by: $F_x(a) = \Pr(x \leq a)$ where $$a$$ is any number. By convention, we typically use an upper-case $$F$$ to indicate a CDF, and we use the subscript to indicate what random variable we are talking about.

The CDF has several properties:

1. It always lies between zero and one: $0 \leq F_x(a) \leq 1$ since it is a probability.
2. It starts at zero and ends at one: $F_x(-\infty) = \Pr(x \leq -\infty) = 0$ $F_x(\infty) = \Pr(x \leq \infty) = 1$
3. It is non-decreasing. That is, for any $$a_1 \leq a_2$$ $F_x(a_1) \leq F_x(a_2)$ This is because the event $$x \leq a_1$$ implies the event $$x \leq a_2$$, so the event $$x \leq a_2$$ must be at least as probable.
4. For any $$a_1 < a_2$$, $\Pr(a_1 < x \leq a_2) = F_x(a_2) - F_x(a_1)$

As I said earlier, the CDF is well-defined and has these properties whether $$x$$ is discrete or continuous.

If a random variable is discrete, we can construct its CDF by just adding up the PDF: \begin{align} F_x(a) &= \Pr(x \leq a) \\ &= \sum_{s \in S_x} f_x(s)I(s \leq a) \end{align} This formula leads to a “stair-step” appearance: the CDF is flat for all values outside of the support, and then jumps up at all values in the support.

CDFs for roulette

• The CDF of $$b$$ is: $F_b(a) = \begin{cases} 0 & a < 0 \\ 1/37 & 0 \leq a < 1 \\ 2/37 & 1 \leq a < 2 \\ \vdots & \vdots \\ 36/37 & 35 \leq a < 36 \\ 1 & a \geq 36 \\ \end{cases}$
• The CDF of $$w_{red}$$ is: $F_{red}(a) = \begin{cases} 0 & a < -1 \\ 19/37 & -1 \leq a < 1 \\ 1 & a \geq 1 \\ \end{cases}$
• The CDF of $$w_{14}$$ is: $F_{14}(a) = \begin{cases} 0 & a < -1 \\ 36/37 & -1 \leq a < 35 \\ 1 & a \geq 35 \\ \end{cases}$

Figure 4.2 below graphs these CDFs.

Figure 4.2: CDFs for the roulette example

Notice that they show all of the general properties described above. In addition, they all have a distinctive “stair-step” shape, jumping up at each point in $$S_x$$ and staying flat between those points. This is a general property of CDFs for discrete random variables.

We can also go the other way, and construct the PDF of a discrete random variable from its CDF. Each little jump in the CDF is a point in the support, and the size of the jump is exactly equal to the PDF.

In more formal mathematics, the formula for deriving the PDF of a discrete random variable from its CDF would be written:

$f_x(a) = \lim_{\epsilon \rightarrow 0} \left[ F_x(a) - F_x(a-|\epsilon|) \right]$ but we can just think of it as the size of the jump.
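Here is a small sketch of going back and forth between the PDF and CDF of a discrete random variable, using $$w_{14}$$ as the example. The function names are my own, and instead of taking a limit the `jump` function uses a fixed small step `eps`, which I am assuming is smaller than the gap between any two points in the support.

```python
from fractions import Fraction

# PDF of w_14, stored as {value: probability}
PDF_14 = {-1: Fraction(36, 37), 35: Fraction(1, 37)}


def cdf(pdf, a):
    """F_x(a): add up the PDF over all support points s <= a."""
    return sum((p for s, p in pdf.items() if s <= a), Fraction(0))


def jump(pdf, a, eps=Fraction(1, 1000)):
    """Size of the CDF jump at a, which recovers the PDF at a.

    Assumes eps is smaller than the distance between support points.
    """
    return cdf(pdf, a) - cdf(pdf, a - eps)


print(cdf(PDF_14, -2), cdf(PDF_14, 0), cdf(PDF_14, 35))  # prints: 0 36/37 1
print(jump(PDF_14, 35), jump(PDF_14, 10))                # prints: 1/37 0
```

The jump is zero at any point outside the support and equal to the PDF at any point inside it.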

#### 4.1.2.3 Continuous random variables

So far we have considered random variables with a discrete support. However, many random variables of interest have a continuous support: they can take on any real value within some range.

For example, Canada produced 31.251 million metric tons of wheat in 2019. If we think of that number as a random variable, it’s clear that this number could have been 31.252 million metric tons if circumstances were different. It also could have been any number between those numbers, for example 31.2511 million or 31.2517 million.

A continuous random variable has the property that the probability of any specific value is zero: $\Pr(x=a) = 0$ Now this creates something of a paradox: by the rules of probability the probability that $$x$$ takes on some value is $$\Pr(x \in \mathbb{R}) = 1$$ but the probability that $$x$$ takes on any specific value is zero. How can this work?

I’ll explain how it works with an example.

The standard uniform distribution

Consider a random variable $$x$$ that has the standard uniform distribution. What that means is that:

1. The support of $$x$$ is the range $$[0,1]$$.
2. All values in this range are equally likely.

The CDF of the standard uniform distribution is: $F_x(a) = \Pr(x \leq a) = \begin{cases} 0 & a < 0 \\ a & a \in [0,1] \\1 & a > 1 \\ \end{cases}$

Figure 4.3 below shows the CDF of the standard uniform distribution.

Figure 4.3: CDF for the standard uniform distribution

As we have seen, the CDF of a discrete random variable rises in a “stair-step” manner. In contrast, the standard uniform CDF rises smoothly with no jumps. All continuous random variables have a CDF with this property.

Let $$a_1$$ and $$a_2$$ be any two numbers between 0 and 1, and let $$a_1 < a_2$$. Then the probability of $$x$$ being between $$a_1$$ and $$a_2$$ is: $\Pr(a_1 < x \leq a_2) = F_x(a_2) - F_x(a_1) = a_2 - a_1$ As $$a_2$$ gets closer and closer to $$a_1$$ this number gets closer and closer to zero, so the probability of $$x$$ being exactly $$a_1$$ is zero.

The PDF of a continuous random variable is defined as just the derivative of its CDF: $f_x(a) = \frac{dF_x(a)}{da}$

The PDF of the standard uniform distribution

The PDF of a standard uniform random variable is: $f_x(a) = \begin{cases} 0 & a < 0 \\ 1 & a \in [0,1] \\ 0 & a > 1 \\ \end{cases}$ which looks like this:

Figure 4.4: PDF for the standard uniform distribution

Now, in order to work with continuous random variables we would need to use integral calculus. Integral calculus is taught in MATH 158, which is not a prerequisite for this course. So:

• Most of my examples will be for the discrete case.
• I will briefly show you the math for the continuous case, but I will not expect you to do it.
• Most of the results I give you will apply for both cases.

Integral calculus for continuous random variables

I have defined the PDF for a continuous random variable based on its CDF, but we can also go the other way and calculate the CDF from the PDF. The formula for that calculation is: $F_x(a) = \int_{-\infty}^a f_x(v)dv$ More generally the probability of $$x$$ being between any two numbers is:

$\Pr(a \leq x \leq b) = F_x(b) - F_x(a) = \int_a^b f_x(v)dv$
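If you have not seen integrals before, it may help to know that a computer can approximate them by adding up the areas of many thin rectangles under the PDF. The sketch below does this for the standard uniform PDF. The midpoint rule and the function names are my own choices for illustration, not a method this course requires.

```python
def uniform_pdf(v):
    """PDF of the standard uniform distribution."""
    return 1.0 if 0.0 <= v <= 1.0 else 0.0


def integrate(f, lo, hi, n=10_000):
    """Approximate the integral of f from lo to hi with a midpoint Riemann sum."""
    width = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * width) for i in range(n)) * width


# Pr(0.25 <= x <= 0.75) should be F_x(0.75) - F_x(0.25) = 0.5
p = integrate(uniform_pdf, 0.25, 0.75)
print(round(p, 6))  # prints: 0.5
```

The approximation gets better as the number of rectangles `n` increases.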

Unless you have taken MATH 152 or MATH 158, you may have no idea what this is or how to solve it. That’s OK! All you need to know for this course is that it can be solved.

## 4.2 The properties of a random variable

The probability distribution of a random variable is fully described by its PDF or CDF. However, we will often be interested in describing the random variable with a few simple summary numbers.

For example, we might be interested in the most common value (also called the mode), or we might be interested in a “typical” value, or we might be interested in a simple measure of how much the random variable tends to vary. All of these quantities can be defined and calculated from the PDF or CDF.

### 4.2.1 The mode

Roughly speaking, the mode of a random variable is its most likely value (i.e., the value with the highest PDF).

The mode in roulette

### 4.2.5 Variance and standard deviation

The mode, median and expected value all aim to describe a typical or central value of the random variable. We are also interested in measures of how much the random variable varies. We have already seen one - the range - but there are others, including the variance and standard deviation.

The variance of a random variable $$x$$ is defined as: $\sigma_x^2 = var(x) = E((x-E(x))^2)$ The standard deviation of a random variable is defined as the (positive) square root of its variance. $\sigma_x = sd(x) = \sqrt{var(x)}$ Both variance and standard deviation can be thought of as measures of how much $$x$$ tends to deviate from its central tendency $$E(x)$$.

Variance and standard deviation in roulette

The variance of $$r$$ is: $var(r) = (0-\underbrace{E(r)}_{18/37})^2 *\frac{19}{37} + (1-\underbrace{E(r)}_{18/37})^2 * \frac{18}{37} \approx 0.25$ and its standard deviation is: $sd(r) = \sqrt{var(r)} \approx 0.5$

The variance of $$w_{red}$$ is: $var(w_{red}) = (-1-\underbrace{E(w_{red})}_{\approx -0.027})^2 * \frac{19}{37} + (1-\underbrace{E(w_{red})}_{\approx -0.027})^2 * \frac{18}{37} \approx 1.0$ and its standard deviation is $sd(w_{red}) = \sqrt{var(w_{red})} \approx 1.0$

The variance of $$w_{14}$$ is $var(w_{14}) = (-1-\underbrace{E(w_{14})}_{\approx -0.027})^2 * \frac{36}{37} + (35-\underbrace{E(w_{14})}_{\approx -0.027})^2 * \frac{1}{37} \approx 34.1$ and its standard deviation is $sd(w_{14}) = \sqrt{var(w_{14})} \approx 5.8$ That is, a bet on 14 has the same expected payout as a bet on red, but its payout is much more variable.
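These calculations are easy to reproduce on a computer. The sketch below stores each roulette PDF as a dictionary and computes the expected value as the probability-weighted sum of the support, then the variance from its definition. The dictionary layout and function names are my own choices for illustration.

```python
from fractions import Fraction
from math import sqrt

# Each PDF is stored as {value: probability}
PDFS = {
    "r": {0: Fraction(19, 37), 1: Fraction(18, 37)},
    "w_red": {-1: Fraction(19, 37), 1: Fraction(18, 37)},
    "w_14": {-1: Fraction(36, 37), 35: Fraction(1, 37)},
}


def expected_value(pdf):
    """E(x): sum of a * f_x(a) over the support."""
    return sum(a * p for a, p in pdf.items())


def variance(pdf):
    """var(x) = E((x - E(x))^2)."""
    mu = expected_value(pdf)
    return sum((a - mu) ** 2 * p for a, p in pdf.items())


for name, pdf in PDFS.items():
    v = variance(pdf)
    print(name, float(expected_value(pdf)), float(v), sqrt(v))
```

Working with `Fraction` gives exact answers (for example, the variance of $$w_{14}$$ is exactly 46656/1369), which are then converted to decimals for printing.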

The variance is the expected value (sum) of a square, which implies several standard properties:

• It is always non-negative: $var(x) \geq 0$ $sd(x) \geq 0$
• For any constants $$a$$ and $$b$$: $var(a +bx) = b^2 var(x)$ $sd(a +bx) = |b| \, sd(x)$
• The variance can be written as: $var(x) = E(x^2) - E(x)^2$

We can easily derive these properties but we will skip that now.
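Although I am skipping the derivations, we can at least check the variance properties numerically. The sketch below does so for $$w_{14}$$, with arbitrarily chosen constants $$a = 10$$ and $$b = -3$$. The helper names are hypothetical, and a check on one example is of course not a proof.

```python
from fractions import Fraction

# PDF of w_14, stored as {value: probability}
PDF = {-1: Fraction(36, 37), 35: Fraction(1, 37)}


def expect(pdf, g=lambda s: s):
    """E(g(x)): sum of g(s) * f_x(s) over the support."""
    return sum(g(s) * p for s, p in pdf.items())


def variance(pdf):
    """var(x) = E((x - E(x))^2)."""
    mu = expect(pdf)
    return expect(pdf, lambda s: (s - mu) ** 2)


def transform(pdf, a, b):
    """PDF of a + b*x (assumes b != 0 so that distinct values stay distinct)."""
    return {a + b * s: p for s, p in pdf.items()}


a, b = 10, -3  # arbitrary constants chosen for the check
assert variance(PDF) >= 0
assert variance(transform(PDF, a, b)) == b**2 * variance(PDF)
assert variance(PDF) == expect(PDF, lambda s: s**2) - expect(PDF) ** 2
print("all three variance properties hold for this example")
```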

## 4.3 Standard distributions

Some probability distributions appear so often in applications that we have given them names. We will go through a few of the most important ones below.

### 4.3.1 Discrete uniform

The discrete uniform distribution is a distribution that puts equal probability on every value in a discrete set $$S_x$$. Its PDF is: $f_x(a) = \begin{cases} 1/|S_x| & a \in S_x \\ 0 & a \notin S_x \\ \end{cases}$ Discrete uniform distributions appear in gambling and similar applications.

The discrete uniform distribution in roulette

In our roulette example, the outcome $$b$$ has a discrete uniform distribution on $$\Omega = \{0,1,\ldots,36\}$$.

### 4.3.2 Bernoulli

The Bernoulli probability distribution is usually written: $x \sim Bernoulli(p)$ It has discrete support $$S_x = \{0,1\}$$ and PDF: $f_x(a) = \begin{cases} (1-p) & a = 0 \\ p & a = 1 \\ 0 & a = \textrm{anything else}\\ \end{cases}$ We typically use Bernoulli random variables to model the probability of some event $$A$$. If we define $$x$$ as the indicator variable $$x=I(A)$$, then $$x \sim Bernoulli(p)$$ where $$p=\Pr(A)$$.

The mean and variance of a $$Bernoulli(p)$$ random variable are: $E(x) = (1-p)*0 + p*1 = p$ $var(x) = E[(x-E(x))^2] = E[(x-p)^2] = (1-p)(0-p)^2 + p(1-p)^2 = p(1-p)$

The Bernoulli distribution in roulette

The variable $$r = I(Red)$$ has the $$Bernoulli(18/37)$$ distribution.

### 4.3.3 Binomial

The binomial probability distribution is usually written: $x \sim Binomial(n,p)$ It has discrete support $$S_x = \{0,1,2,\ldots,n\}$$ and its PDF is: $f_x(a) = \begin{cases} \frac{n!}{a!(n-a)!} p^a(1-p)^{n-a} & a \in S_x \\ 0 & \textrm{anything else} \\ \end{cases}$ The binomial distribution is typically used to model frequencies or counts.

Let $$(b_1,b_2,\ldots,b_n)$$ be a sequence of $$n$$ independent random variables from the $$Bernoulli(p)$$ distribution and let: $x = \sum_{i=1}^n b_i$ count up the number of times that $$b_i$$ is equal to one (i.e., the event modeled by $$b_i$$ happened). Then it is possible to derive the distribution for $$x$$, and it turns out to be $$Binomial(n,p)$$.

I won’t derive the formula for the binomial PDF, but the intuition is simple: $$\frac{n!}{a!(n-a)!}$$ is the number of outcomes in which $$x=a$$ and the $$p^a(1-p)^{n-a}$$ is the probability of each of those outcomes.

The mean and variance of a binomial random variable are: $E(x) = np$ $var(x) = np(1-p)$

The binomial distribution in roulette

Suppose we play 50 games of roulette, and bet on red in every game. Let $$WIN50$$ be the number of times we win.

Since the outcome of a single bet on red is $$r \sim Bernoulli(18/37)$$, this means that $$WIN50 \sim Binomial(50,18/37)$$.
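As an illustration, the sketch below evaluates the binomial PDF for $$WIN50$$ and checks that it sums to one and that its mean and variance match $$np$$ and $$np(1-p)$$. The function name `binomial_pdf` is my own; `math.comb` is Python's built-in function for $$\frac{n!}{a!(n-a)!}$$.

```python
from fractions import Fraction
from math import comb


def binomial_pdf(a, n, p):
    """PDF of a Binomial(n, p) random variable evaluated at a."""
    if a not in range(n + 1):
        return Fraction(0)
    a = int(a)
    return comb(n, a) * p**a * (1 - p) ** (n - a)


n, p = 50, Fraction(18, 37)
pdf = {a: binomial_pdf(a, n, p) for a in range(n + 1)}

mean = sum(a * f for a, f in pdf.items())
var = sum((a - mean) ** 2 * f for a, f in pdf.items())

assert sum(pdf.values()) == 1
assert mean == n * p
assert var == n * p * (1 - p)

# Probability of winning more than half of the 50 games
p_majority = sum(pdf[a] for a in range(26, n + 1))
print(float(mean), float(var), float(p_majority))
```

Because $$p$$ is stored as an exact fraction, the three checks hold exactly rather than only approximately.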

### 4.3.4 Uniform and standard uniform

The uniform probability distribution is usually written $x \sim U(L,H)$ where $$L < H$$. It is a continuous probability distribution with support $$S_x = [L,H]$$ and PDF: $f_x(a) = \begin{cases}\frac{1}{H-L} & a \in S_x \\ 0 & \textrm{otherwise} \\ \end{cases}$ The uniform distribution puts equal probability on all values between $$L$$ and $$H$$. We have already seen the standard uniform distribution, which is just the $$U(0,1)$$ distribution.

Uniform distributions are commonly used by computers because:

• It is easy for a computer to generate a random number from the standard uniform distribution.
• You can generate a random variable with any probability distribution you like by following these steps:
1. Generate a random variable $$q \sim U(0,1)$$.
2. Calculate $$x = F^{-1}(q)$$ where $$F^{-1}$$ is the inverse CDF of the distribution you want.
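The two steps above can be sketched as follows. For $$U(L,H)$$ the CDF on the support is $$F(a) = (a-L)/(H-L)$$, so its inverse is $$F^{-1}(q) = L + q(H-L)$$. For a discrete random variable like $$w_{red}$$ the CDF has no ordinary inverse, so the sketch uses the smallest value $$a$$ with $$F(a) \geq q$$, which is a standard workaround that I am assuming here. The function names and the fixed seed are my own choices.

```python
import random


def uniform_inverse_cdf(q, low, high):
    """Inverse CDF of U(low, high): solves (a - low) / (high - low) = q for a."""
    return low + q * (high - low)


def w_red_inverse_cdf(q):
    """Generalized inverse of the w_red CDF: the smallest a with F(a) >= q."""
    return -1 if q <= 19 / 37 else 1


rng = random.Random(0)  # fixed seed so the draws are reproducible

# Step 1: draw q ~ U(0,1).  Step 2: apply the inverse CDF.
draws_uniform = [uniform_inverse_cdf(rng.random(), 2.0, 5.0) for _ in range(10_000)]
draws_w_red = [w_red_inverse_cdf(rng.random()) for _ in range(10_000)]

print(sum(draws_uniform) / len(draws_uniform))  # should be close to (2 + 5) / 2
print(draws_w_red.count(1) / len(draws_w_red))  # should be close to 18/37
```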

Every video game you have ever played is constantly generating $$U(0,1)$$ random numbers and using them to determine the behavior of non-player characters, the location of resources, etc. Without that element of randomness, these games would be way too predictable to be much fun.

The mean and variance of the $$U(L,H)$$ distribution are: $E(x) = \frac{L+H}{2}$ $var(x) = \frac{(H-L)^2}{12}$ As with all continuous random variables, these calculations would require integration which is beyond the scope of this course.

### 4.3.5 Normal and standard normal

The normal distribution is typically written as: $x \sim N(\mu,\sigma^2)$ It is a continuous distribution with support $$S_x = \mathbb{R}$$ and PDF: $f_x(a) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(a-\mu)^2}{2\sigma^2}}$ The normal distribution is also called the Gaussian distribution.

The normal distribution looks strange, but it turns out to be a very important one in statistics for two reasons:

1. Any linear function of a normally distributed random variable is also normally distributed. That is, suppose that $$x \sim N(\mu,\sigma^2)$$ and let $$y = a + bx$$ where $$a$$ and $$b$$ are any constants. Then $$y \sim N(a+b\mu,b^2\sigma^2)$$.
2. A very important result called the Central Limit Theorem tells us that many random variables have a distribution that is well-approximated by the normal distribution. We will discuss this in much more detail later.

The mean and variance of a $$N(\mu,\sigma^2)$$ random variable are: $E(x) = \mu$ $var(x) = \sigma^2$

The $$N(0,1)$$ distribution is also called the standard normal distribution. The standard normal distribution is so useful that we have a special symbol for its PDF: $\phi(a) = \frac{1}{\sqrt{2\pi}} e^{-\frac{a^2}{2}}$ and its CDF: $\Phi(a) = \int_{-\infty}^a \phi(b)db$ The standard normal CDF $$\Phi(.)$$ does not have a closed form solution, but is easy to calculate on a computer and is available as a built-in function in Excel, R or any other program used to analyze data.

Why is this useful? Well remember that linear functions of normal random variables are also normal. This will allow us to calculate the CDF of any $$N(\mu,\sigma^2)$$ random variable using the standard normal CDF.

Consider a random variable $$x \sim N(\mu,\sigma^2)$$. Define another random variable $$z = \frac{x-\mu}{\sigma}$$. Then: $z \sim N\left(\mu*\frac{1}{\sigma}- \frac{\mu}{\sigma},\sigma^2*\left(\frac{1}{\sigma}\right)^2\right)$ or equivalently $$z \sim N(0,1)$$.

This implies: \begin{align} F_x(a) &= \Pr\left(x \leq a\right) \\ &= \Pr\left( \frac{x-\mu}{\sigma} \leq \frac{a-\mu}{\sigma}\right)\\ &= \Pr\left( z \leq \frac{a-\mu}{\sigma}\right) \\ &= \Phi\left(\frac{a-\mu}{\sigma}\right) \end{align} Since the standard normal CDF is available as a built-in function in Excel or R, we can use this result to calculate the CDF for any normally distributed random variable.
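The same calculation can be sketched in Python. Here $$\Phi$$ is computed from the error function in Python's `math` module, using the identity $$\Phi(a) = \frac{1}{2}\left(1 + \textrm{erf}(a/\sqrt{2})\right)$$, and the result is compared with the standard library's `statistics.NormalDist`. The function names and the values $$\mu = 10$$, $$\sigma = 2$$ are arbitrary choices for illustration.

```python
from math import erf, sqrt
from statistics import NormalDist


def std_normal_cdf(a):
    """Phi(a), computed from the error function."""
    return 0.5 * (1.0 + erf(a / sqrt(2.0)))


def normal_cdf(a, mu, sigma):
    """CDF of N(mu, sigma^2) at a, via the standardization z = (x - mu) / sigma."""
    return std_normal_cdf((a - mu) / sigma)


mu, sigma = 10.0, 2.0  # arbitrary example values; sigma is the standard deviation
print(normal_cdf(12.0, mu, sigma))      # our formula
print(NormalDist(mu, sigma).cdf(12.0))  # built-in implementation, for comparison
```

The two printed numbers should agree up to floating-point rounding error.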

1. Technically, it is the smallest closed set, but let’s ignore that for now.

2. If you are unfamiliar with the notation here, please refer to Section A.3.4 in the Math Review Appendix.