15 Discrete Random Variables: Probability Mass Functions

The (probability) distribution of a random variable specifies the possible values of the random variable and a way of determining corresponding probabilities.
A discrete random variable can take on only countably many isolated points on a number line. These are often counting type variables. Note that “countably many” includes the case of countably infinite, such as $\{0, 1, 2, \ldots\}$.
We often specify the distribution of a discrete random variable with a probability mass function.
Certain common distributions have special names and properties.

15.1 Do not confuse a random variable with its distribution

A random variable measures a numerical quantity which depends on the outcome of a random phenomenon.
The distribution of a random variable specifies the long run pattern of variation of values of the random variable over many repetitions of the underlying random phenomenon.

Example 15.1

Donny Dont is thoroughly confused about the distinction between a random variable and its distribution. Help him understand by providing a simple concrete example of two different random variables $X$ and $Y$ that have the same distribution. Can you think of $X$ and $Y$ for which $\text{P}(X = Y) = 0$? How about a discrete example and a continuous example?
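
One possible discrete illustration, sketched in Python. The setup is an assumption chosen here, not taken from the example: flip a fair coin once, let $X$ be 1 for heads and 0 for tails, and let $Y = 1 - X$. The helper name `distribution` is hypothetical.

```python
from collections import Counter
from fractions import Fraction

# Assumed setup: one fair coin flip, two equally likely outcomes.
outcomes = ["H", "T"]

def X(outcome):
    """1 if the flip lands heads, 0 if tails."""
    return 1 if outcome == "H" else 0

def Y(outcome):
    """1 if the flip lands tails, 0 if heads, so Y = 1 - X."""
    return 1 - X(outcome)

def distribution(rv):
    """Exact distribution of rv over the equally likely outcomes."""
    counts = Counter(rv(o) for o in outcomes)
    return {value: Fraction(count, len(outcomes)) for value, count in counts.items()}

dist_X = distribution(X)
dist_Y = distribution(Y)

# P(X = Y): fraction of outcomes on which the two variables agree.
prob_equal = Fraction(sum(1 for o in outcomes if X(o) == Y(o)), len(outcomes))

print("Same distribution:", dist_X == dist_Y)
print("P(X = Y) =", prob_equal)
```

Here $X$ and $Y$ each take the values 0 and 1 with probability 1/2, so they have the same distribution, yet they never take the same value on any single flip.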

Example 15.2 Suppose that $X$, $Y$, and $Z$ all have the same distribution. Donny Dont says:

The pair $(X, Y)$ has the same joint distribution as the pair $(X, Z)$.
$X+Y$ has the same distribution as $X+Z$.
$X+Y$ has the same distribution as $X+X=2X$.

Determine if each of Donny’s statements is correct. If not, explain why not using a simple example.
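
A minimal sketch of one way to test Donny's statements by exact enumeration. The setup is an assumption made here for illustration: $X$ and $Y$ are two independent fair coin flips (1 for heads, 0 for tails), and $Z$ is taken to be the same random variable as $X$. The helper name `distribution` is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Assumed setup: two independent fair coin flips, four equally likely outcomes.
outcomes = list(product([0, 1], repeat=2))

def distribution(rv):
    """Exact distribution of rv over the equally likely outcomes."""
    counts = Counter(rv(o) for o in outcomes)
    return {value: Fraction(count, len(outcomes)) for value, count in counts.items()}

def X(o):
    return o[0]      # result of the first flip

def Y(o):
    return o[1]      # result of the second flip, independent of X

def Z(o):
    return X(o)      # Z is the same random variable as X

# All three marginal distributions agree.
same_marginals = distribution(X) == distribution(Y) == distribution(Z)

joint_XY = distribution(lambda o: (X(o), Y(o)))
joint_XZ = distribution(lambda o: (X(o), Z(o)))
sum_XY = distribution(lambda o: X(o) + Y(o))
sum_XZ = distribution(lambda o: X(o) + Z(o))
two_X = distribution(lambda o: 2 * X(o))

print("Same marginals:", same_marginals)
print("Joint (X, Y):", joint_XY)
print("Joint (X, Z):", joint_XZ)
print("X + Y:", sum_XY)
print("X + Z:", sum_XZ)
print("2X:", two_X)
```

Under these assumptions, $(X, Y)$ is spread over four pairs while $(X, Z)$ only takes the values $(0, 0)$ and $(1, 1)$, and $X + Y$ can equal 1 while $X + Z = 2X$ cannot. Knowing the marginal distributions alone does not determine the joint distribution or the distribution of a sum.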

15.2 Probability mass functions

The probability mass function (pmf) (a.k.a. density (pdf)) of a discrete RV $X$, defined on a probability space with probability measure $\text{P}$, is a function $p_X:\mathbb{R}\to[0,1]$ which specifies each possible value of the RV and the probability that the RV takes that particular value: $p_X(x)=\text{P}(X=x)$ for each possible value of $x$.

Example 15.3

Let $Y$ be the larger of two rolls of a fair four-sided die. Find the probability mass function of $Y$.
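
One way to check a hand computation is to enumerate the sample space. The sketch below assumes the 16 ordered pairs of rolls are equally likely (which follows from the die being fair and the rolls independent) and tallies the larger roll; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 16 equally likely (first roll, second roll) pairs for a fair four-sided die.
rolls = list(product(range(1, 5), repeat=2))

# Count how many pairs produce each value of Y = max(first, second).
counts = Counter(max(pair) for pair in rolls)

# pmf of Y as exact fractions.
pmf_Y = {y: Fraction(counts[y], len(rolls)) for y in sorted(counts)}

for y, prob in pmf_Y.items():
    print(f"P(Y = {y}) = {prob}")
```

The counts are 1, 3, 5, and 7 pairs for $y = 1, 2, 3, 4$, so the enumeration is consistent with $p_Y(y) = (2y - 1)/16$ for $y = 1, 2, 3, 4$.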

Example 15.4 Randomly select a county in the U.S. Let $X$ be the leading digit in the county’s population. For example, if the county’s population is 10,040,000 (Los Angeles County) then $X=1$; if 3,170,000 (Orange County) then $X=3$; if 283,000 (SLO County) then $X=2$; if 30,600 (Lassen County) then $X=3$. The possible values of $X$ are $1, 2, \ldots, 9$. You might think that $X$ is equally likely to be any of its possible values. However, a more appropriate model is to assume that $X$ has pmf

\[ p_X(x) = \begin{cases} \log_{10}(1+\frac{1}{x}), & x = 1, 2, \ldots, 9,\\ 0, & \text{otherwise} \end{cases} \]

This distribution is known as Benford’s law.

Construct a table specifying the distribution of $X$, and the corresponding spinner.
Find $\text{P}(X \ge 3)$.
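
The table and the probability can be sketched numerically as follows. This is only a direct evaluation of the Benford pmf given above; the variable names are illustrative.

```python
import math

# Benford pmf: p_X(x) = log10(1 + 1/x) for x = 1, ..., 9.
benford_pmf = {x: math.log10(1 + 1 / x) for x in range(1, 10)}

print(" x   p_X(x)")
for x, prob in benford_pmf.items():
    print(f"{x:2d}   {prob:.3f}")

# The probabilities should sum to 1 (up to floating-point rounding).
total = sum(benford_pmf.values())

# P(X >= 3) by adding the pmf over x = 3, ..., 9.
prob_at_least_3 = sum(prob for x, prob in benford_pmf.items() if x >= 3)

print(f"Sum of pmf: {total:.6f}")
print(f"P(X >= 3) = {prob_at_least_3:.4f}")
```

Because the sum telescopes, $\text{P}(X \ge 3) = 1 - \log_{10}(3) \approx 0.523$; the leading digits 1 and 2 together account for almost half of the probability.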

15.3 Poisson distributions

Example 15.5 Let $X$ be the number of home runs hit (in total by both teams) in a randomly selected Major League Baseball game. Technically, there is no fixed upper bound on what $X$ can be, so mathematically it is convenient to consider $0, 1, 2, \ldots$ as the possible values of $X$. Assume that the pmf of $X$ is

\[ p_X(x) = \begin{cases} e^{-2.3} \frac{2.3^x}{x!}, & x = 0, 1, 2, \ldots\\ 0, & \text{otherwise.} \end{cases} \]

This is known as the Poisson(2.3) distribution.

Verify that $p_X$ is a valid pmf.
Compute $\text{P}(X = 3)$, and interpret the value as a long run relative frequency.
Construct a table and spinner corresponding to the distribution of $X$.
Find $\text{P}(X \le 13)$, and interpret the value as a long run relative frequency. (The most home runs ever hit in a baseball game is 13.)
Find and interpret the ratio of $\text{P}(X = 5)$ to $\text{P}(X = 3)$. Does the value $e^{-2.3}$ affect this ratio?
Use simulation to find the long run average value of $X$, and interpret this value.
Use simulation to find the variance and standard deviation of $X$.
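
The exact (non-simulation) parts of this example can be checked with a short computation. The sketch below evaluates the Poisson(2.3) pmf directly; the function name `poisson_pmf` is illustrative, and the sum over $x = 0, \ldots, 99$ is a truncation assumed to capture essentially all of the probability.

```python
import math

MU = 2.3  # parameter from the example

def poisson_pmf(x, mu=MU):
    """p_X(x) = e^(-mu) * mu^x / x! for a nonnegative integer x."""
    return math.exp(-mu) * mu**x / math.factorial(x)

# Validity check: terms are nonnegative and the (truncated) sum is 1.
partial_sum = sum(poisson_pmf(x) for x in range(100))

prob_3 = poisson_pmf(3)
prob_at_most_13 = sum(poisson_pmf(x) for x in range(14))

# The factor e^(-mu) cancels in the ratio, leaving mu^2 / (5 * 4).
ratio_5_to_3 = poisson_pmf(5) / poisson_pmf(3)

print(f"Sum over x = 0..99: {partial_sum:.10f}")
print(f"P(X = 3)   = {prob_3:.4f}")
print(f"P(X <= 13) = {prob_at_most_13:.8f}")
print(f"P(X = 5) / P(X = 3) = {ratio_5_to_3:.4f}")
```

$\text{P}(X = 3)$ comes out to about 0.203, and the ratio equals $2.3^2 / 20 = 0.2645$, which does not involve $e^{-2.3}$.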

A discrete random variable $X$ has a Poisson distribution with parameter $\mu>0$ if its probability mass function $p_X$ satisfies

\[ p_X(x) = \frac{e^{-\mu}\mu^x}{x!}, \quad x=0,1,2,\ldots \]

The function $\mu^x / x!$ defines the shape of the pmf. The constant $e^{-\mu}$ ensures that the probabilities sum to 1.
If $X$ has a Poisson($\mu$) distribution then \[\begin{align*} \text{Long run average value of $X$} & = \mu\\ \text{Variance of $X$} & = \mu\\ \text{SD of $X$} & = \sqrt{\mu} \end{align*}\]
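
These properties can be checked by simulation. The sketch below uses only the Python standard library and draws Poisson values by inverse-transform sampling, which is a technique chosen here for illustration, not a method given in the text. The function name `simulate_poisson`, the seed, and the number of repetitions are all arbitrary assumptions.

```python
import math
import random
import statistics

def simulate_poisson(mu, rng):
    """Draw one Poisson(mu) value by inverse-transform sampling."""
    u = rng.random()
    x = 0
    prob = math.exp(-mu)      # P(X = 0)
    cumulative = prob         # P(X <= 0)
    # Step up until the cumulative probability exceeds u.
    # The `prob > 0.0` guard stops the loop if rounding keeps cumulative below u.
    while u > cumulative and prob > 0.0:
        x += 1
        prob *= mu / x        # P(X = x) computed from P(X = x - 1)
        cumulative += prob
    return x

rng = random.Random(12345)    # arbitrary seed, for reproducibility
mu = 2.3
samples = [simulate_poisson(mu, rng) for _ in range(100_000)]

sim_mean = statistics.fmean(samples)
sim_variance = statistics.pvariance(samples)
sim_sd = math.sqrt(sim_variance)

print(f"Simulated average:  {sim_mean:.3f}  (theory: {mu})")
print(f"Simulated variance: {sim_variance:.3f}  (theory: {mu})")
print(f"Simulated SD:       {sim_sd:.3f}  (theory: {math.sqrt(mu):.3f})")
```

With many repetitions, the simulated average and variance should both be close to $\mu$, though the simulated values will differ slightly from run to run if the seed is changed.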

15.4 Binomial distributions

Example 15.6 Capture-recapture sampling is a technique often used to estimate the size of a population. Suppose you want to estimate $N$, the number of monarch butterflies in Pismo Beach. (Assume that $N$ is a fixed but unknown number; the population size doesn’t change over time.) You first capture a sample of $N_1$ butterflies, selected randomly, and tag them and release them. At a later date, you then capture a second sample of $n$ butterflies, selected randomly with replacement. Let $X$ be the number of butterflies in the second sample that have tags (because they were also caught in the first sample). (Assume that the tagging has no effect on behavior, so that selection in the first sample is independent of selection in the second sample.)

In practice, $N$ is unknown and the point of capture-recapture sampling is to estimate $N$. But let’s start with a simpler, but unrealistic, example where there are $N=52$ butterflies, $N_1 = 13$ are tagged and $N_0=52-13 = 39$ are not, and $n=5$ is the size of the second sample.

Explain why it is reasonable to assume that the results of the five individual selections are independent.
Compute $\text{P}(X=0)$.
Compute the probability that the first butterfly selected is tagged but the others are not.
Compute the probability that the last butterfly selected is tagged but the others are not.
Compute $\text{P}(X=1)$.
Compute $\text{P}(X=2)$.
Find the pmf of $X$.
Construct a table, plot, and spinner representing the distribution of $X$.
Make an educated guess for the long run average value of $X$.
How do the results depend on $N_1$ and $N_0$?
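
The computations in this example can be sketched with exact fractions. The code assumes, as the example states, that the five selections are made with replacement, so each selection is tagged with probability $N_1/(N_1 + N_0) = 13/52$ independently of the others. Variable names are illustrative.

```python
from fractions import Fraction
from math import comb

N1, N0, n = 13, 39, 5          # tagged, untagged, second-sample size
p = Fraction(N1, N1 + N0)      # probability a single selection is tagged

prob_none = (1 - p) ** n                    # P(X = 0)
prob_first_only = p * (1 - p) ** (n - 1)    # tagged, then four untagged
prob_last_only = (1 - p) ** (n - 1) * p     # four untagged, then tagged
prob_one = n * prob_first_only              # 5 possible positions for the tag

# Full pmf: choose which x of the n selections are tagged.
pmf_X = {x: comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)}

# Probability-weighted average of the possible values.
mean_X = sum(x * prob for x, prob in pmf_X.items())

for x, prob in pmf_X.items():
    print(f"P(X = {x}) = {prob} ~ {float(prob):.4f}")
print("Long run average:", mean_X)
```

Every particular sequence with exactly one tagged butterfly has the same probability, $(1/4)(3/4)^4$, which is why $\text{P}(X = 1)$ is 5 times that value. The pmf depends on $N_1$ and $N_0$ only through the ratio $p = N_1/(N_1 + N_0)$.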

A discrete random variable $X$ has a Binomial distribution with parameters $n$, a nonnegative integer, and $p\in[0, 1]$ if its probability mass function is \[\begin{align*} p_{X}(x) & = \binom{n}{x} p^x (1-p)^{n-x}, & x=0, 1, 2, \ldots, n \end{align*}\]
If $X$ has a Binomial($n$, $p$) distribution then \[\begin{align*} \text{Long run average value of $X$} & = np\\ \text{Variance of $X$} & = np(1-p)\\ \text{SD of $X$} & = \sqrt{np(1-p)} \end{align*}\]
Imagine a box containing tickets with $p$ representing the proportion of tickets in the box labeled 1 (“success”); the rest are labeled 0 (“failure”). Randomly select $n$ tickets from the box with replacement and let $X$ be the number of tickets in the sample that are labeled 1. Then $X$ has a Binomial($n$, $p$) distribution. Since the tickets are labeled 1 and 0, the random variable $X$ which counts the number of successes is equal to the sum of the 1/0 values on the tickets. If the selections are made with replacement, the draws are independent, so it is enough to just specify the population proportion $p$ without knowing the population size $N$.
The situation in the previous paragraph and the butterfly example both involve a sequence of Bernoulli trials:
- There are only two possible outcomes, “success” (1) and “failure” (0), on each trial.
- The unconditional/marginal probability of success is the same on every trial, and equal to $p$.
- The trials are independent.
If $X$ counts the number of successes in a fixed number, $n$, of Bernoulli($p$) trials then $X$ has a Binomial($n, p$) distribution.
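
As a rough check of this fact, the sketch below simulates many sets of $n$ independent Bernoulli($p$) trials and compares the relative frequencies of the success counts with the Binomial pmf. The values $n = 5$ and $p = 0.25$ match the butterfly example; the function name `count_successes`, the seed, and the number of repetitions are arbitrary assumptions.

```python
import random
from collections import Counter
from math import comb

def count_successes(n, p, rng):
    """Run n independent Bernoulli(p) trials and count the successes."""
    return sum(1 for _ in range(n) if rng.random() < p)

n, p = 5, 0.25
reps = 100_000
rng = random.Random(2024)      # arbitrary seed, for reproducibility

counts = Counter(count_successes(n, p, rng) for _ in range(reps))

# Simulated relative frequencies next to the exact Binomial(n, p) pmf.
simulated = {x: counts[x] / reps for x in range(n + 1)}
exact = {x: comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)}
max_gap = max(abs(simulated[x] - exact[x]) for x in range(n + 1))

sim_mean = sum(x * simulated[x] for x in range(n + 1))

print(" x   simulated   exact")
for x in range(n + 1):
    print(f"{x:2d}   {simulated[x]:.4f}      {exact[x]:.4f}")
print(f"Simulated average: {sim_mean:.3f}  (np = {n * p})")
```

The simulated relative frequencies should be close to, but not exactly equal to, the Binomial probabilities; the agreement typically improves as the number of repetitions grows.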