15 Discrete Random Variables: Probability Mass Functions
- The (probability) distribution of a random variable specifies the possible values of the random variable and a way of determining corresponding probabilities.
- A discrete random variable can take on only countably many isolated points on a number line. These are often counting type variables. Note that “countably many” includes the case of countably infinite, such as $\{0, 1, 2, \ldots\}$.
- We often specify the distribution of a discrete random variable with a probability mass function.
- Certain common distributions have special names and properties.
15.1 Do not confuse a random variable with its distribution
- A random variable measures a numerical quantity which depends on the outcome of a random phenomenon.
- The distribution of a random variable specifies the long run pattern of variation of values of the random variable over many repetitions of the underlying random phenomenon.
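As a minimal sketch of this distinction, consider flipping a fair coin once, with `X` equal to 1 for heads and 0 for tails, and `Y = 1 - X`. The coin-flip setup and the names `X` and `Y` are illustrative assumptions, not taken from the text. The simulation suggests that the two random variables have the same long run pattern of variation even though they never take the same value on any single flip.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the run is reproducible

n_reps = 10_000
x_values = []
y_values = []
for _ in range(n_reps):
    flip = random.randint(0, 1)  # outcome of the random phenomenon: 1 = heads, 0 = tails
    x_values.append(flip)        # X counts the heads on this flip
    y_values.append(1 - flip)    # Y counts the tails on this flip


def relative_frequencies(values):
    """Long run relative frequency of each observed value."""
    counts = Counter(values)
    return {v: counts[v] / len(values) for v in sorted(counts)}


x_dist = relative_frequencies(x_values)
y_dist = relative_frequencies(y_values)

# Proportion of repetitions on which X and Y take the same value
prop_equal = sum(x == y for x, y in zip(x_values, y_values)) / n_reps

print("Approximate distribution of X:", x_dist)
print("Approximate distribution of Y:", y_dist)
print("Proportion of flips with X == Y:", prop_equal)
```

Both sets of relative frequencies should be close to 1/2 and 1/2, while `X` and `Y` are never equal: the same distribution, but different random variables.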
Example 15.1
Donny Dont is thoroughly confused about the distinction between a random variable and its distribution. Help him understand by providing a simple concrete example of two different random variables that have the same distribution.
Example 15.2 Suppose that
- The pair has the same joint distribution as the pair .
- has the same distribution as .
- has the same distribution as .
Determine if each of Donny’s statements is correct. If not, explain why not using a simple example.
15.2 Probability mass functions
- The probability mass function (pmf) (a.k.a., density (pdf)) of a discrete RV $X$, defined on a probability space with probability measure $\text{P}$, is a function $p_X$ which specifies each possible value of the RV and the probability that the RV takes that particular value: $p_X(x) = \text{P}(X = x)$ for each possible value $x$ of $X$.
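A pmf can be stored as a table of value/probability pairs. The sketch below uses a hypothetical example that is not from the text (the number of heads in three flips of a fair coin) and the illustrative helper names `is_valid_pmf` and `prob_of_event`. It checks the two requirements of a pmf and computes the probability of an event by summing the pmf over the values in the event.

```python
from fractions import Fraction

# Hypothetical pmf (an assumption for illustration):
# X = number of heads in three flips of a fair coin.
pmf = {
    0: Fraction(1, 8),
    1: Fraction(3, 8),
    2: Fraction(3, 8),
    3: Fraction(1, 8),
}


def is_valid_pmf(p):
    """A pmf must be nonnegative and sum to 1 over the possible values."""
    return all(prob >= 0 for prob in p.values()) and sum(p.values()) == 1


def prob_of_event(p, event):
    """P(X in event): add up the pmf over the possible values in the event."""
    return sum(prob for x, prob in p.items() if x in event)


print(is_valid_pmf(pmf))
print(prob_of_event(pmf, {2, 3}))  # P(X >= 2)
```

Exact fractions are used here so that the "sums to 1" check is not affected by floating point rounding.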
Example 15.3
Let
Example 15.4 Randomly select a county in the U.S. Let
This distribution is known as Benford’s law.
- Construct a table specifying the distribution of $X$, and the corresponding spinner.
- Find
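The pmf for this example did not survive in the text above, so the sketch below assumes the standard first-digit form of Benford's law, $p(x) = \log_{10}(1 + 1/x)$ for $x = 1, \ldots, 9$. Under that assumption it tabulates the distribution and simulates spins of the corresponding spinner; the variable names are illustrative.

```python
import math
import random

# Assumed pmf: the standard first-digit form of Benford's law.
digits = list(range(1, 10))
benford_pmf = {x: math.log10(1 + 1 / x) for x in digits}

# Table of the distribution
print("x   p(x)")
for x in digits:
    print(f"{x}   {benford_pmf[x]:.3f}")

# The probabilities should sum to 1 (up to floating point rounding)
total = sum(benford_pmf.values())

# Simulate many spins of a spinner whose sectors have these areas
random.seed(0)
spins = random.choices(digits, weights=[benford_pmf[x] for x in digits], k=10_000)
rel_freq_1 = spins.count(1) / len(spins)

print(f"sum of probabilities = {total:.6f}")
print(f"simulated relative frequency of 1 = {rel_freq_1:.3f}")
```

The simulated relative frequency of a leading digit of 1 should be close to $\log_{10}(2) \approx 0.301$.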
15.3 Poisson distributions
Example 15.5 Let
This is known as the Poisson(2.3) distribution.
- Verify that $p_X$ is a valid pmf.
- Compute
, and interpret the value as a long run relative frequency.
- Construct a table and spinner corresponding to the distribution of $X$.
- Find
, and interpret the value as a long run relative frequency. (The most home runs ever hit in a baseball game is 13.)
- Find and interpret the ratio of
to . Does the value affect this ratio?
- Use simulation to find the long run average value of $X$, and interpret this value.
- Use simulation to find the variance and standard deviation of $X$.
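- A discrete random variable $X$ has a Poisson distribution with parameter $\mu > 0$ if its probability mass function satisfies
$$p_X(x) = \frac{e^{-\mu}\mu^x}{x!}, \qquad x = 0, 1, 2, \ldots$$
- The function $\mu^x / x!$ defines the shape of the pmf. The constant $e^{-\mu}$ ensures that the probabilities sum to 1.
- If $X$ has a Poisson($\mu$) distribution then $\text{E}(X) = \mu$ and $\text{Var}(X) = \mu$.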
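The tasks in Example 15.5 can be sketched with the standard library alone, using the Poisson pmf $e^{-\mu}\mu^x/x!$ with $\mu = 2.3$. This is one possible approach, not necessarily the one intended by the text: the function names are illustrative, the infinite sum is truncated at 100 terms, and the simulation uses a simple inverse-transform method as an assumed stand-in for whatever simulation tool the course uses.

```python
import math
import random

MU = 2.3  # parameter from Example 15.5


def poisson_pmf(x, mu=MU):
    """Poisson pmf: e^(-mu) * mu^x / x! for x = 0, 1, 2, ..."""
    return math.exp(-mu) * mu ** x / math.factorial(x)


# Validity check: the terms are nonnegative, and a long partial sum
# approximates the infinite sum, which should equal 1.
total = sum(poisson_pmf(x) for x in range(100))

# Probability of at most 13, by summing the pmf over 0, 1, ..., 13
p_at_most_13 = sum(poisson_pmf(x) for x in range(14))

# Ratio of consecutive probabilities: p(x + 1) / p(x) = mu / (x + 1).
# The constant e^(-mu) cancels, so it does not affect the ratio.
ratios = [poisson_pmf(x + 1) / poisson_pmf(x) for x in range(5)]


def simulate_poisson(mu=MU):
    """Inverse-transform sampling: walk up the cdf until it passes a uniform draw."""
    u = random.random()
    x = 0
    p = math.exp(-mu)  # p(0)
    cdf = p
    while u > cdf and x < 1000:  # the cap guards against floating point edge cases
        x += 1
        p *= mu / x  # p(x) = p(x - 1) * mu / x
        cdf += p
    return x


random.seed(0)
n_reps = 20_000
sims = [simulate_poisson() for _ in range(n_reps)]
sim_mean = sum(sims) / n_reps
sim_var = sum((x - sim_mean) ** 2 for x in sims) / n_reps
sim_sd = math.sqrt(sim_var)

print(f"sum of pmf = {total:.6f}")
print(f"P(X <= 13) = {p_at_most_13:.8f}")
print("ratios p(x+1)/p(x):", [round(r, 4) for r in ratios])
print(f"simulated mean = {sim_mean:.3f}, variance = {sim_var:.3f}, sd = {sim_sd:.3f}")
```

The simulated long run average and variance should both be close to 2.3.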
15.4 Binomial distributions
Example 15.6 Capture-recapture sampling is a technique often used to estimate the size of a population. Suppose you want to estimate
In practice,
- Explain why it is reasonable to assume that the results of the five individual selections are independent.
- Compute
.
- Compute the probability that the first butterfly selected is tagged but the others are not.
- Compute the probability that the last butterfly selected is tagged but the others are not.
- Compute
.
- Compute
.
- Find the pmf of $X$.
- Construct a table, plot, and spinner representing the distribution of $X$.
- Make an educated guess for the long run average value of $X$.
- How do the results depend on
and ?
- A discrete random variable $X$ has a Binomial distribution with parameters $n$, a nonnegative integer, and $p \in [0, 1]$ if its probability mass function is
$$p_X(x) = \binom{n}{x} p^x (1-p)^{n-x}, \qquad x = 0, 1, \ldots, n$$
- If $X$ has a Binomial($n$, $p$) distribution then $\text{E}(X) = np$ and $\text{Var}(X) = np(1-p)$.
- Imagine a box containing tickets with $p$ representing the proportion of tickets in the box labeled 1 (“success”); the rest are labeled 0 (“failure”). Randomly select $n$ tickets from the box with replacement and let $X$ be the number of tickets in the sample that are labeled 1. Then $X$ has a Binomial($n$, $p$) distribution. Since the tickets are labeled 1 and 0, the random variable $X$ which counts the number of successes is equal to the sum of the 1/0 values on the tickets. If the selections are made with replacement, the draws are independent, so it is enough to just specify the population proportion $p$ without knowing the population size $N$.
- Both the situation in the previous paragraph and the butterfly example involve a sequence of Bernoulli trials.
- There are only two possible outcomes, “success” (1) and “failure” (0), on each trial.
- The unconditional/marginal probability of success is the same on every trial, and equal to $p$.
- The trials are independent.
- If $X$ counts the number of successes in a fixed number, $n$, of Bernoulli($p$) trials then $X$ has a Binomial($n$, $p$) distribution.
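The Binomial pmf and the box model can be checked against each other with a short sketch. It uses $n = 5$ selections, as in the butterfly example, but the success probability `p = 0.25` is a hypothetical value chosen for illustration, since the example's population numbers did not survive in the text above. The function and variable names are also illustrative.

```python
import math
import random

n = 5     # number of selections, as in the butterfly example
p = 0.25  # hypothetical proportion of tagged butterflies (an assumption)


def binomial_pmf(x, n, p):
    """Binomial pmf: C(n, x) * p^x * (1 - p)^(n - x) for x = 0, 1, ..., n."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)


pmf_table = {x: binomial_pmf(x, n, p) for x in range(n + 1)}

# Probability of one particular sequence: first selection is a success, the rest are not
p_first_only = p * (1 - p) ** (n - 1)

# There are n equally likely positions for the single success
p_exactly_one = n * p_first_only

# Box model: each draw with replacement yields a 1 with probability p, otherwise a 0;
# X is the sum of the n tickets drawn.
random.seed(0)
n_reps = 20_000
counts = [sum(1 if random.random() < p else 0 for _ in range(n)) for _ in range(n_reps)]
sim_mean = sum(counts) / n_reps

print("x   p(x)")
for x, prob in pmf_table.items():
    print(f"{x}   {prob:.4f}")
print(f"P(X = 1) from sequences = {p_exactly_one:.4f}")
print(f"simulated long run average = {sim_mean:.3f} (compare with n * p = {n * p})")
```

Under these assumed values, the simulated long run average should be close to $np = 1.25$, and the probability of exactly one success equals the pmf evaluated at 1.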