Outcome (First roll, second roll) | X (sum) | Y (max) |
---|---|---|
(1, 1) | 2 | 1 |
(1, 2) | 3 | 2 |
(1, 3) | 4 | 3 |
(1, 4) | 5 | 4 |
(2, 1) | 3 | 2 |
(2, 2) | 4 | 2 |
(2, 3) | 5 | 3 |
(2, 4) | 6 | 4 |
(3, 1) | 4 | 3 |
(3, 2) | 5 | 3 |
(3, 3) | 6 | 3 |
(3, 4) | 7 | 4 |
(4, 1) | 5 | 4 |
(4, 2) | 6 | 4 |
(4, 3) | 7 | 4 |
(4, 4) | 8 | 4 |
5 Discrete Random Variables
- Roughly, a random variable assigns a number measuring some quantity of interest to each outcome of a random phenomenon.
- Mathematically, a random variable (RV) $X$ is a function that takes an outcome $\omega$ in the sample space $\Omega$ as input and returns a real number $X(\omega)$ as output.
- The random variable itself is typically denoted with a capital letter ($X$); possible values of that random variable are denoted with lower case letters ($x$).
  - Think of the capital letter $X$ as a label standing in for a formula like “the number of heads in 4 flips of a coin” and the lower case letter $x$ as a dummy variable standing in for a particular value like 3.
- Discrete random variables take at most countably many possible values (e.g., $0, 1, 2, \ldots$). They are often counting variables (e.g., the number of Heads in 10 coin flips).
- Continuous random variables can take any real value in some interval (e.g., $[0, 1]$, $[0, \infty)$, $(-\infty, \infty)$). That is, continuous random variables can take uncountably many different values. Continuous random variables are often measurement variables (e.g., height, weight, income).
- A function of a random variable is also a random variable: if $X$ is a RV then so is $g(X)$.
- Sums and products, etc., of random variables defined on the same sample space are random variables. If $X$ and $Y$ are RVs defined on the same sample space then so are $X + Y$, $XY$, and $\max(X, Y)$.
- If the sample space outcomes are represented by rows in a spreadsheet, then random variables correspond to columns.
- Expressions like $X = x$ or $\{X = x\}$ represent events: for which outcomes $\omega$ is the value of the random variable $X(\omega)$ equal to the value $x$? (Remember, if the sample space outcomes are represented by rows in a spreadsheet, then an event corresponds to a subset of rows (outcomes) that satisfies some criteria.)
- The (probability) distribution of a collection of random variables identifies the possible values that the random variables can take and their relative likelihoods.
- We will see many ways of describing a distribution, depending on how many random variables are involved and their types (discrete or continuous).
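The spreadsheet picture above can be sketched in R. This is only an illustration, assuming two rolls of a four-sided die with all 16 outcomes equally likely; the names `outcomes`, `event_rows`, and `prob_event`, and the choice of the event $\{Y = 3\}$, are hypothetical and not part of the handout.

```r
# Sample space: one row per outcome (first roll, second roll)
outcomes = expand.grid(first = 1:4, second = 1:4)

# Random variables are columns computed from each outcome (row)
outcomes$X = outcomes$first + outcomes$second       # sum of the two rolls
outcomes$Y = pmax(outcomes$first, outcomes$second)  # larger of the two rolls

# An event is a subset of rows; here the event {Y = 3} (illustrative choice)
event_rows = outcomes[outcomes$Y == 3, ]

# Assuming equally likely outcomes, the probability of the event is the
# fraction of rows that belong to it
prob_event = nrow(event_rows) / nrow(outcomes)
prob_event
```

Each new column is a new random variable defined on the same sample space, and each logical filter on the rows picks out an event.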
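Example 5.1 Roll a fair four-sided die twice, and record the result of each roll in sequence. Let $X$ be the sum of the two rolls and let $Y$ be the larger of the two rolls (or the common value if the two rolls are equal).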
- Identify the event
and compute its probability. Then interpret the probability both as a long run relative frequency and as a relative likelihood.
- Construct a table and plot of $P(X = x)$ for each possible value $x$ of $X$.
- Construct a table and plot of $P(Y = y)$ for each possible value $y$ of $Y$.
5.1 Simulating from a Distribution
Example 5.2 Continuing Example 5.1.
- Describe how you could simulate a single value of $X$.
- Construct a spinner (like from a kids game) to represent the distribution of $X$.
- Describe another way to simulate a single value of $X$.
- Describe how you could use simulation to approximate the distribution of $X$.
- Don’t confuse a random variable with its distribution!
  - A random variable measures a numerical quantity which depends on the outcome of a random phenomenon.
  - The distribution of a random variable specifies the long run pattern of variation of values of the random variable over many repetitions of the underlying random phenomenon.
- Any marginal distribution can be represented by a single spinner (like from a kids game).
- In principle, there are always two ways of simulating a value $x$ of a random variable $X$.
  - Simulate from the probability space. Simulate an outcome $\omega$ from the underlying probability space and set $x = X(\omega)$.
  - Simulate from the distribution. Construct a spinner corresponding to the distribution of $X$ and spin it once to generate $x$.
- The second method requires that the distribution of $X$ is known. However, as we will see in many examples, it is common to specify the distribution of a random variable directly without defining the underlying probability space.
5.2 Probability Mass Functions
- The probability distribution of a single discrete random variable $X$ is often displayed in a table containing the probability of the event $\{X = x\}$ for each possible value $x$.
- In some cases, a distribution has a “formulaic” shape. For a discrete random variable $X$, the probability mass function (pmf) expresses $P(X = x)$ as a function of $x$: $p_X(x) = P(X = x)$.
  - Be sure to specify the possible values of the random variable!
- Certain common distributions have special names and properties.
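As a quick check on any proposed pmf, two standard requirements can be written out. The notation $p_X$ for the pmf of $X$ is assumed here, and the sum runs over the possible values of $X$.

```latex
p_X(x) \ge 0 \quad \text{for every possible value } x,
\qquad\qquad
\sum_{x} p_X(x) = 1 .
```

A formula that fails either requirement over the stated set of possible values cannot be a pmf, which is one reason the possible values must always be specified.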
Example 5.3 Continuing Example 5.1.
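- Verify that the pmf of $X$ is
  $$p_X(x) = \frac{4 - |x - 5|}{16}, \qquad x = 2, 3, \ldots, 8.$$
- Verify that the pmf of $Y$ is
  $$p_Y(y) = \frac{2y - 1}{16}, \qquad y = 1, 2, 3, 4.$$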
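One way to check a candidate pmf is to enumerate the 16 outcomes and compare the resulting relative frequencies to the formula. The sketch below assumes a fair die and takes the candidate formulas to be $p_X(x) = (4 - |x - 5|)/16$ for $x = 2, \ldots, 8$ and $p_Y(y) = (2y - 1)/16$ for $y = 1, \ldots, 4$, which agree with the table of outcomes; all variable names are illustrative.

```r
# Enumerate the sample space and evaluate X (sum) and Y (max) at each outcome
outcomes = expand.grid(first = 1:4, second = 1:4)
X = outcomes$first + outcomes$second
Y = pmax(outcomes$first, outcomes$second)

# pmfs by enumeration: fraction of the 16 equally likely outcomes per value
pmf_X = table(X) / nrow(outcomes)
pmf_Y = table(Y) / nrow(outcomes)

# Candidate formulas evaluated at the possible values
x = 2:8
y = 1:4
formula_X = (4 - abs(x - 5)) / 16
formula_Y = (2 * y - 1) / 16

# Compare enumeration and formula value by value
all.equal(as.numeric(pmf_X), formula_X)
all.equal(as.numeric(pmf_Y), formula_Y)
```

Enumeration works here because the sample space is small; for larger problems a simulation-based comparison is the usual substitute.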
5.3 Matching Problem
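The “matching problem” involves $n$ distinct objects, labeled $1, \ldots, n$, placed uniformly at random in $n$ spots, also labeled $1, \ldots, n$, with exactly one object in each spot. Let $X$ be the number of matches, that is, the number of objects placed in the spot with the same label.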
Spot 1 | Spot 2 | Spot 3 | Spot 4 | X (number of matches) |
---|---|---|---|---|
1 | 2 | 3 | 4 | |
1 | 2 | 4 | 3 | |
1 | 3 | 2 | 4 | |
1 | 3 | 4 | 2 | |
1 | 4 | 2 | 3 | |
1 | 4 | 3 | 2 | |
2 | 1 | 3 | 4 | |
2 | 1 | 4 | 3 | |
2 | 3 | 1 | 4 | |
2 | 3 | 4 | 1 | |
2 | 4 | 1 | 3 | |
2 | 4 | 3 | 1 | |
3 | 1 | 2 | 4 | |
3 | 1 | 4 | 2 | |
3 | 2 | 1 | 4 | |
3 | 2 | 4 | 1 | |
3 | 4 | 1 | 2 | |
3 | 4 | 2 | 1 | |
4 | 1 | 2 | 3 | |
4 | 1 | 3 | 2 | |
4 | 2 | 1 | 3 | |
4 | 2 | 3 | 1 | |
4 | 3 | 1 | 2 | |
4 | 3 | 2 | 1 | |
Example 5.4 Consider the matching problem with $n = 4$.
- Complete Table 5.4 to identify the value of $X$ for each outcome.
- Identify the event
and compute its probability. Then interpret the probability both as a long run relative frequency and as a relative likelihood.
- Construct a table, plot, and spinner representing the distribution of $X$.
# One repetition of the number of matches, for a given n
simulate_number_matches = function(n) {
  # sample(1:n) puts the values 1:n in random order
  # sample(1:n) == 1:n checks if each object in the shuffled order matches
  # returns a logical/binary 1=TRUE, 0 = FALSE vector
  # count the number of matches by summing the 1/0's
  sum(sample(1:n) == 1:n)
}

# Many repetitions, for n = 4
N_rep = 10000
number_matches = replicate(N_rep, simulate_number_matches(4))

# Summarize the simulated values
plot(table(number_matches) / N_rep,
     type = "h",
     xlab = "Number of matches",
     ylab = "Simulated relative frequency")
Example 5.5 Now consider the matching problem with general $n$. When $n$ is large, the pmf of $X$ is approximately
$$p_X(x) \approx \frac{e^{-1}}{x!}, \qquad x = 0, 1, 2, \ldots$$
- Use this pmf to approximate the probability of at least one match, and compare to the simulation results for general $n$.
- Interpret the probability both as a long run relative frequency and as a relative likelihood.
- Construct a table, plot, and spinner corresponding to the above pmf. Compare to the simulation results for general $n$.
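The probability of at least one match follows from the complement rule. The worked step below assumes the approximate pmf $p_X(x) \approx e^{-1}/x!$, the same form used in the R code for this example, so the result is an approximation for large $n$ rather than an exact value.

```latex
P(X \ge 1) = 1 - P(X = 0) = 1 - p_X(0)
\approx 1 - \frac{e^{-1}}{0!} = 1 - e^{-1} \approx 0.632 .
```

Under this approximation the answer does not depend on $n$, which is what the simulation results for different values of $n$ can be used to check.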
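# kbl() and kable_styling() come from the kableExtra package
library(kableExtra)

# Approximate pmf evaluated at x = 0, 1, ..., 7
x = 0:7
p_x = exp(-1) / factorial(x)

# Table of the approximate pmf
data.frame(x, p_x) |>
  kbl(digits = 6) |>
  kable_styling(fixed_thead = TRUE)

# Plot of the approximate pmf
plot(x, p_x,
     type = "h",
     xlab = "Number of matches (x)",
     ylab = "Approximate probability p(x)")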
# N_rep and simulate_number_matches() are defined in the code above
n = 10

# Method 1: Simulate from the probability space
number_matches = replicate(N_rep, simulate_number_matches(n))

# Summarize the simulated values
plot(table(number_matches) / N_rep,
     type = "h",
     xlab = "Number of matches",
     ylab = "Simulated relative frequency")

# Method 2: Simulate from the (approximate) distribution
# The weights are truncated at n; sample() rescales prob to sum to 1
x_values = 0:n
x = sample(x_values,
           size = N_rep,
           prob = exp(-1) / factorial(x_values),
           replace = TRUE)

# Summarize the simulated values (from Method 2, stored in x)
plot(table(x) / N_rep,
     type = "h",
     xlab = "Number of matches",
     ylab = "Simulated relative frequency")