5  Discrete Random Variables

Table 5.1: Table representing the sum (\(X\)) and larger (\(Y\)) of two rolls of a four-sided die
Outcome (First roll, second roll) X (sum) Y (max)
(1, 1) 2 1
(1, 2) 3 2
(1, 3) 4 3
(1, 4) 5 4
(2, 1) 3 2
(2, 2) 4 2
(2, 3) 5 3
(2, 4) 6 4
(3, 1) 4 3
(3, 2) 5 3
(3, 3) 6 3
(3, 4) 7 4
(4, 1) 5 4
(4, 2) 6 4
(4, 3) 7 4
(4, 4) 8 4

Example 5.1 Roll a four-sided die twice, and record the result of each roll in sequence. Let \(X\) be the sum of the two dice, and let \(Y\) be the larger of the two rolls (or the common value if both rolls are the same). Table 5.1 represents the possible outcomes and the values of the random variables \(X\) and \(Y\). Assume the die is fair and the rolls are independent.

  1. Identify the event \(\{X = 4\}\) and compute its probability. Then interpret the probability both as a long run relative frequency and as a relative likelihood.




  2. Construct a table and plot of \(\text{P}(X = x)\) for each possible value \(x\) of \(X\).




  3. Construct a table and plot of \(\text{P}(Y = y)\) for each possible value \(y\) of \(Y\).




Table 5.2: The marginal distribution of \(X\), the sum of two rolls of a fair four-sided die.
x P(X=x)
2 0.0625
3 0.1250
4 0.1875
5 0.2500
6 0.1875
7 0.1250
8 0.0625
Table 5.3: The marginal distribution of \(Y\), the larger of two rolls of a fair four-sided die.
y P(Y=y)
1 0.0625
2 0.1875
3 0.3125
4 0.4375

Figure 5.1: The marginal distribution of \(X\), the sum of two rolls of a fair four-sided die.

Figure 5.2: The marginal distribution of \(Y\), the larger of two rolls of a fair four-sided die.
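
One way to check Tables 5.2 and 5.3 is to enumerate the 16 equally likely outcomes in R and tabulate the values of \(X\) and \(Y\). The following is a minimal sketch, not part of the original example; the names `outcomes`, `dist_X`, and `dist_Y` are illustrative assumptions.

```r
# Enumerate the 16 equally likely (first roll, second roll) outcomes
outcomes = expand.grid(first = 1:4, second = 1:4)

# Evaluate each random variable on every outcome
outcomes$X = outcomes$first + outcomes$second       # sum of the two rolls
outcomes$Y = pmax(outcomes$first, outcomes$second)  # larger of the two rolls

# Every outcome has probability 1/16, so each probability is
# (number of outcomes giving that value) / 16
dist_X = table(outcomes$X) / nrow(outcomes)
dist_Y = table(outcomes$Y) / nrow(outcomes)

dist_X
dist_Y
```

The printed values should agree with Tables 5.2 and 5.3, since the fair die and independent rolls make all 16 outcomes equally likely.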

5.1 Simulating from a Distribution

Example 5.2 Continuing Example 5.1.

  1. Describe how you could simulate a single value of \(Y\).




  2. Construct a spinner (like from a kids' game) to represent the distribution of \(Y\).




  3. Describe another way to simulate a single value of \(Y\).




  4. Describe how you could use simulation to approximate the distribution of \(Y\).




Figure 5.3: Spinner representing the marginal distribution of \(X\), the sum of two rolls of a fair four-sided die.

Figure 5.4: Spinner representing the marginal distribution of \(Y\), the larger of two rolls of a fair four-sided die.
  • Don’t confuse a random variable with its distribution!
    • A random variable measures a numerical quantity which depends on the outcome of a random phenomenon.
    • The distribution of a random variable specifies the long run pattern of variation of values of the random variable over many repetitions of the underlying random phenomenon.
  • Any marginal distribution can be represented by a single spinner (like from a kids' game).
  • In principle, there are always two ways of simulating a value \(x\) of a random variable \(X\).
    1. Simulate from the probability space. Simulate an outcome \(\omega\) from the underlying probability space and set \(x = X(\omega)\).
    2. Simulate from the distribution. Construct a spinner corresponding to the distribution of \(X\) and spin it once to generate \(x\).
  • The second method requires that the distribution of \(X\) is known. However, as we will see in many examples, it is common to specify the distribution of a random variable directly without defining the underlying probability space.
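
The two methods can be sketched in R for \(Y\), the larger of two rolls from Example 5.1. This is an illustrative sketch under stated assumptions: the seed, the helper `simulate_Y`, and the variable names are not from the original text, and `sample()` with a `prob` argument stands in for the spinner.

```r
set.seed(1)    # assumption: fixed seed, only so the run is reproducible
N_rep = 10000  # number of repetitions

# Method 1: simulate from the probability space.
# Roll the die twice, then measure Y on the simulated outcome.
simulate_Y = function() {
  rolls = sample(1:4, size = 2, replace = TRUE)
  max(rolls)
}
y_space = replicate(N_rep, simulate_Y())

# Method 2: simulate from the distribution.
# "Spin" a spinner whose sectors have areas 1/16, 3/16, 5/16, 7/16.
y_values = 1:4
y_probs = c(1, 3, 5, 7) / 16
y_spinner = sample(y_values, size = N_rep, replace = TRUE, prob = y_probs)

# Compare simulated relative frequencies with the theoretical probabilities
data.frame(
  y = y_values,
  probability_space = as.numeric(table(factor(y_space, levels = y_values))) / N_rep,
  spinner = as.numeric(table(factor(y_spinner, levels = y_values))) / N_rep,
  theoretical = y_probs
)
```

Both columns of simulated relative frequencies should be close to the theoretical column, but only Method 2 needs the distribution of \(Y\) to be known in advance.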

5.2 Probability Mass Functions

  • The probability distribution of a single discrete random variable \(X\) is often displayed in a table containing the probability of the event \(\{X=x\}\) for each possible value \(x\).
  • In some cases, a distribution has a “formulaic” shape. For a discrete random variable \(X\), the probability mass function (pmf) \(p_X\) expresses \(\text{P}(X=x)\) as a function of \(x\): \(p_X(x) = \text{P}(X = x)\).
  • Be sure to specify the possible values of the random variable!
  • Certain common distributions have special names and properties.

Example 5.3 Continuing Example 5.1.

  1. Verify that the pmf of \(X\) is \[ p_X(x) = \frac{4 - |x - 5|}{16}, \qquad x = 2, 3, \ldots, 8 \]




  2. Verify that the pmf of \(Y\) is \[ p_Y(y) = \frac{2y - 1}{16}, \qquad y = 1, 2, 3, 4 \]
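
A numerical check of both formulas can be sketched in R by comparing them with probabilities obtained by enumerating the 16 outcomes. The function names `pmf_X` and `pmf_Y` and the `enumerated_*` variables are illustrative assumptions, and this check complements the verification by hand rather than replacing it.

```r
# Candidate pmfs, written as functions of the possible values
pmf_X = function(x) (4 - abs(x - 5)) / 16  # x = 2, 3, ..., 8
pmf_Y = function(y) (2 * y - 1) / 16       # y = 1, 2, 3, 4

# Probabilities from enumerating the 16 equally likely outcomes
outcomes = expand.grid(first = 1:4, second = 1:4)
enumerated_X = as.numeric(table(outcomes$first + outcomes$second)) / 16
enumerated_Y = as.numeric(table(pmax(outcomes$first, outcomes$second))) / 16

# Each formula should reproduce the enumerated probabilities
all.equal(pmf_X(2:8), enumerated_X)
all.equal(pmf_Y(1:4), enumerated_Y)

# A valid pmf sums to 1 over the possible values
sum(pmf_X(2:8))
sum(pmf_Y(1:4))
```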




5.3 Matching Problem

The “matching problem” involves \(n\) distinct objects labeled \(1, \ldots, n\) which are placed in \(n\) distinct boxes labeled \(1, \ldots, n\), with exactly one object placed in each box. Suppose the objects are placed in the boxes uniformly at random, so that any possible arrangement is equally likely. Let \(X\) be the number of matches, that is, the number of objects whose label matches the label of the box in which they are placed.

Table 5.4: Table representing the sample space in the matching problem with \(n=4\).
Spot 1 Spot 2 Spot 3 Spot 4 X (number of matches)
1 2 3 4
1 2 4 3
1 3 2 4
1 3 4 2
1 4 2 3
1 4 3 2
2 1 3 4
2 1 4 3
2 3 1 4
2 3 4 1
2 4 1 3
2 4 3 1
3 1 2 4
3 1 4 2
3 2 1 4
3 2 4 1
3 4 1 2
3 4 2 1
4 1 2 3
4 1 3 2
4 2 1 3
4 2 3 1
4 3 1 2
4 3 2 1

Example 5.4 Consider the matching problem with \(n=4\). There are \(4!=24\) possible outcomes, displayed in Table 5.4.

  1. Complete Table 5.4 to identify the value of \(X\) for each outcome.




  2. Identify the event \(\{X = 0\}\) and compute its probability. Then interpret the probability both as a long run relative frequency and as a relative likelihood.




  3. Construct a table, plot, and spinner representing the distribution of \(X\).




# One repetition of the number of matches, for a given n
simulate_number_matches = function(n) {
  # sample(1:n) puts the values 1:n in random order
  # sample(1:n) == 1:n checks if each object in the shuffled order matches
  #   returns a logical/binary 1=TRUE, 0 = FALSE vector
  # count the number of matches by summing the 1/0's
  
  sum(sample(1:n) == 1:n)
}

# Many repetitions, for n = 4
N_rep = 10000
number_matches = replicate(N_rep, simulate_number_matches(4))

# Summarize the simulated values
plot(table(number_matches) / N_rep,
     type = "h",
     xlab = "Number of matches",
     ylab = "Simulated relative frequency")

Figure 5.5: Simulated distribution of \(X\) (number of matches) in the matching problem with \(n=4\).
Table 5.5: Theoretical distribution of \(X\) (number of matches) in the matching problem with \(n=4\).
x P(X=x)
0 0.3750
1 0.3333
2 0.2500
4 0.0417

Figure 5.6: Spinner representing the theoretical distribution of \(X\), the number of matches in the matching problem when \(n=4\).
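
The theoretical distribution for \(n=4\) can also be obtained by enumerating all 24 arrangements in R. The following is a minimal sketch; base R has no built-in permutation generator, so `all_permutations` is a hypothetical helper written for this illustration, and `arrangements` is an assumed name.

```r
# Hypothetical helper: returns all permutations of the vector v, one per row
all_permutations = function(v) {
  if (length(v) <= 1) return(matrix(v, nrow = 1))
  do.call(rbind, lapply(seq_along(v), function(i) {
    # Put v[i] first, followed by every permutation of the remaining values
    cbind(v[i], all_permutations(v[-i]))
  }))
}

n = 4
arrangements = all_permutations(1:n)  # 4! = 24 rows, in the order of Table 5.4

# Column j of each row holds the object placed in spot j;
# a match occurs when that object is labeled j
positions = matrix(1:n, nrow = nrow(arrangements), ncol = n, byrow = TRUE)
X = rowSums(arrangements == positions)

# All arrangements are equally likely, so P(X = x) = count / 24
table(X) / nrow(arrangements)
```

The value \(x = 3\) does not appear because if three objects match then the fourth must match as well.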

Example 5.5 Now consider the matching problem with general \(n\). Enumerating the \(n!\) possible outcomes, as we did for \(n=4\), is not feasible for general \(n\), but we can approximate the distribution of \(X\) with simulation. Simulation shows that the distribution of the number of matches \(X\) is approximately the same for all \(n\), unless \(n\) is very small (say \(n\le 5\)). In particular, for any \(n\) that is not very small, the probability mass function of \(X\) is approximately \[ p_X(x) = \frac{e^{-1}}{x!}, \qquad x = 0, 1, 2, \ldots \]

  1. Use this pmf to approximate the probability of at least one match, and compare to the simulation results for general \(n\).




  2. Interpret the probability both as a long run relative frequency and as a relative likelihood.




  3. Construct a table, plot, and spinner corresponding to the above pmf. Compare to the simulation results for general \(n\).




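# kbl() and kable_styling() come from the kableExtra package
library(kableExtra)

# Evaluate the approximate pmf at x = 0, ..., 7
x = 0:7

p_x = exp(-1) / factorial(x)

data.frame(x, p_x) |>
  kbl(digits = 6) |>
  kable_styling(fixed_thead = TRUE)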
Table 5.6: Approximate distribution of number of matches in the matching problem for general \(n\)
x p_x
0 0.367879
1 0.367879
2 0.183940
3 0.061313
4 0.015328
5 0.003066
6 0.000511
7 0.000073
plot(x, p_x,
     type = "h",
     xlab = "Number of matches (x)",
     ylab = "Approximate probability p(x)")

Figure 5.7: Approximate distribution of number of matches in the matching problem for general \(n\)

Figure 5.8: Spinner representing the approximate distribution of number of matches in the matching problem for general \(n\)
n = 10

# Method 1: Simulate from the probability space
number_matches = replicate(N_rep, simulate_number_matches(n))

# Summarize the simulated values
plot(table(number_matches) / N_rep,
     type = "h",
     xlab = "Number of matches",
     ylab = "Simulated relative frequency")

Figure 5.9: Simulated distribution of \(X\) (number of matches) in the matching problem with \(n = 10\): By simulating outcomes and measuring \(X\).
# Method 2: Simulate from the (approximate) distribution
x_values = 0:n

x = sample(x_values,
           size = N_rep,
           prob = exp(-1) / factorial(x_values),
           replace = TRUE)

# Summarize the simulated values (x holds the Method 2 draws)
plot(table(x) / N_rep,
     type = "h",
     xlab = "Number of matches",
     ylab = "Simulated relative frequency")

Figure 5.10: Simulated distribution of \(X\) (number of matches) in the matching problem with \(n = 10\): By simulating directly from the approximate distribution.
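
Under the approximate pmf, the probability of at least one match is \(1 - p_X(0) = 1 - e^{-1} \approx 0.632\). The sketch below compares this value with simulated relative frequencies for a few values of \(n\). The particular values in `n_values`, the seed, and the name `simulated` are assumptions chosen for illustration. `simulate_number_matches` is redefined so that the block runs on its own.

```r
set.seed(2)    # assumption: fixed seed, only so the run is reproducible
N_rep = 10000  # repetitions for each value of n

# One repetition: shuffle the objects and count the matches
simulate_number_matches = function(n) {
  sum(sample(1:n) == 1:n)
}

# Assumed values of n, chosen only for illustration
n_values = c(6, 10, 25, 100)

# For each n, the proportion of repetitions with at least one match
simulated = sapply(n_values, function(n) {
  mean(replicate(N_rep, simulate_number_matches(n)) >= 1)
})

data.frame(n = n_values,
           simulated = simulated,
           approximate = 1 - exp(-1))
```

The simulated proportions should all be close to the same approximate value, which illustrates that the distribution of \(X\) changes very little with \(n\).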