4.3 Simulating from distributions: rolling dice yet again
Consider yet again the sum \(X\) and max \(Y\) of two rolls of a fair four-sided die. (We promise we’ll move on to more interesting examples soon!) Section 2.5 discussed the joint and marginal probability distributions of \(X\) and \(Y\). In Section 3.4 we simulated values of \(X\) and \(Y\) by first simulating dice rolls. Now we’ll see another way.
Example 4.1 Construct a spinner representing each of the following.
- The marginal distribution of \(X\).
- The marginal distribution of \(Y\).
- The joint distribution of \(X\) and \(Y\).
Solution. to Example 4.1.
Show/hide solution
- The spinner in Figure 4.2 represents the marginal distribution of \(X\); see Table 2.11.
- The spinner in Figure 4.3 represents the marginal distribution of \(Y\); see Table 2.12.
- The spinner in Figure 4.4 represents the joint distribution of \(X\) and \(Y\). For example, the spinner returns the pair \((4, 2)\) with probability 1/16 and the pair \((4, 3)\) with probability 2/16. See Table 2.14 or Table 2.13. Remember, a joint distribution of two random variables is a distribution of pairs on values.
In principle, there are always two ways of simulating a value \(x\) of a random variable \(X\).
- Simulate from the probability space. Simulate an outcome \(\omega\) from the underlying probability space and set \(x = X(\omega)\).
- Simulate from the distribution. Construct a spinner corresponding to the distribution of \(X\) and spin it once to generate \(x\).
The second method requires that the distribution of \(X\) is known. However, as we will see in many examples, it is common to specify the distribution of a random variable directly without defining the underlying probability space. Recall the brown bag analogy in Section 2.3.1 and the discussion at the end of Section 2.5.
Example 4.2 We have seen the Symbulate code for the “simulate from the probability space” method in the dice rolling example. Now we consider the “simulate from the distribution method”.
- Write the Symbulate code to define \(X\) by specifying its marginal distribution.
- Write the Symbulate code to define \(Y\) by specifying its marginal distribution.
Solution. to Example 4.2.
Show/hide solution
We simulate a value of \(X\) from its distribution by spinning the spinner in Figure 4.2.
Similarly, we simulate a value of \(Y\) from its distribution by spinning the spinner in Figure 4.3.
We can define a BoxModel corresponding to each of these spinners, and then define a random variable through the identify function. Essentially, we define the random variable by specifying its distribution, rather specifying the underlying probability space.
Note that the default size
argument in BoxModel
is size = 1
, so we have omitted it.
= RV(BoxModel([2, 3, 4, 5, 6, 7, 8], probs = [1 / 16, 2 / 16, 3 / 16, 4 / 16, 3 / 16, 2 / 16, 1 / 16]))
X
= RV(BoxModel([1, 2, 3, 4], probs = [1 / 16, 3 / 16, 5 / 16, 7 / 16])) Y
We can specify the distribution of a discrete random variable via a box model with one ticket for each possible value of the random variable, along with its corresponding probability.
The joint distribution in Table 2.14 corresponds to the spinner in Figure 4.4. We now have two ways to simulate an \((X, Y)\) pair that has the distribution in Table 2.14.
- Simulate two rolls of a fair four sided die. Let \(X\) be the sum of the two values and let \(Y\) be the larger of the two rolls (or the common value if a tie).
- Spin the spinner in Figure 4.4 once and record the resulting \((X, Y)\) pair. (Recall that this spinner returns a pair of values.) Of course, this method requires that the joint distribution of \((X, Y)\) is known.
Below is the Symbulate code for simulating directly from the joint distribution in Table 2.14. Note that the tickets in BoxModel
correspond to the possible \((X, Y)\) pairs, which are not equally likely (even though the 16 pairs of rolls are). We specify the probability of each ticket by using the probs
option. To generate a single \((X, Y)\) pair — like spinning the spinner in Figure 4.4 once — we draw one ticket from the box of pairs; this is why75 size = 1
.
= [(2, 1), (3, 2), (4, 2), (4, 3), (5, 3), (5, 4), (6, 3), (6, 4), (7, 4), (8, 4)]
xy_pairs = [1/16, 2/16, 1/16, 2/16, 2/16, 2/16, 1/16, 2/16, 2/16, 1/16]
pxy
= BoxModel(xy_pairs, probs = pxy, size = 1)
P
5) P.sim(
Index | Result |
---|---|
0 | (7, 4) |
1 | (3, 2) |
2 | (3, 2) |
3 | (7, 4) |
4 | (5, 3) |
We can now define random variables \(X\) and \(Y\). An outcome of P
is a pair of values. Recall that a Symbulate RV
is always defined in terms of a probability space and a function RV(probspace, function)
. The default function is the identity: \(g(\omega) = \omega\). Therefore, RV(P)
would just correspond to the pair of values generated by P
. The sum \(X\) corresponds to the first coordinate in the pair and the max \(Y\) corresponds to the second. We can define these random variables in Symbulate by “unpacking” the pair as in the following76
= RV(P) X, Y
Then we can simulate many \((X, Y)\) pairs and summarize as we did in Section 3.4.7.
& X & Y).sim(5) (RV(P)
Index | Result |
---|---|
0 | ((5, 3), 5, 3) |
1 | ((6, 4), 6, 4) |
2 | ((4, 3), 4, 3) |
3 | ((4, 3), 4, 3) |
4 | ((6, 3), 6, 3) |
Example 4.3 (Don’t do what Donny Don’t does.) Donny says “Forget the joint distribution spinner in Figure 4.4. I can simulate an \((X, Y)\) pair just by spinning the spinner in Figure 4.2 to generate \(X\) and the one in Figure 4.3 to generate \(Y\).” Is Donny correct? If not, can you help him see why not?
Solution. to Example 4.3
Show/hide solution
Donny is not correct. Yes, spinning the \(X\) spinner in Figure 4.2 will generate values of \(X\) according to the proper marginal distribution, and similarly with Figure 4.3 and \(Y\). However, spinning each of the spinners will not produce \((X, Y)\) pairs with the correct joint distribution. For example, Donny’s method could produce \(X=2\) and \(Y=4\), which is not a possible \((X, Y)\) pair. Donny’s method treats the values of \(X\) and \(Y\) as if they were independent; the result of the \(X\) spin would not change what could happen with the \(Y\) spin (since the spins are physically independent). However, the \(X\) and \(Y\) values are related. For example, if \(X=2\) then \(Y\) must be 1; if \(X=4\) then \(Y\) must be 2 or 3; etc. The joint distribution spinner in Figure 4.4 correctly reflects the relationship between \(X\) and \(Y\). But as we discussed in Section 2.5, in general you cannot recover the joint distribution from the marginal distributions, which is what Donny is attempting to do.
Donny’s method corresponds to (1) rolling the die twice and summing to get \(X\), then (2) rolling the die two more times and finding the larger roll to get \(Y\). Essentially, Donny is not using the same probability space for \(X\) and \(Y\), and therefore events involving both random variables cannot be studied. In Symbulate, Donny’s code — which would produce an error — would look like
X = RV(BoxModel([1, 2, 3, 4], size = 2), sum)
Y = RV(BoxModel([1, 2, 3, 4], size = 2), max)
(X & Y).sim(10000)
### Error: Events must be defined on same probability space.
In Donny’s code, his random variables are defined on different probability spaces; one box model is used to generate the rolls for \(X\) and a separate box model is used to generate the rolls for \(Y\). As we have mentioned a few times, random variables (and events) must all be defined on the same probability space77.
Example 4.4 (Don’t do what Donny Don’t does.) Donny says “I see what you mean about needing the spinner in Figure 4.4 to simulate \((X, Y)\) pairs. So then forget the spinners in Figure 4.2 and Figure 4.3. If I want to simulate \(X\) values, I could just spin the joint distribution spinner in Figure 4.4 and ignore the \(Y\) values.” Is Donny’s method correct? If not, can you help him see why not?
Solution. to Example 4.4
Show/hide solution
Donny is correct! The joint distribution spinner in Figure 4.4 correctly produces \((X, Y)\) pairs according to the joint distribution in Table 2.14. Ignoring the \(Y\) values is like “summing the rows” and only worrying about what happens in total for \(X\). For example, in the long run, 1/16 of spins will generate (4, 2) and 2/16 of spins will generate (4, 3), so ignoring the \(y\) values, 3/16 of spins will return an \(x\) value of 4. From the joint distribution you can always find the marginal distributions (e.g., by finding row and column totals).
Now, Donny’s method does work, but it does require more work than necessary. If we really only needed to simulate \(X\) values, we only need the distribution of \(X\) and not the joint distribution of \(X\) and \(Y\), so you could use the \(X\) spinner in Figure 4.2. But if we wanted to study anything about the relationship between \(X\) and \(Y\) then we would need the joint distribution spinner.
4.3.1 Summary
- There are two ways to simulate a value of a random variable.
- Simulate an outcome from the underlying probability space and evaluate the random variable for the simulated outcome.
- Identify the distribution of the random variable, and simulate a value from that distribution (e.g., by constructing a spinner).
- We can specify the distribution of a discrete random variable via a box model with one ticket for each possible value of the random variable, along with its corresponding probability.
- The joint distribution of two random variables can be represented by a spinner that returns pairs of values.
- To simulate an \((X, Y)\) pair it is, in general78, not sufficient to simulate a value of \(X\) from its marginal distribution and a value of \(Y\) from its marginal distribution. Instead, a pair \((X, Y)\) must be simulated from the joint distribution.
By default
size = 1
, so we could have omitted it, but we include it for emphasis. Drawing 1 ticket from this box model returns an \((X, Y)\) pair.↩︎Since
P
returns pairs of outcomes,Z = RV(P)
is a random vector. Components of a vector can be indexed with brackets[]
; e.g., the first component isZ[0]
and the second isZ[1]
. (Remember: Python uses zero-based indexing.) So the “unpacked” code is an equivalent but simpler version ofZ = RV(P); X = Z[0]; Y = Z[1]
.↩︎If Donny really wanted to simulate two independent pairs of rolls, one to compute \(X\) and one to compute \(Y\), he would still need define the random variables on the same probability space, using
BoxModel([1, 2, 3, 4], size = 2) ** 2
for which an example outcome would be ((3, 2), (1, 1)). Then he could defineX=RV(P)[0].apply(sum)
andY=RV(P)[1].apply(max)
. But it’s hard to justify why Donny would want to do this.↩︎When \(X\) and \(Y\) are independent it is sufficient to simulate values of \(X\) and \(Y\) separately from their respective marginal distributions. We will study independence in detail later.↩︎