12 Joint Distributions
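Example 12.1 Roll a fair four-sided die twice. Let $X$ be the sum of the two rolls and let $Y$ be the larger of the two rolls (or the common value if the rolls are tied).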
- Compute and interpret $\text{P}(X = 5, Y = 3)$.
- Construct a “flat” table displaying the distribution of $(X, Y)$ pairs, with one pair in each row.
- Construct a two-way table displaying the joint distribution of $X$ and $Y$.
- Sketch a plot depicting the joint distribution of $X$ and $Y$.
- Starting with the two-way table, how could you obtain
?
- Starting with the two-way table, how could you obtain the marginal distribution of $X$? Of $Y$?
- Starting with the marginal distribution of $X$ and the marginal distribution of $Y$, could you necessarily construct the two-way table of the joint distribution? Explain.

- The joint distribution of random variables $X$ and $Y$ is a probability distribution on $(x, y)$ pairs, and describes how the values of $X$ and $Y$ vary together or jointly.
- Marginal distributions can be obtained from a joint distribution by “stacking”/“collapsing”/“aggregating” out the other variable.
- In general, marginal distributions alone are not enough to determine a joint distribution. (The exception is when random variables are independent.)

x \ y | 1 | 2 | 3 | 4 |
---|---|---|---|---|
2 | 1/16 | 0 | 0 | 0 |
3 | 0 | 2/16 | 0 | 0 |
4 | 0 | 1/16 | 2/16 | 0 |
5 | 0 | 0 | 2/16 | 2/16 |
6 | 0 | 0 | 1/16 | 2/16 |
7 | 0 | 0 | 0 | 2/16 |
8 | 0 | 0 | 0 | 1/16 |
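
The two-way table can be checked by brute-force enumeration of the 16 equally likely outcomes. The following is a minimal sketch in R, assuming $X$ is the sum and $Y$ is the larger of the two rolls (as in the simulation code in this chapter); the object names `rolls`, `joint`, `marginal_x`, `marginal_y`, and `indep` are illustrative and not part of the original example.

```r
# Enumerate the 16 equally likely (first roll, second roll) outcomes
rolls = expand.grid(u1 = 1:4, u2 = 1:4)
rolls$x = rolls$u1 + rolls$u2        # X: sum of the two rolls
rolls$y = pmax(rolls$u1, rolls$u2)   # Y: larger of the two rolls

# Exact joint distribution: each outcome has probability 1/16
joint = table(rolls$x, rolls$y, dnn = c("x", "y")) / nrow(rolls)
joint

# Marginal distributions: "collapse" the joint table over the other variable
marginal_x = rowSums(joint)   # sum across each row (over y)
marginal_y = colSums(joint)   # sum down each column (over x)

# Product of the marginals: what the joint table would be under independence
indep = outer(marginal_x, marginal_y)

joint["5", "3"]   # P(X = 5, Y = 3)
indep["5", "3"]   # P(X = 5) * P(Y = 3)
```

Here `joint["5", "3"]` is 2/16 = 0.125, while the product of the marginals is (4/16)(5/16), roughly 0.078. The two tables differ, which illustrates why the marginal distributions alone do not determine the joint distribution when the variables are not independent.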
Example 12.2 Continuing the dice rolling example, construct a spinner representing the joint distribution of $X$ and $Y$.
N_rep = 16000

# first roll
u1 = sample(1:4, size = N_rep, replace = TRUE)

# second roll
u2 = sample(1:4, size = N_rep, replace = TRUE)

# sum
x = u1 + u2

# max
y = pmax(u1, u2)

dice_sim = data.frame(u1, u2, x, y)
Repetition | First roll | Second roll | X (sum) | Y (max) |
---|---|---|---|---|
1 | 2 | 1 | 3 | 2 |
2 | 1 | 2 | 3 | 2 |
3 | 1 | 3 | 4 | 3 |
4 | 4 | 2 | 6 | 4 |
5 | 2 | 3 | 5 | 3 |
6 | 2 | 3 | 5 | 3 |
# Joint distribution: counts
table(x, y)
y
x 1 2 3 4
2 980 0 0 0
3 0 1960 0 0
4 0 1039 2070 0
5 0 0 2038 1948
6 0 0 1017 1935
7 0 0 0 2052
8 0 0 0 961
# Joint distribution: proportions
table(x, y) / N_rep
y
x 1 2 3 4
2 0.0612500 0.0000000 0.0000000 0.0000000
3 0.0000000 0.1225000 0.0000000 0.0000000
4 0.0000000 0.0649375 0.1293750 0.0000000
5 0.0000000 0.0000000 0.1273750 0.1217500
6 0.0000000 0.0000000 0.0635625 0.1209375
7 0.0000000 0.0000000 0.0000000 0.1282500
8 0.0000000 0.0000000 0.0000000 0.0600625
# Relative frequency of repetitions with X = 5 and Y = 3
sum((x == 5) * (y == 3)) / N_rep
[1] 0.127375
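
The same "collapsing" idea applies to simulated values. The sketch below reruns the simulation so that it is self-contained, then approximates the marginal distributions and builds a "flat" table of simulated pairs. The seed value and the names `joint_sim` and `flat_sim` are assumptions introduced for illustration, so the simulated counts will not match the output shown above.

```r
set.seed(12)  # arbitrary seed, used only to make this sketch reproducible

N_rep = 16000
u1 = sample(1:4, size = N_rep, replace = TRUE)   # first roll
u2 = sample(1:4, size = N_rep, replace = TRUE)   # second roll
x = u1 + u2                                      # sum
y = pmax(u1, u2)                                 # max

# Simulated joint distribution (relative frequencies)
joint_sim = table(x, y) / N_rep

# Approximate marginal distributions: collapse over the other variable
rowSums(joint_sim)   # approximates the marginal distribution of X
colSums(joint_sim)   # approximates the marginal distribution of Y

# The same marginals computed directly from the simulated values
table(x) / N_rep
table(y) / N_rep

# "Flat" version: one (x, y) pair per row, keeping only pairs that occurred
flat_sim = as.data.frame(joint_sim, responseName = "rel_freq")
flat_sim = flat_sim[flat_sim$rel_freq > 0, ]
flat_sim
```

Summing the rows of the simulated two-way table gives exactly the relative frequencies of `x` alone, which is the simulation analogue of obtaining a marginal distribution from a joint distribution.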
library(tidyverse)
library(viridis)
ggplot(dice_sim |>
         # changing to factor ("categorical") helps with plotting
         mutate(x = factor(x), y = factor(y)),
       aes(x = x, y = y)) +
  # fill color is relative frequency
  stat_bin_2d(aes(fill = after_stat(count) / sum(after_stat(count)))) +
  # color scale
  scale_fill_viridis(limits = c(0, 2 / 16 + 0.01)) +
  # labels
  labs(x = "X (sum)",
       y = "Y (max)",
       fill = "Relative frequency")
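
Returning to the spinner in Example 12.2: one possible code analogue of a joint-distribution spinner is to sample $(x, y)$ pairs directly from the flat table, instead of rolling the two dice. This is only a sketch under that interpretation; the data frame `spinner` and the names `spins` and `xy_sim` are hypothetical, and the probabilities are copied from the two-way table of Example 12.1.

```r
# Flat table of the joint distribution: possible (x, y) pairs and probabilities
spinner = data.frame(
  x = c(2, 3, 4, 4, 5, 5, 6, 6, 7, 8),
  y = c(1, 2, 2, 3, 3, 4, 3, 4, 4, 4),
  p = c(1, 2, 1, 2, 2, 2, 1, 2, 2, 1) / 16
)

N_rep = 16000

# Each "spin" selects one row, with chance equal to that row's probability
spins = sample(nrow(spinner), size = N_rep, replace = TRUE, prob = spinner$p)
xy_sim = spinner[spins, c("x", "y")]

# Relative frequencies of the simulated pairs
table(xy_sim$x, xy_sim$y, dnn = c("x", "y")) / N_rep
```

A single spin returns a pair, not one value, so the sectors of the spinner correspond to the ten possible $(x, y)$ pairs, with areas proportional to the entries of `p`.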