12 Joint Distributions
Example 12.1
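Roll a fair four-sided die twice. Let $X$ be the sum of the two rolls, and let $Y$ be the larger of the two rolls (or the common value if a tie).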
- Compute and interpret $P(X = 5, Y = 3)$.
- Construct a “flat” table displaying the distribution of $(X, Y)$ pairs, with one pair in each row.
- Construct a two-way table displaying the joint distribution of $X$ and $Y$.
- Sketch a plot depicting the joint distribution of $X$ and $Y$.
- Starting with the two-way table, how could you obtain
?
- Starting with the two-way table, how could you obtain the marginal distribution of $X$? Of $Y$?
- Starting with the marginal distribution of $X$ and the marginal distribution of $Y$, could you necessarily construct the two-way table of the joint distribution? Explain.
- The joint distribution of random variables $X$ and $Y$ is a probability distribution on $(X, Y)$ pairs, and describes how the values of $X$ and $Y$ vary together or jointly.
- Marginal distributions can be obtained from a joint distribution by “stacking”/“collapsing”/“aggregating” out the other variable.
- In general, marginal distributions alone are not enough to determine a joint distribution. (The exception is when random variables are independent.)
x \ y | 1 | 2 | 3 | 4 |
---|---|---|---|---|
2 | 1/16 | 0 | 0 | 0 |
3 | 0 | 2/16 | 0 | 0 |
4 | 0 | 1/16 | 2/16 | 0 |
5 | 0 | 0 | 2/16 | 2/16 |
6 | 0 | 0 | 1/16 | 2/16 |
7 | 0 | 0 | 0 | 2/16 |
8 | 0 | 0 | 0 | 1/16 |
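The two-way table can be checked by brute-force enumeration. The following is a minimal sketch in R, assuming $X$ is the sum and $Y$ the larger of the two rolls; the names `outcomes`, `joint`, `flat`, `p_x`, `p_y`, and `indep` are illustrative and not part of the original example. It also collapses the table to get the marginal distributions and compares the joint table with the product of the marginals.

```r
# Enumerate the 16 equally likely (first roll, second roll) outcomes
outcomes = expand.grid(u1 = 1:4, u2 = 1:4)

# X = sum of the rolls, Y = larger of the rolls
outcomes$x = outcomes$u1 + outcomes$u2
outcomes$y = pmax(outcomes$u1, outcomes$u2)

# Two-way table of the joint distribution: each outcome has probability 1/16
joint = table(outcomes$x, outcomes$y, dnn = c("x", "y")) / nrow(outcomes)

# "Flat" table: one possible (x, y) pair per row
flat = subset(as.data.frame(joint, responseName = "prob"), prob > 0)

# Marginal distributions: collapse the table over the other variable
p_x = rowSums(joint)
p_y = colSums(joint)

# Product of the marginals: what the joint table would be under independence
indep = outer(p_x, p_y)

# Largest discrepancy between the actual joint table and the product table
max(abs(joint - indep))
```

The discrepancy is nonzero, so $X$ and $Y$ are not independent here; multiplying the marginal distributions does not recover the joint table.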
Example 12.2
Continuing the dice rolling example, construct a spinner representing the joint distribution of $X$ and $Y$.
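One way to think about such a spinner: it has one sector per possible $(x, y)$ pair, and each sector's share of the circle equals the pair's joint probability. The sketch below mimics spinning it in R by sampling rows of a table of pairs. The names `pairs`, `n_spins`, `spins`, and `spin_sim` are hypothetical, and the probabilities are read off the two-way table from Example 12.1.

```r
# Possible (x, y) pairs and their probabilities, from the two-way table
pairs = data.frame(
  x = c(2, 3, 4, 4, 5, 5, 6, 6, 7, 8),
  y = c(1, 2, 2, 3, 3, 4, 3, 4, 4, 4),
  prob = c(1, 2, 1, 2, 2, 2, 1, 2, 2, 1) / 16
)

# Sector boundaries on a spinner whose axis runs from 0 to 1
boundaries = cumsum(pairs$prob)

# Each spin lands on one sector (one row of pairs),
# with chance equal to that sector's share of the spinner
n_spins = 10000
spins = sample(nrow(pairs), size = n_spins, replace = TRUE, prob = pairs$prob)
spin_sim = pairs[spins, c("x", "y")]

# Relative frequencies of the simulated pairs
table(spin_sim$x, spin_sim$y, dnn = c("x", "y")) / n_spins
```

Each spin returns an $(X, Y)$ pair directly, so this simulates the joint distribution without simulating the individual rolls.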
N_rep = 16000

# first roll
u1 = sample(1:4, size = N_rep, replace = TRUE)

# second roll
u2 = sample(1:4, size = N_rep, replace = TRUE)

# sum
x = u1 + u2

# max
y = pmax(u1, u2)

dice_sim = data.frame(u1, u2, x, y)
Repetition | First roll | Second roll | X (sum) | Y (max) |
---|---|---|---|---|
1 | 1 | 2 | 3 | 2 |
2 | 2 | 4 | 6 | 4 |
3 | 1 | 3 | 4 | 3 |
4 | 4 | 2 | 6 | 4 |
5 | 4 | 3 | 7 | 4 |
6 | 2 | 1 | 3 | 2 |
# Joint distribution: counts
table(x, y)
y
x 1 2 3 4
2 1018 0 0 0
3 0 2025 0 0
4 0 990 1937 0
5 0 0 2040 2005
6 0 0 942 2056
7 0 0 0 1944
8 0 0 0 1043
# Joint distribution: proportions
table(x, y) / N_rep
y
x 1 2 3 4
2 0.0636250 0.0000000 0.0000000 0.0000000
3 0.0000000 0.1265625 0.0000000 0.0000000
4 0.0000000 0.0618750 0.1210625 0.0000000
5 0.0000000 0.0000000 0.1275000 0.1253125
6 0.0000000 0.0000000 0.0588750 0.1285000
7 0.0000000 0.0000000 0.0000000 0.1215000
8 0.0000000 0.0000000 0.0000000 0.0651875
# Approximate P(X = 5, Y = 3): proportion of repetitions with x equal to 5 and y equal to 3
sum((x == 5) * (y == 3)) / N_rep
[1] 0.1275
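Marginal distributions can be approximated from the simulated joint proportions by collapsing over the other variable, just as with the exact two-way table. The sketch below repeats the simulation so that it runs on its own. The seed and the names `joint_sim`, `p_x_sim`, and `p_y_sim` are arbitrary choices for illustration, and the exact marginal distribution of $Y$ used for comparison, $(1, 3, 5, 7)/16$, comes from summing the columns of the two-way table in Example 12.1.

```r
# Simulate the two rolls (the seed is an arbitrary choice for reproducibility)
set.seed(12)
N_rep = 16000
u1 = sample(1:4, size = N_rep, replace = TRUE)
u2 = sample(1:4, size = N_rep, replace = TRUE)
x = u1 + u2
y = pmax(u1, u2)

# Simulated joint distribution: proportions
joint_sim = table(x, y) / N_rep

# Approximate marginal distributions: sum the joint proportions
# over the other variable
p_x_sim = rowSums(joint_sim)
p_y_sim = colSums(joint_sim)

# Collapsing the joint table matches tabulating x on its own
all.equal(unname(p_x_sim), as.numeric(table(x) / N_rep))

# Compare the simulated marginal distribution of Y with the exact one
rbind(simulated = p_y_sim, exact = c(1, 3, 5, 7) / 16)
```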
library(tidyverse)
library(viridis)

ggplot(dice_sim |>
         # changing to factor ("categorical") helps with plotting
         mutate(x = factor(x), y = factor(y)),
       aes(x = x, y = y)) +
  # fill color is relative frequency
  stat_bin_2d(aes(fill = after_stat(count) / sum(after_stat(count)))) +
  # color scale
  scale_fill_viridis(limits = c(0, 2 / 16 + 0.01)) +
  # labels
  labs(x = "X (sum)",
       y = "Y (max)",
       fill = "Relative frequency")