19 Metric Predicted Variable with One Nominal Predictor
This chapter considers data structures that consist of a metric predicted variable and a nominal predictor…. This type of data structure can arise from experiments or from observational studies. In experiments, the researcher assigns the categories (at random) to the experimental subjects. In observational studies, both the nominal predictor value and the metric predicted value are generated by processes outside the direct control of the researcher. In either case, the same mathematical description can be applied to the data (although causality is best inferred from experimental intervention).
The traditional treatment of this sort of data structure is called single-factor analysis of variance (ANOVA), or sometimes one-way ANOVA. Our Bayesian approach will be a hierarchical generalization of the traditional ANOVA model. The chapter will also consider the situation in which there is also a metric predictor that accompanies the primary nominal predictor. The metric predictor is sometimes called a covariate, and the traditional treatment of this data structure is called analysis of covariance (ANCOVA). The chapter also considers generalizations of the traditional models, because it is straightforward in Bayesian software to implement heavy-tailed distributions to accommodate outliers, along with hierarchical structure to accommodate heterogeneous variances in the different groups, etc. (Kruschke, 2015, pp. 553–554)
19.1 Describing multiple groups of metric data
Load the primary packages.

library(tidyverse)
library(palettetown)
library(ggridges)
library(patchwork)
library(brms)
library(bayesplot)
library(tidybayes)
Figure 19.1 illustrates the conventional description of grouped metric data. Each group is represented as a position on the horizontal axis. The vertical axis represents the variable to be predicted by group membership. The data are assumed to be normally distributed within groups, with equal standard deviation in all groups. The group means are deflections from overall baseline, such that the deflections sum to zero. Figure 19.1 provides a specific numerical example, with data that were randomly generated from the model. (p. 554)
We’ll want a custom data-generating function for our primary group data.
generate_data <- function(seed, mean) {
  set.seed(seed)
  rnorm(n, mean = grand_mean + mean, sd = 2)
}

n <- 20
grand_mean <- 101
d <- tibble(group = 1:5,
deviation = c(4, -5, -2, 6, -3)) |>
mutate(d = map2(.x = group, .y = deviation, .f = generate_data)) |>
unnest(d) |>
mutate(iteration = rep(1:n, times = 5))
glimpse(d)
Rows: 100
Columns: 4
$ group <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ deviation <dbl> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, …
$ d <dbl> 103.74709, 105.36729, 103.32874, 108.19056, 105.65902, 103.3…
$ iteration <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 1…
Here we’ll make a tibble containing the necessary data for the rotated Gaussians. As far as I can tell, Kruschke’s Gaussians only span to the bounds of percentile-based 98% intervals. We partition off those bounds for each group
by the ll
and ul
columns in the first mutate()
function. In the second mutate()
, we expand the dataset to include a sequence of 100 values between those lower- and upper-limit points. In the third mutate()
, we feed those points into the dnorm()
function, with group-specific means and a common sd
.
densities <- d |>
distinct(group, deviation) |>
mutate(ll = qnorm(0.01, mean = grand_mean + deviation, sd = 2),
ul = qnorm(0.99, mean = grand_mean + deviation, sd = 2)) |>
mutate(d = map2(.x = ll, .y = ul,
.f = \(ll, ul) seq(from = ll, to = ul, length.out = 100))) |>
mutate(density = map2(.x = d, .y = grand_mean + deviation,
.f = \(d, m) dnorm(x = d, mean = m, sd = 2))) |>
unnest(c(d, density))
head(densities)
# A tibble: 6 × 6
group deviation ll ul d density
<int> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 4 100. 110. 100. 0.0133
2 1 4 100. 110. 100. 0.0148
3 1 4 100. 110. 101. 0.0165
4 1 4 100. 110. 101. 0.0183
5 1 4 100. 110. 101. 0.0203
6 1 4 100. 110. 101. 0.0224
We’ll need two more supplementary tibbles to add the flourishes to the plot. The d_arrow
tibble will specify our light-gray arrows. The text
tibble will contain our annotation information.
d_arrow <- tibble(d = grand_mean,
group = 1:5,
deviation = c(4, -5, -2, 6, -3),
offset = 0.1)
head(d_arrow)
# A tibble: 5 × 4
d group deviation offset
<dbl> <int> <dbl> <dbl>
1 101 1 4 0.1
2 101 2 -5 0.1
3 101 3 -2 0.1
4 101 4 6 0.1
5 101 5 -3 0.1
d_text <- tibble(
d = grand_mean,
group = c(0:5, 0),
deviation = c(0, 4, -5, -2, 6, -3, 10),
offset = rep(c(1/4, 0), times = c(6, 1)),
angle = rep(c(90, 0), times = c(6, 1)),
label = c("beta[0]==101", "beta['[1]']==4","beta['[2]']==-5", "beta['[3]']==-2",
"beta['[4]']==6", "beta['[5]']==3", "sigma['all']==2"))
head(d_text)
# A tibble: 6 × 6
d group deviation offset angle label
<dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 101 0 0 0.25 90 beta[0]==101
2 101 1 4 0.25 90 beta['[1]']==4
3 101 2 -5 0.25 90 beta['[2]']==-5
4 101 3 -2 0.25 90 beta['[3]']==-2
5 101 4 6 0.25 90 beta['[4]']==6
6 101 5 -3 0.25 90 beta['[5]']==-3
We’re almost ready to plot. Before we do, let’s talk color and theme. For this chapter, we’ll take our color palette from the palettetown package (Lucas, 2016), which provides an array of color palettes inspired by Pokémon. Our color palette will be #17, which is based on Pidgeotto.
scales::show_col(pokepal(pokemon = 17))

pp <- pokepal(pokemon = 17)

pp
[1] "#785848" "#E0B048" "#D03018" "#202020" "#A87858" "#F8E858" "#583828"
[8] "#E86040" "#F0F0A0" "#F8A870" "#C89878" "#F8F8F8" "#A0A0A0"
Our overall plot theme will be based on the default theme_grey()
with a good number of adjustments.
theme_set(
theme_grey() +
theme(text = element_text(color = pp[4]),
axis.text = element_text(color = pp[4]),
axis.ticks = element_line(color = pp[4]),
legend.background = element_blank(),
legend.box.background = element_blank(),
legend.key = element_rect(fill = pp[9]),
panel.background = element_rect(fill = pp[9], color = pp[9]),
panel.grid = element_blank(),
plot.background = element_rect(fill = pp[12], color = pp[12]),
strip.background = element_rect(fill = alpha(pp[2], 1/3), color = "transparent"),
strip.text = element_text(color = pp[4]))
)
Now make Figure 19.1.
d |>
ggplot(aes(x = d, y = group, group = group)) +
geom_vline(xintercept = grand_mean, color = pp[12]) +
geom_jitter(alpha = 4/4, color = pp[10], height = 0.05, shape = 1) +
# The Gaussians
geom_ridgeline(data = densities,
aes(height = -density),
color = pp[7], fill = "transparent",
linewidth = 3/4, min_height = NA, scale = 3/2) +
# The small arrows
geom_segment(data = d_arrow,
aes(xend = d + deviation,
y = group + offset, yend = group + offset),
arrow = arrow(length = unit(0.2, "cm")),
color = pp[5], linewidth = 1) +
# The large arrow on the left
geom_segment(aes(x = 80, xend = grand_mean,
y = 0, yend = 0),
arrow = arrow(length = unit(0.2, "cm")),
color = pp[5], linewidth = 3/4) +
# The text
geom_text(data = d_text,
aes(x = grand_mean + deviation, y = group + offset,
label = label, angle = angle),
color = pp[4], parse = TRUE, size = 4) +
scale_y_continuous(NULL, breaks = 1:5,
labels = c("<1,0,0,0,0>", "<0,1,0,0,0>", "<0,0,1,0,0>", "<0,0,0,1,0>", "<0,0,0,0,1>")) +
xlab(NULL) +
coord_flip(xlim = c(90, 112),
ylim = c(-0.2, 5.5))
The descriptive model presented in Figure 19.1 is the traditional one used by classical ANOVA (which is described a bit more in the next section). More general models are straightforward to implement in Bayesian software. For example, outliers could be accommodated by using heavy-tailed noise distributions (such as a t distribution) instead of a normal distribution, and different groups could be given different standard deviations. (p. 556)
19.2 Traditional analysis of variance
The terminology, “analysis of variance,” comes from a decomposition of overall data variance into within-group variance and between-group variance (Fisher, 1925). Algebraically, the sum of squared deviations of the scores from their overall mean equals the sum of squared deviations of the scores from their respective group means plus the sum of squared deviations of the group means from the overall mean. In other words, the total variance can be partitioned into within-group variance plus between-group variance. Because one definition of the word “analysis” is separation into constituent parts, the term ANOVA accurately describes the underlying algebra in the traditional methods. That algebraic relation is not used in the hierarchical Bayesian approach presented here. The Bayesian method can estimate component variances, however. Therefore, the Bayesian approach is not ANOVA, but is analogous to ANOVA. (p. 556)
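To make that algebra concrete, here's a quick numerical check using the simulated d data from above (my own sketch, not part of Kruschke's workflow): the within-group and between-group sums of squares add up to the total sum of squares.

# Verify that total SS = within-group SS + between-group SS for the simulated data
d |>
  group_by(group) |>
  mutate(group_mean = mean(d)) |>
  ungroup() |>
  summarise(ss_total   = sum((d - mean(d))^2),
            ss_within  = sum((d - group_mean)^2),
            ss_between = sum((group_mean - mean(d))^2))

The ss_within and ss_between columns should sum to ss_total, which is exactly the partition the term "analysis of variance" refers to.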
19.3 Hierarchical Bayesian approach
“Our goal is to estimate its parameters in a Bayesian framework. Therefore, all the parameters need to be given a meaningfully structured prior distribution” (p. 557). However, our approach will depart a little from the one in the text. Our parameters will not all “have generic noncommittal prior distributions” (p. 557). Most importantly, we will not follow the example in (Gelman, 2006) of putting a broad uniform prior on σ_β, the standard deviation of the group-level deflections.
# Bracket
p1 <- tibble(x = 0.99,
y = 0.5,
label = "{_}") |>
ggplot(aes(x = x, y = y, label = label)) +
geom_text(color = pp[8], family = "Times", hjust = 1, size = 10) +
scale_x_continuous(expand = c(0, 0), limits = c(0, 1)) +
ylim(0, 1) +
theme_void()
## Plain arrow
# Save our custom arrow settings
my_arrow <- arrow(angle = 20, length = unit(0.35, "cm"), type = "closed")

p2 <- tibble(x = 0.72,
y = 1,
xend = 0.68,
yend = 0.25) |>
ggplot(aes(x = x, xend = xend,
y = y, yend = yend)) +
geom_segment(arrow = my_arrow, color = pp[4]) +
xlim(0, 1) +
theme_void()
# Normal density
p3 <- tibble(x = seq(from = -3, to = 3, by = 0.1)) |>
ggplot(aes(x = x, y = (dnorm(x)) / max(dnorm(x)))) +
geom_area(fill = pp[9]) +
annotate(geom = "text",
x = 0, y = 0.2,
label = "normal",
color = pp[4], size = 7) +
annotate(geom = "text",
x = c(0, 1.45), y = 0.6,
hjust = c(0.5, 0),
label = c("italic(M)[0]", "italic(S)[0]"),
color = pp[4], family = "Times", parse = TRUE, size = 7) +
scale_x_continuous(expand = c(0, 0)) +
theme_void() +
theme(axis.line.x = element_line(color = pp[4], linewidth = 0.5))
# Second normal density
p4 <- tibble(x = seq(from = -3, to = 3, by = 0.1)) |>
ggplot(aes(x = x, y = (dnorm(x)) / max(dnorm(x)))) +
geom_area(fill = pp[9]) +
annotate(geom = "text",
x = 0, y = 0.2,
label = "normal",
color = pp[4], size = 7) +
annotate(geom = "text",
x = c(0, 1.45), y = 0.6,
hjust = c(0.5, 0),
label = c("0", "sigma[beta]"),
color = pp[4], family = "Times", parse = TRUE, size = 7) +
scale_x_continuous(expand = c(0, 0)) +
theme_void() +
theme(axis.line.x = element_line(color = pp[4], linewidth = 0.5))
# Two annotated arrows
p5 <- tibble(x = c(0.16, 0.81),
y = 1,
xend = c(0.47, 0.77),
yend = 0) |>
ggplot(aes(x = x, xend = xend,
y = y, yend = yend)) +
geom_segment(arrow = my_arrow, color = pp[4]) +
annotate(geom = "text",
x = c(0.25, 0.74, 0.83), y = 0.5,
label = c("'~'", "'~'", "italic(j)"),
color = pp[4], family = "Times", parse = TRUE, size = c(10, 10, 7)) +
xlim(0, 1) +
theme_void()
# Likelihood formula
p6 <- tibble(x = 0.99,
y = 0.25,
label = "beta[0]+sum()[italic(j)]*beta['['*italic(j)*']']*italic(x)['['*italic(j)*']'](italic(i))") |>
ggplot(aes(x = x, y = y, label = label)) +
geom_text(color = pp[4], family = "Times", hjust = 1, parse = TRUE, size = 7) +
scale_x_continuous(expand = c(0, 0), limits = c(0, 1)) +
ylim(0, 1) +
theme_void()
# Half-normal density
p7 <- tibble(x = seq(from = 0, to = 3, by = 0.01)) |>
ggplot(aes(x = x, y = (dnorm(x)) / max(dnorm(x)))) +
geom_area(fill = pp[9]) +
annotate(geom = "text",
x = 1.5, y = 0.2,
label = "half-normal",
color = pp[4], size = 7) +
annotate(geom = "text",
x = 1.5, y = 0.6,
label = "0*','*~italic(S)[sigma]",
color = pp[4], family = "Times", parse = TRUE, size = 7) +
scale_x_continuous(expand = c(0, 0)) +
theme_void() +
theme(axis.line.x = element_line(color = pp[4], linewidth = 0.5))
# Annotated arrow
p8 <- tibble(x = 0.38,
y = 0.65,
label = "'='") |>
ggplot(aes(x = x, y = y, label = label)) +
geom_text(color = pp[4], family = "Times", parse = TRUE, size = 10) +
geom_segment(x = 0.5, xend = 0.5,
y = 1, yend = 0.25,
arrow = my_arrow, color = pp[4]) +
xlim(0, 1) +
theme_void()
# The third normal density
p9 <- tibble(x = seq(from = -3, to = 3, by = 0.1)) |>
ggplot(aes(x = x, y = (dnorm(x)) / max(dnorm(x)))) +
geom_area(fill = pp[9]) +
annotate(geom = "text",
x = 0, y = 0.2,
label = "normal",
color = pp[4], size = 7) +
annotate(geom = "text",
x = c(0, 1.45), y = 0.6,
hjust = c(0.5, 0),
label = c("mu[italic(i)]", "sigma[italic(y)]"),
color = pp[4], family = "Times", parse = TRUE, size = 7) +
scale_x_continuous(expand = c(0, 0)) +
theme_void() +
theme(axis.line.x = element_line(color = pp[4], linewidth = 0.5))
# Another annotated arrow
p10 <- tibble(x = 0.55,
y = 0.6,
label = "'~'") |>
ggplot(aes(x = x, y = y, label = label)) +
geom_text(color = pp[4], family = "Times", parse = TRUE, size = 10) +
geom_segment(x = 0.82, xend = 0.38,
y = 1, yend = 0.2,
arrow = my_arrow, color = pp[4]) +
xlim(0, 1) +
theme_void()
# The final annotated arrow
p11 <- tibble(x = c(0.375, 0.625),
y = c(1/3, 1/3),
label = c("'~'", "italic(i)")) |>
ggplot(aes(x = x, y = y, label = label)) +
geom_text(size = c(10, 7),
color = pp[4], family = "Times", parse = TRUE) +
geom_segment(x = 0.5, xend = 0.5,
y = 1, yend = 0,
arrow = my_arrow, color = pp[4]) +
xlim(0, 1) +
theme_void()
# Some text
p12 <- tibble(x = 0.5,
y = 0.5,
label = "italic(y[i])") |>
ggplot(aes(x = x, y = y, label = label)) +
geom_text(color = pp[4], family = "Times", parse = TRUE, size = 7) +
xlim(0, 1) +
theme_void()
# Define the layout
layout <- c(
area(t = 1, b = 1, l = 6, r = 7),
area(t = 2, b = 3, l = 6, r = 7),
area(t = 3, b = 4, l = 1, r = 3),
area(t = 3, b = 4, l = 5, r = 7),
area(t = 6, b = 7, l = 1, r = 7),
area(t = 5, b = 6, l = 1, r = 7),
area(t = 6, b = 7, l = 9, r = 11),
area(t = 9, b = 10, l = 5, r = 7),
area(t = 8, b = 9, l = 5, r = 7),
area(t = 8, b = 9, l = 5, r = 11),
area(t = 11, b = 11, l = 5, r = 7),
area(t = 12, b = 12, l = 5, r = 7))
# Combine, adjust, and display
(p1 + p2 + p3 + p4 + p6 + p5 + p7 + p9 + p8 + p10 + p11 + p12) +
plot_layout(design = layout) &
ylim(0, 1) &
theme(plot.margin = margin(0, 5.5, 0, 5.5))
Later on in the text, Kruschke opined:
A crucial pre-requisite for estimating σ_β from all the groups is an assumption that all the groups are representative and informative for the estimate. It only makes sense to influence the estimate of one group with data from the other groups if the groups can be meaningfully described as representative of a shared higher-level distribution. (p. 559)
Although I agree with him in spirit, this doesn't appear to strictly be the case. As odd and paradoxical as this sounds, partial pooling can be of use even when some of the cases are of a different kind. For more on the topic, see Efron and Morris's classic (1977) paper, Stein's paradox in statistics, and my blog post walking out one of their examples in brms.
19.3.1 Implementation in JAGS brms.
The brms setup, of course, differs a bit from JAGS.
fit <- brm(
data = my_data,
family = gaussian,
y ~ 1 + (1 | categorical_variable),
prior = c(prior(normal(0, x), class = Intercept),
prior(normal(0, x), class = b),
prior(cauchy(0, x), class = sd),
prior(cauchy(0, x), class = sigma)))
The prior for the noise standard deviation, σ_y, is set with the class = sigma line. The grand mean is depicted by the first 1 in the model formula and its prior is indicated by the class = Intercept argument. We indicate we'd like group-based deviations from the grand mean with the (1 | categorical_variable) syntax, where the 1 on the left side of the bar indicates we'd like our intercepts to vary by group and the categorical_variable part simply represents the name of a given categorical variable we'd like those intercepts to vary by. The brms default is to do this with deviation scores, the mean for which will be zero. Although it's not obvious in the formula syntax, the model presumes the group-based deviations are normally distributed with a mean of zero and a standard deviation, which Kruschke termed σ_β and whose prior we set with the class = sd line. We, of course, are not using a uniform prior on any of our variance parameters. But in order to be weakly informative, we will use the half-Cauchy. Recall that since the brms default is to set the lower bound for any variance parameter to 0, there's no need to worry about doing so ourselves. So even though the syntax only indicates cauchy, it's understood to mean Cauchy with a lower bound at zero; since the mean is usually 0, that makes it a half-Cauchy.
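If you'd like to see exactly which prior classes brms expects for a formula like this, the get_prior() function will list them. Here's a minimal sketch with a stand-in data set (toy_data, y, and categorical_variable are hypothetical names for this illustration, not part of Kruschke's example):

# A tiny stand-in data set with one metric outcome and one nominal predictor
toy_data <- tibble(
  y = rnorm(n = 20),
  categorical_variable = rep(c("a", "b", "c", "d"), each = 5))

# List the parameter classes brms will want priors for
get_prior(data = toy_data,
          family = gaussian,
          y ~ 1 + (1 | categorical_variable))

The output includes rows for the Intercept, sd, and sigma classes discussed above; a b class would only appear if the formula contained population-level predictors.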
Kruschke set the upper bound for his uniform prior high enough to be noncommittal. As an alternative, Kruschke suggested using a gamma prior on σ_β. To get a sense of the general shape, here's what a Gamma(2, 0.1) distribution looks like.
tibble(x = seq(from = 0, to = 110, by = 0.1)) |>
ggplot(aes(x = x, y = dgamma(x, 2, 0.1))) +
geom_area(fill = pp[10]) +
annotate(geom = "text",
x = 14.25, y = 0.015, label = "'Gamma'*(2*', '*0.1)",
color = pp[1], parse = TRUE, size = 4.25) +
scale_x_continuous(expand = c(0, 0), limits = c(0, 110)) +
scale_y_continuous(NULL, breaks = NULL, expand = expansion(mult = c(0, 0.05)))
Kruschke's specific recommendation was a gamma whose mode is sd(y)/2 and whose standard deviation is 2*sd(y), “using the function gammaShRaFromModeSD explained in Section 9.2.2” (pp. 560–561). Let's make that function.
gamma_a_b_from_omega_sigma <- function(mode, sd) {
  if (mode <= 0) stop("`mode` must be > 0")
  if (sd <= 0) stop("`sd` must be > 0")
  rate  <- (mode + sqrt(mode^2 + 4 * sd^2)) / (2 * sd^2)
  shape <- 1 + mode * rate
  list(shape = shape, rate = rate)
}
So in the case of standardized data where sd(y) = 1, we'd use our gamma_a_b_from_omega_sigma() function like so.
sd_y <- 1

omega <- sd_y / 2
sigma <- 2 * sd_y

(s_r <- gamma_a_b_from_omega_sigma(mode = omega, sd = sigma))
$shape
[1] 1.283196
$rate
[1] 0.5663911
And that produces the following gamma distribution.
tibble(x = seq(from = 0, to = 21, by = 0.01)) |>
ggplot(aes(x = x, y = dgamma(x, s_r$shape, s_r$rate))) +
geom_area(fill = pp[8]) +
annotate(geom = "text",
x = 2.75, y = 0.02, label = "'Gamma'*(1.283196*', '*0.5663911)",
color = pp[7], parse = TRUE, size = 2.75) +
scale_x_continuous(breaks = c(0, 1, 5, 10, 20), expand = c(0, 0), limits = c(0, 21)) +
scale_y_continuous(NULL, breaks = NULL, expand = expansion(mult = c(0, 0.05)))
In the parameter space that matters, from zero to one, that gamma is pretty noninformative. It peaks between the two, slopes very gently rightward, but has the nice steep slope on the left keeping the estimates off the zero boundary. And even though that right slope is very gentle given the scale of the data, it's aggressive enough that it should keep the MCMC chains from spending a lot of time in ridiculous parts of the parameter space. I.e., when working with finite numbers of iterations, we want our MCMC chains wasting exactly zero iterations investigating what the density might be for absurdly large values of σ_β.
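As a quick sanity check on gamma_a_b_from_omega_sigma(), you can back out the mode and standard deviation implied by the shape and rate it returned; for the standardized-data example above they should come out to about 0.5 and 2.

# For a gamma with a given shape and rate, the mode is (shape - 1) / rate and
# the sd is sqrt(shape) / rate; these should recover the requested 0.5 and 2
with(s_r, c(mode = (shape - 1) / rate,
            sd   = sqrt(shape) / rate))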
19.3.2 Example: Sex and death.
Let’s load and glimpse()
at Hanley and Shapiro’s (1994) fruit-fly data.
my_data <- read_csv("data.R/FruitflyDataReduced.csv")

glimpse(my_data)
Rows: 125
Columns: 3
$ Longevity <dbl> 35, 37, 49, 46, 63, 39, 46, 56, 63, 65, 56, 65, 70, 63…
$ CompanionNumber <chr> "Pregnant8", "Pregnant8", "Pregnant8", "Pregnant8", "P…
$ Thorax <dbl> 0.64, 0.68, 0.68, 0.72, 0.72, 0.76, 0.76, 0.76, 0.76, …
We can use geom_density_ridges()
to help get a sense of how our criterion Longevity
is distributed across groups of CompanionNumber
.
my_data |>
group_by(CompanionNumber) |>
mutate(group_mean = mean(Longevity)) |>
ungroup() |>
mutate(CompanionNumber = fct_reorder(CompanionNumber, group_mean)) |>
ggplot(aes(x = Longevity, y = CompanionNumber, fill = group_mean)) +
geom_density_ridges(color = pp[9], linewidth = 0.2, scale = 3/2) +
scale_fill_gradient(low = pp[4], high = pp[2]) +
scale_x_continuous(expand = expansion(mult = c(0, 0.05)), limits = c(0, NA)) +
scale_y_discrete(NULL, expand = expansion(mult = c(0, 0.4))) +
theme(axis.text.y = element_text(hjust = 0),
axis.ticks.y = element_blank(),
legend.position = "none")
We’ll want to do the preparatory work to define our stanvars
.
(mean_y <- mean(my_data$Longevity))
[1] 57.44
(sd_y <- sd(my_data$Longevity))
[1] 17.56389
omega <- sd_y / 2
sigma <- 2 * sd_y

(s_r <- gamma_a_b_from_omega_sigma(mode = omega, sd = sigma))
$shape
[1] 1.283196
$rate
[1] 0.03224747
With the prep work done, here are our stanvars.
stanvars <- stanvar(mean_y, name = "mean_y") +
stanvar(sd_y, name = "sd_y") +
stanvar(s_r$shape, name = "alpha") +
stanvar(s_r$rate, name = "beta")
Now fit the model, our hierarchical Bayesian alternative to an ANOVA.
fit19.1 <- brm(
data = my_data,
family = gaussian,
Longevity ~ 1 + (1 | CompanionNumber),
prior = c(prior(normal(mean_y, sd_y * 5), class = Intercept),
prior(gamma(alpha, beta), class = sd),
prior(cauchy(0, sd_y), class = sigma)),
iter = 4000, warmup = 1000, chains = 4, cores = 4,
seed = 19,
control = list(adapt_delta = 0.99),
stanvars = stanvars,
file = "fits/fit19.01")
Much like Kruschke’s JAGS chains, our brms chains are well behaved, but only after fiddling with the adapt_delta
setting.
color_scheme_set(scheme = pp[c(10, 8, 12, 5, 1, 4)])
plot(fit19.1, widths = c(2, 3))
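Since we had to nudge adapt_delta, it can also be worth counting how many post-warmup divergent transitions remain. Here's one way to do so with nuts_params(); this is my own sketch, not part of the text's workflow.

# Tally the post-warmup divergent transitions across all chains
nuts_params(fit19.1) |>
  filter(Parameter == "divergent__") |>
  summarise(n_divergent = sum(Value))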
Also like Kruschke, our chains appear moderately autocorrelated.
# Extract the posterior draws
draws <- as_draws_df(fit19.1)
# Plot
draws |>
mutate(chain = .chain) |>
mcmc_acf(pars = vars(b_Intercept:sigma), lags = 10)
Here’s the model summary.
print(fit19.1)
Warning: There were 1 divergent transitions after warmup. Increasing
adapt_delta above 0.99 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
Family: gaussian
Links: mu = identity; sigma = identity
Formula: Longevity ~ 1 + (1 | CompanionNumber)
Data: my_data (Number of observations: 125)
Draws: 4 chains, each with iter = 4000; warmup = 1000; thin = 1;
total post-warmup draws = 12000
Multilevel Hyperparameters:
~CompanionNumber (Number of levels: 5)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 15.14 7.83 6.30 36.35 1.00 2382 3409
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 57.62 7.68 41.65 73.79 1.00 2366 2534
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 14.90 0.96 13.16 16.94 1.00 6797 6682
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
With the ranef()
function, we can get the summaries of the group-specific deflections.
ranef(fit19.1)
$CompanionNumber
, , Intercept
Estimate Est.Error Q2.5 Q97.5
None0 5.5568444 7.989504 -10.684101 22.287040
Pregnant1 6.7326810 7.984169 -9.509364 23.203262
Pregnant8 5.3384232 7.978771 -10.773413 21.747306
Virgin1 -0.8203872 7.966322 -17.491255 15.384177
Virgin8 -17.7558744 8.040377 -35.198619 -2.106991
And with the coef()
function, we can get those same group-level summaries in a non-deflection metric.
coef(fit19.1)
$CompanionNumber
, , Intercept
Estimate Est.Error Q2.5 Q97.5
None0 63.17337 2.952249 57.41697 69.02449
Pregnant1 64.34920 2.918295 58.60453 70.13374
Pregnant8 62.95495 2.919065 57.24735 68.80056
Virgin1 56.79614 2.891547 51.25394 62.56749
Virgin8 39.86065 3.044018 33.88484 45.91160
Those are all estimates of the group-specific means. Since it wasn't modeled as varying by group, all groups share the same estimate for the within-group standard deviation, σ_y.
posterior_summary(fit19.1)["sigma", ]
Estimate Est.Error Q2.5 Q97.5
14.8991245 0.9583564 13.1618487 16.9399621
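Because posterior means are linear, the coef() summaries above are just the fixef() intercept plus the ranef() deflections. Here's a quick check with the point estimates (a sketch of the relationship, using the Estimate columns only):

# Grand mean (fixef) plus group-level deflections (ranef) recovers the coef() estimates
fixef(fit19.1)[, "Estimate"] + ranef(fit19.1)$CompanionNumber[, "Estimate", "Intercept"]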
To prepare for our version of the top panel of Figure 19.3, let’s make a corresponding plot of the fixed intercept, the grand mean. For this plot, we’ll use slice_sample()
to randomly sample from the posterior draws.
# How many random draws from the posterior would you like?
n_draws <- 20
# Subset
set.seed(19)
draws |>
slice_sample(n = n_draws, replace = FALSE) |>
expand_grid(Longevity = 0:120) |>
mutate(density = dnorm(x = Longevity, mean = b_Intercept, sd = sigma)) |>
ggplot(aes(x = Longevity)) +
geom_line(aes(y = density, group = .draw),
alpha = 2/3, color = pp[4], linewidth = 1/3) +
geom_jitter(data = my_data,
aes(y = -0.001),
alpha = 3/4, color = pp[10], height = 0.001) +
scale_x_continuous(breaks = 0:4 * 25, limits = c(0, NA),
expand = expansion(mult = c(0, 0.05))) +
scale_y_continuous(NULL, breaks = NULL) +
labs(title = "Posterior Predictive Distribution",
subtitle = "The jittered dots are the ungrouped Longevity data. The\nGaussians are posterior draws depicting the overall\ndistribution, the grand mean.") +
coord_cartesian(xlim = c(0, 110))
For the next plot, we’ll use a tidybayes::add_epred_draws()
-based work flow instead.
densities <- my_data |>
distinct(CompanionNumber) |>
add_epred_draws(fit19.1, ndraws = n_draws, seed = 19, dpar = c("mu", "sigma"))
glimpse(densities)
Rows: 100
Columns: 8
Groups: CompanionNumber, .row [5]
$ CompanionNumber <chr> "Pregnant8", "Pregnant8", "Pregnant8", "Pregnant8", "P…
$ .row <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ .chain <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ .iteration <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ .draw <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,…
$ .epred <dbl> 63.16864, 62.47580, 66.74862, 67.78523, 67.66527, 67.3…
$ mu <dbl> 63.16864, 62.47580, 66.74862, 67.78523, 67.66527, 67.3…
$ sigma <dbl> 13.97914, 16.68802, 15.46328, 14.48031, 14.37639, 14.8…
With the first two lines, we made a 5-row data set containing one row for each level of CompanionNumber. The add_epred_draws() function comes from tidybayes (see the Posterior fits section of Kay, 2021). The first argument of add_epred_draws() is newdata, which works much like it does in brms::fitted(); here it took the 5-row data set we piped in. The next argument took our model fit, fit19.1. With the ndraws argument, we indicated we just wanted 20 random draws from the posterior. The seed argument makes those random draws reproducible. With dpar, we requested distributional regression parameters in the output. In our case, those were the mu and sigma columns. Since we took 20 draws for each of the 5 levels of CompanionNumber, we ended up with a 100-row tibble.
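If you'd rather stay closer to base brms, you can get similar group-level summaries with fitted() and the same newdata logic. Here's a small sketch (the nd name is mine):

# Posterior summaries of the group-specific means via fitted()
nd <- my_data |>
  distinct(CompanionNumber)

fitted(fit19.1, newdata = nd) |>
  as_tibble() |>
  bind_cols(nd)

The difference is that fitted() returns posterior summaries by default, whereas add_epred_draws() hands you the individual draws, which is what we want for plotting many Gaussians.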
The next steps are a direct extension of the method we used to make our Gaussians for our version of Figure 19.1.
densities <- densities |>
mutate(ll = qnorm(0.025, mean = mu, sd = sigma),
ul = qnorm(0.975, mean = mu, sd = sigma)) |>
mutate(Longevity = map2(.x = ll, .y = ul,
.f = \(ll, ul) seq(from = ll, to = ul, length.out = 100))) |>
unnest(Longevity) |>
mutate(density = dnorm(Longevity, mu, sigma))
glimpse(densities)
Rows: 10,000
Columns: 12
Groups: CompanionNumber, .row [5]
$ CompanionNumber <chr> "Pregnant8", "Pregnant8", "Pregnant8", "Pregnant8", "P…
$ .row <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ .chain <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ .iteration <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ .draw <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ .epred <dbl> 63.16864, 63.16864, 63.16864, 63.16864, 63.16864, 63.1…
$ mu <dbl> 63.16864, 63.16864, 63.16864, 63.16864, 63.16864, 63.1…
$ sigma <dbl> 13.97914, 13.97914, 13.97914, 13.97914, 13.97914, 13.9…
$ ll <dbl> 35.77002, 35.77002, 35.77002, 35.77002, 35.77002, 35.7…
$ ul <dbl> 90.56726, 90.56726, 90.56726, 90.56726, 90.56726, 90.5…
$ Longevity <dbl> 35.77002, 36.32353, 36.87704, 37.43054, 37.98405, 38.5…
$ density <dbl> 0.004180876, 0.004514715, 0.004867574, 0.005239790, 0.…
If you look at the code we used to make ll
and ul
, you’ll see we used 95% intervals, this time. Our second mutate()
function is basically the same. After unnesting the tibble, we just needed to plug in the Longevity
, mu
, and sigma
values into the dnorm()
function to compute the corresponding density values.
densities |>
ggplot(aes(x = Longevity, y = CompanionNumber)) +
# Here we make our density lines
geom_ridgeline(aes(height = density, group = interaction(CompanionNumber, .draw)),
color = adjustcolor(pp[4], alpha.f = 2/3), fill = NA,
linewidth = 1/3, scale = 25) +
# The original data with little jitter thrown in
geom_jitter(data = my_data,
alpha = 3/4, color = pp[10], height = 0.04) +
# Pretty much everything below this line is aesthetic fluff
scale_x_continuous(breaks = 0:4 * 25, limits = c(0, 110),
expand = expansion(mult = c(0, 0.05))) +
labs(y = NULL,
title = "Data with Posterior Predictive Distrib.") +
coord_cartesian(ylim = c(1.25, 5.25)) +
theme(axis.text.y = element_text(hjust = 0),
axis.ticks.y = element_blank())
Do be aware that when you use this method, you may have to fiddle around with the geom_ridgeline()
scale
argument to get the Gaussians' relative heights looking reasonable. Stick in different numbers to get a sense of what I mean. I also find that I'm often not a fan of the way the spacing on the y axis ends up with default geom_ridgeline()
. It’s easy to overcome this with a little ylim
fiddling.
To return to the more substantive interpretation, the top panel of
Figure 19.3 suggests that the normal distributions with homogeneous variances appear to be reasonable descriptions of the data. There are no dramatic outliers relative to the posterior predicted curves, and the spread of the data within each group appears to be reasonably matched by the width of the posterior normal curves. (Be careful when making visual assessments of homogeneity of variance because the visual spread of the data depends on the sample size; for a reminder [see the right panel of Figure 17.1, p. 478].) The range of credible group means, indicated by the peaks of the normal curves, suggests that the group Virgin8 is clearly lower than the others, and the group Virgin1 might be lower than the controls. To find out for sure, we need to examine the differences of group means, which we do in the next section. (p. 564)
For clarity, the “see the right panel of Figure 17.1, p. 478” part was changed following Kruschke’s Corrigenda.
19.3.3 Contrasts.
It is straightforward to examine the posterior distribution of credible differences. Every step in the MCMC chain provides a combination of group means that are jointly credible, given the data. Therefore, every step in the MCMC chain provides a credible difference between groups…
To construct the credible differences of group j and group k, at every step in the MCMC chain we compute

μ_j − μ_k = (β_0 + β_j) − (β_0 + β_k) = β_j − β_k

In other words, the baseline cancels out of the calculation, and the difference is a sum of weighted group deflections. Notice that the weights sum to zero. To construct the credible differences of the average of groups 1–2 and the average of groups 3–4, at every step in the MCMC chain we compute

(μ_1 + μ_2)/2 − (μ_3 + μ_4)/2 = (β_1 + β_2)/2 − (β_3 + β_4)/2

Again, the difference is a sum of weighted group deflections. The coefficients on the group deflections have the properties that they sum to zero, with the positive coefficients summing to +1 and the negative coefficients summing to −1. Such a combination is called a contrast. The differences can also be expressed in terms of effect size, by dividing the difference by σ_y at each step in the chain. (pp. 565–566)
To warm up, here's how to compute the first contrast shown in the lower portion of Kruschke's Figure 19.3: the contrast between the two pregnant conditions and the none-control condition.
draws |>
transmute(c = (`r_CompanionNumber[Pregnant1,Intercept]` +
`r_CompanionNumber[Pregnant8,Intercept]`) / 2 -
`r_CompanionNumber[None0,Intercept]`) |>
ggplot(aes(x = c)) +
stat_dotsinterval(point_interval = mode_hdi, .width = 0.95,
color = pp[4], slab_color = pp[5], slab_fill = pp[5]) +
scale_y_continuous(NULL, breaks = NULL) +
labs(x = expression(Difference~(mu[1]+mu[2])/2-mu[3]),
subtitle = "Pregnant1.Pregnant8 vs None0")
Up to this point, our primary mode of showing marginal posterior distributions has either been minute variations on Kruschke's typical histogram approach or with densities. We'll use those again in the future, too. In this chapter and the next, we'll veer a little further from the source material and depict our marginal posteriors with dot plots and their very close relatives, quantile plots. In the dot plot, above, each of the 12,000 posterior draws is depicted by one of the stacked brown dots. To stack the dots in neat columns like that, tidybayes has to round a little. Though we lose something in the numeric precision, we gain a lot in interpretability. We'll have more to say in just a moment.
In case you were curious, here are the HMC-based posterior mode and 95% HDIs.
draws |>
transmute(difference = (`r_CompanionNumber[Pregnant1,Intercept]` +
`r_CompanionNumber[Pregnant8,Intercept]`) / 2 -
`r_CompanionNumber[None0,Intercept]`) |>
mode_hdi(difference)
# A tibble: 1 × 6
difference .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 0.149 -6.66 7.37 0.95 mode hdi
Little difference, there. Now let’s quantify the same contrast as a standardized mean difference effect size.
draws |>
transmute(es = ((`r_CompanionNumber[Pregnant1,Intercept]` +
`r_CompanionNumber[Pregnant8,Intercept]`) / 2 -
`r_CompanionNumber[None0,Intercept]`) / sigma) |>
ggplot(aes(x = es)) +
stat_dotsinterval(point_interval = mode_hdi, .width = 0.95,
color = pp[4], quantiles = 100,
slab_fill = pp[5], slab_size = 0) +
scale_y_continuous(NULL, breaks = NULL) +
labs(x = expression(Effect~Size~(Difference/sigma[italic(y)])),
subtitle = "Pregnant1.Pregnant8 vs None0")
From a standardized-mean-difference perspective, that's tiny. Also note that because our model fit19.1 did not allow the standard deviation parameter to vary across groups, we simply divided the difference by the one sigma when computing the effect size.
Did you notice the quantiles = 100 argument within stat_dotsinterval()? Instead of a dot plot with 12,000 tiny little dots, that argument converted the output to a quantile plot. The 12,000 posterior draws are now summarized by 100 dots, each of which represents 1% of the posterior distribution.
Okay, now let’s do the rest in bulk. First we’ll do the difference scores.
differences <- draws |>
transmute(
`Pregnant1.Pregnant8.None0 vs Virgin1` = (`r_CompanionNumber[Pregnant1,Intercept]` +
`r_CompanionNumber[Pregnant8,Intercept]` +
`r_CompanionNumber[None0,Intercept]`) / 3 - `r_CompanionNumber[Virgin1,Intercept]`,
`Virgin1 vs Virgin8` = `r_CompanionNumber[Virgin1,Intercept]` - `r_CompanionNumber[Virgin8,Intercept]`,
`Pregnant1.Pregnant8.None0 vs Virgin1.Virgin8` = (`r_CompanionNumber[Pregnant1,Intercept]` +
`r_CompanionNumber[Pregnant8,Intercept]` +
`r_CompanionNumber[None0,Intercept]`) / 3 - (`r_CompanionNumber[Virgin1,Intercept]` +
`r_CompanionNumber[Virgin8,Intercept]`) / 2)
differences |>
pivot_longer(cols = everything()) |>
ggplot(aes(x = value)) +
stat_dotsinterval(point_interval = mode_hdi, .width = 0.95,
color = pp[4], quantiles = 100,
slab_fill = pp[5], slab_size = 0) +
scale_y_continuous(NULL, breaks = NULL) +
xlab("Difference") +
facet_wrap(~ name, scales = "free")
Because we saved our data wrangling labor from above as differences
, it won’t take much more effort to compute and plot the corresponding effect sizes as displayed in the bottom row of Figure 19.3.
differences |>
mutate_all(.funs = ~ . / draws$sigma) |>
pivot_longer(cols = everything()) |>
ggplot(aes(x = value)) +
stat_dotsinterval(point_interval = mode_hdi, .width = 0.95,
color = pp[4], quantiles = 100,
slab_fill = pp[5], slab_size = 0) +
scale_y_continuous(NULL, breaks = NULL) +
xlab("Effect Size (Standardized mean difference)") +
facet_wrap(~ name, scales = "free_x")
In traditional ANOVA, analysts often perform a so-called omnibus test that asks whether it is plausible that all the groups are simultaneously exactly equal. I find that the omnibus test is rarely meaningful, however…. In the hierarchical Bayesian estimation used here, there is no direct equivalent to an omnibus test in ANOVA, and the emphasis is on examining all the meaningful contrasts. (p. 567)
Speaking of all meaningful contrasts, if you’d like to make all pairwise comparisons in a hierarchical model of this form, tidybayes offers a convenient way to do so (see the Comparing levels of a factor section of Kay, 2021). Here we’ll demonstrate with stat_dotsinterval()
.
fit19.1 |>
# These two lines are where the magic is at
spread_draws(r_CompanionNumber[CompanionNumber, ]) |>
compare_levels(r_CompanionNumber, by = CompanionNumber) |>
ggplot(aes(x = r_CompanionNumber, y = CompanionNumber)) +
geom_vline(xintercept = 0, color = pp[12]) +
stat_dotsinterval(point_interval = mode_hdi, .width = 0.95,
color = pp[4], quantiles = 100,
slab_fill = pp[5], slab_size = 0) +
labs(x = "Contrast",
y = NULL) +
coord_cartesian(ylim = c(1.5, 10.5)) +
theme(axis.text.y = element_text(hjust = 0))
But back to that omnibus test notion. If you really wanted to, I suppose one rough analogue would be to use information criteria to compare the hierarchical model to one that includes a single intercept with no group-level deflections. Here’s what the simpler model would look like.
fit19.2 <- brm(
data = my_data,
family = gaussian,
Longevity ~ 1,
prior = c(prior(normal(mean_y, sd_y * 5), class = Intercept),
prior(cauchy(0, sd_y), class = sigma)),
iter = 4000, warmup = 1000, chains = 4, cores = 4,
seed = 19,
stanvars = stanvars,
file = "fits/fit19.02")
Check the model summary.
print(fit19.2)
Family: gaussian
Links: mu = identity; sigma = identity
Formula: Longevity ~ 1
Data: my_data (Number of observations: 125)
Draws: 4 chains, each with iter = 4000; warmup = 1000; thin = 1;
total post-warmup draws = 12000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 57.42 1.59 54.30 60.52 1.00 10897 7658
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 17.67 1.12 15.63 20.02 1.00 10000 7461
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Here are their LOO values and their difference score.
fit19.1 <- add_criterion(fit19.1, criterion = "loo")
fit19.2 <- add_criterion(fit19.2, criterion = "loo")
loo_compare(fit19.1, fit19.2) |>
print(simplify = FALSE)
elpd_diff se_diff elpd_loo se_elpd_loo p_loo se_p_loo looic se_looic
fit19.1 0.0 0.0 -517.7 7.0 5.5 0.6 1035.4 13.9
fit19.2 -19.3 5.3 -537.0 7.1 1.8 0.3 1074.0 14.2
The hierarchical model has a better LOO. Here are the stacking-based model weights.
(mw <- model_weights(fit19.1, fit19.2))
fit19.1 fit19.2
9.999982e-01 1.843334e-06
If you don’t like scientific notation, just round()
.
mw |>
round(digits = 3)
fit19.1 fit19.2
1 0
Yep, complementing the LOO difference, virtually all the stacking weight went to the hierarchical model. You might think of this another way. The conceptual question we're asking is: Does it make sense to say that the standard deviation of the group means, σ_β, is zero? Here's its posterior from fit19.1.
draws |>
ggplot(aes(x = sd_CompanionNumber__Intercept)) +
stat_dotsinterval(point_interval = mode_hdi, .width = 0.95,
color = pp[4], quantiles = 100,
slab_fill = pp[5], slab_size = 0) +
scale_y_continuous(NULL, breaks = NULL) +
coord_cartesian(xlim = c(0, 50)) +
labs(x = NULL,
title = expression("Behold the fit19.1 posterior for "*sigma[beta]*"."),
subtitle = "This parameter's many things, but zero isn't one of them.")
Yeah, zero and other values close to zero don't look credible for that parameter. 95% of the mass is between 5 and 30, with the bulk hovering around 10. We don't need an omnibus test to tell us the groups differ.
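If you want a single number to go along with that visual impression, one option is the posterior probability that σ_β falls below some small value. The threshold of 1 here is arbitrary, my choice for illustration:

# Proportion of posterior draws for which the group-level sd is below 1
draws |>
  summarise(p_sd_below_1 = mean(sd_CompanionNumber__Intercept < 1))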
19.3.4 Multiple comparisons and shrinkage.
The previous section suggested that an analyst should investigate all contrasts of interest. This recommendation can be thought to conflict with traditional advice in the context of null hypothesis significance testing, which instead recommends that a minimal number of comparisons should be conducted in order to maximize the power of each test while keeping the overall false alarm rate capped at 5% (or whatever maximum is desired)…. Instead, a Bayesian analysis can mitigate false alarms by incorporating prior knowledge into the model. In particular, hierarchical structure (which is an expression of prior knowledge) produces shrinkage of estimates, and shrinkage can help rein in estimates of spurious outlying data. For example, in the posterior distribution from the fruit fly data, the modal values of the posterior group means have a range of about 24.5. The sample means of the groups have a range of about 26.1. Thus, there is some shrinkage in the estimated means. The amount of shrinkage is dictated only by the data and by the prior structure, not by the intended tests. (p. 568)
We may as well compute those ranges by hand. Here’s the range of the observed data.
my_data |>
group_by(CompanionNumber) |>
summarise(mean = mean(Longevity)) |>
summarise(range = max(mean) - min(mean))
# A tibble: 1 × 1
range
<dbl>
1 26.1
For our hierarchical model fit19.1
, the posterior means are rank ordered in the same way as the empirical data.
coef(fit19.1)$CompanionNumber[, , "Intercept"] |>
data.frame() |>
rownames_to_column(var = "companion_number") |>
arrange(Estimate) |>
mutate_if(is.double, .funs = round, digits = 1)
companion_number Estimate Est.Error Q2.5 Q97.5
1 Virgin8 39.9 3.0 33.9 45.9
2 Virgin1 56.8 2.9 51.3 62.6
3 Pregnant8 63.0 2.9 57.2 68.8
4 None0 63.2 3.0 57.4 69.0
5 Pregnant1 64.3 2.9 58.6 70.1
If we compute the range by a difference of the point estimates of the highest and lowest posterior means, we can get a quick number.
coef(fit19.1)$CompanionNumber[, , "Intercept"] |>
as_tibble() |>
summarise(range = max(Estimate) - min(Estimate))
# A tibble: 1 × 1
range
<dbl>
1 24.5
Note that wasn’t fully Bayesian of us. Those means and their difference carry uncertainty and that uncertainty can be fully expressed if we use all the posterior draws (i.e., use summary = FALSE
and wrangle).
coef(fit19.1, summary = FALSE)$CompanionNumber[, , "Intercept"] |>
as_tibble() |>
transmute(range = Pregnant1 - Virgin8) |>
mode_hdi(range)
# A tibble: 1 × 6
range .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 24.1 16.1 32.8 0.95 mode hdi
Happily, the central tendency of the range is nearly equivalent across the two methods, but now we have 95% intervals, too. Do note how wide they are. This is why we work with the full set of posterior draws.
19.3.5 The two-group case.
A special case of our current scenario is when there are only two groups. The model of the present section could, in principle, be applied to the two-group case, but the hierarchical structure would do little good because there is virtually no shrinkage when there are so few groups (and the top-level prior on σ_β is broad, as assumed here). (p. 568)
For kicks and giggles, let’s practice. Since Pregnant1
and Virgin8
had the highest and lowest empirical means, making them the groups best suited to define our range, we'll use them to fit the 2-group hierarchical model. To fit it with haste, just use update()
.
fit19.3 <- update(
fit19.1,
newdata = my_data |>
filter(CompanionNumber %in% c("Pregnant1", "Virgin8")),
seed = 19,
file = "fits/fit19.03")
Even with just two groups, there were no gross issues with fitting the model.
print(fit19.3)
Family: gaussian
Links: mu = identity; sigma = identity
Formula: Longevity ~ 1 + (1 | CompanionNumber)
Data: filter(my_data, CompanionNumber %in% c("Pregnant1" (Number of observations: 50)
Draws: 4 chains, each with iter = 4000; warmup = 1000; thin = 1;
total post-warmup draws = 12000
Multilevel Hyperparameters:
~CompanionNumber (Number of levels: 2)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 32.95 22.22 8.95 94.60 1.00 2925 4477
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 51.11 25.09 -1.67 104.59 1.00 2830 2528
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 14.22 1.49 11.64 17.48 1.00 5671 4788
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
If you compare the posteriors for σ_β across the two models, you'll see the one from fit19.3 is substantially larger.
posterior_summary(fit19.1)["sd_CompanionNumber__Intercept", ]
Estimate Est.Error Q2.5 Q97.5
15.141200 7.831739 6.297608 36.352988
posterior_summary(fit19.3)["sd_CompanionNumber__Intercept", ]
Estimate Est.Error Q2.5 Q97.5
32.946365 22.223396 8.947535 94.604415
Here that is in a coefficient plot using tidybayes::stat_interval()
.
bind_rows(as_draws_df(fit19.1) |> select(sd_CompanionNumber__Intercept),
as_draws_df(fit19.3) |> select(sd_CompanionNumber__Intercept)) |>
mutate(fit = rep(c("fit19.1", "fit19.3"), each = n() / 2)) |>
ggplot(aes(x = sd_CompanionNumber__Intercept, y = fit)) +
stat_interval(point_interval = mode_hdi, .width = c(0.5, 0.8, 0.95)) +
scale_color_manual(values = pp[c(11, 5, 7)],
labels = c("95%", "80%", "50%")) +
scale_x_continuous(expression(sigma[beta]),
limits = c(0, NA), expand = expansion(mult = c(0, 0.05))) +
ylab(NULL) +
theme(legend.key.size = unit(0.45, "cm"))
This all implies less shrinkage and a larger range.
coef(fit19.3, summary = FALSE)$CompanionNumber[, , "Intercept"] |>
as_tibble() |>
transmute(range = Pregnant1 - Virgin8) |>
mode_hdi(range)
# A tibble: 1 × 6
range .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 25.5 18.1 33.8 0.95 mode hdi
And indeed, the range between the two groups is larger. Now the posterior mode for their difference has almost converged to that of the raw data. Kruschke then went on to recommend using a single-level model in such situations, instead.
That is why the two-group model in Section 16.3 did not use hierarchical structure, as illustrated in Figure 16.11 (p. 468). That model also used a t distribution to accommodate outliers in the data, and that model allowed for heterogeneous variances across groups. Thus, for two groups, it is more appropriate to use the model of Section 16.3. The hierarchical multi-group model is generalized to accommodate outliers and heterogeneous variances in Section 19.5. (p. 568)
As a refresher, here’s what the brms code for that Chapter 16 model looked like.
fit16.3 <- brm(
data = my_data,
family = student,
bf(Score ~ 0 + Group,
sigma ~ 0 + Group),
prior = c(prior(normal(mean_y, sd_y * 100), class = b),
prior(normal(0, log(sd_y)), class = b, dpar = sigma),
prior(exponential(one_over_twentynine), class = nu)),
chains = 4, cores = 4,
stanvars = stanvars,
seed = 16,
file = "fits/fit16.03")
Let’s adjust it for our data. Since we have a reduced data set, we’ll need to re-compute our stanvars
values, which were based on the raw data.
# It's easier to just make a reduced data set
my_small_data <- my_data |>
filter(CompanionNumber %in% c("Pregnant1", "Virgin8"))

(mean_y <- mean(my_small_data$Longevity))
[1] 51.76
(sd_y <- sd(my_small_data$Longevity))
[1] 19.11145
omega <- sd_y / 2
sigma <- 2 * sd_y

(s_r <- gamma_a_b_from_omega_sigma(mode = omega, sd = sigma))
$shape
[1] 1.283196
$rate
[1] 0.02963623
Here we update stanvars
.
stanvars <- stanvar(mean_y, name = "mean_y") +
stanvar(sd_y, name = "sd_y") +
stanvar(s_r$shape, name = "alpha") +
stanvar(s_r$rate, name = "beta") +
stanvar(1/29, name = "one_over_twentynine")
Note that our priors, here, are something of a blend of those from Chapter 16 and those from our hierarchical model, fit19.1
.
fit19.4 <- brm(
data = my_small_data,
family = student,
bf(Longevity ~ 0 + CompanionNumber,
sigma ~ 0 + CompanionNumber),
prior = c(prior(normal(mean_y, sd_y * 10), class = b),
prior(normal(0, log(sd_y)), class = b, dpar = sigma),
prior(exponential(one_over_twentynine), class = nu)),
iter = 4000, warmup = 1000, chains = 4, cores = 4,
seed = 19,
stanvars = stanvars,
file = "fits/fit19.04")
Here’s the model summary.
print(fit19.4)
Family: student
Links: mu = identity; sigma = log; nu = identity
Formula: Longevity ~ 0 + CompanionNumber
sigma ~ 0 + CompanionNumber
Data: my_small_data (Number of observations: 50)
Draws: 4 chains, each with iter = 4000; warmup = 1000; thin = 1;
total post-warmup draws = 12000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat
CompanionNumberPregnant1 64.63 3.27 58.00 70.96 1.00
CompanionNumberVirgin8 38.71 2.54 33.71 43.71 1.00
sigma_CompanionNumberPregnant1 2.73 0.15 2.44 3.04 1.00
sigma_CompanionNumberVirgin8 2.48 0.15 2.19 2.79 1.00
Bulk_ESS Tail_ESS
CompanionNumberPregnant1 13791 7956
CompanionNumberVirgin8 12375 7876
sigma_CompanionNumberPregnant1 12931 9309
sigma_CompanionNumberVirgin8 12608 8147
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
nu 39.59 30.91 5.80 120.30 1.00 12030 8145
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Man, look at those Bulk_ESS
values! As it turns out, they can be greater than the number of post-warmup samples. And here’s the range in posterior means.
fixef(fit19.4, summary = FALSE) |>
as_tibble() |>
transmute(range = CompanionNumberPregnant1 - CompanionNumberVirgin8) |>
mode_hdi(range)
# A tibble: 1 × 6
range .lower .upper .width .point .interval
<dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 25.8 17.7 33.8 0.95 mode hdi
The results are pretty much the same as that of the two-group hierarchical model, maybe a touch larger. Yep, Kruschke was right. Hierarchical models with two groups and permissive priors on σ_β do little to shrink the group means toward each other.
19.4 Including a metric predictor
“In Figure 19.3, the data within each group have a large standard deviation. For example, longevity values in the Virgin8 group range from 16 to 60” (p. 568). Here are those group-wise ranges.
my_data |>
group_by(CompanionNumber) |>
summarise(min = min(Longevity),
max = max(Longevity),
range = max(Longevity) - min(Longevity))
# A tibble: 5 × 4
CompanionNumber min max range
<chr> <dbl> <dbl> <dbl>
1 None0 37 96 59
2 Pregnant1 42 97 55
3 Pregnant8 35 86 51
4 Virgin1 21 81 60
5 Virgin8 16 60 44
But you get the point. For each group, there was quite a range. We might add predictors to the model to help account for those ranges.
The additional metric predictor is sometimes called a covariate. In the experimental setting, the focus of interest is usually on the nominal predictor (i.e., the experimental treatments), and the covariate is typically thought of as an ancillary predictor to help isolate the effect of the nominal predictor. But mathematically the nominal and metric predictors have equal status in the model. Let's denote the value of the metric covariate for subject i as x_cov(i). Then the expected value of the predicted variable for subject i is

μ_i = β_0 + Σ_j β_[j] x_[j](i) + β_cov x_cov(i)

with the usual sum-to-zero constraint on the deflections of the nominal predictor stated in Equation 19.2. In words, Equation 19.5 says that the predicted value for subject i is a baseline plus a deflection due to the group of i plus a shift due to the value of i on the covariate. (p. 569)
It only makes sense to set the intercept as the mean of the predicted values if the covariate is re-centered at its mean value. Therefore Equation 19.5 is algebraically reformulated to make the baseline respect those constraints…. The first equation in the reformulation is simply Equation 19.5 with the covariate re-centered on its mean. The second line merely algebraically rearranges the terms so that the nominal deflections sum to zero and the constants are combined into the overall baseline (pp. 569–570).
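Here's a sketch of that re-centering algebra in my own notation (a reconstruction of the idea, not Kruschke's exact display). Treat x_[j](i) as a group indicator that equals 1 when subject i is in group j and 0 otherwise, and write mean(α) for the average of the group coefficients α_[j]. Starting from the un-centered model,

μ_i = α_0 + Σ_j α_[j] · x_[j](i) + α_cov · x_cov(i),

re-center the covariate on mean(x_cov) and split each α_[j] into mean(α) plus a remainder:

μ_i = (α_0 + mean(α) + α_cov · mean(x_cov)) + Σ_j (α_[j] − mean(α)) · x_[j](i) + α_cov · (x_cov(i) − mean(x_cov)).

The parenthesized constant is the new baseline β_0, the terms α_[j] − mean(α) are the new deflections β_[j] (which sum to zero by construction), and β_cov = α_cov is unchanged.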
We have a visual depiction of all this in the hierarchical model diagram of Figure 19.4.
# Bracket
p1 <- tibble(x = 0.99,
y = 0.5,
label = "{_}") |>
ggplot(aes(x = x, y = y, label = label)) +
geom_text(color = pp[8], family = "Times", hjust = 1, size = 10) +
scale_x_continuous(expand = c(0, 0), limits = c(0, 1)) +
ylim(0, 1) +
theme_void()
# Plain arrow
p2 <- tibble(x = 0.71,
y = 1,
xend = 0.68,
yend = 0.25) |>
ggplot(aes(x = x, xend = xend,
y = y, yend = yend)) +
geom_segment(arrow = my_arrow, color = pp[4]) +
xlim(0, 1) +
theme_void()
# Normal density
p3 <- tibble(x = seq(from = -3, to = 3, by = 0.1)) |>
ggplot(aes(x = x, y = (dnorm(x)) / max(dnorm(x)))) +
geom_area(fill = pp[9]) +
annotate(geom = "text",
x = 0, y = 0.2,
label = "normal",
color = pp[4], size = 7) +
annotate(geom = "text",
x = c(0, 1.45), y = 0.6,
hjust = c(0.5, 0),
label = c("italic(M)[0]", "italic(S)[0]"),
color = pp[4], family = "Times", parse = TRUE, size = 7) +
scale_x_continuous(expand = c(0, 0)) +
theme_void() +
theme(axis.line.x = element_line(color = pp[4], linewidth = 0.5))
# Second normal density
p4 <- tibble(x = seq(from = -3, to = 3, by = 0.1)) |>
ggplot(aes(x = x, y = (dnorm(x)) / max(dnorm(x)))) +
geom_area(fill = pp[9]) +
annotate(geom = "text",
x = 0, y = 0.2,
label = "normal",
color = pp[4], size = 7) +
annotate(geom = "text",
x = c(0, 1.45), y = 0.6,
hjust = c(0.5, 0),
label = c("0", "sigma[beta]"),
color = pp[4], family = "Times", parse = TRUE, size = 7) +
scale_x_continuous(expand = c(0, 0)) +
theme_void() +
theme(axis.line.x = element_line(color = pp[4], linewidth = 0.5))
# Third density
p5 <- tibble(x = seq(from = -3, to = 3, by = 0.1)) |>
ggplot(aes(x = x, y = (dnorm(x)) / max(dnorm(x)))) +
geom_area(fill = pp[9]) +
annotate(geom = "text",
x = 0, y = 0.2,
label = "normal",
color = pp[4], size = 7) +
annotate(geom = "text",
x = c(0, 1.45), y = 0.6,
hjust = c(0.5, 0),
label = c("italic(M)[italic(c)]", "italic(S)[italic(c)]"),
color = pp[4], family = "Times", parse = TRUE, size = 7) +
scale_x_continuous(expand = c(0, 0)) +
theme_void() +
theme(axis.line.x = element_line(color = pp[4], linewidth = 0.5))
# Three annotated arrows
p6 <- tibble(x = c(0.09, 0.49, 0.9),
y = 1,
xend = c(0.20, 0.40, 0.64),
yend = 0) |>
ggplot(aes(x = x, xend = xend,
y = y, yend = yend)) +
geom_segment(arrow = my_arrow, color = pp[4]) +
annotate(geom = "text",
x = c(0.11, 0.42, 0.47, 0.74), y = 0.5,
label = c("'~'", "'~'", "italic(j)", "'~'"),
size = c(10, 10, 7, 10),
color = pp[4], family = "Times", parse = TRUE) +
xlim(0, 1) +
theme_void()
# Likelihood formula
p7 <- tibble(x = 0.99,
y = 0.25,
label = "beta[0]+sum()[italic(j)]*beta['['*italic(j)*']']*italic(x)['['*italic(j)*']'](italic(i))+beta[italic(cov)]*italic(x)[italic(cov)](italic(i))") |>
ggplot(aes(x = x, y = y, label = label)) +
geom_text(color = pp[4], family = "Times", hjust = 1, parse = TRUE, size = 7) +
scale_x_continuous(expand = c(0, 0), limits = c(0, 1)) +
ylim(0, 1) +
theme_void()
# Half-normal density
p8 <- tibble(x = seq(from = 0, to = 3, by = 0.01)) |>
ggplot(aes(x = x, y = (dnorm(x)) / max(dnorm(x)))) +
geom_area(fill = pp[9]) +
annotate(geom = "text",
x = 1.5, y = 0.2,
label = "half-normal",
color = pp[4], size = 7) +
annotate(geom = "text",
x = 1.5, y = 0.6,
label = "0*','*~italic(S)[sigma]",
color = pp[4], family = "Times", parse = TRUE, size = 7) +
scale_x_continuous(expand = c(0, 0)) +
theme_void() +
theme(axis.line.x = element_line(color = pp[4], linewidth = 0.5))
# Annotated arrow
p9 <- tibble(x = 0.38,
y = 0.65,
label = "'='") |>
ggplot(aes(x = x, y = y, label = label)) +
geom_text(color = pp[4], family = "Times", parse = TRUE, size = 10) +
geom_segment(x = 0.5, xend = 0.5,
y = 1, yend = 0.25,
arrow = my_arrow, color = pp[4]) +
xlim(0, 1) +
theme_void()
# The fourth normal density
p10 <- tibble(x = seq(from = -3, to = 3, by = 0.1)) |>
ggplot(aes(x = x, y = (dnorm(x)) / max(dnorm(x)))) +
geom_area(fill = pp[9]) +
annotate(geom = "text",
x = 0, y = 0.2,
label = "normal",
color = pp[4], size = 7) +
annotate(geom = "text",
x = c(0, 1.45), y = 0.6,
hjust = c(0.5, 0),
label = c("mu[italic(i)]", "sigma[italic(y)]"),
color = pp[4], family = "Times", parse = TRUE, size = 7) +
scale_x_continuous(expand = c(0, 0)) +
theme_void() +
theme(axis.line.x = element_line(color = pp[4], linewidth = 0.5))
# Another annotated arrow
p11 <- tibble(x = 0.5,
y = 0.6,
label = "'~'") |>
ggplot(aes(x = x, y = y, label = label)) +
geom_text(color = pp[4], family = "Times", parse = TRUE, size = 10) +
geom_segment(x = 0.85, xend = 0.27,
y = 1, yend = 0.2,
arrow = my_arrow, color = pp[4]) +
xlim(0, 1) +
theme_void()
# The final annotated arrow
p12 <- tibble(x = c(0.375, 0.625),
y = c(1/3, 1/3),
label = c("'~'", "italic(i)")) |>
ggplot(aes(x = x, y = y, label = label)) +
geom_text(size = c(10, 7),
color = pp[4], family = "Times", parse = TRUE) +
geom_segment(x = 0.5, xend = 0.5,
y = 1, yend = 0,
arrow = my_arrow, color = pp[4]) +
xlim(0, 1) +
theme_void()
# Some text
p13 <- tibble(x = 0.5,
y = 0.5,
label = "italic(y[i])") |>
ggplot(aes(x = x, y = y, label = label)) +
geom_text(color = pp[4], family = "Times", parse = TRUE, size = 7) +
xlim(0, 1) +
theme_void()
# Define the layout
layout <- c(
area(t = 1, b = 1, l = 6, r = 7),
area(t = 2, b = 3, l = 6, r = 7),
area(t = 3, b = 4, l = 1, r = 3),
area(t = 3, b = 4, l = 5, r = 7),
area(t = 3, b = 4, l = 9, r = 11),
area(t = 6, b = 7, l = 1, r = 9),
area(t = 5, b = 6, l = 1, r = 11),
area(t = 6, b = 7, l = 11, r = 13),
area(t = 9, b = 10, l = 5, r = 7),
area(t = 8, b = 9, l = 5, r = 7),
area(t = 8, b = 9, l = 5, r = 13),
area(t = 11, b = 11, l = 5, r = 7),
area(t = 12, b = 12, l = 5, r = 7))
# Combine, adjust, and display
(p1 + p2 + p3 + p4 + p5 + p7 + p6 + p8 + p10 + p9 + p11 + p12 + p13) +
  plot_layout(design = layout) &
ylim(0, 1) &
theme(plot.margin = margin(0, 5.5, 0, 5.5))
19.4.1 Example: Sex, death, and size.
Kruschke recalled fit19.1's estimate for $\sigma_y$. Here's what ours looks like.
as_draws_df(fit19.1) |>
ggplot(aes(x = sigma)) +
stat_dotsinterval(point_interval = mode_hdi, .width = 0.95,
color = pp[4], quantiles = 100,
slab_fill = pp[5], slab_size = 0) +
scale_y_continuous(NULL, breaks = NULL) +
xlab(expression(sigma[italic(y)])) +
theme(panel.grid = element_blank())
Yep, that looks about right. A $\sigma_y$ of that many days would indeed make it difficult to detect between-group differences if those differences were typically on the scale of just a few days. Since Thorax is moderately correlated with Longevity, including Thorax in the statistical model should help shrink that estimate of $\sigma_y$.
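If you'd like to check that correlation for yourself, it's a quick one-liner. This little aside is ours, not Kruschke's, and it assumes my_data still holds the full fruit-fly data set loaded above.
# Compute the sample correlation between thorax length and longevity
cor(my_data$Thorax, my_data$Longevity)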
my_data <- my_data |>
  mutate(thorax_c = Thorax - mean(Thorax))
head(my_data)
# A tibble: 6 × 4
Longevity CompanionNumber Thorax thorax_c
<dbl> <chr> <dbl> <dbl>
1 35 Pregnant8 0.64 -0.181
2 37 Pregnant8 0.68 -0.141
3 49 Pregnant8 0.68 -0.141
4 46 Pregnant8 0.72 -0.101
5 63 Pregnant8 0.72 -0.101
6 39 Pregnant8 0.76 -0.0610
Our model code follows the structure of that in Kruschke’s Jags-Ymet-Xnom1met1-MnormalHom-Example.R
and Jags-Ymet-Xnom1met1-MnormalHom.R
files. As a preparatory step, we redefine the values necessary for stanvars
.
(mean_y <- mean(my_data$Longevity))
[1] 57.44
(sd_y <- sd(my_data$Longevity))
[1] 17.56389
(sd_thorax_c <- sd(my_data$thorax_c))
[1] 0.07745367
omega <- sd_y / 2
sigma <- 2 * sd_y

(s_r <- gamma_a_b_from_omega_sigma(mode = omega, sd = sigma))
$shape
[1] 1.283196

$rate
[1] 0.03224747
stanvars <- stanvar(mean_y, name = "mean_y") +
  stanvar(sd_y, name = "sd_y") +
stanvar(sd_thorax_c, name = "sd_thorax_c") +
stanvar(s_r$shape, name = "alpha") +
stanvar(s_r$rate, name = "beta")
Now we’re ready to fit the brm()
model, our hierarchical alternative to ANCOVA.
fit19.5 <- brm(
  data = my_data,
  family = gaussian,
  Longevity ~ 1 + thorax_c + (1 | CompanionNumber),
  prior = c(prior(normal(mean_y, sd_y * 5), class = Intercept),
prior(normal(0, 2 * sd_y / sd_thorax_c), class = b),
prior(gamma(alpha, beta), class = sd),
prior(cauchy(0, sd_y), class = sigma)),
iter = 4000, warmup = 1000, chains = 4, cores = 4,
seed = 19,
control = list(adapt_delta = 0.99),
stanvars = stanvars,
file = "fits/fit19.05")
Here’s the model summary.
print(fit19.5)
Family: gaussian
Links: mu = identity; sigma = identity
Formula: Longevity ~ 1 + thorax_c + (1 | CompanionNumber)
Data: my_data (Number of observations: 125)
Draws: 4 chains, each with iter = 4000; warmup = 1000; thin = 1;
total post-warmup draws = 12000
Multilevel Hyperparameters:
~CompanionNumber (Number of levels: 5)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 13.88 7.45 6.02 33.61 1.00 2552 3470
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 57.72 7.21 43.64 71.68 1.00 2549 3045
thorax_c 135.86 12.40 111.34 160.56 1.00 6904 6947
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 10.60 0.69 9.34 12.06 1.00 7607 6752
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Let's see if including the covariate did shrink our estimate of $\sigma_y$.
as_draws_df(fit19.5) |>
ggplot(aes(x = sigma)) +
stat_dotsinterval(point_interval = mode_hdi, .width = 0.95,
color = pp[4], quantiles = 100,
slab_fill = pp[5], slab_size = 0) +
scale_y_continuous(NULL, breaks = NULL) +
xlab(expression(sigma[italic(y)]))
Yep, sure did! Now our between-group comparisons should be more precise. Heck, if we wanted to we could even make a difference plot.
tibble(sigma1 = as_draws_df(fit19.1) |> pull(sigma),
sigma5 = as_draws_df(fit19.5) |> pull(sigma)) |>
transmute(dif = sigma1 - sigma5) |>
ggplot(aes(x = dif)) +
stat_dotsinterval(point_interval = mode_hdi, .width = 0.95,
color = pp[4], quantiles = 100,
slab_fill = pp[5], slab_size = 0) +
scale_y_continuous(NULL, breaks = NULL) +
labs(x = expression(sigma[italic(y)][" | fit19.1"]-sigma[italic(y)][" | fit19.5"]),
title = "This is a difference distribution")
If you want a quick and dirty plot of the relation between thorax_c
and Longevity
, you might employ brms::conditional_effects()
.
conditional_effects(fit19.5) |>
plot(line_args = list(color = pp[5], fill = pp[11]))
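By default, conditional_effects() sets the group-level effects aside. If you'd like group-specific lines instead, the re_formula and conditions arguments are the places to look. Here's a hedged sketch of one way that might go; these particular settings are our additions, not anything from Kruschke's text.
# Ask for the thorax_c effect separately for each CompanionNumber level,
# including the group-level intercepts in the predictions
conditional_effects(
  fit19.5,
  effects = "thorax_c",
  re_formula = NULL,
  conditions = data.frame(CompanionNumber = unique(my_data$CompanionNumber))) |>
  plot(line_args = list(color = pp[5], fill = pp[11]))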
But to make plots like the ones at the top of Figure 19.5, we’ll have to work a little harder. First, we need some intermediary values marking off the three values along the Thorax
-axis Kruschke singled out in his top panel plots. As far as I can tell, they were the min()
, the max()
, and their mean()
.
(r <- range(my_data$Thorax))
[1] 0.64 0.94
mean(r)
[1] 0.79
Next, we’ll make the data necessary for our side-tipped Gaussians. For kicks and giggles, we’ll choose 80 draws instead of 20. But do note how we used our r
values, from above, to specify both Thorax
and thorax_c
values in addition to the CompanionNumber
categories for the newdata
argument. Otherwise, this workflow is very much the same as in previous plots.
n_draws <- 80

densities <- my_data |>
  distinct(CompanionNumber) |>
expand_grid(Thorax = c(r[1], mean(r), r[2])) |>
mutate(thorax_c = Thorax - mean(my_data$Thorax)) |>
add_epred_draws(fit19.5, ndraws = n_draws, seed = 19, dpar = c("mu", "sigma")) |>
mutate(ll = qnorm(0.025, mean = mu, sd = sigma),
ul = qnorm(0.975, mean = mu, sd = sigma)) |>
mutate(Longevity = map2(.x = ll, .y = ul, .f = \(ll, ul) seq(from = ll, to = ul, length.out = 100))) |>
unnest(Longevity) |>
mutate(density = dnorm(x = Longevity, mean = mu, sd = sigma))
glimpse(densities)
Rows: 120,000
Columns: 14
Groups: CompanionNumber, Thorax, thorax_c, .row [15]
$ CompanionNumber <chr> "Pregnant8", "Pregnant8", "Pregnant8", "Pregnant8", "P…
$ Thorax <dbl> 0.64, 0.64, 0.64, 0.64, 0.64, 0.64, 0.64, 0.64, 0.64, …
$ thorax_c <dbl> -0.18096, -0.18096, -0.18096, -0.18096, -0.18096, -0.1…
$ .row <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ .chain <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ .iteration <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ .draw <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ .epred <dbl> 39.17164, 39.17164, 39.17164, 39.17164, 39.17164, 39.1…
$ mu <dbl> 39.17164, 39.17164, 39.17164, 39.17164, 39.17164, 39.1…
$ sigma <dbl> 11.60219, 11.60219, 11.60219, 11.60219, 11.60219, 11.6…
$ ll <dbl> 16.43175, 16.43175, 16.43175, 16.43175, 16.43175, 16.4…
$ ul <dbl> 61.91152, 61.91152, 61.91152, 61.91152, 61.91152, 61.9…
$ Longevity <dbl> 16.43175, 16.89114, 17.35054, 17.80993, 18.26932, 18.7…
$ density <dbl> 0.005037415, 0.005439648, 0.005864798, 0.006313270, 0.…
Here, we’ll use a simplified workflow to extract the fitted()
values in order to make the regression lines. Since these are straight lines, all we need are two values for each draw, one at the extremes of the Thorax
axis.
f <- my_data |>
  distinct(CompanionNumber) |>
expand_grid(Thorax = c(r[1], mean(r), r[2])) |>
mutate(thorax_c = Thorax - mean(my_data$Thorax)) |>
add_epred_draws(fit19.5, ndraws = n_draws, seed = 19, value = "Longevity")
glimpse(f)
Rows: 1,200
Columns: 8
Groups: CompanionNumber, Thorax, thorax_c, .row [15]
$ CompanionNumber <chr> "Pregnant8", "Pregnant8", "Pregnant8", "Pregnant8", "P…
$ Thorax <dbl> 0.64, 0.64, 0.64, 0.64, 0.64, 0.64, 0.64, 0.64, 0.64, …
$ thorax_c <dbl> -0.18096, -0.18096, -0.18096, -0.18096, -0.18096, -0.1…
$ .row <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ .chain <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ .iteration <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ .draw <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,…
$ Longevity <dbl> 39.17164, 38.72720, 41.58792, 39.72092, 44.90579, 34.9…
Now we're ready to make our plots for the top row of Figure 19.5.
densities |>
  ggplot(aes(x = Longevity, y = Thorax)) +
# The Gaussians
geom_ridgeline(aes(height = -density, group = interaction(Thorax, .draw)),
color = adjustcolor(pp[4], alpha.f = 1/5), fill = NA,
linewidth = 1/5, min_height = NA, scale = 5/3) +
# The vertical lines below the Gaussians
geom_line(aes(group = interaction(Thorax, .draw)),
alpha = 1/5, color = pp[4], linewidth = 1/5) +
# The regression lines
geom_line(data = f,
aes(group = .draw),
alpha = 1/5, color = pp[4], linewidth = 1/5) +
# The data
geom_point(data = my_data,
alpha = 3/4, color = pp[10]) +
coord_flip(xlim = c(0, 110),
ylim = c(0.58, 1)) +
facet_wrap(~ CompanionNumber, ncol = 5)
Now that we have a covariate in the model, we have to decide on which of its values we want to base our group comparisons. Unless there's a substantive reason for another value, the mean is a good standard choice. And since the covariate thorax_c is already mean centered, that means we can effectively leave it out of the equation. Here we make the group comparisons and save them in the simple difference metric.
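Before we do, a brief aside: because fit19.5 gives every group the same thorax_c slope, the between-group contrasts are identical at any covariate value; the slope term simply cancels. Here's a minimal check of that claim (the covariate value 0.1 is an arbitrary choice of ours).
# With a single common slope, evaluating both groups at the same thorax_c value
# adds the same b_thorax_c * x term to each, so it drops out of the difference
as_draws_df(fit19.5) |>
  mutate(
    diff_at_0 = `r_CompanionNumber[Virgin1,Intercept]` - `r_CompanionNumber[Virgin8,Intercept]`,
    diff_at_x = (`r_CompanionNumber[Virgin1,Intercept]` + b_thorax_c * 0.1) -
                (`r_CompanionNumber[Virgin8,Intercept]` + b_thorax_c * 0.1)) |>
  summarise(largest_discrepancy = max(abs(diff_at_0 - diff_at_x)))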
draws <- as_draws_df(fit19.5)

differences <- draws |>
  transmute(
    `Pregnant1.Pregnant8 vs None0` = (`r_CompanionNumber[Pregnant1,Intercept]` + `r_CompanionNumber[Pregnant8,Intercept]`) / 2 - `r_CompanionNumber[None0,Intercept]`,
    `Pregnant1.Pregnant8.None0 vs Virgin1` = (`r_CompanionNumber[Pregnant1,Intercept]` +
                                                `r_CompanionNumber[Pregnant8,Intercept]` +
                                                `r_CompanionNumber[None0,Intercept]`) / 3 - `r_CompanionNumber[Virgin1,Intercept]`,
    `Virgin1 vs Virgin8` = `r_CompanionNumber[Virgin1,Intercept]` - `r_CompanionNumber[Virgin8,Intercept]`,
    `Pregnant1.Pregnant8.None0 vs Virgin1.Virgin8` = (`r_CompanionNumber[Pregnant1,Intercept]` +
                                                        `r_CompanionNumber[Pregnant8,Intercept]` +
                                                        `r_CompanionNumber[None0,Intercept]`) / 3 - (`r_CompanionNumber[Virgin1,Intercept]` +
                                                                                                       `r_CompanionNumber[Virgin8,Intercept]`) / 2)
p1 <- differences |>
  pivot_longer(cols = everything()) |>
ggplot(aes(x = value)) +
stat_dotsinterval(point_interval = mode_hdi, .width = 0.95,
color = pp[4], quantiles = 100,
slab_fill = pp[5], slab_size = 0) +
scale_y_continuous(NULL, breaks = NULL) +
xlab("Difference") +
theme(strip.text = element_text(size = 6.4)) +
facet_wrap(~ name, ncol = 4, scales = "free_x")
Now we’ll look at the differences in the effect size metric. Since we saved our leg work above, it’s really easy to just convert the differences in bulk with mutate_all()
. After the conversion, we’ll bind the two rows of subplots together with a little patchwork and display the results.
p2 <- differences |>
  mutate_all(.funs = ~ . / draws$sigma) |>
pivot_longer(cols = everything()) |>
ggplot(aes(x = value)) +
stat_dotsinterval(point_interval = mode_hdi, .width = 0.95,
color = pp[4], quantiles = 100,
slab_fill = pp[5], slab_size = 0) +
scale_y_continuous(NULL, breaks = NULL) +
xlab("Effect Size") +
theme(strip.text = element_text(size = 6.4)) +
facet_wrap(~ name, ncol = 4, scales = "free_x")
# Combine
p1 / p2
“The HDI widths of all the contrasts have gotten smaller by virtue of including the covariate in the analysis” (p. 571).
19.4.2 Analogous to traditional ANCOVA.
In contrast with ANCOVA,
Bayesian methods do not partition the least-squares variance to make estimates, and therefore the Bayesian method is analogous to ANCOVA but is not ANCOVA. Frequentist practitioners are urged to test (with $p$ values) whether the assumptions of (a) equal slope in all groups, (b) equal standard deviation in all groups, and (c) normally distributed noise can be rejected. In a Bayesian approach, the descriptive model is generalized to address these concerns, as will be discussed in Section 19.5. (p. 572)
19.4.3 Relation to hierarchical linear regression.
Here Kruschke contrasts our last model with the one from way back in Section 17.3. As a refresher, here’s what that code looked like.
fit17.4 <- brm(
  data = my_data,
  family = student,
  y_z ~ 1 + x_z + (1 + x_z || Subj),
  prior = c(prior(normal(0, 10), class = Intercept),
prior(normal(0, 10), class = b),
prior(normal(0, 1), class = sigma),
# The next line is new
prior(normal(0, 1), class = sd),
prior(exponential(one_over_twentynine) + 1, class = nu)),
chains = 4, cores = 4,
stanvars = stanvar(1/29, name = "one_over_twentynine"),
seed = 17,
file = "fits/fit17.04")
And for convenience, here’s the code from the model we just fit.
fit19.5 <- brm(
  data = my_data,
  family = gaussian,
  Longevity ~ 1 + thorax_c + (1 | CompanionNumber),
  prior = c(prior(normal(mean_y, sd_y * 5), class = Intercept),
prior(normal(0, 2 * sd_y / sd_thorax_c), class = b),
prior(gamma(alpha, beta), class = sd),
prior(cauchy(0, sd_y), class = sigma)),
iter = 4000, warmup = 1000, chains = 4, cores = 4,
seed = 19,
control = list(adapt_delta = 0.99),
stanvars = stanvars,
file = "fits/fit19.05")
It’s easy to get lost in the differences in the priors and the technical details with the model chains and such. The main thing to notice, here, is the differences in the model formulas (i.e., the likelihoods). Both models had intercepts and slopes. But whereas the model from Section 17.3 set both parameters to random, only the intercept in our last model was random. The covariate thorax_c
was fixed–it did not vary by group. Had we wanted it to, our formula
syntax would have been something like Longevity ~ 1 + thorax_c + (1 + thorax_c || CompanionNumber)
. And again, as noted in Section 17.3.1, the ||
portion of the syntax set the random intercepts and slopes to be orthogonal (i.e., correlate exactly at zero). As we’ll see, this will often not be the case. But let’s not get ahead of ourselves.
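If you'd like to see what a group-varying slope would look like in practice, here's a sketch of the two formula variants. Nothing here is fit; the get_prior() call is just a cheap way to preview which parameters each formula would add.
# Uncorrelated (||) versus correlated (|) intercept and slope deflections
formula_uncorrelated <- bf(Longevity ~ 1 + thorax_c + (1 + thorax_c || CompanionNumber))
formula_correlated   <- bf(Longevity ~ 1 + thorax_c + (1 + thorax_c | CompanionNumber))

# The correlated version adds a cor(Intercept, thorax_c) parameter
get_prior(formula_correlated, data = my_data, family = gaussian)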
Conceptually, the main difference between the models is merely the focus of attention. In the hierarchical linear regression model, the focus was on the slope coefficient. In that case, we were trying to estimate the magnitude of the slope, simultaneously for individuals and overall. The intercepts, which describe the levels of the nominal predictor, were of ancillary interest. In the present section, on the other hand, the focus of attention is reversed. We are most interested in the intercepts and their differences between groups, with the slopes on the covariate being of ancillary interest. (p. 573)
19.5 Heterogeneous variances and robustness against outliers
In Figure 19.6 on page 574, Kruschke laid out the diagram for a hierarchical Student's-t model in which each group gets its own scale. Our brms version will differ a little from his by

- modeling the log of $\sigma_j$,
- indicating its grand mean with the sigma ~ 1 syntax,
- modeling the group-level deflections as Gaussian with a mean of 0 and a standard deviation $\sigma_\sigma$ estimated from the data, and
- choosing a sensible prior for $\sigma_\sigma$ that is left-bound at 0 and gently slopes to the right (i.e., a folded or gamma distribution).
Thus, here’s our brms-centric variant of the diagram in Figure 19.6.
# Bracket
<- tibble(x = 0.99,
p1 y = 0.5,
label = "{_}") |>
ggplot(aes(x = x, y = y, label = label)) +
geom_text(color = pp[8], family = "Times", hjust = 1, size = 10) +
scale_x_continuous(expand = c(0, 0), limits = c(0, 1)) +
ylim(0, 1) +
theme_void()
# Plain arrow
<- tibble(x = 0.68,
p2 y = 1,
xend = 0.68,
yend = 0.25) |>
ggplot(aes(x = x, xend = xend,
y = y, yend = yend)) +
geom_segment(arrow = my_arrow, color = pp[4]) +
xlim(0, 1) +
theme_void()
# Normal density
<- tibble(x = seq(from = -3, to = 3, by = 0.1)) |>
p3 ggplot(aes(x = x, y = (dnorm(x)) / max(dnorm(x)))) +
geom_area(fill = pp[9]) +
annotate(geom = "text",
x = 0, y = 0.2,
label = "normal",
color = pp[4], size = 7) +
annotate(geom = "text",
x = c(0, 1.45), y = 0.6,
hjust = c(0.5, 0),
label = c("italic(M)[0]", "italic(S)[0]"),
color = pp[4], family = "Times", parse = TRUE, size = 7) +
scale_x_continuous(expand = c(0, 0)) +
theme_void() +
theme(axis.line.x = element_line(color = pp[4], linewidth = 0.5))
# Second normal density
<- tibble(x = seq(from = -3, to = 3, by = 0.1)) |>
p4 ggplot(aes(x = x, y = (dnorm(x)) / max(dnorm(x)))) +
geom_area(fill = pp[9]) +
annotate(geom = "text",
x = 0, y = 0.2,
label = "normal",
color = pp[4], size = 7) +
annotate(geom = "text",
x = c(0, 1.45), y = 0.6,
hjust = c(0.5, 0),
label = c("0", "sigma[beta]"),
color = pp[4], family = "Times", parse = TRUE, size = 7) +
scale_x_continuous(expand = c(0, 0)) +
theme_void() +
theme(axis.line.x = element_line(color = pp[4], linewidth = 0.5))
# Third normal density
<- tibble(x = seq(from = -3, to = 3, by = 0.1)) |>
p5 ggplot(aes(x = x, y = (dnorm(x)) / max(dnorm(x)))) +
geom_area(fill = pp[9]) +
annotate(geom = "text",
x = 0, y = 0.2,
label = "normal",
color = pp[4], size = 7) +
annotate(geom = "text",
x = c(0, 1.15), y = 0.6,
label = c("italic(M)[mu[sigma]]", "italic(S)[mu[sigma]]"),
hjust = c(0.5, 0),
color = pp[4], family = "Times", parse = TRUE, size = 7) +
scale_x_continuous(expand = c(0, 0)) +
theme_void() +
theme(axis.line.x = element_line(color = pp[4], linewidth = 0.5))
# Half-normal density
<- tibble(x = seq(from = 0, to = 3, by = 0.01)) |>
p6 ggplot(aes(x = x, y = (dnorm(x)) / max(dnorm(x)))) +
geom_area(fill = pp[9]) +
annotate(geom = "text",
x = 1.5, y = 0.2,
label = "half-normal",
size = 6, color = pp[4]) +
annotate(geom = "text",
x = 1.5, y = 0.6,
label = "0*','*~italic(S)[sigma[sigma]]",
color = pp[4], family = "Times", parse = TRUE, size = 7) +
scale_x_continuous(expand = c(0, 0)) +
theme_void() +
theme(axis.line.x = element_line(color = pp[4], linewidth = 0.5))
# Exponential density
<- tibble(x = seq(from = 0, to = 1, by = 0.01)) |>
p7 ggplot(aes(x = x, y = (dexp(x, 2) / max(dexp(x, 2))))) +
geom_area(fill = pp[9]) +
annotate(geom = "text",
x = 0.5, y = 0.2,
label = "exp",
color = pp[4], size = 7) +
annotate(geom = "text",
x = 0.5, y = 0.6,
label = "italic(K)",
color = pp[4], family = "Times", parse = TRUE, size = 7) +
scale_x_continuous(expand = c(0, 0)) +
theme_void() +
theme(axis.line.x = element_line(color = pp[4], linewidth = 0.5))
# Likelihood formula
<- tibble(x = 0.5,
p8 y = 0.25,
label = "beta[0]+sum()[italic(j)]*beta['['*italic(j)*']']*italic(x)['['*italic(j)*']'](italic(i))") |>
ggplot(aes(x = x, y = y, label = label)) +
geom_text(color = pp[4], family = "Times", parse = TRUE, size = 7) +
scale_x_continuous(expand = c(0, 0), limits = c(0, 1)) +
ylim(0, 1) +
theme_void()
# Normal density
<- tibble(x = seq(from = -3, to = 3, by = 0.1)) |>
p9 ggplot(aes(x = x, y = (dnorm(x)) / max(dnorm(x)))) +
geom_area(fill = pp[9]) +
annotate(geom = "text",
x = 0, y = 0.2,
label = "normal",
color = pp[4], size = 7) +
annotate(geom = "text",
x = c(0, 1.2), y = 0.6,
hjust = c(0.5, 0),
label = c("mu[sigma]", "sigma[sigma]"),
color = pp[4], family = "Times", parse = TRUE, size = 7) +
scale_x_continuous(expand = c(0, 0)) +
theme_void() +
theme(axis.line.x = element_line(color = pp[4], linewidth = 0.5))
# Four annotated arrows
<- tibble(x = c(0.06, 0.37, 0.67, 0.95),
p10 y = 1,
xend = c(0.15, 0.31, 0.665, 0.745),
yend = c(0, 0, 0.2, 0.2)) |>
ggplot(aes(x = x, xend = xend,
y = y, yend = yend)) +
geom_segment(arrow = my_arrow, color = pp[4]) +
annotate(geom = "text",
x = c(0.065, 0.31, 0.36, 0.64, 0.79), y = 0.5,
label = c("'~'", "'~'", "italic(j)", "'~'", "'~'" ),
size = c(10, 10, 7, 10, 10),
color = pp[4], family = "Times", parse = TRUE) +
xlim(0, 1) +
theme_void()
# Student-t density
<- tibble(x = seq(from = -3, to = 3, by = 0.1)) |>
p11 ggplot(aes(x = x, y = (dt(x, 3) / max(dt(x, 3))))) +
geom_area(fill = pp[9]) +
annotate(geom = "text",
x = 0, y = 0.2,
label = "student t",
color = pp[4], size = 7) +
annotate(geom = "text",
x = c(-1.4, 0), y = 0.6,
label = c("nu", "mu[italic(i)]"),
color = pp[4], family = "Times", parse = TRUE, size = 7) +
scale_x_continuous(expand = c(0, 0)) +
theme_void() +
theme(axis.line.x = element_line(color = pp[4], linewidth = 0.5))
# Log sigma
<- tibble(x = 0.65,
p12 y = 0.6,
label = "log~sigma[italic(j)*'('*italic(i)*')']") |>
ggplot(aes(x = x, y = y, label = label)) +
geom_text(hjust = 0, color = pp[4], family = "Times", parse = TRUE, size = 7) +
scale_x_continuous(expand = c(0, 0), limits = c(0, 2)) +
ylim(0, 1) +
theme_void()
# Two annotated arrows
<- tibble(x = 0.15,
p13 y = c(1, 0.47),
xend = c(0.15, 0.72),
yend = c(0.75, 0.1)) |>
ggplot(aes(x = x, xend = xend,
y = y, yend = yend)) +
geom_segment(arrow = my_arrow, color = pp[4]) +
annotate(geom = "text",
x = c(0.1, 0.15, 0.28), y = c(0.92, 0.64, 0.22),
label = c("'~'", "nu*minute+1", "'='"),
color = pp[4], family = "Times", parse = TRUE, size = c(10, 7, 10)) +
xlim(0, 1) +
theme_void()
# One annotated arrow
<- tibble(x = 0.38,
p14 y = 0.65,
label = "'='") |>
ggplot(aes(x = x, y = y, label = label)) +
geom_text(color = pp[4], family = "Times", parse = TRUE, size = 10) +
geom_segment(x = 0.5, xend = 0.5,
y = 1, yend = 0.15,
arrow = my_arrow, color = pp[4]) +
xlim(0, 1) +
theme_void()
# Another annotated arrow
<- tibble(x = c(0.58, 0.71),
p15 y = 0.6,
label = c("'~'", "italic(j)")) |>
ggplot(aes(x = x, y = y, label = label)) +
geom_text(color = pp[4], family = "Times", parse = TRUE, size = c(10, 7)) +
geom_segment(x = 0.85, xend = 0.42,
y = 1, yend = 0.18,
arrow = my_arrow, color = pp[4]) +
xlim(0, 1) +
theme_void()
# The final annotated arrow
<- tibble(x = c(0.375, 0.625),
p16 y = c(1/3, 1/3),
label = c("'~'", "italic(i)")) |>
ggplot(aes(x = x, y = y, label = label)) +
geom_text(color = pp[4], family = "Times", parse = TRUE, size = c(10, 7)) +
geom_segment(x = 0.5, xend = 0.5,
y = 1, yend = 0,
arrow = my_arrow, color = pp[4]) +
xlim(0, 1) +
theme_void()
# Some text
<- tibble(x = 0.5,
p17 y = 0.5,
label = "italic(y[i])") |>
ggplot(aes(x = x, y = y, label = label)) +
geom_text(color = pp[4], family = "Times", parse = TRUE, size = 7) +
xlim(0, 1) +
theme_void()
# Define the layout
<- c(
layout area(t = 1, b = 1, l = 8, r = 9),
area(t = 2, b = 3, l = 8, r = 9),
area(t = 3, b = 4, l = 3, r = 5),
area(t = 3, b = 4, l = 7, r = 9),
area(t = 3, b = 4, l = 11, r = 13),
area(t = 3, b = 4, l = 15, r = 17),
area(t = 6, b = 7, l = 1, r = 3),
area(t = 6, b = 7, l = 5, r = 9),
area(t = 6, b = 7, l = 11, r = 13),
area(t = 5, b = 6, l = 3, r = 17),
area(t = 10, b = 11, l = 6, r = 8),
area(t = 10, b = 11, l = 7, r = 9),
area(t = 8, b = 10, l = 1, r = 8),
area(t = 8, b = 10, l = 6, r = 8),
area(t = 8, b = 10, l = 6, r = 13),
area(t = 12, b = 12, l = 6, r = 8),
area(t = 13, b = 13, l = 6, r = 8))
# Combine, adjust, and display
(p1 + p2 + p3 + p4 + p5 + p6 + p7 + p8 + p9 + p10 + p11 + p12 + p13 + p14 + p15 + p16 + p17) +
  plot_layout(design = layout) &
ylim(0, 1) &
theme(plot.margin = margin(0, 5.5, 0, 5.5))
Since we're modeling $\sigma$ on the log scale, a sensible starting point for the prior on its grand mean is a normal distribution centered on sd(my_data$y) |> log() with a reasonable spread like 1. We can simulate a little to get a sense of what those distributions look like.
n_draws <- 1e3
set.seed(19)
tibble(prior = rnorm(n_draws, mean = log(1), sd = 1)) |>
mutate(prior_exp = exp(prior)) |>
pivot_longer(cols = everything()) |>
ggplot(aes(x = value)) +
stat_dots(slab_color = pp[5], slab_fill = pp[5]) +
scale_y_continuous(NULL, breaks = NULL, expand = expansion(mult = c(0, 0.05)),
limits = c(0, NA)) +
facet_wrap(~ name, scales = "free")
Here's what it looks like with sd = 2.
set.seed(19)
tibble(prior = rnorm(n_draws, mean = log(1), sd = 2)) |>
mutate(prior_exp = exp(prior)) |>
ggplot(aes(x = prior_exp)) +
stat_dots(slab_color = pp[5], slab_fill = pp[5]) +
scale_x_continuous(expand = expansion(mult = c(0, 0.05))) +
scale_y_continuous(NULL, breaks = NULL, expand = expansion(mult = c(0, 0.05)),
limits = c(0, NA)) +
coord_cartesian(xlim = c(0, 17))
Though the prior still peaks around 1, there's more mass in the tail, making it easier for the likelihood to pull the posterior away from the prior mode.
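To put a number on that, here's the prior probability that the exponentiated draw exceeds 10 under each spread (the threshold of 10 is an arbitrary choice of ours, for illustration).
# P(exp(Normal(log(1), sd)) > 10) = P(Normal(log(1), sd) > log(10))
tibble(sd = c(1, 2)) |>
  mutate(p_greater_than_10 = 1 - pnorm(log(10), mean = log(1), sd = sd))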
But all this is the prior on the fixed effect, the grand mean of $\log \sigma$. Say we have data my_data
for which our primary variable of interest is y
. Starting from preparing our stanvars
values, here’s what the model code might look like.
# Get ready for `stanvars`
mean_y <- mean(my_data$y)
sd_y   <- sd(my_data$y)

omega <- sd_y / 2
sigma <- 2 * sd_y

s_r <- gamma_a_b_from_omega_sigma(mode = omega, sd = sigma)

# Define `stanvars`
stanvars <- stanvar(mean_y, name = "mean_y") +
  stanvar(sd_y, name = "sd_y") +
stanvar(s_r$shape, name = "alpha") +
stanvar(s_r$rate, name = "beta") +
stanvar(1/29, name = "one_over_twentynine")
# Fit the model
fit <- brm(
  data = my_data,
  family = student,
  bf(Longevity ~ 1 + (1 | CompanionNumber),
     sigma ~ 1 + (1 | CompanionNumber)),
  prior = c(# Grand means
prior(normal(mean_y, sd_y * 5), class = Intercept),
prior(normal(log(sd_y), 1), class = Intercept, dpar = sigma),
# The priors controlling the spread for our hierarchical deflections
prior(gamma(alpha, beta), class = sd),
prior(normal(0, 1), class = sd, dpar = sigma),
# Don't forget our student-t nu
prior(exponential(one_over_twentynine), class = nu)),
stanvars = stanvars)
19.5.1 Example: Contrast of means with different variances.
Let’s load and take a look at Kruschke’s simulated group data.
my_data <- read_csv("data.R/NonhomogVarData.csv")

head(my_data)
# A tibble: 6 × 2
Group Y
<chr> <dbl>
1 A 97.8
2 A 99.9
3 A 92.4
4 A 96.9
5 A 101.
6 A 80.7
Here are the means and standard deviations of Y, by Group.
my_data |>
  group_by(Group) |>
summarise(mean = mean(Y),
sd = sd(Y))
# A tibble: 4 × 3
Group mean sd
<chr> <dbl> <dbl>
1 A 97 8.00
2 B 99 1.00
3 C 102 1.00
4 D 104. 8
First we'll fit the model with homogeneous variances. To keep things simple, here we'll fit a conventional model following the form of our original fit19.1. Here are our stanvars.
(mean_y <- mean(my_data$Y))
[1] 100.5
(sd_y <- sd(my_data$Y))
[1] 6.228965
omega <- sd_y / 2
sigma <- 2 * sd_y

(s_r <- gamma_a_b_from_omega_sigma(mode = omega, sd = sigma))
$shape
[1] 1.283196

$rate
[1] 0.09092861
# Define `stanvars`
stanvars <- stanvar(mean_y, name = "mean_y") +
  stanvar(sd_y, name = "sd_y") +
stanvar(s_r$shape, name = "alpha") +
stanvar(s_r$rate, name = "beta")
Now fit the ANOVA-like homogeneous-variances model.
fit19.6 <- brm(
  data = my_data,
  family = gaussian,
  Y ~ 1 + (1 | Group),
  prior = c(prior(normal(mean_y, sd_y * 10), class = Intercept),
prior(gamma(alpha, beta), class = sd),
prior(cauchy(0, sd_y), class = sigma)),
iter = 4000, warmup = 1000, chains = 4, cores = 4,
seed = 19,
control = list(adapt_delta = 0.995),
stanvars = stanvars,
file = "fits/fit19.06")
Here’s the model summary.
print(fit19.6)
Family: gaussian
Links: mu = identity; sigma = identity
Formula: Y ~ 1 + (1 | Group)
Data: my_data (Number of observations: 96)
Draws: 4 chains, each with iter = 4000; warmup = 1000; thin = 1;
total post-warmup draws = 12000
Multilevel Hyperparameters:
~Group (Number of levels: 4)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 4.93 3.36 1.51 14.21 1.00 2024 2803
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 100.52 3.06 94.32 106.85 1.00 2045 1994
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 5.76 0.44 4.98 6.70 1.00 6152 5832
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Let’s get ready to make our version of the top of Figure 19.7. First we wrangle.
# How many model-implied Gaussians would you like?
n_draws <- 20

densities <- my_data |>
  distinct(Group) |>
add_epred_draws(fit19.6, ndraws = n_draws, seed = 19, dpar = c("mu", "sigma")) |>
mutate(ll = qnorm(0.025, mean = mu, sd = sigma),
ul = qnorm(0.975, mean = mu, sd = sigma)) |>
mutate(Y = map2(.x = ll, .y = ul,
.f = \(ll, ul) seq(from = ll, to = ul, length.out = 100))) |>
unnest(Y) |>
mutate(density = dnorm(Y, mean = mu, sd = sigma)) |>
group_by(.draw) |>
mutate(density = density / max(density))
glimpse(densities)
Rows: 8,000
Columns: 12
Groups: .draw [20]
$ Group <chr> "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",…
$ .row <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ .chain <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ .iteration <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ .draw <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ .epred <dbl> 98.34888, 98.34888, 98.34888, 98.34888, 98.34888, 98.34888,…
$ mu <dbl> 98.34888, 98.34888, 98.34888, 98.34888, 98.34888, 98.34888,…
$ sigma <dbl> 5.570203, 5.570203, 5.570203, 5.570203, 5.570203, 5.570203,…
$ ll <dbl> 87.43148, 87.43148, 87.43148, 87.43148, 87.43148, 87.43148,…
$ ul <dbl> 109.2663, 109.2663, 109.2663, 109.2663, 109.2663, 109.2663,…
$ Y <dbl> 87.43148, 87.65203, 87.87259, 88.09314, 88.31369, 88.53425,…
$ density <dbl> 0.1465288, 0.1582290, 0.1705958, 0.1836410, 0.1973740, 0.21…
In our wrangling code, the main thing to notice is those last two lines. If you look closely at Kruschke's Gaussians, you'll notice they all have the same maximum height. Up to this point, ours haven't. This has to do with technicalities on how densities are scaled. In brief, the wider densities have been shorter. So those last two lines rescaled the densities within each posterior draw to a common maximum height. Otherwise the code was business as usual.
Anyway, here’s our version of the top panel of Figure 19.7.
densities |>
  ggplot(aes(x = Y, y = Group)) +
geom_ridgeline(aes(height = density, group = interaction(Group, .draw)),
color = adjustcolor(pp[7], alpha.f = 2/3), fill = NA,
linewidth = 1/3, scale = 3/4) +
geom_jitter(data = my_data,
alpha = 3/4, color = pp[10], height = 0.04) +
scale_x_continuous(breaks = seq(from = 80, to = 120, by = 10)) +
labs(y = NULL,
title = "Data with Posterior Predictive Distrib.") +
coord_cartesian(xlim = c(75, 125),
ylim = c(1.25, 4.5)) +
theme(axis.text.y = element_text(hjust = 0),
axis.ticks.y = element_blank())
Here are the difference distributions in the middle of Figure 19.7.
draws <- as_draws_df(fit19.6)

differences <- draws |>
  transmute(`D vs A` = `r_Group[D,Intercept]` - `r_Group[A,Intercept]`,
            `C vs B` = `r_Group[C,Intercept]` - `r_Group[B,Intercept]`)
differences |>
  pivot_longer(cols = everything()) |>
mutate(name = factor(name, levels = c("D vs A", "C vs B"))) |>
ggplot(aes(x = value)) +
stat_dotsinterval(point_interval = mode_hdi, .width = 0.95,
color = pp[4], quantiles = 100,
slab_fill = pp[5], slab_size = 0) +
scale_y_continuous(NULL, breaks = NULL) +
labs(x = "Difference") +
facet_wrap(~ name, scales = "free_x")
Now here are the effect sizes at the bottom of the figure.
differences |>
  mutate_all(.funs = ~ . / draws$sigma) |>
pivot_longer(cols = everything()) |>
mutate(name = factor(name, levels = c("D vs A", "C vs B"))) |>
ggplot(aes(x = value)) +
stat_dotsinterval(point_interval = mode_hdi, .width = 0.95,
color = pp[4], quantiles = 100,
slab_fill = pp[5], slab_size = 0) +
scale_y_continuous(NULL, breaks = NULL) +
labs(x = "Effect Size") +
facet_wrap(~ name, scales = "free_x")
Oh and remember, if you’d like to get all the possible contrasts in bulk, tidybayes has got your back.
fit19.6 |>
  spread_draws(r_Group[Group,]) |>
compare_levels(r_Group, by = Group) |>
# These next two lines allow us to reorder the contrasts along the y
ungroup() |>
mutate(Group = reorder(Group, r_Group)) |>
ggplot(aes(x = r_Group, y = Group)) +
geom_vline(xintercept = 0, color = pp[12]) +
stat_dotsinterval(point_interval = mode_hdi, .width = 0.95,
color = pp[4], quantiles = 100,
slab_fill = pp[5], slab_size = 0) +
labs(x = "Contrast",
y = NULL) +
coord_cartesian(ylim = c(1.5, 6.5)) +
theme(axis.text.y = element_text(hjust = 0),
axis.ticks.y = element_blank())
But to get back on track, here are the stanvars
for the robust hierarchical variances model.
stanvars <- stanvar(mean_y, name = "mean_y") +
  stanvar(sd_y, name = "sd_y") +
stanvar(s_r$shape, name = "alpha") +
stanvar(s_r$rate, name = "beta") +
stanvar(1/29, name = "one_over_twentynine")
Now fit that robust better-than-ANOVA model.
fit19.7 <- brm(
  data = my_data,
  family = student,
  bf(Y ~ 1 + (1 | Group),
     sigma ~ 1 + (1 | Group)),
  prior = c(# Grand means
prior(normal(mean_y, sd_y * 10), class = Intercept),
prior(normal(log(sd_y), 1), class = Intercept, dpar = sigma),
# The priors controlling the spread for our hierarchical deflections
prior(gamma(alpha, beta), class = sd),
prior(normal(0, 1), class = sd, dpar = sigma),
# Don't forget our student-t nu
prior(exponential(one_over_twentynine), class = nu)),
iter = 4000, warmup = 1000, chains = 4, cores = 4,
seed = 19,
control = list(adapt_delta = 0.99, max_treedepth = 12),
stanvars = stanvars,
file = "fits/fit19.07")
The chains look good.
plot(fit19.7, widths = 2:3)
Here’s the parameter summary.
print(fit19.7)
Family: student
Links: mu = identity; sigma = log; nu = identity
Formula: Y ~ 1 + (1 | Group)
sigma ~ 1 + (1 | Group)
Data: my_data (Number of observations: 96)
Draws: 4 chains, each with iter = 4000; warmup = 1000; thin = 1;
total post-warmup draws = 12000
Multilevel Hyperparameters:
~Group (Number of levels: 4)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 4.61 3.20 1.35 13.21 1.00 3244 5062
sd(sigma_Intercept) 1.21 0.39 0.64 2.15 1.00 4487 5023
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 100.49 2.82 94.60 106.22 1.00 4091 3762
sigma_Intercept 1.21 0.52 0.22 2.34 1.00 4236 5758
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
nu 32.17 28.26 4.38 109.51 1.00 10070 7467
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Let's get ready to make our version of the top of Figure 19.8. First we wrangle.
densities <- my_data |>
  distinct(Group) |>
add_epred_draws(fit19.7, ndraws = n_draws, seed = 19, dpar = c("mu", "sigma", "nu")) |>
mutate(ll = qt(0.025, df = nu),
ul = qt(0.975, df = nu)) |>
mutate(Y = map2(.x = ll, .y = ul,
.f = \(ll, ul) seq(from = ll, to = ul, length.out = 100))) |>
unnest(Y) |>
mutate(density = dt(Y, df = nu)) |>
# Notice the conversion
mutate(Y = mu + Y * sigma) |>
group_by(.draw) |>
mutate(density = density / max(density))
glimpse(densities)
Rows: 8,000
Columns: 13
Groups: .draw [20]
$ Group <chr> "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",…
$ .row <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ .chain <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ .iteration <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ .draw <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ .epred <dbl> 95.8735, 95.8735, 95.8735, 95.8735, 95.8735, 95.8735, 95.87…
$ mu <dbl> 95.8735, 95.8735, 95.8735, 95.8735, 95.8735, 95.8735, 95.87…
$ sigma <dbl> 5.250768, 5.250768, 5.250768, 5.250768, 5.250768, 5.250768,…
$ nu <dbl> 6.2418, 6.2418, 6.2418, 6.2418, 6.2418, 6.2418, 6.2418, 6.2…
$ ll <dbl> -2.424117, -2.424117, -2.424117, -2.424117, -2.424117, -2.4…
$ ul <dbl> 2.424117, 2.424117, 2.424117, 2.424117, 2.424117, 2.424117,…
$ Y <dbl> 83.14503, 83.40217, 83.65931, 83.91645, 84.17359, 84.43073,…
$ density <dbl> 0.09054714, 0.09720163, 0.10433744, 0.11198520, 0.12017662,…
If you look closely at our code above, you'll note that switching from the Gaussian to the Student-t likelihood meant switching from qnorm() and dnorm() to qt() and dt(), respectively. The base R Student-t functions are parameterized in terms of the standard t distribution (location 0, scale 1), so within those mutate() functions the computations were all based on the standard Student-t scale, which we then converted back to the data scale with mu + Y * sigma in the subsequent mutate().
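In case the standardize-then-rescale trick feels opaque, here's a quick sanity check. It assumes the brms package (which exports dstudent_t()) is loaded, and the particular numbers are arbitrary values in the neighborhood of the glimpse() output above.
x     <- 83.1
mu    <- 95.9
sigma <- 5.25
nu    <- 6.24

# A standard-t density at z = (x - mu) / sigma, divided by sigma, matches the
# three-parameter Student-t density
dt((x - mu) / sigma, df = nu) / sigma
dstudent_t(x, df = nu, mu = mu, sigma = sigma)
# The 1 / sigma factor is constant within a posterior draw, so it washes out
# when we rescale each curve to a common maximum height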
Now we're ready to make and save our version of the top panel of Figure 19.8.
p1 <- densities |>
  ggplot(aes(x = Y, y = Group)) +
geom_ridgeline(aes(height = density, group = interaction(Group, .draw)),
color = adjustcolor(pp[7], alpha.f = 2/3), fill = NA,
linewidth = 1/3, scale = 3/4) +
geom_jitter(data = my_data,
alpha = 3/4, color = pp[10], height = 0.04) +
scale_x_continuous(breaks = seq(from = 80, to = 120, by = 10)) +
labs(y = NULL,
title = "Data with Posterior Predictive Distrib.") +
coord_cartesian(xlim = c(75, 125),
ylim = c(1.25, 4.5)) +
theme(axis.text.y = element_text(hjust = 0),
axis.ticks.y = element_blank())
Here we make the difference distributions in the middle of Figure 19.8.
draws <- as_draws_df(fit19.7)

p2 <- draws |>
  transmute(`D vs A` = `r_Group[D,Intercept]` - `r_Group[A,Intercept]`,
`C vs B` = `r_Group[C,Intercept]` - `r_Group[B,Intercept]`) |>
pivot_longer(cols = everything()) |>
mutate(name = factor(name, levels = c("D vs A", "C vs B"))) |>
ggplot(aes(x = value)) +
stat_dotsinterval(point_interval = mode_hdi, .width = 0.95,
color = pp[4], quantiles = 100,
slab_fill = pp[5], slab_size = 0) +
scale_y_continuous(NULL, breaks = NULL) +
xlab("Difference") +
facet_wrap(~ name, scales = "free_x")
And here we make the plots of the corresponding effect sizes at the bottom of the Figure 19.8.
# First compute and save the sigma_j's, which will come in handy later
draws <- draws |>
  mutate(sigma_A = exp(b_sigma_Intercept + `r_Group__sigma[A,Intercept]`),
sigma_B = exp(b_sigma_Intercept + `r_Group__sigma[B,Intercept]`),
sigma_C = exp(b_sigma_Intercept + `r_Group__sigma[C,Intercept]`),
sigma_D = exp(b_sigma_Intercept + `r_Group__sigma[D,Intercept]`))
p3 <- draws |>
  # Note we're using pooled standard deviations to standardize our effect sizes, here
transmute(`D vs A` = (`r_Group[D,Intercept]` - `r_Group[A,Intercept]`) / sqrt((sigma_A^2 + sigma_D^2) / 2),
`C vs B` = (`r_Group[C,Intercept]` - `r_Group[B,Intercept]`) / sqrt((sigma_B^2 + sigma_C^2) / 2)) |>
pivot_longer(cols = everything()) |>
mutate(name = factor(name, levels = c("D vs A", "C vs B"))) |>
ggplot(aes(x = value)) +
stat_dotsinterval(point_interval = mode_hdi, .width = 0.95,
color = pp[4], quantiles = 100,
slab_fill = pp[5], slab_size = 0) +
scale_y_continuous(NULL, breaks = NULL) +
xlab("Effect Size") +
facet_wrap(~ name, scales = "free_x")
Combine them all and plot!
p1 / p2 / p3 + plot_layout(heights = c(2, 1, 1))
Notice that because (a) the sigma parameters were heterogeneous and (b) they were estimated on the log scale, we had to do quite a bit more data processing before the effect size estimates were ready.
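If hand-exponentiating feels tedious, note that tidybayes can return the sigma distributional parameter on the natural scale directly, which we could then summarize by group. This is a sketch of ours, not part of Kruschke's workflow.
# Posterior mode and 95% HDI for each group-specific sigma_j
my_data |>
  distinct(Group) |>
  add_epred_draws(fit19.7, dpar = "sigma", seed = 19) |>
  group_by(Group) |>
  mode_hdi(sigma, .width = 0.95)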
Finally, because each group has its own estimated scale (i.e., $\sigma_j$), we might also want to compare the group-level scales directly.
# Recall we computed the sigma_j's a couple blocks up.
# Now we put them to use
draws |>
  transmute(`D vs A` = sigma_D - sigma_A,
`C vs B` = sigma_C - sigma_B,
`D vs C` = sigma_D - sigma_C,
`B vs A` = sigma_B - sigma_A) |>
pivot_longer(cols = everything()) |>
mutate(name = factor(name, levels = c("D vs A", "C vs B", "D vs C", "B vs A"))) |>
ggplot(aes(x = value)) +
stat_dotsinterval(point_interval = mode_hdi, .width = 0.95,
color = pp[4], quantiles = 100,
slab_fill = pp[5], slab_size = 0) +
scale_y_continuous(NULL, breaks = NULL) +
xlab(expression(Differences~'in'~sigma[italic(j)])) +
facet_wrap(~ name, ncol = 4, scales = "free")
For more on models including a hierarchical structure on both the mean and scale structures, check out Donald Williams and colleagues’ work on what they call Mixed Effect Location and Scale Models [MELSM; e.g., Donald R. Williams et al. (2021); Donald R. Williams et al. (2019)]. They’re quite handy and I’ve begun using them in my applied work (e.g., here). You can also find a brief introduction to MELSM’s within the context of the multilevel growth model in Section 14.6 of my (2023) translation of McElreath’s (2020) second edition.
19.6 Exercises Walk out an effect size
We computed a lot of effect sizes in this chapter. They were all standardized mean differences. Cohen (1988) discussed these kinds of effect sizes in this way:
We need a “pure” number, one free of our original measurement unit, with which to index what can be alternately called the degree of departure from the null hypothesis of the alternate hypothesis, or the ES (effect size) we wish to detect. This is accomplished by standardizing the raw effect size as expressed in the measurement unit of the dependent variable by dividing it by the (common) standard deviation of the measures in their respective populations, the latter also in the original measurement unit. (p. 20)
Though Cohen framed his discussion in terms of null-hypothesis significance testing, we can just as easily apply it to our Bayesian modeling framework. The main thing is we can use his definitions from above to define a particular kind of effect size, the standardized mean difference between two groups. This is commonly referred to as a Cohen's $d$,

$$d = \frac{\mu_a - \mu_b}{\sigma},$$

where the unstandardized means of the variable of interest in groups $a$ and $b$ are standardized by dividing their difference by the common standard deviation $\sigma$.
Let's walk this out with the fruit-fly data from Section 19.3.2.
my_data <- read_csv("data.R/FruitflyDataReduced.csv")

glimpse(my_data)
Rows: 125
Columns: 3
$ Longevity <dbl> 35, 37, 49, 46, 63, 39, 46, 56, 63, 65, 56, 65, 70, 63…
$ CompanionNumber <chr> "Pregnant8", "Pregnant8", "Pregnant8", "Pregnant8", "P…
$ Thorax <dbl> 0.64, 0.68, 0.68, 0.72, 0.72, 0.76, 0.76, 0.76, 0.76, …
Recall we have five groups indexed by CompanionNumber, each with n = 25 cases.
my_data |>
  count(CompanionNumber)
# A tibble: 5 × 2
CompanionNumber n
<chr> <int>
1 None0 25
2 Pregnant1 25
3 Pregnant8 25
4 Virgin1 25
5 Virgin8 25
Let's focus on just two groups, the male fruit flies for which individual males were supplied access to one or eight virgin female fruit flies per day. In the data, these are CompanionNumber == Virgin1 and CompanionNumber == Virgin8, respectively. Here's a look at their mean Longevity values.
my_data |>
  filter(str_detect(CompanionNumber, "Virgin")) |>
group_by(CompanionNumber) |>
summarise(mean = mean(Longevity))
# A tibble: 2 × 2
CompanionNumber mean
<chr> <dbl>
1 Virgin1 56.8
2 Virgin8 38.7
If we’re willing to treat the males in the Virgin1
group as group “a” and those in the Virgin8
group as group “b”, we can save those mean values like so.
y_bar_a <- filter(my_data, CompanionNumber == "Virgin1") |> summarise(y_bar = mean(Longevity)) |> pull()
y_bar_b <- filter(my_data, CompanionNumber == "Virgin8") |> summarise(y_bar = mean(Longevity)) |> pull()
Now we’ll compute their pooled standard deviation.
my_data |>
  filter(str_detect(CompanionNumber, "Virgin")) |>
group_by(CompanionNumber) |>
summarise(s = sd(Longevity)) |>
summarise(s_p = sqrt(sum(s^2) / 2))
# A tibble: 1 × 1
s_p
<dbl>
1 13.6
Save that value.
s_a <- filter(my_data, CompanionNumber == "Virgin1") |> summarise(s = sd(Longevity)) |> pull()
s_b <- filter(my_data, CompanionNumber == "Virgin8") |> summarise(s = sd(Longevity)) |> pull()

s_p <- sqrt((s_a^2 + s_b^2) / 2)
If you're confused by how we computed the pooled standard deviation, recall that when comparing two groups that may have different group-level standard deviations, the formula for the pooled standard deviation is

$$s_p = \sqrt{\frac{s_a^2 + s_b^2}{2}},$$

where $s_a$ and $s_b$ are the sample standard deviations of the two groups. With the means and $s_p$ in hand, here's the sample estimate of Cohen's $d$.
(y_bar_a - y_bar_b) / s_p
[1] 1.327554
Though I'm not up on contemporary standards in fruit fly research, a Cohen's $d$ of about 1.33 would count as large by Cohen's own conventions.
As I hinted at earlier, the problem with this approach is it ignores uncertainty. Frequentists use various formulas to express this in terms of 95% confidence intervals. Our approach will be to express it with the posterior distribution of a Bayesian model. We’ve already accomplished this with our fit19.1
from above. Here we’ll use three other approaches.
Instead of the Bayesian hierarchical alternative to the frequentist ANOVA, we can use a single-level model where we predict a metric variable with separate intercepts for the two levels of CompanionNumber
. First, we subset the data and define our stanvars
.
my_data <- my_data |>
  filter(str_detect(CompanionNumber, "Virgin"))

mean_y <- (y_bar_a + y_bar_b) / 2

stanvars <- stanvar(mean_y, name = "mean_y") +
  stanvar(s_p, name = "sd_y")
Fit the model with brm()
.
fit19.8 <- brm(
  data = my_data,
  family = gaussian,
  Longevity ~ 0 + CompanionNumber,
  prior = c(prior(normal(mean_y, sd_y * 5), class = b),
prior(cauchy(0, sd_y), class = sigma)),
iter = 3000, warmup = 1000, chains = 4, cores = 4,
seed = 19,
stanvars = stanvars,
file = "fits/fit19.08")
Check the summary.
print(fit19.8)
Family: gaussian
Links: mu = identity; sigma = identity
Formula: Longevity ~ 0 + CompanionNumber
Data: my_data (Number of observations: 50)
Draws: 4 chains, each with iter = 3000; warmup = 1000; thin = 1;
total post-warmup draws = 8000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
CompanionNumberVirgin1 56.75 2.73 51.52 62.03 1.00 7560
CompanionNumberVirgin8 38.76 2.72 33.40 44.09 1.00 7631
Tail_ESS
CompanionNumberVirgin1 5999
CompanionNumberVirgin8 5614
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 13.80 1.47 11.31 17.00 1.00 7616 6045
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Extract the posterior draws.
draws <- as_draws_df(fit19.8)
Here we'll plot the three dimensions of the posterior, each with the corresponding statistic from our hand-computed Cohen's $d$ (i.e., $\bar y_a$, $\bar y_b$, and $s_p$) superimposed as a dashed line.
lines <- tibble(name  = c("b_CompanionNumberVirgin1", "b_CompanionNumberVirgin8", "sigma"),
                value = c(y_bar_a, y_bar_b, s_p))

draws |>
  pivot_longer(cols = b_CompanionNumberVirgin1:sigma) |>
ggplot(aes(x = value)) +
stat_dotsinterval(point_interval = mode_hdi, .width = 0.95,
color = pp[7], quantiles = 100,
slab_fill = pp[2], slab_size = 0) +
geom_vline(data = lines,
aes(xintercept = value),
color = pp[13], linetype = 2) +
scale_y_continuous(NULL, breaks = NULL) +
xlab("posterior") +
facet_wrap(~ name, scales = "free")
The model did a great job capturing all three parameters. If we would like to compute our Cohen's $d$ from the posterior draws of fit19.8, we'd execute something like this.
draws |>
  mutate(d = (b_CompanionNumberVirgin1 - b_CompanionNumberVirgin8) / sigma) |>
ggplot(aes(x = d)) +
stat_dotsinterval(point_interval = mode_hdi, .width = 0.95,
color = pp[7], quantiles = 100,
slab_fill = pp[2], slab_size = 0) +
geom_vline(xintercept = (y_bar_a - y_bar_b) / s_p,
color = pp[13], linetype = 2) +
scale_y_continuous(NULL, breaks = NULL) +
xlab(expression("Cohen's"~italic(d)~"expressed as a posterior"))
Similar to the previous plots, this time we superimposed the posterior density with the sample estimate for (y_bar_a - y_bar_b) / s_p. Happily, the hand-calculated estimate coheres nicely with the central tendency of our posterior distribution. But now we get a full measure of uncertainty. Notice how wide that 95% HDI is. Hopefully this isn't a surprise given our noncommittal priors and only 25 cases per group.
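If you'd rather read that interval width off a table than eyeball the plot, a couple of extra lines will do it.
# Width of the 95% HDI for the posterior Cohen's d
draws |>
  mutate(d = (b_CompanionNumberVirgin1 - b_CompanionNumberVirgin8) / sigma) |>
  mode_hdi(d, .width = 0.95) |>
  mutate(hdi_width = .upper - .lower)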
A second way we might use a single-level model to compute a Cohen's $d$ is to recode CompanionNumber into a binary variable Virgin1, for which 1 corresponds to CompanionNumber == Virgin1 and 0 corresponds to CompanionNumber == Virgin8. Compute the dummy.
my_data <- my_data |>
  mutate(Virgin1 = if_else(CompanionNumber == "Virgin1", 1, 0))
Now fit the dummy-predictor model with brm()
.
fit19.9 <- brm(
  data = my_data,
  family = gaussian,
  Longevity ~ 1 + Virgin1,
  prior = c(prior(normal(mean_y, sd_y * 5), class = Intercept),
prior(normal(0, sd_y * 5), class = b),
prior(cauchy(0, sd_y), class = sigma)),
iter = 3000, warmup = 1000, chains = 4, cores = 4,
seed = 19,
stanvars = stanvars,
file = "fits/fit19.09")
print(fit19.9)
Family: gaussian
Links: mu = identity; sigma = identity
Formula: Longevity ~ 1 + Virgin1
Data: my_data (Number of observations: 50)
Draws: 4 chains, each with iter = 3000; warmup = 1000; thin = 1;
total post-warmup draws = 8000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 38.72 2.78 33.23 44.14 1.00 6985 6060
Virgin1 18.03 3.92 10.29 25.78 1.00 6964 6033
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 13.80 1.42 11.29 16.82 1.00 7341 6131
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
With this parameterization, our posterior for Intercept is the same, within simulation variation, as CompanionNumberVirgin8 from fit19.8. The posterior for sigma is about the same for both models, too. But focus on Virgin1. This is the unstandardized mean difference, what we called $\bar y_a - \bar y_b$ above.
draws <- as_draws_df(fit19.9)

draws |>
  ggplot(aes(x = b_Virgin1)) +
stat_dotsinterval(point_interval = mode_hdi, .width = 0.95,
color = pp[7], quantiles = 100,
slab_fill = pp[2], slab_size = 0) +
geom_vline(xintercept = y_bar_a - y_bar_b,
color = pp[13], linetype = 2) +
scale_y_continuous(NULL, breaks = NULL) +
xlab("Unstandardized mean difference")
Here's how to standardize that unstandardized effect size into the Cohen's-$d$ metric.
draws |>
  mutate(d = b_Virgin1 / sigma) |>
ggplot(aes(x = d)) +
stat_dotsinterval(point_interval = mode_hdi, .width = 0.95,
color = pp[7], quantiles = 100,
slab_fill = pp[2], slab_size = 0) +
geom_vline(xintercept = (y_bar_a - y_bar_b) / s_p,
color = pp[13], linetype = 2) +
scale_y_continuous(NULL, breaks = NULL) +
xlab(expression("Cohen's"~italic(d)~"expressed as a posterior"))
Let's work this one more way. By simple algebra, a standardized mean difference is the same as the difference between two standardized means. The trick, though, is you have to use the pooled standard deviation ($s_p$) as the standardizer. If you standardize the criterion Longevity with $s_p$ before fitting the model and continue using the dummy variable approach, the Virgin1 posterior will be the same as a Cohen's $d$.
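Here's that algebra in miniature, using the objects we already have in hand: centering and dividing by s_p before differencing gives the same number as differencing first.
m <- mean(my_data$Longevity)

# Difference of standardized means
((y_bar_a - m) / s_p) - ((y_bar_b - m) / s_p)

# Standardized mean difference
(y_bar_a - y_bar_b) / s_p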
Standardize the criterion with s_p
.
my_data <- my_data |>
  mutate(Longevity_s = (Longevity - mean(Longevity)) / s_p)
Because our criterion is now in a standardized metric, we no longer need our stanvars.
fit19.10 <- brm(
  data = my_data,
  family = gaussian,
  Longevity_s ~ 1 + Virgin1,
  prior = c(prior(normal(0, 1 * 5), class = Intercept),
prior(normal(0, 1 * 5), class = b),
prior(cauchy(0, 1), class = sigma)),
iter = 3000, warmup = 1000, chains = 4, cores = 4,
seed = 19,
file = "fits/fit19.10")
Behold our out-of-the-box Bayesian Cohen's $d$.
# No transformation necessary
as_draws_df(fit19.10) |>
ggplot(aes(x = b_Virgin1)) +
stat_dotsinterval(point_interval = mode_hdi, .width = 0.95,
color = pp[7], quantiles = 100,
slab_fill = pp[2], slab_size = 0) +
geom_vline(xintercept = (y_bar_a - y_bar_b) / s_p,
color = pp[13], linetype = 2) +
scale_y_continuous(NULL, breaks = NULL) +
xlab(expression("Cohen's"~italic(d)~"expressed as a posterior"))
If you work them through, the three approaches we just took can be generalized to models with more than two groups. You just need to be careful how you compute the pooled standard deviation. It's also the case that the standardized mean differences we computed for fit19.1, above, are not quite Cohen's $d$'s in the classical sense; among other things, the hierarchical model partially pooled the group means toward the grand mean.
19.6.1 Your sample sizes may differ.
In the examples above, the two groups had equal sample sizes, which allowed us to be lazy with how we hand computed the sample estimate of the pooled standard deviation. When working with data for which $n_a \neq n_b$, you'll want the fuller formula,

$$s_p = \sqrt{\frac{(n_a - 1) s_a^2 + (n_b - 1) s_b^2}{n_a + n_b - 2}},$$

which strategically weights the sample estimate for the pooled standard deviation by sample size.
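If you find yourself doing this often, a tiny helper function keeps the formula in one place. The function name and the numbers in the example call are ours, chosen only for illustration.
pooled_sd <- function(s_a, s_b, n_a, n_b) {
  sqrt(((n_a - 1) * s_a^2 + (n_b - 1) * s_b^2) / (n_a + n_b - 2))
}

# With equal n's this reduces to the simpler sqrt((s_a^2 + s_b^2) / 2) form
pooled_sd(s_a = 2, s_b = 3, n_a = 10, n_b = 10)
sqrt((2^2 + 3^2) / 2)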
We should practice with TwoGroupIQ
data back in Section 16.1.2. Let’s load those data, again.
my_data <- read_csv("data.R/TwoGroupIQ.csv")

glimpse(my_data)
Rows: 120
Columns: 2
$ Score <dbl> 102, 107, 92, 101, 110, 68, 119, 106, 99, 103, 90, 93, 79, 89, 1…
$ Group <chr> "Smart Drug", "Smart Drug", "Smart Drug", "Smart Drug", "Smart D…
The data are IQ scores for participants in two groups. They look like this.
my_data |>
  ggplot(aes(x = Score, y = Group)) +
stat_dotsinterval(point_interval = mode_hdi, .width = 0.95,
color = pp[7], quantiles = 100,
slab_fill = pp[2], slab_size = 0) +
xlab("IQ score") +
coord_cartesian(ylim = c(1.5, 2.25))
Unlike the examples from the last section, the sample sizes differ across our two levels of Group.
my_data |>
  count(Group)
# A tibble: 2 × 2
Group n
<chr> <int>
1 Placebo 57
2 Smart Drug 63
Here's how we can use that information to hand compute the sample estimate for Cohen's $d$.
# Save the sample means for the groups
y_bar_a <- filter(my_data, Group == "Smart Drug") |> summarise(m = mean(Score)) |> pull()
y_bar_b <- filter(my_data, Group == "Placebo") |> summarise(m = mean(Score)) |> pull()

# Save the sample sizes for the groups
n_a <- filter(my_data, Group == "Smart Drug") |> count() |> pull()
n_b <- filter(my_data, Group == "Placebo") |> count() |> pull()

# Save the sample standard deviations
s_a <- filter(my_data, Group == "Smart Drug") |> summarise(s = sd(Score)) |> pull()
s_b <- filter(my_data, Group == "Placebo") |> summarise(s = sd(Score)) |> pull()

# Compute and save the sample pooled standard deviation
s_p <- sqrt(((n_a - 1) * s_a^2 + (n_b - 1) * s_b^2) / (n_a + n_b - 2))

# Compute Cohen's d
(y_bar_a - y_bar_b) / s_p
[1] 0.3518743
Although it's a lot of work to compute a sample-size corrected Cohen's $d$ by hand, it's not much work at all once we have a model of the Score ~ 0 + Group syntax.
mean_y <- (y_bar_a + y_bar_b) / 2

stanvars <- stanvar(mean_y, name = "mean_y") +
  stanvar(s_p, name = "sd_y")
fit19.11 <- brm(
  data = my_data,
  family = gaussian,
  Score ~ 0 + Group,
  prior = c(prior(normal(mean_y, sd_y * 5), class = b),
prior(cauchy(0, sd_y), class = sigma)),
iter = 3000, warmup = 1000, chains = 4, cores = 4,
seed = 19,
stanvars = stanvars,
file = "fits/fit19.11")
Review the model summary.
print(fit19.11)
Family: gaussian
Links: mu = identity; sigma = identity
Formula: Score ~ 0 + Group
Data: my_data (Number of observations: 120)
Draws: 4 chains, each with iter = 3000; warmup = 1000; thin = 1;
total post-warmup draws = 8000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
GroupPlacebo 100.10 2.97 94.23 105.93 1.00 7469 5932
GroupSmartDrug 107.79 2.79 102.32 113.30 1.00 7323 5908
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 22.33 1.50 19.66 25.53 1.00 7406 5734
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Here's the posterior for sigma, with the hand-computed s_p value superimposed as a dashed line.
as_draws_df(fit19.11) |>
ggplot(aes(x = sigma)) +
stat_dotsinterval(point_interval = mode_hdi, .width = 0.95,
color = pp[7], quantiles = 100,
slab_fill = pp[2], slab_size = 0) +
geom_vline(xintercept = s_p,
color = pp[13], linetype = 2) +
scale_y_continuous(NULL, breaks = NULL) +
xlab(expression(sigma[italic(p)]))
Nailed it! Now here’s how we might use the posterior samples to then compute the standardized mean difference.
as_draws_df(fit19.11) |>
mutate(d = (b_GroupSmartDrug - b_GroupPlacebo) / sigma) |>
ggplot(aes(x = d)) +
stat_dotsinterval(point_interval = mode_hdi, .width = 0.95,
color = pp[7], quantiles = 100,
slab_fill = pp[2], slab_size = 0) +
geom_vline(xintercept = (y_bar_a - y_bar_b) / s_p,
color = pp[13], linetype = 2) +
scale_y_continuous(NULL, breaks = NULL) +
xlab(expression((mu[italic(B)]-mu[italic(A)])/sigma[italic(p)]*", where "*sigma[italic(p)]%~~%sqrt(((italic(n[A])-1)*italic(s)[italic(A)]^2+(italic(n[B])-1)*italic(s)[italic(B)]^2)/(italic(n[A])+italic(n[B])-2)))) +
theme(axis.title.x = element_text(size = 7))
Sometimes fitting a model is easier than computing estimates by hand.
19.6.2 Populations and samples.
You may have noticed that Cohen's equation for the effect size was expressed in terms of population parameters,

$$d = \frac{\mu_a - \mu_b}{\sigma},$$

where $\mu_a$ and $\mu_b$ are population means and $\sigma$ is the common population standard deviation. In practice, we have been estimating those quantities with sample statistics,

$$\hat d = \frac{\bar y_a - \bar y_b}{s_p},$$

where the $\bar y$'s are the sample means and $s_p$ is the pooled sample standard deviation.
In the case of our IQ score data from the last section, we actually know the population mean and standard deviation for IQ are 100 and 15, respectively. We know this because the people who make IQ tests design them that way. Let’s see how well our sample statistics approximate the population parameters.
my_data |>
  group_by(Group) |>
summarise(mean = mean(Score),
sd = sd(Score))
# A tibble: 2 × 3
Group mean sd
<chr> <dbl> <dbl>
1 Placebo 100. 17.9
2 Smart Drug 108. 25.4
# Pooled standard deviation
s_p
[1] 22.18458
Unsurprisingly, the values for the Smart Drug
group are notably different from the population parameters. But notice how close the values from the Placebo
group are to the population parameters. If they weren’t, we’d be concerned the Placebo
condition was not a valid control. Looks like it was.
However, notice that the mean and standard deviation for the Placebo
group are not the exact values of 100 and 15 the way they are in the population. If we wanted to compute a standardized mean difference between our Smart Drug
group and folks in the population, we could just plug those values directly into our effect size equation. Here’s what that would look like if we plug in the population mean for the control group.
(y_bar_a - 100) / s_p
[1] 0.3534559
The result is very close to the one above. But this time our equation for the effect size was $(\bar y_a - 100) / s_p$, where we used the known population mean of 100 in place of the sample mean for the Placebo group. To the extent we're willing to treat the Placebo control as a stand-in for the population, this is a more precise way to compute the effect size. We can take it a step further by plugging in the population standard deviation of 15, too.
(y_bar_a - 100) / 15
[1] 0.5227513
Now our hand-computed estimate is noticeably larger, because the population standard deviation of 15 is quite a bit smaller than our sample-based pooled standard deviation of about 22.
Here’s what this looks like if we work with the posterior from the last model, fit19.11
.
as_draws_df(fit19.11) |>
mutate(d = (b_GroupSmartDrug - 100) / 15) |>
ggplot(aes(x = d)) +
stat_dotsinterval(point_interval = mode_hdi, .width = 0.95,
color = pp[7], quantiles = 100,
slab_fill = pp[2], slab_size = 0) +
geom_vline(xintercept = (y_bar_a - 100) / 15,
color = pp[13], linetype = 2) +
scale_y_continuous(NULL, breaks = NULL) +
xlab(expression((mu[italic(B)]-100)/15))
Note how we've now changed our effect size formula to $(\mu_B - 100) / 15$, where both the reference mean and the standard deviation are the known population values.
19.6.3 Report your effect sizes effectively.
Wrapping up, we've been practicing computing standardized mean differences (Cohen's $d$'s) based on sample statistics, on population values, and on mixtures of the two. Confusingly, you might see all of these variants, and more, referred to simply as Cohen's $d$, so be explicit about exactly which standardizer you used when reporting yours. Though we've followed Kruschke's lead and focused on the Cohen's $d$ family in this chapter, there are many other kinds of effect sizes you might consider for your own work.
Session info
sessionInfo()
R version 4.4.3 (2025-02-28)
Platform: aarch64-apple-darwin20
Running under: macOS Ventura 13.4
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/Chicago
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] tidybayes_3.0.7 bayesplot_1.11.1 brms_2.22.0 Rcpp_1.0.14
[5] patchwork_1.3.0 ggridges_0.5.6 palettetown_0.1.1 lubridate_1.9.3
[9] forcats_1.0.0 stringr_1.5.1 dplyr_1.1.4 purrr_1.0.4
[13] readr_2.1.5 tidyr_1.3.1 tibble_3.2.1 ggplot2_3.5.1
[17] tidyverse_2.0.0
loaded via a namespace (and not attached):
[1] tidyselect_1.2.1 svUnit_1.0.6 farver_2.1.2
[4] loo_2.8.0 fastmap_1.1.1 TH.data_1.1-2
[7] tensorA_0.36.2.1 digest_0.6.37 estimability_1.5.1
[10] timechange_0.3.0 lifecycle_1.0.4 StanHeaders_2.32.10
[13] survival_3.8-3 magrittr_2.0.3 posterior_1.6.0
[16] compiler_4.4.3 rlang_1.1.5 tools_4.4.3
[19] utf8_1.2.4 yaml_2.3.8 knitr_1.49
[22] labeling_0.4.3 bridgesampling_1.1-2 htmlwidgets_1.6.4
[25] curl_6.0.1 pkgbuild_1.4.4 bit_4.0.5
[28] plyr_1.8.9 multcomp_1.4-26 abind_1.4-8
[31] withr_3.0.2 grid_4.4.3 stats4_4.4.3
[34] xtable_1.8-4 colorspace_2.1-1 inline_0.3.19
[37] emmeans_1.10.3 scales_1.3.0 MASS_7.3-64
[40] cli_3.6.4 mvtnorm_1.2-5 rmarkdown_2.29
[43] crayon_1.5.3 generics_0.1.3 RcppParallel_5.1.7
[46] rstudioapi_0.16.0 reshape2_1.4.4 tzdb_0.4.0
[49] rstan_2.32.6 splines_4.4.3 parallel_4.4.3
[52] matrixStats_1.4.1 vctrs_0.6.5 V8_4.4.2
[55] Matrix_1.7-2 sandwich_3.1-1 jsonlite_1.8.9
[58] hms_1.1.3 arrayhelpers_1.1-0 bit64_4.0.5
[61] ggdist_3.3.2 glue_1.8.0 codetools_0.2-20
[64] distributional_0.5.0 stringi_1.8.4 gtable_0.3.6
[67] QuickJSR_1.1.3 quadprog_1.5-8 munsell_0.5.1
[70] pillar_1.10.2 htmltools_0.5.8.1 Brobdingnag_1.2-9
[73] R6_2.6.1 vroom_1.6.5 evaluate_1.0.1
[76] lattice_0.22-6 backports_1.5.0 rstantools_2.4.0
[79] gridExtra_2.3 coda_0.19-4.1 nlme_3.1-167
[82] checkmate_2.3.2 xfun_0.49 zoo_1.8-12
[85] pkgconfig_2.0.3