6.2 An example: Sex discrimination in the workplace
Here we load a couple necessary packages, load the data, and take a glimpse()
.
library(tidyverse)
protest <- read_csv("data/protest/protest.csv")
glimpse(protest)
## Observations: 129
## Variables: 6
## $ subnum <int> 209, 44, 124, 232, 30, 140, 27, 64, 67, 182, 85, 109, 122, 69, 45, 28, 170, 66, 168, 97,...
## $ protest <int> 2, 0, 2, 2, 2, 1, 2, 0, 0, 0, 2, 2, 0, 1, 1, 0, 1, 2, 2, 1, 2, 1, 1, 2, 2, 0, 1, 1, 0, 1...
## $ sexism <dbl> 4.87, 4.25, 5.00, 5.50, 5.62, 5.75, 5.12, 6.62, 5.75, 4.62, 4.75, 6.12, 4.87, 5.87, 4.87...
## $ angry <int> 2, 1, 3, 1, 1, 1, 2, 1, 6, 1, 2, 5, 2, 1, 1, 1, 2, 1, 3, 4, 1, 1, 1, 5, 1, 5, 1, 1, 2, 1...
## $ liking <dbl> 4.83, 4.50, 5.50, 5.66, 6.16, 6.00, 4.66, 6.50, 1.00, 6.83, 5.00, 5.66, 5.83, 6.50, 4.50...
## $ respappr <dbl> 4.25, 5.75, 4.75, 7.00, 6.75, 5.50, 5.00, 6.25, 3.00, 5.75, 5.25, 7.00, 4.50, 6.25, 5.00...
Here are the ungrouped means and \(SD\)s for respappr
and liking
shown at the bottom of Table 6.1.
protest %>%
select(liking:respappr) %>%
gather(key, value) %>%
group_by(key) %>%
summarize(mean = mean(value),
sd = sd(value)) %>%
mutate_if(is.double, round, digits = 3)
## # A tibble: 2 x 3
## key mean sd
## <chr> <dbl> <dbl>
## 1 liking 5.64 1.05
## 2 respappr 4.87 1.35
We compute the summaries for respappr
and liking
, grouped by protest
, like so.
protest %>%
select(protest, liking:respappr) %>%
gather(key, value, -protest) %>%
group_by(protest, key) %>%
summarize(mean = mean(value),
sd = sd(value)) %>%
mutate_if(is.double, round, digits = 3)
## # A tibble: 6 x 4
## # Groups: protest [3]
## protest key mean sd
## <int> <chr> <dbl> <dbl>
## 1 0 liking 5.31 1.30
## 2 0 respappr 3.88 1.46
## 3 1 liking 5.83 0.819
## 4 1 respappr 5.14 1.08
## 5 2 liking 5.75 0.936
## 6 2 respappr 5.49 0.936
It looks like Hayes has a typo in the \(SD\) for liking
when protest == 0
. It seemed he accidentally entered the value for when protest == 1
in that slot.
You’ll have to wait a minute to see where the adjusted \(Y\) values came from.
With a little ifelse()
, computing the dummies D1
and D2
is easy enough.
protest <-
protest %>%
mutate(D1 = ifelse(protest == 1, 1, 0),
D2 = ifelse(protest == 2, 1, 0))
We’re almost ready to fit the model. Let’s load brms.
library(brms)
This is the first time we’ve had a simple univariate regression model in a while. No special cbind()
syntax or multiple bf()
formulas. Just straight up brm()
.
model1 <-
brm(data = protest, family = gaussian,
liking ~ 1 + D1 + D2,
chains = 4, cores = 4)
fixef(model1)
## Estimate Est.Error Q2.5 Q97.5
## Intercept 5.3121491 0.1654625 4.984353592 5.6436028
## D1 0.5050451 0.2312916 0.051058290 0.9582424
## D2 0.4360147 0.2273677 0.002481348 0.8814708
Our \(R^2\) differes a bit from the OLS version in the text. This shouldn’t be surprising when it’s near the boundary.
bayes_R2(model1)
## Estimate Est.Error Q2.5 Q97.5
## R2 0.05693229 0.03528896 0.005512156 0.1378399
Here’s its shape. For the plots in this chapter, we’ll take a few formatting cues from Edward Tufte, curtesy of the ggthemes package. The theme_tufte()
function will change the default font and remove some chart junk. We will take our color palette from Pokemon via the palettetown package.
library(ggthemes)
library(palettetown)
bayes_R2(model1, summary = F) %>%
as_tibble() %>%
ggplot(aes(x = R2)) +
geom_density(size = 0, fill = pokepal(pokemon = "plusle")[2]) +
scale_y_continuous(NULL, breaks = NULL) +
coord_cartesian(xlim = 0:1) +
xlab(expression(italic(R)^{2})) +
theme_tufte() +
theme(legend.title = element_blank(),
plot.background = element_rect(fill = pokepal(pokemon = "plusle")[8]))
To use the model-implied equations to compute the means for each group on the criterion, we’ll extract the posterior samples.
post <- posterior_samples(model1)
post %>%
transmute(Y_np = b_Intercept + b_D1*0 + b_D2*0,
Y_ip = b_Intercept + b_D1*1 + b_D2*0,
Y_cp = b_Intercept + b_D1*0 + b_D2*1) %>%
gather() %>%
# this line will order our output the same way Hayes did in the text (p. 197)
mutate(key = factor(key, levels = c("Y_np", "Y_ip", "Y_cp"))) %>%
group_by(key) %>%
summarize(mean = mean(value),
sd = sd(value))
## # A tibble: 3 x 3
## key mean sd
## <fct> <dbl> <dbl>
## 1 Y_np 5.31 0.165
## 2 Y_ip 5.82 0.163
## 3 Y_cp 5.75 0.157
What Hayes called the “relative total effects” \(c_{1}\) and \(c_{2}\) are the D1
and D2
lines in our fixef()
output, above.
Here are the sub-models for the mediation model.
m_model <- bf(respappr ~ 1 + D1 + D2)
y_model <- bf(liking ~ 1 + D1 + D2 + respappr)
We fit.
model2 <-
brm(data = protest, family = gaussian,
m_model + y_model + set_rescor(FALSE),
chains = 4, cores = 4)
print(model2)
## Family: MV(gaussian, gaussian)
## Links: mu = identity; sigma = identity
## mu = identity; sigma = identity
## Formula: respappr ~ 1 + D1 + D2
## liking ~ 1 + D1 + D2 + respappr
## Data: protest (Number of observations: 129)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup samples = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## respappr_Intercept 3.88 0.18 3.51 4.23 4000 1.00
## liking_Intercept 3.71 0.31 3.08 4.30 4000 1.00
## respappr_D1 1.27 0.26 0.75 1.77 4000 1.00
## respappr_D2 1.61 0.25 1.14 2.11 4000 1.00
## liking_D1 0.00 0.22 -0.42 0.44 4000 1.00
## liking_D2 -0.21 0.23 -0.68 0.25 3382 1.00
## liking_respappr 0.41 0.07 0.27 0.55 4000 1.00
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sigma_respappr 1.18 0.07 1.05 1.34 4000 1.00
## sigma_liking 0.93 0.06 0.82 1.05 4000 1.00
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Behold the Bayesian \(R^2\) posteriors.
bayes_R2(model2, summary = F) %>%
as_tibble() %>%
gather() %>%
ggplot(aes(x = value, fill = key)) +
geom_density(size = 0, alpha = 2/3) +
annotate("text", x = .18, y = 6.75, label = "liking", color = pokepal(pokemon = "plusle")[2], family = "Times") +
annotate("text", x = .355, y = 6.75, label = "respappr", color = pokepal(pokemon = "plusle")[6], family = "Times") +
scale_y_continuous(NULL, breaks = NULL) +
scale_fill_manual(values = pokepal(pokemon = "plusle")[c(2, 6)]) +
coord_cartesian(xlim = 0:1) +
labs(title = expression(paste("The ", italic(R)^{2}, " densities overlap near perfectly, both hovering around .25.")),
x = NULL) +
theme_tufte() +
theme(legend.position = "none",
plot.background = element_rect(fill = pokepal(pokemon = "plusle")[8]))
To get the model summaries as presented in the second two columns in Table 6.2, we use posterior_samples()
, rename a bit, and summarize()
as usual.
post <-
posterior_samples(model2) %>%
mutate(a1 = b_respappr_D1,
a2 = b_respappr_D2,
b = b_liking_respappr,
c1_prime = b_liking_D1,
c2_prime = b_liking_D2,
i_m = b_respappr_Intercept,
i_y = b_liking_Intercept)
post %>%
select(a1:i_y) %>%
gather() %>%
group_by(key) %>%
summarize(mean = mean(value),
sd = sd(value),
ll = quantile(value, probs = .025),
ul = quantile(value, probs = .975)) %>%
mutate_if(is.double, round, digits = 3)
## # A tibble: 7 x 5
## key mean sd ll ul
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 a1 1.27 0.262 0.749 1.77
## 2 a2 1.62 0.251 1.14 2.11
## 3 b 0.411 0.072 0.273 0.554
## 4 c1_prime 0.001 0.222 -0.417 0.441
## 5 c2_prime -0.212 0.235 -0.677 0.25
## 6 i_m 3.88 0.181 3.51 4.23
## 7 i_y 3.71 0.311 3.08 4.30
Working with the \(\overline{M}_{ij}\) formulas in page 199 is quite similar to what we did above.
post %>%
transmute(M_np = b_respappr_Intercept + b_respappr_D1*0 + b_respappr_D2*0,
M_ip = b_respappr_Intercept + b_respappr_D1*1 + b_respappr_D2*0,
M_cp = b_respappr_Intercept + b_respappr_D1*0 + b_respappr_D2*1) %>%
gather() %>%
# this line will order our output the same way Hayes did in the text (p. 199)
mutate(key = factor(key, levels = c("M_np", "M_ip", "M_cp"))) %>%
group_by(key) %>%
summarize(mean = mean(value),
sd = sd(value))
## # A tibble: 3 x 3
## key mean sd
## <fct> <dbl> <dbl>
## 1 M_np 3.88 0.181
## 2 M_ip 5.15 0.185
## 3 M_cp 5.49 0.178
The \(\overline{Y}^*_{ij}\) formulas are more of the same.
post <-
post %>%
mutate(Y_np = b_liking_Intercept + b_liking_D1*0 + b_liking_D2*0 + b_liking_respappr*mean(protest$respappr),
Y_ip = b_liking_Intercept + b_liking_D1*1 + b_liking_D2*0 + b_liking_respappr*mean(protest$respappr),
Y_cp = b_liking_Intercept + b_liking_D1*0 + b_liking_D2*1 + b_liking_respappr*mean(protest$respappr))
post %>%
select(starts_with("Y_")) %>%
gather() %>%
mutate(key = factor(key, levels = c("Y_np", "Y_ip", "Y_cp"))) %>%
group_by(key) %>%
summarize(mean = mean(value),
sd = sd(value))
## # A tibble: 3 x 3
## key mean sd
## <fct> <dbl> <dbl>
## 1 Y_np 5.71 0.162
## 2 Y_ip 5.71 0.141
## 3 Y_cp 5.50 0.143
Note, these are where the adjusted \(Y\) values came from in Table 6.1.
This is as fine a spot as any to introduce coefficient plots. Both brms and the bayesplot package offer convenience functions for coefficient plots. It’s good to know how to make them by hand. Here’s ours for those last three \(\overline{Y}^*_{ij}\)-values.
post %>%
select(starts_with("Y_")) %>%
gather() %>%
ggplot(aes(x = key, y = value, color = key)) +
stat_summary(geom = "pointrange",
fun.y = median,
fun.ymin = function(x){quantile(x, probs = .025)},
fun.ymax = function(x){quantile(x, probs = .975)},
size = .75) +
stat_summary(geom = "linerange",
fun.ymin = function(x){quantile(x, probs = .25)},
fun.ymax = function(x){quantile(x, probs = .75)},
size = 1.5) +
scale_color_manual(values = pokepal(pokemon = "plusle")[c(3, 7, 9)]) +
coord_flip() +
theme_tufte() +
labs(x = NULL, y = NULL) +
theme(axis.ticks.y = element_blank(),
legend.position = "none",
plot.background = element_rect(fill = pokepal(pokemon = "plusle")[8]))
The points are the posterior medians, the thick inner lines the 50% intervals, and the thinner outer lines the 95% intervals. For kicks, we distinguished the three values by color.
If we want to examine \(R^2\) change for dropping the dummy variables, we’ll first fit a model that omits them.
model3 <-
brm(data = protest, family = gaussian,
liking ~ 1 + respappr,
chains = 4, cores = 4)
Here are the competing \(R^2\) distributions.
R2s <-
bayes_R2(model2, resp = "liking", summary = F) %>%
as_tibble() %>%
rename(R2 = R2_liking) %>%
bind_rows(
bayes_R2(model3, summary = F) %>%
as_tibble()
) %>%
mutate(fit = rep(c("model2", "model3"), each = 4000))
R2s %>%
ggplot(aes(x = R2, fill = fit)) +
geom_density(size = 0, alpha = 2/3) +
scale_fill_manual(values = pokepal(pokemon = "plusle")[c(6, 7)]) +
annotate("text", x = .15, y = 6.75, label = "model3", color = pokepal(pokemon = "plusle")[7], family = "Times") +
annotate("text", x = .35, y = 6.75, label = "model2", color = pokepal(pokemon = "plusle")[6], family = "Times") +
scale_y_continuous(NULL, breaks = NULL) +
coord_cartesian(xlim = 0:1) +
labs(title = expression(paste("The ", italic(R)^{2}, " densities for LIKING overlap a lot.")),
x = NULL) +
theme_tufte() +
theme(legend.position = "none",
plot.background = element_rect(fill = pokepal(pokemon = "plusle")[8]))
If you want to compare then with a change score, do something like this.
R2s %>%
mutate(iter = rep(1:4000, times = 2)) %>%
spread(key = fit, value = R2) %>%
mutate(difference = model2 - model3) %>%
ggplot(aes(x = difference)) +
geom_density(size = 0, fill = pokepal(pokemon = "plusle")[4]) +
geom_vline(xintercept = 0, color = pokepal(pokemon = "plusle")[8]) +
scale_y_continuous(NULL, breaks = NULL) +
labs(title = expression(paste("The ", Delta, italic(R)^{2}, " distribution")),
subtitle = "Doesn't appear we have a lot of change.",
x = NULL) +
theme_tufte() +
theme(legend.title = element_blank(),
plot.background = element_rect(fill = pokepal(pokemon = "plusle")[8]))
Here’s how to compute the summaries for \(a_{1}b\) and \(a_{2}b\), including the Bayesian HMC intervals.
post %>%
mutate(a1b = a1*b,
a2b = a2*b) %>%
select(a1b:a2b) %>%
gather() %>%
group_by(key) %>%
summarize(mean = mean(value),
sd = sd(value),
ll = quantile(value, probs = .025),
ul = quantile(value, probs = .975)) %>%
mutate_if(is.double, round, digits = 3)
## # A tibble: 2 x 5
## key mean sd ll ul
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 a1b 0.52 0.142 0.263 0.822
## 2 a2b 0.664 0.157 0.381 0.991