# 5 The Many Variables & The Spurious Waffles

Correlation in general is not surprising. In large data sets, every pair of variables has a statistically discernible non-zero correlation. But since most correlations do not indicate causal relationships, we need tools for distinguishing mere association from evidence of causation. This is why so much effort is devoted to

multiple regression, using more than one predictor variable to simultaneously model an outcome. (McElreath, 2020a, p. 123,emphasisin the original)

In his endnote #80 (p. 562), McElreath wrote: “See Meehl (1990), in particular the ‘crud factor’ described on page 204.” For a fun look at some dubious correlations, check out the examples at https://www.tylervigen.com/spurious-correlations.

But back to the text, McElreath’s listed reasons for multivariable regression include:

- statistical control for confounds
- multiple/complex causation
- interactions

We’ll approach the first two in this chapter. Interactions are reserved for Chapter 7.

#### 5.0.0.1 Rethinking: Causal inference.

“Despite its central importance, there is no unified approach to causal inference yet in the sciences” (p. 124). To dip into the topic, you might check out the recent blog post by Finn Lattimore and David Rohde, *Causal inference with Bayes rule*; McElreath blog series on causal inference, starting with *Regression, Fire, and Dangerous Things (1/3)*; or McElreath’s epic 3-hour introductory lecture on causal inference called *Science Before Statistics: Causal Inference*.

## 5.1 Spurious associations

Load the Waffle House data.

```
library(tidyverse)
data(WaffleDivorce, package = "rethinking")
<- WaffleDivorce d
```

Did you notice how we used the `package`

argument within the `data()`

function, there? That allowed us to load the `WaffleDivorce`

without actually loading the **rethinking** package. Since we generally don’t want to have both **rethinking** and **brms** loaded up at the same time, using the `package`

argument will save us a line of code.

Now standardize the focal variables with the `rethinking::standardize()`

function.

```
<-
d %>%
d mutate(d = rethinking::standardize(Divorce),
m = rethinking::standardize(Marriage),
a = rethinking::standardize(MedianAgeMarriage))
```

Because we avoided directly loading the **rethinking** package, we did not have immediate access to McElreath’s handy `standardize()`

function. If you want to use a function from a package without loading that package, you can use the double colon operator `::`

. You can learn more about the double colon operator here. Now load **brms**.

```
rm(WaffleDivorce)
library(brms)
```

I’m not going to show the output, but you might go ahead and investigate the data with the typical functions. E.g.,

```
head(d)
glimpse(d)
```

Now we have our data, we can reproduce Figure 5.1. One convenient way to get the handful of sate labels into the plot was with the `geom_text_repel()`

function from the **ggrepel** package (Slowikowski, 2022). But first, we spent the last few chapters warming up with **ggplot2**. Going forward, each chapter will have its own plot theme. In this chapter, we’ll characterize the plots with `theme_bw() + theme(panel.grid = element_rect())`

and coloring based off of `"firebrick"`

.

```
library(ggrepel)
%>%
d ggplot(aes(x = WaffleHouses/Population, y = Divorce)) +
stat_smooth(method = "lm", fullrange = T, linewidth = 1/2,
color = "firebrick4", fill = "firebrick", alpha = 1/5) +
geom_point(size = 1.5, color = "firebrick4", alpha = 1/2) +
geom_text_repel(data = d %>% filter(Loc %in% c("ME", "OK", "AR", "AL", "GA", "SC", "NJ")),
aes(label = Loc),
size = 3, seed = 1042) + # this makes it reproducible
scale_x_continuous("Waffle Houses per million", limits = c(0, 55)) +
ylab("Divorce rate") +
coord_cartesian(xlim = c(0, 50), ylim = c(5, 15)) +
theme_bw() +
theme(panel.grid = element_blank())
```

Since these are geographically-based data, we might plot our three major variables in a map format. The **tigris** package (Walker, 2022) provides functions for retrieving latitude and longitude data for the 50 states and we can plot then with the `ggplot2::geom_sf()`

function. We’ll use the `right_join()`

function to combine those data with our primary data `d`

^{2}.

```
library(tigris)
# get the map data
<- states(cb = TRUE, resolution = "20m") %>%
d_states shift_geometry() %>%
# add the primary data
right_join(d %>%
mutate(NAME = Location %>% as.character()) %>%
select(d:a, NAME),
by = "NAME") %>%
# convert to the long format for faceting
pivot_longer(cols = c("d", "m", "a"), names_to = "variable")
```

Now plot.

```
%>%
d_states ggplot() +
geom_sf(aes(fill = value, geometry = geometry),
size = 0) +
scale_fill_gradient(low = "#f8eaea", high = "firebrick4") +
theme_void() +
theme(legend.position = "none",
strip.text = element_text(margin = margin(0, 0, .5, 0))) +
facet_wrap(~ variable, labeller = label_both)
```

One of the advantages of this visualization method is it just became clear that Nevada is missing from the `WaffleDivorce`

data. Execute `d %>% distinct(Location)`

to see for yourself and click here to find out why it’s missing. Those missing data should motivate the skills we’ll cover in Chapter 15. But let’s get back on track.

Here’s the standard deviation for `MedianAgeMarriage`

in its current metric.

`sd(d$MedianAgeMarriage)`

`## [1] 1.24363`

Our first statistical model follows the form

\[\begin{align*} \text{divorce_std}_i & \sim \operatorname{Normal}(\mu_i, \sigma) \\ \mu_i & = \alpha + \beta_1 \text{median_age_at_marriage_std}_i \\ \alpha & \sim \operatorname{Normal}(0, 0.2) \\ \beta_1 & \sim \operatorname{Normal}(0, 0.5) \\ \sigma & \sim \operatorname{Exponential}(1), \end{align*}\]

where the `_std`

suffix indicates the variables are standardized (i.e., zero centered, with a standard deviation of one). Let’s fit the first univariable model.

```
.1 <-
b5brm(data = d,
family = gaussian,
~ 1 + a,
d prior = c(prior(normal(0, 0.2), class = Intercept),
prior(normal(0, 0.5), class = b),
prior(exponential(1), class = sigma)),
iter = 2000, warmup = 1000, chains = 4, cores = 4,
seed = 5,
sample_prior = T,
file = "fits/b05.01")
```

Did you notice the `sample_prior = T`

line? This told **brms** to take draws from both the posterior distribution (as usual) and from the prior predictive distribution. If you look at McElreath’s **R** code 5.4, you’ll see he plotted 50 draws from the prior predictive distribution of his `m5.1`

. For our **brms** workflow, our first step is the extract our prior draws with the well-named `prior_draws()`

function.

```
<- prior_draws(b5.1)
prior
%>% glimpse() prior
```

```
## Rows: 4,000
## Columns: 3
## $ Intercept <dbl> 0.262235852, 0.435592131, -0.294742095, -0.007232040, 0.058308258, 0.050358902, …
## $ b <dbl> 0.01116464, -0.18996254, 0.43071360, -0.60435682, -0.31401810, 0.22439532, -0.06…
## $ sigma <dbl> 2.20380842, 0.57847121, 1.00816092, 2.94427170, 0.30736807, 0.94694471, 0.660626…
```

We ended up with 4,000 draws from the prior predictive distribution, much like `as_draws_df()`

would return 4,000 draws from the posterior. Next we’ll use `slice_sample()`

to take a random sample from our `prior`

object. After just a little more wrangling, we’ll be in good shape to plot our version of Figure 5.3.

```
set.seed(5)
%>%
prior slice_sample(n = 50) %>%
rownames_to_column("draw") %>%
expand_grid(a = c(-2, 2)) %>%
mutate(d = Intercept + b * a) %>%
ggplot(aes(x = a, y = d)) +
geom_line(aes(group = draw),
color = "firebrick", alpha = .4) +
labs(x = "Median age marriage (std)",
y = "Divorce rate (std)") +
coord_cartesian(ylim = c(-2, 2)) +
theme_bw() +
theme(panel.grid = element_blank())
```

To get the posterior predictions from our **brms** model, we’ll use `fitted()`

in place of `link()`

.

```
# determine the range of `a` values we'd like to feed into `fitted()`
<- tibble(a = seq(from = -3, to = 3.2, length.out = 30))
nd
# now use `fitted()` to get the model-implied trajectories
fitted(b5.1,
newdata = nd) %>%
data.frame() %>%
bind_cols(nd) %>%
# plot
ggplot(aes(x = a)) +
geom_smooth(aes(y = Estimate, ymin = Q2.5, ymax = Q97.5),
stat = "identity",
fill = "firebrick", color = "firebrick4", alpha = 1/5, linewidth = 1/4) +
geom_point(data = d,
aes(y = d),
size = 2, color = "firebrick4") +
labs(x = "Median age marriage (std)",
y = "Divorce rate (std)") +
coord_cartesian(xlim = range(d$a),
ylim = range(d$d)) +
theme_bw() +
theme(panel.grid = element_blank())
```

That’ll serve as our version of the right panel of Figure 5.2. To paraphrase McElreath, “if you inspect the [`print()`

] output, you’ll see that posterior for \([\beta_\text{a}]\) is reliably negative” (p. 127). Let’s see.

`print(b5.1)`

```
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: d ~ 1 + a
## Data: d (Number of observations: 50)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 0.00 0.10 -0.20 0.19 1.00 4143 2663
## a -0.57 0.12 -0.80 -0.34 1.00 3660 2775
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 0.82 0.09 0.68 1.02 1.00 3413 2900
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
```

On the standardized scale, -0.57 95% CI [-0.79, -0.34] is pretty negative, indeed.

We’re ready to fit our second univariable model.

```
.2 <-
b5brm(data = d,
family = gaussian,
~ 1 + m,
d prior = c(prior(normal(0, 0.2), class = Intercept),
prior(normal(0, 0.5), class = b),
prior(exponential(1), class = sigma)),
iter = 2000, warmup = 1000, chains = 4, cores = 4,
seed = 5,
file = "fits/b05.02")
```

The summary suggests \(\beta_\text{m}\) is of a smaller magnitude.

`print(b5.2)`

```
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: d ~ 1 + m
## Data: d (Number of observations: 50)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept -0.00 0.11 -0.22 0.22 1.00 4273 2828
## m 0.35 0.13 0.09 0.61 1.00 3555 2749
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 0.95 0.10 0.78 1.17 1.00 3881 2939
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
```

Now we’ll wangle and plot our version of the left panel in Figure 5.2.

```
<- tibble(m = seq(from = -2.5, to = 3.5, length.out = 30))
nd
fitted(b5.2, newdata = nd) %>%
data.frame() %>%
bind_cols(nd) %>%
ggplot(aes(x = m)) +
geom_smooth(aes(y = Estimate, ymin = Q2.5, ymax = Q97.5),
stat = "identity",
fill = "firebrick", color = "firebrick4", alpha = 1/5, linewidth = 1/4) +
geom_point(data = d,
aes(y = d),
size = 2, color = "firebrick4") +
labs(x = "Marriage rate (std)",
y = "Divorce rate (std)") +
coord_cartesian(xlim = range(d$m),
ylim = range(d$d)) +
theme_bw() +
theme(panel.grid = element_blank())
```

But merely comparing parameter means between different bivariate regressions is no way to decide which predictor is better. Both of these predictors could provide independent value, or they could be redundant, or one could eliminate the value of the other.

To make sense of this, we’re going to have to think causally. And then, only after we’ve done some thinking, a bigger regression model that includes both age at marriage and marriage rate will help us. (pp. 127–128)

### 5.1.1 Think before you regress.

It is helpful to introduce a particular type of causal graph known as a

DAG, short fordirected acyclic graph.Graphmeans it is nodes and connections.Directedmeans the connections have arrows that indicate directions of causal influence. Andacyclicmeans that causes do not eventually flow back on themselves. A DAG is a way of describing qualitative causal relationships among variables. It isn’t as detailed as a full model description, but it contains information that a purely statistical model does not. Unlike a statistical model, a DAG will tell you the consequences of intervening to change a variable. But only if the DAG is correct. There is no inference without assumption. (p. 128,emphasisin the original)

If you’re interested in making directed acyclic graphs (DAG) in **R**, the **dagitty** (Textor et al., 2016, 2021) and **ggdag** (Barrett, 2022b) packages are handy. Our approach will focus on **ggdag**.

```
# library(dagitty)
library(ggdag)
```

If all you want is a quick and dirty DAG for our three variables, you might execute something like this.

```
set.seed(5)
dagify(M ~ A,
~ A + M) %>%
D ggdag(node_size = 8)
```

We can pretty it up a little, too.

```
<-
dag_coords tibble(name = c("A", "M", "D"),
x = c(1, 3, 2),
y = c(2, 2, 1))
<-
p1 dagify(M ~ A,
~ A + M,
D coords = dag_coords) %>%
ggplot(aes(x = x, y = y, xend = xend, yend = yend)) +
geom_dag_point(color = "firebrick", alpha = 1/4, size = 10) +
geom_dag_text(color = "firebrick") +
geom_dag_edges(edge_color = "firebrick") +
scale_x_continuous(NULL, breaks = NULL, expand = c(0.1, 0.1)) +
scale_y_continuous(NULL, breaks = NULL, expand = c(0.2, 0.2)) +
theme_bw() +
theme(panel.grid = element_blank())
p1
```

We could have left out the `coords`

argument and let the `dagify()`

function set the layout of the nodes on its own. But since we were picky and wanted to ape McElreath, we first specified our coordinates in a tibble and then included that tibble in the `coords`

argument. For more on the topic, check out the Barrett’s (2022a) vignette, *An introduction to ggdag*.

Buy anyway, our DAG

represents a heuristic causal model. Like other models, it is an analytical assumption. The symbols \(A\), \(M\), and \(D\) are our observed variables. The arrows show directions of influence. What this DAG says is:

- \(A\) directly influences \(D\)
- \(M\) directly influences \(D\)
- \(A\) directly influences \(M\)
These statements can then have further implications. In this case, age of marriage influences divorce in two ways. First it has a direct effect, \(A \rightarrow D\). Perhaps a direct effect would arise because younger people change faster than older people and are therefore more likely to grow incompatible with a partner. Second, it has an indirect effect by influencing the marriage rate, which then influences divorce, \(A \rightarrow M \rightarrow D\). If people get married earlier, then the marriage rate may rise, because there are more young people. (p. 128)

Considering alternative models, “It could be that the association between \(M\) and \(D\) arises entirely from \(A\)’s influence on both \(M\) and \(D\). Like this:” (p. 129)

```
<-
p2 dagify(M ~ A,
~ A,
D coords = dag_coords) %>%
ggplot(aes(x = x, y = y, xend = xend, yend = yend)) +
geom_dag_point(color = "firebrick", alpha = 1/4, size = 10) +
geom_dag_text(color = "firebrick") +
geom_dag_edges(edge_color = "firebrick") +
scale_x_continuous(NULL, breaks = NULL, expand = c(0.1, 0.1)) +
scale_y_continuous(NULL, breaks = NULL, expand = c(0.2, 0.2)) +
theme_bw() +
theme(panel.grid = element_blank())
p2
```

This DAG is also consistent with the posterior distributions of models [

`b5.1`

] and [`b5.2`

]. Why? Because both \(M\) and \(D\) “listen” to \(A\). They have information from \(A\). So when you inspect the association between \(D\) and \(M\), you pick up that common information that they both got from listening to \(A\). You’ll see a more formal way to deduce this, in the next chapter.So which is it? Is there a direct effect of marriage rate, or rather is age at marriage just driving both, creating a spurious correlation between marriage rate and divorce rate? To find out, we need to consider carefully what each DAG implies. That’s what’s next. (p. 129)

#### 5.1.1.1 Rethinking: What’s a cause?

Questions of causation can become bogged down in philosophical debates. These debates are worth having. But they don’t usually intersect with statistical concerns. Knowing a cause in statistics means being able to correctly predict the consequences of an intervention. There are contexts in which even this is complicated. (p. 129)

### 5.1.2 Testable implications.

So far, we have entertained two DAGs. Here we use **patchwork** to combine them into one plot.

```
library(patchwork)
| p2 p1
```

McElreath encouraged us to examine the correlations among these three variables with `cor()`

.

```
%>%
d select(d:a) %>%
cor()
```

```
## d m a
## d 1.0000000 0.3737314 -0.5972392
## m 0.3737314 1.0000000 -0.7210960
## a -0.5972392 -0.7210960 1.0000000
```

If you just want the lower triangle, you can use the `lowerCor()`

function from the **psych** package (Revelle, 2022).

```
library(psych)
%>%
d select(d:a) %>%
lowerCor(digits = 3)
```

```
## d m a
## d 1.000
## m 0.374 1.000
## a -0.597 -0.721 1.000
```

Our second DAG, above, suggests “that \(D\) is independent of \(M\), conditional on \(A\)” (p. 130). We can use the `dagitty::impliedConditionalIndependencies()`

function to express that conditional independence in formal notation.

```
library(dagitty)
dagitty('dag{ D <- A -> M }') %>%
impliedConditionalIndependencies()
```

`## D _||_ M | A`

The lack of conditional dependencies in the first DAG may be expressed this way.

```
dagitty('dag{D <- A -> M -> D}') %>%
impliedConditionalIndependencies()
```

Okay, that was a bit of a tease. “There are no conditional independencies, so there is no output to display” (p. 131). To close out this section,

once you fit a multiple regression to predict divorce using both marriage rate and age at marriage, the model addresses the questions:

- After I already know marriage rate, what additional value is there in also knowing age at marriage?
- After I already know age at marriage, what additional value is there in also knowing marriage rate?
The parameter estimates corresponding to each predictor are the (often opaque) answers to these questions. The questions above are descriptive, and the answers are also descriptive. It is only the derivation of the testable implications above that gives these descriptive results a causal meaning. But that meaning is still dependent upon believing the DAG. (p. 131)

### 5.1.3 Multiple regression notation.

We can write the statistical formula for our first multivariable model as

\[\begin{align*} \text{Divorce_std}_i & \sim \operatorname{Normal}(\mu_i, \sigma) \\ \mu_i & = \alpha + \beta_1 \text{Marriage_std}_i + \beta_2 \text{MedianAgeMarriage_std}_i \\ \alpha & \sim \operatorname{Normal}(0, 0.2) \\ \beta_1 & \sim \operatorname{Normal}(0, 0.5) \\ \beta_2 & \sim \operatorname{Normal}(0, 0.5) \\ \sigma & \sim \operatorname{Exponential}(1). \end{align*}\]

### 5.1.4 Approximating the posterior.

Much like we used the `+`

operator to add single predictors to the intercept, we just use more `+`

operators in the `formula`

argument to add more predictors. Also notice we’re using the same prior `prior(normal(0, 1), class = b)`

for both predictors. Within the **brms** framework, they are both of `class = b`

. But if we wanted their priors to differ, we’d make two `prior()`

statements and differentiate them with the `coef`

argument. You’ll see examples of that later on.

```
.3 <-
b5brm(data = d,
family = gaussian,
~ 1 + m + a,
d prior = c(prior(normal(0, 0.2), class = Intercept),
prior(normal(0, 0.5), class = b),
prior(exponential(1), class = sigma)),
iter = 2000, warmup = 1000, chains = 4, cores = 4,
seed = 5,
file = "fits/b05.03")
```

Behold the summary.

`print(b5.3)`

```
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: d ~ 1 + m + a
## Data: d (Number of observations: 50)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 0.00 0.10 -0.19 0.20 1.00 3683 2454
## m -0.06 0.16 -0.38 0.26 1.00 2641 2654
## a -0.61 0.16 -0.93 -0.29 1.00 2799 2710
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 0.83 0.09 0.68 1.02 1.00 3492 2455
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
```

The **brms** package doesn’t have a convenience function like `rethinking::coeftab()`

. However, we can make something similar with a little deft wrangling and **ggplot2** code.

```
# first, extract and rename the necessary posterior parameters
bind_cols(
as_draws_df(b5.1) %>%
transmute(`b5.1_beta[A]` = b_a),
as_draws_df(b5.2) %>%
transmute(`b5.2_beta[M]` = b_m),
as_draws_df(b5.3) %>%
transmute(`b5.3_beta[M]` = b_m,
`b5.3_beta[A]` = b_a)
%>%
) # convert them to the long format, group, and get the posterior summaries
pivot_longer(everything()) %>%
group_by(name) %>%
summarise(mean = mean(value),
ll = quantile(value, prob = .025),
ul = quantile(value, prob = .975)) %>%
# since the `key` variable is really two variables in one, here we split them up
separate(col = name, into = c("fit", "parameter"), sep = "_") %>%
# plot!
ggplot(aes(x = mean, xmin = ll, xmax = ul, y = fit)) +
geom_vline(xintercept = 0, color = "firebrick", alpha = 1/5) +
geom_pointrange(color = "firebrick") +
labs(x = "posterior", y = NULL) +
theme_bw() +
theme(panel.grid = element_blank(),
strip.background = element_rect(fill = "transparent", color = "transparent")) +
facet_wrap(~ parameter, ncol = 1, labeller = label_parsed)
```

Don’t worry, coefficient plots won’t always be this complicated. We’ll walk out simpler ones toward the end of the chapter.

The substantive interpretation of all those coefficients is: “*Once we know median age at marriage for a State, there is little or no additional predictive power in also knowing the rate of marriage in that State*” (p. 134, *emphasis* in the original). This coheres well with one of our `impliedConditionalIndependencies()`

statements, from above.

```
dagitty('dag{ D <- A -> M }') %>%
impliedConditionalIndependencies()
```

`## D _||_ M | A`

#### 5.1.4.1 Overthinking: Simulating the divorce example.

Okay, let’s simulate our divorce data in a **tidyverse** sort of way.

```
# how many states would you like?
<- 50
n
set.seed(5)
<-
sim_d tibble(age = rnorm(n, mean = 0, sd = 1)) %>% # sim A
mutate(mar = rnorm(n, mean = -age, sd = 1), # sim A -> M
div = rnorm(n, mean = age, sd = 1)) # sim A -> D
head(sim_d)
```

```
## # A tibble: 6 × 3
## age mar div
## <dbl> <dbl> <dbl>
## 1 -0.841 2.30 -2.84
## 2 1.38 -1.20 2.52
## 3 -1.26 2.28 -0.580
## 4 0.0701 -0.662 0.279
## 5 1.71 -1.82 1.65
## 6 -0.603 -0.322 0.291
```

We simulated those data based on this formulation.

```
dagitty('dag{divorce <- age -> marriage}') %>%
impliedConditionalIndependencies()
```

`## dvrc _||_ mrrg | age`

Here are the quick `pairs()`

plots.

`pairs(sim_d, col = "firebrick4")`

If we use the `update()`

function, we can refit the last models in haste.

```
.1_sim <-
b5update(b5.1,
newdata = sim_d,
formula = div ~ 1 + age,
seed = 5,
file = "fits/b05.01_sim")
.2_sim <-
b5update(b5.2,
newdata = sim_d,
formula = div ~ 1 + mar,
seed = 5,
file = "fits/b05.02_sim")
.3_sim <-
b5update(b5.3,
newdata = sim_d,
formula = div ~ 1 + mar + age,
seed = 5,
file = "fits/b05.03_sim")
```

The steps for our homemade `coefplot()`

plot are basically the same. Just switch out some of the names.

```
bind_cols(
as_draws_df(b5.1_sim) %>%
transmute(`b5.1_beta[A]` = b_age),
as_draws_df(b5.2_sim) %>%
transmute(`b5.2_beta[M]` = b_mar),
as_draws_df(b5.3_sim) %>%
transmute(`b5.3_beta[M]` = b_mar,
`b5.3_beta[A]` = b_age)
%>%
) pivot_longer(everything()) %>%
group_by(name) %>%
summarise(mean = mean(value),
ll = quantile(value, prob = .025),
ul = quantile(value, prob = .975)) %>%
# since the `key` variable is really two variables in one, here we split them up
separate(name, into = c("fit", "parameter"), sep = "_") %>%
# plot!
ggplot(aes(x = mean, xmin = ll, xmax = ul, y = fit)) +
geom_vline(xintercept = 0, color = "firebrick", alpha = 1/5) +
geom_pointrange(color = "firebrick") +
labs(x = "posterior", y = NULL) +
theme_bw() +
theme(panel.grid = element_blank(),
strip.background = element_blank()) +
facet_wrap(~ parameter, ncol = 1, labeller = label_parsed)
```

Well, okay. This is the same basic pattern, but with the signs switched and with a little simulation variability thrown in. But you get the picture.

### 5.1.5 Plotting multivariate posteriors.

“Let’s pause for a moment, before moving on. There are a lot of moving parts here: three variables, some strange DAGs, and three models. If you feel at all confused, it is only because you are paying attention” (p. 133).

Preach, brother.

Down a little further, McElreath gave us this deflationary delight: “There is a huge literature detailing a variety of plotting techniques that all attempt to help one understand multiple linear regression. None of these techniques is suitable for all jobs, and most do not generalize beyond linear regression” (pp. 134–135). Now you’re inspired, let’s learn three:

- predictor residual plots
- posterior prediction plots
- counterfactual plots

#### 5.1.5.1 Predictor residual plots.

To get ready to make our residual plots, we’ll predict one predictor, `m`

, with another one, `a`

.

```
.4 <-
b5brm(data = d,
family = gaussian,
~ 1 + a,
m prior = c(prior(normal(0, 0.2), class = Intercept),
prior(normal(0, 0.5), class = b),
prior(exponential(1), class = sigma)),
iter = 2000, warmup = 1000, chains = 4, cores = 4,
seed = 5,
file = "fits/b05.04")
```

`print(b5.4)`

```
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: m ~ 1 + a
## Data: d (Number of observations: 50)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept -0.00 0.09 -0.19 0.17 1.00 3337 2718
## a -0.69 0.10 -0.89 -0.49 1.00 3961 2856
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 0.71 0.07 0.59 0.87 1.00 3746 2498
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
```

With `fitted()`

, we compute the expected values for each state (with the exception of Nevada). Since the `a`

values for each state are in the date we used to fit the model, we’ll omit the `newdata`

argument.

```
<-
f fitted(b5.4) %>%
data.frame() %>%
bind_cols(d)
glimpse(f)
```

```
## Rows: 50
## Columns: 20
## $ Estimate <dbl> 0.4168429, 0.4723741, 0.1391869, 0.9721549, -0.4161251, 0.1947181, -0.86…
## $ Est.Error <dbl> 0.11084576, 0.11575588, 0.09359310, 0.17181804, 0.10902234, 0.09588333, …
## $ Q2.5 <dbl> 0.1919818448, 0.2397930892, -0.0511494288, 0.6390384012, -0.6351883273, …
## $ Q97.5 <dbl> 0.63066723, 0.69604942, 0.32323792, 1.30586005, -0.20011387, 0.38252344,…
## $ Location <fct> Alabama, Alaska, Arizona, Arkansas, California, Colorado, Connecticut, D…
## $ Loc <fct> AL, AK, AZ, AR, CA, CO, CT, DE, DC, FL, GA, HI, ID, IL, IN, IA, KS, KY, …
## $ Population <dbl> 4.78, 0.71, 6.33, 2.92, 37.25, 5.03, 3.57, 0.90, 0.60, 18.80, 9.69, 1.36…
## $ MedianAgeMarriage <dbl> 25.3, 25.2, 25.8, 24.3, 26.8, 25.7, 27.6, 26.6, 29.7, 26.4, 25.9, 26.9, …
## $ Marriage <dbl> 20.2, 26.0, 20.3, 26.4, 19.1, 23.5, 17.1, 23.1, 17.7, 17.0, 22.1, 24.9, …
## $ Marriage.SE <dbl> 1.27, 2.93, 0.98, 1.70, 0.39, 1.24, 1.06, 2.89, 2.53, 0.58, 0.81, 2.54, …
## $ Divorce <dbl> 12.7, 12.5, 10.8, 13.5, 8.0, 11.6, 6.7, 8.9, 6.3, 8.5, 11.5, 8.3, 7.7, 8…
## $ Divorce.SE <dbl> 0.79, 2.05, 0.74, 1.22, 0.24, 0.94, 0.77, 1.39, 1.89, 0.32, 0.58, 1.27, …
## $ WaffleHouses <int> 128, 0, 18, 41, 0, 11, 0, 3, 0, 133, 381, 0, 0, 2, 17, 0, 6, 64, 66, 0, …
## $ South <int> 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, …
## $ Slaves1860 <int> 435080, 0, 0, 111115, 0, 0, 0, 1798, 0, 61745, 462198, 0, 0, 0, 0, 0, 2,…
## $ Population1860 <int> 964201, 0, 0, 435450, 379994, 34277, 460147, 112216, 75080, 140424, 1057…
## $ PropSlaves1860 <dbl> 4.5e-01, 0.0e+00, 0.0e+00, 2.6e-01, 0.0e+00, 0.0e+00, 0.0e+00, 1.6e-02, …
## $ d <dbl> 1.6542053, 1.5443643, 0.6107159, 2.0935693, -0.9270579, 1.0500799, -1.64…
## $ m <dbl> 0.02264406, 1.54980162, 0.04897436, 1.65512283, -0.26698927, 0.89154405,…
## $ a <dbl> -0.6062895, -0.6866993, -0.2042408, -1.4103870, 0.5998567, -0.2846505, 1…
```

After a little data processing, we can make the upper left panel of Figure 5.4.

```
<-
p1 %>%
f ggplot(aes(x = a, y = m)) +
geom_point(size = 2, shape = 1, color = "firebrick4") +
geom_segment(aes(xend = a, yend = Estimate),
linewidth = 1/4) +
geom_line(aes(y = Estimate),
color = "firebrick4") +
geom_text_repel(data = . %>% filter(Loc %in% c("WY", "ND", "ME", "HI", "DC")),
aes(label = Loc),
size = 3, seed = 14) +
labs(x = "Age at marriage (std)",
y = "Marriage rate (std)") +
coord_cartesian(ylim = range(d$m)) +
theme_bw() +
theme(panel.grid = element_blank())
p1
```

We get the residuals with the well-named `residuals()`

function. Much like with `brms::fitted()`

, `brms::residuals()`

returns a four-vector matrix with the number of rows equal to the number of observations in the original data (by default, anyway). The vectors have the familiar names: `Estimate`

, `Est.Error`

, `Q2.5`

, and `Q97.5`

. See the **brms** reference manual (Bürkner, 2022i) for details.

With our residuals in hand, we just need a little more data processing to make lower left panel of Figure 5.4.

```
<-
r residuals(b5.4) %>%
# to use this in ggplot2, we need to make it a tibble or data frame
data.frame() %>%
bind_cols(d)
<-
p3 %>%
r ggplot(aes(x = Estimate, y = d)) +
stat_smooth(method = "lm", fullrange = T,
color = "firebrick4", fill = "firebrick4",
alpha = 1/5, linewidth = 1/2) +
geom_vline(xintercept = 0, linetype = 2, color = "grey50") +
geom_point(size = 2, color = "firebrick4", alpha = 2/3) +
geom_text_repel(data = . %>% filter(Loc %in% c("WY", "ND", "ME", "HI", "DC")),
aes(label = Loc),
size = 3, seed = 5) +
scale_x_continuous(limits = c(-2, 2)) +
coord_cartesian(xlim = range(r$Estimate)) +
labs(x = "Marriage rate residuals",
y = "Divorce rate (std)") +
theme_bw() +
theme(panel.grid = element_blank())
p3
```

To get the `MedianAgeMarriage_s`

residuals, we have to fit the corresponding model where `m`

predicts `a`

.

```
.4b <-
b5brm(data = d,
family = gaussian,
~ 1 + m,
a prior = c(prior(normal(0, 0.2), class = Intercept),
prior(normal(0, 0.5), class = b),
prior(exponential(1), class = sigma)),
iter = 2000, warmup = 1000, chains = 4, cores = 4,
seed = 5,
file = "fits/b05.04b")
```

With `b5.4b`

in hand, we’re ready to make the upper right panel of Figure 5.4.

```
<-
p2 fitted(b5.4b) %>%
data.frame() %>%
bind_cols(d) %>%
ggplot(aes(x = m, y = a)) +
geom_point(size = 2, shape = 1, color = "firebrick4") +
geom_segment(aes(xend = m, yend = Estimate),
linewidth = 1/4) +
geom_line(aes(y = Estimate),
color = "firebrick4") +
geom_text_repel(data = . %>% filter(Loc %in% c("DC", "HI", "ID")),
aes(label = Loc),
size = 3, seed = 5) +
labs(x = "Marriage rate (std)",
y = "Age at marriage (std)") +
coord_cartesian(ylim = range(d$a)) +
theme_bw() +
theme(panel.grid = element_blank())
p2
```

And now we’ll get the new batch of residuals, do a little data processing, and make a plot corresponding to the final panel of Figure 5.4.

```
<-
r residuals(b5.4b) %>%
data.frame() %>%
bind_cols(d)
<-
p4 %>%
r ggplot(aes(x = Estimate, y = d)) +
stat_smooth(method = "lm", fullrange = T,
color = "firebrick4", fill = "firebrick4",
alpha = 1/5, linewidth = 1/2) +
geom_vline(xintercept = 0, linetype = 2, color = "grey50") +
geom_point(size = 2, color = "firebrick4", alpha = 2/3) +
geom_text_repel(data = . %>% filter(Loc %in% c("ID", "HI", "DC")),
aes(label = Loc),
size = 3, seed = 5) +
scale_x_continuous(limits = c(-2, 3)) +
coord_cartesian(xlim = range(r$Estimate),
ylim = range(d$d)) +
labs(x = "Age at marriage residuals",
y = "Divorce rate (std)") +
theme_bw() +
theme(panel.grid = element_blank())
p4
```

Here we close out the section by combining our four subplots into one glorious whole with a little **patchwork** syntax.

`+ p2 + p3 + p4 + plot_annotation(title = "Understanding multiple regression through residuals") p1 `

##### 5.1.5.1.1 Rethinking: Residuals are parameters, not data.

There is a tradition, especially in parts of biology, of using residuals from one model as data in another model. For example, a biologist might regress brain size on body size and then use the brain size residuals as data in another model. This procedure is always a mistake. Residuals are not known. They are parameters, variables with unobserved values. Treating them as known values throws away uncertainty. (p. 137)

Let’s hammer this point home. Recall how `brms::residuals()`

returns four columns: `Estimate`

, `Est.Error`

, `Q2.5`

, and `Q97.5`

.

```
%>%
r glimpse()
```

```
## Rows: 50
## Columns: 20
## $ Estimate <dbl> -0.58903636, 0.38918106, -0.16873543, -0.26149791, 0.41633575, 0.3349249…
## $ Est.Error <dbl> 0.09007649, 0.18147491, 0.09015909, 0.19089755, 0.09437421, 0.12750315, …
## $ Q2.5 <dbl> -0.76631179, 0.03230000, -0.34731730, -0.63777776, 0.23116991, 0.0856600…
## $ Q97.5 <dbl> -0.413936585, 0.741616393, 0.004905611, 0.109295277, 0.599561771, 0.5784…
## $ Location <fct> Alabama, Alaska, Arizona, Arkansas, California, Colorado, Connecticut, D…
## $ Loc <fct> AL, AK, AZ, AR, CA, CO, CT, DE, DC, FL, GA, HI, ID, IL, IN, IA, KS, KY, …
## $ Population <dbl> 4.78, 0.71, 6.33, 2.92, 37.25, 5.03, 3.57, 0.90, 0.60, 18.80, 9.69, 1.36…
## $ MedianAgeMarriage <dbl> 25.3, 25.2, 25.8, 24.3, 26.8, 25.7, 27.6, 26.6, 29.7, 26.4, 25.9, 26.9, …
## $ Marriage <dbl> 20.2, 26.0, 20.3, 26.4, 19.1, 23.5, 17.1, 23.1, 17.7, 17.0, 22.1, 24.9, …
## $ Marriage.SE <dbl> 1.27, 2.93, 0.98, 1.70, 0.39, 1.24, 1.06, 2.89, 2.53, 0.58, 0.81, 2.54, …
## $ Divorce <dbl> 12.7, 12.5, 10.8, 13.5, 8.0, 11.6, 6.7, 8.9, 6.3, 8.5, 11.5, 8.3, 7.7, 8…
## $ Divorce.SE <dbl> 0.79, 2.05, 0.74, 1.22, 0.24, 0.94, 0.77, 1.39, 1.89, 0.32, 0.58, 1.27, …
## $ WaffleHouses <int> 128, 0, 18, 41, 0, 11, 0, 3, 0, 133, 381, 0, 0, 2, 17, 0, 6, 64, 66, 0, …
## $ South <int> 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, …
## $ Slaves1860 <int> 435080, 0, 0, 111115, 0, 0, 0, 1798, 0, 61745, 462198, 0, 0, 0, 0, 0, 2,…
## $ Population1860 <int> 964201, 0, 0, 435450, 379994, 34277, 460147, 112216, 75080, 140424, 1057…
## $ PropSlaves1860 <dbl> 4.5e-01, 0.0e+00, 0.0e+00, 2.6e-01, 0.0e+00, 0.0e+00, 0.0e+00, 1.6e-02, …
## $ d <dbl> 1.6542053, 1.5443643, 0.6107159, 2.0935693, -0.9270579, 1.0500799, -1.64…
## $ m <dbl> 0.02264406, 1.54980162, 0.04897436, 1.65512283, -0.26698927, 0.89154405,…
## $ a <dbl> -0.6062895, -0.6866993, -0.2042408, -1.4103870, 0.5998567, -0.2846505, 1…
```

In the residual plots from the lower two panels of Figure 5.4, we focused on the means of the residuals (i.e., `Estimate`

). However, we can express the uncertainty in the residuals by including error bars for the 95% intervals. Here’s what that might look like with a slight reworking of the lower right panel of Figure 5.4.

```
%>%
r ggplot(aes(x = Estimate, y = d)) +
stat_smooth(method = "lm", fullrange = T,
color = "firebrick4", fill = "firebrick4",
alpha = 1/5, linewidth = 1/2) +
geom_vline(xintercept = 0, linetype = 2, color = "grey50") +
# the only change is here
geom_pointrange(aes(xmin = Q2.5, xmax = Q97.5),
color = "firebrick4", alpha = 2/3) +
geom_text_repel(data = . %>% filter(Loc %in% c("ID", "HI", "DC")),
aes(label = Loc),
size = 3, seed = 5) +
scale_x_continuous(limits = c(-2, 3)) +
coord_cartesian(xlim = range(r$Estimate),
ylim = range(d$d)) +
labs(x = "Age at marriage residuals",
y = "Divorce rate (std)") +
theme_bw() +
theme(panel.grid = element_blank())
```

Look at that. If you were to fit a follow-up model based on only the point estimates (posterior means) of those residuals, you’d be ignoring a lot of uncertainty. For more on the topic of residuals, see Freckleton (2002), *On the misuse of residuals in ecology: regression of residuals vs. multiple regression*.

#### 5.1.5.2 Posterior prediction plots.

“It’s important to check the model’s implied predictions against the observed data” (p. 137). For more on the topic, check out Gabry and colleagues’ (2019) *Visualization in Bayesian workflow* or Simpson’s related blog post, *Touch me, I want to feel your data*.

The code below will make our version of Figure 5.5.

```
fitted(b5.3) %>%
data.frame() %>%
# un-standardize the model predictions
mutate_all(~. * sd(d$Divorce) + mean(d$Divorce)) %>%
bind_cols(d) %>%
ggplot(aes(x = Divorce, y = Estimate)) +
geom_abline(linetype = 2, color = "grey50", linewidth = 0.5) +
geom_point(size = 1.5, color = "firebrick4", alpha = 3/4) +
geom_linerange(aes(ymin = Q2.5, ymax = Q97.5),
linewidth = 1/4, color = "firebrick4") +
geom_text(data = . %>% filter(Loc %in% c("ID", "UT", "RI", "ME")),
aes(label = Loc),
hjust = 1, nudge_x = - 0.25) +
labs(x = "Observed divorce", y = "Predicted divorce") +
theme_bw() +
theme(panel.grid = element_blank())
```

It’s easy to see from this arrangement of the simulations that the model under-predicts for States with very high divorce rates while it over-predicts for States with very low divorce rates. That’s normal. This is what regression does–it is skeptical of extreme values, so it expects regression towards the mean. But beyond this general regression to the mean, some States are very frustrating to the model, lying very far from the diagonal. (p. 139)

##### 5.1.5.2.1 Rethinking: Stats, huh, yeah what is it good for?

Often people want statistical modeling to do things that statistical modeling cannot do. For example, we’d like to know whether an effect is “real” or rather spurious. Unfortunately, modeling merely quantifies uncertainty in the precise way that the model understands the problem. Usually answers to large world questions about truth and causation depend upon information not included in the model. For example, any observed correlation between an outcome and predictor could be eliminated or reversed once another predictor is added to the model. But if we cannot think of the right variable, we might never notice. Therefore all statistical models are vulnerable to and demand critique, regardless of the precision of their estimates and apparent accuracy of their predictions. (p. 139)

##### 5.1.5.2.2 Overthinking: Simulating spurious association.

```
<- 100 # number of cases
n
set.seed(5) # setting the seed makes the results reproducible
<-
d_spur tibble(x_real = rnorm(n), # x_real as Gaussian with mean 0 and SD 1 (i.e., the defaults)
x_spur = rnorm(n, x_real), # x_spur as Gaussian with mean = x_real
y = rnorm(n, x_real)) # y as Gaussian with mean = x_real
```

Here are the quick `pairs()`

plots.

`pairs(d_spur, col = "firebrick4")`

We may as well fit and evaluate a model.

```
.0_spur <-
b5brm(data = d_spur,
family = gaussian,
~ 1 + x_real + x_spur,
y prior = c(prior(normal(0, 0.2), class = Intercept),
prior(normal(0, 0.5), class = b),
prior(exponential(1), class = sigma)),
iter = 2000, warmup = 1000, chains = 4, cores = 4,
seed = 5,
file = "fits/b05.00_spur")
```

```
fixef(b5.0_spur) %>%
round(digits = 2)
```

```
## Estimate Est.Error Q2.5 Q97.5
## Intercept -0.01 0.09 -0.18 0.16
## x_real 0.93 0.15 0.65 1.22
## x_spur 0.08 0.09 -0.10 0.26
```

If we let “r” stand for `x_rel`

and “s” stand for `x_spur`

, here’s how we might depict that our simulation in a DAG.

```
<-
dag_coords tibble(name = c("r", "s", "y"),
x = c(1, 3, 2),
y = c(2, 2, 1))
dagify(s ~ r,
~ r,
y coords = dag_coords) %>%
ggplot(aes(x = x, y = y, xend = xend, yend = yend)) +
geom_dag_point(color = "firebrick", alpha = 1/4, size = 10) +
geom_dag_text(color = "firebrick") +
geom_dag_edges(edge_color = "firebrick") +
scale_x_continuous(NULL, breaks = NULL, expand = c(0.1, 0.1)) +
scale_y_continuous(NULL, breaks = NULL, expand = c(0.2, 0.2)) +
theme_bw() +
theme(panel.grid = element_blank())
```

#### 5.1.5.3 Counterfactual plots.

A second sort of inferential plot displays the causal implications of the model. I call these plots

counterfactual, because they can be produced for any values of the predictor variables you like, even unobserved combinations like very high median age of marriage and very high marriage rate. There are no States with this combination, but in a counterfactual plot, you can ask the model for a prediction for such a State. (p. 140,emphasisin the original)

Take another look at one of the DAGs from back in Section 5.1.2.

```
<-
dag_coords tibble(name = c("A", "M", "D"),
x = c(1, 3, 2),
y = c(2, 2, 1))
dagify(M ~ A,
~ A + M,
D coords = dag_coords) %>%
ggplot(aes(x = x, y = y, xend = xend, yend = yend)) +
geom_dag_point(color = "firebrick", alpha = 1/4, size = 10) +
geom_dag_text(color = "firebrick") +
geom_dag_edges(edge_color = "firebrick") +
scale_x_continuous(NULL, breaks = NULL, expand = c(0.1, 0.1)) +
scale_y_continuous(NULL, breaks = NULL, expand = c(0.2, 0.2)) +
theme_bw() +
theme(panel.grid = element_blank())
```

The full statistical model implied in this DAG requires we have two criterion variables, \(D\) and \(M\). To simultaneously model the effects of \(A\) on \(M\) and \(D\) AND the effects of \(A\) on \(M\) with **brms**, we’ll need to invoke the multivariate syntax. There are several ways to do this with **brms**, which Bürkner outlines in his (2022d) vignette, *Estimating multivariate models with brms*. At this point, it’s important to recognize we have two regression models. As a first step, we might specify each model separately in a `bf()`

function and save them as objects.

```
<- bf(d ~ 1 + a + m)
d_model <- bf(m ~ 1 + a) m_model
```

Next we will combine our `bf()`

objects with the `+`

operator within the `brm()`

function. For a model like this, we also specify `set_rescor(FALSE)`

to prevent **brms** from adding a residual correlation between `d`

and `m`

. Also, notice how each prior statement includes a `resp`

argument. This clarifies which sub-model the prior refers to.

```
.3_A <-
b5brm(data = d,
family = gaussian,
+ m_model + set_rescor(FALSE),
d_model prior = c(prior(normal(0, 0.2), class = Intercept, resp = d),
prior(normal(0, 0.5), class = b, resp = d),
prior(exponential(1), class = sigma, resp = d),
prior(normal(0, 0.2), class = Intercept, resp = m),
prior(normal(0, 0.5), class = b, resp = m),
prior(exponential(1), class = sigma, resp = m)),
iter = 2000, warmup = 1000, chains = 4, cores = 4,
seed = 5,
file = "fits/b05.03_A")
```

Look at the summary.

`print(b5.3_A)`

```
## Family: MV(gaussian, gaussian)
## Links: mu = identity; sigma = identity
## mu = identity; sigma = identity
## Formula: d ~ 1 + a + m
## m ~ 1 + a
## Data: d (Number of observations: 50)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## d_Intercept 0.00 0.10 -0.20 0.20 1.00 5428 2829
## m_Intercept -0.00 0.09 -0.18 0.18 1.00 5395 3085
## d_a -0.61 0.15 -0.91 -0.30 1.00 3712 3233
## d_m -0.06 0.16 -0.37 0.26 1.00 3514 2872
## m_a -0.69 0.10 -0.90 -0.49 1.00 5957 2665
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma_d 0.82 0.08 0.68 1.01 1.00 5243 3246
## sigma_m 0.71 0.08 0.58 0.88 1.00 4792 3021
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
```

Note our parameters now all have either a `d_`

or an `m_`

prefix to help clarify which sub-model they were for. The `m_a`

row shows how strongly and negatively associated `a`

is to `m`

. Here’s how we might use `predict()`

to make our version of the counterfactual plot in the left panel of Figure 5.6.

```
<- tibble(a = seq(from = -2, to = 2, length.out = 30),
nd m = 0)
<-
p1 predict(b5.3_A,
resp = "d",
newdata = nd) %>%
data.frame() %>%
bind_cols(nd) %>%
ggplot(aes(x = a, y = Estimate, ymin = Q2.5, ymax = Q97.5)) +
geom_smooth(stat = "identity",
fill = "firebrick", color = "firebrick4", alpha = 1/5, linewidth = 1/4) +
labs(subtitle = "Total counterfactual effect of A on D",
x = "manipulated A",
y = "counterfactual D") +
coord_cartesian(ylim = c(-2, 2)) +
theme_bw() +
theme(panel.grid = element_blank())
p1
```

Because the plot is based on a multivariate model, we used the `resp`

argument within `predict()`

to tell **brms** which of our two criterion variables (`d`

or `m`

) we were interested in. Unlike McElreath’s **R** code 5.20, we included predictor values for both `a`

and `m`

. This is because **brms** requires we provide values for all predictors in a model when using `predict()`

. Even though we set all the `m`

values to 0 for the counterfactual, it was necessary to tell `predict()`

that’s exactly what we wanted.

Let’s do that all again, this time making the counterfactual for `d`

. While we’re at it, we’ll combine this subplot with the last one to make the full version of Figure 5.6.

```
<- tibble(a = seq(from = -2, to = 2, length.out = 30))
nd
<-
p2 predict(b5.3_A,
resp = "m",
newdata = nd) %>%
data.frame() %>%
bind_cols(nd) %>%
ggplot(aes(x = a, y = Estimate, ymin = Q2.5, ymax = Q97.5)) +
geom_smooth(stat = "identity",
fill = "firebrick", color = "firebrick4", alpha = 1/5, linewidth = 1/4) +
labs(subtitle = "Counterfactual effect of A on M",
x = "manipulated A",
y = "counterfactual M") +
coord_cartesian(ylim = c(-2, 2)) +
theme_bw() +
theme(panel.grid = element_blank())
+ p2 + plot_annotation(title = "Counterfactual plots for the multivariate divorce model") p1
```

With our **brms** + **tidyverse** paradigm, we might compute “the expected causal effect of increasing median age at marriage from 20 to 30” (p. 142) like this.

```
# new data frame, standardized to mean 26.1 and std dev 1.24
<- tibble(a = (c(20, 30) - 26.1) / 1.24,
nd m = 0)
predict(b5.3_A,
resp = "d",
newdata = nd,
summary = F) %>%
data.frame() %>%
set_names("a20", "a30") %>%
mutate(difference = a30 - a20) %>%
summarise(mean = mean(difference))
```

```
## mean
## 1 -4.899465
```

The trick with simulating counterfactuals is to realize that when we manipulate some variable \(X\), we break the causal influence of other variables on \(X\). This is the same as saying we modify the DAG so that no arrows enter \(X\). Suppose for example that we now simulate the effect of manipulating \(M.\) (p. 143)

Here’s how to plot that DAG.

```
<-
dag_coords tibble(name = c("A", "M", "D"),
x = c(1, 3, 2),
y = c(2, 2, 1))
dagify(D ~ A + M,
coords = dag_coords) %>%
ggplot(aes(x = x, y = y, xend = xend, yend = yend)) +
geom_dag_point(color = "firebrick", alpha = 1/4, size = 10) +
geom_dag_text(color = "firebrick") +
geom_dag_edges(edge_color = "firebrick") +
scale_x_continuous(NULL, breaks = NULL, expand = c(0.1, 0.1)) +
scale_y_continuous(NULL, breaks = NULL, expand = c(0.2, 0.2)) +
theme_bw() +
theme(panel.grid = element_blank())
```

Here’s the new counterfactual plot focusing on \(M \rightarrow D\), holding \(A = 0\), Figure 5.7.

```
<- tibble(m = seq(from = -2, to = 2, length.out = 30),
nd a = 0)
predict(b5.3_A,
resp = "d",
newdata = nd) %>%
data.frame() %>%
bind_cols(nd) %>%
ggplot(aes(x = m, y = Estimate, ymin = Q2.5, ymax = Q97.5)) +
geom_smooth(stat = "identity",
fill = "firebrick", color = "firebrick4", alpha = 1/5, linewidth = 1/4) +
labs(subtitle = "Total counterfactual effect of M on D",
x = "manipulated M",
y = "counterfactual D") +
coord_cartesian(ylim = c(-2, 2)) +
theme_bw() +
theme(panel.grid = element_blank())
```

##### 5.1.5.3.1 Overthinking: Simulating counterfactuals.

Just like McElreath showed how to compute the counterfactuals without his `sim()`

function, we can make ours without `brms::predict()`

. First we’ll start out extracting the posterior draws.

```
<-
post as_draws_df(b5.3_A)
```

Here we use `expand_grid()`

to elongate the output from above by a factor of thirty, each time corresponding to one of the levels of `a = seq(from = -2, to = 2, length.out = 30)`

. In the two `mutate()`

lines that follow, we plug the model formulas into the `rnorm()`

function to take random draws from posterior predictive distribution. The rest is just wrangling and summarizing.

```
set.seed(5)
<-
post %>%
post expand_grid(a = seq(from = -2, to = 2, length.out = 30)) %>%
mutate(m_sim = rnorm(n(), mean = b_m_Intercept + b_m_a * a, sd = sigma_m)) %>%
mutate(d_sim = rnorm(n(), mean = b_d_Intercept + b_d_a * a + b_d_m * m_sim, sd = sigma_d)) %>%
pivot_longer(ends_with("sim")) %>%
group_by(a, name) %>%
summarise(mean = mean(value),
ll = quantile(value, prob = .025),
ul = quantile(value, prob = .975))
# what did we do?
head(post)
```

```
## # A tibble: 6 × 5
## # Groups: a [3]
## a name mean ll ul
## <dbl> <chr> <dbl> <dbl> <dbl>
## 1 -2 d_sim 1.14 -0.575 2.87
## 2 -2 m_sim 1.38 -0.0700 2.86
## 3 -1.86 d_sim 1.04 -0.665 2.68
## 4 -1.86 m_sim 1.28 -0.201 2.79
## 5 -1.72 d_sim 0.978 -0.735 2.65
## 6 -1.72 m_sim 1.17 -0.249 2.65
```

Now we plot.

```
%>%
post mutate(dv = if_else(name == "d_sim", "predictions for D", "predictions for M")) %>%
ggplot(aes(x = a, y = mean, ymin = ll, ymax = ul)) +
geom_smooth(stat = "identity",
fill = "firebrick", color = "firebrick4", alpha = 1/5, linewidth = 1/4) +
labs(title = "Hand-made counterfactual plots for the multivariate divorce model",
x = "manipulated A",
y = "counterfactual") +
coord_cartesian(ylim = c(-2, 2)) +
theme_bw() +
theme(panel.grid = element_blank(),
strip.background = element_blank()) +
facet_wrap(~ dv)
```

## 5.2 Masked relationship

A second reason to use more than one predictor variable is to measure the direct influences of multiple factors on an outcome, when none of those influences is apparent from bivariate relationships. This kind of problem tends to arise when there are two predictor variables that are correlated with one another. However, one of these is positively correlated with the outcome and the other is negatively correlated with it. (p. 144)

Let’s load the Hinde & Milligan (2011) `milk`

data.

```
data(milk, package = "rethinking")
<- milk
d rm(milk)
glimpse(d)
```

```
## Rows: 29
## Columns: 8
## $ clade <fct> Strepsirrhine, Strepsirrhine, Strepsirrhine, Strepsirrhine, Strepsirrhine, …
## $ species <fct> Eulemur fulvus, E macaco, E mongoz, E rubriventer, Lemur catta, Alouatta se…
## $ kcal.per.g <dbl> 0.49, 0.51, 0.46, 0.48, 0.60, 0.47, 0.56, 0.89, 0.91, 0.92, 0.80, 0.46, 0.7…
## $ perc.fat <dbl> 16.60, 19.27, 14.11, 14.91, 27.28, 21.22, 29.66, 53.41, 46.08, 50.58, 41.35…
## $ perc.protein <dbl> 15.42, 16.91, 16.85, 13.18, 19.50, 23.58, 23.46, 15.80, 23.34, 22.33, 20.85…
## $ perc.lactose <dbl> 67.98, 63.82, 69.04, 71.91, 53.22, 55.20, 46.88, 30.79, 30.58, 27.09, 37.80…
## $ mass <dbl> 1.95, 2.09, 2.51, 1.62, 2.19, 5.25, 5.37, 2.51, 0.71, 0.68, 0.12, 0.47, 0.3…
## $ neocortex.perc <dbl> 55.16, NA, NA, NA, NA, 64.54, 64.54, 67.64, NA, 68.85, 58.85, 61.69, 60.32,…
```

You might inspect the primary variables in the data with the `pairs()`

function.

```
%>%
d select(kcal.per.g, mass, neocortex.perc) %>%
pairs(col = "firebrick4")
```

By just looking at that mess, do you think you could describe the associations of `mass`

and `neocortex.perc`

with the criterion, `kcal.per.g`

? I couldn’t. It’s a good thing we have math.

Let’s standardize our variables by hand.

```
<-
d %>%
d mutate(kcal.per.g_s = (kcal.per.g - mean(kcal.per.g)) / sd(kcal.per.g),
log_mass_s = (log(mass) - mean(log(mass))) / sd(log(mass)),
neocortex.perc_s = (neocortex.perc - mean(neocortex.perc, na.rm = T)) / sd(neocortex.perc, na.rm = T))
```

McElreath has us starting off our first `milk`

model with more permissive priors than we’ve used in the past. Although we should note that from a historical perspective, these priors are pretty informative. Times keep changing.

```
.5_draft <-
b5brm(data = d,
family = gaussian,
~ 1 + neocortex.perc_s,
kcal.per.g_s prior = c(prior(normal(0, 1), class = Intercept),
prior(normal(0, 1), class = b),
prior(exponential(1), class = sigma)),
iter = 2000, warmup = 1000, chains = 4, cores = 4,
seed = 5,
sample_prior = T,
file = "fits/b05.05_draft")
```

Similar to the **rethinking** example in the text, **brms** warned that “Rows containing NAs were excluded from the model.” This isn’t necessarily a problem; the model fit just fine. But we should be ashamed of ourselves and look eagerly forward to Chapter 15 where we’ll learn how to do better.

To compliment how McElreath removed cases with missing values on our variables of interest with base **R** `complete.cases()`

, here we’ll do so with `tidyr::drop_na()`

and a little help with `ends_with()`

.

```
<-
dcc %>%
d drop_na(ends_with("_s"))
# how many rows did we drop?
nrow(d) - nrow(dcc)
```

`## [1] 12`

We’ll use `update()`

to refit the model with the altered data.

```
.5_draft <-
b5update(b5.5_draft,
newdata = dcc,
seed = 5)
```

“Before considering the posterior predictions, let’s consider those priors. As in many simple linear regression problems, these priors are harmless. But are they reasonable?” (p. 146). Let’s find out with our version of Figure 5.8.a.

```
set.seed(5)
prior_draws(b5.5_draft) %>%
slice_sample(n = 50) %>%
rownames_to_column() %>%
expand_grid(neocortex.perc_s = c(-2, 2)) %>%
mutate(kcal.per.g_s = Intercept + b * neocortex.perc_s) %>%
ggplot(aes(x = neocortex.perc_s, y = kcal.per.g_s)) +
geom_line(aes(group = rowname),
color = "firebrick", alpha = .4) +
coord_cartesian(ylim = c(-2, 2)) +
labs(x = "neocortex percent (std)",
y = "kilocal per g (std)",
subtitle = "Intercept ~ dnorm(0, 1)\nb ~ dnorm(0, 1)") +
theme_bw() +
theme(panel.grid = element_blank())
```

That’s a mess. How’d the posterior turn out?

`print(b5.5_draft)`

```
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: kcal.per.g_s ~ 1 + neocortex.perc_s
## Data: dcc (Number of observations: 17)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 0.10 0.27 -0.44 0.63 1.00 3333 2518
## neocortex.perc_s 0.16 0.27 -0.40 0.71 1.00 3076 2077
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 1.14 0.21 0.81 1.61 1.00 2953 2747
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
```

Let’s tighten up our priors and fit `b5.5`

.

```
.5 <-
b5brm(data = dcc,
family = gaussian,
~ 1 + neocortex.perc_s,
kcal.per.g_s prior = c(prior(normal(0, 0.2), class = Intercept),
prior(normal(0, 0.5), class = b),
prior(exponential(1), class = sigma)),
iter = 2000, warmup = 1000, chains = 4, cores = 4,
seed = 5,
sample_prior = T,
file = "fits/b05.05")
```

Now make our version of Figure 5.8.b.

```
set.seed(5)
prior_draws(b5.5) %>%
slice_sample(n = 50) %>%
rownames_to_column() %>%
expand_grid(neocortex.perc_s = c(-2, 2)) %>%
mutate(kcal.per.g_s = Intercept + b * neocortex.perc_s) %>%
ggplot(aes(x = neocortex.perc_s, y = kcal.per.g_s, group = rowname)) +
geom_line(color = "firebrick", alpha = .4) +
coord_cartesian(ylim = c(-2, 2)) +
labs(subtitle = "Intercept ~ dnorm(0, 0.2)\nb ~ dnorm(0, 0.5)",
x = "neocortex percent (std)",
y = "kilocal per g (std)") +
theme_bw() +
theme(panel.grid = element_blank())
```

Look at the posterior summary.

`print(b5.5)`

```
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: kcal.per.g_s ~ 1 + neocortex.perc_s
## Data: dcc (Number of observations: 17)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 0.04 0.16 -0.28 0.34 1.00 3569 2651
## neocortex.perc_s 0.12 0.24 -0.36 0.58 1.00 3028 2556
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 1.11 0.20 0.80 1.58 1.00 2917 2283
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
```

The results are very similar to those returned earlier from `print(b5.5_draft)`

. It’s not in the text, but let’s compare the parameter estimates between the two models with another version of our homemade `coeftab()`

plot.

```
# wrangle
bind_rows(
as_draws_df(b5.5_draft) %>% select(b_Intercept:sigma),
as_draws_df(b5.5) %>% select(b_Intercept:sigma)
%>%
) mutate(fit = rep(c("b5.5_draft", "b5.5"), each = n() / 2)) %>%
pivot_longer(-fit) %>%
group_by(name, fit) %>%
summarise(mean = mean(value),
ll = quantile(value, prob = .025),
ul = quantile(value, prob = .975)) %>%
mutate(fit = factor(fit, levels = c("b5.5_draft", "b5.5"))) %>%
# plot
ggplot(aes(x = mean, y = fit, xmin = ll, xmax = ul)) +
geom_pointrange(color = "firebrick") +
geom_hline(yintercept = 0, color = "firebrick", alpha = 1/5) +
labs(x = "posterior",
y = NULL) +
theme_bw() +
theme(axis.text.y = element_text(hjust = 0),
axis.ticks.y = element_blank(),
panel.grid = element_blank(),
strip.background = element_blank()) +
facet_wrap(~ name, ncol = 1)
```

The results were quite similar, but the posteriors from `b5.5`

are more precise. Let’s get back on track with the text and make the top left panel of Figure 5.9. Just for kicks, we’ll superimpose 50% intervals atop 95% intervals for the next few plots. Here’s Figure 5.9, top left.

```
<- tibble(neocortex.perc_s = seq(from = -2.5, to = 2, length.out = 30))
nd
fitted(b5.5,
newdata = nd,
probs = c(.025, .975, .25, .75)) %>%
data.frame() %>%
bind_cols(nd) %>%
ggplot(aes(x = neocortex.perc_s, y = Estimate)) +
geom_ribbon(aes(ymin = Q2.5, ymax = Q97.5),
fill = "firebrick", alpha = 1/5) +
geom_smooth(aes(ymin = Q25, ymax = Q75),
stat = "identity",
fill = "firebrick4", color = "firebrick4", alpha = 1/5, linewidth = 1/2) +
geom_point(data = dcc,
aes(x = neocortex.perc_s, y = kcal.per.g_s),
size = 2, color = "firebrick4") +
coord_cartesian(xlim = range(dcc$neocortex.perc_s),
ylim = range(dcc$kcal.per.g_s)) +
labs(x = "neocortex percent (std)",
y = "kilocal per g (std)") +
theme_bw() +
theme(panel.grid = element_blank())
```

Do note the `probs`

argument in the `fitted()`

code, above.

Now we use `log_mass_s`

as the new sole predictor.

```
.6 <-
b5brm(data = dcc,
family = gaussian,
~ 1 + log_mass_s,
kcal.per.g_s prior = c(prior(normal(0, 0.2), class = Intercept),
prior(normal(0, 0.5), class = b),
prior(exponential(1), class = sigma)),
iter = 2000, warmup = 1000, chains = 4, cores = 4,
seed = 5,
sample_prior = T,
file = "fits/b05.06")
```

`print(b5.6)`

```
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: kcal.per.g_s ~ 1 + log_mass_s
## Data: dcc (Number of observations: 17)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 0.05 0.16 -0.26 0.36 1.00 3320 2765
## log_mass_s -0.28 0.21 -0.69 0.13 1.00 3285 2586
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 1.06 0.19 0.75 1.52 1.00 3388 2359
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
```

Make Figure 5.9, top right.

```
<- tibble(log_mass_s = seq(from = -2.5, to = 2.5, length.out = 30))
nd
fitted(b5.6,
newdata = nd,
probs = c(.025, .975, .25, .75)) %>%
data.frame() %>%
bind_cols(nd) %>%
ggplot(aes(x = log_mass_s, y = Estimate)) +
geom_ribbon(aes(ymin = Q2.5, ymax = Q97.5),
fill = "firebrick", alpha = 1/5) +
geom_smooth(aes(ymin = Q25, ymax = Q75),
stat = "identity",
fill = "firebrick4", color = "firebrick4", alpha = 1/5, linewidth = 1/2) +
geom_point(data = dcc,
aes(y = kcal.per.g_s),
size = 2, color = "firebrick4") +
coord_cartesian(xlim = range(dcc$log_mass_s),
ylim = range(dcc$kcal.per.g_s)) +
labs(x = "log body mass (std)",
y = "kilocal per g (std)") +
theme_bw() +
theme(panel.grid = element_blank())
```

Finally, we’re ready to fit with both predictors included in a multivariable model. The statistical formula is

\[\begin{align*} \text{kcal.per.g_s}_i & \sim \operatorname{Normal}(\mu_i, \sigma) \\ \mu_i & = \alpha + \beta_1 \text{neocortex.perc_s}_i + \beta_2 \text{log_mass_s} \\ \alpha & \sim \operatorname{Normal}(0, 0.2) \\ \beta_1 & \sim \operatorname{Normal}(0, 0.5) \\ \beta_2 & \sim \operatorname{Normal}(0, 0.5) \\ \sigma & \sim \operatorname{Exponential}(1). \end{align*}\]

Fit the model.

```
.7 <-
b5brm(data = dcc,
family = gaussian,
~ 1 + neocortex.perc_s + log_mass_s,
kcal.per.g_s prior = c(prior(normal(0, 0.2), class = Intercept),
prior(normal(0, 0.5), class = b),
prior(exponential(1), class = sigma)),
iter = 2000, warmup = 1000, chains = 4, cores = 4,
seed = 5,
file = "fits/b05.07")
```

`print(b5.7)`

```
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: kcal.per.g_s ~ 1 + neocortex.perc_s + log_mass_s
## Data: dcc (Number of observations: 17)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 0.06 0.15 -0.23 0.34 1.00 3250 2504
## neocortex.perc_s 0.60 0.29 -0.00 1.12 1.00 2389 2369
## log_mass_s -0.64 0.25 -1.10 -0.10 1.00 2315 2397
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 0.87 0.18 0.59 1.29 1.00 2587 2564
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
```

Once again, let’s roll out our homemade `coefplot()`

plot code.

```
bind_cols(
as_draws_df(b5.5) %>%
transmute(`b5.5_beta[N]` = b_neocortex.perc_s),
as_draws_df(b5.6) %>%
transmute(`b5.6_beta[M]` = b_log_mass_s),
as_draws_df(b5.7) %>%
transmute(`b5.7_beta[N]` = b_neocortex.perc_s,
`b5.7_beta[M]` = b_log_mass_s)
%>%
) pivot_longer(everything()) %>%
group_by(name) %>%
summarise(mean = mean(value),
ll = quantile(value, prob = .025),
ul = quantile(value, prob = .975)) %>%
separate(name, into = c("fit", "parameter"), sep = "_") %>%
# complete(fit, parameter) %>%
ggplot(aes(x = mean, y = fit, xmin = ll, xmax = ul)) +
geom_pointrange(color = "firebrick") +
geom_hline(yintercept = 0, color = "firebrick", alpha = 1/5) +
ylab(NULL) +
theme_bw() +
theme(panel.grid = element_blank(),
strip.background = element_rect(fill = "transparent", color = "transparent")) +
facet_wrap(~ parameter, ncol = 1, labeller = label_parsed)
```

On page 151, McElreath suggested we look at a pairs plot to get a sense of the zero-order correlations. We did that once with the raw data. Here it is, again, but with the transformed variables.

```
%>%
dcc select(ends_with("_s")) %>%
pairs(col = "firebrick4")
```

Have you noticed how un-**tidyverse**-like those `pairs()`

plots are? I have. Within the **tidyverse**, you can make custom pairs plots with the **GGally** package (Schloerke et al., 2021), which will also compute the point estimates for the bivariate correlations. Here’s a default-style plot.

```
library(GGally)
%>%
dcc select(ends_with("_s")) %>%
ggpairs()
```

But you can customize these, too. E.g.,

```
# define custom functions
<- function(data, mapping, ...) {
my_diag ggplot(data = data, mapping = mapping) +
geom_density(fill = "firebrick4", linewidth = 0)
}
<- function(data, mapping, ...) {
my_lower ggplot(data = data, mapping = mapping) +
geom_smooth(method = "lm", color = "firebrick4", linewidth = 1,
se = F) +
geom_point(color = "firebrick", alpha = .8, size = 1/3)
}
# plot
%>%
dcc select(ends_with("_s")) %>%
ggpairs(upper = list(continuous = wrap("cor", family = "sans", color = "black")),
# plug those custom functions into `ggpairs()`
diag = list(continuous = my_diag),
lower = list(continuous = my_lower)) +
theme_bw() +
theme(axis.text = element_blank(),
axis.ticks = element_blank(),
panel.grid = element_blank(),
strip.background = element_rect(fill = "white", color = "white"))
```

What the regression model does is ask if species that have high neocortex percent

for their body masshave higher milk energy. Likewise, the model asks if species with high body massfor their neocortex percenthave higher milk energy. Bigger species, like apes, have milk with less energy. But species with more neocortex tend to have richer milk. The fact that these two variables, body size and neocortex, are correlated across species makes it hard to see these relationships, unless we account for both.Some DAGs will help. (p. 148,

emphasisin the original)

Here are three. I’m not aware we can facet `dagify()`

objects. But we can take cues from Chapter 4 to link our three DAGs like McElreath did his. first, we’ll recognize the **ggplot2** code will be nearly identical for each DAG. So we can just wrap the **ggplot2** code into a compact function, like so.

```
<- function(d) {
gg_dag
%>%
d ggplot(aes(x = x, y = y, xend = xend, yend = yend)) +
geom_dag_point(color = "firebrick", alpha = 1/4, size = 10) +
geom_dag_text(color = "firebrick") +
geom_dag_edges(edge_color = "firebrick") +
scale_x_continuous(NULL, breaks = NULL, expand = c(0.1, 0.1)) +
scale_y_continuous(NULL, breaks = NULL, expand = c(0.2, 0.2)) +
theme_bw() +
theme(panel.grid = element_blank())
}
```

Now we’ll make the three individual DAGs, saving each.

```
# left DAG
<-
dag_coords tibble(name = c("M", "N", "K"),
x = c(1, 3, 2),
y = c(2, 2, 1))
<-
p1 dagify(N ~ M,
~ M + N,
K coords = dag_coords) %>%
gg_dag()
# middle DAG
<-
p2 dagify(M ~ N,
~ M + N,
K coords = dag_coords) %>%
gg_dag()
# right DAG
<-
dag_coords tibble(name = c("M", "N", "K", "U"),
x = c(1, 3, 2, 2),
y = c(2, 2, 1, 2))
<-
p3 dagify(M ~ U,
~ U,
N ~ M + N,
K coords = dag_coords) %>%
gg_dag() +
geom_point(x = 2, y = 2,
shape = 1, size = 10, stroke = 1.25, color = "firebrick4")
```

Now we combine our `gg_dag()`

plots together with **patchwork** syntax.

`+ p2 + p3 p1 `

Which of these graphs is right? We can’t tell from the data alone, because these graphs imply the same set of

conditional independencies. In this case, there are no conditional independencies–each DAG above implies that all pairs of variables are associated, regardless of what we condition on. A set of DAGs with the same conditional independencies is known as aMarkov equivalenceset. (p. 151,emphasisin the original).

Let’s make the counterfactual plots at the bottom of Figure 5.9. Here’s the one on the left.

```
<- tibble(neocortex.perc_s = seq(from = -2.5, to = 2, length.out = 30),
nd log_mass_s = 0)
<-
p1 fitted(b5.7,
newdata = nd,
probs = c(.025, .975, .25, .75)) %>%
data.frame() %>%
bind_cols(nd) %>%
ggplot(aes(x = neocortex.perc_s, y = Estimate)) +
geom_ribbon(aes(ymin = Q2.5, ymax = Q97.5),
fill = "firebrick", alpha = 1/5) +
geom_smooth(aes(ymin = Q25, ymax = Q75),
stat = "identity",
fill = "firebrick4", color = "firebrick4", alpha = 1/5, linewidth = 1/2) +
coord_cartesian(xlim = range(dcc$neocortex.perc_s),
ylim = range(dcc$kcal.per.g_s)) +
labs(subtitle = "Counterfactual holding M = 0",
x = "neocortex percent (std)",
y = "kilocal per g (std)")
```

Now make Figure 5.9, bottom right, and combine the two.

```
<- tibble(log_mass_s = seq(from = -2.5, to = 2.5, length.out = 30),
nd neocortex.perc_s = 0)
<-
p2 fitted(b5.7,
newdata = nd,
probs = c(.025, .975, .25, .75)) %>%
data.frame() %>%
bind_cols(nd) %>%
ggplot(aes(x = log_mass_s, y = Estimate)) +
geom_ribbon(aes(ymin = Q2.5, ymax = Q97.5),
fill = "firebrick", alpha = 1/5) +
geom_smooth(aes(ymin = Q25, ymax = Q75),
stat = "identity",
fill = "firebrick4", color = "firebrick4", alpha = 1/5, linewidth = 1/2) +
coord_cartesian(xlim = range(dcc$log_mass_s),
ylim = range(dcc$kcal.per.g_s)) +
labs(subtitle = "Counterfactual holding N = 0",
x = "log body mass (std)",
y = "kilocal per g (std)")
# combine
+ p2 +
p1 plot_annotation(title = "Figure 5.9 [bottom row]. Milk energy and neocortex among primates.") &
theme_bw() &
theme(panel.grid = element_blank())
```

#### 5.2.0.1 Overthinking: Simulating a masking relationship.

As a refresher, here’s our focal DAG.

```
<-
dag_coords tibble(name = c("M", "N", "K"),
x = c(1, 3, 2),
y = c(2, 2, 1))
dagify(N ~ M,
~ M + N,
K coords = dag_coords) %>%
gg_dag()
```

Now simulate data consistent with that DAG.

```
# how many cases would you like?
<- 100
n
set.seed(5)
<-
d_sim tibble(m = rnorm(n, mean = 0, sd = 1)) %>%
mutate(n = rnorm(n, mean = m, sd = 1)) %>%
mutate(k = rnorm(n, mean = n - m, sd = 1))
```

Use `ggpairs()`

to get a sense of what we just simulated.

```
%>%
d_sim ggpairs(upper = list(continuous = wrap("cor", family = "sans", color = "firebrick4")),
diag = list(continuous = my_diag),
lower = list(continuous = my_lower)) +
theme_bw() +
theme(axis.text = element_blank(),
axis.ticks = element_blank(),
panel.grid = element_blank(),
strip.background = element_rect(fill = "white", color = "white"))
```

Here we fit the simulation models with a little help from the `update()`

function.

```
.7_sim <-
b5update(b5.7,
newdata = d_sim,
formula = k ~ 1 + n + m,
seed = 5,
file = "fits/b05.07_sim")
.5_sim <-
b5update(b5.7_sim,
formula = k ~ 1 + n,
seed = 5,
file = "fits/b05.05_sim")
.6_sim <-
b5update(b5.7_sim,
formula = k ~ 1 + m,
seed = 5,
file = "fits/b05.06_sim")
```

Compare the coefficients.

`fixef(b5.5_sim) %>% round(digits = 2)`

```
## Estimate Est.Error Q2.5 Q97.5
## Intercept -0.02 0.10 -0.21 0.18
## n 0.58 0.08 0.43 0.73
```

`fixef(b5.6_sim) %>% round(digits = 2)`

```
## Estimate Est.Error Q2.5 Q97.5
## Intercept 0.00 0.12 -0.23 0.23
## m 0.17 0.15 -0.12 0.46
```

`fixef(b5.7_sim) %>% round(digits = 2)`

```
## Estimate Est.Error Q2.5 Q97.5
## Intercept -0.01 0.09 -0.18 0.17
## n 0.98 0.09 0.79 1.16
## m -0.88 0.15 -1.16 -0.59
```

Due to space considerations, I’m not going to show the code corresponding to the other two DAGs from the **R** code 5.43 block. Rather, I’ll leave that as an exercise for the interested reader.

Let’s do the preliminary work to making our DAGs.

```
.7 <- dagitty("dag{ M -> K <- N M -> N }" )
dag5
coordinates(dag5.7) <- list(x = c(M = 0, K = 1, N = 2),
y = c(M = 0.5, K = 1, N = 0.5))
```

If you just want a quick default plot, `ggdag::ggdag_equivalent_dags()`

is the way to go.

`ggdag_equivalent_dags(dag5.7)`

However, if you’d like to customize your DAGs, start with the `ggdag::node_equivalent_dags()`

function and build from there.

```
.7 %>%
dag5node_equivalent_dags() %>%
ggplot(aes(x = x, y = y, xend = xend, yend = yend)) +
geom_dag_point(color = "firebrick", alpha = 1/4, size = 10) +
geom_dag_text(color = "firebrick") +
geom_dag_edges(edge_color = "firebrick") +
scale_x_continuous(NULL, breaks = NULL, expand = c(0.1, 0.1)) +
scale_y_continuous(NULL, breaks = NULL, expand = c(0.1, 0.1)) +
theme_bw() +
theme(panel.grid = element_blank(),
strip.background = element_blank()) +
facet_wrap(~ dag, labeller = label_both)
```

These all demonstrate Markov equivalence. I should note that I got help from the great Malcolm Barrett on how to make this plot with **ggdag**.

## 5.3 Categorical variables

Many readers will already know that variables like this, routinely called

factors, can easily be included in linear models. But what is not widely understood is how these variables are represented in a model… Knowing how the machine (golem) works both helps you interpret the posterior distribution and gives you additional power in building the model. (p. 153,emphasisin the original)

### 5.3.1 Binary categories.

Reload the `Howell1`

data.

```
data(Howell1, package = "rethinking")
<- Howell1
d rm(Howell1)
```

If you forgot what these data were like, take a `glimpse()`

.

```
%>%
d glimpse()
```

```
## Rows: 544
## Columns: 4
## $ height <dbl> 151.7650, 139.7000, 136.5250, 156.8450, 145.4150, 163.8300, 149.2250, 168.9100, 147…
## $ weight <dbl> 47.82561, 36.48581, 31.86484, 53.04191, 41.27687, 62.99259, 38.24348, 55.47997, 34.…
## $ age <dbl> 63.0, 63.0, 65.0, 41.0, 51.0, 35.0, 32.0, 27.0, 19.0, 54.0, 47.0, 66.0, 73.0, 20.0,…
## $ male <int> 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0,…
```

The

`male`

variable is our new predictor, an example of aindicator variable. Indicator variables—sometimes also called “dummy” variables–are devices for encoding unordered categories into quantitative models. There is no sense here in which “male” is one more than “female.” The purpose of the`male`

variable is to indicate when a person in the sample is “male.” So it takes the value 1 whenever the person is male, but it takes the value 0 when the person belongs to any other category. It doesn’t matter which category is indicated by the 1. The model won’t care. But correctly interpreting the model demands that you remember, so it’s a good idea to name the variable after the category assigned the 1 value. (p. 154,emphasisin the original)

The statistical model including a `male`

dummy might follow the form

\[\begin{align*} \text{height}_i & \sim \operatorname{Normal}(\mu_i, \sigma) \\ \mu_i & = \alpha + \beta_1 \text{male}_i \\ \alpha & \sim \operatorname{Normal}(178, 20) \\ \beta_1 & \sim \operatorname{Normal}(0, 10) \\ \sigma & \sim \operatorname{Uniform}(0, 50), \end{align*}\]

where \(\beta_1\) is the expected (i.e., average) difference between males and females for `height`

. Here we simulate from our priors and `summarise()`

the results.

```
set.seed(5)
<-
prior tibble(mu_female = rnorm(1e4, mean = 178, sd = 20)) %>%
mutate(mu_male = mu_female + rnorm(1e4, mean = 0, sd = 10))
%>%
prior pivot_longer(everything()) %>%
group_by(name) %>%
summarise(mean = mean(value),
sd = sd(value),
ll = quantile(value, prob = .025),
ul = quantile(value, prob = .975)) %>%
mutate_if(is.double, round, digits = 2)
```

```
## # A tibble: 2 × 5
## name mean sd ll ul
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 mu_female 178. 20.2 138. 219.
## 2 mu_male 178. 22.5 133. 222.
```

We might visualize the two prior predictive distributions as overlapping densities.

```
%>%
prior pivot_longer(everything()) %>%
ggplot(aes(x = value, fill = name, color = name)) +
geom_density(linewidth = 2/3, alpha = 2/3) +
scale_fill_manual(NULL, values = c("firebrick4", "black")) +
scale_color_manual(NULL, values = c("firebrick4", "black")) +
scale_y_continuous(NULL, breaks = NULL) +
xlab("prior predictive distribution for our dummy groups") +
theme_bw() +
theme(panel.grid = element_blank(),
legend.position = c(.82, .83))
```

Yep, this parameterization makes \(\alpha + \beta_1\) more uncertain than \(\alpha\). A nice alternative is to make an index variable. We’ll call it `sex`

, for which 1 = *female* and 2 = *male.* “No order is implied. These are just labels” (p. 155).

```
<-
d %>%
d mutate(sex = ifelse(male == 1, 2, 1))
head(d)
```

```
## height weight age male sex
## 1 151.765 47.82561 63 1 2
## 2 139.700 36.48581 63 0 1
## 3 136.525 31.86484 65 0 1
## 4 156.845 53.04191 41 1 2
## 5 145.415 41.27687 51 0 1
## 6 163.830 62.99259 35 1 2
```

We can update our statistical model to include `sex`

with the formula

\[\begin{align*} \text{height}_i & \sim \operatorname{Normal}(\mu_i, \sigma) \\ \mu_i & = \alpha_{\text{sex}[i]} \\ \alpha_j & \sim \operatorname{Normal}(178, 20) & \text{for } j = 1 \; \& \; 2 \\ \sigma & \sim \operatorname{Uniform}(0, 50), \end{align*}\]

where now we have rows indexed by \(i\) and two levels of `sex`

indexed by \(j\). Again, for our version of this model, we will continue using the simple \(\lambda = 1\) exponential prior on \(\sigma\), rather than the uniform. The exponential is just much easier on Stan than the uniform. But if you prefer to go uniform, have at it.

One more thing before we fit our model: Notice McElreath’s `a[sex]`

notation in his **R** code 5.48. I’m not aware that **brms** will accommodate this notation. The fix is easy. Just save `sex`

as a factor.

```
<-
d %>%
d mutate(sex = factor(sex))
```

We’re ready to fit the model.

```
.8 <-
b5brm(data = d,
family = gaussian,
~ 0 + sex,
height prior = c(prior(normal(178, 20), class = b),
prior(uniform(0, 50), class = sigma, ub = 50)),
iter = 2000, warmup = 1000, chains = 4, cores = 4,
seed = 5,
file = "fits/b05.08")
```

Behold the summary.

`print(b5.8)`

```
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: height ~ 0 + sex
## Data: d (Number of observations: 544)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sex1 134.88 1.65 131.65 138.10 1.00 4351 3126
## sex2 142.57 1.73 139.15 145.94 1.00 3573 2797
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 27.42 0.85 25.80 29.13 1.00 4062 2955
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
```

Note that for us, there was no `depth=2`

argument to get all the model output. When you fit a model with **brms** that excludes the typical intercept parameter–when you use the `0 + ...`

syntax–, you’ll get a separate intercept for each of your factor variables. The `brm()`

function noticed there were two levels for our `sex`

factor, and therefore gave use two intercepts: `sex1`

and `sex2`

. Here’s how you might compute the difference score.

```
library(tidybayes)
as_draws_df(b5.8) %>%
mutate(diff_fm = b_sex1 - b_sex2) %>%
pivot_longer(cols = c(b_sex1:sigma, diff_fm)) %>%
group_by(name) %>%
mean_qi(value, .width = .89)
```

```
## # A tibble: 4 × 7
## name value .lower .upper .width .point .interval
## <chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
## 1 b_sex1 135. 132. 138. 0.89 mean qi
## 2 b_sex2 143. 140. 145. 0.89 mean qi
## 3 diff_fm -7.68 -11.5 -3.74 0.89 mean qi
## 4 sigma 27.4 26.1 28.8 0.89 mean qi
```

Note how we used `tidybayes::mean_qi()`

to summarize our difference variable, `diff_fm`

. Anyway, “this kind of calculation is called a **contrast**. No matter how many categories you have, you can use samples from the posterior to compute the contrast between any two” (p. 156, **emphasis** in the original).

### 5.3.2 Many categories.

Binary categories are easy, whether you use an indicator variable or instead an index variable. But when there are more than two categories, the indicator variable approach explodes. You’ll need a new indicator variable for each new category. If you have \(k\) unique categories, you need \(k - 1\) indicator variables. Automated tools like R’s

`lm`

do in fact go this route, constructing \(k - 1\) indicator variables for you and returning \(k - 1\) parameters (in addition to the intercept).But we’ll instead stick with the index variable approach. It does not change at all when you add more categories. You do get more parameters, of course, just as many as in the indicator variable approach. But the model specification looks just like it does in the binary case. (p. 156)

We’ll practice with `milk`

.

```
data(milk, package = "rethinking")
<- milk
d rm(milk)
```

With the **tidyverse**, we can peek at `clade`

with `distinct()`

in the place of base **R** `unique()`

.

```
%>%
d distinct(clade)
```

```
## clade
## 1 Strepsirrhine
## 2 New World Monkey
## 3 Old World Monkey
## 4 Ape
```

Rather than make the `clade_id`

index variable, like McElreath did in the text, we’ll just use the `clade`

factor. It will actually work easier within the **brms** framework. We will, however, standardize the `kcal.per.g`

variable, again.

```
<-
d %>%
d mutate(kcal.per.g_s = (kcal.per.g - mean(kcal.per.g)) / sd(kcal.per.g))
```

Our statistical model follows the form

\[\begin{align*} \text{kcal.per.g_s}_i & \sim \operatorname{Normal}(\mu_i, \sigma) \\ \mu_i & = \alpha_{\text{clade}[i]} \\ \alpha_j & \sim \operatorname{Normal}(0, 0.5), & \text{for } j = 1, \dots, 4 \\ \sigma & \sim \operatorname{Exponential}(1). \end{align*}\]

Now fit that model.

```
.9 <-
b5brm(data = d,
family = gaussian,
~ 0 + clade,
kcal.per.g_s prior = c(prior(normal(0, 0.5), class = b),
prior(exponential(1), class = sigma)),
iter = 2000, warmup = 1000, chains = 4, cores = 4,
seed = 5,
file = "fits/b05.09")
```

`print(b5.9)`

```
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: kcal.per.g_s ~ 0 + clade
## Data: d (Number of observations: 29)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## cladeApe -0.46 0.24 -0.93 0.03 1.00 4645 2351
## cladeNewWorldMonkey 0.35 0.24 -0.14 0.82 1.00 4349 2532
## cladeOldWorldMonkey 0.63 0.28 0.08 1.16 1.00 4367 2640
## cladeStrepsirrhine -0.55 0.29 -1.11 0.05 1.00 4537 2578
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 0.81 0.12 0.61 1.08 1.00 3400 3161
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
```

Up to this point, all of our coefficient plots have been of a rather complicated type. We tried to mimic McElreath’s `coeftab()`

plots without the aid of the **rethinking** convenience function. But now the coefficient plot from McElreath’s **R** code 5.42 is of a much simpler type. We can finally take it easy and use some of the convenience functions available to us within our framework.

The `mcmc_plot()`

function is an easy way to get a default coefficient plot. You just put the **brms** fit object into the function.

`mcmc_plot(b5.9, variable = "^b_", regex = TRUE)`

There are numerous ways to make a coefficient plot. Another is with the `mcmc_intervals()`

function from the **bayesplot** package (Gabry et al., 2019; Gabry & Mahr, 2022). A nice feature of the **bayesplot** package is its convenient way to alter the color scheme with the `color_scheme_set()`

function. Here, for example, we’ll make the theme `red`

. But note how the `mcmc_intervals()`

function requires you to work with the `as_draws_df()`

instead of the brmsfit object.

```
library(bayesplot)
color_scheme_set("red")
<- as_draws_df(b5.9)
post
%>%
post select(starts_with("b_")) %>%
mcmc_intervals(prob = .5,
point_est = "median") +
labs(title = "My fancy bayesplot-based coefficient plot") +
theme_bw() +
theme(axis.text.y = element_text(hjust = 0),
axis.ticks.y = element_blank(),
panel.grid = element_blank())
```

Because **bayesplot** returns a **ggplot2** object, the plot was adjustable with familiar **ggplot2** syntax. For more ideas, check out Gabry’s (2022) vignette, *Plotting MCMC draws using the bayesplot package*.

The `tidybayes::stat_pointinterval()`

function offers a third way, this time with a more ground-up **ggplot2** workflow.

```
library(tidybayes)
%>%
post select(starts_with("b")) %>%
set_names(distinct(d, clade) %>% arrange(clade) %>% pull()) %>%
pivot_longer(everything()) %>%
ggplot(aes(x = value, y = reorder(name, value))) + # note how we used `reorder()` to arrange the coefficients
geom_vline(xintercept = 0, color = "firebrick4", alpha = 1/10) +
stat_pointinterval(point_interval = mode_hdi, .width = .89,
size = 1, color = "firebrick4") +
labs(title = "My tidybayes-based coefficient plot",
x = "expected kcal (std)",
y = NULL) +
theme_bw() +
theme(axis.text.y = element_text(hjust = 0),
axis.ticks.y = element_blank(),
panel.grid = element_blank())
```

Okay, let’s simulate some “made up categories” (p. 157). We’ll use names rather than numeric indices.

```
<- c("Gryffindor", "Hufflepuff", "Ravenclaw", "Slytherin")
houses
set.seed(63)
<-
d %>%
d mutate(house = sample(rep(houses, each = 8), size = n()))
```

Here we attempt to fit the model with **brms** using the naïve approach. ⚠️

```
.10 <-
b5brm(data = d,
family = gaussian,
~ 0 + clade + house,
kcal.per.g_s prior = c(prior(normal(0, 0.5), class = b),
prior(exponential(1), class = sigma)),
iter = 2000, warmup = 1000, chains = 4, cores = 4,
seed = 5,
file = "fits/b05.10")
```

Yep, the parameter summary suggests Slytherin stood out.

`print(b5.10)`

```
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: kcal.per.g_s ~ 0 + clade + house
## Data: d (Number of observations: 29)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## cladeApe -0.43 0.26 -0.94 0.06 1.00 4510 3425
## cladeNewWorldMonkey 0.33 0.26 -0.18 0.84 1.00 3689 2790
## cladeOldWorldMonkey 0.50 0.30 -0.08 1.07 1.00 4758 3304
## cladeStrepsirrhine -0.50 0.30 -1.06 0.12 1.00 3895 2750
## houseHufflepuff -0.17 0.28 -0.72 0.40 1.00 4577 3203
## houseRavenclaw -0.13 0.27 -0.66 0.40 1.00 3567 3257
## houseSlytherin 0.49 0.30 -0.12 1.09 1.00 4052 3005
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 0.77 0.12 0.58 1.04 1.00 4050 2783
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
```

Even though that last model fit well and is a fine and valid way to examine the relevant variables, it’s a bit misleading. If you look closely at the output, you’ll see there are rows for all four of the `clade`

levels, but only rows for three of the four levels of `house`

. This is easy to miss because McElreath didn’t show the `precis()`

output from his `m5.10`

, but if you run and inspect his code for yourself you’ll see it returns all rows for all four of the `house`

variable.

What gives?

**brms** syntax is flexible in that it provided multiple ways to fit and post-process models. The `formula`

syntax we use most of the type with **brms** is called the *design formula* syntax. In the first edition of his text (2015, pp. 159–161), McElreath contrasted design formula syntax, which is widely used in base **R** `lm()`

, the **lme4** package, and **brms**, with the more explicit syntax he uses in his **rethinking** package.

I bring this all up because the **rethinking** package has no problem handling multiple index variables. Just use McElreath’s slick bracket code like he did with his `m5.10`

(i.e., `mu <- a[clade_id] + h[house]`

). In such a model, there is no default intercept. That is, McElreath’s `m5.10`

followed the statistical formula of

\[\begin{align*} \text{kcal.per.g_s}_i & \sim \operatorname{Normal}(\mu_i, \sigma) \\ \mu_i & = \alpha_{\color{#8B1A1A}{\text{clade}[i]}} + \alpha_{\color{#8B1A1A}{\text{house}[i]}} \\ \alpha_{\color{#8B1A1A}{\text{clade}, j}} & \sim \operatorname{Normal}(0, 0.5), && \color{#8B1A1A}{\text{for } j = 1, \dots, 4} \\ \alpha_{\color{#8B1A1A}{\text{house}, k}} & \sim \operatorname{Normal}(0, 0.5), && \color{#8B1A1A}{\text{for } k = 1, \dots, 4} \\ \sigma & \sim \operatorname{Exponential}(1), \end{align*}\]

where there is an \(\alpha_\text{clade}\) “intercept” for each of the four levels of `clade`

and an \(\alpha_\text{house}\) “intercept” for each of the four levels of `house`

. But there is no **overall** intercept, \(\alpha\), that stands for the expected value when all the predictors are set to 0.

When we use the typical `formula`

syntax with **brms**, we can suppress the overall intercept when for a single index variable with the `<criterion> ~ 0 + <index variable>`

syntax. That’s exactly what we did with our `b5.9`

model. The catch is this approach only works with one index variable within **brms**. Even though we suppressed the default intercept with our `b5.10`

formula, `kcal.per.g_s ~ 0 + clade + house`

, we ended up loosing the first category of the second variable, `house`

.

Fortunately, we have a fix, which I learned about from one of Bürkner’s answers to Paul Dong’s thread in the Stan Forums, *Indexing approach to predictors*. The solution is the use the non-linear syntax. If you recall, our first encounter with the **brms** non-linear syntax was with model `b4.3b`

in Section 4.4.2.1 where we used it to model the exponent of a parameter. Here we’ll use it to model our two index variables while excluding an overall or reference-category intercept. I’m still going to wait until Section 6.2.1 before walking out the non-linear syntax in detail. But if you’re burning with curiosity, check out Bürkner’s (2022e) vignette, *Estimating non-linear models with brms*. For now, we focus on how to faithfully replicate McElreath’s `m510`

with `brms::brm()`

. In the code below, we defined our two model parameters, `a`

and `h`

, in the first `bf()`

line, connected each to a index variable in the next two lines, and closed out the `bf()`

statement with `nl = TRUE`

to alert `brm()`

we were using the non-linear syntax.

```
.11 <-
b5brm(data = d,
family = gaussian,
bf(kcal.per.g_s ~ 0 + a + h,
~ 0 + clade,
a ~ 0 + house,
h nl = TRUE),
prior = c(prior(normal(0, 0.5), nlpar = a),
prior(normal(0, 0.5), nlpar = h),
prior(exponential(1), class = sigma)),
iter = 2000, warmup = 1000, chains = 4, cores = 4,
seed = 5,
file = "fits/b05.11")
```

Here’s the summary of the correct model.

`print(b5.11)`

```
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: kcal.per.g_s ~ 0 + a + h
## a ~ 0 + clade
## h ~ 0 + house
## Data: d (Number of observations: 29)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## a_cladeApe -0.40 0.28 -0.95 0.15 1.00 3540 3363
## a_cladeNewWorldMonkey 0.36 0.29 -0.22 0.91 1.00 2890 3309
## a_cladeOldWorldMonkey 0.52 0.31 -0.08 1.13 1.00 3627 3094
## a_cladeStrepsirrhine -0.46 0.32 -1.08 0.16 1.00 3319 2727
## h_houseGryffindor -0.09 0.28 -0.66 0.46 1.00 2941 2943
## h_houseHufflepuff -0.19 0.29 -0.75 0.39 1.00 3720 3335
## h_houseRavenclaw -0.15 0.29 -0.71 0.39 1.00 3048 3124
## h_houseSlytherin 0.47 0.31 -0.14 1.08 1.00 3665 3327
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 0.78 0.12 0.58 1.05 1.00 4273 3298
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
```

Finally, here’s how we might visualize the results in a faceted coefficient plot.

```
as_draws_df(b5.11) %>%
pivot_longer(starts_with("b_")) %>%
mutate(name = str_remove(name, "b_") %>%
str_remove(., "clade") %>%
str_remove(., "house") %>%
str_replace(., "World", " World ")) %>%
separate(name, into = c("predictor", "level"), sep = "_") %>%
mutate(predictor = if_else(predictor == "a", "predictor: clade", "predictor: house")) %>%
ggplot(aes(x = value, y = reorder(level, value))) + # note how we used `reorder()` to arrange the coefficients
geom_vline(xintercept = 0, color = "firebrick4", alpha = 1/10) +
stat_pointinterval(point_interval = mode_hdi, .width = .89,
size = 1, color = "firebrick4") +
labs(x = "expected kcal (std)",
y = NULL) +
theme_bw() +
theme(axis.text.y = element_text(hjust = 0),
axis.ticks.y = element_blank(),
panel.grid = element_blank(),
strip.background = element_blank()) +
facet_wrap(~ predictor, scales = "free_y")
```

Just for fun, we used 89% intervals.

#### 5.3.2.1 Rethinking: Differences and statistical significance.

A common error in interpretation of parameter estimates is to suppose that because one parameter is sufficiently far from zero–is “significant”–and another parameter is not–is “not significant”–that the difference between the parameters is also significant. This is not necessarily so. This isn’t just an issue for non-Bayesian analysis: If you want to know the distribution of a difference, then you must compute that difference, a

contrast. (p. 158,emphasisin the original)

This reminds me of a paper by Gelman and Stern (2006), *The difference between “significant” and “not significant” is not itself statistically significant*.

## 5.4 ~~Summary~~ Bonus: We can model categorical variables in more ways than one

One of the noteworthy changes in the second edition of *Statistical rethinking* is McElreath’s consistent use of the index variable approach. It’s a fine approach and I’m glad he emphasized it. But as McElreath alluded to in the early part of Section 5.3.1, we have other modeling options at our disposal. I’d like to briefly walk through three more.

### 5.4.1 You can use a dummy.

The first alternative is the one McElreath mentioned directly at the top of page 154. If you have \(K\) categories in a nominal variable, you can depict the categories in the model with \(K - 1\) dummy variables. To start simple, consider the case of the \(K = 2\) variable `sex`

we used in `b5.8`

.

```
.8$data %>%
b5head()
```

```
## height sex
## 1 151.765 2
## 2 139.700 1
## 3 136.525 1
## 4 156.845 2
## 5 145.415 1
## 6 163.830 2
```

We can convert our index variable `sex`

into two dummy variables, which we might call `male`

and `female`

.

```
<-
d .8$data %>%
b5mutate(male = if_else(sex == "1", 1, 0),
female = if_else(sex == "2", 1, 0))
head(d)
```

```
## height sex male female
## 1 151.765 2 0 1
## 2 139.700 1 1 0
## 3 136.525 1 1 0
## 4 156.845 2 0 1
## 5 145.415 1 1 0
## 6 163.830 2 0 1
```

As with the index variable `sex`

, no order is implied by the 0’s and 1’s in our `male`

and `female`

dummies. But remember, we only need \(K - 1\) dummies, not both. If we wanted to fit the model using the `female`

dummy variable, we might express the statistical model as

\[\begin{align*} \text{height}_i & \sim \operatorname{Normal}(\mu_i, \sigma) \\ \mu_i & = \alpha + \beta_1 \text{female}_i \\ \alpha & \sim \operatorname{Normal}(178, 20) \\ \beta_1 & \sim \operatorname{Normal}(0, 10) \\ \sigma & \sim \operatorname{Uniform}(0, 50), \end{align*}\]

where \(\alpha\) would be the intercept (i.e., the mean) for males and \(\beta_1\) would be the expected change in the mean for females, relative to males. This parameterization makes males the reference category. Here’s how to fit the model with **brms**.

```
.8b <-
b5brm(data = d,
family = gaussian,
~ 1 + female,
height prior = c(prior(normal(178, 20), class = Intercept),
prior(normal(0, 10), class = b),
prior(uniform(0, 50), class = sigma, ub = 50)),
iter = 2000, warmup = 1000, chains = 4, cores = 4,
seed = 5,
file = "fits/b05.08b")
```

`print(b5.8b)`

```
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: height ~ 1 + female
## Data: d (Number of observations: 544)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 134.98 1.64 131.74 138.19 1.00 3812 2983
## female 7.22 2.30 2.88 11.86 1.00 3719 2773
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 27.41 0.83 25.85 29.05 1.00 3335 2716
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
```

I’m not going to show it, here, but this approach generalizes to instances where you have 3 or more categories. In the case of the monkey model `b5.9`

, we had a \(K = 4\) variable `clade`

, which could have been entered in the model with three dummy variables instead. Whichever of the four levels of `clade`

that is not entered into the model as a dummy, that’s the level that would become the reference category.

### 5.4.2 Consider contrast coding.

Contrast coding offers something of a blend between the index and dummy approach. Consider out new contrast variable, `sex_c`

.

```
<-
d %>%
d mutate(sex_c = if_else(sex == "1", -0.5, 0.5))
head(d)
```

```
## height sex male female sex_c
## 1 151.765 2 0 1 0.5
## 2 139.700 1 1 0 -0.5
## 3 136.525 1 1 0 -0.5
## 4 156.845 2 0 1 0.5
## 5 145.415 1 1 0 -0.5
## 6 163.830 2 0 1 0.5
```

`sex_c == -0.5`

for males and `sex_c == 0.5`

for females. We might express our revised statistical model as

\[\begin{align*} \text{height}_i & \sim \operatorname{Normal}(\mu_i, \sigma) \\ \mu_i & = \alpha + \beta_1 \text{sex_c}_i \\ \alpha & \sim \operatorname{Normal}(178, 20) \\ \beta_1 & \sim \operatorname{Normal}(0, 10) \\ \sigma & \sim \operatorname{Uniform}(0, 50), \end{align*}\]

where \(\alpha\) is now the average of the mean heights of males and females and \(\beta_1\) is the difference when subtracting the mean `height`

of males from the mean `height`

of females. Fit the model.

```
.8c <-
b5brm(data = d,
family = gaussian,
~ 1 + sex_c,
height prior = c(prior(normal(178, 20), class = Intercept),
prior(normal(0, 10), class = b),
prior(uniform(0, 50), class = sigma, ub = 50)),
iter = 2000, warmup = 1000, chains = 4, cores = 4,
seed = 5,
file = "fits/b05.08c")
```

Inspect the summary.

`print(b5.8c)`

```
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: height ~ 1 + sex_c
## Data: d (Number of observations: 544)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 138.59 1.20 136.29 140.94 1.00 4018 2824
## sex_c 7.22 2.30 2.88 11.86 1.00 3719 2773
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 27.41 0.83 25.85 29.05 1.00 3335 2716
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
```

If you take a look, you’ll discover that posterior summary for \(\beta_1\) is basically the same, here, as it was for `b5.8b`

. The intercept \(\alpha\) is a little more complicated. To build a sense of what it means, first consider the simple mean for `height`

, ignoring `sex`

, computed directly from the data.

```
%>%
d summarise(mean_height = mean(height))
```

```
## mean_height
## 1 138.2636
```

Now here is the average of the mean of `height`

when first computed separately by levels of `sex`

.

```
%>%
d group_by(sex) %>%
summarise(group_mean = mean(height)) %>%
summarise(average_of_the_group_means_in_height = mean(group_mean))
```

```
## # A tibble: 1 × 1
## average_of_the_group_means_in_height
## <dbl>
## 1 138.
```

Our posterior for \(\alpha\), above, is designed to capture the `average_of_the_group_means_in_height`

, not `mean_height`

. In cases where the sample sizes in the two groups were equal, these two would be same. Since we have different numbers of males and females in our data, the two values differ a bit.

```
%>%
d count(sex) %>%
mutate(percent = 100 * n / sum(n))
```

```
## sex n percent
## 1 1 287 52.75735
## 2 2 257 47.24265
```

Anyway, here’s how we might wrangle the posterior samples to make our marginal posteriors for males, females, and their difference.

```
as_draws_df(b5.8c) %>%
mutate(male = b_Intercept - b_sex_c * 0.5,
female = b_Intercept + b_sex_c * 0.5,
`female - male` = b_sex_c) %>%
pivot_longer(male:`female - male`) %>%
ggplot(aes(x = value, y = 0)) +
stat_halfeye(.width = .95,
fill = "firebrick4", normalize = "panels") +
scale_y_continuous(NULL, breaks = NULL) +
xlab("height") +
theme_bw() +
theme(panel.grid = element_blank()) +
facet_wrap(~ name, scales = "free")
```

This approach can also be generalized to contexts where you have three or more groups. I’m not going to show you how to do so, here. It’s a deep rabbit hole and I’m just not interested in shoveling that hard, today. For those who want more, I’ll provide some references at the end.

### 5.4.3 There’s always the multilevel ANOVA approach.

It’s a little absurd I’m covering this, here, because (a) I’m not a fan of the ANOVA framework and (b) McElreath didn’t properly introduce the multilevel framework until Chapter 13. But we’ll throw this in, anyway. For those of you who are new to the multilevel model, make a mental note of this section, back away slowly, and come back after you’ve worked through the later chapters in this text. For those of y’all who have some experience with the multilevel model, perhaps this will offer an application you hadn’t considered.

For this approach, let’s consider our \(K = 4\) variable `clade`

from back in `b5.9`

. We used `clade`

as an index variable to model four means for standardized kilocalories per gram of milk `kcal.per.g_s`

.

```
<-
d .9$data
b5
head(d)
```

```
## kcal.per.g_s clade
## 1 -0.9400408 Strepsirrhine
## 2 -0.8161263 Strepsirrhine
## 3 -1.1259125 Strepsirrhine
## 4 -1.0019980 Strepsirrhine
## 5 -0.2585112 Strepsirrhine
## 6 -1.0639553 New World Monkey
```

Our multilevel ANOVA approach will give us an overall intercept (mean) for `kcal.per.g_s`

as well as four `clade`

-specific deviations from that overall intercept. One way to express this might be

\[\begin{align*} \text{kcal.per.g_s} & \sim \operatorname{Normal}(\mu_i, \sigma) \\ \mu_i & = \alpha + \color{#8B1A1A}{u_{j[i]}} \\ \alpha & \sim \operatorname{Normal}(0, 0.5) \\ \sigma & \sim \operatorname{Exponential}(1) \\ \color{#8B1A1A}{u_j} & \color{#8B1A1A}\sim \color{#8B1A1A}{\operatorname{Normal}(0, \sigma_\text{clade})}, & \color{#8B1A1A}{\text{for } j = 1, \dots, 4} \\ \color{#8B1A1A}{\sigma_\text{clade}} & \color{#8B1A1A}\sim \color{#8B1A1A}{\operatorname{Exponential}(1)}, \end{align*}\]

where \(\alpha\) is the overall intercept (mean) and the four `clade`

-specific deviations from that mean are captured by the four levels of \(u_j\), which are themselves modeled as normally distributed with a mean of zero (because they are deviations, after all) and a standard deviation \(\sigma_\text{clade}\). Here’s how to fit such a model with **brms**.

```
.9b <-
b5brm(data = d,
family = gaussian,
~ 1 + (1 | clade),
kcal.per.g_s prior = c(prior(normal(0, 0.5), class = Intercept),
prior(exponential(1), class = sigma),
prior(exponential(1), class = sd)),
iter = 2000, warmup = 1000, chains = 4, cores = 4,
seed = 5,
file = "fits/b05.09b")
```

Check the results.

`print(b5.9b)`

```
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: kcal.per.g_s ~ 1 + (1 | clade)
## Data: d (Number of observations: 29)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Group-Level Effects:
## ~clade (Number of levels: 4)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.86 0.44 0.29 2.00 1.00 1079 1489
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept -0.01 0.32 -0.68 0.62 1.00 1476 1731
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 0.80 0.12 0.61 1.07 1.00 2257 2410
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
```

The ‘Group-Level Effects:’ section in that output won’t make much sense until we cover the material in Chapter 13. Here’s how one might wrangle the posterior samples to make a coefficient plot similar to the one we made for the original `b5.9`

.

```
as_draws_df(b5.9b) %>%
mutate(Ape = b_Intercept + `r_clade[Ape,Intercept]`,
`New World Monkey` = b_Intercept + `r_clade[New.World.Monkey,Intercept]`,
`Old World Monkey` = b_Intercept + `r_clade[Old.World.Monkey,Intercept]`,
Strepsirrhine = b_Intercept + `r_clade[Strepsirrhine,Intercept]`) %>%
pivot_longer(Ape:Strepsirrhine) %>%
ggplot(aes(x = value, y = reorder(name, value))) +
geom_vline(xintercept = 0, color = "firebrick4", alpha = 1/10) +
stat_pointinterval(point_interval = mode_hdi, .width = .89,
size = 1, color = "firebrick4") +
labs(title = "My tidybayes-based coefficient plot for b5.9b",
x = "expected kcal (std)",
y = NULL) +
theme_bw() +
theme(axis.text.y = element_text(hjust = 0),
axis.ticks.y = element_blank(),
panel.grid = element_blank())
```

As we will learn, an advantage of this approach is we get improved estimates due to multilevel partial pooling. Technically, you can apply this approach to cases with as few as \(K = 2\) categories. In practice, you’ll probably want to use this only when \(K \geq 3\).

### 5.4.4 Would you like to learn more?

To learn more about the dummy coding approach, McElreath covered it in Chapter 5 of his first (2015) edition, Hayes covered it in Chapter 2 of his (2017) text, and it’s been covered in many other texts, such as the modern classic by Cohen et al. (2013). I have ebook translations for the first of those two (Kurz, 2023b, 2023c). If you like learning by video, the great Erin Buchanan has a video or two (e.g., here) on her Statistics of DOOM chanel.

The specific type of contrast coding we used is part of a more general approach. You can learn more about that in Cohen et al (2013, Chapter 8) or from this vignette by the good folks at the UCLA Institute for Digital Research and Education (*R Library Contrast Coding Systems for Categorical Variables*, n.d.).

In addition to waiting for Chapter 13 in this ebook, you can learn more about the multilevel ANOVA approach in Chapters 19 and 24 of Kruschke’s (2015), in the corresponding chapters in my (2023a) ebook translation of the same, and in some of Gelman’s peer-reviewed work (Gelman, 2005, 2006).

## Session info

`sessionInfo()`

```
## R version 4.2.2 (2022-10-31)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur ... 10.16
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] bayesplot_1.10.0 tidybayes_3.0.2 GGally_2.1.2 dagitty_0.3-1 psych_2.2.9
## [6] patchwork_1.1.2 ggdag_0.2.7 tigris_1.6.1 ggrepel_0.9.2 brms_2.18.0
## [11] Rcpp_1.0.9 forcats_0.5.1 stringr_1.4.1 dplyr_1.0.10 purrr_1.0.1
## [16] readr_2.1.2 tidyr_1.2.1 tibble_3.1.8 ggplot2_3.4.0 tidyverse_1.3.2
##
## loaded via a namespace (and not attached):
## [1] utf8_1.2.2 tidyselect_1.2.0 lme4_1.1-31 htmlwidgets_1.5.4
## [5] grid_4.2.2 maptools_1.1-4 munsell_0.5.0 codetools_0.2-18
## [9] units_0.8-0 DT_0.24 miniUI_0.1.1.1 withr_2.5.0
## [13] Brobdingnag_1.2-8 colorspace_2.0-3 highr_0.9 knitr_1.40
## [17] uuid_1.1-0 rstudioapi_0.13 stats4_4.2.2 labeling_0.4.2
## [21] emmeans_1.8.0 rstan_2.21.8 mnormt_2.1.0 polyclip_1.10-0
## [25] farver_2.1.1 bridgesampling_1.1-2 coda_0.19-4 vctrs_0.5.1
## [29] generics_0.1.3 TH.data_1.1-1 xfun_0.35 R6_2.5.1
## [33] markdown_1.1 graphlayouts_0.8.1 HDInterval_0.2.4 gamm4_0.2-6
## [37] projpred_2.2.1 reshape_0.8.9 cachem_1.0.6 assertthat_0.2.1
## [41] promises_1.2.0.1 scales_1.2.1 multcomp_1.4-20 ggraph_2.0.6
## [45] googlesheets4_1.0.1 gtable_0.3.1 rethinking_2.21 processx_3.8.0
## [49] tidygraph_1.2.1 sandwich_3.0-2 rlang_1.0.6 splines_4.2.2
## [53] rgdal_1.5-30 gargle_1.2.0 broom_1.0.2 checkmate_2.1.0
## [57] inline_0.3.19 reshape2_1.4.4 abind_1.4-5 modelr_0.1.8
## [61] threejs_0.3.3 crosstalk_1.2.0 backports_1.4.1 httpuv_1.6.5
## [65] tensorA_0.36.2 tools_4.2.2 bookdown_0.28 ellipsis_0.3.2
## [69] RColorBrewer_1.1-3 jquerylib_0.1.4 posterior_1.3.1 proxy_0.4-27
## [73] plyr_1.8.7 base64enc_0.1-3 classInt_0.4-7 ps_1.7.2
## [77] prettyunits_1.1.1 viridis_0.6.2 zoo_1.8-10 haven_2.5.1
## [81] fs_1.5.2 magrittr_2.0.3 ggdist_3.2.1 colourpicker_1.1.1
## [85] reprex_2.0.2 googledrive_2.0.0 mvtnorm_1.1-3 matrixStats_0.63.0
## [89] arrayhelpers_1.1-0 hms_1.1.1 shinyjs_2.1.0 mime_0.12
## [93] evaluate_0.18 xtable_1.8-4 shinystan_2.6.0 readxl_1.4.1
## [97] gridExtra_2.3 shape_1.4.6 rstantools_2.2.0 compiler_4.2.2
## [101] KernSmooth_2.23-20 V8_4.2.1 crayon_1.5.2 minqa_1.2.5
## [105] StanHeaders_2.21.0-7 htmltools_0.5.3 mgcv_1.8-41 later_1.3.0
## [109] tzdb_0.3.0 RcppParallel_5.1.5 lubridate_1.8.0 DBI_1.1.3
## [113] tweenr_2.0.0 dbplyr_2.2.1 MASS_7.3-58.1 rappdirs_0.3.3
## [117] sf_1.0-7 boot_1.3-28 Matrix_1.5-1 cli_3.6.0
## [121] parallel_4.2.2 igraph_1.3.4 pkgconfig_2.0.3 foreign_0.8-83
## [125] sp_1.5-0 svUnit_1.0.6 xml2_1.3.3 dygraphs_1.1.1.6
## [129] bslib_0.4.0 estimability_1.4.1 rvest_1.0.2 distributional_0.3.1
## [133] callr_3.7.3 digest_0.6.31 rmarkdown_2.16 cellranger_1.1.0
## [137] curl_4.3.2 shiny_1.7.2 gtools_3.9.4 nloptr_2.0.3
## [141] lifecycle_1.0.3 nlme_3.1-160 jsonlite_1.8.4 cmdstanr_0.5.3
## [145] viridisLite_0.4.1 fansi_1.0.3 pillar_1.8.1 lattice_0.20-45
## [149] loo_2.5.1 fastmap_1.1.0 httr_1.4.4 pkgbuild_1.3.1
## [153] survival_3.4-0 emo_0.0.0.9000 glue_1.6.2 xts_0.12.1
## [157] shinythemes_1.2.0 ggforce_0.4.1 class_7.3-20 stringi_1.7.8
## [161] sass_0.4.2 e1071_1.7-11
```

### References

*An introduction to ggdag*. https://CRAN.R-project.org/package=ggdag/vignettes/intro-to-ggdag.html

*ggdag: Analyze and create elegant directed acyclic graphs*. https://CRAN.R-project.org/package=ggdag

*Estimating multivariate models with brms*. https://CRAN.R-project.org/package=brms/vignettes/brms_multivariate.html

*Estimating non-linear models with brms*. https://CRAN.R-project.org/package=brms/vignettes/brms_nonlinear.html

*brms reference manual, Version 2.18.0*. https://CRAN.R-project.org/package=brms/brms.pdf

*Applied multiple regression/correlation analysis for the behavioral sciences*(Third Edition). Routledge. https://doi.org/10.4324/9780203774441

*Journal of Animal Ecology*,

*71*(3), 542–545. https://doi.org/10.1046/j.1365-2656.2002.00618.x

*Plotting MCMC draws using the bayesplot package*. https://CRAN.R-project.org/package=bayesplot/vignettes/plotting-mcmc-draws.html

*bayesplot: Plotting for Bayesian models*. https://CRAN.R-project.org/package=bayesplot

*Journal of the Royal Statistical Society: Series A (Statistics in Society)*,

*182*(2), 389–402. https://doi.org/10.1111/rssa.12378

*Annals of Statistics*,

*33*(1), 1–53. https://doi.org/10.1214/009053604000001048

*Bayesian Analysis*,

*1*(3), 515–534. https://doi.org/10.1214/06-BA117A

*The American Statistician*,

*60*(4), 328–331. https://doi.org/10.1198/000313006X152649

*Introduction to mediation, moderation, and conditional process analysis: A regression-based approach*. Guilford publications. https://www.guilford.com/books/Introduction-to-Mediation-Moderation-and-Conditional-Process-Analysis/Andrew-Hayes/9781462534654

*Evolutionary Anthropology: Issues, News, and Reviews*,

*20*(1), 9–23. https://doi.org/10.1002/evan.20289

*Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan*. Academic Press. https://sites.google.com/site/doingbayesiandataanalysis/

*Doing Bayesian data analysis in brms and the tidyverse*(Version 1.1.0). https://bookdown.org/content/3686/

*Statistical Rethinking with brms, ggplot2, and the tidyverse*(version 1.3.0). https://bookdown.org/content/3890/

*Recoding Introduction to mediation, moderation, and conditional process analysis*(version 1.3.0). https://bookdown.org/content/b472c7b3-ede5-40f0-9677-75c3704c7e5c/

*Statistical rethinking: A Bayesian course with examples in R and Stan*(Second Edition). CRC Press. https://xcelab.net/rm/statistical-rethinking/

*Statistical rethinking: A Bayesian course with examples in R and Stan*. CRC press. https://xcelab.net/rm/statistical-rethinking/

*Psychological Reports*,

*66*(1), 195–244. https://doi.org/10.2466/pr0.1990.66.1.195

*R Library Contrast Coding Systems for categorical variables*. (n.d.). UCLA: Statistical Consulting Group. Retrieved October 14, 2020, from https://stats.idre.ucla.edu/r/library/r-library-contrast-coding-systems-for-categorical-variables/

*psych: Procedures for psychological, psychometric, and personality research*. https://CRAN.R-project.org/package=psych

*GGally: Extension to ’ggplot2’*. https://CRAN.R-project.org/package=GGally

*ggrepel: Automatically position non-overlapping text labels with ’ggplot2’*. https://CRAN.R-project.org/package=ggrepel

*dagitty: Graphical analysis of structural causal models*. https://CRAN.R-project.org/package=dagitty

*International Journal of Epidemiology*,

*45*(6), 1887–1894. https://doi.org/10.1093/ije/dyw341

*Tigris: Load census TIGER/Line shapefiles*[Manual]. https://github.com/walkerke/tigris

Note, there is a small difficulty with this

**tigris**-oriented workflow. The issue is that the data frame is of class sf, and objects of class sf don’t pivot nicely with the`pivot_longer()`

function unless you specifically refer to the columns within quotes, as in`pivot_longer(cols = c("d", "m", "a"))`

. The details are in GitHub issue #1171 for the**tidyr**package.↩︎