## 2.4 Multiple linear regression

here’s nothing particularly special about jumping from univariable to multivariable models in brms. You just keep tacking on predictors with the `+`

operator.

```
model4 <-
brm(data = glbwarm, family = gaussian,
govact ~ 1 + negemot + posemot + ideology + sex + age,
chains = 4, cores = 4)
```

```
print(model4)
#> Family: gaussian
#> Links: mu = identity; sigma = identity
#> Formula: govact ~ 1 + negemot + posemot + ideology + sex + age
#> Data: glbwarm (Number of observations: 815)
#> Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
#> total post-warmup samples = 4000
#>
#> Population-Level Effects:
#> Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
#> Intercept 4.06 0.20 3.66 4.45 4000 1.00
#> negemot 0.44 0.03 0.39 0.49 4000 1.00
#> posemot -0.03 0.03 -0.08 0.03 4000 1.00
#> ideology -0.22 0.03 -0.27 -0.16 4000 1.00
#> sex -0.01 0.08 -0.16 0.14 4000 1.00
#> age -0.00 0.00 -0.01 0.00 4000 1.00
#>
#> Family Specific Parameters:
#> Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
#> sigma 1.07 0.03 1.02 1.12 4000 1.00
#>
#> Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
#> is a crude measure of effective sample size, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).
```

Here is the posterior mean, what you might call the Bayesian point estimate, for someone with:

- negative emotions = 3,
- positive emotions = 4,
`ideology`

= 2,- is male (i.e.,
`sex`

= 1), and - is 30 years of
`age`

```
fixef(model4)[1] +
fixef(model4)[2]*3 +
fixef(model4)[3]*4 +
fixef(model4)[4]*2 +
fixef(model4)[5]*1 +
fixef(model4)[6]*30
#> [1] 4.79
```

Here’s the same deal for a man of the same profile, but with one point higher on `negemot`

.

```
fixef(model4)[1] +
fixef(model4)[2]*4 +
fixef(model4)[3]*4 +
fixef(model4)[4]*2 +
fixef(model4)[5]*1 +
fixef(model4)[6]*30
#> [1] 5.23
```

If you want a full expression of the model uncertaintly in terms of the shape of the posterior distribution and the 95% intervals, you’ll probably just want to use `posterior_samples()`

and do a little data processing.

```
post <- posterior_samples(model4)
post <-
post %>%
mutate(our_posterior = b_Intercept + b_negemot*4 + b_posemot*4 + b_ideology*2 + b_sex*1 + b_age*30)
# This intermediary step will make it easier to specify the break points and their labels for the x-axis
post_summary <-
quantile(post$our_posterior, probs = c(.025, .5, .975)) %>%
as_tibble() %>%
mutate(labels = value %>%
round(digits = 3) %>%
as.character())
ggplot(data = post,
aes(x = our_posterior)) +
geom_density(fill = "black") +
geom_vline(xintercept = post_summary$value,
size = c(.5, .75, .5), linetype = c(2, 1, 2), color = "white") +
scale_x_continuous(NULL,
breaks = post_summary$value,
labels = post_summary$labels) +
scale_y_continuous(NULL, breaks = NULL) +
labs(subtitle = "The expected govact score for a 30-year-old man for whom\nnegemot and posemot both equal 4 and ideology equals 2.\nThe solid and dashed white vertical lines are the posterior\nmedian and 95% intervals, respectively.") +
theme_bw()
```

In the text, Hayes showed that individuals based on these two profiles would be expected to differ by 0.441 (i.e., 5.244 - 4.803 = 0.441). That’s fine if you’re only working with OLS point estimates. But a proper Bayesian approach would express the difference in terms of an entire poster distribution, or at least a point estimate accompanied by some sort of intervals. Here we’ll just work with the posterior to create a difference distribution. You could do that with a little deft `posterior_samples()`

wrangling. Here we’ll employ `fitted()`

.

```
nd <-
tibble(negemot = c(3, 4),
posemot = 4,
ideology = 2,
sex = 1,
age = 30)
fitted(model4,
newdata = nd,
summary = F) %>%
as_tibble() %>%
rename(condition_a = V1,
contition_b = V2) %>%
mutate(difference = contition_b - condition_a) %>%
ggplot(aes(x = difference)) +
geom_density(fill = "black", color = "transparent") +
scale_y_continuous(NULL, breaks = NULL) +
ggtitle("The posterior density for the difference between\nthe two conditions.") +
theme_bw()
```

### 2.4.1 The standardized regression model.

brms doesn’t automatically give us the standardized coefficients the way OLS output often does, we’ll have to be proactive. One solution is just to standardized the data themselves and then re-fit the model with those standardized variables. That leads us to the issue of how one standardized variables to begin with. Recall that standardizing entails subtracting the mean of a variable from that variable and then dividing that value by the standard deviation. We don’t want to do that by hand. So one handy way is to make a custom function to do that work for us.

```
sandardize <- function(x){
(x - mean(x))/sd(x)
}
```

Go here to learn more about making custom functions in R.

Here we’ll employ our custom `standardize()`

function to make standardized versions of our variables.

```
glbwarm <-
glbwarm %>%
mutate(posemot_z = sandardize(posemot),
negemot_z = sandardize(negemot),
ideology_z = sandardize(ideology),
sex_z = sandardize(sex),
age_z = sandardize(age))
```

Now we’ve got us our standardized variables, let’s fit a standardized model.

```
model4_z <-
brm(data = glbwarm, family = gaussian,
govact_z ~ 1 + negemot_z + posemot_z + ideology_z + sex_z + age_z,
chains = 4, cores = 4)
```

Here are the newly standardized coefficient summaries, minus the `Intercept`

.

```
fixef(model4_z)[-1, ] %>% round(3)
#> Estimate Est.Error Q2.5 Q97.5
#> negemot_z 0.495 0.030 0.435 0.553
#> posemot_z -0.027 0.028 -0.082 0.025
#> ideology_z -0.243 0.031 -0.304 -0.182
#> sex_z -0.003 0.028 -0.059 0.051
#> age_z -0.016 0.029 -0.073 0.041
```

Our coefficients match up nicely with those in the text. Just as with Hayes’s OLS estimates, we should not attempt to interpret the standardized `sex_z`

coefficient from our Bayesian model.

Here’s how we’d fit a *partially*-standardized model–a model in which all variables except for `sex`

are standardized.

```
model4_z_p <-
update(model4_z,
newdata = glbwarm,
formula = govact_z ~ 1 + negemot_z + posemot_z + ideology_z + sex + age_z,
chains = 4, cores = 4)
```

And here are the coefficient summaries, including the `Intercept`

, for the partially-standardized model.

```
fixef(model4_z_p) %>% round(digits = 3)
#> Estimate Est.Error Q2.5 Q97.5
#> Intercept 0.003 0.040 -0.075 0.081
#> negemot_z 0.494 0.029 0.438 0.550
#> posemot_z -0.027 0.028 -0.082 0.027
#> ideology_z -0.243 0.030 -0.302 -0.184
#> sex -0.006 0.055 -0.114 0.101
#> age_z -0.015 0.028 -0.069 0.038
```

As Hayes wrote, now `sex`

= -0.006 has a sensible interpretation. “We can say that men and women differ by [-0.006] standard deviations in their support for government action when all other variables in the model are held constant (p. 53).”

On page 54, Hayes gave us the equation to transform unstandardized coefficients to standardized ones:

\[\tilde{b}_{i} = b_{i}\left(\frac{SD_{X_{i}}}{SD_{Y}}\right)\]

Let’s give it a whirl with `negemot`

.

```
# Here's the coefficient for `negemot` from the standardized model, `model4_z`
fixef(model4_z)["negemot_z", "Estimate"]
#> [1] 0.495
# Here's the coefficient for `negemot` from the unstandardized model, `model4`
fixef(model4)["negemot", "Estimate"]
#> [1] 0.441
# And here we use Hayes's formula to standardize the unstandardized coefficient
fixef(model4)["negemot", "Estimate"]*(sd(glbwarm$negemot)/sd(glbwarm$govact))
#> [1] 0.495
```

Looks like we got it within rounding error–pretty good! However, that was just the posterior mean, the Bayesian point estimate. If we want to more fully express the uncertainty around the mean–and we do–, we’ll need to work with the posterior draws.

```
# the posterior draws from the unstandardized model
posterior_samples(model4) %>%
# using Hayes's formula to standardize `b_negemot`
mutate(`hand-made b_negemot_z` = b_negemot*(sd(glbwarm$negemot)/sd(glbwarm$govact))) %>%
# taking on the `b_negemot_z` column from the standardized `model4_z` models posterior draws
bind_cols(posterior_samples(model4_z) %>%
select(b_negemot_z)) %>%
# isolating those two columns
select(`hand-made b_negemot_z`, b_negemot_z) %>%
# converting the data to the long format and grouping by `key`
gather() %>%
group_by(key) %>%
# here we summarize the results
summarise(mean = mean(value),
sd = sd(value),
ll = quantile(value, probs = .025),
ul = quantile(value, probs = .975)) %>%
mutate_if(is.double, round, digits = 3)
#> # A tibble: 2 x 5
#> key mean sd ll ul
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 b_negemot_z 0.495 0.03 0.435 0.553
#> 2 hand-made b_negemot_z 0.495 0.029 0.437 0.552
```

Our summary confirms that we can apply Hayes’s formula to a `posterior_samples()`

column in order to get fuller summary statistics for a hand-converted standardized coefficient. This would be in full compliance with, say, APA recommendations to include 95% intervals with all effect sizes–the standardized regression coefficient being the effect size, here.