# Chapter 10 Moderation: testing interaction effects

```
library(tidyverse)
library(knitr)
library(broom)
```

## 10.1 Interaction with one numeric and one dichotomous variable

Suppose there is a linear relationship between age (in years) and vocabulary (the number of words one knows): the older you get, the more words you know. Suppose we have the following linear regression equation for this relationship:

\[\begin{aligned} \widehat{\texttt{vocab}} = 200 + 500 \times \texttt{age} \end{aligned}\]

According to this equation, the expected number of words for a newborn baby (`age = 0`) equals 200. This may sound silly, but suppose this model is a very good prediction model for vocabulary size in children between 2 and 5 years of age. Then this equation tells us that the expected increase in vocabulary size is 500 words per year.

This model is meant for everybody in the Netherlands. But suppose that one researcher expects that the increase in words is much faster in children from high socio-economic status (SES) families than in children from low SES families. He believes that vocabulary will be larger in higher SES children than in low SES children. In other words, he expects an effect of SES, over and above the effect of age:

\[\begin{aligned} \widehat{\texttt{vocab}} = b_0 + b_1 \times \texttt{age} + b_2 \times \texttt{SES}\end{aligned}\]

This *main effect* of `SES` is yet unknown and denoted by \(b_2\). Note that this linear equation is an example of multiple regression.

Let’s use a numerical example. Suppose age is coded in years, and SES is dummy coded, with a 1 for high SES and a 0 for low SES. Let \(b_2\), the effect of SES over and above age, be 10. Then we can write out the linear equation for low SES and high SES separately.

\[\begin{aligned} \text{low SES: } \widehat{\texttt{vocab}} &= 200 + 500 \times \texttt{age} + 10 \times 0 \\ &= 200 + 500 \times \texttt{age} \\ \text{high SES: } \widehat{\texttt{vocab}} &= 200 + 500 \times \texttt{age} + 10 \times 1 \\ &= (200+10) + 500 \times \texttt{age} \\ &= 210 + 500 \times \texttt{age}\end{aligned}\]

Figure 10.1 depicts the two regression lines for the high and low SES children separately. We see that the effect of SES involves a change in the intercept: the intercept equals 200 for low SES children and the intercept for high SES children equals \(210\). The difference in intercept is indicated by the coefficient for SES. Note that the two regression lines are parallel: for every age, the difference between the two lines is equal to 10. For every age therefore, the predicted number of words is 10 words more for high SES children than for low SES children.
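These two parallel prediction lines can be sketched in base R, using the intercepts and common slope from the equations above (the function names here are ours, for illustration only):

```r
# Predicted vocabulary for low SES (SES = 0) and high SES (SES = 1) children,
# using the coefficients from the text
vocab_low  <- function(age) 200 + 500 * age
vocab_high <- function(age) 210 + 500 * age

ages <- 2:5
# Because the slopes are equal, the difference is 10 words at every age:
vocab_high(ages) - vocab_low(ages)
```

The constant difference of 10 at every age is exactly what "parallel lines" means here.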

So far, this is an example of multiple regression that we already saw in
Chapter 4.
But suppose that such a model does not describe the data that we actually have, or does not make the right predictions based on our theories. Suppose our researcher also expects that the *yearly increase* in vocabulary is a bit lower than 500 words in low SES families, and a little bit higher than 500 words in high SES families. In other words, he believes that `SES` might *moderate* (affect or change) the slope coefficient for `age`. Let’s call the slope coefficient in this case \(b_1\). In the above equation this slope parameter is equal to 500, but let’s now let it have a linear relationship with `SES` itself:

\[\begin{aligned} b_1 = a + b_3 \times \texttt{SES}\end{aligned}\]

In words: the slope coefficient for the regression of `vocab` on `age` is itself linearly related to `SES`: we predict the slope on the basis of `SES`. We model that by including a slope \(b_3\), but also an intercept \(a\). Now we have *two* linear equations for the relationship between `vocab`, `age` and `SES`:

\[\begin{aligned} \widehat{\texttt{vocab}} &= b_0 + b_1 \times \texttt{age} + b_2 \times \texttt{SES} \\ b_1 &= a + b_3 \times \texttt{SES}\end{aligned}\]

We can rewrite this by plugging the second equation into the first one (substitution):

\[\begin{aligned} \widehat{\texttt{vocab}} = b_0 + (a + b_3 \times \texttt{SES}) \times \texttt{age} + b_2 \times \texttt{SES} \end{aligned}\]

Multiplying this out gives us:

\[\begin{aligned} \widehat{\texttt{vocab}} = b_0 + a \times \texttt{age} + b_3 \times \texttt{SES} \times \texttt{age} + b_2 \times \texttt{SES}\end{aligned}\]

If we rearrange the terms a bit, we get:

\[\begin{aligned} \widehat{\texttt{vocab}} = b_0 + a \times \texttt{age} + b_2 \times \texttt{SES} + b_3 \times \texttt{SES} \times \texttt{age}\end{aligned}\]

Now this looks very much like a regression equation with one intercept and *three* slope coefficients: one for `age` (\(a\)), one for `SES` (\(b_2\)), and one for \(\texttt{SES} \times \texttt{age}\) (\(b_3\)).

We might want to change the label \(a\) into \(b_1\) to get a more familiar looking form:

\[\begin{aligned} \widehat{\texttt{vocab}} = b_0 + b_1\times \texttt{age} + b_2 \times \texttt{SES} + b_3 \times \texttt{SES} \times \texttt{age}\end{aligned}\]

So the first slope coefficient is the increase in vocabulary for every year that `age` increases (\(b_1\)), the second slope coefficient is the increase in vocabulary for an increase of 1 on the `SES` variable (\(b_2\)), and the third slope coefficient is the increase in vocabulary for every increase of 1 on the *product* of `SES` and `age` (\(b_3\)).

What does this mean exactly?

Suppose we find the following parameter values for the regression equation:

\[\begin{aligned} \widehat{\texttt{vocab}} = 200 + 450 \times \texttt{age} + 125 \times \texttt{SES} + 100 \times \texttt{SES} \times \texttt{age} \end{aligned}\]

If we code low SES children as `SES = 0`, and high SES children as `SES = 1`, we can write out the above equation as two regression equations, one for low SES children (`SES = 0`) and one for high SES children (`SES = 1`):

\[\begin{aligned} \text{low SES: } \widehat{\texttt{vocab}} &= 200 + 450 \times \texttt{age} \nonumber \\ \text{high SES: } \widehat{\texttt{vocab}} &= 200 + 450 \times \texttt{age} + 125 + 100 \times \texttt{age} \nonumber\\ &= (200 + 125) + (450 + 100) \times \texttt{age} \nonumber\\ &= 325 + 550 \times \texttt{age} \nonumber\end{aligned}\]

Then for low SES children, the intercept is 200 and the regression slope for age is 450, so they learn 450 words per year. For high SES children, we see the same intercept of 200, with an extra 125 (this is the main effect of SES). So effectively their intercept is now 325. For the regression slope, we now have \(450 \times \texttt{age}+ 100 \times \texttt{age}\) which is of course equal to \(550 \times \texttt{age}\). So we see that the high SES group has both a different intercept, and a different slope: the increase in vocabulary is 550 per year: somewhat steeper than in low SES children. So yes, the researcher was right: vocabulary increase per year is faster in high SES children than in low SES children.
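A small base-R sketch (the function name is ours) makes this substitution concrete: filling in `SES = 1` collapses the interaction equation to \(325 + 550 \times \texttt{age}\), and `SES = 0` gives \(200 + 450 \times \texttt{age}\):

```r
# Prediction function for the model with the interaction term:
# vocab-hat = 200 + 450*age + 125*SES + 100*SES*age
pred_vocab <- function(age, ses) {
  200 + 450 * age + 125 * ses + 100 * ses * age
}

pred_vocab(age = 3, ses = 1)  # high SES: 325 + 550 * 3 = 1975
pred_vocab(age = 3, ses = 0)  # low SES:  200 + 450 * 3 = 1550
```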

These two different regression lines are depicted in Figure
10.2. It can be clearly seen that the lines
have two different intercepts and two different slopes. That they have
two different slopes can be seen from the fact that the lines are not
parallel. One has a slope of 450 words per year and the other has a
slope of 550 words per year. This difference in slope of 100 is exactly
the size of the slope coefficient pertaining to the product
\(\texttt{SES} \times \texttt{age}\), \(b_3\). Thus, the interpretation of
the regression coefficient for a product of two variables is that it
represents *the difference in slope*.

The observation that the slope coefficient is different for different groups is called an *interaction effect*, or *interaction* for short. Other words for this phenomenon are *modification* and *moderation*. In this case, `SES` is called the *modifier variable*: it modifies the relationship between `age` and vocabulary. Note however that you could also interpret `age` as the modifier variable: the effect of `SES` is larger for older children than for younger children. In the plot you see that the difference in vocabulary between high and low SES children of age 6 is larger than it is for children of age 2.

## 10.2 Interaction effect with a dummy variable in R

Let’s look at some example output for an R data set where we have a categorical variable that is not dummy coded yet. The data set is on chicks and their weight during the first days of their lives. Weight is measured in grams. The chicks were given one of four different diets. Here we use only the data from chicks on the two diets 1 and 2. We select the Diet 1 and 2 data and store it under the name `chick_data`. When we have a quick look at the data with `glimpse()`, we see that `Diet` is a factor (`<fct>`).

```
chick_data <- ChickWeight %>%
  filter(Diet == 1 | Diet == 2)
chick_data %>%
  glimpse()
```

```
## Rows: 340
## Columns: 4
## $ weight <dbl> 42, 51, 59, 64, 76, 93, 106, 125, 149, 171, 199, 205, 40, 49, 5…
## $ Time <dbl> 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 21, 0, 2, 4, 6, 8, 10, 1…
## $ Chick <ord> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, …
## $ Diet <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
```

The general regression of `weight` on `Time` is shown in Figure 10.3. This regression line for the entire sample of chicks has a slope of around 8 grams per day. Now we want to know whether this slope is the same for chicks in the Diet 1 and Diet 2 groups; in other words, do chicks grow as fast with Diet 1 as with Diet 2? We might expect that `Diet` *moderates* the effect of `Time` on `weight`. We use the following code to study this \(\texttt{Diet} \times \texttt{Time}\) interaction effect, by having R automatically create a dummy variable for the factor `Diet`. In the model we specify that we want a main effect of `Time`, a main effect of `Diet`, and an interaction effect of `Time` by `Diet`:

```
out <- chick_data %>%
  lm(weight ~ Time + Diet + Time:Diet, data = .)
out %>%
  tidy(conf.int = TRUE)
```

```
## # A tibble: 4 x 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 30.9 4.50 6.88 2.95e-11 22.1 39.8
## 2 Time 6.84 0.361 19.0 4.89e-55 6.13 7.55
## 3 Diet2 -2.30 7.69 -0.299 7.65e- 1 -17.4 12.8
## 4 Time:Diet2 1.77 0.605 2.92 3.73e- 3 0.577 2.96
```

In the regression table, we see the effect of the numeric `Time` variable, which has a slope of 6.84: for every increase of 1 in `Time`, there is a corresponding expected increase of 6.84 grams in weight. Next, we see that R created a dummy variable `Diet2`, which codes 1 for Diet 2 and 0 for Diet 1. From the output we see that if a chick gets Diet 2, its expected weight changes by \(-2.3\) grams (that is, Diet 2 results in a lower weight).

Next, R created a variable `Time:Diet2`, by multiplying the variables `Time` and `Diet2`. The results show that this interaction effect is 1.77.

These results can be plugged into the following regression equation:

\[\begin{aligned} \widehat{\texttt{weight}} = 30.93 + 6.84 \times \texttt{Time} -2.3 \times \texttt{Diet2} + 1.77 \times \texttt{Time} \times \texttt{Diet2} \end{aligned}\]

If we fill in 1s for the `Diet2` dummy variable, we get the equation for chicks with Diet 2:

\[\begin{aligned} \widehat{\texttt{weight}} &= 30.93 + 6.84 \times \texttt{Time} -2.3 \times 1 + 1.77 \times \texttt{Time} \times 1 \nonumber \\ &= 28.63 + 8.61 \times \texttt{Time} \end{aligned}\]

If we fill in 0s for the `Diet2` dummy variable, we get the equation for chicks with Diet 1:

\[\begin{aligned} \widehat{\texttt{weight}} = 30.93 + 6.84 \times \texttt{Time} \end{aligned}\]

When comparing these two regression lines for chicks with Diet 1 and Diet 2, we see that the slope for `Time` is 1.77 steeper for Diet 2 chicks than for Diet 1 chicks. In this particular random sample of chicks, the chicks on Diet 1 grow 6.84 grams per day (on average), but chicks on Diet 2 grow \(6.84 + 1.77 = 8.61\) grams per day (on average).
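This arithmetic can also be done directly on the fitted model object. The sketch below refits the same model in base R only (no tidyverse needed); the coefficient names are those R generates automatically:

```r
# Base-R refit of the two-diet model from the text
chick_data <- subset(ChickWeight, Diet %in% c(1, 2))
chick_data$Diet <- droplevels(chick_data$Diet)  # drop the unused levels 3 and 4
fit <- lm(weight ~ Time + Diet + Time:Diet, data = chick_data)
b <- coef(fit)

# Diet 2 line: reference intercept and slope plus the Diet2 effects
intercept_diet2 <- unname(b["(Intercept)"] + b["Diet2"])  # about 28.63
slope_diet2     <- unname(b["Time"] + b["Time:Diet2"])    # about 8.61
```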

We visualised these results in Figure 10.4. There we see two regression lines: one for the red data points (chicks on Diet 1) and one for the blue data points (chicks on Diet 2). These two regression lines are the same as the regression lines we found when filling in either 1s or 0s in the general linear model. Note that the lines are not parallel, unlike those in Figure 10.1. Each regression line is the least squares regression line for the subsample of chicks on a particular diet.

We see that the difference in slope is 1.77 grams per day. This is what we observe in *this* particular sample of chicks. However, what does that tell us about the difference in slope for chicks in general, that is, the population of all chicks? For that, we need to look at the confidence interval. In the regression table above, we also see the 95% confidence intervals for all model parameters. The 95% confidence interval for the \(\texttt{Time} \times \texttt{Diet2}\) interaction effect is (0.58, 2.96). That means that plausible values for this interaction effect are those between 0.58 and 2.96.

It is also possible to do null-hypothesis testing for interaction effects. One could test whether this difference of 1.77 is plausible *if the value in the entire population of chicks equals 0*. In other words, is the value of 1.77 significantly different from 0?

The null-hypothesis is

\[H_0: \beta_{\texttt{Time} \times \texttt{Diet2}} = 0\]

The regression table shows that the null-hypothesis test for the interaction effect has a \(t\)-value of \(t=2.92\), with a \(p\)-value of \(3.73 \times 10^{-3} = 0.00373\). For research reports one always also reports the degrees of freedom for a statistical test. The (residual) degrees of freedom can be found in R by typing

`out$df.residual`

`## [1] 336`

We can report that

"we reject the null-hypothesis and conclude that there is evidence that the \(\texttt{Time} \times \texttt{Diet2}\) interaction effect is not 0, \(t(336)= 2.92, p = .004\)."
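The \(t\)-test in the table can be reproduced by hand from the estimate, its standard error, and the residual degrees of freedom (a base-R sketch, using the rounded values from the table):

```r
# Reproduce the t-test for the Time:Diet2 interaction effect
est <- 1.77    # estimated interaction effect
se  <- 0.605   # its standard error
df  <- 336     # residual degrees of freedom

t_value <- est / se                    # about 2.92
p_value <- 2 * pt(-abs(t_value), df)   # two-sided p-value, about .004
```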

Summarising, in this section we established that `Diet` moderates the effect of `Time` on `weight`: we found a significant diet by time interaction effect. The difference in growth rate is 1.77 grams per day, with a 95% confidence interval from 0.58 to 2.96. In more natural English: diet has an effect on the growth rate in chicks.

In this section we discussed the situation where regression slopes might be different in two groups: the regression slope might be steeper in one group than in the other group. If we have a numerical predictor \(X\) for a numerical dependent variable \(Y\), we say that a dummy variable \(Z\) *moderates* the effect of \(X\) on \(Y\). This moderation is quantified by an *interaction* effect. So suppose we have the following linear equation:

\[\begin{aligned} Y = b_0 + b_1 \times X + b_2 \times Z + b_3 \times X \times Z + e \nonumber\end{aligned}\]

Then, we call \(b_0\) the intercept, \(b_1\) the main effect of \(X\), \(b_2\) the main effect of \(Z\), and \(b_3\) the interaction effect of \(X\) and \(Z\) (alternatively, the \(X\) by \(Z\) interaction effect).

## 10.3 Interaction effects with a categorical variable in R

In the previous section, we looked at the difference in slopes between
two groups. But what we can do for two groups, we can do for multiple
groups. The data set on chicks contains data on chicks with 4 different
diets. When we perform the same analysis using all data in `ChickWeight`, we obtain the regression table:

```
out <- ChickWeight %>%
  lm(weight ~ Time + Diet + Time:Diet, data = .)
out %>%
  tidy(conf.int = TRUE)
```

```
## # A tibble: 8 x 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 30.9 4.25 7.28 1.09e-12 22.6 39.3
## 2 Time 6.84 0.341 20.1 3.31e-68 6.17 7.51
## 3 Diet2 -2.30 7.27 -0.316 7.52e- 1 -16.6 12.0
## 4 Diet3 -12.7 7.27 -1.74 8.15e- 2 -27.0 1.59
## 5 Diet4 -0.139 7.29 -0.0191 9.85e- 1 -14.5 14.2
## 6 Time:Diet2 1.77 0.572 3.09 2.09e- 3 0.645 2.89
## 7 Time:Diet3 4.58 0.572 8.01 6.33e-15 3.46 5.70
## 8 Time:Diet4 2.87 0.578 4.97 8.92e- 7 1.74 4.01
```

The regression table for four diets is substantially larger than for two diets. It contains one slope parameter for the numeric variable `Time`, three different slopes for the factor variable `Diet`, and three different interaction effects for the `Time` by `Diet` interaction.

The full linear model equation is

\[\begin{aligned} \widehat{\texttt{weight}} = 30.93 + 6.84 \times \texttt{Time} -2.3 \times \texttt{Diet2} -12.68 \times \texttt{Diet3} -0.14 \times \texttt{Diet4} \nonumber\\ + 1.77 \times \texttt{Time} \times \texttt{Diet2} + 4.58 \times \texttt{Time} \times \texttt{Diet3} + 2.87 \times \texttt{Time} \times \texttt{Diet4} \nonumber \end{aligned}\]

You see that R created dummy variables for Diet 2, Diet 3 and Diet 4. We can use this equation to construct a separate linear model for the Diet 1 data. Chicks with Diet 1 have 0s for the dummy variables `Diet2`, `Diet3` and `Diet4`. If we fill in these 0s, we obtain

\[\widehat{\texttt{weight}} = 30.93 + 6.84 \times \texttt{Time}\]

For the chicks on Diet 2, we have 1s for the dummy variable `Diet2` and 0s for the other dummy variables. Hence we have

\[\begin{aligned} \widehat{\texttt{weight}} &= 30.93 + 6.84 \times \texttt{Time} -2.3 \times 1 + 1.77 \times \texttt{Time} \times 1 \nonumber\\ &= 30.93 + 6.84 \times \texttt{Time} -2.3 + 1.77 \times \texttt{Time} \nonumber \\ &= (30.93 -2.3) + (6.84 + 1.77) \times \texttt{Time} \nonumber\\ &= 28.63 + 8.61 \times \texttt{Time}\nonumber \end{aligned}\]

Here we see exactly the same equation for Diet 2 as in the previous section where we only analysed two diet groups. The difference between the two slopes in the Diet 1 and Diet 2 groups is again 1.77. The only difference for this interaction effect is the standard error, and therefore the confidence interval is also slightly different. We will come back to this issue in Chapter 7.

For the chicks on Diet 3, we have 1s for the dummy variable `Diet3` and 0s for the other dummy variables. The regression equation is then

\[\begin{aligned} \widehat{\texttt{weight}} &= 30.93 + 6.84 \times \texttt{Time} -12.68 \times 1 + 4.58 \times \texttt{Time} \times 1 \nonumber\\ &= (30.93 -12.68) + (6.84 + 4.58) \times \texttt{Time} \nonumber\\ &= 18.25 + 11.42 \times \texttt{Time}\nonumber \end{aligned}\]

We see that the intercept is again different from that for the Diet 1 chicks. We also see that the slope is different: it is now 4.58 steeper than for the Diet 1 chicks. This difference in slopes is exactly equal to the `Time` by `Diet3` interaction effect. This is also what we saw in the Diet 2 group. Therefore, we can say that an interaction effect for a specific diet group says something about how much steeper the slope is in that group, compared to the reference group. The reference group is the group for which all the dummy variables are 0. Here, that is the Diet 1 group.

Based on that knowledge, we can expect that the slope in the Diet 4 group is equal to the slope in the reference group (6.84) plus the `Time` by `Diet4` interaction effect, 2.87, so 9.71.

We can do the same for the intercept in the Diet 4 group. The intercept is equal to the intercept in the reference group (30.93) plus the main effect of the `Diet4` dummy variable, \(-0.14\), which gives 30.79.

The linear equation for the Diet 4 chicks is then:

\[\widehat{\texttt{weight}} = 30.79 + 9.71 \times \texttt{Time}\]
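These intercept and slope computations for each diet can be automated from the fitted model object (a base-R sketch; the coefficient names are the ones `lm()` generates):

```r
# Four-diet model in base R
fit <- lm(weight ~ Time + Diet + Time:Diet, data = ChickWeight)
b <- coef(fit)

# Diet 4 line: reference intercept and slope plus the Diet4 effects
intercept_diet4 <- unname(b["(Intercept)"] + b["Diet4"])  # about 30.79
slope_diet4     <- unname(b["Time"] + b["Time:Diet4"])    # about 9.71
```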

The four regression lines are displayed in Figure 10.5. The steepest regression line is the one for the Diet 3 chicks: they are the fastest growing chicks. The slowest growing chicks are those on Diet 1. The confidence intervals in the regression table tell us that the difference between the growth rate with Diet 4 compared to Diet 1 is somewhere between 1.74 and 4.01 grams per day.

Suppose we want to test the null hypothesis that all four slopes are the same. This implies that the `Time` by `Diet` interaction effects are all equal to 0. We can test this null hypothesis

\[H_0: \beta_{\texttt{Time} \times \texttt{Diet2}} = \beta_{\texttt{Time} \times \texttt{Diet3}} = \beta_{\texttt{Time} \times \texttt{Diet4}} = 0\]

by running an ANOVA. That is, we apply the `anova()` function to the results of an `lm()` analysis:

```
out_anova <- out %>%
  anova()
out_anova %>%
  tidy()
```

```
## # A tibble: 4 x 6
## term df sumsq meansq statistic p.value
## <chr> <int> <dbl> <dbl> <dbl> <dbl>
## 1 Time 1 2042344. 2042344. 1760. 2.11e-176
## 2 Diet 3 129876. 43292. 37.3 5.07e- 22
## 3 Time:Diet 3 80804. 26935. 23.2 3.47e- 14
## 4 Residuals 570 661532. 1161. NA NA
```

In the output we see a `Time` by `Diet` interaction effect with 3 degrees of freedom. That term refers to the null-hypothesis that all three interaction effects are equal to 0. The \(F\)-statistic associated with that null-hypothesis equals 23.2. The residual degrees of freedom equals 570, so that we can report:

"The slopes for the four different diets were significantly different from each other, \(F(3,570) = 23.2, MSE = 26935, p < .001\)."
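An equivalent way to obtain this \(F\)-test is to compare the models with and without the interaction terms by passing two nested models to `anova()` (a base-R sketch):

```r
# Fit the model with and without the Time:Diet interaction
m_main <- lm(weight ~ Time + Diet, data = ChickWeight)
m_int  <- lm(weight ~ Time + Diet + Time:Diet, data = ChickWeight)

# Joint F-test for the three interaction terms:
anova(m_main, m_int)  # same F(3, 570) test as the Time:Diet row above
```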

## 10.4 Interaction between two dichotomous variables in R

In the previous sections we discussed the situation where regression slopes might be different in two or four groups. In Chapter 6 we learned that we could also look at slopes for dummy variables. The slope is then equal to the difference in group means; that is, the slope is the increase in the group mean of one group compared to the reference group.

Now we discuss the situation where we have two dummy variables, and want to do inference on their interaction. Does one dummy variable moderate the effect of the other dummy variable?

Let’s have a look at a data set on penguins. It can be found in the `palmerpenguins` package.

```
# install.packages("palmerpenguins")
library(palmerpenguins)
penguins %>%
  str()
```

```
## tibble [344 × 8] (S3: tbl_df/tbl/data.frame)
## $ species : Factor w/ 3 levels "Adelie","Chinstrap",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ island : Factor w/ 3 levels "Biscoe","Dream",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ bill_length_mm : num [1:344] 39.1 39.5 40.3 NA 36.7 39.3 38.9 39.2 34.1 42 ...
## $ bill_depth_mm : num [1:344] 18.7 17.4 18 NA 19.3 20.6 17.8 19.6 18.1 20.2 ...
## $ flipper_length_mm: int [1:344] 181 186 195 NA 193 190 181 195 193 190 ...
## $ body_mass_g : int [1:344] 3750 3800 3250 NA 3450 3650 3625 4675 3475 4250 ...
## $ sex : Factor w/ 2 levels "female","male": 2 1 1 NA 1 2 1 2 NA NA ...
## $ year : int [1:344] 2007 2007 2007 2007 2007 2007 2007 2007 2007 2007 ...
```

We see there is a `species` factor with three levels, and a `sex` factor with two levels. Let’s select only the Adelie and Chinstrap species.

```
penguin_data <- penguins %>%
  filter(species %in% c("Adelie", "Chinstrap"))
```

Suppose that we are interested in differences in flipper length across species. We could then run a linear model, with `flipper_length_mm` as the dependent variable, and `species` as the independent variable.

```
out <- penguin_data %>%
  lm(flipper_length_mm ~ species, data = .)
out %>%
  tidy(conf.int = TRUE)
```

```
## # A tibble: 2 x 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 190. 0.548 347. 8.31e-300 189. 191.
## 2 speciesChinstrap 5.87 0.983 5.97 9.38e- 9 3.93 7.81
```

The output shows that in this sample, the Chinstrap penguins have on average larger flippers than Adelie penguins. The confidence interval tells us that this difference in flipper length is somewhere between 3.93 and 7.81 mm. But suppose that this is not what we want to know. The real question might be whether this difference is different for male and female penguins. Maybe there is a larger difference in flipper length in females than in males?

This difference or change in the effect of one independent variable (`species`) as a function of another independent variable (`sex`) should remind us of *moderation*: maybe sex moderates the effect of species on flipper length.

In order to study such moderation, we have to analyse the `sex` by `species` interaction effect. By now you should know how to do that in R:

```
out <- penguin_data %>%
  lm(flipper_length_mm ~ species + sex + species:sex, data = .)
out %>%
  tidy(conf.int = TRUE)
```

```
## # A tibble: 4 x 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 188. 0.707 266. 2.15e-267 186. 189.
## 2 speciesChinstrap 3.94 1.25 3.14 1.92e- 3 1.47 6.41
## 3 sexmale 4.62 1.00 4.62 6.76e- 6 2.65 6.59
## 4 speciesChinstrap:se… 3.56 1.77 2.01 4.60e- 2 0.0638 7.06
```

In the output we see an intercept of about 188. Next, we see the effect of a dummy variable coding 1s for Chinstrap penguins (`speciesChinstrap`). We also see the effect of a dummy variable coding 1s for male penguins (`sexmale`). Then, we see an interaction effect of these two dummy variables: a dummy variable that codes 1s for the specific combination of Chinstrap penguins that are male (`speciesChinstrap:sexmale`). Using the rounded estimates, the regression equation is

\[\begin{aligned} \widehat{\texttt{flipperlength}} = 188 + 3.94 \times \texttt{speciesChinstrap} + \nonumber\\ 4.62 \times \texttt{sexmale} + 3.56 \times \texttt{speciesChinstrap} \times \texttt{sexmale} \nonumber\end{aligned}\]

From this we can make the following predictions. The predicted flipper length for female Adelie penguins is

\[188 + 3.94 \times 0 + 4.62 \times 0 + 3.56 \times 0 \times 0 = 188\]

The predicted flipper length for male Adelie penguins is

\[\begin{aligned} 188 + 3.94 \times 0 + 4.62 \times 1 + 3.56 \times 0 \times 1 \nonumber\\ = 188 + 4.62 = 192.62 \nonumber\end{aligned}\]

The predicted flipper length for female Chinstrap penguins is

\[\begin{aligned} 188 + 3.94 \times 1 + 4.62 \times 0 + 3.56 \times 1 \times 0 \nonumber\\ = 188 + 3.94 = 191.94\nonumber \end{aligned}\]

and the predicted flipper length for male Chinstrap penguins is

\[\begin{aligned} 188 &+ 3.94 \times 1 + 4.62 \times 1 + 3.56 \times 1 \times 1 \nonumber\\ &= 188 + 3.94 + 4.62 + 3.56 \nonumber\\ &= 200.12 \nonumber \end{aligned}\]

These predicted flipper lengths for each sex/species combination are actually the group means. It is generally best to plot these means with a *means and errors plot*. For that we first need to compute the means in R. With `left_join()` we add these means to the data set. The diamond-shaped means (`shape = 18`) are plotted with intervals that are twice (`mult = 2`) the standard error of those means (`geom = "errorbar"`).

```
left_join(penguin_data, # adding group means to the data set
          penguin_data %>%
            group_by(species, sex) %>%
            summarise(mean = mean(flipper_length_mm))) %>%
  ggplot(aes(x = sex, y = flipper_length_mm, colour = species)) +
  geom_jitter(position = position_jitterdodge(), # the raw data
              shape = 1,
              alpha = 0.6) +
  geom_point(aes(y = mean), # the group means
             position = position_jitterdodge(jitter.width = 0),
             shape = 18,
             size = 5) +
  stat_summary(fun.data = mean_se, # computing errorbars
               fun.args = list(mult = 2),
               geom = "errorbar",
               width = 0.2,
               position = position_jitterdodge(jitter.width = 0),
               size = 1) +
  scale_color_brewer(palette = "Set1")
```

This plot also shows the data on penguins with unknown sex (`sex = NA`). If we leave these out, we get

```
left_join(penguin_data, # adding group means to the data set
          penguin_data %>%
            group_by(species, sex) %>%
            summarise(mean = mean(flipper_length_mm))) %>%
  filter(!is.na(sex)) %>%
  ggplot(aes(x = sex, y = flipper_length_mm, colour = species)) +
  geom_jitter(position = position_jitterdodge(), # the raw data
              shape = 1,
              alpha = 0.6) +
  geom_point(aes(y = mean), # the group means
             position = position_jitterdodge(jitter.width = 0),
             shape = 18,
             size = 5) +
  stat_summary(fun.data = mean_se, # computing errorbars
               fun.args = list(mult = 2),
               geom = "errorbar",
               width = 0.2,
               position = position_jitterdodge(jitter.width = 0),
               size = 1) +
  scale_color_brewer(palette = "Set1")
```

Comparing the Adelie and the Chinstrap data, we see that for both males and females, the Adelie penguins have smaller flippers than the Chinstrap penguins. Comparing males and females, we see that the males have generally larger flippers than females. More interestingly in relation to this chapter, the means in the males are farther apart than the means in the females. Thus, in males the effect of species is larger than in females. This is the interaction effect, and this difference in the difference in means is equal to 3.56 in this data set. With a confidence level of 95% we can say that the moderating effect of sex on the effect of species is probably somewhere between 0.06 and 7.06 mm in the population of all penguins.
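The "difference in differences" interpretation can be checked with simple arithmetic on the rounded estimates from the regression table (a sketch; exact values differ slightly due to rounding):

```r
# Rounded coefficients from the species-by-sex model
b0  <- 188    # intercept: female Adelie mean
bC  <- 3.94   # speciesChinstrap
bM  <- 4.62   # sexmale
bCM <- 3.56   # speciesChinstrap:sexmale

# Species difference among females and among males:
diff_female <- (b0 + bC) - b0                     # 3.94
diff_male   <- (b0 + bC + bM + bCM) - (b0 + bM)   # 3.94 + 3.56 = 7.50

diff_male - diff_female   # equals the interaction effect, 3.56
```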

## 10.5 Moderation involving two numeric variables in R

In all previous examples, we saw at least one categorical variable. We saw that for different levels of a dummy variable, the slope of another variable varied. We also saw that for different levels of a dummy variable, the effect of another dummy variable varied. In this section, we look at how the slope of a numeric variable can vary, as a function of the level of another numeric variable.

As an example data set, we look at the `mpg` data frame, available in the `ggplot2` package. It contains data on 234 cars. Let’s analyse the dependent variable `cty` (city miles per gallon) as a function of the numeric variables `cyl` (number of cylinders) and `displ` (engine displacement). First we plot the relationship between engine displacement and city miles per gallon, using colours based on the number of cylinders. We see that there is in general a negative slope: the higher the displacement value, the lower the city miles per gallon.

```
mpg %>%
  ggplot(aes(x = displ, y = cty)) +
  geom_point(aes(colour = cyl)) +
  geom_smooth(method = "lm", se = F)
```

When we run separate linear models for the different number of cylinders, we get

```
mpg %>%
  ggplot(aes(x = displ, y = cty, colour = cyl, group = cyl)) +
  geom_point() +
  geom_smooth(method = "lm", se = F)
```

We see that the slope is different depending on the number of cylinders: the more cylinders, the less negative the slope — very negative for cars with a low number of cylinders, and slightly positive for cars with a high number of cylinders. In other words, the slope increases in value with an increasing number of cylinders. If we want to quantify this interaction effect, we need to run a linear model with an interaction effect.

```
out <- mpg %>%
  lm(cty ~ displ + cyl + displ:cyl, data = .)
out %>%
  tidy()
```

```
## # A tibble: 4 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 38.6 2.03 19.0 3.69e-49
## 2 displ -5.29 0.825 -6.40 8.45e-10
## 3 cyl -2.70 0.376 -7.20 8.73e-12
## 4 displ:cyl 0.559 0.104 5.38 1.85e- 7
```

We see that the `displ` by `cyl` interaction effect is 0.559. It means that the slope of `displ` changes by 0.559 for every unit increase in `cyl`.

For example, when we look at the predicted city miles per gallon with `cyl = 2`, we get the following model equation:

\[\begin{aligned} \widehat{\texttt{cty}} &= 38.6 -5.285\times \texttt{displ} -2.704 \times \texttt{cyl} + 0.559 \times \texttt{displ} \times \texttt{cyl} \nonumber\\ \widehat{\texttt{cty}} &= 38.6 -5.285 \times \texttt{displ} -2.704 \times 2 + 0.559 \times \texttt{displ} \times 2 \nonumber\\ \widehat{\texttt{cty}} &= 38.6 -5.285 \times \texttt{displ} -5.408 + 1.118 \times \texttt{displ} \nonumber\\ \widehat{\texttt{cty}} &= (38.6 -5.408) + (1.118-5.285) \times \texttt{displ} \nonumber\\ \widehat{\texttt{cty}} &= 33.192 -4.167 \times \texttt{displ}\nonumber\end{aligned}\]

If we increase the number of cylinders from 2 to 3, we obtain the equation:

\[\begin{aligned} \widehat{\texttt{cty}} &= 38.6 -5.285 \times \texttt{displ} -2.704 \times 3 + 0.559 \times \texttt{displ} \times 3 \nonumber\\ \widehat{\texttt{cty}} &= 30.488 -3.608 \times \texttt{displ}\nonumber\end{aligned}\]

We see a different intercept and a different slope. The difference in the slope between 3 and 2 cylinders equals 0.559, which is exactly the interaction effect. If you do the same exercise with 4 and 5 cylinders, or 6 and 7 cylinders, you will always see this difference again. The parameter for the interaction effect simply says that the best prediction for the change in slope when increasing the number of cylinders by 1 is 0.559. We can plot the predictions from this model in the following way:
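The "slope as a function of the moderator" idea can be made concrete with a small helper function, using the rounded estimates from the table (the function name is ours, for illustration):

```r
# Slope of displ as a linear function of cyl:
# slope(cyl) = -5.285 + 0.559 * cyl
slope_displ <- function(cyl) -5.285 + 0.559 * cyl

slope_displ(2)            # -4.167, as in the cyl = 2 equation
slope_displ(3)            # -3.608, as in the cyl = 3 equation
diff(slope_displ(2:8))    # always 0.559, whatever the number of cylinders
```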

```
library(modelr)
mpg %>%
  add_predictions(out) %>%
  ggplot(aes(x = displ, y = cty, colour = cyl)) +
  geom_point() +
  geom_line(aes(y = pred, group = cyl))
```

If we compare these predicted regression lines with those in the previous figure

```
mpg %>%
  add_predictions(out) %>%
  ggplot(aes(x = displ, y = cty, group = cyl)) +
  geom_point() +
  geom_line(aes(y = pred), colour = "black") +
  geom_smooth(method = "lm", se = F)
```

we see that they are a little bit different. That is because in the model we treat `cyl` as numeric: for every increase of 1 in `cyl`, the slope changes by a fixed amount. When you treat `cyl` as categorical, you estimate the slope separately for all different levels. You would then see multiple parameters for the interaction effect:

```
out <- mpg %>%
  mutate(cyl = factor(cyl)) %>%
  lm(cty ~ displ + cyl + displ:cyl, data = .)
out %>%
  tidy()
```

```
## # A tibble: 8 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 33.8 1.70 19.9 1.50e-51
## 2 displ -5.96 0.785 -7.60 7.94e-13
## 3 cyl5 1.60 1.17 1.37 1.72e- 1
## 4 cyl6 -11.6 2.50 -4.65 5.52e- 6
## 5 cyl8 -23.4 2.89 -8.11 3.12e-14
## 6 displ:cyl5 NA NA NA NA
## 7 displ:cyl6 4.21 0.948 4.44 1.38e- 5
## 8 displ:cyl8 6.39 0.906 7.06 2.05e-11
```

When `cyl` is turned into a factor, you see that cars with 4 cylinders are taken as the reference category, and there are effects of having 5, 6, or 8 cylinders. We see the same for the interaction effects: there is a reference category with 4 cylinders, where the slope of `displ` equals \(-5.96\). Cars with 6 and 8 cylinders have different slopes: the one for 6 cylinders is \(-5.96 + 4.21 = -1.75\) and the one for 8 cylinders is \(-5.96 + 6.39 = 0.43\). The slope for cars with 5 cylinders can’t be separately estimated because there is no variation in `displ` for these cars in the `mpg` data set.

You see that you get different results depending on whether you treat a variable as numeric or as categorical. Treated as numeric, you end up with a simpler model with fewer parameters, and therefore a larger number of degrees of freedom. What to choose depends on the research question and the amount of data. In general, a model should not be too complex when you have relatively few data points. Whether the model is appropriate for your data can be checked by looking at the residuals and checking the assumptions.
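A quick way to inspect residuals is base R’s `plot()` method for fitted models. As a minimal sketch, here it is applied to the chick model from earlier in this chapter (any clear pattern in the residuals would suggest the model is too simple):

```r
fit <- lm(weight ~ Time + Diet + Time:Diet, data = ChickWeight)

# Residuals vs fitted values, and a normal Q-Q plot of the residuals:
par(mfrow = c(1, 2))
plot(fit, which = 1:2)
```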