8.3 Interactions Between Independent Variables
There are research questions where it is interesting to learn how the effect on Y of a change in one independent variable depends on the value of another independent variable. For example, we may ask whether districts with many English learners benefit differently from a decrease in class size than districts with few English learners. To assess this using a multiple regression model, we include an interaction term. We consider three cases:
- Interactions between two binary variables.
- Interactions between a binary and a continuous variable.
- Interactions between two continuous variables.
The following subsections discuss these cases briefly and demonstrate how to perform such regressions in R.
Interactions Between Two Binary Variables
Take two binary variables D1 and D2 and the population regression model

Yi = β0 + β1 × D1i + β2 × D2i + ui.
Now assume that
Yi = ln(Earningsi),

D1i = 1 if the ith person has a college degree, 0 else,

D2i = 1 if the ith person is female, 0 if the ith person is male.

We know that β1 measures the average difference in ln(Earnings) between individuals with and without a college degree and that β2 is the gender differential in ln(Earnings), ceteris paribus. This model does not allow us to determine whether there is a gender-specific effect of having a college degree and, if so, how strong this effect is. It is easy to come up with a model specification that allows us to investigate this:
Yi=β0+β1×D1i+β2×D2i+β3×(D1i×D2i)+ui
(D1i×D2i) is called an interaction term and β3 measures the difference in the effect of having a college degree for women versus men.
Key Concept 8.3
A Method for Interpreting Coefficients in Regression with Binary Variables
Compute expected values of Y for each possible case described by the set of binary variables. Compare the expected values. The coefficients can be expressed either as expected values or as differences between at least two expected values.
Following Key Concept 8.3 we have
E(Yi|D1i = 0, D2i = d2) = β0 + β1 × 0 + β2 × d2 + β3 × (0 × d2) = β0 + β2 × d2.

If D1i switches from 0 to 1 we obtain

E(Yi|D1i = 1, D2i = d2) = β0 + β1 × 1 + β2 × d2 + β3 × (1 × d2) = β0 + β1 + β2 × d2 + β3 × d2.

Hence, the overall effect is

E(Yi|D1i = 1, D2i = d2) − E(Yi|D1i = 0, D2i = d2) = β1 + β3 × d2,

so the effect is a difference of expected values.
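To see this numerically, we can simulate data and compare the OLS estimate of β3 with the difference-in-differences of the four group means. The following is a minimal sketch; all variable names and parameter values are purely illustrative.

# simulate a saturated model with two binary regressors
# (illustrative data; the true interaction coefficient is 0.3 here)
set.seed(123)
n  <- 1000
D1 <- rbinom(n, 1, 0.5)   # e.g., college degree
D2 <- rbinom(n, 1, 0.5)   # e.g., female
Y  <- 1 + 0.5 * D1 + 0.2 * D2 + 0.3 * D1 * D2 + rnorm(n)
# OLS estimate of the interaction coefficient
b3 <- coef(lm(Y ~ D1 * D2))["D1:D2"]
# difference-in-differences of the four group means
gm <- tapply(Y, list(D1, D2), mean)
did <- (gm["1", "1"] - gm["0", "1"]) - (gm["1", "0"] - gm["0", "0"])
all.equal(unname(b3), unname(did))  # TRUE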
Application to the Student-Teacher Ratio and the Percentage of English Learners
Now let
HiSTR = 1 if STR ≥ 20, 0 else,

HiEL = 1 if PctEL ≥ 10, 0 else.

We may use R to construct the variables above as follows.
# append HiSTR to CASchools
CASchools$HiSTR <- as.numeric(CASchools$size >= 20)
# append HiEL to CASchools
CASchools$HiEL <- as.numeric(CASchools$english >= 10)
We proceed by estimating the model
TestScorei = β0 + β1 × HiSTRi + β2 × HiELi + β3 × (HiSTRi × HiELi) + ui.

There are several ways to add the interaction term to the formula argument when using lm(), but the most intuitive way is to use HiEL * HiSTR.7
# estimate the model with a binary interaction term
bi_model <- lm(score ~ HiSTR * HiEL, data = CASchools)
# print a robust summary of the coefficients
coeftest(bi_model, vcov. = vcovHC, type = "HC1")
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 664.1433 1.3881 478.4589 < 2.2e-16 ***
## HiSTR -1.9078 1.9322 -0.9874 0.3240
## HiEL -18.3155 2.3340 -7.8472 3.634e-14 ***
## HiSTR:HiEL -3.2601 3.1189 -1.0453 0.2965
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The estimated regression model is

^TestScore = 664.1(1.39) − 1.9(1.93) × HiSTR − 18.3(2.33) × HiEL − 3.3(3.12) × (HiSTR × HiEL),  (8.1)

with standard errors in parentheses. It predicts that the effect of moving from a school district with a low student-teacher ratio to a district with a high student-teacher ratio is −1.9 − 3.3 × HiEL, depending on whether the percentage of English learners is high or low. So for districts with a low share of English learners (HiEL = 0), the estimated effect is a decrease of 1.9 points in test scores, while for districts with a large fraction of English learners (HiEL = 1), the predicted decrease in test scores amounts to 1.9 + 3.3 = 5.2 points.
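Both effects can be recovered directly from the estimated coefficients:

# effect of a high student-teacher ratio on test scores, by HiEL
coefs <- coef(bi_model)
coefs["HiSTR"]                        # HiEL = 0: about -1.9
coefs["HiSTR"] + coefs["HiSTR:HiEL"]  # HiEL = 1: about -5.2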
We can also use the model to estimate the mean test score for each possible combination of the included binary variables.
# estimate means for all combinations of HiSTR and HiEL
# 1.
predict(bi_model, newdata = data.frame("HiSTR" = 0, "HiEL" = 0))
## 1
## 664.1433
# 2.
predict(bi_model, newdata = data.frame("HiSTR" = 0, "HiEL" = 1))
## 1
## 645.8278
# 3.
predict(bi_model, newdata = data.frame("HiSTR" = 1, "HiEL" = 0))
## 1
## 662.2354
# 4.
predict(bi_model, newdata = data.frame("HiSTR" = 1, "HiEL" = 1))
## 1
## 640.6598
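Equivalently, we can obtain all four predicted means in a single call:

# all combinations of HiSTR and HiEL at once
predict(bi_model, newdata = expand.grid(HiSTR = 0:1, HiEL = 0:1))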
We now verify that these predictions are combinations of the coefficient estimates presented in equation (8.1):
^TestScore = ˆβ0 = 664.1 ⇔ HiSTR = 0, HiEL = 0

^TestScore = ˆβ0 + ˆβ2 = 664.1 − 18.3 = 645.8 ⇔ HiSTR = 0, HiEL = 1

^TestScore = ˆβ0 + ˆβ1 = 664.1 − 1.9 = 662.2 ⇔ HiSTR = 1, HiEL = 0

^TestScore = ˆβ0 + ˆβ1 + ˆβ2 + ˆβ3 = 664.1 − 1.9 − 18.3 − 3.3 = 640.6 ⇔ HiSTR = 1, HiEL = 1

Interactions Between a Continuous and a Binary Variable
Now let Xi denote the years of working experience of person i, which is a continuous variable, and let

Yi = ln(Earningsi),

Xi = working experience of person i,

Di = 1 if the ith person has a college degree, 0 else.

The baseline model thus is
Yi=β0+β1Xi+β2Di+ui,
a multiple regression model that allows us to estimate the average benefit of having a college degree holding working experience constant as well as the average effect on earnings of a change in working experience holding college degree constant.
By adding the interaction term Xi × Di we allow the effect of an additional year of work experience to differ between individuals with and without a college degree,
Yi=β0+β1Xi+β2Di+β3(Xi×Di)+ui.
Here, β3 is the expected difference in the effect of an additional year of work experience for college graduates versus non-graduates. Another possible specification is
Yi=β0+β1Xi+β2(Xi×Di)+ui.
This model states that the expected impact of an additional year of work experience on earnings differs for college graduates and non-graduates but that graduating on its own does not increase earnings.
All three regression functions can be visualized by straight lines. Key Concept 8.4 summarizes the differences.
Key Concept 8.4
Interactions Between Binary and Continuous Variables
An interaction term like Xi×Di (where Xi is continuous and Di is binary) allows for the slope to depend on the binary variable Di. There are three possibilities:
- Different intercept and same slope: Yi = β0 + β1Xi + β2Di + ui
- Different intercept and different slope: Yi = β0 + β1Xi + β2Di + β3 × (Xi × Di) + ui
- Same intercept and different slope: Yi = β0 + β1Xi + β2(Xi × Di) + ui
The following code chunk demonstrates how to replicate the results shown in Figure 8.8 of the book using artificial data.
# generate artificial data
set.seed(1)
X <- runif(200, 0, 15)
D <- sample(0:1, 200, replace = TRUE)
Y <- 450 + 150 * X + 500 * D + 50 * (X * D) + rnorm(200, sd = 300)
# divide plotting area accordingly
m <- rbind(c(1, 2), c(3, 0))
graphics::layout(m)
# estimate the models and plot the regression lines
# 1. (baseline model)
plot(X, log(Y),
pch = 20,
col = "steelblue",
main = "Different Intercepts, Same Slope")
mod1_coef <- lm(log(Y) ~ X + D)$coefficients
abline(coef = c(mod1_coef[1], mod1_coef[2]),
col = "red",
lwd = 1.5)
abline(coef = c(mod1_coef[1] + mod1_coef[3], mod1_coef[2]),
col = "green",
lwd = 1.5)
# 2. (baseline model + interaction term)
plot(X, log(Y),
pch = 20,
col = "steelblue",
main = "Different Intercepts, Different Slopes")
mod2_coef <- lm(log(Y) ~ X + D + X:D)$coefficients
abline(coef = c(mod2_coef[1], mod2_coef[2]),
col = "red",
lwd = 1.5)
abline(coef = c(mod2_coef[1] + mod2_coef[3], mod2_coef[2] + mod2_coef[4]),
col = "green",
lwd = 1.5)
# 3. (omission of D as regressor + interaction term)
plot(X, log(Y),
pch = 20,
col = "steelblue",
main = "Same Intercept, Different Slopes")
mod3_coef <- lm(log(Y) ~ X + X:D)$coefficients
abline(coef = c(mod3_coef[1], mod3_coef[2]),
col = "red",
lwd = 1.5)
abline(coef = c(mod3_coef[1], mod3_coef[2] + mod3_coef[3]),
col = "green",
lwd = 1.5)
Application to the Student-Teacher Ratio and the Percentage of English Learners
Using a model specification like the second one discussed in Key Concept 8.4 (different intercept, different slope), we may answer the question of whether the effect on test scores of decreasing the student-teacher ratio depends on whether there are many or few English learners. We estimate the regression model

TestScorei = β0 + β1 × sizei + β2 × HiELi + β3 × (sizei × HiELi) + ui.
# estimate the model
bci_model <- lm(score ~ size + HiEL + size * HiEL, data = CASchools)
# print robust summary of coefficients to the console
coeftest(bci_model, vcov. = vcovHC, type = "HC1")
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 682.24584 11.86781 57.4871 <2e-16 ***
## size -0.96846 0.58910 -1.6440 0.1009
## HiEL 5.63914 19.51456 0.2890 0.7727
## size:HiEL -1.27661 0.96692 -1.3203 0.1875
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The estimated regression model is ^TestScore=682.2(11.87)−0.97(0.59)×size+5.6(19.51)×HiEL−1.28(0.97)×(size×HiEL). The estimated regression line for districts with a low fraction of English learners (HiELi=0) is ^TestScore=682.2−0.97×sizei.
For districts with a high fraction of English learners we have
^TestScore = 682.2 + 5.6 − 0.97 × sizei − 1.28 × sizei = 687.8 − 2.25 × sizei.

The predicted increase in test scores following a reduction of the student-teacher ratio by one unit is about 0.97 points in districts where the fraction of English learners is low, but 2.25 points in districts with a high share of English learners. From the coefficient on the interaction term size × HiEL we see that the difference between the two effects is 1.28 points.
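Again, both slopes can be computed directly from the estimated coefficients:

# slopes of the estimated regression lines, by HiEL
coefs <- coef(bci_model)
coefs["size"]                       # HiEL = 0: about -0.97
coefs["size"] + coefs["size:HiEL"]  # HiEL = 1: about -2.25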
The next code chunk draws both lines belonging to the model. In order to make observations with HiEL=0 distinguishable from those with HiEL=1, we use different colors.
# identify observations with PctEL >= 10
id <- CASchools$english >= 10
# plot observations with HiEL = 0 as red dots
plot(CASchools$size[!id], CASchools$score[!id],
xlim = c(0, 27),
ylim = c(600, 720),
pch = 20,
col = "red",
main = "",
xlab = "Class Size",
ylab = "Test Score")
# plot observations with HiEL = 1 as green dots
points(CASchools$size[id], CASchools$score[id],
pch = 20,
col = "green")
# read out estimated coefficients of bci_model
coefs <- bci_model$coefficients
# draw the estimated regression line for HiEL = 0
abline(coef = c(coefs[1], coefs[2]),
col = "red",
lwd = 1.5)
# draw the estimated regression line for HiEL = 1
abline(coef = c(coefs[1] + coefs[3], coefs[2] + coefs[4]),
col = "green",
lwd = 1.5 )
# add a legend to the plot
legend("topright",
pch = c(20, 20),
col = c("red", "green"),
legend = c("HiEL = 0", "HiEL = 1"))
Interactions Between Two Continuous Variables
Consider a regression model with Y the log earnings and two continuous regressors X1, the years of work experience, and X2, the years of schooling. We want to estimate the effect on wages of an additional year of work experience depending on a given level of schooling. This effect can be assessed by including the interaction term (X1i×X2i) in the model:
Yi = β0 + β1 × X1i + β2 × X2i + β3 × (X1i × X2i) + ui.
Following Key Concept 8.1 we find that the effect on Y of a change in X1 given X2 is

ΔY/ΔX1 = β1 + β3X2.

In the earnings example, a positive β3 implies that the effect on log earnings of an additional year of work experience grows linearly with years of schooling. Vice versa we have

ΔY/ΔX2 = β2 + β3X1

as the effect on log earnings of an additional year of schooling holding work experience constant.
Altogether we find that β3 measures the effect of a simultaneous unit increase in X1 and X2 beyond the sum of the effects of increasing X1 alone and X2 alone by one unit. The overall change in Y is thus

ΔY = (β1 + β3X2)ΔX1 + (β2 + β3X1)ΔX2 + β3ΔX1ΔX2.

Key Concept 8.5 summarizes interactions between two regressors in multiple regression.
Key Concept 8.5
Interactions in Multiple Regression
The interaction term between the two regressors X1 and X2 is given by their product X1 × X2. Adding this interaction term as a regressor to the model Yi = β0 + β1X1 + β2X2 + ui allows the effect on Y of a change in X2 to depend on the value of X1 and vice versa. Thus the coefficient β3 in the model Yi = β0 + β1X1 + β2X2 + β3(X1 × X2) + ui measures the effect of a one-unit increase in both X1 and X2 above and beyond the sum of the individual effects of a unit increase in X1 alone and a unit increase in X2 alone.
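The decomposition of the overall change in Y stated above can be checked against model predictions using simulated data. The following is a minimal sketch; variable names and parameter values are purely illustrative.

# verify Delta(Y) = (b1 + b3*x2)*dX1 + (b2 + b3*x1)*dX2 + b3*dX1*dX2
set.seed(42)
X1 <- runif(100)
X2 <- runif(100)
Y  <- 1 + 2 * X1 + 3 * X2 + 4 * X1 * X2 + rnorm(100)
mod <- lm(Y ~ X1 * X2)
b <- coef(mod)
# change in the prediction when X1 and X2 each increase by 1, starting from (0.5, 0.5)
x1 <- 0.5; x2 <- 0.5; dX1 <- 1; dX2 <- 1
delta_pred <- predict(mod, data.frame(X1 = x1 + dX1, X2 = x2 + dX2)) -
  predict(mod, data.frame(X1 = x1, X2 = x2))
delta_formula <- (b["X1"] + b["X1:X2"] * x2) * dX1 +
  (b["X2"] + b["X1:X2"] * x1) * dX2 + b["X1:X2"] * dX1 * dX2
all.equal(unname(delta_pred), unname(delta_formula))  # TRUE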
Application to the Student-Teacher Ratio and the Percentage of English Learners
We now examine the interaction between the continuous variables student-teacher ratio and the percentage of English learners.
# estimate regression model including the interaction between 'size' and 'english'
cci_model <- lm(score ~ size + english + english * size, data = CASchools)
# print a summary to the console
coeftest(cci_model, vcov. = vcovHC, type = "HC1")
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 686.3385268 11.7593466 58.3654 < 2e-16 ***
## size -1.1170184 0.5875136 -1.9013 0.05796 .
## english -0.6729119 0.3741231 -1.7986 0.07280 .
## size:english 0.0011618 0.0185357 0.0627 0.95005
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The estimated model equation is

^TestScore = 686.3(11.76) − 1.12(0.59) × STR − 0.67(0.37) × PctEL + 0.0012(0.02) × (STR × PctEL),  (8.2)

with standard errors in parentheses.
For the interpretation, let us consider the quartiles of PctEL.
summary(CASchools$english)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.941 8.778 15.768 22.970 85.540
According to (8.2), if PctEL is at its median value of 8.78, the slope of the regression function relating test scores and the student-teacher ratio is predicted to be −1.12 + 0.0012 × 8.78 = −1.11. This means that increasing the student-teacher ratio by one unit is expected to reduce test scores by 1.11 points. For the 75% quantile, the estimated change in TestScore of a one-unit increase in STR is −1.12 + 0.0012 × 23.0 = −1.09, so the slope is somewhat smaller in absolute value. The interpretation is that for a school district with a share of 23% English learners, a reduction of the student-teacher ratio by one unit is expected to increase test scores by only 1.09 points.
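These slopes are straightforward to compute from the estimated coefficients:

# slope of the regression function at the median and the 75% quantile of PctEL
coefs <- coef(cci_model)
quarts <- summary(CASchools$english)
coefs["size"] + coefs["size:english"] * quarts[["Median"]]   # about -1.11
coefs["size"] + coefs["size:english"] * quarts[["3rd Qu."]]  # about -1.09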
However, the output of coeftest() indicates that the interaction term is not statistically significant: H0: β3 = 0 cannot be rejected at the 5% level of significance (the p-value is 0.95).
Example: The Demand for Economic Journals
In this section we replicate the empirical example presented on pages 336–337 of the book. The central question is: how elastic is the demand by libraries for economic journals? The idea is to analyze the relationship between the number of subscriptions to a journal at U.S. libraries and the journal's subscription price. The study uses the data set Journals, which is provided with the AER package and contains observations for 180 economic journals for the year 2000. You can use the help function (?Journals) to get more information on the data after loading the package.
# load package and the data set
library(AER)
data("Journals")
We measure the price as “price per citation” and compute journal age and the number of characters manually. For consistency with the book we also rename the variables.
# define and rename variables
Journals$PricePerCitation <- Journals$price/Journals$citations
Journals$Age <- 2000 - Journals$foundingyear
Journals$Characters <- Journals$charpp * Journals$pages/10^6
Journals$Subscriptions <- Journals$subs
The range of “price per citation” is quite large:
# compute summary statistics for price per citation
summary(Journals$PricePerCitation)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.005223 0.464495 1.320513 2.548455 3.440171 24.459459
The lowest price observed is a mere 0.5¢ per citation, while the highest is more than $24 per citation.
We now estimate four different model specifications. All models are log-log models, which is useful because it allows us to interpret the coefficients directly as elasticities, see Key Concept 8.2. (I) is the baseline specification. To alleviate a possible omitted variable bias, (II) augments (I) by the covariates ln(Age) and ln(Characters). The largest model, (III), attempts to capture nonlinearities in the relationship between ln(Subscriptions) and ln(PricePerCitation) using a cubic regression function of ln(PricePerCitation) and also adds the interaction term [ln(Age) × ln(PricePerCitation)], while specification (IV) keeps the interaction term but omits the quadratic and cubic terms.
(I) ln(Subscriptionsi) = β0 + β1 ln(PricePerCitationi) + ui

(II) ln(Subscriptionsi) = β0 + β1 ln(PricePerCitationi) + β4 ln(Agei) + β6 ln(Charactersi) + ui

(III) ln(Subscriptionsi) = β0 + β1 ln(PricePerCitationi) + β2 ln(PricePerCitationi)² + β3 ln(PricePerCitationi)³ + β4 ln(Agei) + β5 [ln(Agei) × ln(PricePerCitationi)] + β6 ln(Charactersi) + ui

(IV) ln(Subscriptionsi) = β0 + β1 ln(PricePerCitationi) + β4 ln(Agei) + β5 [ln(Agei) × ln(PricePerCitationi)] + β6 ln(Charactersi) + ui

# Estimate models (I) - (IV)
Journals_mod1 <- lm(log(Subscriptions) ~ log(PricePerCitation),
data = Journals)
Journals_mod2 <- lm(log(Subscriptions) ~ log(PricePerCitation)
+ log(Age) + log(Characters),
data = Journals)
Journals_mod3 <- lm(log(Subscriptions) ~
log(PricePerCitation) + I(log(PricePerCitation)^2)
+ I(log(PricePerCitation)^3) + log(Age)
+ log(Age):log(PricePerCitation) + log(Characters),
data = Journals)
Journals_mod4 <- lm(log(Subscriptions) ~
log(PricePerCitation) + log(Age)
+ log(Age):log(PricePerCitation) +
log(Characters),
data = Journals)
Using summary(), we obtain the following estimated models:
(I) ^ln(Subscriptionsi) = 4.77 − 0.53 ln(PricePerCitationi)

(II) ^ln(Subscriptionsi) = 3.21 − 0.41 ln(PricePerCitationi) + 0.42 ln(Agei) + 0.21 ln(Charactersi)

(III) ^ln(Subscriptionsi) = 3.41 − 0.96 ln(PricePerCitationi) + 0.02 ln(PricePerCitationi)² + 0.004 ln(PricePerCitationi)³ + 0.37 ln(Agei) + 0.16 [ln(Agei) × ln(PricePerCitationi)] + 0.23 ln(Charactersi)

(IV) ^ln(Subscriptionsi) = 3.43 − 0.90 ln(PricePerCitationi) + 0.37 ln(Agei) + 0.14 [ln(Agei) × ln(PricePerCitationi)] + 0.23 ln(Charactersi)

We use an F-test to check whether the quadratic and cubic terms in ln(PricePerCitation) in model (III) are statistically significant.
# F-Test for significance of cubic terms
linearHypothesis(Journals_mod3,
c("I(log(PricePerCitation)^2)=0", "I(log(PricePerCitation)^3)=0"),
vcov. = vcovHC, type = "HC1")
## Linear hypothesis test
##
## Hypothesis:
## I(log(PricePerCitation)^2) = 0
## I(log(PricePerCitation)^3) = 0
##
## Model 1: restricted model
## Model 2: log(Subscriptions) ~ log(PricePerCitation) + I(log(PricePerCitation)^2) +
## I(log(PricePerCitation)^3) + log(Age) + log(Age):log(PricePerCitation) +
## log(Characters)
##
## Note: Coefficient covariance matrix supplied.
##
## Res.Df Df F Pr(>F)
## 1 175
## 2 173 2 0.1943 0.8236
Clearly, we cannot reject the null hypothesis H0: β2 = β3 = 0 in model (III).
We now demonstrate how the function stargazer() can be used to generate a tabular representation of all four models.
# load the stargazer package
library(stargazer)
# gather robust standard errors in a list
rob_se <- list(sqrt(diag(vcovHC(Journals_mod1, type = "HC1"))),
sqrt(diag(vcovHC(Journals_mod2, type = "HC1"))),
sqrt(diag(vcovHC(Journals_mod3, type = "HC1"))),
sqrt(diag(vcovHC(Journals_mod4, type = "HC1"))))
# generate a Latex table using stargazer
stargazer(Journals_mod1, Journals_mod2, Journals_mod3, Journals_mod4,
se = rob_se,
digits = 3,
column.labels = c("(I)", "(II)", "(III)", "(IV)"))
Dependent variable: log(Subscriptions)

|                                | (I)       | (II)      | (III)     | (IV)      |
|--------------------------------|-----------|-----------|-----------|-----------|
| log(PricePerCitation)          | -0.533*** | -0.408*** | -0.961*** | -0.899*** |
|                                | (0.034)   | (0.044)   | (0.160)   | (0.145)   |
| log(PricePerCitation)^2        |           |           | 0.017     |           |
|                                |           |           | (0.025)   |           |
| log(PricePerCitation)^3        |           |           | 0.004     |           |
|                                |           |           | (0.006)   |           |
| log(Age)                       |           | 0.424***  | 0.373***  | 0.374***  |
|                                |           | (0.119)   | (0.118)   | (0.118)   |
| log(Characters)                |           | 0.206**   | 0.235**   | 0.229**   |
|                                |           | (0.098)   | (0.098)   | (0.096)   |
| log(PricePerCitation):log(Age) |           |           | 0.156***  | 0.141***  |
|                                |           |           | (0.052)   | (0.040)   |
| Constant                       | 4.766***  | 3.207***  | 3.408***  | 3.434***  |
|                                | (0.055)   | (0.380)   | (0.374)   | (0.367)   |
| Observations                   | 180       | 180       | 180       | 180       |
| R^2                            | 0.557     | 0.613     | 0.635     | 0.634     |
| Adjusted R^2                   | 0.555     | 0.607     | 0.622     | 0.626     |
| Residual Std. Error            | 0.750 (df = 178) | 0.705 (df = 176) | 0.691 (df = 173) | 0.688 (df = 175) |
| F Statistic                    | 224.037*** (df = 1; 178) | 93.009*** (df = 3; 176) | 50.149*** (df = 6; 173) | 75.749*** (df = 4; 175) |

Note: *p<0.1; **p<0.05; ***p<0.01
Table 8.1: Nonlinear Regression Models of Journal Subscriptions
The subsequent code chunk reproduces Figure 8.9 of the book.
# divide plotting area
m <- rbind(c(1, 2), c(3, 0))
graphics::layout(m)
# scatterplot
plot(Journals$PricePerCitation,
Journals$Subscriptions,
pch = 20,
col = "steelblue",
ylab = "Subscriptions",
xlab = "ln(Price per ciation)",
main = "(a)")
# log-log scatterplot and estimated regression line (I)
plot(log(Journals$PricePerCitation),
log(Journals$Subscriptions),
pch = 20,
col = "steelblue",
ylab = "ln(Subscriptions)",
xlab = "ln(Price per ciation)",
main = "(b)")
abline(Journals_mod1,
lwd = 1.5)
# log-log scatterplot and regression lines (IV) for Age = 5 and Age = 80
plot(log(Journals$PricePerCitation),
log(Journals$Subscriptions),
pch = 20,
col = "steelblue",
ylab = "ln(Subscriptions)",
xlab = "ln(Price per ciation)",
main = "(c)")
JM4C <- Journals_mod4$coefficients
# Age = 80
abline(coef = c(JM4C[1] + JM4C[3] * log(80),
JM4C[2] + JM4C[5] * log(80)),
col = "darkred",
lwd = 1.5)
# Age = 5
abline(coef = c(JM4C[1] + JM4C[3] * log(5),
JM4C[2] + JM4C[5] * log(5)),
col = "darkgreen",
lwd = 1.5)
As can be seen from plots (a) and (b), the relationship between subscriptions and the citation price is negative and nonlinear. Log-transforming both variables makes it approximately linear. Plot (c) shows that the price elasticity of journal subscriptions depends on the journal's age: the red line shows the estimated relationship for Age = 80 while the green line represents the prediction from model (IV) for Age = 5.
Which conclusions can be drawn?

- The demand for journals is more elastic for young journals than for old journals.
- For model (III) we cannot reject the null hypothesis that the coefficients on ln(PricePerCitation)² and ln(PricePerCitation)³ are both zero using an F-test. This is evidence compatible with a linear relationship between log-subscriptions and log-price.
- Demand is greater for journals with more characters, holding price and age constant.
Altogether our estimates suggest that demand is quite inelastic, i.e., the libraries' demand for economic journals is rather insensitive to the price: using model (IV), even for a young journal (Age = 5) we estimate the price elasticity to be −0.899 + 0.141 × ln(5) ≈ −0.67, so a one percent increase in price is predicted to reduce the demand by only about 0.67 percent.
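This estimate can be reproduced from the fitted model object; the coefficient names follow from the formula used for Journals_mod4 above:

# price elasticity implied by model (IV) for a journal aged 5 years
JM4C <- coef(Journals_mod4)
JM4C["log(PricePerCitation)"] + JM4C["log(PricePerCitation):log(Age)"] * log(5)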
This finding comes as no surprise since providing the most recent publications is a necessity for libraries.
7. Appending HiEL * HiSTR to the formula will add HiEL, HiSTR and their interaction as regressors, while HiEL:HiSTR only adds the interaction term.
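A quick way to confirm this equivalence (a minimal check using the objects created above):

# '*' expands to main effects plus the interaction, so both formulas fit the same model
mod_colon <- lm(score ~ HiSTR + HiEL + HiSTR:HiEL, data = CASchools)
all.equal(fitted(bi_model), fitted(mod_colon))  # TRUE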