5.8 Types of predictor variables

In the discussion so far, we have included multiple predictors in a model without making any explicit distinction between their roles. This section assumes there is a single primary predictor of interest for which you would like to test the association with the outcome. In this context, we discuss the distinctions between the roles of other predictors, which might be confounders, mediators, or moderators, and illustrate these distinctions using causal diagrams.

5.8.1 Confounder

In many studies, there is a primary predictor of interest and the goal is to obtain an unbiased estimate of the effect of that predictor on the outcome. The causal diagram, ignoring all other variables, is illustrated in Figure 5.7.

Box with the word predictor inside with an arrow pointing to another box with the word outcome inside

Figure 5.7: Association between predictor and outcome

However, especially in an observational study, there may be confounders – variables that are associated with both the predictor and the outcome and are not in the causal pathway from predictor to outcome, as illustrated in Figure 5.8.

Box with the word predictor inside with an arrow pointing to another box with the word outcome inside. A third box with the word confounder inside has arrows pointing to both the predictor and the outcome

Figure 5.8: Confounded association

Suppose the confounder is positively associated with both the predictor and the outcome, but the predictor is not actually associated with the outcome once the confounder is taken into account. Failure to adjust for confounding may lead to the spurious (incorrect) conclusion that the predictor is associated with the outcome; differences in the outcome corresponding to differences in the predictor are actually due to differences in the confounder. When there is a true association between the predictor and the outcome, failure to adjust for confounding may result in missing that association, or in over- or underestimating it. Including the confounder in the regression model along with the predictor is an attempt to adjust for confounding. Confounding can be addressed in the study design (e.g., matching, randomization, restriction of the scope) or in the analysis (e.g., stratification, standardization, regression adjustment). In this text, we focus only on regression adjustment.
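The scenario just described can be illustrated with a small simulation. This is only a sketch under stated assumptions: the variable names, the coefficients (0.8 and 1.0), the sample size, and the helper `ols` are all invented for illustration, and the data are generated so that the predictor has no effect on the outcome.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Assumed data-generating process (hypothetical values):
# the confounder drives both the predictor and the outcome,
# and the predictor itself has NO effect on the outcome.
confounder = rng.normal(size=n)
predictor = 0.8 * confounder + rng.normal(size=n)
outcome = 1.0 * confounder + rng.normal(size=n)


def ols(y, *columns):
    """Least-squares coefficients for y ~ 1 + columns (intercept first)."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Slope for the predictor without and with adjustment for the confounder.
unadjusted = ols(outcome, predictor)[1]
adjusted = ols(outcome, predictor, confounder)[1]

print(f"unadjusted slope: {unadjusted:.2f}")
print(f"adjusted slope:   {adjusted:.2f}")
```

With these assumed coefficients, the unadjusted slope should land near 0.8 / (0.8² + 1) ≈ 0.49 even though the true effect is zero, while the adjusted slope should be close to zero.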

For example, suppose a researcher’s goal is to estimate the effect of weight loss on metabolic syndrome, defined as having at least three of the following five risk factors: (1) large waist circumference, (2) high blood pressure or taking blood pressure medication, (3) elevated triglycerides, (4) elevated fasting glucose or taking medication to lower glucose, and (5) low high-density lipoprotein (for specific cutoffs, see http://my.clevelandclinic.org/health/articles/metabolic-syndrome, accessed January 4, 2021). However, the effect of weight loss on metabolic syndrome may be confounded by income, as illustrated in Figure 5.9.

Box with the words weight loss inside, the predictor, with an arrow pointing to another box with the words metabolic syndrome inside, the outcome. A third box with the word income inside, the confounder, has arrows pointing to both the predictor and the outcome

Figure 5.9: Income confounding the association between weight loss and metabolic syndrome

Individuals with fewer financial resources may find it more difficult to lose weight and also may be more likely to have poor metabolic characteristics. Thus, the effect of weight loss is confounded with the effect of income. In order to estimate the true (unconfounded) association between weight loss and metabolic syndrome in an observational study of individuals spanning a range of income levels, you need to adjust for potential confounding due to income. Failure to adjust for income may result in obtaining a biased estimate of association; part of the unadjusted “weight loss effect” may, in fact, be an “income effect”.

5.8.2 Mediator

A mediator is like a confounder in that it is associated with both the predictor and the outcome. However, unlike a confounder, it is in the causal pathway. When mediation is present, the predictor typically has both a direct effect on the outcome (not through the mediator) and an indirect effect (through the mediator), as illustrated in Figure 5.10. If you are interested in the total effect of the predictor on the outcome, regardless of the pathway, then do not adjust for a mediator; doing so would bias the predictor’s effect estimate by removing some of the effect you are actually interested in.

Box with the word predictor inside with an arrow pointing to another box with the word outcome inside. The predictor box has an arrow pointing to a third box with the word mediator inside, and the mediator box has an arrow pointing to the outcome

Figure 5.10: Mediated association
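A short simulation can make the cost of adjusting for a mediator concrete. The sketch below assumes a linear setting with invented path coefficients (0.5 from predictor to mediator, 0.6 from mediator to outcome, and a direct effect of 0.3); none of these values, nor the helper `ols`, come from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# Assumed structure: predictor -> mediator -> outcome, plus a direct path.
predictor = rng.normal(size=n)
mediator = 0.5 * predictor + rng.normal(size=n)
outcome = 0.3 * predictor + 0.6 * mediator + rng.normal(size=n)


def ols(y, *columns):
    """Least-squares coefficients for y ~ 1 + columns (intercept first)."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Total effect: do not adjust for the mediator.
total = ols(outcome, predictor)[1]
# Adjusting for the mediator leaves only the direct effect.
direct = ols(outcome, predictor, mediator)[1]

print(f"total effect (mediator omitted):    {total:.2f}")
print(f"direct effect (mediator included):  {direct:.2f}")
```

Under these assumptions the total effect is 0.3 + 0.5 × 0.6 = 0.6, so the model that includes the mediator recovers only about half of the effect of interest.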

For example, adipocytokines may mediate the effect of weight loss on metabolic syndrome (Matsuzawa 2006; Rolland, Hession, and Broom 2011), as illustrated in Figure 5.11.

Box with the words weight loss inside, the predictor, with an arrow pointing to another box with the words metabolic syndrome inside, the outcome. The predictor box has an arrow pointing to a third box with the word adipocytokines inside, the mediator, and the mediator box has an arrow pointing to the outcome

Figure 5.11: Adipocytokines mediating the association between weight loss and metabolic syndrome

Weight loss leads to an improvement in the characteristics that define metabolic syndrome, in part, due to its effect on levels of adipocytokines. Adjusting for adipocytokine levels would result in attenuating the estimate of the total effect of weight loss since you would be removing part of its effect (the indirect effect through adipocytokines). Therefore, if you are interested in the total effect of weight loss, do not adjust for adipocytokine levels. Compare this to the role of income as a confounder in Figure 5.9. Income precedes weight loss in the causal pathway – the effect of weight loss on metabolic syndrome is not through changes in income, but might be explained by differences in income between those who differ in weight loss. Thus, in these examples, income is a confounder while adipocytokine level is a mediator.

Another example is related to the study of health disparities. Should you “control for” socioeconomic status (SES) when studying racial disparities in health outcomes? In such a study, the health outcomes of individuals of different races/ethnicities are compared. “Race/ethnicity,” a social, not biological, construct, acts as a proxy for structural racism, the actual reason for disparities (American Medical Association 2020). In the U.S., SES is correlated with both race/ethnicity and health outcomes, which may lead one to believe it is a confounder that should be adjusted for. However, SES is in the causal pathway between racism and health outcomes and therefore is a mediator of their association (see Figure 5.12).

Box with the words race/ethnicity (racism) inside, the predictor, with an arrow pointing to another box with the words health outcomes inside, the outcome. The predictor box has an arrow pointing to a third box with the words socioeconomic status inside, the mediator, and the mediator box has an arrow pointing to the outcome

Figure 5.12: Socioeconomic status mediating racial disparities in health outcomes

If you “control for SES” you will remove part of the effect you are trying to estimate and underestimate disparities in health outcomes, perhaps even concluding there are no disparities. For example, Yehia et al. (2020), after adjusting for a number of demographic variables, conclude that race/ethnicity is not associated with COVID-19 mortality. However, Katikireddi et al. (2021) contend that this conclusion is in error exactly because the analysis adjusted for mediators. See also, for example, Meghani and Chittams (2015) regarding SES, and Zalla et al. (2021) and Schnake-Mahl and Bilal (2021) regarding the role of geography as a mediator of racial disparities in COVID-19 mortality.

Investigating the nature and magnitude of mediation, and decomposing the total effect into its direct and indirect components, is the realm of mediation analysis and is beyond the scope of this text (see, for example, Hayes (2022)). However, even if you are interested only in the total effect, it is vital to understand the distinction between mediators and confounders, and not to include mediators in a regression model; including a mediator will adjust out part of the very effect you are trying to estimate.

5.8.3 Moderator

In the above examples, there is a single predictor effect. In the case of confounding, the effect is obscured but there is still just one effect and the solution is to adjust for the confounder in the design or analysis. In the case of mediation, the effect is in part due to another variable but, again, there is still just a single effect of interest. Some variables, however, are moderators (or effect modifiers) – the effect of the predictor on the outcome depends on and varies with the level of the moderator. The predictor does not have a single effect, but rather a range of effects spanning the range of values of the moderator, as illustrated in Figure 5.13. The multiple lines going from Predictor to Outcome correspond to multiple magnitudes of association, with values that depend on the value of the moderator.

Box with the word predictor inside with multiple arrows pointing to another box with the word outcome inside. A third box with the word moderator inside has an arrow pointing down to the multiple arrows connecting the predictor to the outcome

Figure 5.13: Moderated association

A moderator in a regression model is a term that is involved in an interaction (discussed in Section 5.9). In a regression model, include both the moderator and its interaction with the predictor.
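In equation form, and assuming a linear model with a predictor \(X\) and a single moderator \(M\) (the symbols here are illustrative, not notation taken from elsewhere in the text), such a model can be written as

\[
Y = \beta_0 + \beta_1 X + \beta_2 M + \beta_3 (X \times M) + \epsilon
\]

Under this model, the effect of a one-unit difference in \(X\) when \(M = m\) is \(\beta_1 + \beta_3 m\). If \(\beta_3 = 0\), the effect of \(X\) is the same at every value of \(M\), and \(M\) is not a moderator.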

For example, the effect of weight loss on metabolic syndrome may be moderated by baseline metabolic characteristics (see Figure 5.14). Weight loss might have a greater impact among individuals with more room for improvement. By including the baseline measurement and a baseline \(\times\) weight loss interaction in the regression model, you can estimate how the weight loss effect varies between those with different baseline metabolic characteristics.

Box with the words weight loss inside, the predictor, with multiple arrows pointing to another box with the words metabolic syndrome inside, the outcome. A third box with the words baseline metabolic characteristics inside, the moderator, has an arrow pointing down to the multiple arrows connecting the predictor to the outcome

Figure 5.14: Baseline values moderating the association between weight loss and metabolic syndrome
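As a rough sketch of how an interaction term captures moderation, the simulation below fits a model with a baseline-by-weight-loss interaction. Everything in it is an assumption made for illustration: the variables are standardized stand-ins, the coefficients are invented, and the outcome is a continuous "risk score" fit by ordinary least squares rather than the binary metabolic syndrome indicator, which would typically call for logistic regression.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000

# Hypothetical standardized variables (not real data).
weight_loss = rng.normal(size=n)
baseline = rng.normal(size=n)  # higher = worse baseline metabolic profile

# Assumed truth: weight loss lowers the risk score, and the benefit
# is larger for those with worse baseline values.
risk = (0.5 * baseline
        - (0.4 + 0.3 * baseline) * weight_loss
        + rng.normal(size=n))

# Design matrix: intercept, predictor, moderator, and their interaction.
X = np.column_stack([np.ones(n), weight_loss, baseline,
                     weight_loss * baseline])
b0, b_wl, b_base, b_int = np.linalg.lstsq(X, risk, rcond=None)[0]


def weight_loss_effect(m):
    """Estimated effect of weight loss when the baseline value equals m."""
    return b_wl + b_int * m


for m in (-1.0, 0.0, 1.0):
    print(f"baseline = {m:+.0f}: weight loss effect = "
          f"{weight_loss_effect(m):.2f}")
```

Rather than a single weight loss effect, the fitted model yields one effect per baseline value; under the assumed coefficients these should be near -0.1, -0.4, and -0.7 at baseline values of -1, 0, and 1.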

Including a variable as a moderator also addresses confounding bias due to that variable, but in a different way than including it as a confounder without an interaction. Including an interaction is similar to stratifying the analysis by the moderator and estimating the effect of the predictor within each level. If you stratify a regression analysis by a confounder that is not a moderator, you get approximately the same effect within each stratum: the regression coefficients are identical across strata in theory and differ only by sampling variation in practice. That common within-stratum effect may differ from the effect you would get by ignoring the confounder, but it is the same in every stratum. Stratifying removes the confounding because the confounder does not vary within a stratum, so it can no longer be associated with the predictor or the outcome there. Adjusting for a confounder in a regression accomplishes something similar, although mathematically it is not the same as stratifying. If you stratify by a moderator, however, you get different effects in different strata – the regression coefficient for the predictor varies between strata.
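The contrast between stratifying by a confounder and stratifying by a moderator can be sketched with two simulated scenarios that share a binary stratifying variable `z`. All names and coefficients below are assumptions chosen for illustration: in scenario A the true effect of the predictor is 0.5 in both strata, while in scenario B it is 0.2 when `z = 0` and 0.8 when `z = 1`.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 40_000


def slope(x, y):
    """Slope from the simple regression y ~ 1 + x."""
    return np.polyfit(x, y, 1)[0]


z = rng.integers(0, 2, size=n)  # binary stratifying variable

# Scenario A: z is a confounder but not a moderator.
x_a = 1.5 * z + rng.normal(size=n)
y_a = 0.5 * x_a + 2.0 * z + rng.normal(size=n)

# Scenario B: z is a moderator (the effect of x depends on z).
x_b = rng.normal(size=n)
y_b = (0.2 + 0.6 * z) * x_b + rng.normal(size=n)

results = {}
for label, x, y in [("confounder", x_a, y_a), ("moderator", x_b, y_b)]:
    pooled = slope(x, y)  # ignores z entirely
    by_stratum = [slope(x[z == k], y[z == k]) for k in (0, 1)]
    results[label] = (pooled, by_stratum)
    print(f"{label}: pooled = {pooled:.2f}, "
          f"z=0: {by_stratum[0]:.2f}, z=1: {by_stratum[1]:.2f}")
```

In scenario A the two within-stratum slopes should agree with each other (near 0.5) while the pooled slope is inflated; in scenario B the within-stratum slopes should differ, which is the signature of moderation.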

References

American Medical Association. 2020. “New AMA Policies Recognize Race as a Social, Not Biological, Construct.” AMA Press Releases. www.ama-assn.org/press-center/press-releases/new-ama-policies-recognize-race-social-not-biological-construct.
Hayes, Andrew F. 2022. Introduction to Mediation, Moderation, and Conditional Process Analysis: A Regression-Based Approach. 3rd ed. New York, NY: The Guilford Press.
Katikireddi, Srinivasa Vittal, Sham Lal, Enitan D. Carrol, Claire L. Niedzwiedz, Kamlesh Khunti, Ruth Dundas, Finn Diderichsen, and Ben Barr. 2021. “Unequal Impact of the COVID-19 Crisis on Minority Ethnic Groups: A Framework for Understanding and Addressing Inequalities.” J Epidemiol Community Health 75 (10): 970–74. https://doi.org/10.1136/jech-2020-216061.
Matsuzawa, Y. 2006. “The Metabolic Syndrome and Adipocytokines.” FEBS Lett 580 (12): 2917–21. https://doi.org/10.1016/j.febslet.2006.04.028.
Meghani, Salimah H., and Jesse Chittams. 2015. “Controlling for Socioeconomic Status in Pain Disparities Research: All-Else-Equal Analysis When All Else Is Not Equal.” Pain Medicine 16 (12): 2222–25. https://doi.org/10.1111/pme.12829.
Rolland, C., M. Hession, and I. Broom. 2011. “Effect of Weight Loss on Adipokine Levels in Obese Patients.” Diabetes Metab Syndr Obes 4: 315–23. https://doi.org/10.2147/DMSO.S22788.
Schnake-Mahl, Alina S, and Usama Bilal. 2021. “Schnake-Mahl and Bilal Respond to ‘Structural racism and COVID-19 mortality in the US.’” American Journal of Epidemiology 190 (8): 1447–51. https://doi.org/10.1093/aje/kwab058.
Yehia, Baligh R., Angela Winegar, Richard Fogel, Mohamad Fakih, Allison Ottenbacher, Christine Jesser, Angelo Bufalino, Ren-Huai Huang, and Joseph Cacchione. 2020. “Association of Race with Mortality Among Patients Hospitalized with Coronavirus Disease 2019 (COVID-19) at 92 U.S. Hospitals.” JAMA Network Open 3 (8): e2018039. https://doi.org/10.1001/jamanetworkopen.2020.18039.
Zalla, Lauren C, Chantel L Martin, Jessie K Edwards, Danielle R Gartner, and Grace A Noppert. 2021. “A Geography of Risk: Structural Racism and Coronavirus Disease 2019 Mortality in the United States.” American Journal of Epidemiology 190 (8): 1439–46. https://doi.org/10.1093/aje/kwab059.