Chapter 4 Assessing the overall (cumulative) effect of multiple exposures

The extensions of linear regression presented in the previous chapter address the complexity of the mixture-outcome association either by selecting relevant predictors within the mixture, thereby removing covariates that would cause problems due to high collinearity, or by reducing the dimension of the exposure matrix, thereby improving the fit of the model. These approaches, however, also come with important drawbacks.

Consider the group of highly correlated exposures from our hypothetical example (\(X_3-X_4-X_5\)), for which penalized approaches recommended selecting only \(X_4\). This allowed us to evaluate the independent effect of \(X_4\) on the outcome without being troubled by the high correlation between this covariate and the other two in the cluster. This same selection, however, prevents us from addressing other important questions. For example, what if there is an interaction between \(X_3\) and \(X_4\)? This can happen even if \(X_3\) has no independent effect on the outcome, but only an effect that is triggered in the presence of the co-exposure. By removing \(X_3\) from the model, we can no longer evaluate this interaction. Moreover, we can no longer correctly quantify the joint effect of \(X_3\) and \(X_4\), which is the sum of the two main effects and their 2-way interaction. As discussed in the first chapter, this is a very important research question: the three correlated exposures might, for instance, come from the same source, and quantifying their joint effect would in this case provide useful information on the public health benefits of reducing exposure to that source.
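
To make the last point concrete, here is a minimal sketch that assumes a linear model containing only these two exposures and their product term. The coefficients \(\beta_0, \beta_3, \beta_4, \beta_{34}\) and \(\gamma_0, \gamma_4\) are notation introduced here for illustration and are not taken from the previous chapters.

\[
E[Y \mid X_3, X_4] = \beta_0 + \beta_3 X_3 + \beta_4 X_4 + \beta_{34} X_3 X_4
\]

Under this assumed model, the joint effect of increasing both exposures from 0 to 1 is

\[
E[Y \mid X_3 = 1, X_4 = 1] - E[Y \mid X_3 = 0, X_4 = 0] = \beta_3 + \beta_4 + \beta_{34},
\]

and, more generally, a one-unit increase in both exposures starting from \((x_3, x_4)\) gives

\[
\beta_3 + \beta_4 + \beta_{34}\,(x_3 + x_4 + 1).
\]

If \(X_3\) is dropped, the fitted model reduces to \(E[Y \mid X_4] = \gamma_0 + \gamma_4 X_4\), which contains no term corresponding to \(\beta_3\) or \(\beta_{34}\), so neither the interaction nor the joint effect can be recovered from it.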

The question that we will address in this section is the following: how do we quantify the joint effect of several exposures, possibly highly correlated, when standard regression techniques are not viable?