3.3 Other regression-based approaches

Before moving on to a general discussion of the advantages and limitations of regression-based approaches, and before introducing and motivating further methods for environmental mixtures, it is useful to provide a broad overview of some alternative approaches based on, or derived from, classical regression that have proven useful in this context.

3.3.1 Hierarchical linear models

Hierarchical modeling can improve the performance of a multiple regression model when clusters of exposures can be clearly identified. Its application to multiple exposures was first introduced to evaluate the effect of antiretroviral treatments in HIV epidemiology, where regimens usually combine several drugs belonging to clearly defined drug classes (Correia and Williams 2019). In brief, the model incorporates first-stage effects for each drug class and second-stage effects for individual drugs, assuming that the effect of each drug is the sum of the (fixed) effect of its drug class and a residual effect specific to the individual drug. Provided that we can identify well-characterized subgroups of environmental exposures, either a priori or from a preliminary analysis such as a PCA, the same technique can be used to improve the performance of multiple regression when focusing on environmental mixtures. Potential advantages include the absence of variable selection and shrinkage, which makes the results easier to interpret.
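
The two-stage structure described above can be written compactly. The notation below is an assumption introduced for illustration and is not taken from Correia and Williams (2019): $X_{ij}$ denotes exposure $j$ for subject $i$, $c(j)$ the group to which exposure $j$ belongs, $\pi_{c(j)}$ the fixed group effect, and $\delta_j$ the exposure-specific residual effect. The normal distribution for $\delta_j$ is a common modeling choice, assumed here rather than stated in the text.

```latex
\begin{align*}
g\{\mathrm{E}(Y_i)\} &= \alpha + \sum_{j=1}^{p} \beta_j X_{ij}, \\
\beta_j &= \pi_{c(j)} + \delta_j, \qquad \delta_j \sim N(0, \tau^2).
\end{align*}
```

Under this sketch, exposures in the same group share the term $\pi_{c(j)}$, while $\tau^2$ governs how far the effect of an individual exposure may depart from the effect of its group.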

3.3.2 Partial least squares regression

Partial least squares (PLS) regression can be seen as a method that generalizes and combines PCA and multiple regression. It is particularly useful for predicting dependent variables from a very large number of predictors that might be highly correlated. PLS regression replaces the initial independent variable space (X) and the initial response variable space (Y) with smaller spaces defined by a reduced number of variables, called latent variables, which are included one at a time in an iterative process. Sparse PLS (sPLS) regression, in particular, is an extension of PLS that combines variable selection and modeling in a one-step procedure (Lê Cao et al. 2008). Components are defined iteratively so that each explains as much as possible of the remaining covariance between the predictors and the outcome. By creating sparse linear combinations of the original predictors, sPLS aims to achieve good predictive performance and appropriate variable selection at the same time. Sparsity is induced by including a penalty in the estimation of the linear combination coefficients: all coefficients with an absolute value lower than some fraction η of the maximum absolute coefficient are shrunk to zero. Only the first K components are then included as covariates in a linear regression model. sPLS is available in the R package spls. A detailed illustration of sPLS in environmental epidemiology can be found in Lenters et al. (2015).
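
The thresholding rule described above can be illustrated for a single component. The following is a minimal Python sketch, not the implementation used by the spls package: the function names, the toy data, and the choice of soft thresholding (surviving weights are also reduced by the cutoff) are assumptions made for illustration.

```python
import math

def center(columns):
    """Subtract each column's mean; `columns` is a list of predictor columns."""
    return [[v - sum(col) / len(col) for v in col] for col in columns]

def sparse_direction(columns, y, eta):
    """First sparse direction vector (illustrative; eta in [0, 1))."""
    xc = center(columns)
    y_mean = sum(y) / len(y)
    yc = [v - y_mean for v in y]
    # Unpenalized weights: inner product of each centered predictor with y.
    w = [sum(a * b for a, b in zip(col, yc)) for col in xc]
    cutoff = eta * max(abs(v) for v in w)
    # Soft thresholding: weights below the cutoff become zero,
    # the others are moved toward zero by the cutoff.
    w = [math.copysign(max(abs(v) - cutoff, 0.0), v) for v in w]
    norm = math.sqrt(sum(v * v for v in w))
    return [v / norm for v in w] if norm > 0 else w

def component_scores(columns, w):
    """Latent variable: weighted sum of the centered predictors."""
    xc = center(columns)
    n = len(xc[0])
    return [sum(w[j] * xc[j][i] for j in range(len(w))) for i in range(n)]

# Hypothetical toy data: two predictors related to y and one weakly related.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]
x3 = [3.0, 1.0, 2.0, 3.0, 1.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1]

w = sparse_direction([x1, x2, x3], y, eta=0.5)
t = component_scores([x1, x2, x3], w)
print("weights:", [round(v, 3) for v in w])
print("scores:", [round(v, 3) for v in t])
```

In this toy example the weight of the third predictor falls below the cutoff and is set to zero, so the first component is built from the first two predictors only. Larger values of η remove more predictors; in practice η and K are tuning parameters that are typically chosen by cross-validation.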


Correia, Katharine, and Paige L Williams. 2019. “A Hierarchical Modeling Approach for Assessing the Safety of Exposure to Complex Antiretroviral Drug Regimens During Pregnancy.” Statistical Methods in Medical Research 28 (2): 599–612.
Lenters, Virissa, Lützen Portengen, Lidwien AM Smit, Bo AG Jönsson, Aleksander Giwercman, Lars Rylander, Christian H Lindh, et al. 2015. “Phthalates, Perfluoroalkyl Acids, Metals and Organochlorines and Reproductive Function: A Multipollutant Assessment in Greenlandic, Polish and Ukrainian Men.” Occupational and Environmental Medicine 72 (6): 385–93.
Lê Cao, Kim-Anh, Debra Rossouw, Christele Robert-Granié, and Philippe Besse. 2008. “A Sparse PLS for Variable Selection When Integrating Omics Data.” Statistical Applications in Genetics and Molecular Biology 7 (1).