6.4 Final remarks

The goal of this introductory course (and book) was to discuss the challenges involved in the study of environmental mixtures as they relate to health outcomes, and to introduce the most common statistical approaches that can help address some of these challenges. The core points of the discussion were the following:

  • Environmental mixtures represent the way environmental exposures occur and operate in real life and, as such, should be integrated and evaluated in environmental epidemiology studies. This involves a series of analytic and statistical issues that should be carefully addressed and taken into account.

  • A first critical classification of statistical methodologies is the one between supervised and unsupervised approaches. It is always recommended to begin the analyses with a thorough pre-processing phase that involves unsupervised techniques. These help identify clusters and groupings of exposures, high levels of correlation, missingness, and the presence of inflated covariates, crucially guiding subsequent steps.

  • When incorporating the outcome into the picture (supervised approaches), it is always recommended to begin with regression-based approaches. These provide unique advantages and most of the time will be a good fit for the question of interest.

  • Specific methods have been developed to address environmental mixtures when regression techniques are of little use or more flexible approaches are required. This occurs, for example, when high-dimensional interactions are of interest, when most associations are non-linear, or when the primary interest is in retrieving the cumulative mixture effect. All techniques come with some limitations, so it is always recommended to test several methods, which provide different perspectives, and to check whether their results agree.

  • With a very large number of exposures and/or interactions, machine learning (ML) techniques should be considered. Tree-based ensemble methods related to random forests, such as gradient boosting machines (or boosted regression trees), provide several advantages in this context. A layered analysis, in which ML results are used to build a second-step regression model, is recommended.

  • Most current methods are available and well documented/presented in the statistical software R.

  • In general, when dealing with environmental exposures, the choice of methods should be driven by the research question of interest:

    • Are there repeated exposure patterns? (unsupervised analysis, e.g. PCA)
    • What are the effects of individual exposures within the mixture? (regression methods, BKMR)
    • What are the most important contributors to the association? (PIPs in BKMR, weights from WQS, selected covariates in elastic net …)
    • What is the overall (cumulative) effect of the mixture? (regression methods, WQS)
    • Are there interactions (or even synergy) between chemicals? (tree-based modeling, BKMR, regression methods)
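
As a small illustration of the unsupervised pre-processing step recommended above, the following is a minimal sketch, not a definitive implementation, of screening exposures for highly correlated pairs before any outcome model is fitted. It is written in plain Python for self-containment; the exposure names, the toy values, the helper functions `pearson` and `highly_correlated_pairs`, and the 0.8 threshold are all hypothetical choices made for this example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def highly_correlated_pairs(exposures, threshold=0.8):
    """Return (name_a, name_b, r) for exposure pairs with |r| >= threshold.

    The threshold is an assumption for illustration, not a recommended cut-off.
    """
    flagged = []
    for (name_a, xa), (name_b, xb) in combinations(exposures.items(), 2):
        r = pearson(xa, xb)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Hypothetical toy data: three exposures measured on six subjects.
exposures = {
    "pcb_118": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "pcb_153": [1.1, 2.3, 2.9, 4.2, 4.8, 6.1],
    "lead": [3.0, 1.0, 4.0, 1.0, 5.0, 2.0],
}

flagged = highly_correlated_pairs(exposures)
for name_a, name_b, r in flagged:
    print(f"{name_a} and {name_b} are highly correlated (r = {r})")
```

In this toy data only the two PCB-like exposures are flagged. In a real analysis, pairs flagged this way would suggest grouping exposures or choosing methods designed to handle correlated predictors in the supervised steps.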

Several papers have been published that discuss different techniques and provide further guidance on choosing the appropriate approach. Recommended readings in this context include Hamra and Buckley (2018), Stafoggia et al. (2017), and Gibson et al. (2019).

Finally, it is worth remembering that the material presented here is just a selection of topics from a wide and fast-growing research area. Methodological extensions and new applications are continuously published, and it is crucial for researchers working in this area to keep up with the literature.


Gibson, Elizabeth A, Yanelli Nunez, Ahlam Abuawad, Ami R Zota, Stefano Renzetti, Katrina L Devick, Chris Gennings, Jeff Goldsmith, Brent A Coull, and Marianthi-Anna Kioumourtzoglou. 2019. “An Overview of Methods to Address Distinct Research Questions on Environmental Mixtures: An Application to Persistent Organic Pollutants and Leukocyte Telomere Length.” Environmental Health 18 (1): 1–16.
Hamra, Ghassan B, and Jessie P Buckley. 2018. “Environmental Exposure Mixtures: Questions and Methods to Address Them.” Current Epidemiology Reports 5 (2): 160–65.
Stafoggia, Massimo, Susanne Breitner, Regina Hampel, and Xavier Basagaña. 2017. “Statistical Approaches to Address Multi-Pollutant Mixtures and Multiple Exposures: The State of the Science.” Current Environmental Health Reports 4 (4): 481–90.