1.3 Broad classification(s) of statistical approaches
Over the last few years, several papers have reviewed the literature on statistical methods for mixtures and proposed different criteria for classifying them. Two recommended readings are Hamra and Buckley (2018) and Stafoggia et al. (2017). Two simple and relevant classification criteria are the following:
- Supervised vs unsupervised procedures
This first distinction refers to whether the mixture is evaluated with or without reference to its association with a given outcome of interest. As we will discuss in Section 2, before evaluating the effects of the exposures on health outcomes it is important to carefully assess the features of the mixture, especially when it is composed of a large number of components, by investigating its correlation structure and identifying subgroups or clusters of exposures. To this end, we turn to unsupervised techniques, such as principal component analysis, that focus directly on characterizing the complex mixture of exposures without reference to any outcome. Supervised techniques, on the other hand, attempt to account for the complex nature of the exposures while investigating a given mixture-outcome association.
- Data reduction vs variable selection techniques
The common goal of all the approaches we will discuss is to reduce the complexity of the data so that mixture-outcome associations can be assessed while losing as little information as possible. This is broadly done in two ways: by summarizing the original exposures into fewer covariates, or by selecting targeted elements of the mixture. We use the term “data reduction approaches” for techniques that reduce the dimension of the mixture by generating new variables (scores, components, indexes). Methodologies that instead select specific elements of the mixture, which are then directly evaluated with respect to the outcome, can be defined as “variable selection approaches.”
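The two distinctions above can be illustrated with a minimal sketch in Python. Everything in it is an assumption made for illustration: the data are simulated, the variable names and the helper `lasso_cd` are hypothetical, and the penalty value is arbitrary. Principal component analysis stands in for an unsupervised, data reduction step, and the lasso is used as one common example of a supervised, variable selection approach; the text above does not prescribe either choice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated mixture (assumption): 500 subjects, 5 exposures.
# Exposures 0-2 share a common source, so they are highly correlated.
n = 500
source = rng.normal(size=n)
X = np.column_stack([
    source + 0.3 * rng.normal(size=n),
    source + 0.3 * rng.normal(size=n),
    source + 0.3 * rng.normal(size=n),
    rng.normal(size=n),
    rng.normal(size=n),
])
# Simulated outcome (assumption): depends on exposures 0 and 3 only.
y = 1.0 * X[:, 0] + 0.5 * X[:, 3] + rng.normal(size=n)

# Standardize exposures and center the outcome.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
yc = y - y.mean()

# Unsupervised step: correlation structure and PCA; the outcome is not used.
corr = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)      # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]            # reorder to descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
explained = eigvals / eigvals.sum()          # share of variance per component

# Data reduction: replace the 5 exposures with one new variable (the first
# principal component score), then relate that score to the outcome.
pc1 = Z @ eigvecs[:, 0]
design = np.column_stack([np.ones(n), pc1])
beta_pc, *_ = np.linalg.lstsq(design, yc, rcond=None)


def lasso_cd(Z, y, lam, n_iter=200):
    """Hypothetical helper: lasso by cyclic coordinate descent.

    Minimizes (1 / (2n)) * ||y - Z b||^2 + lam * ||b||_1.
    """
    n_obs, p = Z.shape
    b = np.zeros(p)
    col_sq = (Z ** 2).sum(axis=0) / n_obs
    for _ in range(n_iter):
        for j in range(p):
            partial_resid = y - Z @ b + Z[:, j] * b[j]
            rho = Z[:, j] @ partial_resid / n_obs
            # Soft-thresholding sets small coefficients exactly to zero.
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return b


# Variable selection: the outcome is used to decide which of the original
# exposures are retained (those with a nonzero coefficient).
b_lasso = lasso_cd(Z, yc, lam=0.1)           # penalty chosen arbitrarily
selected = np.flatnonzero(b_lasso != 0)

print("Variance explained by each component:", np.round(explained, 3))
print("Slope of outcome on first component score:", round(float(beta_pc[1]), 3))
print("Exposures retained by the lasso:", selected)
```

In this sketch the data reduction route analyzes a newly generated variable (`pc1`), while the variable selection route keeps a subset of the original exposures (`selected`). Note that the sign of a principal component is arbitrary, so only the magnitude of the slope on `pc1` is directly interpretable.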