1.4 Introduction to R packages and simulated data

All methods presented in this book are implemented in R. R is a free, open-source statistical software environment in which users can write their own code and packages and share them openly. For this reason, several recently developed approaches for environmental mixtures are available only in R. Most R packages are accompanied by online tutorials and vignettes that describe the features of the package and provide illustrative examples and explanations. We refer to those documents for technical details of the R packages, and only briefly discuss the implementation of the methods and the interpretation of results. The following packages will be used:

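# Packages used throughout the book
Packages <- c("readxl", "bkmr", "qgraph", "gWQS", "qgcomp", "corrplot",
              "cluster", "factoextra", "gridExtra", "table1", "glmnet")
# Attach each package; invisible() hides the list that lapply() returns
invisible(lapply(Packages, library, character.only = TRUE))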
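Attaching the packages as above stops with an error at the first one that is not installed. As a minimal sketch, the lines below report which of the listed packages are missing before they are attached. The name `missing_pkgs` is introduced here for illustration and is not part of the original code; the installation call is left commented out and assumes the packages can be obtained from CRAN.

```r
# Same package list as used in the text
Packages <- c("readxl", "bkmr", "qgraph", "gWQS", "qgcomp", "corrplot",
              "cluster", "factoextra", "gridExtra", "table1", "glmnet")

# Packages in the list that are not found in the local library
# (missing_pkgs is a hypothetical helper name)
missing_pkgs <- setdiff(Packages, rownames(installed.packages()))

if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install (assumes CRAN availability)
} else {
  message("All packages are installed.")
}
```

Once `missing_pkgs` is empty, the `lapply()` call shown above should attach every package without error.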

As an illustrative example, we will use throughout the text a simulated dataset that was developed for the 2015 NIEHS workshop mentioned previously and made publicly available. The dataset is available online. The data include a mixture of 14 continuous exposures (\(X_1-X_{14}\)), a continuous outcome \(Y\), and 3 additional covariates (\(Z_1-Z_3\)).

Chemical concentrations were generated based on the correlations between log-transformed polychlorinated biphenyls (PCBs), dioxins, and furans in NHANES data. Two clusters of highly correlated exposures were present (\(X_3-X_4-X_5\) and \(X_{12}-X_{13}\)), while low to moderate correlations were simulated between the other exposures. \(Z_1\) and \(Z_2\) were simulated based on poverty index and age, and both were assumed to be confounders of the association. \(Z_3\) was simulated based on gender distribution and assumed to be an effect modifier. The outcome was generated with the following functions for males and females, respectively:

\(Z_3=0: E[Y]=3 + 0.05\cdot X_4 + 0.1\cdot X_6 + 0.1\cdot X_{11} + 0.5\cdot X_{12} + 0.1\cdot X_{14} + 0.01\cdot Z_1 + 0.003\cdot Z_2\)

\(Z_3=1: E[Y]=3 + 0.01\cdot X_1 + 0.05\cdot X_4 + 0.1\cdot X_{11} + 0.1\cdot X_{14} + 0.01\cdot Z_1 + 0.003\cdot Z_2 - 0.32\cdot (Z_3=1)\)

Thus, when \(Z_3=0\), only \(X_4, X_6, X_{11}, X_{12}\), and \(X_{14}\) are positively associated with \(Y\). When \(Z_3=1\), only \(X_1, X_4, X_{11}\), and \(X_{14}\) are positively associated with \(Y\). Interactions between chemicals were not considered.
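To make the two generating functions concrete, the sketch below writes them as a single R function. It is an illustration only, not the code used to simulate the workshop data: the function name `expected_y`, the toy data frame `toy`, and the column names `X1` to `X14` and `Z1` to `Z3` are assumptions introduced here.

```r
# Conditional mean of Y as given by the two equations in the text.
# `dat` is a hypothetical data frame with columns X1..X14, Z1, Z2, Z3.
expected_y <- function(dat) {
  with(dat, ifelse(
    Z3 == 0,
    # Z3 = 0: X4, X6, X11, X12 and X14 contribute
    3 + 0.05 * X4 + 0.1 * X6 + 0.1 * X11 + 0.5 * X12 + 0.1 * X14 +
      0.01 * Z1 + 0.003 * Z2,
    # Z3 = 1: X1, X4, X11 and X14 contribute, with an intercept shift of -0.32
    3 + 0.01 * X1 + 0.05 * X4 + 0.1 * X11 + 0.1 * X14 +
      0.01 * Z1 + 0.003 * Z2 - 0.32
  ))
}

# Toy input (not the workshop data): all exposures set to 1, Z1 = Z2 = 0,
# one row for each level of Z3
toy <- as.data.frame(matrix(1, nrow = 2, ncol = 14,
                            dimnames = list(NULL, paste0("X", 1:14))))
toy$Z1 <- 0
toy$Z2 <- 0
toy$Z3 <- c(0, 1)

expected_y(toy)
```

With every exposure equal to 1 and \(Z_1=Z_2=0\), the expected outcome is 3.85 for \(Z_3=0\) and 2.94 for \(Z_3=1\), which reflects both the different sets of active exposures and the intercept shift of 0.32.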