5 Correlation analysis
To understand the relationships between the variables in the dataset, I performed a correlation analysis across the entire set of exposure variables in both men and women, using the discovery datasets. Correlations were first calculated within each imputed dataset. The coefficients were then transformed using Fisher's Z transformation, the transformed coefficients were pooled across all imputed datasets, and the pooled values were back-transformed to the r scale. I used the polycor package in R, which allowed me to perform a single correlation analysis on a heterogeneous set of continuous, dichotomous, and categorical variables. Specifically, its hetcor function automatically computes Pearson product-moment correlations between pairs of continuous variables, polyserial correlations between continuous and ordinal variables, and polychoric correlations between pairs of ordinal variables.

A heatmap is shown below for men and for women. Each visualizes the correlations between all variables and automatically clusters correlated variables into dendrograms according to the Euclidean distance between their correlation coefficients. These heatmaps are interactive: clicking or hovering over a cell shows the row and column it corresponds to. Because the plots are high-dimensional, it is useful to zoom in on different regions to explore strong correlations. As a rule of thumb, two variables with an absolute correlation coefficient above 0.5 are considered correlated.
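The pooling step can be illustrated with a minimal sketch. It is written in Python only to make the arithmetic concrete and is not the R code used for this analysis. The sketch assumes that each imputed dataset yields a square correlation matrix with the variables in the same order, and that the pooled point estimate is the simple mean of the Fisher z values across imputations, as in Rubin's rules. The function names `fisher_z`, `pool_correlations`, and `pool_matrices` are hypothetical.

```python
import math


def fisher_z(r):
    """Fisher's Z transformation: z = atanh(r) = 0.5 * ln((1 + r) / (1 - r))."""
    return math.atanh(r)


def pool_correlations(rs):
    """Pool one coefficient estimated in each imputed dataset.

    Assumption: the pooled estimate is the mean of the z values,
    back-transformed to the r scale with tanh (the inverse of atanh).
    """
    if not rs:
        raise ValueError("need at least one imputed estimate")
    zs = [fisher_z(r) for r in rs]
    z_bar = sum(zs) / len(zs)
    return math.tanh(z_bar)


def pool_matrices(mats):
    """Pool a list of square correlation matrices element-wise.

    The diagonal is left at 1.0, because atanh(1) is infinite and a
    variable's correlation with itself needs no pooling.
    """
    n = len(mats[0])
    pooled = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                pooled[i][j] = pool_correlations([m[i][j] for m in mats])
    return pooled


if __name__ == "__main__":
    # Hypothetical coefficients for one variable pair in three imputed datasets.
    print(round(pool_correlations([0.42, 0.47, 0.45]), 4))
```

Averaging on the z scale rather than the r scale matters because the sampling distribution of r is skewed, especially near ±1, whereas z is approximately normal. With moderate, similar coefficients such as those in the usage line, the pooled value is close to the plain average of the r values.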