6.2 How many factors should we retain?

The goal of principal component analysis is to reduce the number of dimensions that describe our data, without losing too much information. The first step in principal component analysis is to decide upon the number of principal components or factors we want to retain. To help us decide, we’ll use the pre_factor function from the radiant package:

install.packages("radiant")
library(radiant)
# store the names of the dimensions in a vector so that we don't have to type them again and again
dimensions <- c("prevents_cavities", "shiny_teeth", "strengthens_gums", "freshens_breath", "decay_prevention_unimportant", "attractive_teeth") 

# hint: we could also do this as follows:
# dimensions <- toothpaste %>% select(-consumer, -gender, -age) %>% names()

summary(pre_factor(toothpaste, vars = dimensions))
## Pre-factor analysis diagnostics
## Data        : toothpaste 
## Variables   : prevents_cavities, shiny_teeth, strengthens_gums, freshens_breath, decay_prevention_unimportant, attractive_teeth 
## Observations: 60 
## Correlation : Pearson
## 
## Bartlett test
## Null hyp. : variables are not correlated
## Alt. hyp. : variables are correlated
## Chi-square: 238.93 df(15), p.value < .001
## 
## KMO test:  0.66 
## 
## Variable collinearity:
##                               Rsq  KMO
## prevents_cavities            0.86 0.62
## shiny_teeth                  0.48 0.70
## strengthens_gums             0.81 0.68
## freshens_breath              0.54 0.64
## decay_prevention_unimportant 0.76 0.77
## attractive_teeth             0.59 0.56
## 
## Fit measures:
##      Eigenvalues Variance % Cumulative %
##  PC1        2.73       0.46         0.46
##  PC2        2.22       0.37         0.82
##  PC3        0.44       0.07         0.90
##  PC4        0.34       0.06         0.96
##  PC5        0.18       0.03         0.99
##  PC6        0.09       0.01         1.00

Under Fit measures, we see that two components explain 82 percent of the variance in the ratings. This is quite a lot already and it suggests we can safely reduce the number of dimensions to two components. A rule of thumb here is that the cumulative variance explained by the components should be at least 70%.