5.2 How many factors should we retain?

The goal of principal component analysis is to reduce the number of dimensions that describe our data without losing too much information. The first step is to decide how many principal components (also called factors) we want to retain. To help us decide, we’ll use the PCA function from the FactoMineR package:

install.packages("FactoMineR") # only needed once per R installation
library(FactoMineR)

To be able to use the PCA function, we need to transform the data frame first:

# The PCA input should contain only the dimensions, so drop the identifier column,
# then convert the result to a plain data.frame, which the PCA function requires.
office.df <- office %>% 
  select(-brand) %>%
  as.data.frame()

rownames(office.df) <- office$brand # Set the row names of the data.frame to the brands (this is important later on when making a biplot)
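
As a quick illustration of what this preparation produces, here is a minimal sketch on a made-up data frame. The object `toy` and its column names are hypothetical stand-ins for `office`, and the steps are written in base R so the snippet runs without any extra packages:

```r
# Hypothetical stand-in for `office`: one identifier column plus numeric ratings.
toy <- data.frame(
  brand    = c("A", "B", "C", "D"),
  rating_1 = c(1.2, -0.4, 0.3, -1.1),
  rating_2 = c(0.5, 0.9, -1.3, -0.1),
  stringsAsFactors = FALSE
)

# Keep only the numeric dimensions (base-R counterpart of select(-brand)).
toy.df <- toy[, setdiff(names(toy), "brand"), drop = FALSE]

# Use the identifier as row names so that later plots can label each point.
rownames(toy.df) <- toy$brand

# Sanity checks: every remaining column is numeric and the row names are the brands.
all(sapply(toy.df, is.numeric))
rownames(toy.df)
```

Running the same two checks on office.df before calling PCA is a cheap way to confirm that no identifier column slipped through.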

We can now proceed with the principal component analysis:

office.pca <- PCA(office.df, graph=FALSE) # Carry out the principal component analysis

office.pca$eig # and look at the table with information on explained variance

##        eigenvalue percentage of variance cumulative percentage of variance
## comp 1  4.2656310              71.093850                          71.09385
## comp 2  1.6197932              26.996554                          98.09040
## comp 3  0.1145758               1.909596                         100.00000

If we look at this table, we see that the first two components together explain 98.1% of the variance in the ratings, while the third component would add less than 2 percentage points. This suggests we can safely describe our data with two dimensions. A rule of thumb here is that the cumulative variance explained by the retained components should be at least 70%, and two components clear that bar comfortably.
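
To make the link between the three columns of the table explicit, the sketch below recomputes the percentages from the eigenvalues and checks them against the 70% rule of thumb. The eigenvalues are copied by hand from the output above, and the variable names (`eigenvalues`, `threshold`, `min_components`) are only illustrative:

```r
# Eigenvalues copied from the office.pca$eig output above.
eigenvalues <- c(comp1 = 4.2656310, comp2 = 1.6197932, comp3 = 0.1145758)

# Share of the total variance captured by each component, in percent.
pct_variance <- 100 * eigenvalues / sum(eigenvalues)

# Running total across components.
cum_pct_variance <- cumsum(pct_variance)

# Rebuild the table, rounded for readability.
round(cbind(eigenvalues, pct_variance, cum_pct_variance), 2)

# Smallest number of components whose cumulative share reaches the threshold.
threshold <- 70
min_components <- which(cum_pct_variance >= threshold)[1]
min_components
```

With these eigenvalues the first component alone already reaches 71.1%, so the threshold is best read as a minimum to meet rather than a point at which to stop. Here a second component is kept because it adds roughly another 27 percentage points.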