## 5.3 Principal components analysis

Let’s retain only two components or factors:

`office.pca.two <- PCA(office.df, ncp = 2, graph=FALSE) # Ask for two factors by filling in the ncp argument.`

### 5.3.1 Factor loadings

We can now inspect the table with the *factor loadings*:

```
office.pca.two$var$cor %>% # table with the (unrotated) factor loadings
  varimax() # apply a varimax rotation to improve interpretability
```

```
## $loadings
## 
## Loadings:
##                  Dim.1  Dim.2 
## large_choice      0.516 -0.850
## low_prices       -0.990       
## service_quality   0.912 -0.410
## product_quality   0.964       
## convenience       0.175  0.978
## preference_score  0.708 -0.706
## 
##                Dim.1 Dim.2
## SS loadings    3.538 2.347
## Proportion Var 0.590 0.391
## Cumulative Var 0.590 0.981
## 
## $rotmat
##           [,1]       [,2]
## [1,] 0.8515627 -0.5242528
## [2,] 0.5242528  0.8515627
```

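These loadings are the correlations between the original dimensions (`large_choice`, `low_prices`, etc.) and the two retained factors (`Dim.1` and `Dim.2`), here after the varimax rotation. Where only one value is printed for a variable, the other loading is close to zero (R suppresses loadings below 0.1 in absolute value when printing). We see that `low_prices`, `service_quality`, and `product_quality` load strongly (in absolute value) on the first factor, whereas `large_choice`, `convenience`, and `preference_score` load strongly on the second factor. Note that `low_prices` loads negatively on the first factor and that `preference_score` loads about equally on both factors. We could therefore say that the first factor describes the price and quality of the brand and that the second factor describes the convenience of the brand’s stores.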
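To make the statement "loadings are correlations" concrete, here is a minimal sketch. It does not use `office.df`; as an assumption for illustration, it uses base R's `prcomp()` and the built-in `USArrests` data set as stand-ins for `PCA()` and the office data. It computes the unrotated loadings in two ways: from the PCA output directly, and as plain correlations between the original variables and the component scores.

```r
# Stand-in example (assumption): base R prcomp() on the built-in USArrests data,
# not the PCA() call or the office.df data used in this chapter.
pca <- prcomp(USArrests, scale. = TRUE) # PCA on standardized variables

# Way 1: loadings = eigenvectors scaled by the standard deviation of each component
loadings_formula <- sweep(pca$rotation, 2, pca$sdev, `*`)

# Way 2: loadings = correlations between the original variables and the component scores
loadings_cor <- cor(USArrests, pca$x)

# Both approaches should agree up to floating-point error
all.equal(loadings_formula, loadings_cor, check.attributes = FALSE)

round(loadings_cor[, 1:2], 3) # loadings on the first two components
```

The same reading applies to the table above: each entry tells us how strongly a dimension correlates with a factor, and the sign tells us the direction.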

We also want to know how much of each of the six dimensions is explained by the extracted factors. For this we need to calculate the *communality* and/or its complement, the *uniqueness*, of each dimension:

```
# Capture the loadings in a new object. Use as_tibble() so that we can refer to the factors by name.
loadings <- as_tibble(office.pca.two$var$cor) %>%
  mutate(variable = rownames(office.pca.two$var$cor), # keep track of the row names (these are dropped when converting to a tibble)
         communality = Dim.1^2 + Dim.2^2, # sum of the squared loadings on each factor (the ^ operator raises a value to a power)
         uniqueness = 1 - communality) # the complement of the communality
loadings
```

```
## # A tibble: 6 x 5
##    Dim.1  Dim.2 variable         communality uniqueness
##    <dbl>  <dbl> <chr>                  <dbl>      <dbl>
## 1  0.885 -0.453 large_choice           0.988   0.0116  
## 2 -0.845 -0.516 low_prices             0.980   0.0198  
## 3  0.991  0.128 service_quality        0.999   0.000669
## 4  0.841  0.473 product_quality        0.930   0.0696  
## 5 -0.364  0.925 convenience            0.988   0.0124  
## 6  0.973 -0.230 preference_score       0.999   0.000524
```

The communality of a variable is the proportion of that variable’s variance that is explained by the retained factors (a value between 0 and 1). Its complement is called uniqueness. Uniqueness could be pure measurement error, or it could represent something that is measured reliably by that particular variable, but not by any of the other variables. The greater the uniqueness, the more likely it is that it reflects more than just measurement error. A uniqueness of more than 0.6 is usually considered high. If the uniqueness is high, then the variable is not well explained by the factors. We see that for all dimensions, communality is high and therefore uniqueness is low, so all dimensions are captured well by the extracted factors.
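The communalities above were computed from the unrotated loadings, whereas the first table showed varimax-rotated loadings. The sketch below illustrates why this does not matter: varimax is an orthogonal rotation, so the communality of each variable is the same before and after rotation. The numbers are copied by hand from the printed outputs in this section (an assumption made to keep the example self-contained), so the results match the printed values only up to rounding.

```r
# Unrotated loadings, copied from the tibble printed above (rounded to 3 decimals)
unrotated <- matrix(
  c( 0.885, -0.453,
    -0.845, -0.516,
     0.991,  0.128,
     0.841,  0.473,
    -0.364,  0.925,
     0.973, -0.230),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("large_choice", "low_prices", "service_quality",
      "product_quality", "convenience", "preference_score"),
    c("Dim.1", "Dim.2")
  )
)

# Rotation matrix, copied from the $rotmat element of the varimax output (filled column by column)
rotmat <- matrix(c(0.8515627, 0.5242528, -0.5242528, 0.8515627), ncol = 2)

# Rotated loadings = unrotated loadings multiplied by the rotation matrix
rotated <- unrotated %*% rotmat
colnames(rotated) <- colnames(unrotated)
round(rotated, 3) # compare with the varimax table

# Communality: row sums of squared loadings, before and after rotation
communality_unrotated <- rowSums(unrotated^2)
communality_rotated <- rowSums(rotated^2)
round(cbind(communality_unrotated, communality_rotated), 3)

# Column sums of squared rotated loadings correspond to the "SS loadings" row;
# dividing by the number of variables gives the "Proportion Var" row
ss_loadings <- colSums(rotated^2)
round(ss_loadings, 3)
round(ss_loadings / nrow(rotated), 3)
```

Rotation redistributes the explained variance between the two factors, but the total explained per variable (the communality) stays the same.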

### 5.3.2 Loading plot and biplot

We can also plot the loadings:

`fviz_pca_var(office.pca.two, repel = TRUE) # the repel = TRUE argument makes sure the text is displayed nicely on the graph`

We see that `large_choice`, `service_quality`, `product_quality`, and `preference_score` have high loadings on the first factor (the X-axis, `Dim1`) and that `convenience` has a high loading on the second factor (the Y-axis, `Dim2`). Note that this plot shows the unrotated loadings, so the pattern differs from the varimax-rotated table above. We can also add the observations (the different brands) to this plot:

`fviz_pca_biplot(office.pca.two, repel = TRUE) # plot the loadings and the brands together on one plot`

This is also called a *biplot*. We can see, for example, that OfficeStar scores highly on the first factor.