Introduction to Social Epi Methods

1.4 PCA Computation & Exploration

Now, all that is quite complicated. Luckily, there is a built-in function in R to do all this in the background, for as many dimensions of data as we might like! Let’s do all that we did above in a few lines of code:

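The prcomp() command conducts a PCA. Note that scale = T (short for scale. = TRUE) tells prcomp() to rescale each variable to have variance 1. Centering each variable on 0 is controlled separately by center = TRUE, which is the default. Together these standardize the variables, as we did by hand above, so that no variable dominates the components simply because of its units.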
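To see what the scaling option does, here is a minimal sketch. It uses two columns of R's built-in mtcars data as a stand-in (an assumption, since the beers data is not reproduced here) and checks that letting prcomp() standardize the variables gives the same result as standardizing them by hand with scale() first.

```r
# Stand-in data: two columns of the built-in mtcars dataset (not the beers data)
X <- mtcars[, c("mpg", "wt")]

# Option 1: let prcomp() center and scale the variables
pca_auto <- prcomp(X, center = TRUE, scale. = TRUE)

# Option 2: standardize by hand, then run the PCA with no further adjustment
X_std <- scale(X)  # each column now has mean 0 and variance 1
pca_manual <- prcomp(X_std, center = FALSE, scale. = FALSE)

# Compare the component standard deviations from the two routes
all.equal(pca_auto$sdev, pca_manual$sdev)
```

If the two routes agree, the comparison returns TRUE, which is why the scaling argument can replace the manual standardization step.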

We can then extract the eigenvectors and eigenvalues.

‘Loadings’ refers to the eigenvectors after each one has been multiplied by the square root of its eigenvalue. When the variables are standardized, as they are here, this rescaling means that each loading can be read directly as the correlation between a component and a variable. The untransformed eigenvectors only give the direction of each component, so they are harder to interpret on their own. Be aware that I use the two terms ‘eigenvector’ and ‘loading’ interchangeably, although this is not strictly correct!
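The relationship between eigenvectors and loadings can be sketched in a few lines. This example again uses two columns of the built-in mtcars data as a stand-in for the beers data (an assumption), and the object names loadings and correlations are purely illustrative. It rescales each eigenvector by the square root of its eigenvalue and compares the result with the correlations between the original variables and the component scores.

```r
# Stand-in data: two columns of the built-in mtcars dataset (not the beers data)
X <- mtcars[, c("mpg", "wt")]
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Loadings: multiply each eigenvector (column of $rotation) by the square
# root of its eigenvalue. pca$sdev already holds those square roots.
loadings <- sweep(pca$rotation, 2, pca$sdev, "*")

# Correlation between each original variable and each component's scores
correlations <- cor(X, pca$x)

loadings
correlations
all.equal(loadings, correlations)
```

Because the variables are standardized, the two matrices should match, which is what justifies reading loadings as correlations.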

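#--- Select the columns we want for our PCA (select() and %>% come from dplyr)
beercols <- beers %>% select(abv.standard, rating.standard)

#--- Conduct the PCA (center = TRUE is the default; scale. = TRUE rescales to variance 1)
beers_pca <- prcomp(beercols, center = TRUE, scale. = TRUE)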

#--- Get the eigenvectors 
beers_pca$rotation

##                        PC1        PC2
## abv.standard    -0.7071068  0.7071068
## rating.standard -0.7071068 -0.7071068

#--- Get the eigenvalues (sdev holds their square roots, so we square it)
(beers_pca$sdev)^2

## [1] 1.6833148 0.3166852

#--- Get the proportion of variance (row 2)
summary(beers_pca)

## Importance of components:
##                           PC1    PC2
## Standard deviation     1.2974 0.5627
## Proportion of Variance 0.8417 0.1583
## Cumulative Proportion  0.8417 1.0000
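The ‘Proportion of Variance’ row is each eigenvalue divided by the sum of all eigenvalues. In the output above the eigenvalues sum to 2 (the number of standardized variables), so PC1 accounts for 1.6833148 / 2 ≈ 0.8417. The sketch below reproduces this arithmetic and checks prcomp() against a direct eigen-decomposition of the correlation matrix. It uses two columns of the built-in mtcars data as a stand-in for the beers data (an assumption), so the numbers it prints will differ from those above.

```r
# Stand-in data: two columns of the built-in mtcars dataset (not the beers data)
X <- mtcars[, c("mpg", "wt")]
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Eigen-decomposition of the correlation matrix, i.e. the covariance
# matrix of the standardized variables
eig <- eigen(cor(X))

# Eigenvalues from prcomp() are the squared component standard deviations
eigenvalues <- pca$sdev^2
all.equal(eig$values, eigenvalues)

# Eigenvectors agree up to sign, so compare absolute values
all.equal(abs(eig$vectors), abs(unname(pca$rotation)))

# Proportion and cumulative proportion of variance, computed by hand
prop_var <- eigenvalues / sum(eigenvalues)
prop_var
cumsum(prop_var)
```

The sign of an eigenvector is arbitrary, which is why the comparison uses absolute values: flipping every sign in a column describes the same component.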

But what does all this really mean?