
Everitt (1984) data

The Everitt (1984) data contain exam scores in six subjects for 220 students.

X1 = French score

X2 = English score

X3 = History score

X4 = Arithmetic score

X5 = Algebra score

X6 = Geometry score

First, we read in the correlation matrix of the six scores.

# correlation matrix
everitt <- matrix(c(1, .44, .41, .29, .33, .25,
                    .44, 1, .35, .35, .32, .33,
                    .41, .35, 1, .16, .19, .18,
                    .29, .35, .16, 1, .59, .47,
                    .33, .32, .19, .59, 1, .46,
                    .25, .33, .18, .47, .46, 1), nrow=6)
everitt
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 1.00 0.44 0.41 0.29 0.33 0.25
## [2,] 0.44 1.00 0.35 0.35 0.32 0.33
## [3,] 0.41 0.35 1.00 0.16 0.19 0.18
## [4,] 0.29 0.35 0.16 1.00 0.59 0.47
## [5,] 0.33 0.32 0.19 0.59 1.00 0.46
## [6,] 0.25 0.33 0.18 0.47 0.46 1.00

Conduct PCA

We use the princomp() function to do PCA. This function can find principal components even when the full data matrix is unavailable and only the covariance or correlation matrix is given.

pca.everitt <- princomp(covmat = everitt)
summary(pca.everitt, loadings = TRUE)
## Importance of components:
##                           Comp.1    Comp.2    Comp.3    Comp.4     Comp.5
## Standard deviation     1.6518727 1.0624463 0.7844052 0.7764077 0.72285155
## Proportion of Variance 0.4547806 0.1881320 0.1025486 0.1004681 0.08708573
## Cumulative Proportion  0.4547806 0.6429126 0.7454612 0.8459293 0.93301505
##                            Comp.6
## Standard deviation     0.63396346
## Proportion of Variance 0.06698495
## Cumulative Proportion  1.00000000
## 
## Loadings:
##      Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
## [1,]  0.400  0.418  0.207  0.455  0.629  0.138
## [2,]  0.417  0.273  0.677 -0.347 -0.383 -0.160
## [3,]  0.313  0.602 -0.662 -0.145 -0.283       
## [4,]  0.445 -0.393         0.233 -0.336  0.693
## [5,]  0.449 -0.351 -0.169  0.394 -0.130 -0.689
## [6,]  0.411 -0.333 -0.176 -0.664  0.498
# The coefficients (eigenvectors) are displayed under 'Loadings'
# Each column vector is the vector of coefficients for the corresponding principal component (PC)

Below are the obtained eigenvectors corresponding to each principal component.

(A <- loadings(pca.everitt))
## 
## Loadings:
##      Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
## [1,]  0.400  0.418  0.207  0.455  0.629  0.138
## [2,]  0.417  0.273  0.677 -0.347 -0.383 -0.160
## [3,]  0.313  0.602 -0.662 -0.145 -0.283       
## [4,]  0.445 -0.393         0.233 -0.336  0.693
## [5,]  0.449 -0.351 -0.169  0.394 -0.130 -0.689
## [6,]  0.411 -0.333 -0.176 -0.664  0.498       
## 
##                Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
## SS loadings     1.000  1.000  1.000  1.000  1.000  1.000
## Proportion Var  0.167  0.167  0.167  0.167  0.167  0.167
## Cumulative Var  0.167  0.333  0.500  0.667  0.833  1.000
# SS loadings are the sums of squared coefficients for each component, which equal 1 by construction (so they are not informative here)
# Both the signs & the relative magnitudes of the weights matter when interpreting the PCs
# PC1 can be interpreted as general intelligence/ability
# PC2 distinguishes the first three exams from the last three
# Children at the high end of PC2 perform disproportionately better on the first three (literacy-related) exams and worse on the last three (math-related) exams

So we can express each principal component (PC) as:

$$\begin{aligned}
PC_1 &= 0.400X_1 + 0.417X_2 + 0.313X_3 + 0.445X_4 + 0.449X_5 + 0.411X_6\\
PC_2 &= 0.418X_1 + 0.273X_2 + 0.602X_3 - 0.393X_4 - 0.351X_5 - 0.333X_6\\
PC_3 &= 0.207X_1 + 0.677X_2 - 0.662X_3 - 0.023X_4 - 0.169X_5 - 0.176X_6\\
PC_4 &= 0.455X_1 - 0.347X_2 - 0.145X_3 + 0.233X_4 + 0.394X_5 - 0.664X_6\\
PC_5 &= 0.629X_1 - 0.383X_2 - 0.283X_3 - 0.336X_4 - 0.130X_5 + 0.498X_6\\
PC_6 &= 0.138X_1 - 0.160X_2 + 0.030X_3 + 0.693X_4 - 0.689X_5 + 0.006X_6
\end{aligned}$$

Proportion of variance explained

The variance of the PC scores can be calculated as $\mathrm{var}(PC_i) = a_i^T S a_i$, which equals the eigenvalue $\lambda_i$.

(lambda <- pca.everitt$sdev^2) # eigenvalues (correspond to the variance of each PC)
##    Comp.1    Comp.2    Comp.3    Comp.4    Comp.5    Comp.6 
## 2.7286835 1.1287922 0.6152914 0.6028089 0.5225144 0.4019097
# The first PC has variance ~2.7, i.e., it captures almost three times the variance of any single standardized variable (each has variance 1)
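As a quick check of the identity above, we can compute $a_1^T S a_1$ directly from the eigenvector matrix A and the correlation matrix and confirm that it reproduces the first eigenvalue (a sketch reusing the objects created above):

```r
# Verify var(PC1) = a1' S a1 = lambda1, using A and everitt from above
a1 <- A[, 1]                            # coefficient vector of PC1
as.numeric(t(a1) %*% everitt %*% a1)    # equals lambda[1] = 2.7286835
```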

The total variance of the PCs is the sum of the eigenvalues, $\sum_{i=1}^{p}\lambda_i$, and it equals the total variance of the original variables, $\mathrm{trace}(S)$.

sum(lambda)  # total variance of PCs = total number of variables (correlation matrix)
## [1] 6
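Equivalently, we can take the trace of the correlation matrix itself, which is the sum of its diagonal elements:

```r
# trace(S): sum of the diagonal of the correlation matrix
sum(diag(everitt))   # also 6, matching sum(lambda)
```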

The proportion of variance explained by each principal component is its variance divided by the total variance: $$\text{Proportion of variance explained by } PC_i = \frac{\lambda_i}{\sum_{i=1}^{p}\lambda_i} = \frac{\lambda_i}{\mathrm{trace}(S)}$$

prop <- lambda / sum(lambda)   # each eigenvalue divided by the total variance
prop
##     Comp.1     Comp.2     Comp.3     Comp.4     Comp.5     Comp.6 
## 0.45478058 0.18813203 0.10254857 0.10046814 0.08708573 0.06698495
summary(pca.everitt)
## Importance of components:
##                           Comp.1    Comp.2    Comp.3    Comp.4     Comp.5
## Standard deviation     1.6518727 1.0624463 0.7844052 0.7764077 0.72285155
## Proportion of Variance 0.4547806 0.1881320 0.1025486 0.1004681 0.08708573
## Cumulative Proportion  0.4547806 0.6429126 0.7454612 0.8459293 0.93301505
##                            Comp.6
## Standard deviation     0.63396346
## Proportion of Variance 0.06698495
## Cumulative Proportion  1.00000000

Number of components to extract

1. Scree plot

screeplot(pca.everitt)  # suggests extracting 2 components

#or
plot(lambda, xlab="Component number", ylab="Component variance (eigenvalue)", type="b", pch=16, main="Scree plot")  # type="b" draws both points and lines (pch is ignored with type="l")

2. The proportion of variance-accounted-for

Let’s say we hope to extract PCs until 80% of the total variance is explained.

cum.prop <- c(sum(prop[1]), sum(prop[1:2]), 
              sum(prop[1:3]), sum(prop[1:4]), 
              sum(prop[1:5]), sum(prop[1:6]))
cum.prop
## [1] 0.4547806 0.6429126 0.7454612 0.8459293 0.9330151 1.0000000
summary(pca.everitt)   # the cumulative proportion first exceeds 80% with 4 components
## Importance of components:
##                           Comp.1    Comp.2    Comp.3    Comp.4     Comp.5
## Standard deviation     1.6518727 1.0624463 0.7844052 0.7764077 0.72285155
## Proportion of Variance 0.4547806 0.1881320 0.1025486 0.1004681 0.08708573
## Cumulative Proportion  0.4547806 0.6429126 0.7454612 0.8459293 0.93301505
##                            Comp.6
## Standard deviation     0.63396346
## Proportion of Variance 0.06698495
## Cumulative Proportion  1.00000000
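The cumulative proportions can also be obtained more concisely with the built-in cumsum() function applied to the prop vector from above (unname() just drops the component names so the output matches cum.prop):

```r
unname(cumsum(prop))   # identical to cum.prop
```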

3. Keep PCs with eigenvalues greater than the average eigenvalue

average <- mean(lambda)  # equals 1 because we analyzed the correlation matrix
lambda > average   # suggests extracting 2 components
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 
##   TRUE   TRUE  FALSE  FALSE  FALSE  FALSE
screeplot(pca.everitt); abline(h=1, col="red")

## The methods for deciding how many components to retain can often lead to different conclusions

Interpreting principal components

Calculating the correlation between the original variables and the principal components can help in interpreting the components. The correlation of variable $x_i$ and component $z_j$ can be computed as $\frac{a_{ij}\sqrt{\lambda_j}}{s_i}$, where $s_i$ is the standard deviation of $x_i$. Since we analyzed the correlation matrix, $s_i = 1$, so this reduces to $a_{ij}\sqrt{\lambda_j}$.

Recall that

X1 = French score

X2 = English score

X3 = History score

X4 = Arithmetic score

X5 = Algebra score

X6 = Geometry score

A[,1]* sqrt(lambda[1])  # correlation of PC1 with the original variables
## [1] 0.6608026 0.6884648 0.5163560 0.7356196 0.7418676 0.6781684
A[,2]* sqrt(lambda[2])  # correlation of PC2 with the original variables
## [1]  0.4444749  0.2897705  0.6395524 -0.4170184 -0.3727588 -0.3540996
loadings <- NULL
for (i in 1:length(lambda)){
  loadings <- cbind(loadings, A[,i]* sqrt(lambda[i]))
}
loadings
##           [,1]       [,2]        [,3]       [,4]       [,5]         [,6]
## [1,] 0.6608026  0.4444749  0.16264111  0.3535839  0.4545472  0.087721919
## [2,] 0.6884648  0.2897705  0.53133039 -0.2696851 -0.2769629 -0.101482646
## [3,] 0.5163560  0.6395524 -0.51917689 -0.1128110 -0.2042286  0.019206576
## [4,] 0.7356196 -0.4170184 -0.01782593  0.1811840 -0.2429384  0.439084313
## [5,] 0.7418676 -0.3727588 -0.13293760  0.3056990 -0.0939495 -0.436729414
## [6,] 0.6781684 -0.3540996 -0.13781337 -0.5158018  0.3600526  0.004393172
## The sum of squares of each loading vector equals the corresponding eigenvalue
apply(loadings^2, 2, sum)   # sum of squares of each column of loading matrix
## [1] 2.7286835 1.1287922 0.6152914 0.6028089 0.5225144 0.4019097
lambda    # eigenvalues 
##    Comp.1    Comp.2    Comp.3    Comp.4    Comp.5    Comp.6 
## 2.7286835 1.1287922 0.6152914 0.6028089 0.5225144 0.4019097
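As a sanity check, the same eigenvalues, and the same eigenvectors up to sign, can be obtained from a direct eigendecomposition of the correlation matrix with eigen() (a sketch reusing the everitt matrix from above; princomp() may flip the sign of some eigenvectors relative to eigen()):

```r
# Direct eigendecomposition of the correlation matrix
e <- eigen(everitt)
e$values                    # matches lambda
round(e$vectors[, 1], 3)    # matches A[, 1] up to sign
```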

Reconstructing covariance/correlation matrix

We can reconstruct the covariance/correlation matrix from the eigenvectors and eigenvalues. By the spectral decomposition theorem, $S = A\Lambda A^T$, where $\Lambda$ is a diagonal matrix of eigenvalues and $A$ is the matrix with the eigenvectors $a_i$ in its columns.

We can rewrite this as $S = A\Lambda^{1/2}\Lambda^{1/2}A^T = (A\Lambda^{1/2})(A\Lambda^{1/2})^T = A^{*}A^{*T}$, where $A^{*} = A\Lambda^{1/2}$ is the matrix of component loading vectors (i.e., $a_i^{*} = \sqrt{\lambda_i}\,a_i$) that we created above and assigned to the loadings object.

## Reconstruct correlation matrix from all PCs
(pred.all <- loadings %*% t(loadings))    # predicted correlation matrix
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 1.00 0.44 0.41 0.29 0.33 0.25
## [2,] 0.44 1.00 0.35 0.35 0.32 0.33
## [3,] 0.41 0.35 1.00 0.16 0.19 0.18
## [4,] 0.29 0.35 0.16 1.00 0.59 0.47
## [5,] 0.33 0.32 0.19 0.59 1.00 0.46
## [6,] 0.25 0.33 0.18 0.47 0.46 1.00
everitt     # the original correlation matrix 
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 1.00 0.44 0.41 0.29 0.33 0.25
## [2,] 0.44 1.00 0.35 0.35 0.32 0.33
## [3,] 0.41 0.35 1.00 0.16 0.19 0.18
## [4,] 0.29 0.35 0.16 1.00 0.59 0.47
## [5,] 0.33 0.32 0.19 0.59 1.00 0.46
## [6,] 0.25 0.33 0.18 0.47 0.46 1.00
everitt - pred.all   # residual matrix
##               [,1]          [,2]          [,3]          [,4]          [,5]
## [1,] -6.661338e-16 -7.771561e-16  1.110223e-16 -3.330669e-16 -1.665335e-16
## [2,] -7.771561e-16 -1.776357e-15 -1.221245e-15 -9.992007e-16 -8.326673e-16
## [3,]  1.110223e-16 -1.221245e-15 -2.220446e-16 -5.551115e-16 -5.551115e-16
## [4,] -3.330669e-16 -9.992007e-16 -5.551115e-16  2.220446e-16  1.110223e-16
## [5,] -1.665335e-16 -8.326673e-16 -5.551115e-16  1.110223e-16  3.330669e-16
## [6,]  2.220446e-16 -3.330669e-16 -1.221245e-15  2.220446e-16  5.551115e-17
##               [,6]
## [1,]  2.220446e-16
## [2,] -3.330669e-16
## [3,] -1.221245e-15
## [4,]  2.220446e-16
## [5,]  5.551115e-17
## [6,]  1.110223e-16
## Reconstruct correlation matrix from the first 2 PCs
(pred.two <- loadings[, 1:2] %*% t(loadings[, 1:2]))    # predicted correlation matrix
##           [,1]      [,2]      [,3]      [,4]      [,5]      [,6]
## [1,] 0.6342180 0.5837351 0.6254744 0.3007451 0.3245461 0.2907470
## [2,] 0.5837351 0.5579508 0.5408164 0.3856086 0.4027352 0.3642874
## [3,] 0.6254744 0.5408164 0.6756508 0.1131365 0.1446690 0.1237110
## [4,] 0.3007451 0.3856086 0.1131365 0.7150405 0.7011796 0.6465400
## [5,] 0.3245461 0.4027352 0.1446690 0.7011796 0.6893167 0.6351049
## [6,] 0.2907470 0.3642874 0.1237110 0.6465400 0.6351049 0.5852988
everitt - pred.two    # residual matrix
##              [,1]        [,2]        [,3]        [,4]         [,5]        [,6]
## [1,]  0.365781971 -0.14373510 -0.21547438 -0.01074514  0.005453879 -0.04074705
## [2,] -0.143735097  0.44204920 -0.19081639 -0.03560858 -0.082735239 -0.03428744
## [3,] -0.215474378 -0.19081639  0.32434920  0.04686355  0.045331023  0.05628896
## [4,] -0.010745143 -0.03560858  0.04686355  0.28495948 -0.111179614 -0.17653997
## [5,]  0.005453879 -0.08273524  0.04533102 -0.11117961  0.310683343 -0.17510487
## [6,] -0.040747047 -0.03428744  0.05628896 -0.17653997 -0.175104871  0.41470116
(pred.five <- loadings[, 1:5] %*% t(loadings[, 1:5]))    # predicted correlation matrix
##           [,1]      [,2]      [,3]      [,4]      [,5]      [,6]
## [1,] 0.9923049 0.4489023 0.4083152 0.2514827 0.3683107 0.2496146
## [2,] 0.4489023 0.9897013 0.3519491 0.3945594 0.2756795 0.3304458
## [3,] 0.4083152 0.3519491 0.9996311 0.1515667 0.1983881 0.1799156
## [4,] 0.2514827 0.3945594 0.1515667 0.8072050 0.7817610 0.4680710
## [5,] 0.3683107 0.2756795 0.1983881 0.7817610 0.8092674 0.4619186
## [6,] 0.2496146 0.3304458 0.1799156 0.4680710 0.4619186 0.9999807
everitt - pred.five    # residual matrix
##               [,1]          [,2]          [,3]         [,4]         [,5]
## [1,]  0.0076951351 -0.0089022525  1.684838e-03  0.038517319 -0.038310742
## [2,] -0.0089022525  0.0102987275 -1.949134e-03 -0.044559438  0.044320457
## [3,]  0.0016848377 -0.0019491341  3.688926e-04  0.008433306 -0.008388077
## [4,]  0.0385173185 -0.0445594381  8.433306e-03  0.192795034 -0.191761035
## [5,] -0.0383107423  0.0443204567 -8.388077e-03 -0.191761035  0.190732581
## [6,]  0.0003853774 -0.0004458307  8.437778e-05  0.001928973 -0.001918627
##               [,6]
## [1,]  3.853774e-04
## [2,] -4.458307e-04
## [3,]  8.437778e-05
## [4,]  1.928973e-03
## [5,] -1.918627e-03
## [6,]  1.929996e-05