Chapter 7 PCA analysis

7.1 PCA and Kinship visualization

In this section we will examine population structure using the PCAtools R package (Blighe and Lun 2021) and visualize the Kinship matrix calculated from the previous section using the pheatmap R package (Kolde 2019).

Other packages used in this section include rMVP (Yin et al. 2021), data.table (Dowle and Srinivasan 2021), tidyverse (Wickham 2021) and pheatmap (Kolde 2019)

Load required pacakges

library(PCAtools)
library(rMVP)
library(data.table)
library(tidyverse)
library(pheatmap)

Set working directory to where rMVP formatted files are in workshop materials

setwd("~/PGRP_mapping_workshop/data/genotypic_and_phenotypic_data/")

Load SNP matrix, phenotypes and Kinship matrix

## Make SNP matrix with 5000 SNPs 
snp_file = attach.big.matrix("WIDIV_942.geno.desc")
snp_file = snp_file[,]
dim(snp_file)

## Read in phenotypes
phenos = fread("WIDIV_942.phe",data.table = FALSE)

## Read in Subpopulation information for 942 lines
subpop = read.csv("widiv_942_subpopulation.csv")
phenos = left_join(phenos,subpop)

## Read in kinship matrix
Kin = attach.big.matrix("WIDIV_942.kin.desc")
Kin = Kin[,]

## Set colnames and rownames
colnames(snp_file) = phenos$Taxa
rownames(phenos) = phenos$Taxa

Run PCA

## Run PCA including phenotypes as metadata
p_snp = pca(snp_file,#SNPs in rows, samples in columns 
            center = TRUE, 
            scale = TRUE,
            metadata = phenos)

Put results of the first 3 PCs into a dataframe

pca.dat = data.frame(Genotype = p_snp$metadata$Taxa,
                     PC1 = p_snp$rotated$PC1,
                     PC2 = p_snp$rotated$PC2,
                     PC3 = p_snp$rotated$PC3,
                     Subpopulation = as.factor(p_snp$metadata$Subpopulation),
                     stringsAsFactors = FALSE)

Make a scatterplot of the first 3 PCs

## Scatterplots of top 3 PCs
pc.percent = p_snp$variance
head(round(pc.percent, 2))
sum(pc.percent[1:3])
lbls = paste("PC", 1:3, "\n", format(pc.percent[1:3], digits=2), "%", sep="")
my_cols = c("#b15928", "#000000", "#6a3d9a","#cab2d6","#ff7f00","#fdbf6f","#e31a1c","#fb9a99","#33a02c","#b2df8a","#1f78b4","#a6cee3")
pairs(pca.dat[,2:4], col=my_cols[pca.dat$Subpopulation], labels=lbls,pch = 19,  cex = 0.9,lower.panel = NULL)

setwd("~/PGRP_mapping_workshop/figures/pcs_and_kinship/")
pdf("WIDIV_942_PCA_Matrix.pdf", width = 10, height = 9)
pairs(pca.dat[,2:4], col=my_cols[pca.dat$Subpopulation], labels=lbls,pch = 19,  cex = 1.1,lower.panel = NULL)
dev.off()

Correlation of simulated phenotypes with forst 3 PCs

peigen = eigencorplot(p_snp,
                      components = p_snp$components[1:3],
                      metavars = c('Color_low_h2','Color_high_h2'),
                      col = c('#d7191c', '#fdae61', '#ffffbf', '#abdda4', '#2b83ba'),
                      cexCorval = 1.2,
                      fontCorval = 2,
                      posLab = 'all',
                      rotLabX = 45,
                      scale = TRUE,
                      plotRsquared = FALSE,
                      corFUN = 'pearson',
                      corUSE = 'complete.obs',
                      signifSymbols = c('***', '**', '*', ''),
                      signifCutpoints = c(0, 0.001, 0.01, 0.05, 1))

pdf("WIDIV_942_PCA_phenos_cor.pdf", width = 6, height = 3)
peigen
dev.off()

Make a heatmap of the Kinship matrix

pheatmap(Kin)
pdf("WIDIV_942_Kinship_Matrix.pdf", width = 10, height = 9)
pheatmap(Kin)
dev.off()

References

Blighe, Kevin, and Aaron Lun. 2021. PCAtools: Everything Principal Components Analysis. https://github.com/kevinblighe/PCAtools.
Dowle, Matt, and Arun Srinivasan. 2021. Data.table: Extension of ‘Data.frame‘. https://CRAN.R-project.org/package=data.table.
Kolde, Raivo. 2019. Pheatmap: Pretty Heatmaps. https://CRAN.R-project.org/package=pheatmap.
Wickham, Hadley. 2021. Tidyverse: Easily Install and Load the Tidyverse. https://CRAN.R-project.org/package=tidyverse.
Yin, Lilin, Haohao Zhang, Zhenshuang Tang, Jingya Xu, Dong Yin, Zhiwu Zhang, Xiaohui Yuan, et al. 2021. rMVP: Memory-Efficient, Visualize-Enhanced, Parallel-Accelerated GWAS Tool. https://github.com/xiaolei-lab/rMVP.