Chapter 7 PCA analysis
7.1 PCA and Kinship visualization
In this section we will examine population structure using the PCAtools R package (Blighe and Lun 2021) and visualize the Kinship matrix calculated from the previous section using the pheatmap R package (Kolde 2019).
Other packages used in this section include rMVP (Yin et al. 2021), data.table (Dowle and Srinivasan 2021), tidyverse (Wickham 2021) and pheatmap (Kolde 2019)
Load required pacakges
library(PCAtools)
library(rMVP)
library(data.table)
library(tidyverse)
library(pheatmap)
Set working directory to where rMVP formatted files are in workshop materials
setwd("~/PGRP_mapping_workshop/data/genotypic_and_phenotypic_data/")
Load SNP matrix, phenotypes and Kinship matrix
## Make SNP matrix with 5000 SNPs
= attach.big.matrix("WIDIV_942.geno.desc")
snp_file = snp_file[,]
snp_file dim(snp_file)
## Read in phenotypes
= fread("WIDIV_942.phe",data.table = FALSE)
phenos
## Read in Subpopulation information for 942 lines
= read.csv("widiv_942_subpopulation.csv")
subpop = left_join(phenos,subpop)
phenos
## Read in kinship matrix
= attach.big.matrix("WIDIV_942.kin.desc")
Kin = Kin[,]
Kin
## Set colnames and rownames
colnames(snp_file) = phenos$Taxa
rownames(phenos) = phenos$Taxa
Run PCA
## Run PCA including phenotypes as metadata
= pca(snp_file,#SNPs in rows, samples in columns
p_snp center = TRUE,
scale = TRUE,
metadata = phenos)
Put results of the first 3 PCs into a dataframe
= data.frame(Genotype = p_snp$metadata$Taxa,
pca.dat PC1 = p_snp$rotated$PC1,
PC2 = p_snp$rotated$PC2,
PC3 = p_snp$rotated$PC3,
Subpopulation = as.factor(p_snp$metadata$Subpopulation),
stringsAsFactors = FALSE)
Make a scatterplot of the first 3 PCs
## Scatterplots of top 3 PCs
= p_snp$variance
pc.percent head(round(pc.percent, 2))
sum(pc.percent[1:3])
= paste("PC", 1:3, "\n", format(pc.percent[1:3], digits=2), "%", sep="")
lbls = c("#b15928", "#000000", "#6a3d9a","#cab2d6","#ff7f00","#fdbf6f","#e31a1c","#fb9a99","#33a02c","#b2df8a","#1f78b4","#a6cee3")
my_cols pairs(pca.dat[,2:4], col=my_cols[pca.dat$Subpopulation], labels=lbls,pch = 19, cex = 0.9,lower.panel = NULL)
setwd("~/PGRP_mapping_workshop/figures/pcs_and_kinship/")
pdf("WIDIV_942_PCA_Matrix.pdf", width = 10, height = 9)
pairs(pca.dat[,2:4], col=my_cols[pca.dat$Subpopulation], labels=lbls,pch = 19, cex = 1.1,lower.panel = NULL)
dev.off()
Correlation of simulated phenotypes with forst 3 PCs
= eigencorplot(p_snp,
peigen components = p_snp$components[1:3],
metavars = c('Color_low_h2','Color_high_h2'),
col = c('#d7191c', '#fdae61', '#ffffbf', '#abdda4', '#2b83ba'),
cexCorval = 1.2,
fontCorval = 2,
posLab = 'all',
rotLabX = 45,
scale = TRUE,
plotRsquared = FALSE,
corFUN = 'pearson',
corUSE = 'complete.obs',
signifSymbols = c('***', '**', '*', ''),
signifCutpoints = c(0, 0.001, 0.01, 0.05, 1))
pdf("WIDIV_942_PCA_phenos_cor.pdf", width = 6, height = 3)
peigendev.off()
Make a heatmap of the Kinship matrix
pheatmap(Kin)
pdf("WIDIV_942_Kinship_Matrix.pdf", width = 10, height = 9)
pheatmap(Kin)
dev.off()
References
Blighe, Kevin, and Aaron Lun. 2021. PCAtools: Everything Principal Components Analysis. https://github.com/kevinblighe/PCAtools.
Dowle, Matt, and Arun Srinivasan. 2021. Data.table: Extension of ‘Data.frame‘. https://CRAN.R-project.org/package=data.table.
Kolde, Raivo. 2019. Pheatmap: Pretty Heatmaps. https://CRAN.R-project.org/package=pheatmap.
Wickham, Hadley. 2021. Tidyverse: Easily Install and Load the Tidyverse. https://CRAN.R-project.org/package=tidyverse.
Yin, Lilin, Haohao Zhang, Zhenshuang Tang, Jingya Xu, Dong Yin, Zhiwu Zhang, Xiaohui Yuan, et al. 2021. rMVP: Memory-Efficient, Visualize-Enhanced, Parallel-Accelerated GWAS Tool. https://github.com/xiaolei-lab/rMVP.