Chapter 8 GWAS

8.1 GWAS analysis

In this exercise, we are conducting a GWAS for four different models: 1) Naive (T-test), 2) GLM accounting for population structure (Q), 3) MLM accounting for population structure (Q) and Kinship (K), 4) FarmCPU accounting for population structure (Q) and Kinship (K). Each model is testing the same hypothesis: Ho: No association between SNP and trait. Ha: There is an association between SNP and trait.

The Naive model is fit using base R functions and GLM, MLM and FarmCPU models are run using the rMVP R package (Yin et al. 2020).

Additional packages used in this section are data.table (Dowle and Srinivasan 2020), and tidyverse (Wickham 2019).

Load necessary packages

Set working directory to where rMVP files are in workshop materials

Read in phenotypes

Read in SNPs

Read in SNPs map

Read in Kinship matrix (A matrix)

Read in principle components (Q)

Run a naive model without any correction for kinship or population structure

### Make a dataframe with phenotypes and genotypes together
GD_naive = cbind(phenos,t(myGD[,]))

### Fit a simple linear regression model for low heritability simulated trait regressed on SNPs and save those results for later
low_h2_naive = sapply(4:ncol(GD_naive), function(x) anova(lm(GD_naive$Color_low_h2 ~ GD_naive[,x]))$'Pr(>F)'[1])
low_h2_naive = cbind(map,low_h2_naive)
colnames(low_h2_naive)[6] = "Color_low_h2.Naive"

### Fit a simple linear regression model for high heritability simulated trait regressed on SNPs
high_h2_naive = sapply(4:ncol(GD_naive), function(x) anova(lm(GD_naive$Color_high_h2 ~ GD_naive[,x]))$'Pr(>F)'[1])
high_h2_naive = cbind(map,high_h2_naive)
colnames(high_h2_naive)[6] = "Color_high_h2.Naive"

### Save those results for visualizing later with IGV
setwd("~/PGRP_mapping_workshop/gwas_results/for_IGV_visualization/")
low_h2_naive_IGV = low_h2_naive[,c(2,3,1,6)]
colnames(low_h2_naive_IGV) = c("CHR","BP","SNP","P")
high_h2_naive_IGV = high_h2_naive[,c(2,3,1,6)]
colnames(high_h2_naive_IGV) = c("CHR","BP","SNP","P")
fwrite(low_h2_naive_IGV, file = "Color_low_h2_naive.gwas",col.names = TRUE, sep = "\t", quote = FALSE, na = "NA")
fwrite(high_h2_naive_IGV, file = "Color_high_h2_naive.gwas",col.names = TRUE, sep = "\t", quote = FALSE, na = "NA")

## Save results for simple static plot visualization later
setwd("~/PGRP_mapping_workshop/gwas_results/")
fwrite(low_h2_naive, file = "Color_low_h2_naive.txt",col.names = TRUE, sep = "\t", quote = FALSE, na = "NA")
fwrite(high_h2_naive, file = "Color_high_h2_naive.txt",col.names = TRUE, sep = "\t", quote = FALSE, na = "NA")

Run GLM with (Q), MLM with (Q) and (K), and FarmCPU with (Q) and (K) and save results for IGV visualization Note to visualize gwas results in IGV, you need a .gwas file which has four columns, CHR, SNP, BP, and P. The loop below writes the .gwas file for each model fit to the high heritability trait and the low heritability trait.

setwd("~/PGRP_mapping_workshop/gwas_results/for_IGV_visualization/")
for(i in colnames(phenos)[2:ncol(phenos)]){
  gwas_kinship = MVP(
    phe=phenos[, c("Taxa", i)], # Phenotype dataframe
    geno=myGD, # Genotypic data as a big.matrix
    map=map, # Map dataframe
    K=myKin, # Kinship used in MLM 
    maxLoop=10, # Max number of iterations for FarmCPU
    CV.GLM=snp_pcs[,1:3], # Covariates (Q) in GLM
    CV.MLM=snp_pcs[,1:3], # Covariates (Q) in MLM
    CV.FarmCPU=snp_pcs[,1:3], # Covariates (Q) in FarmCPU
    method.bin="FaST-LMM", # Method for optimizing bins used by FarmCPU
    method=c("GLM", "MLM", "FarmCPU"), # GWAS models being fit
    file.output = FALSE # If set to TRUE, will generate many plots and files for each trait
  )
  ### These lines are simply for reformatting and remove columns which you would normally want to keep for real data such as effect sizes and standard error associated with each test.
  GLM = cbind(gwas_kinship$map,gwas_kinship$glm.results)
  GLM = GLM[,c(2,3,1,8)]
  colnames(GLM) = c("CHR","BP","SNP","P")
  fwrite(GLM, file = paste0(i,"_GLM_Q.gwas"),col.names = TRUE, sep = "\t", quote = FALSE, na = "NA")
  MLM = cbind(gwas_kinship$map,gwas_kinship$mlm.results)
  MLM = MLM[,c(2,3,1,8)]
  colnames(MLM) = c("CHR","BP","SNP","P")
  fwrite(MLM, file = paste0(i,"_MLM_QandK.gwas"),col.names = TRUE, sep = "\t", quote = FALSE, na = "NA")
  CPU = cbind(gwas_kinship$map,gwas_kinship$farmcpu.results)
  CPU = CPU[,c(2,3,1,8)]
  colnames(CPU) = c("CHR","BP","SNP","P")
  fwrite(CPU, file = paste0(i,"_FarmCPU_QandK.gwas"),col.names = TRUE, sep = "\t", quote = FALSE, na = "NA")
  gc()
}

Run the same models, but saving output to a text file for static visualization later

8.2 Static plot visualization

Now that the analysis has been run, lets visualize the results in a static manhattan plot. For this we will use the CMplot R package (LiLin-Yin 2020). These plots are saved in the directory set below.

Load necessary packages

Read in the results which were saved to individual text files

Combine the results from a list into a single dataframe

Set the working directory to where the figures will be saved

Set the bonferroni correction at alpha = 0.05

Plot for high heritability trait models

From top to bottom, manhattan plots are for a Naive model (T-test), GLM model accounting for (Q), MLM model accounting for (Q) and (K), FarmCPU model accounting for (Q) and (K).

Plot for low heritability trait models

From top to bottom, manhattan plots are for a Naive model (T-test), GLM model accounting for (Q), MLM model accounting for (Q) and (K), FarmCPU model accounting for (Q) and (K).

As a reminder, these are the QTLs we simulated.

References

Dowle, Matt, and Arun Srinivasan. 2020. Data.table: Extension of ‘Data.frame‘. https://CRAN.R-project.org/package=data.table.

LiLin-Yin. 2020. CMplot: Circle Manhattan Plot. https://github.com/YinLiLin/CMplot.

Wickham, Hadley. 2019. Tidyverse: Easily Install and Load the Tidyverse. https://CRAN.R-project.org/package=tidyverse.

Yin, Lilin, Haohao Zhang, Zhenshuang Tang, Jingya Xu, Dong Yin, Zhiwu Zhang, Xiaohui Yuan, et al. 2020. RMVP: Memory-Efficient, Visualize-Enhanced, Parallel-Accelerated Gwas Tool. https://github.com/xiaolei-lab/rMVP.