Chapter 9 TWAS

9.1 TWAS data prep

Now that we’ve completed a traditional GWAS analysis, lets run a Transcription Wide Association Study (TWAS). That is, testing the following hypothesis: Ho: No association between gene expression and trait. Ha: There is an association between gene expression and trait.

First we need to prepare the expression matrix. The expression data comes from the same 942 maize inbred lines used in the GWAS portion of the workshop (Mazaheri et al. 2019). Raw RNAseq reads were processed as described in (Zhou et al. 2020) to generate a transcripts per million (TPM) expression matrix for ~46,430 gene models in maize.

We will first create a subset of 501 genes sampled equally from the 10 maize chromosomes. The phenotypes used in this section ate the same simulated phenotypes used from the GWAS section.

Packages used in this section are: (Dowle and Srinivasan 2020), tidyverse (Wickham 2019), rtracklayer (Lawrence, Carey, and Gentleman 2020), rrBLUP (Endelman 2019).

Load nececarry packages

Read in 942 expression matrix generated

Keep only protein coding genes gff file

Keep needed columns from gff file

Read in Metadata

Read in phenotypic data

Add a row for gid to metadata

Gene coordinates of expression matrix

Keep only protein coding genes from .gff file

Change sample names in columns to name in phenotyps file

Check variance across all lines for expression in each data file

Remove genes with zero variance

Remove genes not expressed in at least 5% of lines

Match name order to phenotype file file

Make quantile rank normalized gene expression matrix

Scale TMP values to be between -1 and 1 to be compatible with rrBLUP

Make output file

Keep only genes on chromosomes, remove mitochondrial and chrolorplast genes

Sample 50 gene models per chromsome and make sure to include P1

See if P1 is in the filtered expression matrix

Bind P1 expression to the matrix

Sort by chromosome and transcription start site from gff file

Write the file

Look at histogram of expression of P1

9.2 TWAS analysis

Load nececarry packages

Read in phenotypes

Read in expression data and put in the format for rrBLUP

Read A matrix

Check files are in the same order

Run rrBLUP for each trait and write out results

K only model

K and SNP calculated PCs from K matrix in model

TWAS manhattan plot for Q and K model for high heritability trait

Read in results for Q and K model and make a scatterplot for most significant gene

Plot it

Now do the same for P1

Plot it

References

Dowle, Matt, and Arun Srinivasan. 2020. Data.table: Extension of ‘Data.frame‘. https://CRAN.R-project.org/package=data.table.

Endelman, Jeffrey. 2019. RrBLUP: Ridge Regression and Other Kernels for Genomic Selection. http://potatobreeding.cals.wisc.edu/software.

Lawrence, Michael, Vince Carey, and Robert Gentleman. 2020. Rtracklayer: R Interface to Genome Annotation Files and the Ucsc Genome Browser.

Mazaheri, Mona, Marlies Heckwolf, Brieanne Vaillancourt, Joseph L. Gage, Brett Burdo, Sven Heckwolf, Kerrie Barry, et al. 2019. “Genome-Wide Association Analysis of Stalk Biomass and Anatomical Traits in Maize.” BMC Plant Biology 19 (1): 45. https://doi.org/10.1186/s12870-019-1653-x.

Wickham, Hadley. 2019. Tidyverse: Easily Install and Load the Tidyverse. https://CRAN.R-project.org/package=tidyverse.

Zhou, Peng, Zhi Li, Erika Magnusson, Fabio Gomez Cano, Peter A. Crisp, Jaclyn M. Noshay, Erich Grotewold, Candice N. Hirsch, Steven P. Briggs, and Nathan M. Springer. 2020. “Meta Gene Regulatory Networks in Maize Highlight Functionally Relevant Regulatory Interactions.” The Plant Cell 32 (5). American Society of Plant Biologists: 1377–96. https://doi.org/10.1105/tpc.20.00080.