Chapter 3 Genotypic data filtering
3.1 Downloading RNA-seq marker data
The data we are using come from Mazaheri et al. (2019) and include ~900K RNA-seq-derived SNP markers for 942 diverse maize inbred lines.
The data is in Hapmap format. More details about Hapmap format can be found here
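As a rough illustration of the layout: a Hapmap file is tab-delimited, with one row per SNP, 11 leading metadata columns (starting with `rs#`, `alleles`, `chrom` and `pos`) and then one genotype column per line. The Python sketch below parses a tiny made-up example. The marker names, line names and genotype calls are invented for illustration and are not taken from the Mazaheri et al. data.

```python
import csv
import io

# Standard Hapmap files carry 11 metadata columns before the genotype columns.
N_META = 11

# A tiny, made-up Hapmap table (two SNPs, three lines); values are illustrative only.
toy_hapmap = (
    "rs#\talleles\tchrom\tpos\tstrand\tassembly#\tcenter\tprotLSID\tassayLSID\tpanelLSID\tQCcode\tLineA\tLineB\tLineC\n"
    "snp_1\tA/G\t1\t1050\t+\tNA\tNA\tNA\tNA\tNA\tNA\tAA\tGG\tAG\n"
    "snp_2\tC/T\t2\t2300\t+\tNA\tNA\tNA\tNA\tNA\tNA\tCC\tNN\tTT\n"
)


def parse_hapmap(text):
    """Return (sample_names, records) parsed from Hapmap-formatted text."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    samples = header[N_META:]
    records = []
    for row in body:
        records.append({
            "id": row[0],
            "alleles": row[1],
            "chrom": row[2],
            "pos": int(row[3]),
            # Map each line name to its genotype call for this SNP
            "genotypes": dict(zip(samples, row[N_META:])),
        })
    return samples, records


samples, records = parse_hapmap(toy_hapmap)
print(samples)
print(records[0]["id"], records[0]["chrom"], records[0]["pos"], records[0]["genotypes"])
```

Here missing calls are written as `NN` and heterozygous calls as two different letters (`AG`). Real files may instead use single-letter codes, so check your own data before relying on this.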
3.2 SNP filtering
Let's start by using TASSEL (https://www.maizegenetics.net/tassel; Bradbury et al. 2007) for SNP data filtering. There are many plugins available for different types of analyses, and I encourage you to read more by following the link.
The commands below are run with the standalone version of TASSEL. These tasks can also be accomplished using the graphical user interface (GUI) of TASSEL.
Set path to where your TASSEL standalone is installed
Change directory to where data was downloaded to
Filter out SNPs with minor allele frequency (MAF) below 0.05 and set heterozygous calls to NA
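One way to assemble these three steps is sketched below in Python. The script only builds and prints the command; it does not run TASSEL. `run_pipeline.pl` is the launcher script shipped with the TASSEL standalone. The installation directory, data directory and file names are placeholders I made up, and the order of the flags is my assumption, so adapt them to your own setup.

```python
import shlex

# Placeholder locations: replace with your own TASSEL install and data folders.
tassel_dir = "/path/to/tassel-5-standalone"
data_dir = "/path/to/downloaded/data"

# Hypothetical file names, used only for illustration.
input_hmp = "markers.hmp.txt"
output_prefix = "markers_maf05_homozygous"

command = [
    f"{tassel_dir}/run_pipeline.pl",
    "-Xms2G", "-Xmx32G",            # minimum and maximum memory for TASSEL
    "-h", input_hmp,                # load the input file as Hapmap
    "-filterAlign",
    "-filterAlignMinFreq", "0.05",  # drop SNPs with MAF below 0.05
    "-homozygous",                  # set heterozygous calls to unknown
    "-export", output_prefix,
    "-exportType", "Hapmap",
]

# Print the steps for inspection. To actually run the filter, you could pass
# the list to subprocess.run(command, cwd=data_dir, check=True).
print("cd", shlex.quote(data_dir))
print(shlex.join(command))
```

Keeping the arguments in a list rather than a single string avoids quoting problems if a path contains spaces.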
The “-Xms2G” and “-Xmx32G” flags set the minimum and maximum amount of RAM that TASSEL is allowed to use. You can change these values depending on your machine’s hardware.
The “-h” flag tells TASSEL that the data we are loading is in Hapmap format.
The “-filterAlign -filterAlignMinFreq 0.05” flags filter out SNPs with MAF below 0.05.
The “-homozygous” flag converts any heterozygous values to unknown. Since we are working with inbred lines, this makes sense to use here.
The “-export” flag tells TASSEL to export the output.
The “-exportType” flag tells TASSEL which format to export the data as.
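To make the two filters concrete, here is a minimal Python sketch of what they do conceptually to a single SNP. It is not TASSEL's implementation. It assumes two-letter genotype calls (for example `AA`, `AG`, and `NN` for missing), ignores missing calls when computing frequencies, and computes MAF before heterozygotes are masked. TASSEL may handle these details differently.

```python
from collections import Counter

MISSING = "NN"      # assumed code for a missing genotype call
MAF_THRESHOLD = 0.05


def minor_allele_frequency(calls):
    """MAF of one SNP from two-letter calls, ignoring missing genotypes."""
    counts = Counter()
    for call in calls:
        if call != MISSING:
            counts.update(call)  # counts both alleles of the diploid call
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0  # monomorphic or entirely missing
    # The second most common allele is treated as the minor allele
    return counts.most_common(2)[1][1] / total


def mask_heterozygous(calls):
    """Replace heterozygous calls (two different letters) with missing."""
    return [call if call[0] == call[1] else MISSING for call in calls]


# Made-up genotype calls for two SNPs
common_snp = ["AA"] * 6 + ["GG"] * 3 + ["AG"]   # 7 G alleles out of 20
rare_snp = ["CC"] * 39 + ["TT"]                 # 2 T alleles out of 80

for name, calls in [("common_snp", common_snp), ("rare_snp", rare_snp)]:
    maf = minor_allele_frequency(calls)
    keep = maf >= MAF_THRESHOLD
    print(name, round(maf, 3), "keep" if keep else "drop")

print(mask_heterozygous(["AA", "AG", "GG", "NN"]))  # ['AA', 'NN', 'GG', 'NN']
```

In this toy example the first SNP (MAF 0.35) passes the filter and the second (MAF 0.025) is removed.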
3.3 SNP subsetting
For the remainder of the exercises, we will work with a subset of ~5,000 markers, obtained by sampling ~500 SNP markers from each of the 10 maize chromosomes.
Set the working directory to the folder in the workshop materials that contains the subset Hapmap file
Read in SNP data
We will use the p1 gene in maize (Zhang and Peterson 2005) as an example for many of the mapping exercises. In the genotypic data, there are no SNPs between the transcription start and stop sites of the p1 gene annotation, so we will keep SNPs within 1 Mb of this gene.
Keep SNPs within 1 Mb of p1
Subset these SNPs from SNP file
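A minimal Python sketch of this windowing step, assuming each SNP is held as an (id, chromosome, position) record. The gene coordinates below are placeholders, not the real p1 annotation; take the actual chromosome, start and stop from the genome annotation you are using. I also assume that "within 1 Mb" means anywhere from 1 Mb upstream of the start to 1 Mb downstream of the stop.

```python
from typing import NamedTuple


class Snp(NamedTuple):
    snp_id: str
    chrom: int
    pos: int


# Placeholder gene coordinates (NOT the real p1 annotation).
GENE_CHROM = 1
GENE_START = 5_000_000
GENE_STOP = 5_010_000
WINDOW = 1_000_000  # 1 Mb


def near_gene(snp, chrom=GENE_CHROM, start=GENE_START, stop=GENE_STOP, window=WINDOW):
    """True if the SNP is on the gene's chromosome and within the window."""
    return snp.chrom == chrom and (start - window) <= snp.pos <= (stop + window)


# Made-up SNP records
snps = [
    Snp("s1", 1, 3_900_000),  # more than 1 Mb upstream: dropped
    Snp("s2", 1, 4_200_000),  # within 1 Mb upstream: kept
    Snp("s3", 1, 6_000_000),  # within 1 Mb downstream: kept
    Snp("s4", 1, 6_500_000),  # more than 1 Mb downstream: dropped
    Snp("s5", 2, 5_005_000),  # matching position, wrong chromosome: dropped
]

p1_snps = [snp for snp in snps if near_gene(snp)]
print([snp.snp_id for snp in p1_snps])  # ['s2', 's3']
```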
Write the p1 SNP matrix file
Select 500 SNPs per chromosome
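The per-chromosome sampling could look something like the following Python sketch. The function name, the fixed seed and the simulated marker table are my own choices for illustration. Sampling is without replacement, and a chromosome with fewer than 500 SNPs simply contributes all of them.

```python
import random
from collections import defaultdict


def sample_per_chromosome(snps, n_per_chrom=500, seed=1):
    """Randomly pick up to n_per_chrom SNPs from each chromosome.

    snps is an iterable of (snp_id, chrom, pos) tuples.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    by_chrom = defaultdict(list)
    for snp in snps:
        by_chrom[snp[1]].append(snp)

    selected = []
    for chrom in sorted(by_chrom):
        pool = by_chrom[chrom]
        k = min(n_per_chrom, len(pool))  # take everything if fewer are available
        selected.extend(rng.sample(pool, k))
    return selected


# Simulated markers: 10 chromosomes with 2,000 made-up SNPs each.
all_snps = [
    (f"chr{chrom}_snp{i}", chrom, i * 1000)
    for chrom in range(1, 11)
    for i in range(1, 2001)
]

subset = sample_per_chromosome(all_snps)
print(len(subset))  # 5000
```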
Remove any of the p1 SNPs which were selected
Row-bind the p1 SNP matrix and the 5,000 selected SNPs
Sort by chromosome and position
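The last three steps (dropping duplicates, row-binding and sorting) can be sketched in Python as below, again using made-up (id, chromosome, position) tuples rather than the workshop data. Chromosomes are stored as integers here so that chromosome 10 sorts after chromosome 2; with string labels it would sort before it.

```python
def combine_and_sort(p1_snps, sampled_snps):
    """Merge the p1 SNPs with the sampled SNPs, without duplicate rows."""
    p1_ids = {snp[0] for snp in p1_snps}
    # Drop sampled SNPs that are already part of the p1 set
    sampled_only = [snp for snp in sampled_snps if snp[0] not in p1_ids]
    combined = list(p1_snps) + sampled_only  # row-bind the two tables
    # Sort by chromosome first, then by position within chromosome
    return sorted(combined, key=lambda snp: (snp[1], snp[2]))


# Made-up records: SNP "b" appears in both sets
p1_example = [("a", 1, 4200), ("b", 1, 6000)]
sampled_example = [("c", 2, 100), ("b", 1, 6000), ("d", 1, 50), ("e", 10, 5)]

result = combine_and_sort(p1_example, sampled_example)
print([snp[0] for snp in result])  # ['d', 'a', 'b', 'c', 'e']
```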
Write the subset SNP file
Bradbury, Peter J., Zhiwu Zhang, Dallas E. Kroon, Terry M. Casstevens, Yogesh Ramdoss, and Edward S. Buckler. 2007. “TASSEL: software for association mapping of complex traits in diverse samples.” Bioinformatics 23 (19): 2633–5. https://doi.org/10.1093/bioinformatics/btm308.
Dowle, Matt, and Arun Srinivasan. 2020. Data.table: Extension of ‘Data.frame’. https://CRAN.R-project.org/package=data.table.
Mazaheri, Mona, Marlies Heckwolf, Brieanne Vaillancourt, Joseph L. Gage, Brett Burdo, Sven Heckwolf, Kerrie Barry, et al. 2019. “Genome-Wide Association Analysis of Stalk Biomass and Anatomical Traits in Maize.” BMC Plant Biology 19 (1): 45. https://doi.org/10.1186/s12870-019-1653-x.
Wickham, Hadley. 2019. Tidyverse: Easily Install and Load the Tidyverse. https://CRAN.R-project.org/package=tidyverse.
Zhang, Feng, and Thomas Peterson. 2005. “Comparisons of Maize Pericarp Color1 Alleles Reveal Paralogous Gene Recombination and an Organ-Specific Enhancer Region.” The Plant Cell 17 (3): 903–14.