Chapter 6 rMVP data formatting
6.1 Genotypic and phenotypic data in rMVP format
For GWAS we use the rMVP package (Yin et al. 2021). More information about the R package is available at https://github.com/xiaolei-lab/rMVP.
The genotypic and phenotypic data can be formatted in one step with the MVP.Data() call shown later in this section. The same call also calculates the kinship matrix and the principal components, and saves the genotypic data as a file-backed matrix. This is a memory-efficient way of storing the data because the entire matrix never has to be loaded into RAM. For the subset data used here the performance difference is minor, but with larger datasets it is extremely beneficial.
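To make the idea of a file-backed matrix concrete, here is a minimal sketch using the bigmemory package, which rMVP relies on for this kind of storage. The toy dimensions, the file names, and the temporary directory are illustrative assumptions and are not part of the workshop data. The sketch uses type = "double" for simplicity, which may differ from the storage type rMVP chooses for genotypes.

```r
library(bigmemory)

# Fresh temporary folder so the sketch can be re-run without file clashes
backing_dir <- tempfile("bigmat_demo_")
dir.create(backing_dir)

# Hypothetical toy matrix: 5 markers x 4 individuals, stored on disk
toy <- filebacked.big.matrix(
  nrow = 5, ncol = 4, type = "double", init = 0,
  backingfile    = "toy_geno.bin",   # binary file holding the values
  descriptorfile = "toy_geno.desc",  # small text file describing the matrix
  backingpath    = backing_dir
)

# Writes go to the backing file, not to an in-memory R matrix
toy[1, ] <- c(0, 1, 2, 1)

# Re-attach the same data later (or from another session) via the descriptor
toy_again <- attach.big.matrix(file.path(backing_dir, "toy_geno.desc"))
print(toy_again[1, ])
```

Only the rows or columns that are indexed are read from disk, which is why this layout helps with large genotype matrices.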
Load packages
library(rMVP)
Set the working directory to the folder in the workshop materials that contains the subset HapMap file
setwd("~/PGRP_mapping_workshop/data/genotypic_and_phenotypic_data/")
Prepare data in rMVP format
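MVP.Data(fileHMP = "subset_widiv_942g_899784SNPs_imputed_filteredGenos_noRTA_AGPv4.hmp.txt",  # genotypes in HapMap format
         fileKin = TRUE,                             # calculate the kinship matrix
         filePhe = "BLUPs_Seed_Color_Simulated.txt", # phenotype file
         filePC = TRUE,                              # calculate the principal components
         SNP.impute = "Major",                       # fill missing genotypes with the major genotype
         out = "WIDIV_942")                          # prefix for all output files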
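Once MVP.Data() has finished, the formatted files can be loaded back into R. The sketch below assumes that rMVP names its outputs by appending suffixes such as .geno.desc, .geno.map, .phe, .kin.desc, and .pc.desc to the out prefix. That pattern follows the package README, but check the files actually written to your working directory. The object names are illustrative.

```r
library(bigmemory)

prefix <- "WIDIV_942"  # must match the out argument passed to MVP.Data()

# Assumed output file names (prefix + rMVP's usual suffixes)
geno_desc <- paste0(prefix, ".geno.desc")
map_file  <- paste0(prefix, ".geno.map")
phe_file  <- paste0(prefix, ".phe")
kin_desc  <- paste0(prefix, ".kin.desc")
pc_desc   <- paste0(prefix, ".pc.desc")

if (all(file.exists(geno_desc, map_file, phe_file, kin_desc, pc_desc))) {
  # Attaching reads only the descriptor; the genotypes stay on disk
  genotype  <- attach.big.matrix(geno_desc)
  kinship   <- attach.big.matrix(kin_desc)
  pcs       <- attach.big.matrix(pc_desc)

  # Map and phenotype files are plain text tables with a header row
  map       <- read.table(map_file, header = TRUE)
  phenotype <- read.table(phe_file, header = TRUE)

  print(dim(genotype))
  print(head(phenotype))
} else {
  message("Expected rMVP output files not found; run MVP.Data() first ",
          "or adjust the file names above.")
}
```

Checking the dimensions of genotype against the number of SNPs and lines in the file name is a quick way to confirm that the conversion worked as expected.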
References
Yin, Lilin, Haohao Zhang, Zhenshuang Tang, Jingya Xu, Dong Yin, Zhiwu Zhang, Xiaohui Yuan, et al. 2021. rMVP: Memory-Efficient, Visualize-Enhanced, Parallel-Accelerated GWAS Tool. https://github.com/xiaolei-lab/rMVP.