Chapter 6 rMVP data formatting

6.1 Genotypic and phenotypic data in rMVP format

For GWAS we use the rMVP package (Yin et al. 2021). More information about the R package is available at https://github.com/xiaolei-lab/rMVP.

The genotypic and phenotypic data can be formatted in one step using the code below. The same call also calculates the kinship matrix and the principal components, and saves the genotypic data as a file-backed matrix. A file-backed matrix is stored on disk and read in pieces as needed, which is memory efficient because the entire matrix never has to be loaded into RAM. For the subset data used here the performance difference is minor, but for larger datasets it is extremely beneficial.
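To make the idea of a file-backed matrix concrete, here is a minimal sketch using the bigmemory package, which rMVP depends on. The matrix size, the file names (`toy.geno.bin`, `toy.geno.desc`), and the genotype codes are made up for illustration and are not part of the workshop data.

```r
library(bigmemory)

# Use a fresh temporary folder so nothing is written to the workshop directory.
backing_dir <- tempfile("bigmatrix_demo_")
dir.create(backing_dir)

# Create a toy 5 x 4 file-backed matrix (hypothetical sizes).
# The values are stored in "toy.geno.bin"; "toy.geno.desc" records how to read them.
toy <- filebacked.big.matrix(nrow = 5, ncol = 4, type = "integer", init = 0L,
                             backingfile = "toy.geno.bin",
                             descriptorfile = "toy.geno.desc",
                             backingpath = backing_dir)

# Write made-up genotype codes into the second row and push them to disk.
toy[2, ] <- c(0L, 1L, 2L, 1L)
flush(toy)

# Re-attach the matrix from its descriptor file, as a later session would.
# Indexing reads only the requested row, not the whole matrix.
toy_again <- attach.big.matrix(file.path(backing_dir, "toy.geno.desc"))
marker2 <- toy_again[2, ]
print(marker2)
```

The object returned by `attach.big.matrix()` is only a pointer to the file on disk, which is why attaching a very large genotype matrix is fast and uses little memory.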

Load packages

library(rMVP)

Set the working directory to the folder in the workshop materials that contains the subset HapMap file

setwd("~/PGRP_mapping_workshop/data/genotypic_and_phenotypic_data/")
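Before formatting the data, it can help to confirm that both input files are visible from the working directory. The check below is an optional sketch; the file names are copied from the `MVP.Data()` call in this section, and the variable names are only illustrative.

```r
# Input file names, copied from the MVP.Data() call in this section.
hmp_file <- "subset_widiv_942g_899784SNPs_imputed_filteredGenos_noRTA_AGPv4.hmp.txt"
phe_file <- "BLUPs_Seed_Color_Simulated.txt"

inputs <- c(hmp_file, phe_file)
found <- file.exists(inputs)

if (all(found)) {
  # Show the first few lines of the phenotype file as a quick sanity check.
  print(readLines(phe_file, n = 3))
} else {
  message("Not found in ", getwd(), ": ",
          paste(inputs[!found], collapse = ", "))
}
```

If a file is reported as missing, check the path given to `setwd()` before going further.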

Prepare data in rMVP format

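MVP.Data(fileHMP = "subset_widiv_942g_899784SNPs_imputed_filteredGenos_noRTA_AGPv4.hmp.txt",  # genotypes in HapMap format
         fileKin = TRUE,                              # calculate the kinship matrix
         filePhe = "BLUPs_Seed_Color_Simulated.txt",  # phenotype file
         filePC = TRUE,                               # calculate the principal components
         SNP.impute = "Major",                        # imputation method for missing genotypes
         out = "WIDIV_942")                           # prefix for all output files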
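Once `MVP.Data()` has finished, the formatted files can be loaded back into R. The sketch below assumes the output names follow the `<prefix>.<suffix>` pattern shown in the rMVP README (for example `mvp.hmp.geno.desc` there); if the files in your folder are named differently, adjust the `outputs` vector to match.

```r
library(bigmemory)

prefix <- "WIDIV_942"  # the value passed to `out` in MVP.Data()

# Assumed output file names, following the pattern in the rMVP README.
outputs <- c(geno = paste0(prefix, ".geno.desc"),
             map  = paste0(prefix, ".geno.map"),
             phe  = paste0(prefix, ".phe"),
             kin  = paste0(prefix, ".kin.desc"),
             pc   = paste0(prefix, ".pc.desc"))

if (all(file.exists(outputs))) {
  # Genotypes and kinship stay on disk; only descriptors are loaded here.
  genotype  <- attach.big.matrix(outputs[["geno"]])
  kinship   <- attach.big.matrix(outputs[["kin"]])
  # The map, phenotypes, and principal components are small enough to hold in memory.
  map       <- read.table(outputs[["map"]], header = TRUE)
  phenotype <- read.table(outputs[["phe"]], header = TRUE)
  pcs       <- bigmemory::as.matrix(attach.big.matrix(outputs[["pc"]]))
  print(dim(genotype))
} else {
  message("Missing output files: ",
          paste(outputs[!file.exists(outputs)], collapse = ", "))
}
```

Printing the dimensions of `genotype` is a quick way to confirm that the number of markers and individuals matches what you expect from the input file.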

References

Yin, Lilin, Haohao Zhang, Zhenshuang Tang, Jingya Xu, Dong Yin, Zhiwu Zhang, Xiaohui Yuan, et al. 2021. rMVP: Memory-Efficient, Visualize-Enhanced, Parallel-Accelerated GWAS Tool. https://github.com/xiaolei-lab/rMVP.