Chapter 4 Simulating Phenotypes
4.1 Phenotypic data simulation of high and low heritability traits
In this section, phenotypes are simulated since this allows us to explain conceptually how different trait heritabilities affect power to detect signals in GWAS. The simplePhenotypes R package is used to simulate phenotypic data.
Set working directory to where subset hapmap file is in workshop materials
Simulate QTL for high and low heritability trait
The code below simulates two phenotypes, each controlled by the same 4 QTL. Effect sizes range from 0.3 to 1.1. Traits simulated are a high heritability trait (0.75) and a low heritability trait (0.25) assuming an additive only model without epistasis or dominance.
my_effects = list(trait_1 = c(0.6,1.1,0.4,0.3), trait_2 = c(0.6,1.1,0.4,0.3)) create_phenotypes( geno_file = "subset_widiv_942g_899784SNPs_imputed_filteredGenos_noRTA_AGPv4.hmp.txt", # Name of the hapmap file to be used ntraits = 2, # Simulating phenotypes for 2 traits add_QTN_num = 4, # 4 additive effect QTNs from genotypic data add_effect = my_effects, # Additive effect sizes from my_effects list above h2 = c(0.25,0.75), # Heritability of each trait rep = 3, # 3 replications model = "A", # Additive only model SNP_impute = "Major", # Imputation of any missing data to the major allele constraints = list(maf_above = 0.25, maf_below = 0.5, hets ="remove"), # QTL simulated for markers within these constraints for minor allele frequency home_dir = getwd(), # Directory where input genotype file is located seed = 12345, # Seed set for reproducibility sim_method = "custom", output_dir = "Simulated_phenotypes") # Directory where simulated data is saved
Read in simulated data
setwd("~/PGRP_mapping_workshop/data/genotypic_and_phenotypic_data/Simulated_phenotypes/") phenos = fread("Simulated_Data_3_Reps_Herit_0.25_0.75.txt") colnames(phenos) = c("Genotype","Color_low","Color_high","rep") #Give column names ## Change the scale to something more biologically meaningful like wavelength (redness color in this wavelength range) phenos$Color_low = phenos$Color_low+620 phenos$Color_high = phenos$Color_high+620
Write the dataframe of simulated data
Take a look at the simulated QTL
Here are the same simulated QTL in a table format and some additional information
Dowle, Matt, and Arun Srinivasan. 2020. Data.table: Extension of ‘Data.frame‘. https://CRAN.R-project.org/package=data.table.
Fernandes, Samuel, and Alexander Lipka. 2020. SimplePHENOTYPES: Simulation of Pleiotropic, Linked and Epistatic Phenotypes. https://github.com/samuelbfernandes/simplePHENOTYPES.
Wickham, Hadley. 2019. Tidyverse: Easily Install and Load the Tidyverse. https://CRAN.R-project.org/package=tidyverse.