Chapter 4 Simulating Phenotypes
4.1 Phenotypic data simulation of high and low heritability traits
In this section, phenotypes are simulated since this allows us to explain conceptually how different trait heritabilities affect power to detect signals in GWAS. The simplePhenotypes R package is used to simulate phenotypic data.
More information about simplePhenotypes can be found here along with a short vignette here.
The R packages used in this section are simplePHENOTYPES (R-simplePHENOTYPES?), data.table (Dowle and Srinivasan 2021), and tidyverse (Wickham 2021).
Load packages
library(simplePHENOTYPES)
library(data.table)
library(tidyverse)
Set working directory to where subset hapmap file is in workshop materials
setwd("~/PGRP_mapping_workshop/data/genotypic_and_phenotypic_data/")
Simulate QTL for high and low heritability trait
The code below simulates two phenotypes, each controlled by the same 4 QTL. Effect sizes range from 0.3 to 1.1. Traits simulated are a high heritability trait (0.75) and a low heritability trait (0.25) assuming an additive only model without epistasis or dominance.
= list(trait_1 = c(0.6,1.1,0.4,0.3),
my_effects trait_2 = c(0.6,1.1,0.4,0.3))
create_phenotypes(
geno_file = "subset_widiv_942g_899784SNPs_imputed_filteredGenos_noRTA_AGPv4.hmp.txt", # Name of the hapmap file to be used
ntraits = 2, # Simulating phenotypes for 2 traits
add_QTN_num = 4, # 4 additive effect QTNs from genotypic data
add_effect = my_effects, # Additive effect sizes from my_effects list above
h2 = c(0.25,0.75), # Heritability of each trait
rep = 3, # 3 replications
model = "A", # Additive only model
SNP_impute = "Major", # Imputation of any missing data to the major allele
constraints = list(maf_above = 0.25, maf_below = 0.5, hets ="remove"), # QTL simulated for markers within these constraints for minor allele frequency
home_dir = getwd(), # Directory where input genotype file is located
seed = 12345, # Seed set for reproducibility
sim_method = "custom",
output_dir = "Simulated_phenotypes") # Directory where simulated data is saved
Read in simulated data
setwd("~/PGRP_mapping_workshop/data/genotypic_and_phenotypic_data/Simulated_phenotypes/")
= fread("Simulated_Data_3_Reps_Herit_0.25_0.75.txt")
phenos colnames(phenos) = c("Genotype","Color_low","Color_high","rep") #Give column names
## Change the scale to something more biologically meaningful like wavelength (redness color in this wavelength range)
$Color_low = phenos$Color_low+620
phenos$Color_high = phenos$Color_high+620 phenos
Write the dataframe of simulated data
setwd("~/PGRP_mapping_workshop/data/genotypic_and_phenotypic_data/")
fwrite(phenos,"Seed_Color_Simulated.txt",sep="\t", quote = FALSE,col.names = TRUE)
Take a look at the simulated QTL
setwd("~/PGRP_mapping_workshop/data/genotypic_and_phenotypic_data/Simulated_phenotypes/")
= fread("Additive_selected_QTNs.txt")
selected_qtns head(selected_qtns[,1:7]) # These are the simulated QTLs
Here are the same simulated QTL in a table format and some additional information