Chapter 4 Simulating Phenotypes
4.1 Phenotypic data simulation of high and low heritability traits
In this section, phenotypes are simulated since this allows us to explain conceptually how different trait heritabilities affect power to detect signals in GWAS. The simplePhenotypes R package is used to simulate phenotypic data.
More information about simplePhenotypes can be found here along with a short vignette here.
The R packages used in this section are simplePHENOTYPES (R-simplePHENOTYPES?), data.table (Dowle and Srinivasan 2021), and tidyverse (Wickham 2021).
Load packages
library(simplePHENOTYPES)
library(data.table)
library(tidyverse)Set working directory to where subset hapmap file is in workshop materials
setwd("~/PGRP_mapping_workshop/data/genotypic_and_phenotypic_data/")Simulate QTL for high and low heritability trait
The code below simulates two phenotypes, each controlled by the same 4 QTL. Effect sizes range from 0.3 to 1.1. Traits simulated are a high heritability trait (0.75) and a low heritability trait (0.25) assuming an additive only model without epistasis or dominance.
my_effects = list(trait_1 = c(0.6,1.1,0.4,0.3),
               trait_2 = c(0.6,1.1,0.4,0.3))
               
create_phenotypes(
  geno_file = "subset_widiv_942g_899784SNPs_imputed_filteredGenos_noRTA_AGPv4.hmp.txt", # Name of the hapmap file to be used
  ntraits = 2, # Simulating phenotypes for 2 traits
  add_QTN_num = 4, # 4 additive effect QTNs from genotypic data
  add_effect = my_effects, # Additive effect sizes from my_effects list above
  h2 = c(0.25,0.75), # Heritability of each trait
  rep = 3, # 3 replications
  model = "A", # Additive only model
  SNP_impute = "Major", # Imputation of any missing data to the major allele
  constraints = list(maf_above = 0.25, maf_below = 0.5, hets ="remove"), # QTL simulated for markers within these constraints for minor allele frequency
  home_dir = getwd(), # Directory where input genotype file is located
  seed = 12345, # Seed set for reproducibility
  sim_method = "custom",
  output_dir = "Simulated_phenotypes") # Directory where simulated data is savedRead in simulated data
setwd("~/PGRP_mapping_workshop/data/genotypic_and_phenotypic_data/Simulated_phenotypes/")
phenos = fread("Simulated_Data_3_Reps_Herit_0.25_0.75.txt")
colnames(phenos) = c("Genotype","Color_low","Color_high","rep") #Give column names
## Change the scale to something more biologically meaningful like wavelength (redness color in this wavelength range)
phenos$Color_low = phenos$Color_low+620
phenos$Color_high = phenos$Color_high+620Write the dataframe of simulated data
setwd("~/PGRP_mapping_workshop/data/genotypic_and_phenotypic_data/")
fwrite(phenos,"Seed_Color_Simulated.txt",sep="\t", quote = FALSE,col.names = TRUE)Take a look at the simulated QTL
setwd("~/PGRP_mapping_workshop/data/genotypic_and_phenotypic_data/Simulated_phenotypes/")
selected_qtns = fread("Additive_selected_QTNs.txt")
head(selected_qtns[,1:7]) # These are the simulated QTLsHere are the same simulated QTL in a table format and some additional information