Chapter 4 Simulating Phenotypes

4.1 Phenotypic data simulation of high and low heritability traits

In this section, phenotypes are simulated since this allows us to explain conceptually how different trait heritabilities affect power to detect signals in GWAS. The simplePhenotypes R package is used to simulate phenotypic data.

More information about simplePhenotypes can be found here along with a short vignette here.

The R packages used in this section are simplePHENOTYPES (R-simplePHENOTYPES?), data.table (Dowle and Srinivasan 2021), and tidyverse (Wickham 2021).

Load packages

library(simplePHENOTYPES)
library(data.table)
library(tidyverse)

Set working directory to where subset hapmap file is in workshop materials

setwd("~/PGRP_mapping_workshop/data/genotypic_and_phenotypic_data/")

Simulate QTL for high and low heritability trait

The code below simulates two phenotypes, each controlled by the same 4 QTL. Effect sizes range from 0.3 to 1.1. Traits simulated are a high heritability trait (0.75) and a low heritability trait (0.25) assuming an additive only model without epistasis or dominance.

my_effects = list(trait_1 = c(0.6,1.1,0.4,0.3),
               trait_2 = c(0.6,1.1,0.4,0.3))
               
create_phenotypes(
  geno_file = "subset_widiv_942g_899784SNPs_imputed_filteredGenos_noRTA_AGPv4.hmp.txt", # Name of the hapmap file to be used
  ntraits = 2, # Simulating phenotypes for 2 traits
  add_QTN_num = 4, # 4 additive effect QTNs from genotypic data
  add_effect = my_effects, # Additive effect sizes from my_effects list above
  h2 = c(0.25,0.75), # Heritability of each trait
  rep = 3, # 3 replications
  model = "A", # Additive only model
  SNP_impute = "Major", # Imputation of any missing data to the major allele
  constraints = list(maf_above = 0.25, maf_below = 0.5, hets ="remove"), # QTL simulated for markers within these constraints for minor allele frequency
  home_dir = getwd(), # Directory where input genotype file is located
  seed = 12345, # Seed set for reproducibility
  sim_method = "custom",
  output_dir = "Simulated_phenotypes") # Directory where simulated data is saved

Read in simulated data

setwd("~/PGRP_mapping_workshop/data/genotypic_and_phenotypic_data/Simulated_phenotypes/")
phenos = fread("Simulated_Data_3_Reps_Herit_0.25_0.75.txt")
colnames(phenos) = c("Genotype","Color_low","Color_high","rep") #Give column names

## Change the scale to something more biologically meaningful like wavelength (redness color in this wavelength range)
phenos$Color_low = phenos$Color_low+620
phenos$Color_high = phenos$Color_high+620

Write the dataframe of simulated data

setwd("~/PGRP_mapping_workshop/data/genotypic_and_phenotypic_data/")
fwrite(phenos,"Seed_Color_Simulated.txt",sep="\t", quote = FALSE,col.names = TRUE)

Take a look at the simulated QTL

setwd("~/PGRP_mapping_workshop/data/genotypic_and_phenotypic_data/Simulated_phenotypes/")
selected_qtns = fread("Additive_selected_QTNs.txt")
head(selected_qtns[,1:7]) # These are the simulated QTLs

Here are the same simulated QTL in a table format and some additional information

References

Dowle, Matt, and Arun Srinivasan. 2021. Data.table: Extension of ‘Data.frame‘. https://CRAN.R-project.org/package=data.table.
Wickham, Hadley. 2021. Tidyverse: Easily Install and Load the Tidyverse. https://CRAN.R-project.org/package=tidyverse.