6.4 R Exercise Week 3

Task: Drop offspring (seeds) from the dataset pulsatilla_genotypes.csv, test for Hardy-Weinberg equilibrium (HWE) by site and locus, and calculate expected heterozygosity (Hexp) for each site.

Hints:

Load packages: Make sure the packages gstudio, dplyr and adegenet are loaded.
Import data: Re-use your code from the Week 1 exercise to import the dataset pulsatilla_genotypes.csv into gstudio.
Count genotyped individuals: Determine the number of rows (and thus genotyped individuals). The dataset contains adults (OffID == 0) and genotyped seeds (OffID != 0). Determine the number of adults in the dataset, either by subsetting with square brackets [ ] or in a pipe using the function filter from the dplyr package, followed by nrow().
Drop offspring from dataset: Subset the data to retain only the adults, and call it Pulsatilla.adults. Again, you can achieve this either by indexing with square brackets, or by using the function filter from the dplyr package. Check the number of rows (adults).
Split dataset by site: Use function split to split the data by site (population) and create an object Adults.by.site. Determine the length of the resulting list, i.e., the number of sub-datasets, one for each site.
Count adults per site with sapply: Use sapply to calculate the number of rows (and thus genotyped individuals) per site (population). What is the range of sample sizes for adults?
Convert to genind object: Adapt your code from the Week 1 exercise to convert the dataset with all adults, Pulsatilla.adults, to a genind object. Print the object to check that the data have been correctly imported. Is the number of rows equal to the number of adults that you found above?
Check polymorphism: Use function summary (section 2.b) to check whether markers are polymorphic: what is the range of expected heterozygosity among the loci?
Test for HWE by site and locus: Adapt the code from section 2.c to test for deviations from HWE by site and locus (using a chi-square or Monte Carlo test). How many tests were significant (p-value < 0.05)? Is there a systematic pattern of deviations for a specific locus, or for a specific site?
Calculate Hexp and Hobs by site: Adapt the code from section 3.b to calculate Hexp and Hobs by site and locus, then take the mean across all loci (Hexp.pop, Hobs.pop) and combine the two into a dataframe H.pop. Include the population name as a variable.
Save result as R object: Save the object H.pop as an R object using the following code:
saveRDS(H.pop, file = paste0(here::here(), "/output/H.pop.rds"))
We will need it for a later R exercise.
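
The data-handling steps in the hints can be tried out first on a small made-up table. The sketch below is not the course solution: the data frame toy, the column names Population, loc1 and loc2, the "allele:allele" genotype coding, and the helper functions hexp and hobs are all assumptions for illustration, and it uses base R only rather than gstudio or adegenet. Expected heterozygosity is computed in its uncorrected form, 1 minus the sum of squared allele frequencies; some estimators add a sample-size correction, so values from a package may differ slightly.

```r
# Toy stand-in for the imported data; all names and values are invented.
# Genotypes are coded as "allele:allele" strings purely for illustration.
toy <- data.frame(
  ID         = 1:8,
  OffID      = c(0, 0, 0, 0, 0, 0, 1, 2),   # 0 = adult, otherwise a seed
  Population = c("A", "A", "A", "B", "B", "B", "A", "B"),
  loc1       = c("1:1", "1:2", "2:2", "1:1", "1:1", "1:1", "1:2", "1:1"),
  loc2       = c("3:4", "3:4", "3:3", "3:4", "4:4", "3:4", "3:3", "4:4"),
  stringsAsFactors = FALSE
)

# Keep adults only, using square-bracket indexing
toy.adults <- toy[toy$OffID == 0, ]
nrow(toy.adults)

# Split by site, then count rows (individuals) per site
toy.by.site <- split(toy.adults, toy.adults$Population)
length(toy.by.site)
n.per.site <- sapply(toy.by.site, nrow)
range(n.per.site)

# Expected heterozygosity at one locus: 1 - sum(p_i^2), no small-sample correction
hexp <- function(genotypes) {
  alleles <- unlist(strsplit(genotypes, ":", fixed = TRUE))
  p <- table(alleles) / length(alleles)
  1 - sum(p^2)
}

# Observed heterozygosity at one locus: proportion of heterozygous individuals
hobs <- function(genotypes) {
  pairs <- strsplit(genotypes, ":", fixed = TRUE)
  mean(sapply(pairs, function(a) a[1] != a[2]))
}

# Site-by-locus matrices (rows = sites, columns = loci)
loci <- c("loc1", "loc2")
Hexp <- t(sapply(toy.by.site, function(d) sapply(d[loci], hexp)))
Hobs <- t(sapply(toy.by.site, function(d) sapply(d[loci], hobs)))

# Mean across loci, combined with the site name
H.toy <- data.frame(Population = rownames(Hexp),
                    Hexp.pop   = rowMeans(Hexp),
                    Hobs.pop   = rowMeans(Hobs))
H.toy
```

The same pattern (split, then sapply over the list, then transpose) is what makes the by-site results line up as one row per site, which is the shape needed for H.pop.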

Question: Which site had the lowest expected heterozygosity?
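
One way to read the answer off a table shaped like H.pop is sketched below. The data frame H.example and its site names and values are invented for illustration and say nothing about the Pulsatilla results.

```r
# Hypothetical table with the same columns as H.pop; values are invented
H.example <- data.frame(
  Population = c("S1", "S2", "S3"),
  Hexp.pop   = c(0.71, 0.64, 0.69),
  Hobs.pop   = c(0.68, 0.66, 0.70)
)

# Row with the smallest mean expected heterozygosity
lowest <- H.example[which.min(H.example$Hexp.pop), ]
lowest

# Alternatively, sort the whole table from lowest to highest Hexp
H.example[order(H.example$Hexp.pop), ]
```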