6.4 R Exercise Week 3
Task: Drop offspring (seeds) from dataset pulsatilla_genotypes.csv, check for HWE by site and locus and calculate Hexp for each site.
Hints:
- Load packages: Make sure the packages gstudio, dplyr and adegenet are loaded.
- Import data: Re-use your code from Week 1 exercise to import the dataset pulsatilla_genotypes.csv into gstudio.
- Count genotyped individuals: Determine the number of rows (and thus genotyped individuals). The dataset contains adults (OffID == 0) and genotyped seeds (OffID != 0). Determine the number of adults in the dataset. You can achieve this either by subsetting with square brackets [ ], or as a pipe using the function filter from the dplyr package, followed by nrow().
- Drop offspring from dataset: Subset the data to retain only the adults, and call it Pulsatilla.adults. Again, you can achieve this either by indexing with square brackets, or by using the function filter from the dplyr package. Check the number of rows (adults).
- Split dataset by site: Use function split to split the data by site (population) and create an object Adults.by.site. Determine the length of the resulting list, i.e., the number of sub-datasets, one for each site.
- Count adults per site with sapply: Use sapply to calculate the number of rows (and thus genotyped individuals) per site (population). What is the range of sample sizes for adults?
- Convert to genind object: Adapt your code from Week 1 exercise to convert the dataset with all adults, Pulsatilla.adults, to a genind object. Print the object to check that the data have been correctly imported. Is the number of rows equal to the number of adults that you found above?
- Check polymorphism: Use function summary (section 2.b) to check whether markers are polymorphic: what is the range of expected heterozygosity among the loci?
- Test for HWE by site and locus: Adapt the code from section 2.c to test for HWE deviations by site and locus (using chi-square or Monte-Carlo test). How many tests were significant (p-value < 0.05)? Is there a systematic pattern of deviations for a specific locus, or for a specific site?
- Calculate Hexp and Hobs by site: Adapt code from section 3.b to calculate Hexp and Hobs by site and locus, then take the mean across all loci (Hexp.pop, Hobs.pop) and combine them into a dataframe H.pop. Include the population name as a variable.
- Save result as R object: Save the object H.pop as an R object using the following code:
saveRDS(H.pop, file = paste0(here::here(), "/output/H.pop.rds"))
We will need it for a later R exercise.
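The subsetting, splitting and counting hints can be tried out on a small made-up data frame before applying them to the real dataset. The sketch below uses base R only; the object names (toy, toy.adults, toy.by.site), the values, and the column name Population are assumptions for illustration, and only OffID is taken from the hints above.

```r
# Toy stand-in for the imported data. All values are made up, and the
# column name `Population` is an assumption about the real dataset.
toy <- data.frame(
  ID         = 1:8,
  Population = c("A", "A", "A", "B", "B", "C", "C", "C"),
  OffID      = c(0, 0, 1, 0, 2, 0, 0, 0)
)

# Number of genotyped individuals (rows), adults and seeds together
n.all <- nrow(toy)

# Retain adults only (OffID == 0) by indexing with square brackets.
# A dplyr equivalent would be: dplyr::filter(toy, OffID == 0)
toy.adults <- toy[toy$OffID == 0, ]
n.adults <- nrow(toy.adults)

# Split into a list of data frames, one per site
toy.by.site <- split(toy.adults, toy.adults$Population)
n.sites <- length(toy.by.site)

# Number of rows (adults) per site, and the range of sample sizes
n.per.site <- sapply(toy.by.site, nrow)
size.range <- range(n.per.site)
```

With the real data, the same pattern would apply to the imported data frame, with the site column replaced by whatever it is called in pulsatilla_genotypes.csv.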
Question: Which site had the lowest expected heterozygosity?
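One way to answer this kind of question, once site-by-locus heterozygosity values are available, is to average across loci and look up the minimum. The following is a minimal sketch with made-up numbers; the matrices Hexp.toy and Hobs.toy and the data frame H.toy are hypothetical stand-ins, not results from the Pulsatilla data, and the code from section 3.b may organize its output differently.

```r
# Hypothetical heterozygosity values: rows = sites, columns = loci.
# The numbers are invented for illustration only.
Hexp.toy <- matrix(c(0.60, 0.70, 0.50,
                     0.40, 0.55, 0.45,
                     0.65, 0.75, 0.70),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(c("A", "B", "C"),
                                   c("loc1", "loc2", "loc3")))
Hobs.toy <- matrix(c(0.55, 0.65, 0.45,
                     0.35, 0.50, 0.50,
                     0.60, 0.80, 0.65),
                   nrow = 3, byrow = TRUE,
                   dimnames = dimnames(Hexp.toy))

# Mean across loci for each site
Hexp.pop.toy <- rowMeans(Hexp.toy)
Hobs.pop.toy <- rowMeans(Hobs.toy)

# Combine into one data frame, with the site name as a variable
H.toy <- data.frame(Population = names(Hexp.pop.toy),
                    Hexp = Hexp.pop.toy,
                    Hobs = Hobs.pop.toy)

# Site with the lowest mean expected heterozygosity
lowest.site <- H.toy$Population[which.min(H.toy$Hexp)]
```

Storing the site name as a column, rather than relying on row names alone, keeps it available after the object is saved and later merged with other site-level data.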