R Exercise Week 6
Task: Test whether observed heterozygosity of Pulsatilla vulgaris adults depends on census population size. Fit a model at the individual level where you include a random effect for population.
Hints:
Load packages:
- Please install the package inbreedR (to calculate individual measures of heterozygosity) from CRAN, if it is not yet installed.
- You may want to load the packages dplyr and ggplot2. Alternatively, you can use :: to call functions from packages.
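The setup step could look roughly like the sketch below. It assumes a CRAN mirror is configured; the requireNamespace check is just one common way to install a package only when it is missing.

```r
# Install inbreedR from CRAN only if it is not yet available.
if (!requireNamespace("inbreedR", quietly = TRUE)) {
  install.packages("inbreedR")
}

# Option 1: attach the packages so their functions can be called directly.
library(dplyr)
library(ggplot2)

# Option 2: skip library() and prefix each call with the package name,
# e.g. dplyr::filter() or ggplot2::ggplot().
```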
Import data and extract adults:
- Use the code below to import the data.
- Use dplyr::filter to extract adults with OffID == 0.
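As a rough illustration of the filtering step, the sketch below uses a small made-up data frame in place of the imported dataset. Only the column OffID comes from the exercise; the columns ID and Population and all values are hypothetical.

```r
# Toy data standing in for the imported dataset (values are made up).
# Only OffID comes from the exercise text; ID and Population are hypothetical.
toy <- data.frame(
  ID = c("A1", "A2", "A1_s1", "A3"),
  Population = c("P1", "P1", "P1", "P2"),
  OffID = c(0, 0, 1, 0)
)

# Keep only adults, i.e. rows where OffID equals 0.
adults_toy <- dplyr::filter(toy, OffID == 0)
nrow(adults_toy)
```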
Calculate multilocus heterozygosity: Use package inbreedR to calculate multilocus heterozygosity for each adult.
- Use the function inbreedR::convert_raw(x), where x is the matrix of genotypes only (no ID or other non-genetic data), with two columns per locus. Check the help file of the function convert_raw.
- Use the function inbreedR::MLH to calculate observed heterozygosity for each individual.
- Add the result as a variable het to the adults dataset.
Example code from the inbreedR::MLH help file:
    data(mouse_msats)
    genotypes <- convert_raw(mouse_msats)
    het <- MLH(genotypes)
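The three steps above can be sketched on a toy table as follows. All column names and allele sizes here are invented for illustration; in the real dataset you need to work out which columns hold the genotypes.

```r
# Toy adults table: two metadata columns, then two loci with two columns each.
# All names and allele sizes are made up for illustration.
adults_toy <- data.frame(
  ID = c("A1", "A2", "A3"),
  Population = c("P1", "P1", "P2"),
  loc1_a = c(100, 100, 102),
  loc1_b = c(100, 104, 106),
  loc2_a = c(200, 200, 206),
  loc2_b = c(200, 200, 208)
)

# Keep the genotype columns only (here: everything after the first two).
geno_raw <- adults_toy[, -(1:2)]

# Recode to one column per locus: 1 = heterozygote, 0 = homozygote, NA = missing.
geno <- inbreedR::convert_raw(geno_raw)

# Proportion of heterozygous loci per individual, stored as variable 'het'.
adults_toy$het <- inbreedR::MLH(geno)
adults_toy[, c("ID", "het")]
```

Selecting genotype columns by position is fragile. If the real dataset has recognisable locus names, selecting by name may be safer.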
Add population-level data:
- Import the file “pulsatilla_population.csv” with the code below.
- Do a left-join to add the data to your adults dataset.
- Check the dataset.
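A left join keeps every adult and attaches the matching population row. The sketch below uses two toy tables; the key column name Population is an assumption and the values are made up, so check which column identifies populations in both of your datasets.

```r
# Toy individual-level table (hypothetical names and values).
adults_toy <- data.frame(
  ID = c("A1", "A2", "A3"),
  Population = c("P1", "P1", "P2"),
  het = c(0.4, 0.6, 0.5)
)

# Toy population-level table; population.size is the predictor named in the exercise.
pops_toy <- data.frame(
  Population = c("P1", "P2", "P3"),
  population.size = c(120, 45, 800)
)

# Keep all rows of adults_toy and add the matching population-level columns.
adults_joined <- dplyr::left_join(adults_toy, pops_toy, by = "Population")

# Check the result: same number of rows as adults_toy, one extra column.
str(adults_joined)
```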
Scatterplot with regression line: Use ggplot2 to create a scatterplot of adult heterozygosity against census population size (population.size), with a regression line.
Fit linear mixed model: Adapt code from section 3.c to perform a regression of individual-level observed heterozygosity (response variable) on census population size (predictor), including population as a random effect. Fit the model with REML and print a summary.
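The plot and the model could be set up along the lines of the sketch below, which runs on simulated data, not the Pulsatilla dataset. It assumes that the mixed model is fitted with lme4::lmer, which may differ from the function used in section 3.c; the column name Population and all simulated values are hypothetical.

```r
set.seed(1)

# Simulated stand-in: 8 populations with 10 adults each (all values made up).
pops <- data.frame(
  Population = paste0("P", 1:8),
  population.size = c(20, 35, 60, 90, 150, 300, 500, 900)
)
sim <- pops[rep(1:8, each = 10), ]
pop_effect <- rnorm(8, sd = 0.03)
sim$het <- 0.5 + rep(pop_effect, each = 10) + rnorm(80, sd = 0.08)

# Scatterplot of heterozygosity against population size with a linear regression line.
p <- ggplot2::ggplot(sim, ggplot2::aes(x = population.size, y = het)) +
  ggplot2::geom_point() +
  ggplot2::geom_smooth(method = "lm", formula = y ~ x)
print(p)

# Linear mixed model: fixed effect of population size,
# random intercept for population, fitted with REML.
mod <- lme4::lmer(het ~ population.size + (1 | Population),
                  data = sim, REML = TRUE)
summary(mod)
```

Note that the regression line drawn by geom_smooth comes from an ordinary linear model that ignores the grouping by population, so it is a visual aid only, not a display of the mixed model fit.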
Test fixed effect: Adapt code from section 2.f to test the fixed effect with function car::Anova.
Check residual plots: Adapt code from section 2.d to create residual plots.
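As a hedged sketch of these two steps, the code below refits a mixed model on simulated data (assuming lme4::lmer and a hypothetical grouping column Population), tests the fixed effect with car::Anova, and draws two basic residual plots with base graphics. The plots in section 2.d may be produced differently.

```r
set.seed(2)

# Simulated stand-in data: 8 populations with 10 adults each (all values made up).
sim <- data.frame(
  Population = rep(paste0("P", 1:8), each = 10),
  population.size = rep(c(20, 35, 60, 90, 150, 300, 500, 900), each = 10)
)
sim$het <- 0.5 + rep(rnorm(8, sd = 0.03), each = 10) + rnorm(80, sd = 0.08)

mod <- lme4::lmer(het ~ population.size + (1 | Population),
                  data = sim, REML = TRUE)

# Test of the fixed effect; for lmer models car::Anova reports
# a type II Wald chi-square test by default.
aov_tab <- car::Anova(mod)
aov_tab

# Residuals against fitted values: look for trends or unequal spread.
plot(fitted(mod), resid(mod), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normal Q-Q plot of the residuals.
qqnorm(resid(mod))
qqline(resid(mod))
```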
Questions: There is one influential point in the regression analysis:
- What was the direction of the relationship: did heterozygosity increase or decrease with census population size?
- Was the fixed effect statistically significant?
- Was the model valid, or was there a problem with the residual plots?
- What would be the main issue, and what remedy could you suggest?