R Exercise Week 7

The Pulsatilla vulgaris dataset that we’ve been analyzing in the R exercises has two variables that were observed or calculated for each sampled mother plant (i.e., those plants from which seeds were collected; see DiLeo et al. 2017, Journal of Ecology):

  • flower.density: he number of flowers within 2 m of the mother plant. A radius of 2 m around mother plants was chosen as it gave the strongest correlation with selfing rates and pollination distances compared to lower (1 m) and higher (3 m) tested values.
  • mom.isolation: the mean distance of the mother plant to all other plants within the population (mean neighbour distance).

We would expect the following:

  • A negative relationship, where floral density within 2 m of isolated mother plants is low.
  • Likely (right-)skewed distributions for both variables.
  • Positive spatial autocorrelation of both variables within each patch.
  • Values within each patch likely more similar than between patches.

Consider how the points listed above may violate different assumptions of a linear regression model fitted with least squares (lm). How can we test and account for these potential violations and fit a valid model?

Task: Test the regression of flower density on the isolation of the sampled mother plants of Pulsatilla vulgaris. Account for the sampling of multiple mothers from each of seven patches, and for residual spatial autocorrelation (if statistically significant).

Hints:

  1. Load packages: You may want to load the packages dplyr, ggplot2 and ‘nlme’. Alternatively, you can use :: to call functions from packages.

  2. Import data, add spatial coordinates. Use the code below to import the data, extract moms, add spatial coordinates, and remove replicate flowers sampled from the same mother.

library(dplyr)

# Dataset with variables 'flower.density' and 'mom.isolation' for each mom:
Moms <- read.csv(system.file("extdata",
                            "pulsatilla_momVariables.csv", 
                            package = "LandGenCourse"))

# Dataset with spatial coordinates of individuals:
Pulsatilla <- read.csv(system.file("extdata",
                            "pulsatilla_genotypes.csv", 
                            package = "LandGenCourse"))
Adults <- Pulsatilla %>% filter(OffID == 0)

# Combine data
Moms <- left_join(Moms, Adults[,1:5])

# Remove replicate flowers sampled from the same mother
Moms <- Moms %>% filter(OffID == 0)
  1. Explore data. How many mother plants are there in total, and per population? Are the distributions of flower.density and of mom.isolation skewed?

  2. Create scatterplots: Use ggplot to create a scatterplot of flower.density (y) against mom.isolation (x). Modify the plot with coord_trans to apply a log-transformation to each axis. Will this make the relationship more linear? - Note: using geom_smooth together with coord_trans can create problems. You may adapt the following code to plot two ggplot-type plots side-by-side: gridExtra::grid.arrange(myPlot1, myPlot2, nrow = 1)

  3. Scatterplot with line: Instead of using coord_trans, create a scatterplot with log-transformed variable (log(y) vs. log(x)). Add a regression line.

  4. Fit non-spatial models: Adapt code from section 4 to fit two models with the response log(flower.density)and the fixed factor log(mom.isolation), using funcions from package nmle:

    • Basic model: you can fit a simple model by adapting this code: nlme::gls(Response ~ FixedEffect, data=Data, method=REML)
    • Random effect: add a random effect for Population, use nlme::lme instead of gls. nlme::lme(Response ~ FixedEffect, random = ~ 1| RandomEffect, data=Data, method=REML)
    • For now, omit the correlation term (no spatial correlation structure).
  5. Plot residual variograms: Plot variograms for the two models. Inspect the x-axes of the plots. What is the effect of including the random effect Population on the fitting of the variogram? Recall the sampling design. Was there spatial autocorrelation within populations? Hints:

    • Check the variable names in Moms to adapt the names of the x and y coordinates as needed in the term form= ~ xcoord + ycoord.
    • Print each variogram object to check the number of pairs per lag. Ideally, this should be around 100.
    • These variogram plots are trellis plots. Fortunately, you can again use gridExtra::grid.arrange to plot them side by side. Write each plot into an object first.
  6. Add correlation structure: Add a term of the type correlation = nlme::corExp(form = ~ x + y, nugget=T) to the mixed model fitted with REML. Adapt code from section 4.b to evaluate different variogram functions (exponential, spherical, Gaussian, ratio) and use AIC (with REML) to choose the best-fitting variogram model.

  7. Check residual plots: For the best model (fitted with REML) and check the residuals. Plot a variogram of the residuals, and the fitted variogram.

  8. Test fixed effect: Refit the best model with maximum likelihood to test the fixed effect with car::Anova. Give the model a new name to keep them apart and avoid overwriting.

  9. Determine the marginal R-squared. For the best model (fitted with REML), use MuMIn::r.squaredGLMM to determine the marginal and conditional R-squared.

Questions: Justify your answers to the following questions:

  • Did you find a statistically significant, negative relationship between local floral density and mom isolation? If so, how strong was it?
  • Was it necessary to account for skewness in both variables?
  • Was it necessary to account for spatial autocorrelation?
  • Was it necessary to account for population?