## R Exercise Week 7

The *Pulsatilla vulgaris* dataset that we’ve been analyzing in the R exercises has two variables that were observed or calculated for each sampled mother plant (i.e., those plants from which seeds were collected; see DiLeo et al. 2017, Journal of Ecology):

`flower.density`

: he number of flowers within 2 m of the mother plant. A radius of 2 m around mother plants was chosen as it gave the strongest correlation with selfing rates and pollination distances compared to lower (1 m) and higher (3 m) tested values.`mom.isolation`

: the mean distance of the mother plant to all other plants within the population (mean neighbour distance).

We would expect the following:

- A negative relationship, where floral density within 2 m of isolated mother plants is low.
- Likely (right-)skewed distributions for both variables.
- Positive spatial autocorrelation of both variables within each patch.
- Values within each patch likely more similar than between patches.

Consider how the points listed above may violate different assumptions of a linear regression model fitted with least squares (`lm`

). How can we test and account for these potential violations and fit a valid model?

**Task:** Test the regression of flower density on the isolation of the sampled mother plants of *Pulsatilla vulgaris*. Account for the sampling of multiple mothers from each of seven patches, and for residual spatial autocorrelation (if statistically significant).

**Hints:**

**Load packages**: You may want to load the packages`dplyr`

,`ggplot2`

and ‘nlme’. Alternatively, you can use`::`

to call functions from packages.**Import data, add spatial coordinates**. Use the code below to import the data, extract moms, add spatial coordinates, and remove replicate flowers sampled from the same mother.

```
library(dplyr)
# Dataset with variables 'flower.density' and 'mom.isolation' for each mom:
Moms <- read.csv(system.file("extdata",
"pulsatilla_momVariables.csv",
package = "LandGenCourse"))
# Dataset with spatial coordinates of individuals:
Pulsatilla <- read.csv(system.file("extdata",
"pulsatilla_genotypes.csv",
package = "LandGenCourse"))
Adults <- Pulsatilla %>% filter(OffID == 0)
# Combine data
Moms <- left_join(Moms, Adults[,1:5])
# Remove replicate flowers sampled from the same mother
Moms <- Moms %>% filter(OffID == 0)
```

**Explore data**. How many mother plants are there in total, and per population? Are the distributions of`flower.density`

and of`mom.isolation`

skewed?**Create scatterplots**: Use ggplot to create a scatterplot of`flower.density`

(y) against`mom.isolation`

(x). Modify the plot with`coord_trans`

to apply a log-transformation to each axis. Will this make the relationship more linear? - Note: using`geom_smooth`

together with`coord_trans`

can create problems. You may adapt the following code to plot two ggplot-type plots side-by-side:`gridExtra::grid.arrange(myPlot1, myPlot2, nrow = 1)`

**Scatterplot with line**: Instead of using coord_trans, create a scatterplot with log-transformed variable (log(y) vs. log(x)). Add a regression line.**Fit non-spatial models**: Adapt code from section 4 to fit two models with the response`log(flower.density)`

and the fixed factor`log(mom.isolation)`

, using funcions from package`nmle`

:- Basic model: you can fit a simple model by adapting this code:
`nlme::gls(Response ~ FixedEffect, data=Data, method=REML)`

- Random effect: add a random effect for Population, use nlme::lme instead of gls.
`nlme::lme(Response ~ FixedEffect, random = ~ 1| RandomEffect, data=Data, method=REML)`

- For now, omit the
`correlation`

term (no spatial correlation structure).

- Basic model: you can fit a simple model by adapting this code:
**Plot residual variograms**: Plot variograms for the two models. Inspect the x-axes of the plots. What is the effect of including the random effect`Population`

on the fitting of the variogram? Recall the sampling design. Was there spatial autocorrelation within populations? Hints:- Check the variable names in
`Moms`

to adapt the names of the x and y coordinates as needed in the term`form= ~ xcoord + ycoord`

. - Print each variogram object to check the number of pairs per lag. Ideally, this should be around 100.
- These variogram plots are trellis plots. Fortunately, you can again use
`gridExtra::grid.arrange`

to plot them side by side. Write each plot into an object first.

- Check the variable names in
**Add correlation structure**: Add a term of the type`correlation = nlme::corExp(form = ~ x + y, nugget=T)`

to the mixed model fitted with REML. Adapt code from section 4.b to evaluate different variogram functions (exponential, spherical, Gaussian, ratio) and use AIC (with REML) to choose the best-fitting variogram model.**Check residual plots**: For the best model (fitted with REML) and check the residuals. Plot a variogram of the residuals, and the fitted variogram.**Test fixed effect**: Refit the best model with maximum likelihood to test the fixed effect with`car::Anova`

. Give the model a new name to keep them apart and avoid overwriting.**Determine the marginal R-squared**. For the best model (fitted with REML), use`MuMIn::r.squaredGLMM`

to determine the marginal and conditional R-squared.

**Questions:** Justify your answers to the following questions:

- Did you find a statistically significant, negative relationship between local floral density and mom isolation? If so, how strong was it?
- Was it necessary to account for skewness in both variables?
- Was it necessary to account for spatial autocorrelation?
- Was it necessary to account for population?