11.4 R Exercise Week 8

Task: Carry out a permutation test for the Mantel rank correlation to test for fine-scale spatial genetic structure in Pulsatilla vulgaris adults, using the pooled data from all seven patches. Use a one-sided alternative “greater”, as we expect the Mantel rank correlation to be positive.

Hints:

Exclude all pairs that involve individuals from different patches.
Permute individuals only within patches, not between patches.
Calculate the Mantel rank correlation for the observed data (M.rho.obs)
For each of R = 499 permutations, calculate M.rho.sim
Determine the approximate p-value of the one-sided test with alternative “greater” as the percentage of the 500 values (1 observed, 499 permuted) that are greater or equal to the observed one.

Load packages: You may want to load the packages dplyr and gstudio. Alternatively, you can use :: to call functions from packages.
Import data, extract adults. Use the code below to import the data into gstudio and extract adults (OffID == 0).

library(dplyr)
Pulsatilla.gstudio <- gstudio::read_population(path=system.file("extdata",
                            "pulsatilla_genotypes.csv", 
                            package = "LandGenCourse"), 
                    type="column", locus.columns=c(6:19), 
                    phased=FALSE, sep=",", header=TRUE)
Adults.gstudio <- Pulsatilla.gstudio %>% filter(OffID == 0)

Sort individuals by patch. Create a new ID variable Names that combines the existing variables Population and ID (starting with population). Then use the function dplyr::arrange to sort Adults.gstudio by Names (check the help file for arrange). This is important here for two reasons:
- In order to efficiently permute individuals within patches, they need to be sorted by patch.
- In some cases, the function gstudio::genetic_distance will sort the distance matrix alphabetically. To avoid mismatches between the data and the resulting distance matrix, it is best to sort the data alphabetically already.
Calculate Euclidean distances: Use the metric coordinates in variables “X” and “Y” to calculate Euclidean distances (see Week 5, section 4). Here it is important to store the distances Dgeo in the full matrix format, with as.matrix.
Calculate genetic distances (Dps): Use the following code (as needed (adapt if needed) to calculate individual-level genetic distances (proportion of shared alleles) and store them in the full matrix format. We subtract values from 1 to obtain a distance measure. Note: the calculation in gstudio is based on Bray distance and the resulting values are proportional to those calculated by adegenet::propShared. I.e., the two measures have a correlation of 1 but the actual values differ.

Dgen <- 1 - as.matrix(gstudio::genetic_distance(Adults.gstudio, stratum="Names", mode="dps"))
Plot distances, calculate M.rho: Create a plot of Dgen vs Dgeo, and calculate the Mantel rank correlation. Note: the function ‘cor’ with default settings does not allow missing values (NA). This can be changed e.g. with the argument use.
- Use as.dist to access only the values from the lower triangle of each matrix.
- The function plot will do. Inspect the plot. Where would you find the pairs of individuals within the same patch?
- Use the function cor with method = "spearman" to calculate the Mantel rank correlation. Allow for missing values with the argument use = "complete.obs".
Limit to within-patch pairs: Restrict the analysis to pairs within the same patch. For this, we want to set all between-site comparisons in Dgeo to NA. Uncomment the code below to:
- Create a matrix that is TRUE if the two individuals are in the same patch, and FALSE if not (first line)
- Change FALSE to NA (second line)
- Multiply Dgeo by this matrix to set distances between patches to NA (third line).
- Adapt your code from above to plot the distances Dgen vs. Dgeo.within.
- Calculate the Mantel rank correlation between Dgen vs. Dgeo.within and store it as Cor.obs.

#SamePatch <- outer(Adults.gstudio$Population, Adults.gstudio$Population, FUN = "==")
#SamePatch[SamePatch == "FALSE"] <- NA
#Dgeo.within <- SamePatch * Dgeo

Note: check the help file or run the following examples to figure out what outer does: outer(c(1:5), c(1:5), FUN = "*") outer(c(1:5), c(1:5), FUN = "-") outer(LETTERS[1:5], c(1:5), FUN = "paste0")

Unrestricted permutation test: Create a variable Order with row numbers (from 1 to the number of individuals in Adults.gstudio). Then, uncomment the code below (adapt as needed) to carry out a permutation test by permuting the order of individuals, R = 499 times. Notes to permute a distance matrix, we need to permute the rows and columns of the full distance matrix simultaneously with the same order: Dgen[a,a]. We only need to permute one of the two matrices (Dgen or Dgeo.within), but not both. The approximate p-value is calculated as the proportion of values (R simulated ones and 1 observed Cor.obs) that were as large, or larger, than Cor.obs.

#R = 499
#Cor.perm.unrestricted <- rep(NA, R)
#for(r in 1:R)
#{
#  a <- sample(Order)
#  Cor.perm.unrestricted[r] <- cor(as.dist(Dgen[a,a]),as.dist(Dgeo.within), method="spearman", use="complete.obs")
#}
#approx.p.unrestricted <- mean(c(Cor.obs, Cor.perm.unrestricted) >= Cor.obs)
#approx.p.unrestricted

Restricted permutation test: Adapt the code to permute individuals only within patches. For this, split ‘Order’ by population, randomize row numbers within groups (list elements = populations) with sapply, and use unlist to convert to a vector again. Make sure to change object names from ‘unrestricted’ to ‘restricted’.
- Before the for loop, add this code: b <- split(Order, Adults.gstudio$Population)
- Inside the for loop, replace the calculation of a : a <- unlist(sapply(b, sample))

Compare results: Create side-by-side boxplots of the simulated Mantel rank correlation values for the unrestricted and the restricted permutation tests. Note: if none of the simulated values was larger than Cor.obs, the approx. p-value will be 1/(R+1). This indicates the resolution of the permutation test, i.e., the smallest possible p-value given R.

Questions:

Did the direction, size or statistical significance of the observed Mantel rank correlation (as a measure of fine-scale spatial genetic structure in P. vulgaris) change between the unrestricted and the restricted permutation test? Why, or why not?
How did the distributions of the simulated values of the Mantel rank correlation differ between the unrestricted and the restricted test? Can you think of a reason for this?