10.5 Bonus: Si index
Yessica Rico and Helene Wagner
1. Overview of Bonus Materials
The Week 7 Worked Example used patch-level values of an Si index as predictors. Here we show how these were calculated from distance matrices. The main steps are:
- Import ecological distance matrices
- Optimize the scaling parameter alpha
- Calculate Hanski’s index Si with source patch parameters
The following data will be imported:
- dModels.rds: a list of 5 distance matrices (stored as
dist
objects) among all 106 calcareous grassland patches in the study area. Each distance matrix represents a biological hypothesis about functional connectivity of D. carthusianorum (see below). - PATCH_XY_Dianthus.csv: data frame with patch attributes for all 106 calcareous grassland patches in the study area. We will use the following attributes:
- Dc.89, Dc.09: binary variables indicating whether or not D. carthusianorum was recorded in the patch in the first survey in 1989, or the second survey in 2009, respectively.
- Ha: patch area in ha.
- pop09: population size in 2009 (categorical, 4 levels, see below).
2. Import ecological distance matrices
Each distance matrix represents one hypothesis about gene flow in this system (see Week 7 video).
- Eu: Euclidean (geographic) distance, representing IBD
- Shecte: Connectivity by shepherding among consistently grazed patches only, distance-dependent: this counts the number of patches that sheep traverse to get from patch A to patch B.
- Sheint: Connectivity by shepherding, among consistently or intermittently grazed patches, distance-dependent: again, this counts the number of patches that sheep traverse to get from patch A to patch B.
- Shenu: Connectivity by shepherding, no distance effect. This considers only whether or not two patches are part of the same grazing system.
- Forest: This model is similar to Eu but considers forest as a barrier.
Here, we import these distance matrices as a list of dist
objects and check their names and dimensions: each has 106 rows and columns.
dModels <- readRDS(system.file("extdata", "dModels.rds",
package = "LandGenCourse"))
lapply(dModels, function(ls) dim(as.matrix(ls)))
## $Eu
## [1] 106 106
##
## $Shecte
## [1] 106 106
##
## $Sheint
## [1] 106 106
##
## $Shenu
## [1] 106 106
##
## $Forest
## [1] 106 106
Note: the distance matrices contain 106 patches. All of these are considered as source populations. However, we will only calculate Si indices for the 65 patches included in the Dianthus dataset.
3. Optimize the scaling parameter alpha
In order to calculate Hanski’s index, we first need to optimize the value of alpha for each distance model, using presence-absence data for the two time steps of 1989 and 2009.
Import the patch-level data for all 106 patches in the study area.
Patches <- read.csv(system.file("extdata", "PATCH_XY_Dianthus.csv",
package = "LandGenCourse"),
header=TRUE, row.names=1)
tibble::as_tibble(Patches)
## # A tibble: 106 × 20
## patch id Elements2008 Elements2008.4 cat type Ha area x
## <chr> <int> <int> <int> <chr> <chr> <dbl> <dbl> <dbl>
## 1 A01 1 1 1 3 A 0.136 0.0014 4431470.
## 2 A02 2 2 2 4 B 0.182 0.0018 4431614.
## 3 A03 3 3 3 4 B 0.206 0.0021 4431320.
## 4 A04 4 0 0 4 B 0.142 0.0014 4431246.
## 5 A05 5 2 2 4 B 0.134 0.0013 4431037.
## 6 A06 6 2 2 4 B 0.201 0.002 4431027.
## 7 A07 7 0 0 3 N 0.0213 0.0002 4431284.
## 8 A07a 8 1 1 4 A 0.200 0.002 4431362.
## 9 A08 9 3 2 5 F 0.464 0.0046 4428814.
## 10 A09 10 4 3 3 F 0.0704 0.0007 4428809.
## # ℹ 96 more rows
## # ℹ 11 more variables: y <dbl>, size.cat <int>, sizeCat <chr>, size.cat2 <chr>,
## # pj08 <dbl>, pj89 <dbl>, pop09 <int>, grazing <int>, Dc.89 <int>,
## # Dc.09 <int>, Sampled <int>
Check that the patch IDs match between Patches
and the distance matrices. Here we tabulate the number of rows for which the patch names match perfectly. This is the case for all 106 rows, which is great.
##
## FALSE TRUE
## 105 1
The function ‘get.alphafit’ optimizes alpha for each model and stores the values in table ’table.alpha.
- First, we create an empty matrix (with a single column and five rows, one per model) to hold the optimized alpha value for each model (i.e., for each distance matrix).
- We define the list of values of alpha that should be considered. Here we will use
nseq=100
values between 0.1 and 2.5. - Then we define the function get.alphafit that will take four arguments:
alpha
: scaling parameter of the exponential functiond
: distance matrixpj
: whether or not the species was recorded in a patch in the second survey.Op
: number of surveys (out of two) that the species was recorded in the patch.
table.alpha <- matrix(NA, nrow=5)
dimnames(table.alpha) <- list(models=(names(dModels)))
nseq = 100
alpha = seq(0.1,2.5, length = nseq)
get.alphafit <- function(alpha, d, pj, Op)
{
expo<-exp(-alpha* d )
diag(expo)<-0
matr<-sweep(expo,2, pj, "*")
Si <-rowSums(sweep(matr, 2, Op/2, "*"), na.rm=TRUE)
mod<- glm(cbind(Op,2 -Op) ~ Si, family=binomial)
deviance(mod)
}
Apply the function to optimize alpha.
pj <- Patches$Dc.09
Op <- (Patches$Dc.89 + Patches$Dc.09)
for(m in 1:length(dModels))
{
table.alpha[m] <- (optimize(get.alphafit, interval=alpha,
d=as.matrix(dModels[[m]]), pj, Op)$minimum)
}
table.alpha
##
## models [,1]
## Eu 2.1708021
## Shecte 0.6195857
## Sheint 1.2055182
## Shenu 2.3661953
## Forest 2.4999322
These are the optimized scaling parameters alpha of the distance term (related to dispersal ability) in the incidence function \(S_{i}\), where \(p_{j}\) is 1 if the species is present in source population j and 0 if it is absent, and \(d_{ij}\) is the pairwise distance between source patch j and focal patch i:
\(S_{i}\) = \(\sum_{j\neq i} exp(-\alpha d_{ij})p_{j}\)
4. Calculate Hanski’s index Si with source patch parameters
We will consider there alternative source patch parameters
- \(p_{j}\): whether or not the species was observed in 2009.
- \(A_{j}\): area of the patch in ha.
- \(N_{j}\): indicator of population size. This was recorded in the field in four ordinal categories:
- 1: <4 (too few to treat as population)
- 2: 4-40 (small enough for complete sampling)
- 3: 41-100 (small enough to count)
- 4: >100 (very large)
Note that here, \(N_{j}\) is treated as a numeric variable. While this is an over-interpretation, though preliminary analyses showed that this linearized the relationships between variables.
Compile three alternative source patch parameters for each patch. Note that we set the area Aj to zero of the species was not recorded in the patch. This is done by multiplication with pj.
Prepare an empty table Si to hold the connectivity indices \(S_{i}\), one for each combination of focal patch i (rows), distance model \(d_{ij}\) and source patch parameter (15 columns: 5 distances x 3 parameters).
Si <- data.frame(matrix(NA,nrow(Patches),ncol=15))
dimnames(Si) <- list(row.names(Patches),
paste(rep(names(dModels), rep(3,5)),
rep(c("pj", "Aj", "Nj"),5), sep="_"))
Define function get.Si
with three arguments:
alpha
: the optimized alpha value from aboved
: the distance matrixAp
: the source patch parameter to be used.
get.Si <- function(alpha, d, Ap)
{
expo<-exp(-alpha*d)
diag(expo)<-0
matr<-sweep(expo,2, Ap, "*")
S <- rowSums(sweep(matr, 2, Op/2, "*"), na.rm=TRUE)
}
Apply function get.Si
to calculate Si
values. sb
indicates which column of Ap
to use as source patch parameter.
sb <- rep(1:3,5)
for (n in 1:ncol(Si))
{
Si[,n] <- get.Si(alpha=table.alpha[mod[n]],
d=as.matrix(dModels[[mod[n]]]),
Ap=Source[,sb[n]])
}
Table with results: values of Si. Column names are a combination of distance matrix and source patch parameter.
## # A tibble: 106 × 15
## Eu_pj Eu_Aj Eu_Nj Shecte_pj Shecte_Aj Shecte_Nj Sheint_pj Sheint_Aj Sheint_Nj
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.636 1.21 1.76 8.58e-26 1.42e-25 2.47e-25 3.07e-51 5.07e-51 8.83e-51
## 2 0.626 0.616 1.58 8.52e-26 1.42e-25 2.46e-25 3.05e-51 5.06e-51 8.79e-51
## 3 0.943 0.638 2.32 8.52e-26 1.42e-25 2.46e-25 6.20e- 1 2.27e+ 0 2.05e+ 0
## 4 1.27 0.847 3.30 8.52e-26 1.42e-25 2.46e-25 5.02e- 1 7.71e- 1 1.30e+ 0
## 5 1.50 1.13 4.10 1.26e+ 0 3.41e+ 0 4.21e+ 0 5.04e- 1 4.09e- 1 1.29e+ 0
## 6 1.67 1.44 4.91 2.13e+ 0 6.27e+ 0 7.40e+ 0 7.42e- 1 1.19e+ 0 2.43e+ 0
## 7 2.12 1.98 6.50 8.58e-26 1.42e-25 2.47e-25 3.07e-51 5.07e-51 8.83e-51
## 8 1.21 1.62 3.82 2.00e+ 0 6.83e+ 0 7.11e+ 0 8.76e- 1 1.60e+ 0 2.87e+ 0
## 9 1.60 2.99 4.41 8.52e-26 1.41e-25 2.46e-25 1.00e+ 0 2.56e+ 0 3.32e+ 0
## 10 1.82 2.14 4.38 8.52e-26 1.42e-25 2.46e-25 3.28e- 1 3.53e- 1 7.79e- 1
## # ℹ 96 more rows
## # ℹ 6 more variables: Shenu_pj <dbl>, Shenu_Aj <dbl>, Shenu_Nj <dbl>,
## # Forest_pj <dbl>, Forest_Aj <dbl>, Forest_Nj <dbl>
Note: the dataset Dianthus
used in the Week 7 Worked Example only contains Si values for 65 patches with genetic data (i.e., the patches where D. carthusianorum was observed, and sampled, during the second survey 2009).