Chapter 5 Inferring geographic origins

We will now show how to perform the geographic assignment as such. For this, you need the isoscapes built as shown in section 3 and the isotope data of the assignment samples. If the type of samples you want to assign differ from the type of samples you used to create the isoscapes, then you will also need a calibration fit obtained as shown in section 4.

5.1 Preparing the data

You need at least two pieces of information from the samples that you want to assign: a label and an isotope value. You should provide such information as a data frame with the column names sample_ID and sample_value. You may also add the latitude and longitude of the sampling locations for the assignment samples, if you want that such locations appear on your assignment maps. However, such locations are not considered for the statistical assignment.

Here we are going to assign the following bats:

AssignDataBat
sample_ID lat long sample_value
Nnoc_1 51.72323 13.67630 -86.58500
Nnoc_2 51.72323 13.67630 -93.96700
Nnoc_3 51.72323 13.67630 -88.71600
Nnoc_4 51.69520 14.64945 -84.85700
Nnoc_5 51.70366 14.63915 -104.21900
Nnoc_7 51.69520 14.64945 -85.58300
Nnoc_8 51.70536 14.63289 -89.99500
Nnoc_9 51.70536 14.63289 -92.62700
Nnoc_10 51.89348 13.76488 -85.55300
Nnoc_11 51.84883 13.64525 -87.05000
Nnoc_12 51.60535 11.92700 -89.34916
Nnoc_13 51.96695 12.08462 -103.65232
Nnoc_15 51.96695 12.08462 -121.18926
Nnoc_16 51.45457 11.50689 -98.49742

This dataset is available in IsoriX. It contains hydrogen delta values of fur keratin from common noctule bats (Nyctalus noctula) killed at wind turbines in northern Germany. The question we want to answer is whether those bats are local bats or migratory bats killed during their migration.

5.2 Running the geographic assignments

Running the assignment is fairly straightforward: you must use the function isofind() and provide it with the data frame with the data to assign, the isoscape object (i.e. the object produced by the function isoscape(); see section 3), and (if needed) the calibration object (i.e. the object produced by the function calibfit(); see section 4):

AssignedBats <- isofind(data = AssignDataBat,
                        isoscape = EuropeIsoscape,
                        calibfit = CalibBats)

The output of the function, here stored under the name AssignedBats, is a list of class ISOFIND containing 3 elements: the element sample storing all rasters related to the assignment of each single sample, group storing the rasters related to the group assignment, and sp_points storing the spatial points of the source locations as well as the ones for the calibration samples and the assignment samples (if available). As for the output of the function isoscape(), we choose to store all this information into a single R object to make the plotting of the assignment maps easy peasy.

5.3 How does the assignment work?

Here we explain how the assignment procedure work without going into geeky details. It is important that you understand how it works because it has bearing on how you will interpret the assignment results.

First, isofind() runs the assignment test for each sample at each candidate location or cell from our structural raster. Precisely, for each candidate location the function tests whether the isoscape value at the unknown location of origin of a given sample is consistent with the predicted isoscape value at the candidate location.

To do this, we use the difference between these two isoscape values as test statistics. The isoscape value at the unknown location of origin is either the isoscape value of the assignment sample if no calibration is being supplied (when the source samples and assignment samples are of the same type), or isofind() computes it by inverting the calibration relationship established by the calibration fit (when a calibration is required; see section 4). The predicted isoscape value of the candidate location is directly the point prediction of the isotope value at each prediction location, which is an information given by the main isoscape (see section 3). The test statistics over all locations can be retrieved for each sample from the object produced by isofind(). For example, the raster stored under the name AssignedBats$sample$stat$Nnoc_1 contains such information for the first bat:

AssignedBats$sample$stat$Nnoc_1
## class       : RasterLayer 
## dimensions  : 1283, 2787, 3575721  (nrow, ncol, ncell)
## resolution  : 0.03333333, 0.03333333  (x, y)
## extent      : -31.55847, 61.34153, 28.13319, 70.89986  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 
## data source : in memory
## names       : Nnoc_1 
## values      : -59.13842, 65.70829  (min, max)

We know detail the variance associated to the test statistics. Under the null hypothesis (the isotope value at location of origin = the isotope value at the candidate location), the variance of the test statistic is the variance of the difference between the two isoscape values that defines it. The variance of the test statistics is also stored as a raster (e.g. AssignedBats$sample$stat_var$Nnoc_1).

In the appendix of our book chapter (Courtiol et al. 2019), we show that such variance of the assignment test depends on (1) the prediction variance of the mean fit and is therefore not constant over the entire spatial range (see Appendix). Therefore, a small difference in isoscape values at a location where the prediction variance is very small may argue against an assignment at the corresponding location, while a large difference where the prediction variance is very large may not. The variance of the test also depends on (2) the residual variation among samples at the unknown location of origin. In the current workflow, we estimate such variance from the residual variance of the calibration fit, which we assume constant. Other approaches are possible. In particular, if assignment samples are of the same kind as the source sample (and thus when no calibration model exists), such variance is instead given by the residual dispersion variance of the isoscape. The distribution of the assignment test also depends on (3) the prediction variance of the calibration fit (which decreases with the size of the calibration dataset) given the predicted isoscape value at the candidate location. Finally, the distribution of the assignment test depends on (4) a covariance term between the prediction error of the calibration fit given the predicted isotope value at the candidate location, and the prediction error of the isoscape at this location. We can compute this last term, which relates the uncertainty of the isoscape to the one of the calibration, in IsoriX because the calibration and the geographic assignment are both computed within the same software package. Note however that we have not yet implemented this 4th variance term. We are planning to do so for the version 1.0.

For each candidate location of origin, the p-value of the assignment test is the probability that an assignment sample truly originating from the candidate location has a larger absolute value than the observed test statistic. Even if a location presents a perfect match (p-value close to one), it does not mean that the location is the true place of origin. It only means that the candidate location has an isotopic composition similar to the one of the true origin location, irrespective of where this location may be. This limitation is of course not specific to IsoriX.

The p-value of the assignment test is also stored in a raster within the object produced by isofind() (e.g. AssignedBats$sample$pv$Nnoc_1) and you can used such raster to directly extract the result of the test of an assignment hypothesis. For example, to test whether the first bat to assign (Nnoc_1) originates from its sample location we can simply type:

extract(AssignedBats$sample$pv[[1]], cbind(AssignDataBat$long[1], AssignDataBat$lat[1]))
##           
## 0.9863708

The outcome of this call gives a non-significant p-value, so we cannot rule out a local origin for the first bat. Repeating this for all assignment samples shows that we only reject a local origin for the bat Nnoc_15:

Pvalues <- sapply(1:nrow(AssignDataBat),
                  function(i) extract(AssignedBats$sample$pv[[i]],
                                       cbind(AssignDataBat$long[i],
                                             AssignDataBat$lat[i])))
AssignDataBat$sample_ID[Pvalues <= 0.05]
## [1] Nnoc_15
## 14 Levels: Nnoc_1 Nnoc_10 Nnoc_11 Nnoc_12 Nnoc_13 Nnoc_15 Nnoc_16 ... Nnoc_9

The assignment function isofind() always performs two types of geographic assignments: an individual assignment and a group assignment. The individual assignment which we have just explored is useful to infer the origin of a single sample or to infer independently the origin of several samples. In contrast, the group assignment infers a single origin for all the samples in the data frame. The underlying assumption of a group assignment is that all the individuals come from the same origin location.

Note: if you wish to assign different groups of samples separately, you simply have to prepare several data frames and run the assignment function on each of the groups separately.

Once the function isofind() has assigned all samples individually, it thus performs the group assignment. The p-value of the geographic assignment test for the entire group (here stored as the raster AssignedBats$group$pv) is computed by combining, at each location, the p-value of the geographic assignment for all assignment samples using Fisher’s combination of probabilities method (Fisher 1925).

The null hypothesis of this group assignment is now that all assignment samples originated from a location with identical distribution of source isotope values as the one being tested. Therefore, the test considers that all assignment samples must come from the same location of origin (or at least from a location presenting the same isotopic composition), and if this assumption is not fulfilled, the test may exclude all candidate locations. This is why you should always start by exploring individual assignments even you are ultimately interested in a group assignment!

5.4 Plotting the geographic assignments

You can plot the geographic assignments by simply calling the generic function plot(). The generic function will invokes a plotting method very similar to the one described for isoscapes (section @ref(plot_isoscapes)). Most arguments between the two plotting methods are the same. They are detailed in the help file ?plots. One main difference, however, is that the argument which used to indicate which raster to plot (mean, mean_predVar, …) is now replaced by an argument who. If you set who to "group", the function will show a single map representing the geographic assignment for the group. If instead you want to obtain a single assignment map per sample, you can provide a vector of the samples to be plotted (as indices or using their ID)to who .

Here is an example for plotting the first 4 bats:

plot(AssignedBats, who = 1:4)

Here is an example for plotting specific all 14 bats:

plot(AssignedBats,
     who = 1:14,
     sources = list(draw = FALSE),
     calibs = list(draw = FALSE),
     assigns = list(draw = FALSE)
     )

Note that we used the arguments sources, calibs and assigns to prevent the automatic display of points which would saturate the tiny maps.

Here is an example showing how to plot a specific sample using its name:

plot(AssignedBats, who = "Nnoc_15")

We did not choose the bat randomly. As you can see, this probability maps confirm what we found above: the sampling location (eastern Germany) being rejected as a source of origin for the bat Nnoc_15.

We will now make a group assignment:

plot(AssignedBats, who = "group")

This map does not make much sense if the bats come from different location. So instead, we will now rerun the assignment and plot the result after taking out the individual that is likely to originate from outside Germany (Nnoc_15). This can be done by typing:

AssignDataBat2 <- subset(AssignDataBat, sample_ID != "Nnoc_15")
AssignedBats2 <- isofind(data = AssignDataBat2,
                         isoscape = EuropeIsoscape,
                         calibfit = CalibBats)
## [1] "computing the test statistic and its variance..."
## [1] "running the assignment test..."
## [1] "combining assignments across samples..."
## [1] "assignments for all 13 organisms have been computed in 56 sec."
## [1] "converting log p-values into p-values..."
## [1] "applying the mask..."
## [1] "done!"
plot(AssignedBats2, who = "group")

5.4.1 More information on the assignment plots

The plotting method use the information from the assignment test to define confidence regions. By default, a 95%-confidence region is being displayed in grey, which means that the non grey area should contain the true location of origin with a 95% probability. You can alter this setting using the argument level.

Another default setting you choose, is to apply a mask to hide the outcome of the geographic assignment over the large water bodies. You can change that as well if you fancy using the argument mask.

Related to this arguments, you can also define another mask using the argument mask2. For example, you may want to use it in order to hide areas outside the IUCN distribution map for your species. If more than two masks are necessary, just combine several masks into one and provide that one to this argument mask2. Otherwise, or if the masks need to be visually distinct, you can use the function layer().

5.5 Fiddling with the assignment plots

There are plenty options you can change and plenty functions from the packages lattice, latticeExtra and rasterVis that you can also use to produce the perfect plot. We will add here different examples.

5.5.1 Example: adding the maximum

As a first example, we will show you how to add the point that is the most compatible with the unknown origin of these bats using the package raster.

We start by recovering the coordinate of such a point:

library(raster)
coord <- coordinates(AssignedBats2$group$pv)
MaxLocation <- coord[which.max(values(AssignedBats2$group$pv)), ]
Maximum <- data.frame(long = MaxLocation[1], lat = MaxLocation[2])
Maximum
##       long      lat
## x 61.32486 48.28319

You can then plot these information on top of the last plot we created by simply typing:

plot(AssignedBats2, who = "group", plot = FALSE) + 
  xyplot(Maximum$lat ~ Maximum$long,
         pch = 13, col = "orange", cex = 5, lwd = 2, panel = panel.points)

[more examples to come!]