# Chapter 5 Inferring geographic origins

We will now show how to perform the geographic assignment as such. For this, you need the isoscapes built as shown in section 3 and the isotope data of the *assignment samples*. If the type of samples you want to assign differ from the type of samples you used to create the isoscapes, then you will also need a calibration fit obtained as shown in section 4.

## 5.1 Preparing the data

You need at least two pieces of information from the samples that you want to assign: a label and an isotope value. You should provide such information as a data frame with the column names `sample_ID`

and `sample_value`

. You may also add the latitude and longitude of the sampling locations for the assignment samples, if you want that such locations appear on your assignment maps. However, such locations are not considered for the statistical assignment.

Here we are going to assign the following bats:

`AssignDataBat`

sample_ID | lat | long | sample_value |
---|---|---|---|

Nnoc_1 | 51.72323 | 13.67630 | -86.58500 |

Nnoc_2 | 51.72323 | 13.67630 | -93.96700 |

Nnoc_3 | 51.72323 | 13.67630 | -88.71600 |

Nnoc_4 | 51.69520 | 14.64945 | -84.85700 |

Nnoc_5 | 51.70366 | 14.63915 | -104.21900 |

Nnoc_7 | 51.69520 | 14.64945 | -85.58300 |

Nnoc_8 | 51.70536 | 14.63289 | -89.99500 |

Nnoc_9 | 51.70536 | 14.63289 | -92.62700 |

Nnoc_10 | 51.89348 | 13.76488 | -85.55300 |

Nnoc_11 | 51.84883 | 13.64525 | -87.05000 |

Nnoc_12 | 51.60535 | 11.92700 | -89.34916 |

Nnoc_13 | 51.96695 | 12.08462 | -103.65232 |

Nnoc_15 | 51.96695 | 12.08462 | -121.18926 |

Nnoc_16 | 51.45457 | 11.50689 | -98.49742 |

This dataset is available in *IsoriX*. It contains hydrogen delta values of fur keratin from common noctule bats (*Nyctalus noctula*) killed at wind turbines in northern Germany. The question we want to answer is whether those bats are local bats or migratory bats killed during their migration.

## 5.2 Running the geographic assignments

Running the assignment is fairly straightforward: you must use the function `isofind()`

and provide it with the data frame with the data to assign, the isoscape object (i.e. the object produced by the function `isoscape()`

; see section 3), and (if needed) the calibration object (i.e. the object produced by the function `calibfit()`

; see section 4):

```
AssignedBats <- isofind(data = AssignDataBat,
isoscape = EuropeIsoscape,
calibfit = CalibBats)
```

The output of the function, here stored under the name `AssignedBats`

, is a list of class `ISOFIND`

containing 3 elements: the element `sample`

storing all rasters related to the assignment of each single sample, `group`

storing the rasters related to the group assignment, and `sp_points`

storing the spatial points of the source locations as well as the ones for the calibration samples and the assignment samples (if available). As for the output of the function `isoscape()`

, we choose to store all this information into a single R object to make the plotting of the assignment maps easy peasy.

## 5.3 How does the assignment work?

Here we explain how the assignment procedure work without going into geeky details. It is important that you understand how it works because it has bearing on how you will interpret the assignment results.

First, `isofind()`

runs the assignment test for each sample at each candidate location or cell from our structural raster. Precisely, for each candidate location the function tests whether the isoscape value at the unknown location of origin of a given sample is consistent with the predicted isoscape value at the candidate location.

To do this, we use the difference between these two isoscape values as test statistics. The isoscape value at the unknown location of origin is either the isoscape value of the assignment sample if no calibration is being supplied (when the source samples and assignment samples are of the same type), or `isofind()`

computes it by inverting the calibration relationship established by the calibration fit (when a calibration is required; see section 4). The predicted isoscape value of the candidate location is directly the point prediction of the isotope value at each prediction location, which is an information given by the main isoscape (see section 3). The test statistics over all locations can be retrieved for each sample from the object produced by `isofind()`

. For example, the raster stored under the name `AssignedBats$sample$stat$Nnoc_1`

contains such information for the first bat:

`AssignedBats$sample$stat$Nnoc_1`

```
## class : RasterLayer
## dimensions : 1283, 2787, 3575721 (nrow, ncol, ncell)
## resolution : 0.03333333, 0.03333333 (x, y)
## extent : -31.55847, 61.34153, 28.13319, 70.89986 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
## data source : in memory
## names : Nnoc_1
## values : -59.13842, 65.70829 (min, max)
```

We know detail the variance associated to the test statistics. Under the null hypothesis (the isotope value at location of origin = the isotope value at the candidate location), the variance of the test statistic is the variance of the difference between the two isoscape values that defines it. The variance of the test statistics is also stored as a raster (e.g. `AssignedBats$sample$stat_var$Nnoc_1`

).

In the appendix of our book chapter (Courtiol et al. 2019), we show that such variance of the assignment test depends on (1) the prediction variance of the mean fit and is therefore not constant over the entire spatial range (see Appendix). Therefore, a small difference in isoscape values at a location where the prediction variance is very small may argue against an assignment at the corresponding location, while a large difference where the prediction variance is very large may not. The variance of the test also depends on (2) the residual variation among samples at the unknown location of origin. In the current workflow, we estimate such variance from the residual variance of the calibration fit, which we assume constant. Other approaches are possible. In particular, if assignment samples are of the same kind as the source sample (and thus when no calibration model exists), such variance is instead given by the residual dispersion variance of the isoscape. The distribution of the assignment test also depends on (3) the prediction variance of the calibration fit (which decreases with the size of the calibration dataset) given the predicted isoscape value at the candidate location. Finally, the distribution of the assignment test depends on (4) a covariance term between the prediction error of the calibration fit given the predicted isotope value at the candidate location, and the prediction error of the isoscape at this location. We can compute this last term, which relates the uncertainty of the isoscape to the one of the calibration, in IsoriX because the calibration and the geographic assignment are both computed within the same software package. Note however that we have not yet implemented this 4th variance term. We are planning to do so for the version 1.0.

For each candidate location of origin, the p-value of the assignment test is *the probability that an assignment sample truly originating from the candidate location has a larger absolute value than the observed test statistic*. **Even if a location presents a perfect match (p-value close to one), it does not mean that the location is the true place of origin**. It only means that the candidate location has an isotopic composition similar to the one of the true origin location, irrespective of where this location may be. This limitation is of course not specific to IsoriX.

The p-value of the assignment test is also stored in a raster within the object produced by `isofind()`

(e.g. `AssignedBats$sample$pv$Nnoc_1`

) and you can used such raster to directly extract the result of the test of an assignment hypothesis. For example, to test whether the first bat to assign (`Nnoc_1`

) originates from its sample location we can simply type:

`extract(AssignedBats$sample$pv[[1]], cbind(AssignDataBat$long[1], AssignDataBat$lat[1]))`

```
##
## 0.9863708
```

The outcome of this call gives a non-significant p-value, so we cannot rule out a local origin for the first bat. Repeating this for all assignment samples shows that we only reject a local origin for the bat Nnoc_15:

```
Pvalues <- sapply(1:nrow(AssignDataBat),
function(i) extract(AssignedBats$sample$pv[[i]],
cbind(AssignDataBat$long[i],
AssignDataBat$lat[i])))
AssignDataBat$sample_ID[Pvalues <= 0.05]
```

```
## [1] Nnoc_15
## 14 Levels: Nnoc_1 Nnoc_10 Nnoc_11 Nnoc_12 Nnoc_13 Nnoc_15 Nnoc_16 ... Nnoc_9
```

The assignment function `isofind()`

always performs two types of geographic assignments: an individual assignment and a group assignment. The individual assignment which we have just explored is useful to infer the origin of a single sample or to infer independently the origin of several samples. In contrast, the group assignment infers a single origin for all the samples in the data frame. The underlying assumption of a group assignment is that all the individuals come from the same origin location.

Note: if you wish to assign different groups of samples separately, you simply have to prepare several data frames and run the assignment function on each of the groups separately.

Once the function `isofind()`

has assigned all samples individually, it thus performs the group assignment. The p-value of the geographic assignment test for the entire group (here stored as the raster `AssignedBats$group$pv`

) is computed by combining, at each location, the p-value of the geographic assignment for all assignment samples using Fisher’s combination of probabilities method (Fisher 1925).

The null hypothesis of this group assignment is now that all assignment samples originated from a location with identical distribution of source isotope values as the one being tested. Therefore, the test considers that all assignment samples must come from the same location of origin (or at least from a location presenting the same isotopic composition), and if this assumption is not fulfilled, the test may exclude all candidate locations. This is why you should always start by exploring individual assignments even you are ultimately interested in a group assignment!

## 5.4 Plotting the geographic assignments

You can plot the geographic assignments by simply calling the generic function `plot()`

. The generic function will invokes a plotting method very similar to the one described for isoscapes (section @ref(plot_isoscapes)). Most arguments between the two plotting methods are the same. They are detailed in the help file `?plots`

. One main difference, however, is that the argument `which`

used to indicate which raster to plot (`mean`

, `mean_predVar`

, …) is now replaced by an argument `who`

. If you set `who`

to `"group"`

, the function will show a single map representing the geographic assignment for the group. If instead you want to obtain a single assignment map per sample, you can provide a vector of the samples to be plotted (as indices or using their ID)to `who`

.

Here is an example for plotting the first 4 bats:

`plot(AssignedBats, who = 1:4)`

Here is an example for plotting specific all 14 bats:

```
plot(AssignedBats,
who = 1:14,
sources = list(draw = FALSE),
calibs = list(draw = FALSE),
assigns = list(draw = FALSE)
)
```

Note that we used the arguments `sources`

, `calibs`

and `assigns`

to prevent the automatic display of points which would saturate the tiny maps.

Here is an example showing how to plot a specific sample using its name:

`plot(AssignedBats, who = "Nnoc_15")`

We did not choose the bat randomly. As you can see, this probability maps confirm what we found above: the sampling location (eastern Germany) being rejected as a source of origin for the bat Nnoc_15.

We will now make a group assignment:

`plot(AssignedBats, who = "group")`

This map does not make much sense if the bats come from different location. So instead, we will now rerun the assignment and plot the result after taking out the individual that is likely to originate from outside Germany (Nnoc_15). This can be done by typing:

```
AssignDataBat2 <- subset(AssignDataBat, sample_ID != "Nnoc_15")
AssignedBats2 <- isofind(data = AssignDataBat2,
isoscape = EuropeIsoscape,
calibfit = CalibBats)
```

```
## [1] "computing the test statistic and its variance..."
## [1] "running the assignment test..."
## [1] "combining assignments across samples..."
## [1] "assignments for all 13 organisms have been computed in 56 sec."
## [1] "converting log p-values into p-values..."
## [1] "applying the mask..."
## [1] "done!"
```

`plot(AssignedBats2, who = "group")`

### 5.4.1 More information on the assignment plots

The plotting method use the information from the assignment test to define confidence regions. By default, a 95%-confidence region is being displayed in grey, which means that the non grey area should contain the true location of origin with a 95% probability. You can alter this setting using the argument `level`

.

Another default setting you choose, is to apply a mask to hide the outcome of the geographic assignment over the large water bodies. You can change that as well if you fancy using the argument `mask`

.

Related to this arguments, you can also define another mask using the argument `mask2`

. For example, you may want to use it in order to hide areas outside the IUCN distribution map for your species. If more than two masks are necessary, just combine several masks into one and provide that one to this argument `mask2`

. Otherwise, or if the masks need to be visually distinct, you can use the function `layer()`

.

## 5.5 Fiddling with the assignment plots

There are plenty options you can change and plenty functions from the packages *lattice*, *latticeExtra* and *rasterVis* that you can also use to produce the perfect plot. We will add here different examples.

### 5.5.1 Example: adding the maximum

As a first example, we will show you how to add the point that is the most compatible with the unknown origin of these bats using the package *raster*.

We start by recovering the coordinate of such a point:

```
library(raster)
coord <- coordinates(AssignedBats2$group$pv)
MaxLocation <- coord[which.max(values(AssignedBats2$group$pv)), ]
Maximum <- data.frame(long = MaxLocation[1], lat = MaxLocation[2])
Maximum
```

```
## long lat
## x 61.32486 48.28319
```

You can then plot these information on top of the last plot we created by simply typing:

```
plot(AssignedBats2, who = "group", plot = FALSE) +
xyplot(Maximum$lat ~ Maximum$long,
pch = 13, col = "orange", cex = 5, lwd = 2, panel = panel.points)
```

[more examples to come!]