Chapter 3 Building Isoscapes

3.1 Preparing the source data

To get an isoscape, you need data capturing how the isotope composition varies in space. In this documentation, we will use data provided by the Global Networks of Isotopes in Precipitation (GNIP), but you can use whatever source of data you prefer, or have access to. In any case, do make sure to format your data the same way we do it. Precisely, your dataset should be a data.frame (or a tibble) with the same columns as the ones shown below in section 3.1.2.

3.1.1 The GNIP database

You can download precipitation isotope data for hydrogen and oxygen from the Global Networks of Isotopes in Precipitation (GNIP).

This step must be done outside IsoriX since there is no application programming interface for GNIP.

To get to know what the GNIP go there and explore related resources: https://www.iaea.org/services/networks/gnip

The GNIP data are free to download after the registration process is completed. The following link will bring you to the right place to create an account: https://websso.iaea.org/IM/UserRegistrationPage.aspx?returnpage=http://nucleus.iaea.org/wiser

Once your account has been activated, download the data you need from the WISER interface: https://nucleus.iaea.org/wiser/index.aspx (Tab Datasets)

Last time we checked, downloads were limited to 5,000 records per batch, which makes the compilation of huge databases fastidious. GNIP promised to come up, in the future, with datasets directly prepared for IsoriX to save you the trouble. We are eagerly waiting for this to happen!

3.1.2 A toy example: GNIPDataDE

Within IsoriX we have already included a small extract from GNIP corresponds to \(\delta^2H\) values for Germany. The dataset has kindly been provided by Christine Stumpp and you can use it to try out the functions of the package without having to worry about getting data. We won’t use this dataset here but it illustrates the structure of the data needed to fit an isoscape. Here is what this dataset looks like:

GNIPDataDE

3.1.3 A real example: GNIPDataEU

We are going to show you how to prepare a dataset for IsoriX starting from a raw extract from GNIP. For this, we will use a file in which we have compiled the GNIP data for \(\delta^2H\) values for the whole world. We are unfortunately not being able to share these data with you, but you can download these data yourself as explained in section 3.1.1.

We first import the data into R:

rawGNIP <- read.csv("./data/2016-10-31 Extract_ISORIX.csv")

This dataset contains 127637 rows and 24 columns. For example, the first row contains the following information:

1
Name.of.Site NY ALESUND
Country NO
WMO.Code 100400
Latitude 78.91
Longitude 11.93
Altitude 7
Type.Of.Site Precipitation Collectors - not specified
Source.of.Information [Partners: Sverdrup Station, Norsk Polarinstitutt], [Info: Major revision in 2013. All laboratory records from 1992-2009 replaced. Meteo data record replaced for 1991-2012 with official record from Norwegion Meteorological Office.]
Sample.Name 199001
Media.Type Water - Precipitation - rain & snow (“mixed”)
Date 1990-01-15 00:00:00
Begin.Of.Period 1990-01-01 00:00:00
End.of.Period 1990-01-31 00:00:00
Comment
O18 -9.83
O18.Laboratory International Atomic Energy Agency, Vienna, AT
H2 -72.8
H2.Laboratory International Atomic Energy Agency, Vienna, AT
H3 6.9
H3.Error 0.5
H3.Laboratory International Atomic Energy Agency, Vienna, AT
Precipitation 27.6
Air.Temperature -6.4
Vapour.Pressure 2.6

We are now going to reshape these data step-by-step to make them ready for IsoriX. We first reformat some temporal information:

rawGNIP$year.begin  <- as.numeric(format(as.Date(rawGNIP$Begin.Of.Period), "%Y"))
rawGNIP$year.end    <- as.numeric(format(as.Date(rawGNIP$End.of.Period), "%Y"))
rawGNIP$year.span   <- rawGNIP$year.begin - rawGNIP$year.end
rawGNIP$month.begin <- as.numeric(format(as.Date(rawGNIP$Begin.Of.Period), "%m"))
rawGNIP$month.end   <- as.numeric(format(as.Date(rawGNIP$End.of.Period), "%m"))
rawGNIP$day.span    <- as.Date(rawGNIP$Begin.Of.Period) - as.Date(rawGNIP$End.of.Period)
rawGNIP$Year        <- as.numeric(format(as.Date(rawGNIP$Date), "%Y"))
rawGNIP$Month       <- as.numeric(format(as.Date(rawGNIP$Date), "%m"))

Mind that, since we downloaded that file, WISER has renamed Begin.Of.Period to Begin.of.Period (small case for the “o” in of), so you may need to adjust that. Do pay attention to other possible change that may have also occurred in WISER.

Second, we identify the rows for which crucial information is missing. Because we are going to make an isoscape based on deuterium measurements, we only check for the availability of data for this isotope in particular. If you work on oxygen, you will have to adjust the script accordingly. We also check that the intervals during which precipitation water has been collected for each measurement correspond roughly to one month:

rows_missing_or_unreliable_info <- is.na(rawGNIP$H2) |
                                   is.na(rawGNIP$day.span) |
                                   rawGNIP$day.span > -25 |
                                   rawGNIP$day.span < -35 

Third, we only keep the rows and columns we are interested in:

columns_to_keep <- c("Name.of.Site", "Latitude", "Longitude", "Altitude",
                     "Year", "Month", "H2")
GNIPData <- rawGNIP[!rows_missing_or_unreliable_info, columns_to_keep]

Fourth, we turn the variable Name.of.Site into a factor:

GNIPData$Name.of.Site <- as.factor(GNIPData$Name.of.Site)

Last, we rename the columns to conform to the general IsoriX format and we check that the data seem correct:

colnames(GNIPData) <- c("source_ID", "lat", "long", "elev", "year", "month", "source_value")

This is what the GNIPData looks like:

GNIPData

As you can see, the format is the same as the one for GNIPDataDE, which is precisely what we want.

3.2 Processing the source data

In order to build your isoscape with IsoriX, your dataset must be aggregated in such a way that each observation corresponds to the mean and variance of isotope values collected in one location, over a time period.

To aggregate the raw data, you can choose to aggregate your dataset on your own (e.g. using the package dplyr), or to use our function prepsources(). This function allows you to apply some restriction on the data, such as selecting only particular months, years, or locations.

The function prepsources() also allows you to aggregate the data separately for different time periods (see section 6.3). Each time period define the temporal range of the data used to prepare a single set of isoscapes. Often, people are only interested in a single time period (e.g. data over all years they have been collected). But in some cases, it may be interested to aggregate data separately for different time periods. This can be useful, in particular, to build weighted isoscapes (see section 6.3).

Here we use the function prepsources() to select from the dataset GNIPData the observations available across all years (i.e. a single time period) within an extent of latitude and longitude that covers roughly Europe. This can be done as follows:

GNIPDataEUagg <- prepsources(data = GNIPData,
                             long_min = -30, long_max = 60,
                             lat_min = 30, lat_max = 70)

Let us now visualize the dataset:

GNIPDataEUagg

As you can see, we now have a single row of data per location. The column mean_source_value gives the average across months and years. The column var_source_value gives the variance between monthly measurements spanning across years.

3.3 Fitting the geostatistical models

Isoscapes are predictions stemming from statistical models. In this section we will show you how to fit such models based on the dataset we prepared in section 3.2. We refer to set of isoscapes because IsoriX does not create a single isoscape predicting the spatial distribution of isotopes (i.e. the main isoscape), but it also creates other isoscapes related to the main isoscape. For example, it also creates an isoscape indicating how reliable the predictions are in any particular locations. If a single dataset is being used (as in section 3.2), then a single set of isoscapes need to be built. Instead, if several datasets have been prepared, then you will have to accordingly build multiple sets of isoscapes (see section 6.3 in this case).

To fit the geostatisical model required to build a single set of isoscapes, you need to use the function isofit() (otherwise you need to use the function isomultifit() as shown in section 6.3.5). The function isofit() has several parameters that can be adjusted to fit different models (see ?isofit). Here we will consider two fixed effects (the elevation and the absolute latitude) on top of the default setting:

EuropeFit <- isofit(data = GNIPDataEUagg,
                    mean_model_fix = list(elev = TRUE, lat_abs = TRUE))

3.4 Saving the fitted models

Fitting models can take substantial computing time, depending on the amount of data you have and on the complexity of the models to be fit. Therefore, you may want to save the fitted models in order to be allowed to reuse them later (even after having opened a new R session), or in order to send them to your colleagues.

This can be done as follow:

saveRDS(EuropeFit, file = "EuropeFit.rds", compress = "xz")

The function saveRDS() will store your R object in a file. To use saveRDS(), you must provide the object you want as a first argument of the function. Then, file = defines the name of the file that will store the R object in your hard drive. You can also include a path to this name so to store the file wherever you want. This name can be different from the name of the object but here we stick to the same name for clarity. The last argument compress = is optional; it allows for the creation of smaller files without loosing any content, so we always use it.

For loading a saved object (in a new session of R for example), just use the function readRDS() as follows:

EuropeFit <- readRDS(file = "EuropeFit.rds")

Be careful, we do not recommend you to reuse saved object after updating either IsoriX, or spaMM, or both. After one of such update, the best practice to make sure that every thing is working properly is to refit all your models. By doing this you may also benefit from potentially new improvements we would have implemented.

There is another way to save and load object in R, which is to use the function save() and load(). However, it is better to get to used to saveRDS() and readRDS() because only those can be used to store objects that contain spatial information (which we will do below).

3.5 Examining the fitted models

3.5.1 Plotting basic information about the models

You can display some information about the model fits by typing:

plot(EuropeFit)

The first plot shows the relationship between the observed and predicted response for the fitted model called mean_fit, which corresponds to the fit of the mean isotopic values. The second plot shows the same thing for the model called disp_fit, which corresponds to the fit of the residual dispersion variance in the isotope values. Here you can see, in these first two plots points distributed practically along the 1:1 diagonal. A different slope would suggest a high residual variance of the data. We do not expect the points to fall exactly on the 1:1 lines because the model fit does not attempt to predict perfectly the observations used during the fit. Instead, the model fit produces a smooth surface that aims at reaching a maximum predictive power for locations not considered during the fit.

The third and fourth plots show the variation in spatial autocorrelation with the distance between locations captured by mean_fit and disp_fit, respectively. These plots thus give you an idea of the strength of the spatial autocorrelation captured by the models. Here the results suggest that the autocorrelation is very strong. Not considering this autocorrelation would thus lead here to a poor approximation of the isoscape and resulting assignments.

3.5.2 Examining the summary tables

To explore the fitted models you can start by simply typing their name and the information of each fit will be displayed:

EuropeFit
## 
## ### spaMM summary of the fit of the mean model ### 
## 
## formula: mean_source_value ~ 1 + elev + lat_abs + (1 | source_ID) + Matern(1 | 
##     long + lat)
## REML: Estimation of corrPars and lambda by REML.
##       Estimation of fixed effects by ML.
## Estimation of lambda by 'outer' REML, maximizing restricted logL.
## family: gaussian( link = identity ) 
##  ------------ Fixed effects (beta) ------------
##             Estimate  Cond. SE t-value
## (Intercept)  65.0326 3.412e+01   1.906
## elev         -0.0106 8.132e-04 -13.031
## lat_abs      -2.3374 4.467e-01  -5.233
##  --------------- Random effects ---------------
## Family: gaussian( link = identity ) 
## Distance(s): Earth 
##                    --- Correlation parameters:
##         2.nu        2.rho 
## 3.197889e-01 3.024563e-05 
##            --- Variance parameters ('lambda'):
## lambda = var(u) for u ~ Gaussian; 
##    source_ID  :  1e-05 
##    long + lat  :  987.6  
## # of obs: 327; # of groups: source_ID, 327; long + lat, 326 
##  -------------- Residual variance  ------------
## Prior weights: 336 12 10 5 12 ...
## phi was fixed [through  "phi" ~ 0 + offset(pred_disp) ] to 276.851 69.7507 200.306 384.211 68.8934 ...
##  ------------- Likelihood values  -------------
##                         logLik
## logL       (p_v(h)): -1103.073
## Re.logL  (p_b,v(h)): -1104.910
## NULL
## 
## 
## ### spaMM summary of the fit of the residual dispersion model ### 
## 
## formula: var_source_value ~ 1 + (1 | source_ID) + Matern(1 | long + lat)
## Estimation of corrPars and lambda by REML (P_bv approximation of restricted logL).
## Estimation of fixed effects by ML (P_v approximation of logL).
## Estimation of lambda by 'outer' REML, maximizing restricted logL.
## family: Gamma( link = log ) 
##  ------------ Fixed effects (beta) ------------
##             Estimate Cond. SE t-value
## (Intercept)    5.883    1.942    3.03
##  --------------- Random effects ---------------
## Family: gaussian( link = identity ) 
## Distance(s): Earth 
##                    --- Correlation parameters:
##         2.nu        2.rho 
## 3.122032e-01 1.477985e-05 
##            --- Variance parameters ('lambda'):
## lambda = var(u) for u ~ Gaussian; 
##    source_ID  :  3.635e-05 
##    long + lat  :  4.404  
## # of obs: 322; # of groups: source_ID, 322; long + lat, 321 
##  --- Residual variation ( var = phi * mu^2 )  --
## Prior weights: 335 11 9 4 11 ...
## phi was fixed to 2 
##  ------------- Likelihood values  -------------
##                        logLik
## logL      (P_v(h)): -2066.554
## Re.logL (P_b,v(h)): -2064.972
## 2.rho are at their lower optimization bound. 
## NULL
## 
## [models fitted with spaMM version 4.4.16] 
## 
## NULL

The object EuropeFit created in section 3.3 is a list containing the two fits aforementioned: mean_fit and disp_fit. This is why you see information about those two corresponding fits. These summary tables are created behind the scenes by spaMM, so, learning a little about spaMM and mixed models is useful to derive meanings from these outputs. (We are also planning to write an extensive description of model outputs to guide you.)

3.5.3 Doing more stuff with the fitted models

Each of model fit is an object of class HLfit which has been created by spaMM. You can thus directly manipulate those fits using spaMM for advance usage. For example, if you want to compare different models using information criterion, you can call the AIC() function implemented by spaMM by simply typing:

AIC(EuropeFit$mean_fit)
##        marginal AIC:     conditional AIC:      dispersion AIC: 
##            2220.1460            1840.6432            2217.8207 
##        effective df: 
##             106.5233

Note that we re-exported some spaMM functions for you to use without the need to load the package spaMM (e.g. AIC.HLfit() which is implicitly here), but you will have to call library(spaMM) to access all spaMM functions.

3.6 Building the isoscapes

3.6.1 Prepare the structural raster

To build the isoscape using the geostatistical models fitted above, you need to provide the function isoscape() with a structural raster. Such raster defines the locations at which we want to predict the isotopic values. It must also contains the values for the predictors used in the geostatistical models at such locations.

We will thus start by preparing such structural raster. Here, as a basis for the structural raster, we will download a high resolution elevation raster from Internet using our function getelev:

getelev(file = "input/elevation_world_z5.tif")

You may not need to pass any argument to getelev(), but here we chose to define ourselves where to store the elevation raster and how to call such file using file = "input/elevation_EU_z5.tif". We did not alter the default resolution, which is high enough for our example, but doing so would be easy: one simply needs to define a value lower or larger than 5 for an argument called z (see ?getelev() for details and here for information on what value for the zoom parameter z really implies).

We then import the high resolution elevation raster and transform it into an R object of class SpatRaster object using the package terra:

## we import the high resolution raster
ElevWorld <- rast("input/elevation_world_z5.tif")

## we check that the import worked
ElevWorld
## class       : SpatRaster 
## dimensions  : 9899, 20950, 1  (nrow, ncol, nlyr)
## resolution  : 0.0171833, 0.01718412  (x, y)
## extent      : -180, 179.9902, -85.05449, 85.05113  (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326) 
## source      : elevation_world_z5.tif 
## name        : elevation_world_z5 
## min value   :             -10697 
## max value   :               7963
## we draw a simple plot to check what we just downloaded
plot(ElevWorld)

Then, you may need to crop and resize this raster to the appropriate size before creating the isoscapes.

Cropping is used to define a rectangular area over which we will predict the isotope values. If the area you define to build your isoscapes is too small, you may miss crucial information. For example, a set of isoscapes that cover an area that is too small may lead you to miss the real origin of your migratory organisms during the assignment step. Similarly, if the area to be used for building the isoscapes is too large, you may extrapolate and the model will thus create unreliable predictions.

Resizing the raster is used to reduce its resolution by aggregating neighboring cells in the raster. If the selected resolution is too large, computations may take too much time or memory to work. If the selected resolution is too coarse, you may smooth too much the predictors over the landscape (e.g. the peaks and valleys of a mountain chain will be reduced to a plateau), which will lead to a unrealistic isoscapes that are too smooth. To crop and resize the structural raster, you can use our function prepraster().

Geeky note: In principle you could directly crop and resize the raster to the dimensions you want before downloading it (getelev() does allows for you to do that). However, in practice, we recommend you to download a raster that is a larger than what you need (getelev() downloads the full world by default, which should always be sufficient) and with a resolution overshooting a little what you may need. The benefit of relying on prepraster() rather than getelev() to crop and downsize the structural raster is that you can study the impact of your decisions by simply rerunning prepraster() again – with different settings – without having to download again a large amount of data from the net.

Here, the structural raster downloaded with getelev() covers the entire planet, so we will crop it to the limits of Europe, which we also used to select our source data (see section 3.2).

To crop the elevation raster to a particular size, you can either provide a set of coordinates to prepraster()by means of the argument manual_crop. Alternatively, you can simply provide a fitted model to the function prepraster() by means of the argument isofit. In this latter case, the function will automatically determine the coordinates required for the cropping, from the coordinates of the weather stations used during the fitting procedure. We choose here to use this simpler way of defining the cropping area:

ElevEurope <- prepraster(raster = ElevWorld,
                         isofit = EuropeFit,
                         aggregation_factor = 4)

You may see that we also chose here to reduce the resolution of the elevation raster by choosing an aggregation factor of 4. We recommend to first use a large aggregation factor (e.g. 10) to draft your workflow and quickly notice errors in your code. Once things are working as they should, we recommend you then use the lowest aggregation factor that you hardware and patience can handle (i.e. 1 if possible). As mentioned above, if you need isoscapes with a even greater resolution, use getelev() with a value of the parameter z that is higher than 5 (an increment of 1 – i.e. z = 6 already makes a big difference as shown here). Here we set the aggregation_factor to 4 as something in between which produces plots nice enough for this tutorial, without being too heavy for us to recompile this bookdown quickly; but, for real applications, you should use a higher resolution than used here.

You can easily check what your structural raster looks like by plotting it. Compared to the map plotting above for ElevWorld, we will now draw a slightly more advanced map using the functions of the packages lattice and rasterVis which we made available in IsoriX:

levelplot(ElevEurope,
          margin = FALSE,
          col.regions = viridisLite::turbo, ## the colour palette
          main = "Structural raster") + 
layer(lpolygon(CountryBorders, border = "white")) +
layer(lpolygon(OceanMask, border = "white", col = "lightgrey"))

Note that, in this raster, the highest elevation does not correspond to the highest elevations that are known in the depicted area (the maximal value in the raster is 4113m, but Mount Elbrus is 5642m tall). This is not a bug but an expected result of aggregation. Indeed, each cell of the raster only contains a single value which corresponds to a summary statistic (the mean) applied to all locations aggregated together in the cell.

Geeky note: You may change what summary statistic is applied when using prepraster() using the argument aggregation_fn, but you have no control over the aggregation happening within getelev() since it uses ultimately a package that does not allow for controlling the summary statistics. Whatever you do, think carefully on the consequences of your choice upon how you are planning to use the isoscapes (e.g. for assigning species living only at the top of the mountains, using aggregation_fn = max may make more sense than the default).

3.6.2 Predicting the isoscapes

We will now build the isoscapes using the function isoscape(). The function simply takes the structural raster and the fitted model as arguments:

EuropeIsoscape <- isoscape(raster = ElevEurope,
                           isofit = EuropeFit)

The function isoscape() creates 8 different rasters that correspond to different aspects of the isoscape (see ?isoscape() for details).

We can check which rasters have been created by typing:

EuropeIsoscape$isoscapes
## class       : SpatRaster 
## dimensions  : 623, 1352, 8  (nrow, ncol, nlyr)
## resolution  : 0.06873321, 0.06873649  (x, y)
## extent      : -31.55346, 61.37383, 28.08577, 70.9086  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs 
## source(s)   : memory
## names       :        mean, mean_predVar, mean_~idVar, mean_respVar,       disp, disp_predVar, ... 
## min values  : -124.360001,     2.875219,    66.75802,     70.27485,   66.75802,    0.0125852, ... 
## max values  :    5.775823,   223.086144,  1557.88958,   1576.86777, 1557.88958,    0.6028812, ...

3.7 Plotting the isoscapes

One particular strength of IsoriX is the ability to produce nice looking and highly customisable plots of the isoscapes and assignment maps. All plotting functions in IsoriX present default settings so that a simple call to plot() on your object is sufficient to generate a nice looking plot.

For example, here you can obtain the main isoscape by typing:

plot(EuropeIsoscape)

This shows the prediction of the average isotope value in space (i.e. the point predictions).

For more information about how to improve your plots, check the section 6.1.

3.7.1 Other isoscapes

You can also plot any of the other isoscapes by specifying the name of the raster using the argument which. As examples, we are now going to plot the residual variation in isotope values predicted at each location, and the prediction variance:

plot(EuropeIsoscape, which = "mean_residVar")

This first isoscape quantifies the temporal variation at each location. Here the variation represents the between month variance spanning across years, but it will not always be so as it depends on what how you aggregated the data (see section 3.2). Interestingly, you can see that the variance is large and spatially structured. Believe or not, but most methods of assignment alternative to IsoriX consider such variance to be equal to zero!

plot(EuropeIsoscape, which = "mean_predVar")

This second isoscape quantifies the uncertainty in your point predictions. As you can see, the uncertainty increases when you go far from the source location. This isoscape can help to update the design of the data collection. It is also used internally when performing assignment. Indeed, if you are very uncertain of what the average isotope value is in a location, then you cannot rule out that any organism may have come from that location. It is unfortunate but an inescapable reality.