Chapter 3 Building Isoscapes
3.1 Preparing the raw data
To get an isoscape, you need data of the isotope composition in space. In this documentation, we will use data provided by the Global Networks of Isotopes in Precipitation (GNIP) but you can use whatever source of data you prefer, or have access to. In any case, just make sure that you format your data the same way we do it so that IsoriX interpret your dataset as it should. Precisely, your dataset should be a
data.frame with the same columns as the ones shown below in section 3.1.2.
3.1.1 The GNIP database
You can download precipitation isotope data for hydrogen and oxygen from the Global Networks of Isotopes in Precipitation (GNIP).
The GNIP data are free to download after the registration process is completed. The following link will bring you to the right place to create an account: https://websso.iaea.org/IM/UserRegistrationPage.aspx?returnpage=http://nucleus.iaea.org/wiser
Once your acount has been activated, download the data you need from here: https://nucleus.iaea.org/wiser/index.aspx (Tab Datasets)
For the time being, downloads are limited to 5,000 records per batch, which makes the compilation of huge databases fastidious. GNIP promised to come up, in the future, with datasets directly prepared for IsoriX to save you the trouble. We are eagerly waiting for this to happen!
3.1.2 A toy example: GNIPDataDE
Within IsoriX we have already included a small extract from GNIP for you to try out the functions of the package without having to worry about the data. This small dataset corresponds to \(\delta^2H\) values for Germany. The dataset has kindly been provided by Christine Stumpp. Here is what the beginning of the dataset looks like:
3.1.3 A real example: GNIPDataEU
Here we are going to show you how to prepare a dataset for IsoriX from a raw extract from GNIP. For this, we will use a file in which we have compiled the GNIP data for \(\delta^2H\) values for the whole world. We are unfortunately not being able to share these data with you but you can download these data yourself as explained in section 3.1.1.
We first import the data into R:
rawGNIP <- read.csv("./data/2016-10-31 Extract_ISORIX.csv")
This dataset contains
127637 rows and
24 columns. For example the first row contains the following information:
|Type.Of.Site||Precipitation Collectors - not specified|
|Source.of.Information||[Partners: Sverdrup Station, Norsk Polarinstitutt], [Info: Major revision in 2013. All laboratory records from 1992-2009 replaced. Meteo data record replaced for 1991-2012 with official record from Norwegion Meteorological Office.]|
|Media.Type||Water - Precipitation - rain & snow (“mixed”)|
|O18.Laboratory||International Atomic Energy Agency, Vienna, AT|
|H2.Laboratory||International Atomic Energy Agency, Vienna, AT|
|H3.Laboratory||International Atomic Energy Agency, Vienna, AT|
We are now going to reshape these data step-by-step to make them ready for IsoriX. We first reformat some temporal information:
rawGNIP$year.begin <- as.numeric(format(as.Date(rawGNIP$Begin.Of.Period), "%Y")) rawGNIP$year.end <- as.numeric(format(as.Date(rawGNIP$End.of.Period), "%Y")) rawGNIP$year.span <- rawGNIP$year.begin - rawGNIP$year.end rawGNIP$month.begin <- as.numeric(format(as.Date(rawGNIP$Begin.Of.Period), "%m")) rawGNIP$month.end <- as.numeric(format(as.Date(rawGNIP$End.of.Period), "%m")) rawGNIP$day.span <- as.Date(rawGNIP$Begin.Of.Period) - as.Date(rawGNIP$End.of.Period) rawGNIP$Year <- as.numeric(format(as.Date(rawGNIP$Date), "%Y")) rawGNIP$Month <- as.numeric(format(as.Date(rawGNIP$Date), "%m"))
Second, we identify the rows for which crucial information is missing. Because we are going to make an isoscape based on deuterium measurements, we only check for the availability of data for this isotope in particular. If you work on oxygen, you will have to adjust the script accordingly. We also check that the intervals during which precipitation water has been collected for each measurement correspond roughly to one month:
rows_missing_info <- is.na(rawGNIP$H2) | is.na(rawGNIP$day.span) | rawGNIP$day.span > -25 | rawGNIP$day.span < -35
Third, we only keep the rows and columns we are interested in:
columns_to_keep <- c("Name.of.Site", "Latitude", "Longitude", "Altitude", "Year", "Month", "H2") GNIPData <- rawGNIP[!rows_missing_info, columns_to_keep]
Fourth, we turn the variable
Name.of.Site into a factor:
GNIPData$Name.of.Site <- as.factor(GNIPData$Name.of.Site)
Last, we rename the columns to conform to the general IsoriX format and we check that the data seem correct:
colnames(GNIPData) <- c("source_ID", "lat", "long", "elev", "year", "month", "source_value")
This is what the beginning of
GNIPData looks like:
As you can see, the format is the same as the one for
GNIPDataDE, which is what we want.
3.2 Processing the raw data
In order to build your isoscape with IsoriX, your dataset must be aggregated in such a way that each observation corresponds to the mean and variance of isotope values collected in one location, and potentially over fixed periods of time.
The time interval considered corresponds to the interval for which single isoscapes will be build. For example, if the interval is the month you will end up with one set of isoscapes for each month of the year (even if those can later be aggregated to form yearly isoscape as we shall see in section XX [to come]).
To aggregate the data, you can choose to aggregate your dataset on your own, or to use our function
prepsources(). This function allows you to aggregate the data over different time intervals. It also allows you to apply some restriction on the data, such as selecting only particular months, years, or locations.
Here we will use the function
prepsources() to select from the dataset
GNIPData the observations available within an extent of latitude and longitude that covers roughly Europe.
3.2.1 Aggregation for a single set of isoscapes
We will start by using the default aggregation scheme, which simply produces an overall average per location.
GNIPDataEUagg <- prepsources(data = GNIPData, long_min = -30, long_max = 60, lat_min = 30, lat_max = 70)
Let us now visualise the beginning of the dataset:
As you can see, you now have observation for each location. The column
mean_source_value gives the average across months and years. The column
var_source_value gives the variance between monthly measurements spanning across years.
3.2.2 Aggregation for multiple sets of isoscapes
Depending on what you want to do, you may need to prepare the data for multiple sets of isoscapes. This is for example the case if you want to ultimately build a yearly isoscape for which the weight of the data from each month is being controlled, as it is the case for the widely use annual precipitation-weighted isoscape. We will choose this example and thus prepare the data in view of building 12 sets of isoscapes (one for each month).
This can be done again with the use of the function
prepsources() using the argument
split_by = "month":
GNIPDataEU12agg <- prepsources(data = GNIPData, long_min = -30, long_max = 60, lat_min = 30, lat_max = 70, split_by = "month")
Let us visualise the first 15 lines of the dataset:
head(GNIPDataEU12agg, n = 15)
As you can see, you now have observation for each month. The column
mean_source_value gives the average for each month across the years. The column
var_source_value gives the between-year variance in monthly measurements.
3.3 Fitting the geostatistical models
Isoscapes are predictions stemming from statistical models. In this section we will show you how to fit such models based on the dataset we prepared in section 3.2. We refer to set of isoscapes because IsoriX does not create a single isoscape predicting the spatial distribution of isotopes but it also creates other isoscapes related to the main isoscape. For example, it also creates an isoscape indicating how reliable the predictions are in any particular locations. If a single dataset is being used, then a single set of isoscapes need to be built. Instead, if several datasets have been prepared (see section 3.2.2), then you will have to accordingly build multiple sets of isoscapes.
3.3.1 Fitting the models for a single set of isoscapes
To fit the geostatisical model required to build a single set of isoscapes, you need to use the function
isofit(). This function has several parameters that can be adjusted to fit different models (see
?isofit). Here we will consider two fixed effects (the elevation and the absolute latitude) on top of the default setting:
EuropeFit <- isofit(data = GNIPDataEUagg, mean_model_fix = list(elev = TRUE, lat_abs = TRUE))
3.3.2 Fitting the models for a multiple set of isoscapes
If you have created multiple sets of data, you need to use the function
isomultifit() for fitting one set of isoscapes per set of data. As for
isofit(), several parameters can be adjusted to fit different models, but we will again use here the default settings of the function.
EuropeFit12 <- isomultifit(data = GNIPDataEU12agg, mean_model_fix = list(elev = TRUE, lat_abs = TRUE))
For the time being this function will fit sequentially the sets of isoscapes. So for the example used here, it means that 12 sets of isoscapes are fitted sequentially which thus takes more computing time than building a single set of isoscape. In the near future we will however revise the internal structure of this function for it to be able to use several CPUs simultaneously and thus make profit of IsoriX option used to define the number of CPUs allowed for IsoriX to work with (see section 2.5).
3.4 Saving the fitted models
Fitting models can take substantial computing time, depending on the amount of data you have and on the complexity of the models to be fit. Therefore, you may want to save the fitted models in order to be allowed to reuse them later (even after having opened a new R session), or in order to send them to your colleagues.
This can be done as follow:
save(EuropeFit, file = "EuropeFit.rda", compress = "xz") save(EuropeFit12, file = "EuropeFit12.rda", compress = "xz")
save() will (by default) store your R object in a file that can be found in your working directory. To use
save(), we must provide the object you want as a first argument of the function. Then,
file = defines the name of the file that will store the R object in your hard drive. You can also include a path to this name so to store the file wherever you want. This name can be different from the name object but naming the file as the object allows you to remember what the name of the stored object is. The last argument
compress = is optional; it allows for the creation of smaller files without loosing any content, so we always use it.
For loading a saved object (in a new session of R for example), just use the function
load() as follows (but make sure the saved object is in your working directory or include the path to the file names):
load(file = "EuropeFit.rda") load(file = "EuropeFit12.rda")
Be carefull, we do not recommend you to reused saved object after updating either IsoriX, or spaMM, or both. If you update, the best practice to make sure that every thing is working properly is to refit all your models. By doing this you may also benefit from potentially new optimisations we would have implemented.
3.5 Examining the fitted models
3.5.1 Plotting basic information about the models
You can display some information about the model fits by typing:
In the panel produced, the left column shows the relationship between the observed and predicted response (top) and the variation in spatial autocorrelation with the distance between location (bottom) captured by the model for the fit called
mean_fit, which corresponds to the fit of the mean isotopic values. The right column shows the same information for the fit called
disp_fit, which corresponds to the fit of the residual dispersion variance in the isotope values. On the first row you can see points distributed practically along the 1:1 diagonal. A different slope would suggest a high residual variance of the data. We do not expect the points to fall exactly on the line because the model fit does not attempt to predict perfectly the observations used during the fit. Instead, the model fit produces a smooth surface that aims at reaching a maximum predictive power for locations not considered during the fit. The second row gives you an idea of the strength of the spatial autocorrelation captured by the models. Here the results suggest that the autocorrelation is very strong. Not considering this autocorrelation would thus lead here to a poor approximation of the isoscape and resulting assignments.
3.5.2 Examining the summary tables
To simply explore the fitted models you can simply type their name and the information of each fit will be displayed:
## ## ### spaMM summary of the fit of the mean model ### ## ## formula: mean_source_value ~ 1 + elev + lat_abs + (1 | source_ID) + Matern(1 | ## long + lat) ## REML: Estimation of lambda, phi and corrPars by REML. ## Estimation of fixed effects by ML. ## Family: gaussian ( link = identity ) ## ------------ Fixed effects (beta) ------------ ## Estimate Cond. SE t-value ## (Intercept) 65.0449 3.411e+01 1.907 ## elev -0.0106 8.133e-04 -13.033 ## lat_abs -2.3376 4.467e-01 -5.233 ## --------------- Random effects --------------- ## Family: gaussian ( link = identity ) ## Distance(s): Earth ## --- Correlation parameters: ## 2.nu 2.rho ## 0.3198206566 0.0000302894 ## --- Variance parameters ('lambda'): ## lambda = var(u) for u ~ Gaussian; ## source_ID : 2.93e-10 ## long + lat : 986.9 ## --- Coefficients for log(lambda): ## Group Term Estimate Cond.SE ## source_ID (Intercept) -21.951 782.06189 ## long + lat (Intercept) 6.895 0.09591 ## # of obs: 327; # of groups: source_ID, 327; long + lat, 326 ## ------------- Residual variance ------------- ## Prior weights: 336 12 10 5 12 ... ## phi was fixed by an offset term: "phi" ~ 0 + offset(pred_disp) ## ------------- Likelihood values ------------- ## logLik ## p_v(h) (marginal L): -1103.066 ## p_beta,v(h) (ReL): -1104.904 ## lambda leverages numerically 1 were replaced by 1 - 1e-8 ## NULL ## ## ## ### spaMM summary of the fit of the residual dispersion model ### ## ## formula: var_source_value ~ 1 + (1 | source_ID) + Matern(1 | long + lat) ## Estimation of corrPars and lambda by Laplace REML approximation (p_bv). ## Estimation of fixed effects by Laplace ML approximation (p_v). ## Estimation of lambda by 'outer' REML, maximizing p_bv. ## Family: Gamma ( link = log ) ## ------------ Fixed effects (beta) ------------ ## Estimate Cond. SE t-value ## (Intercept) 5.868 1.973 2.974 ## --------------- Random effects --------------- ## Family: gaussian ( link = identity ) ## Distance(s): Earth ## --- Correlation parameters: ## 2.nu 2.rho ## 3.158598e-01 1.477985e-05 ## --- Variance parameters ('lambda'): ## lambda = var(u) for u ~ Gaussian; ## source_ID : 1e-06 ## long + lat : 4.534 ## # of obs: 322; # of groups: source_ID, 322; long + lat, 321 ## -- Residual variation ( var = phi * mu^2 ) -- ## Prior weights: 335 11 9 4 11 ... ## phi was fixed to 2 ## ------------- Likelihood values ------------- ## logLik ## p_v(h) (marginal L): -2068.324 ## p_beta,v(h) (ReL): -2066.725 ## NULL ## ## [models fitted with spaMM version 2.4.35] ## ## NULL
EuropeFit created in section 3.3.1 is a list containing the two fits aforementioned:
disp_fit. This is why you see information about those two corresponding fits. These summary tables are created behind the scenes by spaMM, so, learning a little about spaMM and mixed models is useful to derive meanings from these outputs. (We are also planning to write an extensive description of model outputs to guide you.)
3.5.3 Doing more stuff with the fitted models
Each of model fit is an object of class
HLfit which has been created by spaMM. You can thus directly manipulate those fits using spaMM for advance usage. For example, if you want to compare different models using information criterion, you can call the
AIC() function implemented by spaMM by simply typing:
## marginal AIC: 2216.1319 ## conditional AIC: 1840.7183 ## dispersion AIC: 2213.8074 ## effective df: 106.5613 ## marginal AIC: conditional AIC: dispersion AIC: ## 2216.1319 1840.7183 2213.8074 ## effective df: ## 106.5613
Note that we re-exported some spaMM functions for you to use without the need to load the package spaMM (e.g.
AIC.HLfit() which is implicitly here), but you will have to do
library(spaMM) to access all spaMM functionalities.
3.6 Building the isoscapes
3.6.1 Prepare the structural raster
To build the isoscape using the geostatistical models fitted above, you need to provide the function
isoscape() with a structural raster that contains the location at which we want to predict the isotopic values, as well as the values for the covariates at such locations. We will thus start by preparing such raster. Here, as a basis for the structural raster, we will download a high resolution elevation raster from Internet which we have prepared for you. You can easily download it from R using our function
We then import the high resolution elevation raster and transform it into an R object of class
RasterLayer object using the package raster:
## we import the high resolution raster ElevWorld <- raster("gmted2010_30mn.tif") ## we check that the import worked ElevWorld
## class : RasterLayer ## dimensions : 20880, 43200, 902016000 (nrow, ncol, ncell) ## resolution : 0.008333333, 0.008333333 (x, y) ## extent : -180.0001, 179.9999, -90.00014, 83.99986 (xmin, xmax, ymin, ymax) ## coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 ## data source : /Users/alex/Dropbox/Boulot/Mes_projets_de_recherche/R_packages/IsoriX_project/gmted2010_30mn.tif ## names : gmted2010_30mn ## values : -430, 8625 (min, max)
The information for the field
data source shows where the elevation data are located on your computer. The content will be thus be different for you. The other fields should however be identical.
Then, you may need to crop and resize this raster to the appropriate size. Cropping allow to define a rectangular area over which we will predict the isotope values. Some thinking must be put into that. If the area is too small, you may miss the real origin of your migratory organisms during the assignment step. If the area is too large, you may extrapolate and the model will thus create unreliable predictions. Here, the downloaded structural raster covers the entire planet but we will crop it to the limit of Europe which we also used to select our source data (see section 3.2).
You can do this easily by using our function
prepraster(). To crop the elevation raster to a particular size, you can either provide a set of coordinates to the function by means of the argument
manual_crop. Alternatively, you can simply provide a fitted model to the function
prepraster() by means of the argument
isofit. In this latte case, the function will determine the coordinates required for the cropping automatically, from the coordinates of the weather stations used during the fitting procedure. We choose here to use this latter alternative:
ElevEurope <- prepraster(raster = ElevWorld, isofit = EuropeFit, aggregation_factor = 4)
You may see that we also choose here to reduce the resolution of the elevation raster by choosing an aggregation factor of 10. The lower the aggregation factor, the higher the resolution of the elevation raster will be, but the longest the computation time and the largest the map produced will also be. We recommend to use the lowest aggregation factor that you hardware and patience can handle. Here we set the
aggregation_factor to 4 to speed up the example, but you should use lower number for real application.
levelplot(ElevEurope, margin = FALSE, main = "Structural raster") + layer(sp.polygons(CountryBorders)) + layer(sp.polygons(OceanMask, fill = "cyan"))
3.6.2 Predicting the isoscapes
We will now build the isoscapes using the function
isoscape(). The function simply takes the structural raster and the fitted model as arguments:
EuropeIsoscape <- isoscape(raster = ElevEurope, isofit = EuropeFit)
isoscape() creates 8 different rasters that correspond to different aspects of the isoscape (see
?isoscape() for details).
We can check which rasters have been created by typing:
## class : RasterBrick ## dimensions : 1283, 2787, 3575721, 8 (nrow, ncol, ncell, nlayers) ## resolution : 0.03333333, 0.03333333 (x, y) ## extent : -31.55847, 61.34153, 28.13319, 70.89986 (xmin, xmax, ymin, ymax) ## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 ## data source : in memory ## names : mean, mean_predVar, mean_residVar, mean_respVar, disp, disp_predVar, disp_residVar, disp_respVar ## min values : -125.69760517, 5.00833717, 66.33667585, 86.91057481, 66.33667585, 0.01068774, 2.00000000, 118.45584027 ## max values : 5.729769e+00, 2.242668e+02, 1.621787e+03, 1.635910e+03, 1.621787e+03, 6.059266e-01, 2.000000e+00, 5.450010e+05
3.6.3 More complex isoscapes? The example of building the precipitation-weighted annual average isoscapes
3.7 Plotting the isoscapes
One particular strength of IsoriX is the ability to produce nice looking and highly customisable plots of the isoscapes and assignment maps. All plotting functions in IsoriX present default settings so that a simple call to
plot() on your object is sufficient to generate a nice looking plot.
For example, here you can obtain the main isoscape by typing:
This shows the prediction of the average isotope value in space (i.e. the point predictions).
3.7.1 Other isoscapes
You can also plot any of the other isoscapes by specifying the name of the raster using the argument
which. As examples, we are now going to plot the residual variation in isotope values predicted at each location, and the prediction variance:
plot(EuropeIsoscape, which = "mean_residVar")
This first isoscape quantifies the temporal variation at each location. Here it is the between month variance spanning across years but it will not always be so as it depends on what how you aggregated the data (see section 3.2). Interestingly, you can see that the variance is large and spatially structured. Believe or not but most alternative method of assignment consider such variance to be equal to zero!
plot(EuropeIsoscape, which = "mean_predVar")
This second isoscape quantifies the uncertainty in your point predictions. As you can see, the uncertainty increases when you go far from the source location. This isoscape can help to update the design of the data collection. It is also used internally when performing assignment. Indeed, if you are very uncertain of what the average isotope value is in a location, then you cannot rule out that any organism may have come from that location. It is unfortunate but an inescapable reality.
3.7.2 Fiddling with the plots
If you want to change anything on the existing plot, have a look at
?plots. The help file should provide you with all the details you need to customise your plots. Here is are a couples of example:
plot(EuropeIsoscape, sources = list(pch = 3, col = "orange"), borders = list(col = "white"), mask = list(fill = "darkgrey"), palette = list(range = c(-130, 10), step = 1, n_labels = 10, fn = "rainbow"))
plot(EuropeIsoscape, title = "H Isoscape", sources = list(draw = FALSE), borders = list(borders = NULL), mask = list(fill = "black"), palette = list(range = c(-130, 20), step = 30, fn = NULL))
For example, you can see that it is possible to provide a function defining the colours using the argument
fn from the list
palette. Note that when it is set to
NULL, the famous palette viridis is used instead of our default palette.
If you want to add things on top of the plots, you can do it as you would do it for any plot created using the R graphical package lattice. The package lattice is very powerful but not always very easy to use. We may also provide you with an implementation of IsoriX around the now widely used package ggplot2 but that would imply a lot of work so we are still thinking about it…
We will also add plotting tutorials in this bookdown.
3.7.3 Why you should save plots or how to remove white lines in the middle of the plots?
Displaying the plot directly using R is very inefficient when it comes to high resolution isoscapes. It may take a very long time and it may not fully work. For example we have noticed that you often have display artefact showing some random white lines. The way out that we found is to directly create files storing the images using the package Cairo.
First you must make sure Cairo is installed and can be loaded:
If it complains try to see why. If you have not installed it, just install the package. One noticeable source of trouble comes with MacOS. On this operating system, the package Cairo does not always install/load successfully. Problems happen when the program xquartz is not present on the system. This program used to be shipped by defaults on old version of MacOS but it is no longer the case. So if you are in trouble, please start by installing xquartz (outside R).
If Cairo is installed and loaded successfully, typing
?Cairo will show you all possible file formats you can save your plots into (png, jpg, tiff, pdf…).
Here we will show how to save our isoscape both as a png and as a pdf.
Let us start by creating a PNG file with the main isoscape:
CairoPNG(filename = "output/Myisoscape.png", height = 1080, width = 1920, res = 100) plot(EuropeIsoscape) dev.off()
So you see, it is simple isn’t it! You first initialize the creation of the plot with the function
CairoPNG(), then you call your plot, then you tell your computer that you are done with
You just need to specify the
filename, and the resolution of the file. Concerning the file name, if like us, you don’t indicate the path in the
filename. The file will be now in your working directory which you can easily see by typing
getwd()! Go in your working directory and open it. You should see a nice plot.
Concerning the resolution, the height and width are here considered to be in pixels (default setting, you can choose other units if you want using the argument
units). Here we chose the so-called Full-HD or 1080p resolution (1080x1920) because we wanted to observe the isoscape carefully on a monitor of that resolution. If your screen is Full-HD, try it and display the plot in full screen to get better results (if the plot does not match the definition of your screen, things can get ugly), if your screen is not, try another resolution. If you don’t know what resolution your screen has, you can visit http://www.whatismyscreenresolution.com/.
res is very useful as it rescales the line and fonts in the plot. So if every thing is too small increase the value and if everything looks too bold and ugly, decreases it.
And for a PDF it is very similar:
CairoPDF(file = "output/Myisoscape.pdf", height = 10, width = 15) plot(EuropeIsoscape) dev.off()
The arguments slightly differ with the function creating PNGs. The argument setting the name of the file is now called
file and not
filename, the resolution has now to be provided in inches and the argument
res does not exist for pdfs, but default results are usually fine.
If your goal was to produce isoscapes, you should be done by now. The following steps of this tutorial are for users interested in inferring the origin of organisms based on their isotopic signature.