Chapter 7 Analysis covariates

Once we have created the building blocks for our data analysis dataframes, we must bring in the variables which will be used in the modelling steps. It is important to note that there are countless ways to add covariates - both in terms of how you do it and where you derive the data from. The covariates you use will depend on the questions you have and the context of your survey. The examples provided here are not comprehensive and serve only as a guide!

Create a new .R script

Call it 02_example_covariates.R.

Load the required packages

library(kableExtra)
library(dplyr)
library(sf)
library(MODISTools)
library(lubridate)
library(corrplot)
library(traitdata)
library(terra)
library(osmdata)
library(elevatr)

We can simplify the covariate options we have available into two distinct categories:

- Species traits

Traits are species-level covariates which we think are important in structuring species' responses to other variables, such as the degree of human modification.

# Start by reading in your species list
sp_summary <- read.csv("data/processed_data/AlgarRestorationProject_species_list.csv", header=T)

- Location-level covariates

Location-level covariates are characteristics of the camera locations which are either fundamental to your question (such as the habitat type, degree of human modification, or distance to the nearest road), or things you are not directly interested in but must account for in your analyses. The way we derive and treat these variables is identical, however.

locs <- read.csv("data/processed_data/AlgarRestorationProject_camera_locations.csv", header=T)

7.1 Species traits

It is easier than ever before to add trait data to your species lists, particularly with the advent of R packages which pool multiple data sources, such as the traitdata package, which, to date, compiles data from 32 different sources.

Below we use this package to add trait data to the project species list:

# This package isn't available on CRAN, so we must use the remotes package
#library(remotes)
#remotes::install_github("RS-eco/traitdata", build_vignettes = T, force=T)
# Load the library
library(traitdata)

To pull the data for a specific database we use the following code:

data("elton_mammals")

To explore the full list of available datasets, see the traitdata GitHub page linked above.
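You can also list the datasets bundled with the package directly in R:

# List every dataset shipped with the traitdata package
data(package = "traitdata")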

Let’s take a look at what categories we have available to us:

 head(elton_mammals) %>% kbl() %>% scroll_box(height = "200px") %>%
  kable_paper("striped", full_width = F)
The output is a wide table with one row per species and ~30 columns: taxonomy (MSW3_ID, Genus, Species, Family), percentage diet composition (Diet.Inv through Diet.PlantO), foraging stratum (ForStrat.Value), activity periods (Activity.Nocturnal, Activity.Crepuscular, Activity.Diurnal), body mass (BodyMass.Value), plus the sources, certainty codes, and full references behind each value.

Let's make a new column sp which matches the format of the sp column in our sp_summary dataframe. We will use this as the "key" variable to extract the trait data.

elton_mammals$sp <- paste0(elton_mammals$Genus,"." ,elton_mammals$Species)

We do not want to take all of the trait data, so let's subset to BodyMass.Value and the activity columns (Activity.Nocturnal, Activity.Crepuscular, and Activity.Diurnal).

tmp <- elton_mammals[, c("sp","BodyMass.Value", "Activity.Nocturnal", "Activity.Crepuscular",   "Activity.Diurnal")]

# Lets rename the columns to make them more usable
tmp <- tmp %>% rename(
              mass_g = BodyMass.Value,
              act_noct = Activity.Nocturnal,
              act_crep = Activity.Crepuscular,
              act_diur = Activity.Diurnal)

sp_summary <- left_join(sp_summary, tmp) # Joins on the shared "sp" column

And then check our output:

sp_summary %>% kbl() %>% scroll_box(height = "200px") %>%
  kable_paper("striped", full_width = F)
class     order         family      genus         species      sp                       common_name        mass_g     act_noct  act_crep  act_diur
Mammalia  Artiodactyla  Cervidae    Alces         alces        Alces.alces              moose              356998.16  1         1         0
Mammalia  Artiodactyla  Cervidae    Cervus        canadensis   Cervus.canadensis        elk                NA         NA        NA        NA
Mammalia  Artiodactyla  Cervidae    Odocoileus    virginianus  Odocoileus.virginianus   white-tailed deer  55508.56   1         1         0
Mammalia  Artiodactyla  Cervidae    Rangifer      tarandus     Rangifer.tarandus        caribou            86033.98   0         0         1
Mammalia  Carnivora     Canidae     Canis         latrans      Canis.latrans            coyote             13406.33   1         1         0
Mammalia  Carnivora     Canidae     Canis         lupus        Canis.lupus              gray wolf          32183.33   1         1         0
Mammalia  Carnivora     Canidae     Vulpes        vulpes       Vulpes.vulpes            red fox            5476.17    1         1         0
Mammalia  Carnivora     Felidae     Lynx          canadensis   Lynx.canadensis          canada lynx        9373.25    1         0         0
Mammalia  Carnivora     Mustelidae  Lontra        canadensis   Lontra.canadensis        river otter        8087.42    1         1         0
Mammalia  Carnivora     Mustelidae  Martes        americana    Martes.americana         american marten    1250.00    1         0         0
Mammalia  Carnivora     Ursidae     Ursus         americanus   Ursus.americanus         black bear         99949.36   1         0         0
Mammalia  Lagomorpha    Leporidae   Lepus         americanus   Lepus.americanus         snowshoe hare      1710.02    1         0         0
Mammalia  Lagomorpha    Leporidae   Oryctolagus   cuniculus    Oryctolagus.cuniculus    rabbit             1832.22    1         0         0
Mammalia  Rodentia      Sciuridae   Tamiasciurus  hudsonicus   Tamiasciurus.hudsonicus  red squirrel       201.17     0         0         1

If there are any NA’s, it could be for several reasons:

  1. There is no trait data for that species. In this case you could either:
  • leave them as NA’s (excluding those species from later analyses) - or, if you are lucky, your analysis framework might be able to accommodate missing trait data
  • give the species the mean values obtained from other species in its genus (see the sketch after this list)

  2. There is a mismatch in taxonomic resolution - you are working with a subspecies that isn’t recognized. This is the case here with elk! Let’s replace it with the data for Cervus elaphus:

sp_summary[sp_summary$sp=="Cervus.canadensis", c("mass_g", "act_noct","act_crep","act_diur")] <- 
    elton_mammals[elton_mammals$sp=="Cervus.elaphus", c("BodyMass.Value", "Activity.Nocturnal", "Activity.Crepuscular", "Activity.Diurnal")]
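A minimal sketch of the genus-mean option for body mass, assuming you want to fill the NA’s from the Elton data (the column choices are illustrative):

# Calculate the genus-level average body mass from the Elton traits data
genus_means <- elton_mammals %>% 
  group_by(Genus) %>% 
  summarize(genus_mass_g = mean(BodyMass.Value, na.rm=T))

# Fill any remaining missing masses with the matching genus average
missing <- is.na(sp_summary$mass_g)
sp_summary$mass_g[missing] <- genus_means$genus_mass_g[match(sp_summary$genus[missing], genus_means$Genus)]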

Whatever you do, remember to report it in your methods section!

Let’s save our species list for a rainy day!

write.csv(sp_summary, paste0("data/processed_data/", locs$project_id[1],"_species_list.csv"), row.names = F)

7.2 Camera station covariates

It is common to have a suite of covariates whose effects you would like to investigate in your datasets. These could take the form of habitat designations or treatment types. They may already be included with your deployment data, or you may need to derive them from a variety of remote sources. In their simplest form, these variables are time-invariant (they do not change); however, you may have variables which change through time as well (we discuss these at the end). In the following steps, we walk through the process of manipulating and deriving example covariates.

For the time invariant covariates, we will add them to our locs dataframe imported above.

7.2.1 Locally collected covariates

You may have collected some data in the field when deploying or checking your camera traps, and kept that data separate from your camera trap data (e.g. vegetation assessments). Provided that the naming convention you gave to these dataframes is the same as in your camera data (e.g. the location is in a column called placename), you can use a left_join() to merge the two datasets.

Import a sample set of local covariates:

local_covs <- read.csv("data/raw_data/example_covariates/example_dataframe.csv")

Let’s take a look at the data structure:

placename line_of_sight_m
ALG001 137.12500
ALG002 131.52778
ALG003 353.65833
ALG004 158.04167
ALG005 305.81944
ALG006 60.12500
ALG007 310.58333
ALG008 112.75000
ALG009 299.02778
ALG010 102.30556
ALG011 223.56944
ALG012 140.91667
ALG013 394.56944
ALG014 196.87500
ALG015 163.11111
ALG016 116.11111
ALG017 138.19444
ALG018 304.29167
ALG019 330.97222
ALG020 204.40278
ALG021 264.94444
ALG022 229.13889
ALG023 218.29167
ALG024 425.43056
ALG025 56.97222
ALG026 200.05556
ALG027 252.95833
ALG028 277.50000
ALG029 206.52778
ALG030 43.38889
ALG031 334.27778
ALG032 83.00000
ALG033 165.00000
ALG034 337.79167
ALG035 439.61111
ALG036 62.69444
ALG037 392.61111
ALG038 352.75000
ALG039 339.76389
ALG040 182.75000
ALG041 109.25000
ALG042 219.56944
ALG043 62.41667
ALG044 374.26389
ALG045 294.83333
ALG046 34.50000
ALG047 363.80556
ALG048 392.93056
ALG049 80.77778
ALG050 139.36111
ALG051 208.12500
ALG052 13.95833
ALG053 99.30556
ALG054 35.47222
ALG055 189.06944
ALG056 55.41667
ALG057 93.04167
ALG058 80.91667
ALG059 41.00000
ALG060 189.90278
ALG061 11.16667
ALG062 16.00000
ALG063 28.94444
ALG064 34.27778
ALG065 20.11111
ALG066 36.94444
ALG067 188.83333
ALG068 72.27778
ALG069 22.85556
ALG070 28.61111
ALG071 20.05556
ALG072 52.44444
ALG073 55.94444

It is a dataframe where the survey locations are rows and the local covariates, in this case line_of_sight_m, are columns.

To add this data to our station data, we use a left_join() operation from the dplyr package. It uses a "key" variable which is common to both dataframes to add the columns from the right-hand side onto the matching rows of the left-hand side. Any rows present in the right-hand side which do not have a match in the left-hand side are skipped.

locs <- left_join(locs, local_covs)   # From the dplyr package

For more examples of joins using dplyr see: https://dplyr.tidyverse.org/reference/mutate-joins.html

7.2.2 Remotely collected covariates

To exploit remotely collected data sources we need to use a package to help us with spatial data.

7.2.2.1 Key skills: sf package

The most intuitive package for learning spatial operations in R is the simple features package (a.k.a. sf). sf lets you work with spatial dataframes in the style of a typical R dataframe. We use this package frequently, as it makes it easy to change coordinate projection systems (e.g. lat/long to UTM) and to perform fast spatial operations.

Let’s convert our “normal” dataframe to an sf dataframe:

locs_sf <- st_as_sf(locs,                              # We specify the dataframe 
                    coords=c("longitude", "latitude"), # The XY coordinates
                    crs=4326)                          # And the projection code

What does an sf object look like? Like a normal dataframe but with a weird header:

locs_sf
## Simple feature collection with 38 features and 4 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -112.6467 ymin: 56.15983 xmax: -112.3848 ymax: 56.49352
## Geodetic CRS:  WGS 84
## First 10 features:
##                 project_id placename feature_type line_of_sight_m
## 1  AlgarRestorationProject    ALG027     HumanUse       252.95833
## 2  AlgarRestorationProject    ALG029     HumanUse       206.52778
## 3  AlgarRestorationProject    ALG031     HumanUse       334.27778
## 4  AlgarRestorationProject    ALG032     HumanUse        83.00000
## 5  AlgarRestorationProject    ALG035     HumanUse       439.61111
## 6  AlgarRestorationProject    ALG036     NatRegen        62.69444
## 7  AlgarRestorationProject    ALG037     HumanUse       392.61111
## 8  AlgarRestorationProject    ALG038     HumanUse       352.75000
## 9  AlgarRestorationProject    ALG039     HumanUse       339.76389
## 10 AlgarRestorationProject    ALG043     NatRegen        62.41667
##                      geometry
## 1   POINT (-112.4735 56.3328)
## 2  POINT (-112.5483 56.39474)
## 3   POINT (-112.482 56.30899)
## 4  POINT (-112.3968 56.40197)
## 5  POINT (-112.4761 56.38428)
## 6  POINT (-112.4058 56.23178)
## 7  POINT (-112.4449 56.27898)
## 8  POINT (-112.4792 56.27039)
## 9  POINT (-112.4094 56.30127)
## 10 POINT (-112.5842 56.38715)

That header is important - it tells you the type of data you have (lines, points, polygons etc), and the projection information (CRS).

We like using sf as it is very easy to transform coordinates into different projections using st_transform(). Use the website epsg.io to find the CRS codes for the projection you want - e.g. UTM 12N = 26712, then plug it into the following:

locs_utm <- st_transform(locs_sf, crs=26712)

Check the header of locs_utm and you should see that the CRS has changed!
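One quick way to confirm is to print the CRS on its own:

# Print just the coordinate reference system of the transformed object
st_crs(locs_utm)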

Plotting sf objects is a little bit odd at first. If you try to plot them normally you get lots of replicated plots (one for each column) - try it:

plot(locs_utm)

It can be useful, as it varies the colors based on the values in each column. Typically, however, we just want to plot the points themselves. We do that by wrapping the object in st_geometry() - this extracts just the geometry of the object.

plot(st_geometry(locs_utm))
axis(1)
axis(2)

We will use st_geometry() frequently below.

For more in depth information of sf functionality see: https://r-spatial.github.io/sf/articles/sf1.html

7.2.3 Extracting data from local rasters

Often we have raster data layers stored which we would like to link to our camera locations. We have included one such example here, a raster which reflects the depth from the soil surface to the water table - a proxy for habitat type in this study site. The layer comes from the 1m Wet Area Mapping (WAM) layer:

White, Barry, et al. “Using the cartographic depth-to-water index to locate small streams and associated wet areas across landscapes.” Canadian Water Resources Journal 37.4 (2012): 333-347.

NOTE the raster has been downscaled to reduce its size for this course - it is no longer at 1m resolution.

The only time we deviate from the sf package is when dealing with rasters, which sf does not handle. Large rasters can also be very slow to process in R, so we use the terra package, which is built for fast raster operations.

library(terra)
# Import the example raster using the terra package
ras <- rast("data/raw_data/example_covariates/example_raster.tif")
# Convert your sf locations to the same projection as your raster, then put them in terra `vect` format
locs_terra <- locs_sf %>% 
                st_transform(crs=st_crs(ras)) %>% # change the projection to match the raster
                vect() # Turn it into a terra object

Let’s check that our layers match up!

plot(ras) # The terra package makes nice raster plots with legends
plot(locs_terra, add=T) # Add the survey locations as black dots

Great! Now let’s buffer our camera locations by 250 meters and take the average depth to water for each location:

# Buffer by 250m
locs_terra <- buffer(locs_terra,250)

# Extract the values to a temporary object - tmp 
tmp <- terra::extract(ras, locs_terra, fun=mean)

# Make a new column in locs_sf called water_depth_m
# They are ordered the same way so no need for a fancy join
locs_sf$water_depth_m <- tmp$Depth2WatAlgar

Finally, let’s check the distribution of our data!

# Basic boxplot in base R
boxplot(locs_sf$water_depth_m)

Most locations are on the water table (lowland sites), others are above it (upland sites), and the two have different vegetation characteristics in the field.

7.2.4 elevatr package

Camera studies often occur over a range of elevations - and we can quickly extract these elevations using the elevatr package and an sf dataframe.

library(elevatr)
locs_sf <- get_elev_point(locs_sf, 
                          src="aws", #Amazon Web Service Terrain Tiles - available globally
                          z = 12)  # z specifies the zoom level, the lower the value the faster the code runs, but the coarser the elevation values are

The src option specifies the source of the DEM data. We use "aws" - Amazon Web Services Terrain Tiles - which are available globally.

The z option specifies the zoom level (resolution) of the underlying DEM: the higher the value, the more detailed the elevation values, but the longer the code takes to run, so do not go crazy.

Let’s plot the output:

boxplot(locs_sf$elevation)

An elevation of ~ 500m was expected. Great!

If you want to download a full elevation raster for your area of interest, see the introduction to elevatr.
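For example, a minimal sketch using elevatr's get_elev_raster() - commented out here, as it can be a large download, and the zoom level is illustrative:

# Download a full elevation raster covering our camera locations
# dem <- get_elev_raster(locs_sf, z = 10)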

7.2.5 Open Street Maps

Open Street Map (OSM) is an incredible resource for generating covariates for camera trap studies. For example, we might be interested in the distance to the nearest rivers, roads, or trails. All of these anthropogenic features are available in OSM!

CAREFUL OSM data is user-contributed and often incomplete and patchy. Always plot your data, and never assume it is complete without checking it first. For an example of this, see the water bodies section below.

First let’s load the osmdata package.

library(osmdata)

The types of features we can extract using the osmdata package are listed here: https://wiki.openstreetmap.org/wiki/Map_features.

7.2.5.1 Highways

Camera trap projects are often interested in human disturbance, of which, highways are an important part.

Let’s start by defining our area of interest. All osmdata queries begin with a bounding box defining the area of the query:

# First buffer our points by 10km to create an area of interest (aoi)
aoi <- st_bbox(st_buffer(locs_sf, 10000)) # Units are in meters 

We then use this bounding box to return all of the features which cross into it:

highway <- opq(aoi) %>% #using the bounding box
           add_osm_feature(key="highway") %>% #extract all highway features
           osmdata_sf()  # convert them into simple features format

The data you extract is its own “class” of data made up from multiple data types:

str(highway)

Which looks very intimidating! However, the key thing is that it is made up of multiple elements, each of which represents an sf dataset. Let’s take a look at three of these:

  • $osm_points
  • $osm_lines
  • $osm_polygons

par(mfrow=c(1,3))
plot(st_geometry(highway$osm_points), main="osm_points")
plot(st_geometry(highway$osm_lines), main="osm_lines")
plot(st_geometry(highway$osm_polygons), main="osm_polygons")

The points and the lines datasets look most useful to us; there is nothing in the polygon layer.

Let’s use the lines element and add our camera stations:

par(mfrow=c(1,1))
plot(st_as_sfc(aoi))     # st_as_sfc creates a polygon from a `bbox` object
plot(st_geometry(highway$osm_lines), add=T)
plot(st_geometry(locs_sf), col="red", add=T)

We can now calculate the distances from our cameras to these objects using the following code:

st_nearest_feature() gives us the index number of the feature which is closest to each station.

We can then use this index to request the distance from that nearest feature to each camera station using st_distance(). Put together, it looks like:

# Create an index of the nearest object in `highway$osm_lines` to locs_sf
index <- st_nearest_feature(locs_sf, highway$osm_lines)

# Use that index to ask for the distance to that object
locs_sf$road_dist_m <- st_distance(locs_sf, highway$osm_lines[index,], 
                                   by_element=T) # Note `by_element=T` tells st_distance to evaluate things line by line. 

7.2.5.2 Water bodies

We might also want to calculate the distance to the nearest water body, an important resource for wildlife. We can do that using the following:

water <- opq(aoi) %>%
           add_osm_feature(key="water") %>%
           osmdata_sf()

Let’s check our data:

par(mfrow=c(1,3))
plot(st_geometry(water$osm_points), main="osm_points")
plot(st_geometry(water$osm_lines), main="osm_lines")
plot(st_geometry(water$osm_polygons), main="osm_polygons")

In this instance, the lines and the polygons are incomplete, so our best bet is the points file!

index <- st_nearest_feature(locs_sf, water$osm_points)

locs_sf$water_dist_m <- st_distance(locs_sf, water$osm_points[index,], by_element=T) # Note `by_element=T` tells st_distance to evaluate things line by line. 

For more examples of using the osmdata package, see the project’s GitHub page.

7.2.6 Vegetation productivity

7.2.6.1 MODISTools

MODISTools is an R interface to the MODIS Land Products Subsets web services. It allows for easy access to MODIS time series directly from your computer! These are the data layers commonly used to derive the normalized difference vegetation index (NDVI) and the enhanced vegetation index (EVI). When using MODISTools you should reference:

Hufkens (2022). The MODISTools package: an interface to the MODIS Land Products Subsets Web Services

See that reference for more details on how to use it.

Let’s load the package:

library(MODISTools)

For MODISTools to work, we need to provide a dataframe with specific column names:

  • site_name - the placename
  • lat - the latitude
  • lon - the longitude

modis_locs <- locs %>% 
  select("placename", "longitude", "latitude") %>% 
  rename(site_name=placename, lat=latitude, lon=longitude)

We can then look at the available bands for different products.

Two commonly used ones are MOD13Q1 for the derivation of NDVI/EVI, and MOD15A2H for the derivation of leaf area index (LAI).

# list available bands for a product
bands <- mt_bands(product = "MOD13Q1") #MOD15A2H
head(bands)
##                                     band                description
## 1          250m_16_days_blue_reflectance Surface Reflectance Band 3
## 2 250m_16_days_composite_day_of_the_year       Day of year VI pixel
## 3                       250m_16_days_EVI         16 day EVI average
## 4           250m_16_days_MIR_reflectance Surface Reflectance Band 7
## 5                      250m_16_days_NDVI        16 day NDVI average
## 6           250m_16_days_NIR_reflectance Surface Reflectance Band 2
##                    units    valid_range fill_value scale_factor add_offset
## 1            reflectance     0 to 10000      -1000       0.0001          0
## 2 Julian day of the year       1 to 366         -1         <NA>       <NA>
## 3   EVI ratio - No units -2000 to 10000      -3000       0.0001          0
## 4            reflectance     0 to 10000      -1000       0.0001          0
## 5  NDVI ratio - No units -2000 to 10000      -3000       0.0001          0
## 6            reflectance     0 to 10000      -1000       0.0001          0

When we run MODISTools, the underlying algorithm chooses the best available pixel value from all the acquisitions in each 16-day period. The criteria are lowest cloud cover, lowest satellite view angle, and highest NDVI/EVI value.

# list available dates for a product at a location
dates <- mt_dates(product = "MOD13Q1", lat = modis_locs$lat[1], lon = modis_locs$lon[1]) #MOD15A2H

# Get the first and last date!
first(dates$calendar_date); last(dates$calendar_date)
## [1] "2000-02-18"
## [1] "2022-09-30"

In the interest of processing time, let’s not pull the NDVI scores for the full date range. Instead, we will focus on mid-summer 2019.

Be patient, this might take a while!

site_ndvi <- mt_batch_subset(product = "MOD13Q1",
                              df=modis_locs,
                              band = "250m_16_days_NDVI",
                              start = "2019-07-01",
                              end = "2019-08-31",
                              km_lr = 0,         # Use these options if you want to buffer the value (km left)
                              km_ab = 0,         # Use these options if you want to buffer the value (km above)
                              internal = TRUE)

The raw output is somewhat intimidating:

site_ndvi[1:10, ] %>% 
  kbl() %>% 
  scroll_box(height = "300px") %>%
  kable_paper("striped", full_width = F)
The output has one row per site × 16-day composite and 21 columns: the pixel corner coordinates and cell size (xllcorner, yllcorner, cellsize, nrows, ncols), the band, units, and scale factor, the site coordinates and name, the product and query date range, the MODIS tile, acquisition and processing dates, the pixel number, and finally the NDVI value itself (e.g. site ALG027 has values of 7918, 7794, 7777, and 7341 across the four mid-summer composites).

So let’s simplify it to the key elements of information and rename them to match our camera data where appropriate:

ndvi_simple <- site_ndvi %>% 
  select(site, band, calendar_date, value) %>% 
  rename(placename=site)

ndvi_simple[1:10, ] %>% 
  kbl() %>% 
  scroll_box(height = "300px") %>%
  kable_paper("striped", full_width = F)
      placename  band               calendar_date  value
1.1   ALG027     250m_16_days_NDVI  2019-07-12     7918
2.1   ALG027     250m_16_days_NDVI  2019-07-28     7794
3.1   ALG027     250m_16_days_NDVI  2019-08-13     7777
4.1   ALG027     250m_16_days_NDVI  2019-08-29     7341
1.11  ALG029     250m_16_days_NDVI  2019-07-12     7983
2.11  ALG029     250m_16_days_NDVI  2019-07-28     8134
3.11  ALG029     250m_16_days_NDVI  2019-08-13     7933
4.11  ALG029     250m_16_days_NDVI  2019-08-29     6749
1.12  ALG031     250m_16_days_NDVI  2019-07-12     7553
2.12  ALG031     250m_16_days_NDVI  2019-07-28     7986

So we have multiple observations per site. Let’s take an average and add it to our locs_sf dataframe.

tmp <- ndvi_simple %>%             #Take the NDVI layer
  group_by(placename) %>%          # Group observations by the placename
  summarize(mean_ndvi=mean(value)) # Take the mean of the values and call the new column `mean_ndvi`

# Add the new data to our locations dataframe
locs_sf <- left_join(locs_sf, tmp)

And check the output:

boxplot(locs_sf$mean_ndvi,
        ylab="Mean NDVI score",
        las=1)

It is possible to generate an NDVI score for each month that each camera is active; however, that would take too long to produce for this course!
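If you did want to attempt it, a minimal sketch might look like the following - commented out here, as the full query would be slow, and the date range is illustrative:

# 1. Pull NDVI for the whole survey period rather than two months
# site_ndvi_full <- mt_batch_subset(product = "MOD13Q1", df = modis_locs,
#                                   band = "250m_16_days_NDVI",
#                                   start = "2018-11-01", end = "2019-11-01",
#                                   internal = TRUE)

# 2. Average the 16-day composites by site and month
# monthly_ndvi <- site_ndvi_full %>%
#   mutate(month = lubridate::floor_date(as.Date(calendar_date), "month")) %>%
#   group_by(site, month) %>%
#   summarize(mean_ndvi = mean(value))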

7.2.7 Digging deeper

If you want to dig into estimating NDVI metrics from the camera trap viewshed itself, rather than from satellite data, check out the phenopix R package. It allows the user to extract visual information from time-lapse images, providing a quantitative daily measure of vegetation phenology at each site (e.g. green-up, senescence, snow cover).

Alberton, B., et al. 2017. Introducing digital cameras to monitor plant phenology in the tropics: applications for conservation. Perspectives in Ecology and Conservation.

Filippa, G. et al. 2017. phenopix: Process Digital Images of a Vegetation Cover. R package version 2.3.1.

The phenopix package has a five-step process:

  1. a region of interest (ROI) is identified;
  2. the red, green, and blue digital numbers from each image in the time series are extracted, and an index of relative ‘greenness’ is computed and plotted;
  3. the vegetation indices’ data points are filtered to remove inconsistencies;
  4. a curve is fit to the data and phenophases are determined from the curve;
  5. phenophase uncertainties are calculated.

To see an application and comparison of these metrics, we highly recommend that you check out Catherine Sun’s (a WildCo alumna) paper on the subject:

Sun, Catherine, et al. “Simultaneous monitoring of vegetation dynamics and wildlife activity with camera traps to assess habitat change.” Remote Sensing in Ecology and Conservation 7.4 (2021): 666-684.

And the code associated with this publication on the WildCo GitHub Page

7.2.8 Google Earth Engine

BONUS CODE

Google Earth Engine is now available within the R environment through the incredible rgee package. Google Earth Engine is a game changer for accessing large, freely available datasets of the sort that might be used in data synthesis projects. The list of datasets you can access is here: Google Earth Engine Catalogue.

Why use rgee? Because it allows you to access vast datasets stored in the cloud, without needing to download them to your computer! However, it can be tricky to set up - and well beyond the scope of this course! If you are feeling a little adventurous, you can follow the steps in this rgee vignette:

Introduction to rgee

This thread also helped me resolve gcloud issues.

If you do manage to get it set up, it opens a world of possibilities!

Let’s look at an rgee workflow for camera traps. First load the rgee package.

library(rgee)
ee_Initialize()
## ── rgee 1.1.5 ─────────────────────────────────────── earthengine-api 0.1.323 ──
##  ✔ user: not_defined
##  ✔ Initializing Google Earth Engine:  DONE!
##  ✔ Earth Engine account: users/cwbeirne
## ─────────────────────────────────────────────────────────────────────────────────

Next let’s check if python and R are communicating properly:

ee_check()

If they are - you should see something like this:

knitr::include_graphics("images/analysis_covariates/rgee_check.PNG")

7.2.8.1 Discrete example

The way rgee works is that you first call a data layer - for example, the 100m global land cover dataset:

#Land Cover
LC_dataset <- ee$Image("COPERNICUS/Landcover/100m/Proba-V-C3/Global/2019")$select('discrete_classification')

We can quickly plot the layer to see what it looks like before moving on:

Map$setCenter(-88.6, 26.4, 1);
Map$addLayer(LC_dataset, {}, "Land Cover");

We can then extract the values of that layer at our camera locations using ee_extract():

habs <- ee_extract(LC_dataset, locs_sf)

# Read in the LCC key
codes <- read.csv("data/raw_data/example_covariates/LCC_codes.csv")

# Use left join to label the codes
habs <- left_join(habs, codes)

# And update the locs_sf dataframe
locs_sf$lcc_habitats <- habs$hab_code

Let’s check out the habitat designations:

table(locs_sf$lcc_habitats)
## 
##  closed_forest_deciduous_broad closed_forest_evergreen_needle 
##                              1                             29 
##            closed_forest_other   open_forest_evergreen_needle 
##                              1                              1 
##              open_forest_other                         shrubs 
##                              4                              2

Well that was very fast and very useful!

7.2.8.2 Continuous example

Let’s repeat this process with a continuous variable like biomass. We will use the Global Aboveground and Belowground Biomass Carbon Density Maps.

Let’s read in the layer:

biomass <- ee$ImageCollection("NASA/ORNL/biomass_carbon_density/v1");

# Specify the layer we want to plot
bio = biomass$first()

# Set up the colors for the visualization
viz <- list(
  bands="agb",
  max = 129,
  min = 0,
  palette = c('#0c0c0c', '#071aff', '#ff0000', '#ffbd03', '#fbff05', '#fffdfd')
)

# Plot the map
Map$addLayer(eeObject = bio, visParams = viz)

Extract the data using ee_extract:

tmp <- ee_extract(biomass, locs_sf)
## The image scale is set to 1000.
locs_sf$biomass_agb <- tmp$X2010_agb

Make a plot to see the distribution of values.
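For example, a quick boxplot in base R, as we used for the other covariates:

# Distribution of aboveground biomass values across camera locations
boxplot(locs_sf$biomass_agb,
        ylab="Aboveground biomass (agb)",
        las=1)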

7.2.8.3 Other layers of interest

Below we highlight other layers we have used in different analyses.

Global Human Modification Index

Human Modification Index

hmi <- ee$ImageCollection("CSP/HM/GlobalHumanModification")

# select the first image of the collection which corresponds to the cumulative HM  
hm = hmi$first()
viz <- list(
  bands="gHM",
  max = 1,
  min = 0,
  palette = c('#0c0c0c', '#071aff', '#ff0000', '#ffbd03', '#fbff05', '#fffdfd')
)

Map$addLayer(eeObject = hm, visParams = viz)

Protected area status

WDPA: World Database on Protected Areas

Note - you have to zoom in to see the polygons!

gpa  <- ee$FeatureCollection("WCMC/WDPA/current/polygons")
# Plot it (note you have to zoom in to see the data)
Map$addLayer(gpa, {}, "default display") 

As ever, this code just scratches the surface of the power of Google Earth Engine. See the rgee documentation for more ideas!

7.3 Convert and save your covariates

# Convert the distance columns (currently "units" objects) to numeric
locs_sf$road_dist_m  <- as.numeric(locs_sf$road_dist_m)
locs_sf$water_dist_m <- as.numeric(locs_sf$water_dist_m)

# Convert locs_sf back to a plain dataframe by dropping the geometry column
locs_sf$geometry <- NULL

locs <- left_join(locs, locs_sf)


# Write the dataset

write.csv(locs, paste0("data/processed_data/", locs$project_id[1],"_camera_locations_and_covariates.csv"), row.names=F)

7.4 Correlations between predictors

So we have used a variety of different techniques to generate covariates for our subsequent analyses. However, it is important to note that we cannot just throw these variables into a model.

One way to check whether your different variables are confounded/correlated is using the corrplot package.

library(corrplot)

# First we need to create a correlation matrix between the different variables of interest
M <- cor(locs[, c("line_of_sight_m", "water_depth_m", "elevation",
                     "road_dist_m", "mean_ndvi")])

Now lets make the basic corrplot:

corrplot(M)

The cells denote pairwise correlations between the rows and the columns. The great thing about corrplot is that the customization options are nearly endless - see the corrplot vignette.

Let’s make a better, more informative, corrplot!

corrplot(M,                              #The correlation matrix we made
         method="color",                 # How we want the cells 
         type="upper",                   # Just show the upper part (it is usually mirrored)
         order="hclust",                 # Order the variables using the hclust method
         addCoef.col = "black",          # Add coefficient of correlation  
         tl.col="black", tl.srt=45,      # Control the text label color and rotation
         diag=F                          # Suppress the diagonal correlations (which are 1 anyway)
         )

In general there is very low correlation between our different predictors! If we were seeing pairwise correlations >0.7, we would probably not include those variables in the same model.
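If you would like to flag such pairs automatically, here is a minimal sketch using the 0.7 rule of thumb above:

# Find the indices of any pairwise correlations above the 0.7 cutoff
high_cor <- which(abs(M) > 0.7 & upper.tri(M), arr.ind = TRUE)

# Report the offending variable pairs and their correlations (none, in our case)
data.frame(var1 = rownames(M)[high_cor[, 1]],
           var2 = colnames(M)[high_cor[, 2]],
           r    = M[high_cor])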