Chapter 6 Raster Data - Continuous Variables

Raster data are fundamentally different from vector data in that they are referenced to a regular grid of rectangular (usually square) cells. The spatial characteristics of a raster dataset are defined by its spatial resolution (the height and width of each cell) and its origin (typically the upper left hand corner of the raster grid, which is referenced to a location in a coordinate reference system). The raster data format has a number of advantages and limitations compared to vector data. Continuous variables such as elevation, temperature, and precipitation as well as categorical data such as discrete vegetation types and land cover classes can usually be stored and manipulated more efficiently in a raster format. On the contrary, the raster format can be imprecise and inefficient for storing point and line features such as plot locations, roads, and streams.

The raster package provides a variety of specialized classes and functions for importing, processing, analyzing, and visualizing raster datasets. There are several other packages that can handle raster data. For example, the sp package supports the SpatialGridDataFrame class for gridded datasets and spatstat supports the im class for pixel image objects. However, raster is by far the most flexible and powerful package available in R for handling gridded data. IN this chapter we will introduce the raster package along with the rasterVis package, which contains additional functions to support graphing and mapping of raster data.

library(sf)           # Classes a methods for spatial data - simple features
library(raster)         # Classes and methods for raster data
library(rgdal)          # functions for spatial data input/output
library(RColorBrewer) # Creates nice color schemes
library(rasterVis)    # Advanced plotting functions for raster objects
library(ggplot2)      # Functions for graphing and mapping

6.1 Importing Raster Data

A raster object can be created by calling the raster() function and specifying an external image file as an argument. In this example, a dataset of land surface temperature (LST) measured by the MODIS sensor on board the Terra satellite is imported into R as a raster object. Invoking the print() function for a raster object provides information about the dimensions of the raster grid, cell size, geographic location, and other details. The summary() function provides information about the statistical distribution of raster values.

lst <- raster("MOD11A2_2017-07-12.LST_Day_1km.tif")
class(lst)
## [1] "RasterLayer"
## attr(,"package")
## [1] "raster"
lst
## class      : RasterLayer 
## dimensions : 1110, 3902, 4331220  (nrow, ncol, ncell)
## resolution : 0.009009009, 0.009009009  (x, y)
## extent     : -104.4326, -69.27943, 30, 40  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +no_defs 
## source     : E:/work/projects/gdswr-book/MOD11A2_2017-07-12.LST_Day_1km.tif 
## names      : MOD11A2_2017.07.12.LST_Day_1km 
## values     : 0, 65535  (min, max)
summary(lst)
##         MOD11A2_2017.07.12.LST_Day_1km
## Min.                                 0
## 1st Qu.                              0
## Median                           14981
## 3rd Qu.                          15148
## Max.                             16050
## NA's                                 0

There are a number of helper functions, shown below, that can be used to extract specific characteristics of the raster object.

# Number of raster calls
ncell(lst)
## [1] 4331220
# Number of rows in the raster grid
nrow(lst)
## [1] 1110
# Number of columns in the raster grid
ncol(lst)
## [1] 3902
# Number of layers in the raster object
nlayers(lst)
## [1] 1
# Coordinate reference system of the raster object
crs(lst)
## CRS arguments: +proj=longlat +datum=WGS84 +no_defs
# Extend of the raster object
extent(lst)
## class      : Extent 
## xmin       : -104.4326 
## xmax       : -69.27943 
## ymin       : 30 
## ymax       : 40
# Spatial resolution of the raster object
res(lst)
## [1] 0.009009009 0.009009009

The raster package includes functions and methods for visualizing the distribution of data stored in the raster. These are handy shortcuts if you want to take a quick look at the distribution fo data values stored in the raster object.

hist(lst)

density(lst)

Raster objects can be modified via subscripting and can be provided as input to mathematical expressions. The code below replaces the zero values in the LST raster with NA values, applies a scaling factor to the data to convert digital numbers to degrees Kelvin, and converts the temperature units from Kelvin to Celsius.

lst[lst == 0] <- NA
lst_c <- lst * 0.02 - 273.16
hist(lst_c)

density(lst_c)

The cellStats() function can be used to generate statistical summaries of the pixels in a raster layer.

cellStats(lst_c, stat = "mean")
## [1] 30.27687
cellStats(lst_c, stat = "min")
## [1] 1.66
cellStats(lst_c, stat = "max")
## [1] 54.76
cellStats(lst_c, stat = "sd")
## [1] 3.652403

Raster objects can be exported using the writeRaster() function. The format of the exported image is specified using the format argument. Common output formats are “GTiff” (GeoTiff), “ascii” (ESRI ASCII text ), “CDF”, (NetCDF) and “HFA” (ERDAS Imagine image). The overwite=TRUE argument will replace existing files with the same name

writeRaster(lst_c, filename = "MOD11A2_2017-07-12.LST_Day_1km_DegC.tif", format="GTiff", overwrite=TRUE )

6.2 Maps of raster data

The ggplot() function can be used to make maps with raster data, as well as vector data. However, ggplot() does not work directly on raster objects. It only works with inputs in the form of a data frame. For a raster dataset, the data frame will need to contain an x and y coordinate for each cell along with one or more variables that are measured for each cell. This is a highly inefficient way to store raster data. Because raster cells are all the same size, we don’t need really all of the x and y coordinates - we just need to know the location of one cell (usually called the origin) and we can then use this information to determine the coordinates of any other cell. On the other hand, converting rasters to data frames will allow us to use ggplot() for all our maps and charts, and is easier than learning a completely different graphing system just for rasters.

To make it easier to convert rasters to data frames, we can write a custom function to help automate the process. Having a function prevents us from having to type multiple lines of code over and over again. The custom function, rasterdf() takes a required argument, x, which is the raster that we want to convert to a data frame. An optional argument, aggregate, can be specified to downscale the raster before converting to a data frame.The code in the function extracts the x and y coordinates from the raster, extracts the data values from the raster, and then combines them into a data frame that contains the coordinates and the data.

The code below uses the function() function to create a function. The result is written to a new object called rasterdf, which belongs to the function class.

rasterdf <- function(x, aggregate = 1) {
  resampleFactor <- aggregate        
  inputRaster <- x    
  inCols <- ncol(inputRaster)
  inRows <- nrow(inputRaster)
  # Compute numbers of columns and rows in the new raster for mapping
  resampledRaster <- raster(ncol=(inCols / resampleFactor), 
                            nrow=(inRows / resampleFactor))
  # Match to the extent of the original raster
  extent(resampledRaster) <- extent(inputRaster)
  # Resample data on the new raster
  y <- resample(inputRaster,resampledRaster,method='ngb')

  # Extract cell coordinates into a data frame
  coords <- xyFromCell(y, seq_len(ncell(y)))
  # Extract layer names
  dat <- stack(as.data.frame(getValues(y)))
  # Add names - 'value' for data, 'variable' to indicate different raster layers
  # in a stack
  names(dat) <- c('value', 'variable')
  dat <- cbind(coords, dat)
  dat
}

class(rasterdf)
## [1] "function"
rasterdf
## function(x, aggregate = 1) {
##   resampleFactor <- aggregate        
##   inputRaster <- x    
##   inCols <- ncol(inputRaster)
##   inRows <- nrow(inputRaster)
##   # Compute numbers of columns and rows in the new raster for mapping
##   resampledRaster <- raster(ncol=(inCols / resampleFactor), 
##                             nrow=(inRows / resampleFactor))
##   # Match to the extent of the original raster
##   extent(resampledRaster) <- extent(inputRaster)
##   # Resample data on the new raster
##   y <- resample(inputRaster,resampledRaster,method='ngb')
## 
##   # Extract cell coordinates into a data frame
##   coords <- xyFromCell(y, seq_len(ncell(y)))
##   # Extract layer names
##   dat <- stack(as.data.frame(getValues(y)))
##   # Add names - 'value' for data, 'variable' to indicate different raster layers
##   # in a stack
##   names(dat) <- c('value', 'variable')
##   dat <- cbind(coords, dat)
##   dat
## }
lst_df <- rasterdf(lst_c, aggregate = 10)
summary(lst_df)
##        x                 y             value               variable    
##  Min.   :-104.39   Min.   :30.05   Min.   : 9.24   getValues(y):43290  
##  1st Qu.: -95.64   1st Qu.:32.48   1st Qu.:28.04                       
##  Median : -86.86   Median :35.00   Median :29.44                       
##  Mean   : -86.86   Mean   :35.00   Mean   :30.19                       
##  3rd Qu.: -78.07   3rd Qu.:37.52   3rd Qu.:31.30                       
##  Max.   : -69.32   Max.   :39.95   Max.   :54.76                       
##                                    NA's   :18907

Now we can map the raster with ggplot() using the geom_raster() function. Most of the other ggplot() functions have been covered in previous examples. The coord_fixed() function is called with argumentratio = 1to use the same scale on the x and y axes. Thelegend.position()argument to thetheme()` function is used to place the legend at the bottom of the map.

ggplot(data = lst_df) +
  geom_raster(aes(x = x, y = y, fill = value)) +
  scale_fill_gradient(name = "Degrees C", low = "yellow", high = "red") +
  coord_fixed(expand = F, ratio = 1) +
  labs(title = "LST-Adjusted Temperature (Terra Day)",
       x="longitude", y="latitude") +
  theme(legend.position="bottom")

The raster package includes functions for a variety of basic raster data processing procedures. To clip out a smaller portion of the LST dataset, we can use the crop() function.

clipext <- extent(-85, -83, 33, 35)
lst_clip <- crop(lst_c, clipext)
lst_clip_df <- rasterdf(lst_clip)
ggplot(data = lst_clip_df) +
  geom_raster(aes(x = x, y = y, fill = value)) +
  scale_fill_gradient(name = "Degrees C", low = "yellow", high = "red") +
  coord_fixed(expand = F, ratio = 1) +
  labs(title = "LST-Adjusted Temperature (Terra Day)",
       x="longitude", y="latitude")

The raster package also contains many other functions for carrying out basic raster GIS operations, including aggregating and resampling to different cell sizes, focal statistics, and zonal statistics. Many of these functions will be demonstrated in later examples.

Often, it is helpful to overlay boundary features on top of a raster to provide context. Here, the clipped raster is displayed with county boundaries overlaid. To do this, the geom_sf() function can be used like in the example from the previous chapter. The fill = NA argument displays the polygons as a hollow wireframe so that the raster data can be seen underneath. Calling geom_sf() forces the x and y axes to have the same scales. So there is no need to include the coord_fixed() function like in the preceding examples.

ga_sf <- st_read(dsn = "GA_SHP.shp")
## Reading layer `GA_SHP' from data source `E:\work\projects\gdswr-book\GA_SHP.shp' using driver `ESRI Shapefile'
## Simple feature collection with 159 features and 18 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -85.60516 ymin: 30.35576 xmax: -80.75143 ymax: 35.00066
## geographic CRS: NAD83
ggplot() +  
  geom_raster(data = lst_clip_df, aes(x = x, y = y, fill = value)) +
  geom_sf(data=ga_sf, color = "grey50", fill = NA, size = 1) +
  scale_fill_gradient(name = "Degrees C", low = "yellow", high = "red") +
  coord_sf(xlim = c(-85, -83),  ylim = c(33, 35), expand = F) +
  labs(title = "LST-Adjusted Temperature (Terra Day)")

6.3 Combining Rasters into Bricks and Stacks

Next, we will import some meterological grids from the North American Land Data Assimilation System (NLDAS). These gridded binary data are stored as GRIB archives. For arcane reasons that we don’t need to discuss here, we need to first use readGDAL() to import each data layer before converting it to a raster. When this code is run, it will generate a large number of warnings that can be ignored. In the GRIB archives, band 1 contains the temperature data and band 10 contains the precipitation data.

temp2012 <- raster(readGDAL("NLDAS_FORA0125_M.A201207.002.grb", band=1))
temp2013 <- raster(readGDAL("NLDAS_FORA0125_M.A201307.002.grb", band=1))
temp2014 <- raster(readGDAL("NLDAS_FORA0125_M.A201407.002.grb", band=1))
temp2015 <- raster(readGDAL("NLDAS_FORA0125_M.A201507.002.grb", band=1))
temp2016 <- raster(readGDAL("NLDAS_FORA0125_M.A201607.002.grb", band=1))
temp2017 <- raster(readGDAL("NLDAS_FORA0125_M.A201707.002.grb", band=1))
prec2012 <- raster(readGDAL("NLDAS_FORA0125_M.A201207.002.grb", band=10))
prec2013 <- raster(readGDAL("NLDAS_FORA0125_M.A201307.002.grb", band=10))
prec2014 <- raster(readGDAL("NLDAS_FORA0125_M.A201407.002.grb", band=10))
prec2015 <- raster(readGDAL("NLDAS_FORA0125_M.A201507.002.grb", band=10))
prec2016 <- raster(readGDAL("NLDAS_FORA0125_M.A201607.002.grb", band=10))
prec2017 <- raster(readGDAL("NLDAS_FORA0125_M.A201707.002.grb", band=10))

There are various plotting functions and methods available in the raster and rasterVis package. The plot() method for rasters can be used to generate scatterplots of data from multiple rasters.

plot(temp2012, temp2013, xlab="2012", ylab = "2013")
## Warning in .local(x, y, ...): plot used a sample of 95.9% of the cells. You can
## use "maxpixels" to increase the sample)

Because of the large number of cells in most raster datasets, simply plotting them all as points results is a large dark blob. Setting the gridded=TRUE argument produces a gridded scatterplot, in which the value of each grid cell represents the density of points in that portion of the scatterplot.

plot(temp2012, temp2013, xlab="2012", ylab = "2013", gridded=TRUE)

Multiple raster objects can be combined to produce a stack or a brick. These data objects are frequently used for space-time data because they can store multiple rasters from different dates. Stacks and bricks differ in how they access and store the raster data - bricks are typically faster to process but can take up large amounts of memory. The following code creates two raster bricks, one containing July temperature data from 2012-2017 and one containing July precipitation data from 2012-2017.

tempbrick <- brick(temp2012, temp2013, temp2014, 
                   temp2015, temp2016, temp2017)
names(tempbrick) <- c("July.2012", "July.2013", "July.2014", 
                      "July.2015", "July.2016", "July.2017")
tempbrick
## class      : RasterBrick 
## dimensions : 224, 464, 103936, 6  (nrow, ncol, ncell, nlayers)
## resolution : 0.125, 0.125  (x, y)
## extent     : -125.0005, -67.0005, 25.0005, 53.0005  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +R=6371200 +no_defs 
## source     : memory
## names      : July.2012, July.2013, July.2014, July.2015, July.2016, July.2017 
## min values :  4.150000,  3.180000,  3.320000,  4.010000,  3.090000,  5.270039 
## max values :  37.35000,  38.95000,  39.17000,  36.63000,  38.83000,  40.18004
precbrick <- brick(prec2012, prec2013, prec2014, 
                   prec2015, prec2016, prec2017)
names(precbrick) <- c("July.2012", "July.2013", "July.2014", 
                      "July.2015", "July.2016", "July.2017")
precbrick
## class      : RasterBrick 
## dimensions : 224, 464, 103936, 6  (nrow, ncol, ncell, nlayers)
## resolution : 0.125, 0.125  (x, y)
## extent     : -125.0005, -67.0005, 25.0005, 53.0005  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +R=6371200 +no_defs 
## source     : memory
## names      : July.2012, July.2013, July.2014, July.2015, July.2016, July.2017 
## min values :         0,         0,         0,         0,         0,         0 
## max values : 1695.1808,  764.4928,  390.6496,  466.9184,  416.3968,  424.2816

The cellStats() function can be applied to stacks and bricks as well as individual raster layers. Each layer in a stack or brick is summarized individually.

cellStats(tempbrick, stat = "mean")
## July.2012 July.2013 July.2014 July.2015 July.2016 July.2017 
##  24.73522  23.03793  22.64051  23.11549  23.42190  23.61176
cellStats(tempbrick, stat = "min")
## [1] 4.150000 3.180000 3.320000 4.010000 3.090000 5.270039
cellStats(tempbrick, stat = "max")
## [1] 37.35000 38.95000 39.17000 36.63000 38.83000 40.18004
cellStats(tempbrick, stat = "sd")
## July.2012 July.2013 July.2014 July.2015 July.2016 July.2017 
##  5.313249  4.747002  4.878721  4.798528  5.369573  5.137213

The following code uses ggplot() to generate a map of all the raster layers in the stack or brick. First the raster bricks are converted to data frames using the custom tempbrick_df() function, and then them ggplot() is used with with facet_wrap() to display a composite plot with one year mapped in each facet. A vector dataset of state boundaries is overlaid to provide a spatial reference.

tempbrick_df <- rasterdf(tempbrick)
summary(tempbrick_df)
##        x                 y             value             variable     
##  Min.   :-124.94   Min.   :25.06   Min.   : 3.09    July.2012:103936  
##  1st Qu.:-110.47   1st Qu.:32.03   1st Qu.:19.59    July.2013:103936  
##  Median : -96.00   Median :39.00   Median :23.53    July.2014:103936  
##  Mean   : -96.00   Mean   :39.00   Mean   :23.43    July.2015:103936  
##  3rd Qu.: -81.53   3rd Qu.:45.97   3rd Qu.:27.62    July.2016:103936  
##  Max.   : -67.06   Max.   :52.94   Max.   :40.18    July.2017:103936  
##                                    NA's   :140982
states_sf <- read_sf("conterminous_us_states.shp")

ggplot() +
  geom_raster(data = tempbrick_df, aes(x = x, y = y, fill = value)) +
  geom_sf(data=states_sf, fill=NA,color="grey50", size=0.25) +
  scale_fill_gradient(name = "Degrees C", low = "yellow", high = "red") +
  coord_sf(expand = F) +
  facet_wrap(~ variable, ncol = 2) + 
  labs(title = "Mean July Temperature") +
  theme(axis.title.x = element_blank(),
        axis.title.y = element_blank())

6.4 Computations on Raster Objects

Many statistical summary functions have methods for raster objects. When used with stacks and bricks, these function will be applied separately to each pixel to summarize the data through time. For example, the following code generate a mean value for each pixel for the four years of data in the brick.

meantemp <- mean(tempbrick)
meantemp_df <- rasterdf(meantemp)
ggplot() +
  geom_raster(data = meantemp_df, aes(x = x, y = y, fill = value)) +
  geom_sf(data=states_sf, fill=NA,color="grey50", size=0.25) +
  scale_fill_gradient(name = "Degrees C", low = "yellow", high = "red") +
  coord_sf(expand = F) +
  labs(title = "Mean July Temperature, 2012-2017") +
  theme(axis.title.x = element_blank(),
        axis.title.y = element_blank())

The following code subtracts each of the four annual datasets in the brick from the mean temperature layer to generate maps of temperature anomalies. In this example, tempbrick is a raster brick with six layers and meantemp is a single raster layer. When we use the subtraction operator, meantemp is subtracted from each layer of tempbrick, producing a new raster brick with six layers called tempanom. The original layer names from tempbrick are then assigned to tempanom.

tempanom <- tempbrick - meantemp
names(tempanom) <- names(tempbrick)
tempanom_df <- rasterdf(tempanom)
ggplot() +
  geom_raster(data = tempanom_df, aes(x = x, y = y, fill = value)) +
  geom_sf(data=states_sf, fill=NA,color="grey50", size=0.25) +
  scale_fill_gradient2(name = "Degrees C", low = "blue", mid = "lightyellow", 
                       high = "red") +
  coord_sf(expand = F) +
  facet_wrap(~ variable, ncol = 2) + 
  labs(title = "Mean July Temperature") +
  theme(axis.title.x = element_blank(),
        axis.title.y = element_blank())

In a most real application, we would use more than six years to data (typically 30 years) to do this type of anomaly calculation.

One or more layers can be extracted from a raster brick or stack using double brackets. As with lists in base R, raster layers can be extracted either by number or name.

precip1 <- precbrick[[1]]
precip1
## class      : RasterLayer 
## dimensions : 224, 464, 103936  (nrow, ncol, ncell)
## resolution : 0.125, 0.125  (x, y)
## extent     : -125.0005, -67.0005, 25.0005, 53.0005  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +R=6371200 +no_defs 
## source     : memory
## names      : July.2012 
## values     : 0, 1695.181  (min, max)
precip2 <- precbrick[["July.2012"]]
precip2
## class      : RasterLayer 
## dimensions : 224, 464, 103936  (nrow, ncol, ncell)
## resolution : 0.125, 0.125  (x, y)
## extent     : -125.0005, -67.0005, 25.0005, 53.0005  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +R=6371200 +no_defs 
## source     : memory
## names      : July.2012 
## values     : 0, 1695.181  (min, max)
precip3 <- precbrick[[1:3]]
precip3
## class      : RasterBrick 
## dimensions : 224, 464, 103936, 3  (nrow, ncol, ncell, nlayers)
## resolution : 0.125, 0.125  (x, y)
## extent     : -125.0005, -67.0005, 25.0005, 53.0005  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +R=6371200 +no_defs 
## source     : memory
## names      : July.2012, July.2013, July.2014 
## min values :         0,         0,         0 
## max values : 1695.1808,  764.4928,  390.6496

Here are some additional examples of graphs of raster bricks using functions from the raster and rasterVis packages. The hist() and the density() functions from the raster package will automatically generate a histogram or density plot for each layer in a stack or brick.

hist(tempbrick)

density(tempbrick)

The densityplot() function from the rasterVis package generates a density plot for each layer in a stack or brick and combines them in a single plot.

densityplot(tempbrick)

The pairs() function from the raster package generates a scatterplot for each pair of layers in the lower panel, a histograms for each layer along the diagonal, and correlation coefficient for each pair of layers in the upper panel.

pairs(tempbrick)

pairs(sqrt(precbrick))

The splom() package from the rasterVis package is imilar to pairs(), but it generates gridded scatterplots and displays a smoothed density functions for each layer along the diagonal.

splom(tempbrick)

splom(sqrt(precbrick))

These functions are very handy for generating quick, interactive visualizations for large raster objects. However, keep in mind that you can always convert your raster data into a data frame and graph it with ggplot() if you want to have more control over the layout of your plot.

6.5 Practice

  1. Summarize the precipitation brick to create a new raster where each cell contains the mean July precipitation from 2012-2017. Convert the precipitation data from millimeters to inches and generate a map of these data with US counties overlaid.

  2. Generate a new brick of square-root transformed precipitation for the years 2012-2017. Use these data to calculate July precipitation anomalies for the years 2012-2017 and generate a faceted plot showing these annual maps.

  3. Generate a new raster where each cell contains the difference between July 2015 temperature and July 2016 temperature. Generate a histogram of this raster and use to it assess changes in temperature between 2015 and 2016.