Module 5 Introduction to Raster in R

In this tutorial, we will review the fundamental principles, packages and metadata/raster attributes that are needed to work with raster data in R. We discuss the three core metadata elements that we need to understand to work with rasters in R: CRS, extent and resolution. It also explores missing and bad data values as stored in a raster and how R handles these elements. Finally, it introduces the GeoTiff file format.

Learning Objectives

  • Understand what a raster dataset is and its fundamental attributes.

  • Know how to explore raster attributes in R.

  • Be able to import rasters into R using the raster package.

  • Be able to quickly plot a raster file in R.

  • Understand the difference between single- and multi-band rasters.

Additional Resources

Raster Data in R Basics

Terra CheatSheet

Terra Tutorial

Download Data

NEON TEACHING DATA SUBSET: AIRBORNE REMOTE SENSING DATA

The LiDAR and imagery data used to create this raster teaching data subset were collected over the National Ecological Observatory Network’s Harvard Forest and San Joaquin Experimental Range field sites and processed at NEON headquarters. The entire dataset can be accessed by request from the NEON Data Portal.

5.1 About Raster Data

What is raster data? Raster data is any pixelated (or gridded) data where each pixel is associated with a specific geographic location. The value of a pixel can be continuous (e.g. elevation) or categorical (e.g. land use).

A geospatial raster is only different from a digital photo in that it is accompanied by spatial information that connects that data to a given location. This includes the raster’s extent and cell size, the number of rows and columns, and its CRS system. Examples include natural landscape, land use, heat maps, elevation maps, etc.

5.2 Important Attributes of Raster Data

Extent: The spatial extent of an R spatial object represents the geographic “edge” or location that is the furthest north, south, east and west. In other words, extent represents the overall geographic coverage of the spatial object.

Resolution: A raster has horizontal (x and y) resolution. This resolution represents the area on the ground that each pixel covers. The units for our data are in meters. Given our data resolution is 1 x 1, this means that each pixel represents a 1 x 1 meter area on the ground.

Source: National Ecological Observatory Network

Coordinate Reference System (CRS): The Coordinate Reference System or CRS tells R where the raster is located in geographic space. It also tells R what method should be used to “flatten” or project the raster in geographic space.

Maps of the United States in different projections. Notice the differences in shape associated with each different projection. These differences are a direct result of the calculations used to “flatten” the data onto a 2-dimensional map.

Source: M. Corey, opennews.org

5.3 Vector to Raster

In this section we will practice methods to convert data from a vector format (points, lines, polygons) to a raster object. For this we need to install and call the terra package as well as other previous packages we commonly use (\(sf\) and \(dplyr\)). For this example, we will be using the New Haven data from the previous lectures. Recall from this data set, we have the following: (1) a spatial data frame \((blocks\_sf)\) on demographic characteristics of the census blocks of New Haven, (2) a spatial data frame \((breach\_sf)\) on locations of breaches of the peace, and (3) a spatial data frame \((roads\_sf)\) on roads in New Haven.

We want to convert these different data types to raster formats. Here are the steps for converting vector to raster data:

  1. Create a skeleton raster of empty grid cells using the \(rast()\) function. Data inputs for this function are: number of rows \(nrows\), number of columns \(ncols\), and extent (ext). Higher number of rows and cells creates a finer grid. The extent can be given in two ways: (1) using the \(extent()\) function where you input your vector data, or (2) defining the minimum and maximum of longitude and latitude.

  2. Create a spatial raster object using the \(rasterize()\) function. This function fuses the vector data with the skeleton grid you created in step 1.

We will go through the different steps to convert vector to raster data for points, lines, and polygons using the New Haven data set.

5.3.1 Converting Point Data

Using the two steps described above, we will convert the locations of breaches of the peace into raster format. In the first line of code, we extract the matrix of long and lat coordinates from the sf data frame using the \(st\_coordinates()\) function. Using the \(rast()\) function we create an empty grid of cells, where we have specified this grid to have 50 rows and columns. The extent of the grid captures the geographic area of our map. We use the \(extent()\) function where we input the matrix of longitude and latitude of the breach locations. (Note: you can experiment with the number of rows and columns to see how the map changes). Next we set the projection of the raster map to be EPSG: 4326 using the \(crs()\) function. Finally we use the \(rasterize()\) function to fuse information from the empty raster grid and the matrix of breach locations. The inputs for this function are: (1) the point data in the form of a matrix where \(x\) and \(y\) columns refer to longitude and latitude, (2) the raster grid created in step 1, and (3) the function \(fun\) on how to summarize multiple points within a grid cell.

#Create a matrix of coordinates
coords <- st_coordinates(breach_sf)
names(coords) <- c("x", "y")

#We can find the min and max long and lat values.
xmin <- min(coords[,1])
xmax <- max(coords[,1])
ymin <- min(coords[,2])
ymax <- max(coords[,2])


temp <- terra::vect(breach_sf)

#Create a skeleton of a raster
r <- terra::rast(temp, nrow = 50, ncol = 50)
crs(r) <- "EPSG:4326"

#We can get the size of the raster
nrow(r) # number of rows
[1] 50
ncol(r) # number of columns
[1] 50
dim(r) # dimension
[1] 50 50  1
ncell(r) # number of cells
[1] 2500
#Create a raster object with both grid and points data. We are summing points within each cell. 
points_raster <- terra::rasterize(temp, # coordinate data
                                  r, #empty raster grid
                                  fun = sum #sum the points in a given cell
                                  )

#Create the outline of census blocks
blocks_union <- st_combine(blocks_sf)



#Run these lines together to map the raster
terra::plot(points_raster, size = 10)
terra::plot(blocks_union, add=T)
# data frame with the coordinates and the number of locations within each cell.
points <- as.data.frame(coords)
valuesatpoints <- terra::extract(points_raster,points)
 head(cbind(points, valuesatpoints))
         X        Y ID lyr.1
1 551419.5 181266.3  1     1
2 556319.5 177580.7  2     1
3 551423.1 172304.5  3     3
4 550261.9 182613.2  4     2
5 555168.5 172163.4  5     2
6 549133.6 169623.6  6     2

5.3.2 Converting Lines Data

We use the same steps as we did in the point data above. We have an additional step for lines where we create a spatial vector using the \(vect()\) function where this input is the \(roads\_sf\) data frame. Using the \(rast()\) function we create an empty grid of cells, where we have 100 rows and columns. We can increase the number of cells to 1000 by 1000 to show the roads in greater detail. The extent is set directly using the \(lines\_vector\) object, i.e., we do not need to specify it here. Next we set the projection of the raster map to be EPSG: 4326. Again we use the same \(rasterize()\) function to fuse information from the empty grid and the \(lines\_vector\) object. The inputs for this function are: (1) the lines vector data \(lines\_vector\), (2) the raster grid \(r\).

#Create a raster object with both grid and lines data. 
lines_vector <- terra::vect(roads_sf)
#Create an empty raster grid
r <- terra::rast(lines_vector, nrow = 100, ncol = 100)
crs(r) <- "EPSG:4326"

#Create a raster map of the roads
lines_raster <- terra::rasterize(lines_vector, #data
                                 r #empty raster grid of cells
                                 )



#Run these lines together to map the raster
terra::plot(lines_raster, size = 10)
terra::plot(blocks_union, add=T)
#Change the nrow and ncol to be 1000 to see the difference.

5.3.3 Converting Polygon Data

We can use a similar process as above. We have the added step for polygons where we create a spatial vector using the \(vect()\) function where this input is the \(blocks\_sf\) data frame. Using the \(rast()\) function with create an empty grid of cells, where we have specific this grid to have 100 rows and columns. We set the extent using the min and max values of the map coordinates set above. Next we set the projection of the raster map to be EPSG: 4326. We use the same \(rasterize()\) function to fuse information from the empty grid and the blocks vector \(v\). The inputs for this function are: (1) the blocks vector data \(v\), (2) the raster grid \(r\). We can create the raster to color by a particular variable. The \(VACANT\) variable captures the number of vacant housing units by census block.

#Create a raster object with both grid and polygon data. We are summing points within each cell. 

# Polygons (SpatVector)
v <- terra::vect(blocks_sf) #Create a spatial vector

#Create an empty grid of cells
r <- terra::rast(v, ncols = 100, nrows = 100)
crs(r) <- "EPSG:4326"

#Create a raster map of the census block data
areas_raster <- terra::rasterize(v, #census block data
                                 r, #empty grid
                                 "VACANT" #which variable you want to map
                                 )
crs(areas_raster) <- "EPSG:4326"


#Run these lines together to map the raster
terra::plot(areas_raster, size = 10)
terra::plot(blocks_union, add=T)

5.4 Manipulating raster data

In some cases, raster data is saved as a \(geotiff\) file. In our example we will be using the \(tif\) files of the Harvard Forest raster data.

What is a GEOTIFF file?

A .tif file stores metadata or attributes about the file as embedded tif tags. For instance, your camera might store a tag that describes the make and model of the camera or the date the photo was taken when it saves a .tif. A GeoTIFF is a standard .tif image format with additional spatial (georeferencing) information embedded in the file as tags. These tags can include the following raster metadata:

  1. A Coordinate Reference System (CRS)

  2. Spatial Extent (extent)

  3. Values that represent missing data (NoDataValue)

  4. The resolution of the data

To follow the example, you must download the HARV_dmsCrop.tif file. This raster represents the map of the tree elevation for Harvard Forest derived from the NEON AOP LiDAR sensor.

We use the \(terra\) package and the function \(rast()\) to read in the \(tif\) file. The tif file has all the raster attribute information already stored, which is automatically read in using \(rast()\). When we explore the raster object we are able to identify the number of rows/columns and layers, as well as the resolution, extent, and CRS system. It also gives the min and max value of tree elevation as 305.07 and 416.07 meters.

# Load raster into R yup its that easy.
DSM_HARV <- terra::rast("images/HARV/DSM/HARV_dsmCrop.tif")

# View raster structure
DSM_HARV 
## class       : SpatRaster 
## dimensions  : 1367, 1697, 1  (nrow, ncol, nlyr)
## resolution  : 1, 1  (x, y)
## extent      : 731453, 733150, 4712471, 4713838  (xmin, xmax, ymin, ymax)
## coord. ref. : WGS 84 / UTM zone 18N (EPSG:32618) 
## source      : HARV_dsmCrop.tif 
## name        : HARV_dsmCrop 
## min value   :       305.07 
## max value   :       416.07
plot(DSM_HARV, main="Continuous Elevation Map\n NEON Harvard Forest Field Site")
The raster map shows a high resolution (fine) grid of continuous elevation heights across the geographic area (extent). The green colors denote higher elevations compared to the pink colors with lower elevations.

5.4.0.1 Categorical Elevation Map of the NEON Harvard Forest Site

Some rasters contain categorical data where each pixel represents a discrete class such as a landcover type (e.g., “forest” or “grassland”) rather than a continuous value such as elevation or temperature. Some examples of classified maps include:

  1. Landcover/land-use maps.

  2. Tree height maps classified as short, medium, tall trees.

  3. Elevation maps classified as low, medium and high elevation.

Below we plot the Harvard Forest raster data and include break points of 250, 350, 380, and 500 for categories of elevation. We include these breaks in the \(plot()\) function. The legend of this map shows the colors representing each discrete class.

# Demonstration image for the tutorial
DSM_HARV <- terra::rast("images/HARV/DSM/HARV_dsmCrop.tif")

# add a color map with 5 colors
col= terrain.colors(3)

# add breaks to the colormap (4 breaks = 3 segments)
brk <- c(250,350, 380,500)

# DEM with a custom legend
plot(DSM_HARV, 
    col=col, #Set the color palette
    breaks=brk, #Set the break points
    main="Classified Elevation Map\nNEON Harvard Forest Field Site",
    plg = list(legend = c("High Elevation", "Middle","Low Elevation")))
#Change the col to heat.colors to try a different palette.

This plot shows the same raster map using a categorical variable. Elevation has been broken down into low, middle, high, and the color change corresponds the the category assigned to each grid cell.

5.5 NoData Values in Rasters

Raster data often has a NoDataValue associated with it. This is a value assigned to pixels where data are missing or no data were collected.

By default the shape of a raster is always square or rectangular. So if we have a dataset that has a shape that isn’t square or rectangular, some pixels at the edge of the raster will have \(NoDataValue's\). This often happens when the data were collected by an airplane which only flew over some part of a defined region.

In the image below, the pixels that are black have \(NoDataValue's\). The camera did not collect data in these areas.

# no data demonstration code - not being taught 
# Use stack function to read in all bands
RGB_stack <- terra::rast("images/HARV/RGB_Imagery/HARV_RGB_Ortho.tif")


# Create an RGB image from the raster stack
par(col.axis="white",col.lab="white",tck=0)
plotRGB(RGB_stack, r = 1, g = 2, b = 3, 
        axes=TRUE, main="Raster With NoData Values\nRendered in Black")

In the next image, the black edges have been assigned NoDataValue. R doesn’t render pixels that contain a specified NoDataValue. R assigns missing data with the NoDataValue as NA.

RGB_stack <- terra::rast("images/HARV/RGB_Imagery/HARV_RGB_Ortho.tif")


RGB_stack[RGB_stack == 0] <- NA

# RGB_stack[RGB_stack == 999] <- NA


# Create an RGB image from the raster stack
plotRGB(RGB_stack, r = 1, g = 2, b = 3,
        axes=TRUE, main="Raster With No Data Values\nNoDataValue= NA")

The assigned NoDataValue varies across disciplines; -9999 is a common value used in both the remote sensing field and the atmospheric fields. It is also the standard used by the National Ecological Observatory Network (NEON).

If we are lucky, our GeoTIFF file has a tag that tells us what is the NoDataValue. If we are less lucky, we can find that information in the raster’s metadata. If a NoDataValue was stored in the GeoTIFF tag, when R opens up the raster, it will assign each instance of the value to NA. Values of NA will be ignored by R as demonstrated above.

5.6 Bad Data Values in Rasters

Bad data values are different from \(NoDataValue's\). Bad data values are values that fall outside of the applicable range of a dataset.

Examples of Bad Data Values:

  • We dont expect extreme tree elevations greater than 500 meters in the Harvard Forest. So we need to check for extreme values.
  • Reflectance data in an image will often range from 0-1 or 0-10,000 depending upon how the data are scaled. Thus a value greater than 1 or greater than 10,000 is likely caused by an error in either data collection or processing.

5.6.1 Find Bad Data Values

Sometimes a raster’s metadata will tell us the range of expected values for a raster. Values outside of this range are suspect and we need to consider than when we analyze the data. Sometimes, we need to use some common sense and scientific insight as we examine the data - just as we would for field data to identify questionable values.

5.7 Create A Histogram of Raster Values

We can explore the distribution of values contained within our raster using the hist() function which produces a histogram. Histograms are often useful in identifying outliers and bad data values in our raster data.

DSM_HARV <- terra::rast("images/HARV/DSM/HARV_dsmCrop.tif")

# View the total number of pixels (cells) in is our raster 
ncell(DSM_HARV)
[1] 2319799
# create histogram that includes with all pixel values in the raster
hist(DSM_HARV, 
     main="Distribution of DSM Values\n All Pixel Values Included\n NEON Harvard Forest Field Site",
     xlab="DSM Elevation Value (m)",
     ylab="Frequency",
     col="wheat4")

5.8 Raster Bands

The Digital Surface Model object (DSM_HARV) that we’ve been working with is a single band raster. This means that there is only one dataset stored in the raster: surface elevation in meters for one time period. However, raster data can also be multi-band meaning that one raster file contains data for more than one variable or time period for each cell. The HARV_RGB_Ortho raster contains 3 layers that represent different bandwidths in the electromagnetic spectrum or color spectrum.

By default the rast() function imports multiple layers of the raster file. We can use the rast() function to import a multi-band raster. The three layers can be plotted together using the \(plotRGB()\) function to create a full color plot.

RGB_stack <- terra::rast("images/HARV/RGB_Imagery/HARV_RGB_Ortho.tif")

RGB_stack
class       : SpatRaster 
dimensions  : 2317, 3073, 3  (nrow, ncol, nlyr)
resolution  : 0.25, 0.25  (x, y)
extent      : 731998.5, 732766.8, 4712956, 4713536  (xmin, xmax, ymin, ymax)
coord. ref. : WGS 84 / UTM zone 18N (EPSG:32618) 
source      : HARV_RGB_Ortho.tif 
names       : HARV_RGB_Ortho_1, HARV_RGB_Ortho_2, HARV_RGB_Ortho_3 
min values  :                0,                0,                0 
max values  :              255,              255,              255 
#Plotting individual layers
plot(RGB_stack)
# Create an RGB image from the raster stack to include multiple layers
par(col.axis="white",col.lab="white",tck=0)
plotRGB(RGB_stack,
        axes=TRUE, main="Multiple layer raster")
This plot shows how combining the three color spectrum layers results in a full color raster with a natural comprehensive image of the natural environment.

Lab 4 Activity

  1. Using your R Markdown file from Lab 3, read in your project data in \(sf\) format. Call the libraries needed (namely that \(sf\) and \(terra\)).

  2. Convert your point data set to a raster data set using a resolution of 10 rows by 10 columns. Then plot the raster plot of your point data.

  3. Convert your point data set to a raster data set using a resolution of 100 rows by 100 columns. Then plot the raster plot of your point data.

  4. Explain the difference between the raster maps from steps 2 and 3.

  5. Convert your polygon data set to a raster data set using a resolution of 10 rows by 10 columns. Then plot the raster plot of your polygon data. Note: you must set the color of the raster using a variable name (the variable of interest).

  6. Convert your polygon data set to a raster data set using a resolution of 100 rows by 100 columns. Then plot the raster plot of your polygon data.Note: you must set the color of the raster using a variable name (the variable of interest).

  7. Explain the difference between the raster maps from 5 and 6.

  8. Create a raster map of your polygon data that has a resolution that gives the “best” visualization of the spatial patterns. Meaning experiment with the number of cols and rows to create the best visualization.

  9. Plot the color related to a variable of interest by setting 4 intervals/break points and color the map using the \(terrain.colors()\) or \(heat.colors()\) or another palette.

  10. Check for missing and bad values in your data by creating a histogram of your polygon data values.