Chapter 10 Combining Vector Data with Discrete Raster Data
This chapter will introduce several new datasets and additional examples of integrating vector and raster data. The Cropland Data Layer (CDL) is a categorical raster dataset that maps agricultural land use in the United States. It provides annual data on locations where specific crop types such as corn, soybeans, wheat, and many others are planted. These maps are produced by the United States Department of Agriculture (USDA), which collects massive amounts of field-level data on planted crops from farmers. The data are used to train machine learning algorithms that predict the spatial pattern of crop types using satellite remote sensing data (Boryan et al. 2011). The CDL datasets can be visualized and accessed through the CropScape online platform (https://nassgeodata.gmu.edu/CropScape/).
The National Hydrography Dataset (NHD) contains vector data on the water drainage network in the United States. It consists of points, lines, and polygons that represent hydrological unit boundaries as well as features such as lakes, ponds, rivers, and streams. These data can be downloaded through the United State Geological Survey (USGS) National Map (https://apps.nationalmap.gov/downloader/).
This chapter will use downloaded data files characterizing crop types, hydrological unit boundaries, and stream data from the Middle Big Sioux Subbasin in eastern South Dakota. However, these data can also be accessed directly through R using the CropScapeR (Chen 2021) and nhdR (Stachelek 2022) packages.
Analyzing changes in the relative abundance of crop types and other land cover classes is important for understanding how agricultural land use is changing over time. In eastern South Dakota and other locations at the edge of the U.S. “Corn Belt”, there is concern that expanding cropland is reducing the cover of grasslands, including lands managed for pasture and hay as well as native prairie remnants (Michael C. Wimberly et al. 2017). Knowing whether these changes are occurring close to streams and other water bodies is important for assessing their impacts on wildlife habitat, carbon storage, and water quality.
The required R packages have all been used before with the exception of ggspatial (Dunnington 2022), which provides some additional functions for map annotations when using ggplot()
. The rasterdf()
function is also sourced for mapping the raster data in ggplot()
.
library(tidyverse)
library(sf)
library(terra)
library(scales)
library(colorspace)
library(ggspatial)
source("rasterdf.R")
The raster data include TIF files for 2010 and 2020 CDL data covering a portion of eastern South Dakota that were downloaded using the CropScape website. The vector data include shapefiles with flowlines, watershed boundaries, and subwatershed boundaries within the Middle Big Sioux subbasin in eastern South Dakota.
# Read in the data
<- rast("CDL_2010_clip.tif")
crop2010 <- rast("CDL_2020_clip.tif")
crop2020 <- st_read("NHDFlowline.shp", quiet = TRUE)
streams <- st_read("WBDHU10.shp", quiet = TRUE)
wsheds <- st_read("WBDHU12.shp", quiet = TRUE) subwsh
Before trying to combine the two data sources, it is important to check their coordinate reference systems. The NHD data are in a geographic coordinate system.
writeLines(st_crs(wsheds)$WktPretty)
## GEOGCS["NAD83",
## DATUM["North_American_Datum_1983",
## SPHEROID["GRS 1980",6378137,298.257222101]],
## PRIMEM["Greenwich",0],
## UNIT["degree",0.0174532925199433,
## AUTHORITY["EPSG","9122"]],
## AXIS["Latitude",NORTH],
## AXIS["Longitude",EAST],
## AUTHORITY["EPSG","4269"]]
The CDL data are in a projected Albers Equal Area coordinate system.
writeLines(st_crs(crop2010)$WktPretty)
## PROJCS["unnamed",
## GEOGCS["NAD83",
## DATUM["North_American_Datum_1983",
## SPHEROID["GRS 1980",6378137,298.257222101004]],
## PRIMEM["Greenwich",0],
## UNIT["degree",0.0174532925199433,
## AUTHORITY["EPSG","9122"]],
## AUTHORITY["EPSG","4269"]],
## PROJECTION["Albers_Conic_Equal_Area"],
## PARAMETER["latitude_of_center",23],
## PARAMETER["longitude_of_center",-96],
## PARAMETER["standard_parallel_1",29.5],
## PARAMETER["standard_parallel_2",45.5],
## PARAMETER["false_easting",0],
## PARAMETER["false_northing",0],
## UNIT["metre",1,
## AUTHORITY["EPSG","9001"]],
## AXIS["Easting",EAST],
## AXIS["Northing",NORTH]]
For the reasons discussed in Chapter 8, all three vector datasets are reprojected to match the coordinate system of the raster data rather than reprojecting the raster data.
<- st_transform(wsheds, crs(crop2010))
wsheds_aea <- st_transform(subwsh, crs(crop2010))
subwsh_aea <- st_transform(streams, crs(crop2010)) streams_aea
10.1 Visualizing and Manipulating Vector Data
The first step is to explore the watersheds within the Middle Big Sioux Subbasin by generating a labeled map. First, columns with x and y coordinates are added to the wsheds
object to make it possible to label the watersheds. For map labels, it is advisable to use st_point_on_surface()
instead of st_centroid()
to ensure that the label points fall inside the polygons. The st_coordinates()
function converts these coordinates into a geometry
column that can be interpreted by sf functions. Then a labeled map of watersheds is generated by using geom_sf()
to plot the watershed boundaries (the default fill color is gray) and using geom_text()
to write the name of each watershed at its label point (Figure 10.1).
<- cbind(wsheds_aea,
wsheds_aea st_coordinates(st_point_on_surface(wsheds_aea)))
ggplot(data = wsheds_aea) +
geom_sf(color = "black",
size = 0.1) +
geom_text(aes(x = X,
y = Y,
label = name),
size = 3,
fontface = "bold") +
coord_sf() +
theme_void()
Some of the labels are crowded and difficult to read. Also, because of the complex shapes of some of the watersheds, the point locations generated by st_point_on_surface()
are not well centered. When there are relatively few labels as in this example, it is often possible to adjust the text placement manually. This is the order of watersheds in the wsheds
data frame.
$name
wsheds## [1] "Lake Sinai-Big Sioux River"
## [2] "Deer Creek"
## [3] "Medary Creek"
## [4] "Dry Lake Number One"
## [5] "Lake Marsh"
## [6] "Lake Poinsett"
## [7] "Hidewood Creek"
## [8] "Oakwood Lakes"
## [9] "Sixmile Creek"
## [10] "North Deer Creek"
## [11] "Battle Creek"
The following code adjusts the text positions for “Lake Sinai-Big Sioux River” (position 1), “Deer Creek (position 2), and North Deer Creek (position 10). The hnudge
and vnudge
vectors contain the horizontal and vertical adjustments for all eleven watersheds. These values are in map units, which in this case are meters. Zero values are provided when the positions will not be adjusted. Then, the nudge_x = hnudge
and nudge_y = vnudge
arguments to geom_text()
are specified when generating the map text. Further refinements could be made, but the text here is at least a bit more readable. (Figure 10.2).
<- c(-10000, 5000, 0, 0, 0, 0, 0, 0, 0, 0, 0)
hnudge <- c(-7500, 2500, 0, 0, 0, 0, 0, 0, 0, 5000, 0)
vnudge
ggplot(data = wsheds_aea) +
geom_sf(color = "black",
size = 0.1) +
geom_text(aes(x = X,
y = Y,
label = name),
nudge_x = hnudge,
nudge_y = vnudge,
size = 3,
fontface = "bold") +
theme_void()
The label positions can also be modified using an alternative geom function. The geom_spatial_label_repel()
function from the ggspatial package automatically modifies the label placement to keep them from overlapping (Figure 10.3). The argument label = str_wrap(name, 15)
forces the labels to wrap so that the maximum width is 15 characters, which makes them more compact.
ggplot(data = wsheds_aea) +
geom_sf(color = "black",
size = 0.1) +
geom_spatial_label_repel(aes(x = X,
y = Y,
label = str_wrap(name, 15)),
size = 3,
fontface = "bold",
crs = st_crs(wsheds_aea)) +
theme_void()
The next map includes the watershed boundaries as well as the nested subwatershed boundaries displayed in dark gray (Figure 10.4). Because the data are from multiple data frames, the data
argument needs to be included in the geom_sf()
functions instead of the initial ggplot()
function. The annotation_scale()
function from the ggspatial package is also used to add a scale bar to better discern the sizes of the polygons.
ggplot() +
geom_sf(data = subwsh_aea, color = "darkgray") +
geom_sf(data = wsheds_aea, color = "black", fill = NA) +
annotation_scale(location = 'bl') +
theme_void()
The following analyses will focus on the North Deer Creek watershed. To extract the boundary for only this watershed, the filter()
function is used.
<- wsheds_aea %>%
wshed_ndc filter(name == "North Deer Creek")
A second filter()
function with a nested sf_covered_by()
function is then used to make a spatial query that selects all subwatershed polygons in the subwsh_aea
layer that fall within the North Deer Creek watershed.
<- subwsh_aea %>%
subwsh_ndc filter(lengths(st_covered_by(., wshed_ndc)) > 0)
10.2 Zonal Summaries of Discrete Raster Data
To summarize the crop type data for each subwatershed, the CDL data are first cropped and masked to the boundaries of the North Deer Creek watershed.
<- c(crop2010, crop2020)
cdlstack names(cdlstack) <- c("2010", "2020")
<- crop(cdlstack, vect(subwsh_ndc))
cdl_crop <- rasterize(vect(subwsh_ndc), cdl_crop)
subwsh_msk <- mask(cdl_crop, subwsh_msk) cdl_mbs
The approach that was introduced in Chapter 7 is used to reclassify the CDL into a smaller number of classes. The most widespread crops in this region are corn (1) and soybeans (5). Grasslands and pastures (37) non-alfalfa hay (176) and herbaceous wetlands (195) are combined into a single grass class. The others = 4
argument in the classify()
function is used to combine all the other codes into a single class.
<- c(1, 5, 37, 176, 195)
oldclas <- c(1, 2, 3, 3, 3)
newclas <- data.frame(oldclas, newclas)
lookup <- classify(cdl_mbs,
cdl_rc rcl = lookup,
others = 4)
The reclassified rasters are then plotted with the subwatershed boundaries overlaid (Figure 10.5).
<- c("Corn",
newnames "Soybeans",
"Grass",
"Other")
<- c("yellow",
newcols "green",
"tan",
"gray60")
<- desaturate(newcols,
newcols2 amount = 0.2)
<- rasterdf(cdl_rc)
cdl_df
ggplot(data = cdl_df) +
geom_raster(aes(x = x,
y = y,
fill =
as.character(value))) +
scale_fill_manual(name = "Crop Type",
values = newcols2,
labels = newnames,
na.translate = FALSE) +
geom_sf(data = subwsh_ndc,
fill = NA) +
facet_wrap(facets = vars(variable), ncol = 2) +
theme_void() +
theme(strip.text.x = element_text(size=12, face="bold"))
The zonal()
function, introduced in Chapter 9, is useful for generating statistical summaries of continuous data. However, a different approach is needed to summarize discrete data, where we want to estimate the areas of different classes within each polygon. We start by rasterizing the subwatershed polygons to match the raster grid of the land cover data. Then the crosstab()
function, which was used for change analysis in Chapter 7, is used to calculate the number of cells of each land cover type within each subwatershed polygon.
<- rasterize(vect(subwsh_ndc),
subwsh_r
cdl_rc, field = "ObjectID")
<- crosstab(c(subwsh_r, cdl_rc[["2010"]]))
cdlsum10 <- as_tibble(cdlsum10)
cdlsum10 <- crosstab(c(subwsh_r, cdl_rc[["2020"]]))
cdlsum20 <- as_tibble(cdlsum20) cdlsum20
Next, a series of dplyr()
and tidyr()
commands are used to convert the crosstabulation results into a long format for mapping and then compute the percent area of each crop type in each subwatershed. The process will be broken down into multiple steps.
The inner_join()
function is used to join the 2010 and 2020 crosstabulation results by the subwatershed code (stored in ObjectID
) and the crop type code (stored in X2010
and X2020
).
<- cdlsum10 %>%
cdljoin inner_join(cdlsum20,
by = c("ObjectID" = "ObjectID",
"X2010" = "X2020"))
cdljoin## # A tibble: 16 × 4
## ObjectID X2010 n.x n.y
## <chr> <chr> <int> <int>
## 1 38 1 42810 53551
## 2 39 1 22745 26070
## 3 40 1 35535 37192
## 4 41 1 6603 8636
## 5 38 2 48192 43019
## 6 39 2 22238 19146
## 7 40 2 40287 41062
## 8 41 2 8704 6928
## 9 38 3 32396 25490
## 10 39 3 13584 12382
## 11 40 3 19865 16614
## 12 41 3 17892 16703
## 13 38 4 21846 23184
## 14 39 4 7172 8141
## 15 40 4 13298 14117
## 16 41 4 7155 8087
The resulting table has the crop type codes stored in the X2010
column. The mutate()
function is used to convert these to a new croptype
column that is a factor with labels for each of the crop type names. The old X2010
column is discarded. New names are assigned to the n.x
and n.y
columns, which contain the counts of the 2010 and 2020 crop types for each subwatershed.
<- cdljoin %>%
cdljoin2 mutate(croptype = factor(X2010,
labels = newnames),
ObjectID = as.numeric(ObjectID)) %>%
select(-X2010) %>%
rename("cnt2010" = "n.x",
"cnt2020" = "n.y")
cdljoin2## # A tibble: 16 × 4
## ObjectID cnt2010 cnt2020 croptype
## <dbl> <int> <int> <fct>
## 1 38 42810 53551 Corn
## 2 39 22745 26070 Corn
## 3 40 35535 37192 Corn
## 4 41 6603 8636 Corn
## 5 38 48192 43019 Soybeans
## 6 39 22238 19146 Soybeans
## 7 40 40287 41062 Soybeans
## 8 41 8704 6928 Soybeans
## 9 38 32396 25490 Grass
## 10 39 13584 12382 Grass
## 11 40 19865 16614 Grass
## 12 41 17892 16703 Grass
## 13 38 21846 23184 Other
## 14 39 7172 8141 Other
## 15 40 13298 14117 Other
## 16 41 7155 8087 Other
The pivot_longer()
function is used to combine the 2010 and 2020 cell counts into a single cnt
column with year information stored in the year
column. The parse_number()
function is then used to convert year
from a character to a number.
<- cdljoin2 %>%
cdlpivot pivot_longer(cols = starts_with("cnt"),
names_to = "year",
values_to = "cnt") %>%
mutate(year = parse_number(year))
cdlpivot## # A tibble: 32 × 4
## ObjectID croptype year cnt
## <dbl> <fct> <dbl> <int>
## 1 38 Corn 2010 42810
## 2 38 Corn 2020 53551
## 3 39 Corn 2010 22745
## 4 39 Corn 2020 26070
## 5 40 Corn 2010 35535
## 6 40 Corn 2020 37192
## 7 41 Corn 2010 6603
## 8 41 Corn 2020 8636
## 9 38 Soybeans 2010 48192
## 10 38 Soybeans 2020 43019
## # … with 22 more rows
The total number of pixels for each combination of subwatershed and year is summed, and the percent area of each crop type within each subwatershed is calculated.
<- cdlpivot %>%
cdlperc group_by(ObjectID, year) %>%
mutate(tot = sum(cnt),
perc_crop = 100 * cnt / tot)
cdlperc## # A tibble: 32 × 6
## # Groups: ObjectID, year [8]
## ObjectID croptype year cnt tot perc_crop
## <dbl> <fct> <dbl> <int> <int> <dbl>
## 1 38 Corn 2010 42810 145244 29.5
## 2 38 Corn 2020 53551 145244 36.9
## 3 39 Corn 2010 22745 65739 34.6
## 4 39 Corn 2020 26070 65739 39.7
## 5 40 Corn 2010 35535 108985 32.6
## 6 40 Corn 2020 37192 108985 34.1
## 7 41 Corn 2010 6603 40354 16.4
## 8 41 Corn 2020 8636 40354 21.4
## 9 38 Soybeans 2010 48192 145244 33.2
## 10 38 Soybeans 2020 43019 145244 29.6
## # … with 22 more rows
Note that the same code can also be implemented more concisely as a single block of piped functions.
<- cdlsum10 %>%
cdlperc inner_join(cdlsum20,
by = c("ObjectID" = "ObjectID",
"X2010" = "X2020")) %>%
mutate(croptype = factor(X2010,
labels = newnames),
ObjectID = as.numeric(ObjectID)) %>%
rename("cnt2010" = "n.x",
"cnt2020" = "n.y") %>%
pivot_longer(cols = starts_with("cnt"),
names_to = "year",
values_to = "cnt") %>%
mutate(year = parse_number(year)) %>%
group_by(ObjectID, year) %>%
mutate(tot = sum(cnt),
perc_crop = 100 * cnt / tot)
The resulting table can be joined to the data frame containing the subwatershed polygons and used to map the percent of each crop type by subwatershed in 2010 and 2020 (Figure 10.6). The facet_wrap()
function with the formula year ~ croptype
is used to create a layout with years as rows and crop types as columns.
<- left_join(subwsh_ndc,
subwsh_perc
cdlperc, by = "ObjectID")
ggplot(data = subwsh_perc) +
geom_sf(aes(fill = perc_crop)) +
scale_fill_distiller(name="Percent",
palette = "YlGn",
breaks = pretty_breaks(),
direction = 1) +
facet_wrap(year ~ croptype,
ncol = 4) +
theme_void() +
theme(strip.text.x = element_text(size=12, face="bold"))
10.3 Summarizing Land Cover With Stream Buffers
The stream lines from the NHD are read in as three-dimensional data with Z coordinates, and they need to be converted to two-dimensional data before we can work with them. This step is accomplished using the st_zm()
function. The st_intersection()
function is then used to clip the streams to the North Deer Creek watershed boundary, and the result is plotted (Figure 10.7).
<- st_zm(streams_aea)
streams_aea <- st_intersection(streams_aea, wshed_ndc)
streams_ndc
ggplot() +
geom_sf(data = subwsh_ndc,
color = "black",
size = 1) +
geom_sf(data = streams_ndc,
color = "blue") +
theme_void()
Stream buffers are computed with the st_buffer()
function. The dist
argument provides the buffer width in map units, which in this case are meters. The output contains a separate buffer polygon for each line segment in the stream network (Figure 10.8).
<- st_buffer(streams_ndc, dist = 100)
streams_buff
ggplot() +
geom_sf(data = wshed_ndc,
color = "black",
size = 1) +
geom_sf(data = streams_buff,
color = "blue") +
theme_void()
The st_union()
function is used to combine the separate buffer polygons into a single polygon. Note that st_union()
creates a simple feature geometry list column (sfc
) that needs to be converted into an an sf
object before undertaking further analysis.
<- st_union(streams_buff)
streams_comb class(streams_comb)
## [1] "sfc_MULTIPOLYGON" "sfc"
<- st_sf(streams_comb)
streams_comb class(streams_comb)
## [1] "sf" "data.frame"
Where streams are close to the watershed boundary, their buffers may extend outside of the watershed. The st_intersection()
function combines the stream buffers with the North Deer Creek watershed polygon to create a polygon encompassing all stream buffers inside the watershed. The st_difference()
function similarly creates a polygon encompassing the parts of the North Deer Creek watershed that are outside the stream buffers.
<- st_intersection(streams_comb, wshed_ndc)
riparian <- st_difference(wshed_ndc, streams_comb) upland
The stream buffers can be mapped as polygon boundaries overlaid on top of the reclassified land cover dataset (Figure 10.9).
ggplot(data = cdl_df) +
geom_raster(aes(x = x,
y = y,
fill = as.character(value))) +
scale_fill_manual(name = "Land cover",
values = newcols2,
labels = newnames,
na.translate = FALSE) +
facet_wrap(facets = vars(variable), ncol = 2) +
geom_sf(data = streams_comb,
color = "darkblue",
fill = NA) +
coord_sf(expand = F) +
theme_void() +
theme(strip.text.x = element_text(size=12, face="bold"))
Two new raster datasets are created by masking the CDL dataset with the riparian and upland vector datasets.
<- rasterize(vect(riparian), cdl_rc)
riparian_mask <- mask(cdl_rc, riparian_mask)
cdl_rip <- rasterize(vect(upland), cdl_rc)
upland_mask <- mask(cdl_rc, upland_mask) cdl_up
The freq()
function is used to compute the pixel counts in the resulting riparian and upland raster datasets.
<- as_tibble(freq(cdl_rip))
freq_rip
freq_rip## # A tibble: 8 × 3
## layer value count
## <dbl> <dbl> <dbl>
## 1 1 1 26599
## 2 1 2 26396
## 3 1 3 45230
## 4 1 4 12117
## 5 2 1 29395
## 6 2 2 24737
## 7 2 3 40868
## 8 2 4 15342
<- as_tibble(freq(cdl_up))
freq_up
freq_up## # A tibble: 8 × 3
## layer value count
## <dbl> <dbl> <dbl>
## 1 1 1 81094
## 2 1 2 93025
## 3 1 3 38507
## 4 1 4 37354
## 5 2 1 96054
## 6 2 2 85418
## 7 2 3 30321
## 8 2 4 38187
The outputs are combined using bind_rows()
into a single data frame that contains pixel counts for each land cover class for the buffer and upland zones. This function will only work if the two data frames have the exact same column specifications, and it “stacks” them on top of one another. The landtype
column is created to identify the riparian and upland records, and the year
and croptype
columns are derived from the layer
and value
columns output by the freq()
function.
The pixel counts are used to calculate the percent cover for each subwatershed. First, the data are grouped by landtype
(stream buffer versus upland), and then the mutate()
function is run on the grouped data. Note that the results are different from the summarize()
function. The mutate()
function replicates the results of a summary (in this case, the sum()
function) for each record within a group rather than returning a single value for each group. Having the grouped total of pixel counts on each line makes it straightforward to calculate the percent cover of each class.
<- freq_rip %>%
cdl_chng bind_rows(freq_up) %>%
mutate(landtype = c(rep("Riparian", 8), rep("Upland", 8)),
year = factor(layer,
labels = c("2010", "2020")),
croptype = factor(value,
labels = newnames)) %>%
select(-layer, -value) %>%
group_by(landtype) %>%
mutate(totarea = sum(count),
percarea = 100 * count / totarea)
cdl_chng## # A tibble: 16 × 6
## # Groups: landtype [2]
## count landtype year croptype totarea percarea
## <dbl> <chr> <fct> <fct> <dbl> <dbl>
## 1 26599 Riparian 2010 Corn 220684 12.1
## 2 26396 Riparian 2010 Soybeans 220684 12.0
## 3 45230 Riparian 2010 Grass 220684 20.5
## 4 12117 Riparian 2010 Other 220684 5.49
## 5 29395 Riparian 2020 Corn 220684 13.3
## 6 24737 Riparian 2020 Soybeans 220684 11.2
## 7 40868 Riparian 2020 Grass 220684 18.5
## 8 15342 Riparian 2020 Other 220684 6.95
## 9 81094 Upland 2010 Corn 499960 16.2
## 10 93025 Upland 2010 Soybeans 499960 18.6
## 11 38507 Upland 2010 Grass 499960 7.70
## 12 37354 Upland 2010 Other 499960 7.47
## 13 96054 Upland 2020 Corn 499960 19.2
## 14 85418 Upland 2020 Soybeans 499960 17.1
## 15 30321 Upland 2020 Grass 499960 6.06
## 16 38187 Upland 2020 Other 499960 7.64
These data can be graphed as a barchart to compare the distribution of crop types by year and by riparian or upland landscape zone (Figure 10.10). Riparian areas have more grass cover relative to corn and soybeans than upland areas. However, the area of grass cover has decreased in both riparian and upland areas between 2010 and 2020.
ggplot(data = cdl_chng) +
geom_bar(aes(y = percarea,
x = year,
fill = croptype),
color = "black",
position = "dodge",
stat = "identity") +
scale_fill_manual(name = "Land Cover",
values = newcols) +
facet_wrap(facets = vars(landtype),
ncol = 2) +
labs(x = "Landscape Zone",
y = "% Area") +
theme_bw()
10.4 Summarizing Land Cover With Point Buffers
For this example, a set of twenty random points is generated within the North Deer Creek watershed. These could represent vegetation plots, wildlife sampling points, or the locations of other field-based measurements. Like the st_buffer()
function, st_sample()
generates an sfc
object that must be converted to an sf
object.
set.seed(32145)
<- st_sample(wshed_ndc, size = 20)
plots class(plots)
## [1] "sfc_POINT" "sfc"
<- st_as_sf(plots)
plots class(plots)
## [1] "sf" "data.frame"
The st_sample()
function uses a random number generator to produce x and y coordinates for the plot locations. The set.seed()
function provides a starting seed for the random number generator so that it will produce the same results every time the code is run. If no random number seed is provided, it is derived from the current time in the computer’s clock, and the results will be different every time the code is run.
The locations of these plots are plotted on top of the crop type data (Figure 10.11).
ggplot(data = cdl_df) +
geom_raster(aes(x = x, y = y, fill = as.character(value))) +
scale_fill_manual(name = "Land cover",
values = newcols2,
labels = newnames,
na.translate = FALSE) +
geom_sf(data = plots, fill = NA) +
facet_wrap(facets = vars(variable)) +
coord_sf(expand = F) +
theme_void() +
theme(strip.text.x = element_text(size=12, face="bold"))
To summarize the percent cover of each land cover class within a buffer around each point, buffer polygons are created with a 200 m radius around each point.
st_geometry_type(plots, by_geometry = FALSE)
## [1] POINT
## 18 Levels: GEOMETRY POINT LINESTRING POLYGON ... TRIANGLE
<- st_buffer(plots, dist = 200)
plot_poly st_geometry_type(plot_poly, by_geometry = FALSE)
## [1] POLYGON
## 18 Levels: GEOMETRY POINT LINESTRING POLYGON ... TRIANGLE
To summarize the percent cover of each land cover class within a buffer around each point, the segregate()
function is used to convert the classified raster data into a stack of indicator (1, 0) rasters. The proportion of each crop type within each buffer is computed using the extract()
function with the fun = mean
argument.
<- segregate(cdl_rc[["2020"]])
cdl_stk names(cdl_stk) <- newnames
<- extract(cdl_stk,
plots_cdl vect(plot_poly ),
fun = mean,
na.rm = T)
The resulting data frame has one row for each plot and one column for each crop type.
plots_cdl## ID Corn Soybeans Grass Other
## 1 1 1.000000000 0.00000000 0.00000000 0.000000000
## 2 2 0.000000000 1.00000000 0.00000000 0.000000000
## 3 3 0.582733813 0.00000000 0.10791367 0.309352518
## 4 4 0.425531915 0.04255319 0.31914894 0.212765957
## 5 5 0.028571429 0.52142857 0.43571429 0.014285714
## 6 6 0.042857143 0.81428571 0.04285714 0.100000000
## 7 7 0.944055944 0.00000000 0.02797203 0.027972028
## 8 8 0.021276596 0.68085106 0.18439716 0.113475177
## 9 9 0.728571429 0.23571429 0.00000000 0.035714286
## 10 10 0.000000000 0.00000000 0.44927536 0.550724638
## 11 11 0.228571429 0.01428571 0.75000000 0.007142857
## 12 12 0.007142857 0.00000000 0.00000000 0.992857143
## 13 13 0.223076923 0.13076923 0.50000000 0.146153846
## 14 14 0.453237410 0.29496403 0.03597122 0.215827338
## 15 15 0.214285714 0.30000000 0.47142857 0.014285714
## 16 16 0.014285714 0.92857143 0.00000000 0.057142857
## 17 17 0.471014493 0.03623188 0.31159420 0.181159420
## 18 18 0.000000000 1.00000000 0.00000000 0.000000000
## 19 19 0.000000000 0.90647482 0.00000000 0.093525180
## 20 20 0.421428571 0.08571429 0.28571429 0.207142857
It is converted to a long format with all the crop type proportions in a single column and crop type names in a separate column.
<- plots_cdl %>%
plots_long pivot_longer(Corn:Other,
names_to = "croptype",
values_to = "percarea")
plots_long## # A tibble: 80 × 3
## ID croptype percarea
## <dbl> <chr> <dbl>
## 1 1 Corn 1
## 2 1 Soybeans 0
## 3 1 Grass 0
## 4 1 Other 0
## 5 2 Corn 0
## 6 2 Soybeans 1
## 7 2 Grass 0
## 8 2 Other 0
## 9 3 Corn 0.583
## 10 3 Soybeans 0
## # … with 70 more rows
The distribution of crop type proportions within the plot buffer can be visualized as a boxplot (Figure 10.12).
ggplot(data = plots_long) +
geom_boxplot(aes(x = croptype,
y = percarea)) +
xlab("Land Cover Class") +
ylab("Cover Proportion") +
theme_bw()
10.5 Practice
Repeat the stream buffer analysis for the Hidewood Creek watershed using a 200 m stream buffer width.
Generate 10 random plot locations in the North Deer Creek watershed using a different random number seed (12345). Reclassify the CDL data into three classes: 1) Corn plus soybean, 2) grass, and 3) all other crop types and generate a boxplot of their distributions within 300 m buffers around the plots.