5 Vector Geospatial Data

Chapter section list

Import geospatial data: Section 5.1
Creating simple maps: Section 5.2
Overlaying vector datasets: Section 5.3
Save spatial geodata files: Section 5.4
Choropleth maps: Section 5.5
Modifying map appearance: Section 5.6
Exporting graphics output: Section 5.7
Resources: Section 5.8
Practice

5.1 Import Geospatial Data

5.1.1 ESRI shapefile format

The data for import in chapter 5 are provided in ESRI shapefile format. This format was developed several decades ago but remains one of the widely used file formats for vector geospatial data. It is a multiple file format, where separate files contain the feature geometries, attribute table, spatial indices, and coordinate reference system.

R Code 5.1 : Import Geospatial Data

Code

glue::glue("############### import esri data #############")
okcounty <- sf::st_read("data/Chapter5/ok_counties.shp", quiet = TRUE)
tpoint <- sf::st_read("data/Chapter5/ok_tornado_point.shp", quiet = TRUE)
tpath <- sf::st_read("data/Chapter5/ok_tornado_path.shp", quiet = TRUE)

glue::glue("")
glue::glue("############### show data class #############")
class(okcounty)

glue::glue("")
glue::glue("############### show data with dplyr #############")
dplyr::glimpse(okcounty)

#> ############### import esri data #############
#> 
#> ############### show data class #############
#> [1] "sf"         "data.frame"
#> 
#> ############### show data with dplyr #############
#> Rows: 77
#> Columns: 8
#> $ STATEFP  <chr> "40", "40", "40", "40", "40", "40", "40", "40", "40", "40", "…
#> $ COUNTYFP <chr> "077", "025", "011", "107", "105", "153", "001", "053", "059"…
#> $ COUNTYNS <chr> "01101826", "01101800", "01101793", "01101841", "01101840", "…
#> $ AFFGEOID <chr> "0500000US40077", "0500000US40025", "0500000US40011", "050000…
#> $ GEOID    <chr> "40077", "40025", "40011", "40107", "40105", "40153", "40001"…
#> $ NAME     <chr> "Latimer", "Cimarron", "Blaine", "Okfuskee", "Nowata", "Woodw…
#> $ LSAD     <chr> "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "…
#> $ geometry <POLYGON [°]> POLYGON ((-95.50766 35.0292..., POLYGON ((-103.0025 3…

The {sf} objects contain a column called geometry. This is a special column that contains the geospatial information about the location of each feature. This column should not be modified directly. It is used by the functions in the {sf} package for geospatial data processing.

Note 5.1: Using {skimr} with {sf}

Normally I am using the skimr::skim() function for data summary. But for the {sf} data classes in the geometry column are no skimmers available. (Possible data types are: sfc_POINT, sfc_LINESTRING, sfc_POLYGON, sfc_MULTIPOINT, sfc_MULTILINESTRING, sfc_MULTIPOLYGON, and sfc_GEOMETRY.) In the above case the class(okcounty$geometry) = “sfc_POLYGON, sfc” and not user-defined for {skimr} The fall back to the “character” class is not useful. (sfc stands for “simple feature list column”.)

It is possible to adapt {skimr} for working with user defined data types using skimr::skim_with(). Resources that explain how to do this are:

Defining sfl’s for a package: General article that explains how to generate and use with user defined data types. sflstands for “skimr function list”. It is a list-like data structure used to define custom summary statistics for specific data types.
skim of {sf} objects: Discussion specific to the {sf} package.

At the moment I do not understand enough about the {sf} package to get into more details for writing an appropriate function. I wonder if there is not already a solution available as spatial data processing with R and the {sf} package is not an extremely rare use case.

In the R package {sf} (Simple Features), many functions are prefixed with st_. The st_ prefix is inspired by PostGIS, which refers with the abbreviation to “spatial type”. This prefix is used consistently throughout {sf} to indicate that a function operates on spatial data. In the context of {sf}, st_ serves as a namespace for spatial functions, allowing developers and users to easily identify and find functions related to spatial operations. This prefixing convention makes it simple to discover and use spatial functions.

Looking at the file names I noticed: All files have the same filename with different extensions. There are always four files with the extensions .dbf, .prj, .shp, .shx.

The shapefiles are imported to {sf} objects using the sf::st_read() function. The quiet = TRUE argument suppresses output to the console when importing spatial datasets. It

An example for the output when using quit = FALSE (the default option) is:

Reading layer ok_counties' from data source/Users/petzi/Documents/Meine-Repos/GDSWR/data/Chapter5/ok_counties.shp’ using driver `ESRI Shapefile’
Simple feature collection with 77 features and 7 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: -103.0025 ymin: 33.62184 xmax: -94.43151 ymax: 37.00163
Geodetic CRS: NAD83

To read in a shapefile, it is only necessary to specify the filename with a .shp extension. However, all the files, including the .shp file as well as the .dbf, .shx, and .prj files, need to be present in the directory from which the data are read.

The ok_counties.shp dataset contains county boundaries for the state of Oklahoma.
The ok_tornado_point.shp dataset and the ok_tornado_path.shp dataset contain historical information about tornadoes in Oklahoma.
- The points are the initial locations of tornado touchdown.
- The paths are lines that identify the path of each tornado after touchdown.
These data were derived from larger, national-level datasets generated by the National Oceanographic and Atmospheric Administration (NOAA) National Weather Service Storm Prediction Center.

5.1.2 Conversion data sf <-> sp

R Code 5.2 : {sf} data to {sp} data and reverse

Code

glue::glue("############### convert from sf to sp data #############")
okcounty_sp <- sf::as_Spatial(okcounty) # sf::as(okcounty, 'Spatial') does not work!
class(okcounty_sp)

glue::glue("")
glue::glue("############### convert from sp to sf data #############")
okcounty_sf <- sf::st_as_sf(okcounty_sp)
class(okcounty_sf)

#> ############### convert from sf to sp data #############
#> [1] "SpatialPolygonsDataFrame"
#> attr(,"package")
#> [1] "sp"
#> 
#> ############### convert from sp to sf data #############
#> [1] "sf"         "data.frame"

5.2 Creating simple maps

A good strategy to get an overview about the data is to plot the data as map. There are two options: Using ggplot2::geom_sf() or base::plot().

5.2.1 Draw Oklahoma county boundaries

To generate a map of counties using ggplot2::ggplot() with a {sf} object, the ggplot2::geom_sf() function is used.

From the view of the {ggplot2} package the ggplot2::geom_sf() is an unusual geom because it will draw different geometric objects depending on what simple features are present in the data: you can get points, lines, or polygons. For text and labels, you can use ggplot2::geom_sf_text() and ggplot2::geom_sf_label().

Code Collection 5.1 : Plotting Oklahoma county boundaries

ggplot2
base::plot()

R Code 5.3 : Oklahoma county boundaries with {ggplot2}

Code

ggplot2::ggplot(data = okcounty) +
  ggplot2::geom_sf(fill = NA) +
  ggplot2::theme_void()

Graph 5.1: Oklahoma county boundaries plotted with {ggplot2}

fill = NA makes the counties transparent.

(To get the same result as in the base R approach I used ggplot2::theme_void() to hide the coordinates which is shown in the original book example.)

R Code 5.4 : Oklahoma county boundaries with base::plot()

Code

graphics::par(mar = c(0, 0, 0, 0))
base::plot(okcounty$geometry)

Graph 5.2: Oklahoma county boundaries plotted with base::plot()

From R Graph Gallery I learend that I could also use bese R to plot spatial geodata. But everybody agrees that using {ggplot2} is the preferred approach.

Note 5.2: Too much space around cholorpleth map

As you can see from both graphics there is ample space aorund the map. I do not know how to remove it. Therefore I wrote a question on StackOverflow. I used a simple example provide by the {sf} package.

5.2.2 Inspect `tpoint` and `tpath`

Because {sf} objects are a type of data frame, they can be modified using the normal {tidyverse} functions. Let’s look at the two other R objects we’ve generated in R Code 5.1.

Code Collection 5.2 : Display structure of the tornado files

tpoint
tpath

R Code 5.5 : Glimpse at tpoint

Code

dplyr::glimpse(tpoint)

#> Rows: 4,092
#> Columns: 23
#> $ om       <dbl> 192, 27, 38, 57, 60, 61, 50, 52, 96, 108, 113, 117, 119, 76, …
#> $ yr       <dbl> 1950, 1950, 1950, 1950, 1950, 1950, 1950, 1950, 1950, 1950, 1…
#> $ mo       <dbl> 10, 2, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, …
#> $ dy       <dbl> 1, 27, 27, 28, 28, 28, 2, 3, 11, 16, 22, 24, 29, 4, 4, 4, 7, …
#> $ date     <chr> "1950-10-01", "1950-02-27", "1950-03-27", "1950-04-28", "1950…
#> $ time     <chr> "21:00:00", "10:20:00", "03:00:00", "14:17:00", "19:05:00", "…
#> $ tz       <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3…
#> $ st       <chr> "OK", "OK", "OK", "OK", "OK", "OK", "OK", "OK", "OK", "OK", "…
#> $ stf      <dbl> 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 4…
#> $ stn      <dbl> 23, 1, 2, 5, 6, 7, 3, 4, 15, 16, 17, 18, 19, 8, 9, 10, 11, 12…
#> $ mag      <dbl> 1, 2, 2, 3, 4, 2, 2, 1, 1, 1, 1, 2, 1, 2, 1, 2, 1, 2, 1, 1, 1…
#> $ inj      <dbl> 0, 0, 0, 1, 32, 0, 0, 0, 0, 1, 0, 2, 0, 0, 0, 0, 0, 3, 0, 0, …
#> $ fat      <dbl> 0, 0, 0, 1, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ loss     <dbl> 4, 4, 3, 5, 5, 4, 4, 3, 2, 3, 0, 4, 2, 4, 3, 5, 0, 4, 3, 4, 3…
#> $ closs    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ slat     <dbl> 36.73, 35.55, 34.85, 34.88, 35.08, 34.55, 35.82, 36.13, 36.82…
#> $ slon     <dbl> -102.52, -97.60, -95.75, -99.28, -96.40, -96.20, -97.02, -95.…
#> $ elat     <dbl> 36.8800, 35.5501, 34.8501, 35.1700, 35.1300, 34.5501, 35.8201…
#> $ elon     <dbl> -102.3000, -97.5999, -95.7499, -99.2000, -96.3500, -96.1999, …
#> $ len      <dbl> 15.8, 2.0, 0.1, 20.8, 4.5, 0.8, 1.0, 1.0, 0.5, 7.3, 1.5, 1.0,…
#> $ wid      <dbl> 10, 50, 77, 400, 200, 100, 100, 33, 77, 100, 100, 33, 33, 293…
#> $ fc       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ geometry <POINT [°]> POINT (-102.52 36.73), POINT (-97.6 35.55), POINT (-95.…

The points are the initial locations of tornado touchdowns.

R Code 5.6 : Glimpse at tpath

Code

dplyr::glimpse(tpath)

#> Rows: 4,092
#> Columns: 23
#> $ om       <dbl> 192, 27, 38, 57, 60, 61, 50, 52, 96, 108, 113, 117, 119, 76, …
#> $ yr       <dbl> 1950, 1950, 1950, 1950, 1950, 1950, 1950, 1950, 1950, 1950, 1…
#> $ mo       <dbl> 10, 2, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, …
#> $ dy       <dbl> 1, 27, 27, 28, 28, 28, 2, 3, 11, 16, 22, 24, 29, 4, 4, 4, 7, …
#> $ date     <chr> "1950-10-01", "1950-02-27", "1950-03-27", "1950-04-28", "1950…
#> $ time     <chr> "21:00:00", "10:20:00", "03:00:00", "14:17:00", "19:05:00", "…
#> $ tz       <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3…
#> $ st       <chr> "OK", "OK", "OK", "OK", "OK", "OK", "OK", "OK", "OK", "OK", "…
#> $ stf      <dbl> 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 4…
#> $ stn      <dbl> 23, 1, 2, 5, 6, 7, 3, 4, 15, 16, 17, 18, 19, 8, 9, 10, 11, 12…
#> $ mag      <dbl> 1, 2, 2, 3, 4, 2, 2, 1, 1, 1, 1, 2, 1, 2, 1, 2, 1, 2, 1, 1, 1…
#> $ inj      <dbl> 0, 0, 0, 1, 32, 0, 0, 0, 0, 1, 0, 2, 0, 0, 0, 0, 0, 3, 0, 0, …
#> $ fat      <dbl> 0, 0, 0, 1, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ loss     <dbl> 4, 4, 3, 5, 5, 4, 4, 3, 2, 3, 0, 4, 2, 4, 3, 5, 0, 4, 3, 4, 3…
#> $ closs    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ slat     <dbl> 36.73, 35.55, 34.85, 34.88, 35.08, 34.55, 35.82, 36.13, 36.82…
#> $ slon     <dbl> -102.52, -97.60, -95.75, -99.28, -96.40, -96.20, -97.02, -95.…
#> $ elat     <dbl> 36.8800, 35.5501, 34.8501, 35.1700, 35.1300, 34.5501, 35.8201…
#> $ elon     <dbl> -102.3000, -97.5999, -95.7499, -99.2000, -96.3500, -96.1999, …
#> $ len      <dbl> 15.8, 2.0, 0.1, 20.8, 4.5, 0.8, 1.0, 1.0, 0.5, 7.3, 1.5, 1.0,…
#> $ wid      <dbl> 10, 50, 77, 400, 200, 100, 100, 33, 77, 100, 100, 33, 33, 293…
#> $ fc       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
#> $ geometry <LINESTRING [°]> LINESTRING (-102.52 36.73, ..., LINESTRING (-97.6 …

The paths are lines that identify the path of each tornado after touchdown.

From dplyr::glimpse() we get an idea about the data structure. But we do not know the numeric span covered by the variable. This is especially important for our next task to focus on data from the last five years. We know from Code Collection 5.2 that the dataset starts with the year 1950 but we have no clue about the middle or end of the dataset.

For this reason I have designed a special functions that returns the first and last dataset and several random data. The default number of data shown is eight but this can be changed using a second parameter.

Code Collection 5.3 : Show some random tornado data, including first and last record

tpoint
tpath

R Code 5.7 : Show random tpoint data

Code

pb_glance_data(tpoint)

#> Simple feature collection with 10 features and 23 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -102.52 ymin: 34.23 xmax: -95.774 ymax: 36.8768
#> Geodetic CRS:  NAD83
#>     obs     om   yr mo dy       date     time tz st stf stn mag inj fat  loss
#> 1     1    192 1950 10  1 1950-10-01 21:00:00  3 OK  40  23   1   0   0 4e+00
#> 2   634    519 1960  8  9 1960-08-09 12:00:00  3 OK  40  96   0   0   0 1e+00
#> 3  1098    261 1969  6 11 1969-06-11 15:00:00  3 OK  40  18   0   0   0 0e+00
#> 4  1177    188 1971  4 26 1971-04-26 19:56:00  3 OK  40  11   0   0   0 0e+00
#> 5  1252    189 1973  4 19 1973-04-19 23:45:00  3 OK  40  11   2   2   0 6e+00
#> 6  2369    371 1996  3 24 1996-03-24 12:55:00  3 OK  40   1   0   0   0 0e+00
#> 7  2609   1209 1999  5 31 1999-05-31 19:24:00  3 OK  40 110   1   0   0 1e-02
#> 8  3218    300 2010  5 10 2010-05-10 18:28:00  3 OK  40  47   1   0   0 3e-03
#> 9  4069 619637 2021  1 30 2021-01-30 13:46:00  3 OK  40   0  -9   0   0 0e+00
#> 10 4092 620265 2021  7 10 2021-07-10 18:25:00  3 OK  40   0   1   0   0 5e+04
#>    closs    slat      slon    elat      elon  len wid fc
#> 1      0 36.7300 -102.5200 36.8800 -102.3000 15.8  10  0
#> 2      0 34.2500  -98.2300 34.2501  -98.2299  0.1  10  1
#> 3      0 36.4500  -98.0200 36.4501  -98.0199  0.1  10  0
#> 4      0 36.6000  -96.1000 36.6001  -96.0999  0.1  10  0
#> 5      0 34.7000  -97.3000 34.8700  -97.0800 16.8 250  0
#> 6      0 34.2300  -97.2700 34.2300  -97.2700  0.1  25  0
#> 7      0 34.7700  -99.4500 34.7000  -99.4000  2.5 100  0
#> 8      0 35.4239  -95.7757 35.4354  -95.7349  2.6 600  0
#> 9      0 36.8768  -95.7740 36.8794  -95.7679  0.4  75  0
#> 10     0 36.0880  -96.7200 36.0730  -96.7040  1.4 100  0
#>                    geometry
#> 1     POINT (-102.52 36.73)
#> 2      POINT (-98.23 34.25)
#> 3      POINT (-98.02 36.45)
#> 4        POINT (-96.1 36.6)
#> 5        POINT (-97.3 34.7)
#> 6      POINT (-97.27 34.23)
#> 7      POINT (-99.45 34.77)
#> 8  POINT (-95.7757 35.4239)
#> 9   POINT (-95.774 36.8768)
#> 10    POINT (-96.72 36.088)

R Code 5.8 : Show random tpath data

Code

pb_glance_data(tpath)

#> Simple feature collection with 10 features and 23 fields
#> Geometry type: LINESTRING
#> Dimension:     XY
#> Bounding box:  xmin: -102.52 ymin: 34.23 xmax: -95.7349 ymax: 36.88
#> Geodetic CRS:  NAD83
#>     obs     om   yr mo dy       date     time tz st stf stn mag inj fat  loss
#> 1     1    192 1950 10  1 1950-10-01 21:00:00  3 OK  40  23   1   0   0 4e+00
#> 2   634    519 1960  8  9 1960-08-09 12:00:00  3 OK  40  96   0   0   0 1e+00
#> 3  1098    261 1969  6 11 1969-06-11 15:00:00  3 OK  40  18   0   0   0 0e+00
#> 4  1177    188 1971  4 26 1971-04-26 19:56:00  3 OK  40  11   0   0   0 0e+00
#> 5  1252    189 1973  4 19 1973-04-19 23:45:00  3 OK  40  11   2   2   0 6e+00
#> 6  2369    371 1996  3 24 1996-03-24 12:55:00  3 OK  40   1   0   0   0 0e+00
#> 7  2609   1209 1999  5 31 1999-05-31 19:24:00  3 OK  40 110   1   0   0 1e-02
#> 8  3218    300 2010  5 10 2010-05-10 18:28:00  3 OK  40  47   1   0   0 3e-03
#> 9  4069 619637 2021  1 30 2021-01-30 13:46:00  3 OK  40   0  -9   0   0 0e+00
#> 10 4092 620265 2021  7 10 2021-07-10 18:25:00  3 OK  40   0   1   0   0 5e+04
#>    closs    slat      slon    elat      elon  len wid fc
#> 1      0 36.7300 -102.5200 36.8800 -102.3000 15.8  10  0
#> 2      0 34.2500  -98.2300 34.2501  -98.2299  0.1  10  1
#> 3      0 36.4500  -98.0200 36.4501  -98.0199  0.1  10  0
#> 4      0 36.6000  -96.1000 36.6001  -96.0999  0.1  10  0
#> 5      0 34.7000  -97.3000 34.8700  -97.0800 16.8 250  0
#> 6      0 34.2300  -97.2700 34.2300  -97.2700  0.1  25  0
#> 7      0 34.7700  -99.4500 34.7000  -99.4000  2.5 100  0
#> 8      0 35.4239  -95.7757 35.4354  -95.7349  2.6 600  0
#> 9      0 36.8768  -95.7740 36.8794  -95.7679  0.4  75  0
#> 10     0 36.0880  -96.7200 36.0730  -96.7040  1.4 100  0
#>                          geometry
#> 1  LINESTRING (-102.52 36.73, ...
#> 2  LINESTRING (-98.23 34.25, -...
#> 3  LINESTRING (-98.02 36.45, -...
#> 4  LINESTRING (-96.1 36.6, -96...
#> 5  LINESTRING (-97.3 34.7, -97...
#> 6  LINESTRING (-97.27 34.23, -...
#> 7  LINESTRING (-99.45 34.77, -...
#> 8  LINESTRING (-95.7757 35.423...
#> 9  LINESTRING (-95.774 36.8768...
#> 10 LINESTRING (-96.72 36.088, ...

5.2.3 Visualization of the Oklahoma tornado data (2016-2021)

Because {sf} objects are a type of data frame, they can be modified using the normal {tidyverse} functions.

A reduced dataset for the years 2016-2021 and only with the columns ID (om), the year (yr), and the date (date) and is prepared in the first tab reduce data.
Initiation points of tornadoes in Oklahoma from 2016–2021 is shown in tab initiation points.
Tab tornados path shows the paths of tornadoes in Oklahoma from 2016–2021.
Initiation points of tornadoes in Oklahoma from 2016–2021 with years represented by the color aesthetic is in tab color aesthetic.
In the final tab facets you will see the initiation points of tornadoes in Oklahoma from 2016–2021 with years mapped as separate facets.

Code Collection 5.4 : Show different visualization of the Oklahoma tornado data (2016-2021)

R Code 5.9 : Filter data from 2016 to 2021 and select only three columns (ID, year and date)

Code

tpoint_16_21 <- tpoint |> 
  dplyr::filter(yr >= 2016 & yr <= 2021) |> 
  dplyr::select(om, yr, date)

tpath_16_21 <- tpath |> 
  dplyr::filter(yr >= 2016 & yr <= 2021)  |> 
  dplyr::select(om, yr, date)

(For this R code chunk is no output available)

R Code 5.10 : Show initiation points of tornadoes in Oklahoma from 2016–2021

Code

ggplot2::ggplot() +
  ggplot2::geom_sf(data = okcounty, fill = NA) +
  ggplot2::geom_sf(data = tpoint_16_21)

Graph 5.3: Initiation points of tornadoes in Oklahoma from 2016–2021.

Because each function maps a different dataset, the data argument must be provided in each ggplot2::geom_sf() function instead of in the ggplot2::ggplot() function.
I am using as default theme the ggplot2::theme_bw() function (see setup chunk of this chapter) to display the map over a white background while retaining the graticules.

R Code 5.11 : Show tornadoes paths in Oklahoma from 2016–2021

Code

ggplot2::ggplot() +                              
  ggplot2::geom_sf(data = okcounty, fill = NA) + 
  ggplot2::geom_sf(data = tpath_16_21,           
          color = "red",                         
          size = 1)

Graph 5.4: Paths of tornadoes in Oklahoma from 2016-2021.

To make the tornado path lines easier to see in relation to the county boundaries, they are displayed in red and their sizes are increased to be larger (size = 1) than the default line width of 0.5.

R Code 5.12 : Initiation points of tornadoes in Oklahoma from 2016-2021 with years represented by the color aesthetic

Code

ggplot2::ggplot() +
  ggplot2::geom_sf(data = tpoint_16_21, 
          ggplot2::aes(color = forcats::as_factor(yr))) + # (1)
  ggplot2::geom_sf(data = okcounty, fill = NA) +
# ggplot2::scale_color_discrete(name = "Year") +          # (2)
  ggokabeito::scale_color_okabe_ito(name = "Year") +      # (2)
  ggplot2::coord_sf(datum = NA) +                         # (3)
  ggplot2::theme_void()                                   # (3)

Graph 5.5: Initiation points of tornadoes in Oklahoma from 2016-2021 with years represented by the color aesthetic.

To view the years of the tornadoes on the map, an aesthetic can be specified.

Line Comment 1: In the book the color argument is specified as base::as.factor(yr) so that the year is displayed as a discrete variable instead of a continuous variable. Instead of the base function I have used forcats::as_factor(yr).

Compared to base R, when x is a character, this function creates levels in the order in which they appear, which will be the same on every platform. (Base R sorts in the current locale which can vary from place to place.) (from the {forcats)} help file).

Line Comment 2: The ggplot2::scale_color_discrete() function is used to set the legend name. But the used (standard) color scale is not appropriate for people with color-vision deficiency (CVD). I have therefore used ggokabeito::scale_color_okabe_ito().

Line Comment 3: The book says that the ggplot2::theme_void() function removes the plot axes and labels and shows only the map. I suppose that this is not correct. ggplot2::coord_sf(datum = NA) removes the plot axes and labels; ggplot2::theme_void() removes the border frame around the graphic.

Code

ggplot2::ggplot() +
  ggplot2::geom_sf(data = okcounty, 
          fill = NA, 
          color = "gray") +
  ggplot2::geom_sf(data = tpoint_16_21, size = 0.75) +
  ggplot2::facet_wrap(ggplot2::vars(yr), ncol = 2) +
  ggplot2::coord_sf(datum = NA) +
  ggplot2::theme_void()

Alternately, ggplot2::facet_wrap() can be used to display the tornadoes for each year on a separate map. In comparison to the previous tab the sizes of the points are reduced slightly from the standard size = 1 to size = 0.75, so that they are better suited for the smaller maps.

Note 5.3

With the exception of the facet graphics there is too much horizontal space above and below the {sf} graphic. Is this a known problem? How to reduce the horizontal space for {sf} graphics plotted with {ggplot2}?

Solution: Remove empty space in maps

I found a solution after posting the question in StackOverflow: I need to set the figure size in the quarto chunk options so your figure has the right aspect ratio in the document. As far as I can see there are two options:

Reducing the heigt of the figure from its standard height of 5 inches. For instance to three inches with fig-heigt: 3 in the quarto chunk option. See an example in R Code 5.20 or R Code 5.22. (To see the chunk options together with the code I have used echo: fenced for these two chunks.)
Changing the aspect ratio from 1 to a smaller value, for instance to 3/4 with ggplot2::theme(aspect.ratio = 3/4). See an example in R Code B.13.

5.3 Overlaying Vector Datasets

5.3.1 A first spatial join

The number of tornado points in each county can be calculated using the sf::st_join() function to carry out a spatial join. A spatial join with {sf} is different from the joinwith {dplyr}: sf::st_join() links rows from the two tables based on the spatial locations instead of their attributes.

In this case the functions compares the point coordinates of the tpoint_16_21 dataset in its geometry column with the polygon coordinates from the geometry column of the okcounty dataset. It joins tpoint_16_21 with the geometry row that includes the appropriate polygon from okcounty containing the point coordinates.

R Code 5.14 : Overlaying vector datasets with a spatial join

Code

countypnt <- sf::st_join(tpoint_16_21, okcounty)  

dplyr::glimpse(countypnt)

#> Rows: 434
#> Columns: 11
#> $ om       <dbl> 613662, 613675, 613676, 613677, 613678, 613727, 613728, 61372…
#> $ yr       <dbl> 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2…
#> $ date     <chr> "2016-03-23", "2016-03-30", "2016-03-30", "2016-03-30", "2016…
#> $ STATEFP  <chr> "40", "40", "40", "40", "40", "40", "40", "40", "40", "40", "…
#> $ COUNTYFP <chr> "001", "113", "105", "131", "035", "139", "139", "139", "139"…
#> $ COUNTYNS <chr> "01101788", "01101844", "01101840", "01101853", "01101805", "…
#> $ AFFGEOID <chr> "0500000US40001", "0500000US40113", "0500000US40105", "050000…
#> $ GEOID    <chr> "40001", "40113", "40105", "40131", "40035", "40139", "40139"…
#> $ NAME     <chr> "Adair", "Osage", "Nowata", "Rogers", "Craig", "Texas", "Texa…
#> $ LSAD     <chr> "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "…
#> $ geometry <POINT [°]> POINT (-94.5042 35.6916), POINT (-96.0151 36.2151), POI…

5.3.2 Count tornados per county

Afterward, each row in countypnt data contains additional columns from the okcounty dataset that correspond to the county that the tornado with it point coordinates is within. The dataset contains one record for each tornado.

To compute the total number of tornadoes per county, countypnt must be grouped by the GEOID county code or by the county NAME (here by GEOID county code).

But before grouping and summarizing, countypnt must be converted from an {sf} object to a normal data frame using sf::st_drop_geometry().

R Code 5.15 : Count tornados per county

Code

glue::glue("#### show class before `sf::st_drop_geometry()` #####")
base::class(countypnt)
countypnt <- sf::st_drop_geometry(countypnt)
glue::glue("")
glue::glue("##### show class after `sf::st_drop_geometry()` ######")
base::class(countypnt)


countysum <- countypnt |> 
  dplyr::group_by(GEOID) |> 
  dplyr::summarize(tcnt = dplyr::n())  

glue::glue("")
glue::glue("##### glimpse at the new summarized data frame` ######")
dplyr::glimpse(countysum)

#> #### show class before `sf::st_drop_geometry()` #####
#> [1] "sf"         "data.frame"
#> 
#> ##### show class after `sf::st_drop_geometry()` ######
#> [1] "data.frame"
#> 
#> ##### glimpse at the new summarized data frame` ######
#> Rows: 75
#> Columns: 2
#> $ GEOID <chr> "40001", "40005", "40007", "40009", "40011", "40013", "40015", "…
#> $ tcnt  <int> 6, 3, 4, 8, 1, 4, 10, 5, 7, 5, 3, 12, 10, 5, 5, 1, 7, 9, 7, 8, 2…

5.3.3 Associate polygons with tornado counts

In the next step we join okcounty to countysum so that each polygon is associated with the appropriate tornado summary.

R Code 5.16 : Associate each polygon with the appropriate tornado summary

Code

countymap <- okcounty   |>
  dplyr::left_join(countysum, by = "GEOID")  |>        # (1)
  dplyr::mutate(tcnt = 
        base::ifelse(base::is.na(tcnt), 0, tcnt)) |>   # (2)
  dplyr::mutate(area = sf::st_area(okcounty),
         tdens = 10^3 * 10^3 * tcnt / area)   |>       # (3)
  units::drop_units()                                  # (4)


dplyr::glimpse(countymap)

#> Rows: 77
#> Columns: 11
#> $ STATEFP  <chr> "40", "40", "40", "40", "40", "40", "40", "40", "40", "40", "…
#> $ COUNTYFP <chr> "077", "025", "011", "107", "105", "153", "001", "053", "059"…
#> $ COUNTYNS <chr> "01101826", "01101800", "01101793", "01101841", "01101840", "…
#> $ AFFGEOID <chr> "0500000US40077", "0500000US40025", "0500000US40011", "050000…
#> $ GEOID    <chr> "40077", "40025", "40011", "40107", "40105", "40153", "40001"…
#> $ NAME     <chr> "Latimer", "Cimarron", "Blaine", "Okfuskee", "Nowata", "Woodw…
#> $ LSAD     <chr> "06", "06", "06", "06", "06", "06", "06", "06", "06", "06", "…
#> $ tcnt     <dbl> 1, 12, 1, 10, 6, 2, 6, 0, 4, 9, 3, 10, 12, 1, 2, 8, 13, 4, 5,…
#> $ geometry <POLYGON [°]> POLYGON ((-95.50766 35.0292..., POLYGON ((-103.0025 3…
#> $ area     <dbl> 1890663261, 4766283042, 2427121029, 1657249513, 1503893122, 3…
#> $ tdens    <dbl> 0.0005289149, 0.0025176851, 0.0004120108, 0.0060340944, 0.003…

Line comment 1: Using dplyr::left_join() instead of dplyr::inner_join() ensures that all of the county polygons are retained in the output of the join. (dplyr::inner_join() only keeps observations from x that have a matching key in y, whereas dplyr::left_join() keeps all observations in x.)

Line comment 2: If there are between 2016-2021 several tornados per county than we get several rows. But the reverse is also true: If a county has had no tornado in the years 2016-2021 this county gets NA values as the number of tornados.

As a matter of fact a few counties had no tornadoes during 2016–2021 and are therefore missing from countysum, resulting in NA values in the joined table. In this case, we know that NA means zero tornadoes, so the we must replace NA values by zeroes. I did this with the dplyr::mutate() function instead of base::replace(). Besides this approach does not need the . symbol of the {magrittr} packages (exporting into dplyr) for refering to the database (respectively its equivalent _ for the R pipe). See for details Note 3.1.

Line comment 3: The second dplyr::mutate() function computes the area of each county using sf::st_area() and then calculates the density of tornadoes per county. Density is initially in tornadoes per square meter but is converted to tornadoes per 1000 km^2.

Line comment 4: The sf::st_area() function returns a column with explicit measurement units, but these are removed using the units::drop_units() function for simplicity. For more information see the vignettes and help pages of the {units} package.

5.4 Save spatial geodata files

5.4.1 ESRI format

The sf::st_write() function can be used to save sf objects to a variety of file formats. In many cases, the function can determine the output format from the output filename extension. The following code saves the county-level tornado summaries in ESRI shapefile format. The append = FALSE option overwrites the shapefile if it already exists.

R Code 5.17 : Save spatial data files into ESRI format

Code

sf::st_write(countymap, 
         dsn = "data/Chapter5/oktornadosum.shp", 
         append = FALSE)

After a message what the script does

Writing layer oktornadosum' to data sourcedata/Chapter5/oktornadosum.shp’ using driver `ESRI Shapefile’ Writing 77 features with 10 fields and geometry type Polygon.

I got for every feature (= 77 rows) a warning message emitted by the GDAL library:

Warning: GDAL Message 1: Value 1890663260.74707699 of field area of feature 0 not successfully written. Possibly due to too larger number with respect to field width

It turned out that this is a misleading warning and that one should not use the old ESRI format but the newer and better Open Geospatial Consortium (OGC) GeoPackage format. See also StackOverflow and the answer from the {sf} developer:

The general recommendation is to not use shapefiles: the format is not an open standard, it has many limitations and modern formats are available. A good alternative is GeoPackage.

5.4.2 GeoPackage format

GeoPackage is also mentioned as an alternative in the book. The data are stored in an SQLite database that may contain one or more layers. In this example, the delete_dsn = TRUE argument overwrites the entire GeoPackage. Leaving this argument at its default value of FALSE would add a new layer to an existing database.

R Code 5.18 : Save spatial geodata in GeoPackage format

Code

sf::st_write(countymap, 
         dsn = "data/Chapter5/oktornado.gpkg", 
         layer = "countysum",
         delete_dsn = TRUE)

#> Deleting source `data/Chapter5/oktornado.gpkg' using driver `GPKG'
#> Writing layer `countysum' to data source 
#>   `data/Chapter5/oktornado.gpkg' using driver `GPKG'
#> Writing 77 features with 10 fields and geometry type Polygon.

5.4.3 GeoJSON format

Another commonly-used open geospatial data format is GeoJSON. It is based on Javascript Object Notation (JSON), a human-readable text format that stores data in ASCII files. The layer_options argument must be set to “RFC7946 = YES” to save the data in the newest GeoJSON specification.

R Code 5.19 : Save spatial geodata in GeoJSON format

Code

sf::st_write(obj = countymap, 
             dsn = "data/Chapter5/oktornado.geojson", 
             layer_options = "RFC7946 = YES",
             delete_dsn = TRUE)

#> Deleting source `data/Chapter5/oktornado.geojson' using driver `GeoJSON'
#> Writing layer `oktornado' to data source 
#>   `data/Chapter5/oktornado.geojson' using driver `GeoJSON'
#> options:        RFC7946 = YES 
#> Writing 77 features with 10 fields and geometry type Polygon.

Here again I had to add delete_dsn = TRUE (append = FALSE did not work for this format!). Otherwise I would get an error message that the dataset already exists.

5.5 Choropleth Maps

5.5.1 Filling with colors (standard)

Another way to display the tornadoes is as a choropleth map, where summary statistics for each county are displayed as different colors. The county-level tornado density can be as a choropleth using the fill aesthetic with ggplot2::geom_sf(). By default, the fill colors are based on a dark-to-light blue color ramp. The ggplot2::theme_void() function eliminates the axes and graticules and displays only the map on a white background.

R Code 5.20 : Densities of tornadoes mapped as a choropleth.

Code

```{r}
#| label: fig-choropleth-filled-colors
#| fig-cap: "Densities of tornadoes in Oklahoma counties from 2016-2021 mapped as a choropleth."
#| fig-height: 3

ggplot2::ggplot(data = countymap) +
  ggplot2::geom_sf(ggplot2::aes(fill = tdens)) +
  ggplot2::theme_void() +
  ggplot2::coord_sf()
```

Graph 5.7: Densities of tornadoes in Oklahoma counties from 2016-2021 mapped as a choropleth.

5.5.2 Mapping symbols

To map symbols, the county polygons must first be converted to points. The sf::st_centroid() generates a point feature located at the centroid of each county. The sf::st_geometry_type() function returns the feature geometry type. Setting by_geometry = FALSE returns one geometry type for the entire dataset instead of for every feature.

R Code 5.21 : Convert county polygons to points

Code

glue::glue("##### Return geometry type before converted to points #####")
sf::st_geometry_type(okcounty, by_geometry = FALSE)


############# Return the centroid of a geometry
okcntrd = sf::st_centroid(countymap)

#> Warning: st_centroid assumes attributes are constant over geometries

Code

glue::glue("")
glue::glue("##### Return geometry type after converted to points #####")
sf::st_geometry_type(okcntrd, by_geometry = FALSE)

#> ##### Return geometry type before converted to points #####
#> [1] POLYGON
#> 18 Levels: GEOMETRY POINT LINESTRING POLYGON MULTIPOINT ... TRIANGLE
#> 
#> ##### Return geometry type after converted to points #####
#> [1] POINT
#> 18 Levels: GEOMETRY POINT LINESTRING POLYGON MULTIPOINT ... TRIANGLE

Note 5.4: How to get rid of the warning?

At the moment I do not know how to suppress the warning. Possible pointers to solve this problem are:

sf GitHub: suppress specific warning message
Spatial Data Science: Chapter 5

When, while manipulating geometries, attribute values are retained unmodified, support problems may arise. If we look into a simple case of replacing a county polygon with the centroid of that polygon on a dataset that has attributes, we see that R package sf issues a warning:

Warning: st_centroid assumes attributes are constant over geometries

The tornado counts can be mapped using the okcentrd dataset with the size aesthetic. One point is plotted for each county centroid, and the size of the point is proportional to the number of tornadoes in the county.

R Code 5.22 : Choropleth map using graduated symbols

Code

```{r}
#| label: fig-choropleth-mappying-symbols
#| fig-cap: "Numbers of tornadoes in Oklahoma counties from 2016-2021 mapped as graduated symbols."
#| fig.height: 3

ggplot2::ggplot() +
  ggplot2::geom_sf(data = okcntrd, ggplot2::aes(size = tcnt)) +
  ggplot2::geom_sf(data = okcounty, fill = NA) +
  ggplot2::theme_void()
```

Graph 5.8: Numbers of tornadoes in Oklahoma counties from 2016-2021 mapped as graduated symbols.

5.6 Modifying Map Appearance

5.6.1 {RColorBrewer}: Color palettes for choropleth mapping

The {RColorBrewer} package provides a selection of palettes designed for choropleth mapping (Harrower and Brewer 2003). The display_brewer_all() function generates a chart with examples of all the available palettes (Graph 5.9).

R Code 5.23 : Show color palettes of the {RColorBrewer} package

Code

RColorBrewer::display.brewer.all()

Graph 5.9: Color palettes available in the RColorBrewer package.

There are three types of ColorBrewer palettes.

The top group in Graph 5.9 contains sequential palettes that are suitable for mapping ordered data along numerical scales (e.g., temperatures ranging from 0 to 30 degrees C) or ordinal categories (e.g., temperatures classified as cold, warm, and hot). These sequential palettes may include a single color or multiple colors, but have no clear breaks in the scale.
The middle group in Graph 5.9 contains qualitative palettes, which use color to distinguish between different categories without implying order.
The lower group in Graph 5.9 contains divergent palettes that emphasize a specific breakpoint in the data. Divergent palettes are often used to indicate values that are above or below the mean or to highlight values that are higher or lower than zero.

More details about these palettes, including recommendations for color schemes that are most effective for different types of computer displays and for accommodating color-blind viewers, are available at http://colorbrewer2.org.

5.6.2 Specifying a color palette for continuous data

Additional {ggplot2} functions can be added to improve the appearance of the map.

R Code 5.24 : Specifying a color palette for continuous data with {RColorBrewer}

Code

ggplot2::ggplot(data = countymap) +
  ggplot2::geom_sf(ggplot2::aes(fill = tdens * 10^3)) + # (1)
  ggplot2::scale_fill_distiller(                        # (2)
    name = base::expression("Tornadoes/1000 km"^2),     # (3) 
    palette = "YlOrRd",                                 # (4)
    breaks = scales::extended_breaks(n = 6),            # (5)
    direction = 1) +                                    # (6)
  ggplot2::theme_void() +                               # (7)
  ggplot2::theme(legend.position = "bottom")            # (8)

Graph 5.10: Densities of tornadoes in Oklahoma counties from 2016-2021 mapped as a choropleth with a custom palette.

Line comment 1: In contrast to the code in the book, I had to multiply the column for the tornado densities (tdens) with 10^3. Otherwise I would get decimal numbers overlapping each other after moving the legend to the bottom.
Line comment 2: The ggplot2::scale_fill_distiller() function allows the specification of a different color ramp.
Line comment 3: The base::expression() function is used for specifying the name argument for ggplot2::scale_fill_distiller() and to add text with a superscript.
Line comment 4: In this example we have used the “YlOrRd” palette from the {RColorBrewer} package. As the name says it starts from yellow and goes to red.
Line comment 5: The book uses the superseded scales::pretty_breaks() function instead the newer scales::breaks_pretty() function. This standard R break algorithm is primariy useful for date/times, for numeric scales the scales::extended_breaks() function does a slightly better job. n = 6 is the number of desired breaks. You may get slightly more or fewer breaks that requested. (After trying it out I learned that in this case the n parameter wouldn’t be necessary to get the same result.
Line comment 6: The default value is direction = -1 and produces scales from dark to light colors. We want the reverse representing lighter colors with few and dark colors with many tornados.
Line comment 7: Note that “complete” themes like ggplot2::theme_void() will remove any settings made by a previous ggplot2::theme() function. Therefore, it is necessary to call ggplot2::theme_void() before ggplot2::theme() to implement specific theme setting settings.
Line comment 8: We moved the legend to the bottom of the map to better accomodate the longer legend title.

5.6.3 Specifying a color palette for discrete data

The {RColorBrewer} palettes each contain a finite number of colors that are intended to be associated with categories in a choropleth map. Note that the ggplot2::scale_fill_distiller() function used to generate the color scale for the map in Graph 5.10 operates a bit differently. This function takes a ColorBrewer palette and converts it to a continuous color ramp.

The next map example will show how to define categories and map each one as a distinctive color. To view the colors for a given number of categories and a specific palette, the RColorBrewer::display.brewer.pal() function is used with the number of categories as the first argument and the palette name as the second palette (Graph 5.11).

R Code 5.25 : ColorBrewer discrete color palette

Code

RColorBrewer::display.brewer.pal(4, "YlOrRd")

Graph 5.11: The ColorBrewer ‘YlOrRd’ (yellow to red) color palette with four categories.

Rather than using continuous scales for color and size, it is often recommended to aggregate the data into a small number of classes (typically 3-6). Using discrete classes makes it easier to associate each color or symbol in the map with a specific range of values.

To accomplish this step, we need to add a couple of new classified variables using dplyr::mutate(). The base::cut() function is used to split the continuous variables based on user-specified breaks. The incidence variable is split based on quantiles (i.e., percentiles) defined in the qbrks object. The population breaks are manually specified.

R Code 5.26 : Generating discrete classes

Code

numclas = 4                                  # (1)
qbrks <- seq(0, 1, length.out = numclas + 1) # (2)
qbrks

countymap <- countymap |>                     # (3)
  dplyr::mutate(tdens_c1 = base::cut(tdens * 10^3,
                        breaks = stats::quantile(tdens * 10^3, breaks = qbrks),
                        include.lowest = T))

base::class(countymap$tdens_c1)
base::levels(countymap$tdens_c1)

#> [1] 0.00 0.25 0.50 0.75 1.00
#> [1] "factor"
#> [1] "[0,1.28]"    "(1.28,2.42]" "(2.42,3.69]" "(3.69,10.4]"

Line comment 1: We want to create four different colors for the tornado density.
Line comment 2: Whatever the number of different colors, because of the end point to create we need to add 1 to this number.
Line comment 3: We create a new column tdens_c1 for the break values of the tornado densities. The tdens_c1 column is a discrete factor with 4 levels instead of a continuous numerical variable as in Graph 5.10.

Because the new tdens_c1 column is a discrete factor instead of a continuous numerical variable as in Graph 5.10 we are now going to use the ggplot2::scale_fill_brewer() function is now used instead of ggplot2::scale_fill_distiller(). The comma-separated numbers specify the range of tornado densities for each of the categories (Figure 5.11).

R Code 5.27 : Mapping densities of tornados with color categories

Code

ggplot2::ggplot(data = countymap) +
  ggplot2::geom_sf(ggplot2::aes(fill = tdens_c1)) +
  ggplot2::scale_fill_brewer(name = base::expression("Tornadoes/1000 km"^2),   
                    palette = "YlOrRd") +
  ggplot2::theme_void() +
  ggplot2::theme(legend.position = "bottom")

Graph 5.12: Densities of tornadoes in Oklahoma counties from 2016-2021 mapped as a choropleth with four categories with colors

5.6.4 Specifying graduate symbols with categories

Similar to choropleth maps, graduated symbol maps are often easier to interpret if they include a limited number of symbol sizes. To generate a classified map of tornado counts, they are converted to discrete categories using the base::cut() function. Instead of using quantiles, the breakpoints for the classification are selected manually and stored in the brkpts vector.

R Code 5.28 : Generating discrete classes for symbol mapping

Code

maxcnt <- max(okcntrd$tcnt)
brkpts <- c(0, 2, 5, 10, maxcnt)
okcntrd <- okcntrd |> 
  dplyr::mutate(tcnt_c1 = base::cut(tcnt,
                        breaks = brkpts,
                        include.lowest = TRUE)
                )

base::levels(okcntrd$tcnt_c1)

#> [1] "[0,2]"   "(2,5]"   "(5,10]"  "(10,16]"

The resulting map has four symbol sizes, each associated with a specific range of tornado counts.

R Code 5.29 : Mapping densities of tornados with symbol categories

Code

ggplot2::ggplot(data = okcntrd) +
  ggplot2::geom_sf(ggplot2::aes(size = tcnt_c1)) +
  ggplot2::scale_size_discrete(name = "Tornados") +
  ggplot2::geom_sf(data = okcounty, fill = NA) +
  ggplot2::theme_void() +
  ggplot2::theme(legend.position = "bottom")

#> Warning: Using size for a discrete variable is not advised.

Graph 5.13: Densities of tornadoes in Oklahoma counties from 2016-2021 mapped as a choropleth with four categories with symbols.

Watch out! 5.1: Should I get rid of the warning?

In Graph 5.13 we got the warning “Using size for a discrete variable is not advised.” There are two different opinions about this warning:

It is just “a design tip to be broken when necessary”. “… this warning is paternalistic and preachey. It has no place in a code warning, it is a style recommendation.”
“Using size to map a non-ordinal categorical variable may suggest an ordinal relationship to the viewer, when none was intended. I find that’s quite serious as it could result in miscommunication”.

One recommendation is to “convert the categories to numbers and then do the mapping. That avoids the warning and ensures that the mapping happens exactly the way you want it to.”

We are in fact mapping a ordinal categorical variable. Therefore the warning should be ignored. But for the purpose of learning I will change to code so that there is no warning message generated.

In the next two code chunks I am going to convert the categories to numbers (R Code 5.30) and then I will do the mapping (R Code 5.31).

R Code 5.30 : Generating numbers for discrete classes

Code

okcntrd <- okcntrd |> 
  dplyr::mutate(tcnt_c2 = dplyr::case_when(
    tcnt_c1 == "[0,2]"   ~ 1,
    tcnt_c1 == "(2,5]"   ~ 2,
    tcnt_c1 == "(5,10]"  ~ 3,
    tcnt_c1 == "(10,16]" ~ 4,
    .default = NA
    )
  )


base::unique((ordered(okcntrd$tcnt_c2)))
base::class(okcntrd$tcnt_c2)

#> [1] 1 4 3 2
#> Levels: 1 < 2 < 3 < 4
#> [1] "numeric"

The mapping now needs with ggplot2::scale_size() a different scale (instead of ggplot2::scale_size_discrete()). However, I also need to adapt the legend labels to recover the binned values.

R Code 5.31 : Mapping numbers with symbol categories

Code

ggplot2::ggplot(data = okcntrd) +
  ggplot2::geom_sf(ggplot2::aes(size = tcnt_c2)) +
  ggplot2::scale_size(
    name = "Tornados",
    labels = base::levels(okcntrd$tcnt_c1)
    ) +
  ggplot2::geom_sf(data = okcounty, fill = NA) +
  ggplot2::theme_void() +
  ggplot2::theme(legend.position = "bottom")

Graph 5.14: Densities of tornadoes in Oklahoma counties from 2016-2021 mapped as a choropleth with four categories with symbols. (Without warning)

5.7 Exporting Graphics Output

Until now the generated maps and charts using {ggplot2} are output as HTML and stored in the file for chapter 5 of my book notes.

However, it is often necessary to export maps and other graphics to files and explicitly specify their dimensions and resolution. This is usually the case when generating graphics for publications that must meet specific size and formatting criteria.

The simplest way to do this is with the ggplot2::ggsave() function.

Code Collection 5.5 : Title for code collection

R Code 5.32 : Output to a portable network graphics (PNG)

Run this code chunk manually to save the PNG graphics only once.

Code

ggplot2::ggsave("data/Chapter5/tornado.png", 
       width = 6, 
       height = 4, 
       dpi = 300)
ggplot2::ggsave("data/Chapter5/tornado2.png", 
       width = 15, 
       height = 10, 
       units = "cm", 
       dpi = 100)

(For this R code chunk is no visible output available. However, a file was saved if you have run the code chunk manually.)

This example exports the most recent output of ggplot2::ggplot() to a portable network graphics (PNG) file called tornado.png with dimensions of 6 x 4 inches and a resolution of 300 pixels per inch.

Other units besides inches can be used by specifying the units argument.

R Code 5.33 : Output to a tagged image file format (TIFF)

Run this code chunk manually to save the TIFF graphics only once.

Code

ggplot2::ggsave("data/Chapter5/tornado.tiff", 
       width = 6, 
       height = 4, 
       dpi = 300, 
       compression = "lzw")

(For this R code chunk is no visible output available. However, a file was saved if you have run the code chunk manually.)

When saving a TIFF, a compression type can be specified.

R Code 5.34 : Output to a Joint Photographic Experts Group file format (JPEG)

Run this code chunk manually to save the JPEG graphics only once.

Code

ggplot2::ggsave("tornado.jpeg", 
       width = 6, 
       height = 4, 
       dpi = 300, 
       quality = 90)

(For this R code chunk is no visible output available. However, a file was saved if you have run the code chunk manually.)

When saving a JPEG, a compression type can be specified.

R Code 5.35 : Output to a Portable Document Format file format (PDF)

Run this code chunk manually to save the PDF graphics only once.

Code

ggplot2::ggsave(
  "data/Chapter5/tornado.pdf", 
   width = 6, 
   height = 4
  )

(For this R code chunk is no visible output available. However, a file was saved if you have run the code chunk manually.)

When saving a PDF, the dpi argument is ignored.

R Code 5.36 : Output to an R object

Run this code chunk manually to save the R objects only once.

Code

choropleth <- ggplot2::ggplot(data = countymap) +
  ggplot2::geom_sf(ggplot2::aes(fill = tdens_c1)) +
  ggplot2::scale_fill_brewer(name = "Density",   
                    palette = "YlOrRd") +
  ggplot2::theme_void()

gradsymbol <- ggplot2::ggplot(data = okcntrd) +
  ggplot2::geom_sf(ggplot2::aes(size = tcnt_c2)) +
  ggplot2::scale_size(
    name = "Count",
    labels = base::levels(okcntrd$tcnt_c1)
    ) +
  ggplot2::geom_sf(data = okcounty, fill = NA) +
  ggplot2::theme_void()

ggplot2::ggsave("data/Chapter5/choropleth.tiff", 
       plot = choropleth,
       width = 6, 
       height = 4, 
       dpi = 300, 
       compression = "lzw")

ggplot2::ggsave("data/Chapter5/gradsymbol.tiff",
       plot = gradsymbol,
       width = 6, 
       height = 4, 
       dpi = 300, 
       compression = "lzw")

(For this R code chunk is no visible output available. However, a file was saved if you have run the code chunk manually.)

The output of ggplot2::ggplot() can also be saved as an R object that can be output to graphics files using ggplot2::ggsave(). The plot argument is used to specify the ggplot object to be saved.

There are several options to combine saved graphs and maps into a composite figure:

Base R with the graphics::par() function using mfrow or mfcol as parameters.
{ggpubr} package using the ggpubr::ggarrange() function (Kassambara 2023).
{patchwork} package using very flexible arrangements with plot arithmetic +, -, |, *, & (Pedersen 2024).
{cowplot} package with the cowplot::plot_grid() function (Wilke 2024).

I am going to demonstratge here just the last possibility as is this the example used in the book.

R Code 5.37 : Save combined graphics with the {cowplot} package

Code

choropleth <- ggplot2::ggplot(data = countymap) +
  ggplot2::geom_sf(ggplot2::aes(fill = tdens_c1)) +
  ggplot2::scale_fill_brewer(name = "Density",   
                    palette = "YlOrRd") +
  ggplot2::theme_void()

gradsymbol <- ggplot2::ggplot(data = okcntrd) +
  ggplot2::geom_sf(ggplot2::aes(size = tcnt_c2)) +
  ggplot2::scale_size(
    name = "Count",
    labels = base::levels(okcntrd$tcnt_c1)
    ) +
  ggplot2::geom_sf(data = okcounty, fill = NA) +
  ggplot2::theme_void()

cowplot::plot_grid(choropleth, gradsymbol, 
          labels = c("\n\nA) Choropleth Map", 
                     "\n\nB) Graduated Symbols",
                     label_size = 12),
          ncol = 1, 
          hjust = 0, 
          label_x = 0, 
          align = "hv")

The cowplot::plot_grid() function provides a variety of options for arranging figures in a regular grid. This basic example provides a label for each subplot and specifies additional arguments to plot the maps in a single column, justify the labels, move the labels besides each map, and align the maps horizontally and vertically, so they are exactly the same size.

5.8 Resources

Resource 5.1 : Additional resources for working with the {sf} package

A variety of additional resources for working with the sf package can be found at https://r-spatial.github.io/sf/index.html. These include a link to the sf “cheatsheet” as well as a variety of articles, vignettes, and blog posts that provide additional examples of how to work with vector geospatial data using this package.
- A rather comrehensive overview about loading and saving datasets of different formats is explained in the blog article How to load and save vector data in R. I will follow the tutorial in this article in Appendix A.
The book Geocomputation with R by Robin Lovelace and others (Lovelace, Nowosad, and Muenchow 2020) is also an excellent reference that encompasses {sf} as well as many other R packages for working with geospatial data. Look for their forthcoming second edition (Lovelace, Nowosad, and Muenchow 2025), which is already online for the most up-to-date information on geospatial data processing with R.
A more pressing knowledge for me is to learn how to get vector map data for creating maps. I am following here the tutorial by (FelixAnalytix 2023) in Appendix B.

References

FelixAnalytix. 2023. “How to Map ANY Region of the World Using r Programming.” https://felixanalytix.medium.com/how-to-map-any-region-of-the-world-using-r-programming-bb3c4146f97f.

Kassambara, Alboukadel. 2023. “Ggpubr: ’Ggplot2’ Based Publication Ready Plots.” https://CRAN.R-project.org/package=ggpubr.

Lovelace, Robin, Jakub Nowosad, and Jannes Muenchow. 2020. Geocomputation with R. 1st ed. Boca Raton London New York: Routledge.

———. 2025. Geocomputation With R. 2nd ed. Boca Raton, FL: Chapman & Hall/CRC.

Pedersen, Thomas Lin. 2024. “Patchwork: The Composer of Plots.” https://CRAN.R-project.org/package=patchwork.

Wilke, Claus O. 2024. “Cowplot: Streamlined Plot Theme and Plot Annotations for ’Ggplot2’.” https://github.com/wilkelab/cowplot.

5.1 Import Geospatial Data

5.1.1 ESRI shapefile format

5.1.2 Conversion data sf <-> sp

5.2 Creating simple maps

5.2.1 Draw Oklahoma county boundaries

5.2.2 Inspect tpoint and tpath

5.2.3 Visualization of the Oklahoma tornado data (2016-2021)

5.3 Overlaying Vector Datasets

5.3.1 A first spatial join

5.3.2 Count tornados per county

5.3.3 Associate polygons with tornado counts

5.4 Save spatial geodata files

5.4.1 ESRI format

5.4.2 GeoPackage format

5.4.3 GeoJSON format

5.5 Choropleth Maps

5.5.1 Filling with colors (standard)

5.5.2 Mapping symbols

5.6 Modifying Map Appearance

5.6.1 {RColorBrewer}: Color palettes for choropleth mapping

5.6.2 Specifying a color palette for continuous data

5.6.3 Specifying a color palette for discrete data

5.6.4 Specifying graduate symbols with categories

5.7 Exporting Graphics Output

5.8 Resources

References

5.2.2 Inspect `tpoint` and `tpath`