Chapter 2 Refresh: Using R for data and spatial data

2.1 Introduction

This practical is intended as refresher for your spatial data handling skills in R/RStudio. It uses a series of worked examples to illustrate some fundamental and commonly applied operations on data and spatial datasets. It is based on Chapter 5 of Brunsdon and Comber (2018)1 and uses a series of worked examples to illustrate some fundamental and commonly applied spatial operations on spatial datasets. Many of these form the basis of most GIS software.

Specifically, in GIS and spatial analysis, we are often interested in finding out how the information contained in one spatial dataset relates to that contained in another. The datasets may be ones you have read into R from shapefiles or ones that you have created in the course of your analysis. Essentially, the operations illustrate different methods for extracting information from one spatial dataset based on the spatial extent of another. Many of these are what are frequently referred to as overlay operations in GIS software such as ArcGIS or QGIS, but here are extended to include a number of other types of data manipulation. The sections below describe how to create a point layer from a data table with location attributes and the following operations:

  • Intersections and clipping one dataset to the extent of another
  • Creating buffers around features
  • Merging the features in a spatial dataset
  • Point-in-Polygon and Area calculations
  • Creating distance attributes
  • Combining spatial data and attributes

An R script with all the code for this practical is provided.

2.1.1 Packages

To handle these data types we will need some bespoke tools (functions) and these are found in packages. Recall that if you have not used packages on the computer your are working on before then they will need to be installed, using the install.packages function. The code below tests for the existence of the sf, tidyverse and tmap packages and installs them if they are not found:

# sf for spatial data
if (!is.element("sf", installed.packages()))
    install.packages("sf", dep = T)
# tmap for mapping
if (!is.element("tmap", installed.packages()))
    install.packages("tmap", dep = T)
# load the packages
library(sf)
library(tmap)

2.1.2 Data

To do this, we will need some data and you can download and then load some data using the code below. This downloads data for Chapter 7 of Comber and Brunsdon (2021) from a repository

https://archive.researchdata.leeds.ac.uk/741/1/ch7.Rdata

download.file("https://archive.researchdata.leeds.ac.uk/741/1/ch7.Rdata",
              "./wk1.RData", mode = "wb")

You should see file called ch1.RData appear in your working directory and then you can load the data:

load("wk1.RData")

Note that you can check where your working directory is by entering getwd() and you can change (reset) it by going to Session > Set Working Directory….

If you check what is is loaded you should see 2 R objects have been loaded into the session via the ch2.RData file.

ls()
## [1] "lsoa"       "oa"         "properties"

Three data files are loaded:

  1. oa a multipolygon sf object of Liverpool census Output Areas (OAs) (\(n = 1584\))
  2. lsoa a multipolygon sf object of Liverpool census Lower Super Output Areas (LSOAs) (\(n = 298\))
  3. properties a point sf object scraped from Nestoria (https://www.nestoria.co.uk) (\(n = 4230\))

You can examine these in the usual way:

# look at the structure
str(oa)
str(lsoa)
# examine the first 6 rows
head(properties)

The census data attributes in oa and lsoa describe economic well-being (unemployment (the unmplyd attribute)), life-stage indicators (percentage of under 16 years (u16), 16-24 years (u25), 25-44 years (u45), 45-64 years (u65) and over 65 years (o65) and an environmental variable of the percentage of the census area containing greenspaces (gs_area). The unemployment and age data were from the 2011 UK population census (https://www.nomisweb.co.uk) and the greenspace proportions were extracted from the Ordnance Survey Open Greenspace layer (https://www.ordnancesurvey.co.uk/opendatadownload/products.html). The spatial frameworks were from the EDINA data library (https://borders.ukdataservice.ac.uk). The layer are projected to the OSGB projection (EPSG 27700) and has a geo-demographic class label (OAC) from the OAC (see https://data.cdrc.ac.uk/dataset/output-area-classification-2011).

We will work mainly with the oa and properties layers in this practical.

The properties data was downloaded on 30th May 2019 using an API (https://www.programmableweb.com/api/nestoria). It contains latitude (Lat) and longitude (Lon) in decimal degrees,allowing an sf point layer to be created from it, price in 1000s of pounds (£), the number of bedrooms and 38 binary variables indicating the presence of different keywords in the property listing such as Conservatory, Garage, Wood Floor etc.

The code below creates a flat data table from the properties point layer and then removes it. The idea in not having a spatial sf object propertiesbut a data table (like a spreadsheet) is similar to many real world situations - there is just a data table that you download, create, scrape or be sent (for example in your dissertation work).

# create a data frame version
props = data.frame(properties)
# check the names of the attributes
names(props)
##  [1] "Kitchen"             "Garden"              "Modern"             
##  [4] "Gas.Central.Heating" "No.Chain"            "Parking"            
##  [7] "Shared.Garden"       "Double.Bedroom"      "Balcony"            
## [10] "New.Build"           "Lift"                "Gym"                
## [13] "Porter"              "Price"               "Beds"               
## [16] "Terraced"            "Detached"            "Semi.Detached"      
## [19] "Conservatory"        "Cul.de.Sac"          "Bungalow"           
## [22] "Garage"              "Reception"           "En.suite"           
## [25] "Conversion"          "Dishwasher"          "Refurbished"        
## [28] "Patio"               "Cottage"             "Listed"             
## [31] "Fireplace"           "Victorian"           "Penthouse"          
## [34] "Purpose.Built"       "Wood.Floor"          "Loft"               
## [37] "Detached.Garage"     "Auction"             "Needs.Modernisation"
## [40] "Double.Garage"       "geometry"            "Lon"                
## [43] "Lat"
# remove the geometry attribute
props = props[, -41]
# remove the properties and lsoa layers
rm(list = c("lsoa", "properties"))
# check what is present
ls()
## [1] "oa"    "props"

Both of the remaining objects, oa and props have a data table format, with rows representing observations (people, places, dates, or in this cases census areas) and columns represent their attributes. Such long format data are the most commonly used and are similar to a standard spreadsheet, although the oa object is spatial and has geometries attached to each row, as do spatial data used in a GIS.

2.2 R as a GIS

2.2.1 Creating point data

The properties data props has locational attributes that can be used to convert it from a flat data table to point data.

This is the case in many situations and projects for which this - a flat data table - is the starting point: you are given a file with point observations of something (plus location) that you want to something spatial with.

The code converts the props flat table to the proprttties spatial object and undertakes this operation, using the WG84 projection for decimal degrees which has an EPSG code of 4236, noting that all projections have such a code2:

properties = st_as_sf(props, coords = c("Lon", "Lat"), crs = 4326)

This can be plotted using a standard tmap approach:

tm_shape(properties) + 
  tm_dots("Price", size = 0.1,alpha = 0.5, 
          style = "kmeans",
          palette = "inferno") + 
  tm_layout(bg.color = "grey95",
            legend.position = c("right", "top"),
            legend.outside = T) 
Proprties for sale in the Liverpool area.

Figure 2.1: Proprties for sale in the Liverpool area.

Alternatively this can be done as an interactive map, by change the tmap_mode:

# set tmap mode for interactive plots
tmap_mode("view")
# create the plot
tm_shape(properties) + 
  tm_dots("Price", alpha = 0.5, 
          style = "kmeans",
          palette = "viridis") + 
  tm_basemap("OpenStreetMap")
# reset tamp mode
tmap_mode("plot")

Task 1

Write some code that generates an interactive tmap of the number of bedrooms for the observations / records in the properties layer.

NB You are advised to attempt the tasks after the timetabled practical session. Worked answers are provided in the last section of the worksheet.

2.2.2 Intersections and Clip Operations

Now, consider the situation where the aim was to analyse the incidence of properties for sale in the Output Areas of Liverpool: we do not want to analyse all of the properties data but only those records that describe events in our study area - the area we are interested in.

This can be plotted using the usual commands as in the code below. You can see that the plot extent is defined by the spatial extent of area of interest (here oa) and that all of the properties within that extent are displayed.

# plot the areas
tm_shape(oa) +
  tm_borders(col = "black", lwd = 0.1) +
  # add the points 
  tm_shape(properties) +
    tm_dots(col = "#FB6A4A") +
  # define some style / plotting options
  tm_layout(frame = F)
The OAs in Liverpool and the locations of properties for sale.

Figure 2.2: The OAs in Liverpool and the locations of properties for sale.

There are a number of ways of clipping spatial data in R. The simplest of these is to use the spatial extent of one as an index to subset another.

BUT to undertake any spatial operation with 2 spatial layers, they must be in the same projection. The tmap code above was able to handle this but spatial operations may fail.

Examine the properties layer and note its projection (particularly through the Geodetic CRS metadata at the top of the print out and the geometry attribute:

properties

Now change the projection to OSGB3 and re-examine:

properties = st_transform(properties, 27700)
properties

The spatial extent of the oa layer can now be used to clip out the intersecting observations form the proprties layer:

prop_clip = properties[oa,]

This simply clips out the data from properties that is within the spatial extent of oa. You can check this:

tm_shape(oa) +
  tm_fill() +
  tm_shape(prop_clip) +
  tm_dots()

However, such clip (or crop) operations simply subset data based on their spatial extents. There may be occasions when you wish to combine the attributes of difference datasets based on the spatial intersection. The st_intersection in sf allows us to do this as shown in the code below.

prop_int <- st_intersection(oa, properties)
## Warning: attribute variables are assumed to be spatially constant throughout all
## geometries

The st_intersection operation and the clip operation are based on spatial extents, do the same thing and the outputs are the same dimension but with subtle differences:

prop_clip
prop_int

If you examine the data created by the intersection, you will notice that each of the intersecting points has the full attribution from both input datasets (you may have to scroll up in your console to see this!):

head(data.frame(prop_clip))
head(data.frame(prop_int))

Task 2

Write some code that selects the polygon in oa with the LSOA code E00033902 (in the code attribute in oa) and then clips out the properties in that polygon. How many properties are there in E00033902?

NB You are advised to attempt the tasks after the timetabled practical session. Worked answers are provided in the last section of the worksheet.

2.2.3 Merging spatial features and Buffers

In many situations, we are interested in events or features that occur near to our area of interest as well as those within it. Environmental events such as tornadoes, for example, do not stop at state lines or other administrative boundaries. Similarly, if we were studying crimes locations or spatial access to facilities such as shops or health services, we would want to know about locations near to the study area border. Buffer operations provide a convenient way of doing this and buffers can be created in R using the st_buffer function in sf.

Continuing with the example above, we might be interested in extracting the properties within a certain distance of the Liverpool area, say 2km. Thus the objective is to create a 2km buffer around Liverpool and to use that to select from the properties dataset. The buffer function allow us to do that, and requires a distance for the buffer to be specified in terms of the units used in the projection.

The code below create a buffer, but does it for each OA - this is not we need in this case!

# apply a buffer to each object
buf_liv <- st_buffer(x = oa, dist = 2000)
# map
tm_shape(buf_liv) + 
  tm_borders()

The code below uses st_union to merge all the OAs in oa to a single polygon and then uses that as the object to be buffered:

# union the oa layer to a single polygon
liv_merge <- st_sf(st_union(oa))
# buffer this
buf_liv <- st_buffer(liv_merge, 2000)
# and map
tm_shape(buf_liv) + 
  tm_borders() +
  tm_shape(liv_merge) + tm_fill()
The outline of the Liverpool area (in grey) and a 2km buffer.

Figure 2.3: The outline of the Liverpool area (in grey) and a 2km buffer.

2.2.4 Point-in-polygon and Area calculations

2.2.4.1 Point-in-polygon

It is often useful to count the number of points falling within different zones in a polygon data set. This can be done using the the st_contains function in sf.

The code below returns a list of counts of the number properties that occur inside each OA to the variable prop.count and prints the first six of these to the console using the head function:

tmp = st_contains(oa, properties, sparse = F)
prop.count = rowSums(tmp)
head(prop.count)
## [1] 3 0 0 5 4 5
length(prop.count)
## [1] 1584

Each of the observations in prop.count corresponds in order to the OAs in oa and could be attached to the data in the following way:

oa$prop.count = prop.count

This could be mapped using a full tmap approach as in the below, using best practice for counts (choropleths should only be used for rates!):

tm_shape(oa) + 
  tm_dots(size = "prop.count", alpha = 0.5) +
  tm_shape(liv_merge) + tm_borders()
The the counts of properties for sale in the OAs in Liverpool.

Figure 2.4: The the counts of properties for sale in the OAs in Liverpool.

2.2.4.2 Area calculations

Another useful operation is to be able calculate polygon areas. The st_area function in and sf does this. To check the projection, and therefore the map units, of an sf objects use the st_crs function:

st_crs(oa)

This declares the projection to be in metres. To see the areas in square metres enter:

head(st_area(oa))

These are not particularly useful and more realistic measures are to report areas in hectares or square kilometres:

# hectares
st_area(oa) / (100 * 100)
# square kilometres
st_area(oa) / (1000 * 1000)

Task 3

Your task is to create the code to produce maps of the densities of properties in each OA in Liverpool in properties per square kilometre. For the analysis you will need to use the properties point data and the oa dataset and undertake a point in polygon operation, apply an area function and undertake a conversion to square kilometres. The maps should be produced using the tm_shape and tm_fill functions in the tmap package^[Hint: in the tm_fill part of the mapping use style = "kmeans" to set the colour breaks).

NB You are advised to attempt the tasks after the timetabled practical session. Worked answers are provided in the last section of the worksheet.

2.2.5 Creating distance attributes

Distance is fundamental to spatial analysis. For example, we may wish to analyse the number of locations (e.g. health facilities, schools, etc) within a certain distance of the features we are considering. In the exercise below distance measures are used to evaluate differences in accessibility for different social groups, as recorded in census areas. Such approaches form the basis of supply and demand modelling and provide inputs into location-allocation models.

Distance could be approximated using a series of buffers created at specific distance intervals around our features (whether point or polygons). These could be used to determine the number of features or locations that are within different distance ranges, as specified by the buffers using the poly.counts function above. However distances can be measured directly and there a number of functions available in R to do this.

First, the most commonly used is the dist function. This calculates the Euclidean distance between points in \(n-\)dimensional feature space. The example below developed from the help for dist shows how it is used to calculate the distances between 5 records (rows) in a feature space of 20 hypothetical variables:

# create some random data
set.seed(123) # for reproducibility
x <- matrix(rnorm(100), nrow = 5)
colnames(x) <- paste0("Var", 1:20)
dist(x)
as.matrix(dist(x))

If your data is projected (i.e. in metres, feet etc) then dist can also be used to calculate the Euclidean distance between pairs of coordinates (i.e. 2 variables not 20!):

coords = st_coordinates(st_centroid(oa)) 
dmat = as.matrix(dist(coords, diag = F))
dim(dmat)
dmat[1:6, 1:6]

The distance functions return a to-from matrix of the distances between each pair of locations. These could describe distances between any objects and such approaches underpin supply and demand modelling and accessibility analyses. When determining geographic distances, it is important that you consider the projection properties of your data: if the data are projected using degrees (i.e. in latitude and longitude) then this needs to be considered in any calculation of distance. The st_distance function in sf calculates the Cartesian minimum distance (straight line) distance between two spatial datasets of class sf projected in planar coordinates.

The code below calculates the distance between 2 features:

st_distance(oa[100,], properties[2,])
## Units: [m]
##          [,1]
## [1,] 3118.617

And this can be extended to calculates multiple distances using a form of apply function. The code below uses st_distance to determine the distance from the 100^th observation in oa to each of the properties for sale:

g1 = st_geometry(oa[100,])
g2 = st_geometry(properties) 
dlist = mapply(st_distance, g1, g2)
length(dlist)
## [1] 4230
head(dlist)
## [1]  4440.633  3118.617 10518.848  4435.707  2519.499  2519.499

You could check this by mapping the properties data, shaded by their distance to this location:

tm_shape(cbind(properties, dlist)) +
  tm_dots("dlist", size = 0.1,alpha = 0.5, 
          style = "kmeans",
          palette = "inferno") + 
  tm_layout(bg.color = "grey95",
            legend.position = c("right", "top"),
            legend.outside = T) + 
  tm_shape(oa[100,]) + tm_borders("red", lwd = 4) 
A map of the proprties for sale inthe Liverpool area shaded by their distance to the 100th observation.

Figure 2.5: A map of the proprties for sale inthe Liverpool area shaded by their distance to the 100th observation.

2.2.6 Combining spatial datasets and their attributes

The point-in-polygon calculation above generates counts of the points falling in each polygon. A common situation in spatial analysis is the need to combine (overlay) different polygon features that describe the spatial distribution of different variables, attributes or processes that are of interest. The problem is that the data may have different underlying area geographies. In fact, it is commonly the case that different agencies, institutions and government departments use different geographical areas and even where they do not, geographical areas frequently change over time.

In these situations, we can use the intersection function st_intersection in sf to identify the area of intersection between different spatial datasets.

In this exercise, a zone dataset will be created with the aim of calculating the number of of properties for sale in each zone. Now we could just use a point in polygon operation with the raw properties data in this case, but the aim here to illustrate the technique, when for example data are only available over a specific set of areas such as the Output Areas here.

First, you should create the zones, number them with an ID and plot these on a map with the OA data:

## define a 1km grid in polygons
sq = st_make_grid(oa, 1000, what = "polygons", square = T) 
sq_grid = data.frame(ID = 1:length(sq))
st_geometry(sq_grid) = sq
# clip the extent of oa
sq_grid = sq_grid[oa,]
# plot
tm_shape(sq_grid) + tm_polygons(col = "grey") +
  tm_shape(oa) +tm_borders(col = "red")
A map of the the OAs (red) and the 1km grid.

Figure 2.6: A map of the the OAs (red) and the 1km grid.

The 2 layers, st_grid and oa have the same projections, and so they can be intersected using st_intersection:

int.res_sf <- st_intersection(sq_grid, oa[, c("code","prop.count")])
## Warning: attribute variables are assumed to be spatially constant throughout all
## geometries

You can examine the intersected data:

head(int.res_sf) 

You will see that the data.frame of the intersected object contains composites of the inputs. These links can be used to create attributes for the intersection output data.

First, note that the ID variable of int.res_sf relates to the ID variable of sq_grid and the variable prop.count is from the variable of the same name in oa. We wish to summarise this over the zones of sq_grid. Here the functionality of dplyr single table operations that were introduced in Chapter 4 can be useful. Knowing the the unique identifiers of each polygon in both of the intersected layers is critical for working out proportions.

# generate area and proportions
int.areas <- st_area(int.res_sf)
liv.areas <- st_area(oa)
# match tract area to the new layer
index <- match(int.res_sf$code, oa$code)
liv.areas <- liv.areas[index]
liv.prop <- as.vector(int.areas)/as.vector(liv.areas)

The liv.prop object can be used to create a variable in the data frame of the new layer

int.res_sf$props <- oa$prop.count[index] * liv.prop

And this can be summarised using the functionality in dplyr:

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
# summarise the counts of properties of the grid cells
int.res_sf %>% st_drop_geometry() %>%
  group_by(ID) %>% 
  summarise(count = sum(props)) -> props
# link to the grid data
sq_grid %>% left_join(props) -> sq_grid
## Joining, by = "ID"

The results can be plotted :

tm_shape(sq_grid) + 
  tm_polygons("count", palette = "Greens", 
              style = "kmeans", title = "Properties for sale") +
  tm_layout(frame = F, legend.position = c(1,0.5)) 
The zones shaded by the number of propeties for sale after intersection with the Output Areas

Figure 2.7: The zones shaded by the number of propeties for sale after intersection with the Output Areas

Of course, all of this can be done using st_interpolate_aw which wraps up all of the above steps into a single function:

props2 = st_interpolate_aw(oa[, "prop.count"], sq_grid, extensive = T)
tm_shape(props2) + 
  tm_polygons("prop.count", palette = "Greens", 
              style = "kmeans", title = "No. of Properties for sale") +
  tm_layout(frame = F, legend.position = c(1,0.5)) 

2.2.7 Reading and writing data in and out of R/RStudio

There are 2 basic approaches:

  1. read and write data in and out of R/Rstudio in propriety formats such as shapefiles, CSV files etc.
  2. read and write data using .Rdata files in R’s binary format and the load and save functions.

The code below provides does these as a reference for you:

Tabular data (CSV format)

write.csv(props, file = "props.csv", row.names = FALSE)
p2 = read.csv("props.csv", header=TRUE)

Tabular data (.RData format)

save(props, file = "props.RData")
load("props.RData") # notice that this needs no assignation e.g. to p2 or props

Spatial data (Proprietary format)

# as shapefile
st_write(properties, "point.shp", delete_layer = T)
p2 = st_read("point.shp")
# as GeoPackage
st_write(oa, "area.gpkg", delete_layer = T)
a2 = st_read("area.gpkg")

Spatial data (R format)

save(oa, file = "areas.RData")
load("areas.RData") # notice that this needs no assignation e.g. to p2 or props

2.3 Answer to Tasks

Task 1 Write some code that generates an interactive map of the number of bedrooms for the observations / records on properties.

# basic
tm_shape(properties) + 
  tm_dots("Beds") 
# with embellishment
tm_shape(properties) + 
  tm_dots("Beds", size = 0.2, title = "Bedrooms",
          style = "kmeans", palette = "viridis") +
  tm_compass(bg.color = "white")+
  tm_scale_bar(bg.color = "white")

Task 2 Write some code that selects the polygon in oa with the LSOA code E00033902 (in the code attribute in oa) and then clips out the properties in that polygon. How many properties are there in E00033902?

## There are lots of ways of doing this but each of them uses a logical statement

# 1. using an index to create a subset and then clip
index = oa$code == "E00033902"
oa_select = oa[index,]
properties[oa_select,]

# 2. using the layer you have created already 
sum(prop_int$code == "E00033902")

Task 3 Produce a map of the densities of properties for sale in each OA block in Liverpool in properties per square kilometre

# point in polygon
p.count <-  rowSums(st_contains(oa,properties,sparse = F))
# area calculation
oa.area <- st_area(oa) / (1000 * 1000)
# combine and assign to the oa data
oa$p.p.sqkm <- as.vector(p.count/oa.area)
# map
tm_shape(oa) +
  tm_fill("p.p.sqkm", style = "kmeans", title ="") 

2.4 Optional: mapping with tmap and ggplot2

We have been using tmap in the above exercises without really commenting on what is being done. This section covers mapping in tmap in a bit more detail and also shows how maps can be created with the ggplot2 package, supported by the ggspatial package. The grid package is used for combining multiple plot objects.

## install packages
# tidyverse for many things! 
if (!is.element("tidyverse", installed.packages()))
    install.packages("tidyverse", dep = T)
# ggspatial for mapping with ggplot
if (!is.element("ggspatial", installed.packages()))
    install.packages("ggspatial", dep = T)
# grid for plotting
if (!is.element("grid", installed.packages()))
    install.packages("grid", dep = T)
# load packages
library(tidyverse)
library(grid)
library(ggspatial)

The code loads the Local Authority Districts (LADs) in Great Britain with attributes describing the Brexit vote and additional demographic data. You will need to have the data_gb.geojson in your working directory on your computer:

This is the same data and variables used in the analysis of the Brexit vote by Beecham, Williams, and Comber (2020). You can inspect in the usual way

summary(data_gb)
str(data_gb)

2.4.1 Mapping with tmap

The tmap package supports the thematic visualization of spatial data. It has a grammatical style that handles each element of the map separately in a series of layers (it is similar to ggplot in this respect - see below). In so doing it seeks to exercise control over each element in the map. This is different to the basic R plot functions. The basic syntax of tmap is:

tm_shape(data = <data>)+
  tm_<function>()

NB do not run the code above - its is simply showing the syntax of tmap. However, you have probably noticed that it uses a syntactical style similar to ggplot to join together layers of commands, using the + sign. The tm_shape function initialises the map and then layer types, the variables that get mapped etc, are specified using different flavours of tm_<function>. The main types are listed in the table below.

## 
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
## 
##     group_rows
Table 2.1: Commonly used tmap functions
Function Description
tm_dots, tm_bubbles for points either simple or sized according to a variable
tm_lines for line features
tm_fill, tm_borders, tm_polygons for polygons / areas, with and without attributes and polygon outlines
tm_text for adding text to map features
tm_format, tm_facet, tm_layout for adjusting cartographic appearance and map style, including legends
tm_view, tm_compass, tm_credits, tm_scale_bar for including traditional cartography elements

Let’s start with a simple choropleth map. These show the distribution of a continuous variable in the spatial data (typically polygons / areas or points)

tm_shape(data_gb)+
  tm_polygons("share_leave")
A choropleth map of `share_leave`.

Figure 2.8: A choropleth map of share_leave.

By default tmap picks a shading scheme, the class breaks and places a legend somewhere. All of these can be changed. The code below allocates the tmap plot to p1 (plot 1) and then prints it:

p1 = tm_shape(data_gb) + 
  tm_polygons("share_leave", palette = "GnBu", border.col = "salmon",
    breaks = seq(0,0.8, 0.1), title = "Leave Share")+
  tm_layout(legend.position = c("right", "top"), frame = F)
p1

And of course many other elements included either by running the code snippet defining p1 above with additional lines or by simply adding them as in the code below

p1 + 
  tm_scale_bar(position = c("left", "top")) + 
  tm_compass(position = c(0.15, 0.9))

It is also possible to add or overlay other data such as the region boundaries, which because this is another spatial data layer needs to be added with tm_shape followed by the usual function. Notice the use of the piping syntax to create regions:

# first make the region boundaries
regions <- data_gb %>% group_by(Region) %>% summarise()

# then add to the plot
p1 + tm_shape(regions) + tm_borders(col = "black", lwd = 1)

The tmap package can be used to plot multiple attributes in the same plot in this case the leave share and degree educated proportions:

tm_shape(data_gb)+
  tm_fill(c("share_leave", "degree_educated"), palette = "YlGnBu",
    breaks = seq(0,1, 0.1), title = c("Leave Share", "Degree"))+
  tm_layout(legend.position = c("left", "bottom"))
Choropleth maps of `share_leave` and `degree_educated`.

Figure 2.9: Choropleth maps of share_leave and degree_educated.

The code below applies further refinements to the code to generate the next map. Notice the use of title and legend.hist, and then subsequent parameters passed to tm_layout to control the legend:

tm_shape(data_gb)+
  tm_polygons("share_leave", title = "Leave Share", palette = "Reds",
    style = "kmeans", legend.hist = T)+
  tm_layout(title = "Leave Vote Share \nin the UK",
            frame = F, legend.outside = T, 
            legend.hist.width = 1,
            legend.format = list(digits = 1), 
            legend.outside.position = c("left", "top"),
            legend.text.size = 0.7,
            legend.title.size = 1) +
  tm_scale_bar(position = c("left", "top")) + 
  tm_compass(position = c("left", "top")) +
  tm_shape(regions) + tm_borders(col = "black", lwd = 1)
A refined choropleth map of the Leave Share.

Figure 2.10: A refined choropleth map of the Leave Share.

This is quite a lot of code to unpick. The first call to tm_shape determines which layer is to be mapped, then a specific mapping function is applied to that layer (in this case tm_polygons) and a variable is passed to it. Other parameters are used to specify are the palette, whether a histogram is to be displayed and the type of breaks to be imposed (here a \(k\)-means was applied). The help for tm_polygons describes a number of different parameters that can be passed to tmap elements. Next, some adjustments were made to the defaults through the tm_layout function, which allows you to override the defaults that tmap automatically assigns (shading scheme, legend location etc). Finally the region layer was added.

There are literally 1000’s of options with tmap and many of them are controlled in the tm_layout function. You should inspect this:

?tm_layout

The code below creates and maps point data values in different ways. There are 2 basic options: change the size or change the shading of the points according to the attribute value. The code snippets below illustrate these approaches.

Create the point layer:

lads_pts = st_centroid(data_gb)

Then map by size using tm_bubbles as in the figure below:

tm_shape(regions) +
  tm_fill("olivedrab4") +
  tm_borders("grey", lwd = 1) + 
  # the points layer
  tm_shape(lads_pts) +
  tm_bubbles("share_leave", title.size = "Leave Vote", scale = 0.6, col = "gold") +
  tm_layout(legend.position = c("left", "top")) 
Point data values using size.

Figure 2.11: Point data values using size.

Second by shade using tm_dots as below:

# the 1st layer
tm_shape(regions) +
  tm_fill("grey70") +
  tm_borders(lwd = 1) +
  # the points layer
  tm_shape(lads_pts) +
  tm_dots("degree_educated", size = 0.1, title = "Degree", shape = 19) +
  tm_layout(legend.outside = T, frame = F)
Point data values using choropleth mapping (colour).

Figure 2.12: Point data values using choropleth mapping (colour).

You should explore the introductory tmap vignettes which describes other approaches for mapping different types of variables, in built styles, plotting multiple layers:

vignette("tmap-getstarted", package = "tmap")

It is possible to use an OpenStreetMap backdrop for some basic web-mapping with tmap by switching the view from the default mode from plot to view. You need to be on-line and connected to the internet. The code below generates a zoomable web-map map with an OpenStreetMap backdrop in the figure below, with a transparency term to aid the understanding of the local map context:

# set the mode
tmap_mode("view")
# do the plot
tm_shape(data_gb[data_gb$Region == "East",]) +
    tm_fill(c("share_leave"), palette = "Reds", alpha = 0.6) +
    tm_basemap('OpenStreetMap')

Aerial imagery and satellite imagery (depending on the scale and extent of the scene) can also be used to provide context as in the figure below:

tm_shape(data_gb[data_gb$Region == "East",]) +
    tm_fill(c("share_leave"), palette = "Reds", alpha = 0.6) +
    tm_basemap('Esri.WorldImagery')

Other backdrops can be specified as well as OSM. Try the following: Stamen.Watercolor, Esri.WorldGrayCanvas, OpenStreetMap and Esri.WorldTopoMap.

The tmap_mode needs to be reset to return to a normal map view:

tmap_mode("plot")

Finally in this brief description of tmap, when maps are assigned to a named R object (like p1) then you can exert greater control over how multiple maps are plotted together using the tmap_arrange function. This allows the parameters for each map to be specified individually before they displayed together as in the figure below.

p1 <- tm_shape(data_gb) + 
  tm_fill("younger_adults", palette = "Reds", style = "quantile", 
          n = 5, title = "Younger Adults")+
  tm_layout(legend.position = c("left", "top"))+
  tm_shape(regions) + tm_borders(col = "black", lwd = 1)
p2 <- tm_shape(data_gb) + 
  tm_fill("private_transport_to_work", palette = "YlGn", style = "quantile", 
          n = 5, title = "Car Commute")+
  tm_layout(legend.position = c("left", "top"))+
  tm_shape(regions) + tm_borders(col = "black", lwd = 1)
p3 <- tm_shape(data_gb) + 
  tm_fill("not_good_health", palette = "YlOrRd", style = "quantile", 
          n = 5, title = "Poor health")+
  tm_layout(legend.position = c("left", "top"))+
  tm_shape(regions) + tm_borders(col = "black", lwd = 1)
tmap_arrange(p1,p2,p3, nrow = 1)
The result of combining multiple `tmap` objects.

(#fig:ch2fig16,)The result of combining multiple tmap objects.

The full functionality of tmap has only been touched on. It is covered in much greater depth throughout in Brunsdon and Comber (2018) but particularly in Chapters 3 and 5.

Key Points:

  • The basic syntax for mapping using the tmap package was introduced;
  • This has a layered form tm_shape(data = <data>) + tm_<function>()
  • By default tmap allocates a colour scheme, decides the breaks (divisions) for the attribute being mapped, places the legend somewhere, gives a legend title etc. All of these can be changed;
  • Additional spatial data layers can be added to the map;
  • tmap has all the usual map embellishments (scale bar, north arrow etc);
  • tmap allows interactive maps to be created with different map backdrops (e.g. OpenStreetMap, aerial imagery, etc);
  • the tmap_arrange function allows multiple maps to plotted together in the same plot window.

2.4.2 Mapping with ggplot2

The ggplot2 package can also be used to visualise spatial data. This section provides some outline examples of how to do this with point and area data. The code below selects the East Midlands for us to work with to create an sf object from a subset of data_gb:

east_mids = data_gb[data_gb$Region == "East Midlands",]

The code below plots the east_mids dataset introduced above in Figure 9, but rather than using tmap (or qtm), uses the ggplot2 package:

ggplot(east_mids) + geom_sf()
The map of the LADs in te East Midlands created using ggplot.

Figure 2.13: The map of the LADs in te East Midlands created using ggplot.

The geom_sf function, adds a geometry stored in an sf object to the plot. It looks for an attribute called geom or geometry in the sf object. On occasion you may need to specify this as in the code snippet below.

ggplot(east_mids) + geom_sf(aes(geometry = geometry))

As with tmap and other ggplot operations, colour can be specified directly or for a variable, by specifying plotting aesthetics (aes) using fill or colour depending on the what is being shaded in the plot:

# directly
ggplot(east_mids) +
  geom_sf(aes(geometry = geometry), colour = "darkgrey", fill = "azure")
# using a continuous variable and specify a thin line size
ggplot(east_mids) +
  geom_sf(aes(geometry = geometry, fill = degree_educated), size = 0.1) +
  scale_fill_viridis_c(name = "Degree educated")

It is also possible to add scale bars and north arrows using some of the functions in the ggspatial package, and to control the appearance of the map using both predefined themes (such theme_bw) and specific theme elements. Examples of some of these are shown in the code below to create the figure below:

ggplot(east_mids) +
  geom_sf(aes(geometry = geometry, fill = not_good_health), size = 0.1) +
  scale_fill_viridis_c(name = "Poor Health") + 
  annotation_scale(location = "tl") +
  annotation_north_arrow(location = "tl", which_north = "true", 
        pad_x = unit(0.9, "in"), pad_y = unit(0.25, "in"),
        style = north_arrow_fancy_orienteering) +
  theme_bw() +
  theme(panel.grid.major = element_line(color = gray(.5), linetype="dashed", size=0.5),
        panel.background = element_rect(fill="white"))
Map of Poor Healthin the East Midlands using `ggplot` with cartographic enhancements.

Figure 2.14: Map of Poor Healthin the East Midlands using ggplot with cartographic enhancements.

As with tmap, add multiple data layers can be included in the map. The code below creates and then adds the locations and names of some randomly selected LADs to the map (Note the set.seed() function makes sure that you all select the same random sample!).

set.seed(13)
labels.data = east_mids[sample(1:nrow(east_mids), 10), ]
labels.data = cbind(labels.data, st_coordinates(st_centroid(labels.data)))
# create the ggplot
ggplot(east_mids) +
  geom_sf(aes(geometry = geometry), fill = "white", size = 0.1) +
  geom_point(data = labels.data, aes(x=X, y=Y), colour = "red") +
  geom_text(data = labels.data, aes(x=X, y=Y, label=lad15nm),
    fontface = "bold", check_overlap = T) +
  theme_bw() +
  theme(axis.title = element_blank())

A final trick is to include a backdrop to the map. This is not interactive as with tmap in ‘view’ mode, but instead loads tiles of backdrop maps to provide context. This can be done using the annotation_map_tile function from ggspatial. Note the order it needs to be placed in the sequence ggplot layers of commands:

ggplot(east_mids) +
  annotation_map_tile(zoomin=0, type='osm') + 
  geom_sf(aes(geometry = geometry, fill = degree_educated), size = 0.1, alpha = 0.5) +
  scale_fill_viridis_c(name = "Degree educated")
## Zoom: 9
A ggplot map with OSM backdrop.

Figure 2.15: A ggplot map with OSM backdrop.

It is possible to compare the use of ggplot and tmap to visualise analyses of spatial structure in spatial data variables.

The key thing that we are interested in, that cannot be directly determined using the aspatial bivariate and multivariate approaches (correlations, correspondence tables etc), are the locations where for example leave votes were high or low.

The code below partitions the data around median leave share. This shows that there are clear spatial trends. Both tmap and ggplot approaches are shown:

# tmap
p1 = data_gb %>% 
  mutate(leave = ifelse(share_leave > median(share_leave), "High","Low")) %>%
  tm_shape() + 
    tm_graticules(ticks = FALSE, col = "grey") +
    tm_polygons("leave", title = "Leave",legend.is.portrait = FALSE) +
    tm_layout(legend.outside = F, legend.position = c("left", "top"))
# ggplot
p2 = data_gb %>% 
  mutate(leave = ifelse(share_leave > median(share_leave), "High","Low")) %>%
  ggplot() + 
    geom_sf(aes(geometry = geometry, fill = leave), col = "grey", size = 0.0) +
    scale_fill_discrete(name = "Leave") +
    theme_bw() +
    theme(panel.grid.major = element_line(color=gray(.5), linetype="dashed", size=0.3),
          panel.background = element_rect(fill="white"), legend.position=c(.2, .9), 
          axis.text = element_text(size = 6))
# plot together 
library(grid)
pushViewport(viewport(layout=grid.layout(1,2)))
print(p1, vp=viewport(layout.pos.col = 1))
print(p2, vp=viewport(layout.pos.col = 2))
LADs with High and Low areas of leave voting using both tmap (top) and ggplot (bottom) approaches.

Figure 2.16: LADs with High and Low areas of leave voting using both tmap (top) and ggplot (bottom) approaches.

The choice over ggplot or tmap approaches may relate to the user’s norms about maps, which many people believe should always be accompanied by a scale bar and a north arrow. Classic map production with common cartographic conventions such as legends, scale bars, north arrows, etc, are readily supported by tmap, and although possible with ggplot2 are less directly implemented. Comprehensive treatment of the use of the tmap and ggplot2 packages to map spatial data is given in Comber and Brunsdon (2018).

Key Points:

  • Mapping (Spatial EDA!) can be undertaken with ggplot as well as tmap with the geom_sf function;
  • Mapping in ggplot is supported by the ggspatial package which contains functions for map embellishments (scale bars, north arrows, backdrops etc);
  • Continuous spatial data properties can be visualised by partitions (e.g. around the mean or median) and supported by divergent palettes.

Reward

As a reward for doing this Optional set of tasks, here is a link to the best recording of Echo and the Bunnymen ever. They were much better live than in the studio but you need to play it loud on a proper stereo! https://www.youtube.com/watch?v=n8q14frWQ-U

If you don’t want to go for the full 30 mins and just want to dance then try the individual songs off the EP:

References

Beecham, Roger, Nick Williams, and Alexis Comber. 2020. “Regionally-Structured Explanations Behind Area-Level Populism: An Update to Recent Ecological Analyses.” Plos One 15 (3): e0229974.
Brunsdon, Chris, and Lex Comber. 2018. An Introduction to r for Spatial Analysis and Mapping (2e). Sage.
Comber, Lex, and Chris Brunsdon. 2021. Geographical Data Science and Spatial Data Analysis: An Introduction in r. Sage.

  1. see https://bit.ly/3tZ8Stp↩︎

  2. They can be found at https://epsg.io/ with this one being https://epsg.io/4326↩︎

  3. EPSG 27700 - see https://epsg.io/27700↩︎