Chapter 2 Refresh: Using R for data and spatial data
2.1 Introduction
This practical is intended as refresher for your spatial data handling skills in R/RStudio. It uses a series of worked examples to illustrate some fundamental and commonly applied operations on data and spatial datasets. It is based on Chapter 5 of Brunsdon and Comber (2018)1 and uses a series of worked examples to illustrate some fundamental and commonly applied spatial operations on spatial datasets. Many of these form the basis of most GIS software.
Specifically, in GIS and spatial analysis, we are often interested in finding out how the information contained in one spatial dataset relates to that contained in another. The datasets may be ones you have read into R from shapefiles or ones that you have created in the course of your analysis. Essentially, the operations illustrate different methods for extracting information from one spatial dataset based on the spatial extent of another. Many of these are what are frequently referred to as overlay operations in GIS software such as ArcGIS or QGIS, but here are extended to include a number of other types of data manipulation. The sections below describe how to create a point layer from a data table with location attributes and the following operations:
- Intersections and clipping one dataset to the extent of another
- Creating buffers around features
- Merging the features in a spatial dataset
- Point-in-Polygon and Area calculations
- Creating distance attributes
- Combining spatial data and attributes
An R script with all the code for this practical is provided.
2.1.1 Packages
To handle these data types we will need some bespoke tools (functions) and these are found in packages. Recall that if you have not used packages on the computer your are working on before then they will need to be installed, using the install.packages
function. The code below tests for the existence of the sf
, tidyverse
and tmap
packages and installs them if they are not found:
# sf for spatial data
if (!is.element("sf", installed.packages()))
install.packages("sf", dep = T)
# tmap for mapping
if (!is.element("tmap", installed.packages()))
install.packages("tmap", dep = T)
# load the packages
library(sf)
library(tmap)
2.1.2 Data
To do this, we will need some data and you can download and then load some data using the code below. This downloads data for Chapter 7 of Comber and Brunsdon (2021) from a repository
https://archive.researchdata.leeds.ac.uk/741/1/ch7.Rdata
download.file("https://archive.researchdata.leeds.ac.uk/741/1/ch7.Rdata",
"./wk1.RData", mode = "wb")
You should see file called ch1.RData
appear in your working directory and then you can load the data:
load("wk1.RData")
Note that you can check where your working directory is by entering getwd()
and you can change (reset) it by going to Session > Set Working Directory….
If you check what is is loaded you should see 2 R objects have been loaded into the session via the ch2.RData
file.
ls()
## [1] "lsoa" "oa" "properties"
Three data files are loaded:
oa
a multipolygonsf
object of Liverpool census Output Areas (OAs) (\(n = 1584\))lsoa
a multipolygonsf
object of Liverpool census Lower Super Output Areas (LSOAs) (\(n = 298\))properties
a pointsf
object scraped from Nestoria (https://www.nestoria.co.uk) (\(n = 4230\))
You can examine these in the usual way:
# look at the structure
str(oa)
str(lsoa)
# examine the first 6 rows
head(properties)
The census data attributes in oa
and lsoa
describe economic well-being (unemployment (the unmplyd
attribute)), life-stage indicators (percentage of under 16 years (u16
), 16-24 years (u25
), 25-44 years (u45
), 45-64 years (u65
) and over 65 years (o65
) and an environmental variable of the percentage of the census area containing greenspaces (gs_area
). The unemployment and age data were from the 2011 UK population census (https://www.nomisweb.co.uk) and the greenspace proportions were extracted from the Ordnance Survey Open Greenspace layer (https://www.ordnancesurvey.co.uk/opendatadownload/products.html). The spatial frameworks were from the EDINA data library (https://borders.ukdataservice.ac.uk). The layer are projected to the OSGB projection (EPSG 27700) and has a geo-demographic class label (OAC
) from the OAC (see https://data.cdrc.ac.uk/dataset/output-area-classification-2011).
We will work mainly with the oa
and properties
layers in this practical.
The properties
data was downloaded on 30th May 2019 using an API (https://www.programmableweb.com/api/nestoria). It contains latitude (Lat
) and longitude (Lon
) in decimal degrees,allowing an sf
point layer to be created from it, price in 1000s of pounds (£), the number of bedrooms and 38 binary variables indicating the presence of different keywords in the property listing such as Conservatory, Garage, Wood Floor etc.
The code below creates a flat data table from the properties
point layer and then removes it. The idea in not having a spatial sf
object properties
but a data table (like a spreadsheet) is similar to many real world situations - there is just a data table that you download, create, scrape or be sent (for example in your dissertation work).
# create a data frame version
= data.frame(properties)
props # check the names of the attributes
names(props)
## [1] "Kitchen" "Garden" "Modern"
## [4] "Gas.Central.Heating" "No.Chain" "Parking"
## [7] "Shared.Garden" "Double.Bedroom" "Balcony"
## [10] "New.Build" "Lift" "Gym"
## [13] "Porter" "Price" "Beds"
## [16] "Terraced" "Detached" "Semi.Detached"
## [19] "Conservatory" "Cul.de.Sac" "Bungalow"
## [22] "Garage" "Reception" "En.suite"
## [25] "Conversion" "Dishwasher" "Refurbished"
## [28] "Patio" "Cottage" "Listed"
## [31] "Fireplace" "Victorian" "Penthouse"
## [34] "Purpose.Built" "Wood.Floor" "Loft"
## [37] "Detached.Garage" "Auction" "Needs.Modernisation"
## [40] "Double.Garage" "geometry" "Lon"
## [43] "Lat"
# remove the geometry attribute
= props[, -41]
props # remove the properties and lsoa layers
rm(list = c("lsoa", "properties"))
# check what is present
ls()
## [1] "oa" "props"
Both of the remaining objects, oa
and props
have a data table format, with rows representing observations (people, places, dates, or in this cases census areas) and columns represent their attributes. Such long format data are the most commonly used and are similar to a standard spreadsheet, although the oa
object is spatial and has geometries attached to each row, as do spatial data used in a GIS.
2.2 R as a GIS
2.2.1 Creating point data
The properties data props
has locational attributes that can be used to convert it from a flat data table to point data.
This is the case in many situations and projects for which this - a flat data table - is the starting point: you are given a file with point observations of something (plus location) that you want to something spatial with.
The code converts the props
flat table to the proprttties
spatial object and undertakes this operation, using the WG84 projection for decimal degrees which has an EPSG code of 4236, noting that all projections have such a code2:
= st_as_sf(props, coords = c("Lon", "Lat"), crs = 4326) properties
This can be plotted using a standard tmap
approach:
tm_shape(properties) +
tm_dots("Price", size = 0.1,alpha = 0.5,
style = "kmeans",
palette = "inferno") +
tm_layout(bg.color = "grey95",
legend.position = c("right", "top"),
legend.outside = T)
Alternatively this can be done as an interactive map, by change the tmap_mode
:
# set tmap mode for interactive plots
tmap_mode("view")
# create the plot
tm_shape(properties) +
tm_dots("Price", alpha = 0.5,
style = "kmeans",
palette = "viridis") +
tm_basemap("OpenStreetMap")
# reset tamp mode
tmap_mode("plot")
Task 1
Write some code that generates an interactive tmap
of the number of bedrooms for the observations / records in the properties
layer.
NB You are advised to attempt the tasks after the timetabled practical session. Worked answers are provided in the last section of the worksheet.
2.2.2 Intersections and Clip Operations
Now, consider the situation where the aim was to analyse the incidence of properties for sale in the Output Areas of Liverpool: we do not want to analyse all of the properties data but only those records that describe events in our study area - the area we are interested in.
This can be plotted using the usual commands as in the code below. You can see that the plot extent is defined by the spatial extent of area of interest (here oa
) and that all of the properties within that extent are displayed.
# plot the areas
tm_shape(oa) +
tm_borders(col = "black", lwd = 0.1) +
# add the points
tm_shape(properties) +
tm_dots(col = "#FB6A4A") +
# define some style / plotting options
tm_layout(frame = F)
There are a number of ways of clipping spatial data in R. The simplest of these is to use the spatial extent of one as an index to subset another.
BUT to undertake any spatial operation with 2 spatial layers, they must be in the same projection. The tmap
code above was able to handle this but spatial operations may fail.
Examine the properties layer and note its projection (particularly through the Geodetic CRS
metadata at the top of the print out and the geometry
attribute:
properties
Now change the projection to OSGB3 and re-examine:
= st_transform(properties, 27700)
properties properties
The spatial extent of the oa
layer can now be used to clip out the intersecting observations form the proprties
layer:
= properties[oa,] prop_clip
This simply clips out the data from properties
that is within the spatial extent of oa
. You can check this:
tm_shape(oa) +
tm_fill() +
tm_shape(prop_clip) +
tm_dots()
However, such clip (or crop) operations simply subset data based on their spatial extents. There may be occasions when you wish to combine the attributes of difference datasets based on the spatial intersection. The st_intersection
in sf
allows us to do this as shown in the code below.
<- st_intersection(oa, properties) prop_int
## Warning: attribute variables are assumed to be spatially constant throughout all
## geometries
The st_intersection
operation and the clip operation are based on spatial extents, do the same thing and the outputs are the same dimension but with subtle differences:
prop_clip prop_int
If you examine the data created by the intersection, you will notice that each of the intersecting points has the full attribution from both input datasets (you may have to scroll up in your console to see this!):
head(data.frame(prop_clip))
head(data.frame(prop_int))
Task 2
Write some code that selects the polygon in oa
with the LSOA code E00033902 (in the code
attribute in oa
) and then clips out the properties in that polygon. How many properties are there in E00033902?
NB You are advised to attempt the tasks after the timetabled practical session. Worked answers are provided in the last section of the worksheet.
2.2.3 Merging spatial features and Buffers
In many situations, we are interested in events or features that occur near to our area of interest as well as those within it. Environmental events such as tornadoes, for example, do not stop at state lines or other administrative boundaries. Similarly, if we were studying crimes locations or spatial access to facilities such as shops or health services, we would want to know about locations near to the study area border. Buffer operations provide a convenient way of doing this and buffers can be created in R using the st_buffer
function in sf
.
Continuing with the example above, we might be interested in extracting the properties within a certain distance of the Liverpool area, say 2km. Thus the objective is to create a 2km buffer around Liverpool and to use that to select from the properties dataset. The buffer function allow us to do that, and requires a distance for the buffer to be specified in terms of the units used in the projection.
The code below create a buffer, but does it for each OA - this is not we need in this case!
# apply a buffer to each object
<- st_buffer(x = oa, dist = 2000)
buf_liv # map
tm_shape(buf_liv) +
tm_borders()
The code below uses st_union
to merge all the OAs in oa
to a single polygon and then uses that as the object to be buffered:
# union the oa layer to a single polygon
<- st_sf(st_union(oa))
liv_merge # buffer this
<- st_buffer(liv_merge, 2000)
buf_liv # and map
tm_shape(buf_liv) +
tm_borders() +
tm_shape(liv_merge) + tm_fill()
2.2.4 Point-in-polygon and Area calculations
2.2.4.1 Point-in-polygon
It is often useful to count the number of points falling within different zones in a polygon data set. This can be done using the the st_contains
function in sf
.
The code below returns a list of counts of the number properties that occur inside each OA to the variable prop.count
and prints the first six of these to the console using the head
function:
= st_contains(oa, properties, sparse = F)
tmp = rowSums(tmp)
prop.count head(prop.count)
## [1] 3 0 0 5 4 5
length(prop.count)
## [1] 1584
Each of the observations in prop.count
corresponds in order to the OAs in oa
and could be attached to the data in the following way:
$prop.count = prop.count oa
This could be mapped using a full tmap
approach as in the below, using best practice for counts (choropleths should only be used for rates!):
tm_shape(oa) +
tm_dots(size = "prop.count", alpha = 0.5) +
tm_shape(liv_merge) + tm_borders()
2.2.4.2 Area calculations
Another useful operation is to be able calculate polygon areas. The st_area
function in and sf
does this. To check the projection, and therefore the map units, of an sf
objects use the st_crs
function:
st_crs(oa)
This declares the projection to be in metres. To see the areas in square metres enter:
head(st_area(oa))
These are not particularly useful and more realistic measures are to report areas in hectares or square kilometres:
# hectares
st_area(oa) / (100 * 100)
# square kilometres
st_area(oa) / (1000 * 1000)
Task 3
Your task is to create the code to produce maps of the densities of properties in each OA in Liverpool in properties per square kilometre. For the analysis you will need to use the properties
point data and the oa
dataset and undertake a point in polygon operation, apply an area function and undertake a conversion to square kilometres. The maps should be produced using the tm_shape
and tm_fill
functions in the tmap
package^[Hint: in the tm_fill
part of the mapping use style = "kmeans"
to set the colour breaks).
NB You are advised to attempt the tasks after the timetabled practical session. Worked answers are provided in the last section of the worksheet.
2.2.5 Creating distance attributes
Distance is fundamental to spatial analysis. For example, we may wish to analyse the number of locations (e.g. health facilities, schools, etc) within a certain distance of the features we are considering. In the exercise below distance measures are used to evaluate differences in accessibility for different social groups, as recorded in census areas. Such approaches form the basis of supply and demand modelling and provide inputs into location-allocation models.
Distance could be approximated using a series of buffers created at specific distance intervals around our features (whether point or polygons). These could be used to determine the number of features or locations that are within different distance ranges, as specified by the buffers using the poly.counts
function above. However distances can be measured directly and there a number of functions available in R to do this.
First, the most commonly used is the dist
function. This calculates the Euclidean distance between points in \(n-\)dimensional feature space. The example below developed from the help for dist
shows how it is used to calculate the distances between 5 records (rows) in a feature space of 20 hypothetical variables:
# create some random data
set.seed(123) # for reproducibility
<- matrix(rnorm(100), nrow = 5)
x colnames(x) <- paste0("Var", 1:20)
dist(x)
as.matrix(dist(x))
If your data is projected (i.e. in metres, feet etc) then dist
can also be used to calculate the Euclidean distance between pairs of coordinates (i.e. 2 variables not 20!):
= st_coordinates(st_centroid(oa))
coords = as.matrix(dist(coords, diag = F))
dmat dim(dmat)
1:6, 1:6] dmat[
The distance functions return a to-from matrix of the distances between each pair of locations. These could describe distances between any objects and such approaches underpin supply and demand modelling and accessibility analyses.
When determining geographic distances, it is important that you consider the projection properties of your data: if the data are projected using degrees (i.e. in latitude and longitude) then this needs to be considered in any calculation of distance. The st_distance
function in sf
calculates the Cartesian minimum distance (straight line) distance between two spatial datasets of class sf
projected in planar coordinates.
The code below calculates the distance between 2 features:
st_distance(oa[100,], properties[2,])
## Units: [m]
## [,1]
## [1,] 3118.617
And this can be extended to calculates multiple distances using a form of apply function. The code below uses st_distance
to determine the distance from the 100^th observation in oa
to each of the properties for sale:
= st_geometry(oa[100,])
g1 = st_geometry(properties)
g2 = mapply(st_distance, g1, g2)
dlist length(dlist)
## [1] 4230
head(dlist)
## [1] 4440.633 3118.617 10518.848 4435.707 2519.499 2519.499
You could check this by mapping the properties data, shaded by their distance to this location:
tm_shape(cbind(properties, dlist)) +
tm_dots("dlist", size = 0.1,alpha = 0.5,
style = "kmeans",
palette = "inferno") +
tm_layout(bg.color = "grey95",
legend.position = c("right", "top"),
legend.outside = T) +
tm_shape(oa[100,]) + tm_borders("red", lwd = 4)
2.2.6 Combining spatial datasets and their attributes
The point-in-polygon calculation above generates counts of the points falling in each polygon. A common situation in spatial analysis is the need to combine (overlay) different polygon features that describe the spatial distribution of different variables, attributes or processes that are of interest. The problem is that the data may have different underlying area geographies. In fact, it is commonly the case that different agencies, institutions and government departments use different geographical areas and even where they do not, geographical areas frequently change over time.
In these situations, we can use the intersection function st_intersection
in sf
to identify the area of intersection between different spatial datasets.
In this exercise, a zone dataset will be created with the aim of calculating the number of of properties for sale in each zone. Now we could just use a point in polygon operation with the raw properties data in this case, but the aim here to illustrate the technique, when for example data are only available over a specific set of areas such as the Output Areas here.
First, you should create the zones, number them with an ID and plot these on a map with the OA data:
## define a 1km grid in polygons
= st_make_grid(oa, 1000, what = "polygons", square = T)
sq = data.frame(ID = 1:length(sq))
sq_grid st_geometry(sq_grid) = sq
# clip the extent of oa
= sq_grid[oa,]
sq_grid # plot
tm_shape(sq_grid) + tm_polygons(col = "grey") +
tm_shape(oa) +tm_borders(col = "red")
The 2 layers, st_grid
and oa
have the same projections, and so they can be intersected using st_intersection
:
<- st_intersection(sq_grid, oa[, c("code","prop.count")]) int.res_sf
## Warning: attribute variables are assumed to be spatially constant throughout all
## geometries
You can examine the intersected data:
head(int.res_sf)
You will see that the data.frame
of the intersected object contains composites of the inputs. These links can be used to create attributes for the intersection output data.
First, note that the ID
variable of int.res_sf
relates to the ID
variable of sq_grid
and the variable prop.count
is from the variable of the same name in oa
. We wish to summarise this over the zones of sq_grid
. Here the functionality of dplyr
single table operations that were introduced in Chapter 4 can be useful. Knowing the the unique identifiers of each polygon in both of the intersected layers is critical for working out proportions.
# generate area and proportions
<- st_area(int.res_sf)
int.areas <- st_area(oa)
liv.areas # match tract area to the new layer
<- match(int.res_sf$code, oa$code)
index <- liv.areas[index]
liv.areas <- as.vector(int.areas)/as.vector(liv.areas) liv.prop
The liv.prop
object can be used to create a variable in the data frame of the new layer
$props <- oa$prop.count[index] * liv.prop int.res_sf
And this can be summarised using the functionality in dplyr
:
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# summarise the counts of properties of the grid cells
%>% st_drop_geometry() %>%
int.res_sf group_by(ID) %>%
summarise(count = sum(props)) -> props
# link to the grid data
%>% left_join(props) -> sq_grid sq_grid
## Joining, by = "ID"
The results can be plotted :
tm_shape(sq_grid) +
tm_polygons("count", palette = "Greens",
style = "kmeans", title = "Properties for sale") +
tm_layout(frame = F, legend.position = c(1,0.5))
Of course, all of this can be done using st_interpolate_aw
which wraps up all of the above steps into a single function:
= st_interpolate_aw(oa[, "prop.count"], sq_grid, extensive = T)
props2 tm_shape(props2) +
tm_polygons("prop.count", palette = "Greens",
style = "kmeans", title = "No. of Properties for sale") +
tm_layout(frame = F, legend.position = c(1,0.5))
2.2.7 Reading and writing data in and out of R/RStudio
There are 2 basic approaches:
- read and write data in and out of R/Rstudio in propriety formats such as shapefiles, CSV files etc.
- read and write data using
.Rdata
files in R’s binary format and the load and save functions.
The code below provides does these as a reference for you:
Tabular data (CSV format)
write.csv(props, file = "props.csv", row.names = FALSE)
= read.csv("props.csv", header=TRUE) p2
Tabular data (.RData format)
save(props, file = "props.RData")
load("props.RData") # notice that this needs no assignation e.g. to p2 or props
Spatial data (Proprietary format)
# as shapefile
st_write(properties, "point.shp", delete_layer = T)
= st_read("point.shp")
p2 # as GeoPackage
st_write(oa, "area.gpkg", delete_layer = T)
= st_read("area.gpkg") a2
Spatial data (R format)
save(oa, file = "areas.RData")
load("areas.RData") # notice that this needs no assignation e.g. to p2 or props
2.3 Answer to Tasks
Task 1 Write some code that generates an interactive map of the number of bedrooms for the observations / records on properties
.
# basic
tm_shape(properties) +
tm_dots("Beds")
# with embellishment
tm_shape(properties) +
tm_dots("Beds", size = 0.2, title = "Bedrooms",
style = "kmeans", palette = "viridis") +
tm_compass(bg.color = "white")+
tm_scale_bar(bg.color = "white")
Task 2 Write some code that selects the polygon in oa
with the LSOA code E00033902 (in the code
attribute in oa
) and then clips out the properties in that polygon. How many properties are there in E00033902?
## There are lots of ways of doing this but each of them uses a logical statement
# 1. using an index to create a subset and then clip
= oa$code == "E00033902"
index = oa[index,]
oa_select
properties[oa_select,]
# 2. using the layer you have created already
sum(prop_int$code == "E00033902")
Task 3 Produce a map of the densities of properties for sale in each OA block in Liverpool in properties per square kilometre
# point in polygon
<- rowSums(st_contains(oa,properties,sparse = F))
p.count # area calculation
<- st_area(oa) / (1000 * 1000)
oa.area # combine and assign to the oa data
$p.p.sqkm <- as.vector(p.count/oa.area)
oa# map
tm_shape(oa) +
tm_fill("p.p.sqkm", style = "kmeans", title ="")
2.4 Optional: mapping with tmap
and ggplot2
We have been using tmap
in the above exercises without really commenting on what is being done. This section covers mapping in tmap
in a bit more detail and also shows how maps can be created with the ggplot2
package, supported by the ggspatial
package. The grid
package is used for combining multiple plot objects.
## install packages
# tidyverse for many things!
if (!is.element("tidyverse", installed.packages()))
install.packages("tidyverse", dep = T)
# ggspatial for mapping with ggplot
if (!is.element("ggspatial", installed.packages()))
install.packages("ggspatial", dep = T)
# grid for plotting
if (!is.element("grid", installed.packages()))
install.packages("grid", dep = T)
# load packages
library(tidyverse)
library(grid)
library(ggspatial)
The code loads the Local Authority Districts (LADs) in Great Britain with attributes describing the Brexit vote and additional demographic data. You will need to have the data_gb.geojson
in your working directory on your computer:
This is the same data and variables used in the analysis of the Brexit vote by Beecham, Williams, and Comber (2020). You can inspect in the usual way
summary(data_gb)
str(data_gb)
2.4.1 Mapping with tmap
The tmap
package supports the thematic visualization of spatial data. It has a grammatical style that handles each element of the map separately in a series of layers (it is similar to ggplot
in this respect - see below). In so doing it seeks to exercise control over each element in the map. This is different to the basic R plot functions. The basic syntax of tmap
is:
tm_shape(data = <data>)+
<function>() tm_
NB do not run the code above - its is simply showing the syntax of tmap
. However, you have probably noticed that it uses a syntactical style similar to ggplot
to join together layers of commands, using the +
sign. The tm_shape
function initialises the map and then layer types, the variables that get mapped etc, are specified using different flavours of tm_<function>
. The main types are listed in the table below.
##
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
##
## group_rows
Function | Description |
---|---|
tm_dots, tm_bubbles | for points either simple or sized according to a variable |
tm_lines | for line features |
tm_fill, tm_borders, tm_polygons | for polygons / areas, with and without attributes and polygon outlines |
tm_text | for adding text to map features |
tm_format, tm_facet, tm_layout | for adjusting cartographic appearance and map style, including legends |
tm_view, tm_compass, tm_credits, tm_scale_bar | for including traditional cartography elements |
Let’s start with a simple choropleth map. These show the distribution of a continuous variable in the spatial data (typically polygons / areas or points)
tm_shape(data_gb)+
tm_polygons("share_leave")
By default tmap
picks a shading scheme, the class breaks and places a legend somewhere. All of these can be changed. The code below allocates the tmap
plot to p1
(plot 1) and then prints it:
= tm_shape(data_gb) +
p1 tm_polygons("share_leave", palette = "GnBu", border.col = "salmon",
breaks = seq(0,0.8, 0.1), title = "Leave Share")+
tm_layout(legend.position = c("right", "top"), frame = F)
p1
And of course many other elements included either by running the code snippet defining p1
above with additional lines or by simply adding them as in the code below
+
p1 tm_scale_bar(position = c("left", "top")) +
tm_compass(position = c(0.15, 0.9))
It is also possible to add or overlay other data such as the region boundaries, which because this is another spatial data layer needs to be added with tm_shape
followed by the usual function. Notice the use of the piping syntax to create regions
:
# first make the region boundaries
<- data_gb %>% group_by(Region) %>% summarise()
regions
# then add to the plot
+ tm_shape(regions) + tm_borders(col = "black", lwd = 1) p1
The tmap
package can be used to plot multiple attributes in the same plot in this case the leave share and degree educated proportions:
tm_shape(data_gb)+
tm_fill(c("share_leave", "degree_educated"), palette = "YlGnBu",
breaks = seq(0,1, 0.1), title = c("Leave Share", "Degree"))+
tm_layout(legend.position = c("left", "bottom"))
The code below applies further refinements to the code to generate the next map. Notice the use of title
and legend.hist
, and then subsequent parameters passed to tm_layout
to control the legend:
tm_shape(data_gb)+
tm_polygons("share_leave", title = "Leave Share", palette = "Reds",
style = "kmeans", legend.hist = T)+
tm_layout(title = "Leave Vote Share \nin the UK",
frame = F, legend.outside = T,
legend.hist.width = 1,
legend.format = list(digits = 1),
legend.outside.position = c("left", "top"),
legend.text.size = 0.7,
legend.title.size = 1) +
tm_scale_bar(position = c("left", "top")) +
tm_compass(position = c("left", "top")) +
tm_shape(regions) + tm_borders(col = "black", lwd = 1)
This is quite a lot of code to unpick. The first call to tm_shape
determines which layer is to be mapped, then a specific mapping function is applied to that layer (in this case tm_polygons
) and a variable is passed to it. Other parameters are used to specify are the palette, whether a histogram is to be displayed and the type of breaks to be imposed (here a \(k\)-means was applied). The help for tm_polygons
describes a number of different parameters that can be passed to tmap
elements. Next, some adjustments were made to the defaults through the tm_layout
function, which allows you to override the defaults that tmap
automatically assigns (shading scheme, legend location etc). Finally the region layer was added.
There are literally 1000’s of options with tmap
and many of them are controlled in the tm_layout
function. You should inspect this:
?tm_layout
The code below creates and maps point data values in different ways. There are 2 basic options: change the size or change the shading of the points according to the attribute value. The code snippets below illustrate these approaches.
Create the point layer:
= st_centroid(data_gb) lads_pts
Then map by size using tm_bubbles
as in the figure below:
tm_shape(regions) +
tm_fill("olivedrab4") +
tm_borders("grey", lwd = 1) +
# the points layer
tm_shape(lads_pts) +
tm_bubbles("share_leave", title.size = "Leave Vote", scale = 0.6, col = "gold") +
tm_layout(legend.position = c("left", "top"))
Second by shade using tm_dots
as below:
# the 1st layer
tm_shape(regions) +
tm_fill("grey70") +
tm_borders(lwd = 1) +
# the points layer
tm_shape(lads_pts) +
tm_dots("degree_educated", size = 0.1, title = "Degree", shape = 19) +
tm_layout(legend.outside = T, frame = F)
You should explore the introductory tmap
vignettes which describes other approaches for mapping different types of variables, in built styles, plotting multiple layers:
vignette("tmap-getstarted", package = "tmap")
It is possible to use an OpenStreetMap backdrop for some basic web-mapping with tmap
by switching the view from the default mode from plot
to view
. You need to be on-line and connected to the internet. The code below generates a zoomable web-map map with an OpenStreetMap backdrop in the figure below, with a transparency term to aid the understanding of the local map context:
# set the mode
tmap_mode("view")
# do the plot
tm_shape(data_gb[data_gb$Region == "East",]) +
tm_fill(c("share_leave"), palette = "Reds", alpha = 0.6) +
tm_basemap('OpenStreetMap')
Aerial imagery and satellite imagery (depending on the scale and extent of the scene) can also be used to provide context as in the figure below:
tm_shape(data_gb[data_gb$Region == "East",]) +
tm_fill(c("share_leave"), palette = "Reds", alpha = 0.6) +
tm_basemap('Esri.WorldImagery')
Other backdrops can be specified as well as OSM. Try the following: Stamen.Watercolor
, Esri.WorldGrayCanvas
, OpenStreetMap
and Esri.WorldTopoMap
.
The tmap_mode
needs to be reset to return to a normal map view:
tmap_mode("plot")
Finally in this brief description of tmap
, when maps are assigned to a named R object (like p1
) then you can exert greater control over how multiple maps are plotted together using the tmap_arrange
function. This allows the parameters for each map to be specified individually before they displayed together as in the figure below.
<- tm_shape(data_gb) +
p1 tm_fill("younger_adults", palette = "Reds", style = "quantile",
n = 5, title = "Younger Adults")+
tm_layout(legend.position = c("left", "top"))+
tm_shape(regions) + tm_borders(col = "black", lwd = 1)
<- tm_shape(data_gb) +
p2 tm_fill("private_transport_to_work", palette = "YlGn", style = "quantile",
n = 5, title = "Car Commute")+
tm_layout(legend.position = c("left", "top"))+
tm_shape(regions) + tm_borders(col = "black", lwd = 1)
<- tm_shape(data_gb) +
p3 tm_fill("not_good_health", palette = "YlOrRd", style = "quantile",
n = 5, title = "Poor health")+
tm_layout(legend.position = c("left", "top"))+
tm_shape(regions) + tm_borders(col = "black", lwd = 1)
tmap_arrange(p1,p2,p3, nrow = 1)
The full functionality of tmap
has only been touched on. It is covered in much greater depth throughout in Brunsdon and Comber (2018) but particularly in Chapters 3 and 5.
Key Points:
- The basic syntax for mapping using the
tmap
package was introduced; - This has a layered form
tm_shape(data = <data>) + tm_<function>()
- By default
tmap
allocates a colour scheme, decides the breaks (divisions) for the attribute being mapped, places the legend somewhere, gives a legend title etc. All of these can be changed; - Additional spatial data layers can be added to the map;
tmap
has all the usual map embellishments (scale bar, north arrow etc);tmap
allows interactive maps to be created with different map backdrops (e.g. OpenStreetMap, aerial imagery, etc);- the
tmap_arrange
function allows multiple maps to plotted together in the same plot window.
2.4.2 Mapping with ggplot2
The ggplot2
package can also be used to visualise spatial data. This section provides some outline examples of how to do this with point and area data. The code below selects the East Midlands for us to work with to create an sf
object from a subset of data_gb
:
= data_gb[data_gb$Region == "East Midlands",] east_mids
The code below plots the east_mids
dataset introduced above in Figure 9, but rather than using tmap
(or qtm
), uses the ggplot2
package:
ggplot(east_mids) + geom_sf()
The geom_sf
function, adds a geometry stored in an sf
object to the plot. It looks for an attribute called geom
or geometry
in the sf
object. On occasion you may need to specify this as in the code snippet below.
ggplot(east_mids) + geom_sf(aes(geometry = geometry))
As with tmap
and other ggplot
operations, colour can be specified directly or for a variable, by specifying plotting aesthetics (aes
) using fill
or colour
depending on the what is being shaded in the plot:
# directly
ggplot(east_mids) +
geom_sf(aes(geometry = geometry), colour = "darkgrey", fill = "azure")
# using a continuous variable and specify a thin line size
ggplot(east_mids) +
geom_sf(aes(geometry = geometry, fill = degree_educated), size = 0.1) +
scale_fill_viridis_c(name = "Degree educated")
It is also possible to add scale bars and north arrows using some of the functions in the ggspatial
package, and to control the appearance of the map using both predefined themes (such theme_bw
) and specific theme
elements. Examples of some of these are shown in the code below to create the figure below:
ggplot(east_mids) +
geom_sf(aes(geometry = geometry, fill = not_good_health), size = 0.1) +
scale_fill_viridis_c(name = "Poor Health") +
annotation_scale(location = "tl") +
annotation_north_arrow(location = "tl", which_north = "true",
pad_x = unit(0.9, "in"), pad_y = unit(0.25, "in"),
style = north_arrow_fancy_orienteering) +
theme_bw() +
theme(panel.grid.major = element_line(color = gray(.5), linetype="dashed", size=0.5),
panel.background = element_rect(fill="white"))
As with tmap
, add multiple data layers can be included in the map. The code below creates and then adds the locations and names of some randomly selected LADs to the map (Note the set.seed()
function makes sure that you all select the same random sample!).
set.seed(13)
= east_mids[sample(1:nrow(east_mids), 10), ]
labels.data = cbind(labels.data, st_coordinates(st_centroid(labels.data)))
labels.data # create the ggplot
ggplot(east_mids) +
geom_sf(aes(geometry = geometry), fill = "white", size = 0.1) +
geom_point(data = labels.data, aes(x=X, y=Y), colour = "red") +
geom_text(data = labels.data, aes(x=X, y=Y, label=lad15nm),
fontface = "bold", check_overlap = T) +
theme_bw() +
theme(axis.title = element_blank())
A final trick is to include a backdrop to the map. This is not interactive as with tmap
in ‘view’ mode, but instead loads tiles of backdrop maps to provide context. This can be done using the annotation_map_tile
function from ggspatial
. Note the order it needs to be placed in the sequence ggplot
layers of commands:
ggplot(east_mids) +
annotation_map_tile(zoomin=0, type='osm') +
geom_sf(aes(geometry = geometry, fill = degree_educated), size = 0.1, alpha = 0.5) +
scale_fill_viridis_c(name = "Degree educated")
## Zoom: 9
It is possible to compare the use of ggplot
and tmap
to visualise analyses of spatial structure in spatial data variables.
The key thing that we are interested in, that cannot be directly determined using the aspatial bivariate and multivariate approaches (correlations, correspondence tables etc), are the locations where for example leave votes were high or low.
The code below partitions the data around median leave share. This shows that there are clear spatial trends. Both tmap
and ggplot
approaches are shown:
# tmap
= data_gb %>%
p1 mutate(leave = ifelse(share_leave > median(share_leave), "High","Low")) %>%
tm_shape() +
tm_graticules(ticks = FALSE, col = "grey") +
tm_polygons("leave", title = "Leave",legend.is.portrait = FALSE) +
tm_layout(legend.outside = F, legend.position = c("left", "top"))
# ggplot
= data_gb %>%
p2 mutate(leave = ifelse(share_leave > median(share_leave), "High","Low")) %>%
ggplot() +
geom_sf(aes(geometry = geometry, fill = leave), col = "grey", size = 0.0) +
scale_fill_discrete(name = "Leave") +
theme_bw() +
theme(panel.grid.major = element_line(color=gray(.5), linetype="dashed", size=0.3),
panel.background = element_rect(fill="white"), legend.position=c(.2, .9),
axis.text = element_text(size = 6))
# plot together
library(grid)
pushViewport(viewport(layout=grid.layout(1,2)))
print(p1, vp=viewport(layout.pos.col = 1))
print(p2, vp=viewport(layout.pos.col = 2))
The choice over ggplot
or tmap
approaches may relate to the user’s norms about maps, which many people believe should always be accompanied by a scale bar and a north arrow. Classic map production with common cartographic conventions such as legends, scale bars, north arrows, etc, are readily supported by tmap
, and although possible with ggplot2
are less directly implemented. Comprehensive treatment of the use of the tmap
and ggplot2
packages to map spatial data is given in Comber and Brunsdon (2018).
Key Points:
- Mapping (Spatial EDA!) can be undertaken with
ggplot
as well astmap
with thegeom_sf
function; - Mapping in
ggplot
is supported by theggspatial
package which contains functions for map embellishments (scale bars, north arrows, backdrops etc); - Continuous spatial data properties can be visualised by partitions (e.g. around the mean or median) and supported by divergent palettes.
Reward
As a reward for doing this Optional set of tasks, here is a link to the best recording of Echo and the Bunnymen ever. They were much better live than in the studio but you need to play it loud on a proper stereo! https://www.youtube.com/watch?v=n8q14frWQ-U
If you don’t want to go for the full 30 mins and just want to dance then try the individual songs off the EP:
- Over the Wall: https://www.youtube.com/watch?v=OW00GPMs27w
- Crocodiles: https://www.youtube.com/watch?v=hevQHVbF-3A
- All that Jazz: https://www.youtube.com/watch?v=E32NjqddSmI
- Zimbo (All My Colours): https://www.youtube.com/watch?v=HgbAHwOBXug
2.5 Useful Resources
There are few links in the list below that you may find useful:
- Spatial operations, Chapter 4 in https://geocompr.robinlovelace.net/spatial-operations.html
- An Intro to Spatial Data Analysis https://www.spatialanalysisonline.com/An%20Introduction%20to%20Spatial%20Data%20Analysis%20in%20R.pdf
- Making Maps with R https://bookdown.org/nicohahn/making_maps_with_r5/docs/introduction.html
- Chapter 5 in https://geocompr.robinlovelace.net/adv-map.html
- An Introduction to Spatial Data Analysis and Visualisation in R https://data.cdrc.ac.uk/dataset/introduction-spatial-data-analysis-and-visualisation-r
Data sources also:
- ONS spatial data: https://geoportal.statistics.gov.uk/search?collection=Map
- Spatial data from ONS (shapefile format) - https://www.data.gov.uk/search?q=&filters%5Bpublisher%5D=&filters%5Btopic%5D=Mapping&filters%5Bformat%5D=SHP&sort=best
- House of Commons data: https://commonslibrary.parliament.uk/data-tools-and-resources/
- uk-hex-cartograms-noncontiguous, House of Commons github, https://github.com/houseofcommonslibrary/uk-hex-cartograms-noncontiguous
References
They can be found at https://epsg.io/ with this one being https://epsg.io/4326↩︎
EPSG 27700 - see https://epsg.io/27700↩︎