Chapter 7 Getting data from APIs
7.1 Overview
This session is all about getting data! In this practical session you will develop a crime analysis using data from the UK Police website / API, and then use the Nomis census API to link crime data to socio-economic variables.
You should note a few things for the crime data:
- the crime data come from https://data.police.uk/data/ in monthly chunks;
- the httr and jsonlite packages are used to download and parse the data, and bespoke functions extract the location and crime type;
- you will extract data on a particular kind of crime.
You should note a few other things for the population Census API:
- Nomis is the official population census site: https://www.nomisweb.co.uk. You should have a look if you are not familiar with it.
- Census data are available for different years, and over different geographies.
- Importantly, census data are summarised in different ways.
- Here, the nomisr package is used to access the Nomis web API.
- Variables are extracted to support the creation of a Townsend measure of deprivation.
The final section brings the crime data and the deprivation index together, and in so doing the practical suggests how data from different sources can be combined, for example to develop geocomputational models of socio-economic processes.
7.2 Packages and Data
You will need to load the following packages for this practical. Some may need installing but you are experienced at this now.
library(httr)
library(jsonlite)
library(sf)
library(tmap)
library(tidyverse)
library(nomisr)
# make sure select refers to dplyr's version (it can be masked by other packages)
select = dplyr::select
You will need the following data from the VLE in your working directory: leeds_lsoa.gpkg.
7.3 The Police API
7.3.1 Getting and mapping crime data
The first thing is to download some data from the police API. The code below downloads data for September 2023 for an area around a location at 53.7997 North, 1.5492 West (do you know where this is?).
# specify the url - the web address
url = paste0("https://data.police.uk/api/crimes-street/all-crime",
"?lat=53.7997",
"&lng=-1.5492",
"&date=2023-09")
# use the GET function to "get" the url response object
# (see the help for GET)
x = GET(url)
# finally extract and assign to a data table
crimes <- as_tibble(
fromJSON(httr::content(x, as = "text", encoding = "utf8"),
flatten = T
)
)
Now, before we investigate what has been downloaded, have a look at the web address in url.
To understand what is going on here, have a look at how the call to the Police API is formed for street crime at https://data.police.uk/docs/method/crime-street/. In the url object above, notice the use of the '?' and the '&' to construct the query with the lat, lng and date parameters.
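Rather than pasting the query string by hand, httr can also assemble it from named parameters; a small sketch using modify_url (the values repeat those above, and url2 is just an illustrative name):

```r
library(httr)
# build the same request URL from named query parameters;
# modify_url handles the '?' and '&' separators for us
url2 <- modify_url("https://data.police.uk/api/crimes-street/all-crime",
                   query = list(lat = "53.7997",
                                lng = "-1.5492",
                                date = "2023-09"))
url2
# "https://data.police.uk/api/crimes-street/all-crime?lat=53.7997&lng=-1.5492&date=2023-09"
```

This is handy when parameter values need URL-encoding, which modify_url does automatically.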
We can examine the crimes object:
## # A tibble: 1,714 × 13
## category location_type context persistent_id id location_subtype month
## <chr> <chr> <chr> <chr> <int> <chr> <chr>
## 1 anti-socia… Force "" "" 1.13e8 "" 2023…
## 2 anti-socia… Force "" "" 1.13e8 "" 2023…
## 3 anti-socia… Force "" "" 1.13e8 "" 2023…
## 4 anti-socia… Force "" "" 1.13e8 "" 2023…
## 5 anti-socia… Force "" "" 1.13e8 "" 2023…
## 6 anti-socia… Force "" "" 1.13e8 "" 2023…
## 7 anti-socia… Force "" "" 1.13e8 "" 2023…
## 8 anti-socia… Force "" "" 1.13e8 "" 2023…
## 9 anti-socia… Force "" "" 1.13e8 "" 2023…
## 10 anti-socia… Force "" "" 1.13e8 "" 2023…
## # ℹ 1,704 more rows
## # ℹ 6 more variables: location.latitude <chr>, location.longitude <chr>,
## # location.street.id <int>, location.street.name <chr>,
## # outcome_status.category <chr>, outcome_status.date <chr>
The data are in tibble format with a number of attributes. The two location attributes are of particular interest here. The functions below extract the coordinates and the attributes and render them into a flat data table format:
# 1. Get location
getLonLat <- function(x) {
df = data.frame(lon = as.numeric(x$location.longitude),
lat = as.numeric(x$location.latitude))
# return the dataframe
return(df)
}
# 2. Get attributes
getAttr <- function(x) {
df = data.frame(
category = x$category,
street_name = x$location.street.name,
location_type = x$location_type,
month = substr(x$month, 6, 7),
year = substr(x$month, 1,4))
# return the data.frame
return(df)
}
The workings of these functions can be investigated, for example, by assigning crimes to x and running the code line by line.
Then they can be applied to the crimes object:
crimes.loc <- getLonLat(crimes)
crimes.attr <- getAttr(crimes)
Finally, a function is defined that uses the location information and the attributes to create a spatial object in sf format:
# join together and make a spatial (sf) object
makeSpatial = function(crimes.loc, crimes.attr){
# create a data frame
df = data.frame(longitude = crimes.loc[,1],
latitude = crimes.loc[,2],
crimes.attr)
# convert to sf
df_sf = st_as_sf(df, coords = c("longitude", "latitude"),
crs = 4326, agr = "constant")
# return the sf
return(df_sf)
}
# and apply
crimes_sf = makeSpatial(crimes.loc, crimes.attr)
It is possible to see the counts of different crime types using the table function:
table(crimes_sf$category)
##
## anti-social-behaviour bicycle-theft burglary
## 77 57 111
## criminal-damage-arson drugs other-crime
## 81 52 24
## other-theft possession-of-weapons public-order
## 150 13 190
## robbery shoplifting theft-from-the-person
## 47 308 93
## vehicle-crime violent-crime
## 86 425
And using the code below you can extract a crime type and plot it on a map. Note the use of the alpha parameter so that crime densities are shown.
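The plotting code itself is not echoed here; a minimal sketch of what it might look like, following the same pattern as the bicycle-theft map later in this section (burglary is just an illustrative choice of crime type, and crimes_sf is the object created above):

```r
library(tmap)
# extract a single crime type from the sf object created above
burglary.pts <- crimes_sf[crimes_sf$category == "burglary", ]
# interactive map: alpha makes overlapping points darker, showing density
tmap_mode("view")
tm_shape(burglary.pts) +
  tm_dots(alpha = 0.2, shape = 1, size = 0.1) +
  tm_basemap('OpenStreetMap')
```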
## tmap mode set to interactive viewing
## tmap mode set to plotting
7.3.2 Getting and mapping more crime data
In the above example crime data were obtained for just a single month. It is very easy to get data for a longer period, a year for example.
This is done by putting the operations above into a loop. The code below extends the single month to a year (months 1 to 12 in the loop) simply by passing a different date variable to the GET function, and appends each result to the results tables. A single crime type (bicycle-theft) is then extracted and mapped.
Note the while loop in the middle, which tests for a server-side error (HTTP status code 500). Without it, a failed request would cause the JSON parsing to fail and stop the loop; the while loop catches the error code and repeatedly queries the server until the data are returned.
# create empty vectors for the results
# these will convert to a data.table in the first iteration
# of the loop and then are subsequently added to
crimes.loc.tab = vector()
crimes.attr.tab = vector()
for (i in 1:12) {
# create the date
date.i <- sprintf("2022-%02d", i)
# pass to the API
url.i <- paste0("https://data.police.uk/api/crimes-street/all-crime",
"?lat=53.7997",
"&lng=-1.5492",
"&date=", date.i)
x.i = GET(url.i)
while (x.i$status_code == 500) {
x.i = GET(url.i)
}
crimes.i <- as_tibble(
fromJSON(httr::content(x.i, as = "text", encoding = "utf8"),
flatten = T
)
)
# add the result to the results
crimes.loc.tab <- rbind(crimes.loc.tab, getLonLat(crimes.i))
crimes.attr.tab <- rbind(crimes.attr.tab, getAttr(crimes.i))
# print out a little indicator of progress
cat("downloaded month", i, "\n")
}
## downloaded month 1
## downloaded month 2
## downloaded month 3
## downloaded month 4
## downloaded month 5
## downloaded month 6
## downloaded month 7
## downloaded month 8
## downloaded month 9
## downloaded month 10
## downloaded month 11
## downloaded month 12
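The bare while loop re-queries the server immediately and without limit. A slightly more defensive pattern pauses between attempts and caps the number of retries; a sketch (the get_with_retry name and its defaults are illustrative, not part of the practical):

```r
library(httr)
# hedged sketch: retry a GET with a pause, giving up after max_tries
get_with_retry <- function(url, max_tries = 5, pause = 1) {
  for (i in seq_len(max_tries)) {
    resp <- GET(url)
    if (status_code(resp) != 500) return(resp)
    Sys.sleep(pause)   # be polite to the server before retrying
  }
  stop("server returned 500 ", max_tries, " times for: ", url)
}
# x.i <- get_with_retry(url.i) would replace the GET / while pair above
```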
Then you can have a look at the data, convert them to an sf spatial object and map the results:
## lon lat
## 1 -1.543925 53.79574
## 2 -1.550334 53.79696
## 3 -1.541691 53.79601
## 4 -1.565558 53.80603
## 5 -1.542859 53.79891
## 6 -1.544913 53.79843
crimes_sf_2022 = makeSpatial(crimes.loc.tab, crimes.attr.tab)
bike_nickers.pts <- crimes_sf_2022[crimes_sf_2022$category=="bicycle-theft",]
tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(bike_nickers.pts)+
tm_dots(alpha = 0.2, shape = 1, size = 0.1)+
tm_basemap('OpenStreetMap')
## tmap mode set to plotting
7.3.3 Getting and mapping lots of crime data
The above extension got data from the API for a longer time period and grouped the results to show patterns for the year. However this was for an area approximately a mile around a single location. It is possible to further extend this spatially by defining a bounding box or a polygon to get the data. The code below reads in some LSOA data for Leeds and then extracts some contiguous LSOAs as an area to investigate:
# read in a Leeds LSOA
leeds = st_read("leeds_lsoa.gpkg", quiet = T)
# transform to lat / lon - WGS84
leeds = st_transform(leeds, 4326)
# set up list of LSOAs
codes = c("E01011351", "E01011352", "E01011353", "E01011356", "E01011359")
# extract from leeds
poly.temp <- leeds %>% filter(code %in% codes)
# have a look
tmap_mode("view")
## tmap mode set to interactive viewing
## tmap mode set to plotting
We can now use these polygons as an area to extract crimes for, by slightly changing the arguments passed to the API call. First, the coordinates of the bounding box for this area are extracted:
bb = st_bbox(poly.temp)
bb
## xmin ymin xmax ymax
## -1.542430 53.816416 -1.515997 53.835344
These are used to create a sequence of coordinates defining the box to be passed to the API. Notice how the first coordinate pair is repeated at the end to close the box:
X = round(c(bb[1], bb[1], bb[3], bb[3], bb[1]), 3)
Y = round(c(bb[2], bb[4], bb[4], bb[2], bb[2]), 3)
poly_paste <- paste(paste(Y, X, sep = ","), collapse = ":")
poly_paste
## [1] "53.816,-1.542:53.835,-1.542:53.835,-1.516:53.816,-1.516:53.816,-1.542"
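As a sanity check, the poly string (lat,lng pairs separated by colons) can be parsed back into a coordinate table; a small self-contained sketch, with the string hard-coded so it stands alone:

```r
# the string produced above
poly_paste <- "53.816,-1.542:53.835,-1.542:53.835,-1.516:53.816,-1.516:53.816,-1.542"
# split on ":" to get the pairs, then on "," to get lat and lng
pairs <- strsplit(poly_paste, ":")[[1]]
coords <- do.call(rbind, lapply(strsplit(pairs, ","), as.numeric))
colnames(coords) <- c("lat", "lng")
coords
# the first and last rows are identical, closing the polygon
```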
Finally these can be passed to the API using the poly= parameter:
url = paste0("https://data.police.uk/api/crimes-street/all-crime?poly=",
poly_paste,
"&date=2023-09")
x = GET(url)
crimes <- as_tibble(
fromJSON(httr::content(x, as = "text", encoding = "utf8"),
flatten = T
)
)
crimes
## # A tibble: 207 × 13
## category location_type context persistent_id id location_subtype month
## <chr> <chr> <chr> <chr> <int> <chr> <chr>
## 1 anti-socia… Force "" "" 1.13e8 "" 2023…
## 2 anti-socia… Force "" "" 1.13e8 "" 2023…
## 3 anti-socia… Force "" "" 1.13e8 "" 2023…
## 4 anti-socia… Force "" "" 1.13e8 "" 2023…
## 5 anti-socia… Force "" "" 1.13e8 "" 2023…
## 6 anti-socia… Force "" "" 1.13e8 "" 2023…
## 7 anti-socia… Force "" "" 1.13e8 "" 2023…
## 8 anti-socia… Force "" "" 1.13e8 "" 2023…
## 9 anti-socia… Force "" "" 1.13e8 "" 2023…
## 10 anti-socia… Force "" "" 1.13e8 "" 2023…
## # ℹ 197 more rows
## # ℹ 6 more variables: location.latitude <chr>, location.longitude <chr>,
## # location.street.id <int>, location.street.name <chr>,
## # outcome_status.category <chr>, outcome_status.date <chr>
Again you should have a look at the url, and you can even use the BROWSE function to explore it as before.
Now there are limits to what can be passed to the API in terms of the complexity of the polygon definition. So the code below creates a 5km grid over the Leeds area and passes each grid cell in turn to the API, using yet another for loop.
# transform back to OSGB
leeds = st_transform(leeds, 27700)
# create a grid: 1. the geometry
gr_geom = st_make_grid(leeds, cellsize = 5000)
# 2. a data frame of IDs
gr = data.frame(ID = 1:length(gr_geom))
# 3. apply the geometry to the data.frame to make an sf object
st_geometry(gr) = gr_geom
You could examine this:
## tmap mode set to interactive viewing
## tmap mode set to plotting
The function below extracts the coordinates from each grid cell and formats them so that they can be passed to the API with a poly call.
get_poly_coords = function(x){
# transform to lat lon
x = st_transform(x, 4326)
# extract coordinates
coords = data.frame(st_coordinates(x)[, c("X", "Y")])
poly_paste <- paste(paste(coords$Y, coords$X, sep = ","), collapse = ":")
return(poly_paste)
}
To test this, and to show what the above function is doing, examine the result for a single grid cell:
coords = get_poly_coords(gr[1,])
coords
## [1] "53.699629746901,-1.80124066773008:53.6994803352637,-1.72550965108028:53.7444201684409,-1.72521713914431:53.7445698244954,-1.80102893467678:53.699629746901,-1.80124066773008"
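To see what get_poly_coords does without the Leeds grid, it can be run on a hand-made square; a self-contained sketch (the function is repeated from above, and the square's coordinates are arbitrary):

```r
library(sf)
# the function from above, repeated so this snippet stands alone
get_poly_coords = function(x){
  # transform to lat lon
  x = st_transform(x, 4326)
  # extract coordinates
  coords = data.frame(st_coordinates(x)[, c("X", "Y")])
  poly_paste <- paste(paste(coords$Y, coords$X, sep = ","), collapse = ":")
  return(poly_paste)
}
# a small square, already in WGS84 (lon/lat), so st_transform is a no-op
sq_geom = st_sfc(st_polygon(list(rbind(c(-1.55, 53.79), c(-1.55, 53.80),
                                       c(-1.54, 53.80), c(-1.54, 53.79),
                                       c(-1.55, 53.79)))), crs = 4326)
sq = st_sf(ID = 1, geometry = sq_geom)
get_poly_coords(sq)
```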
# URL for leeds area
url=paste0("https://data.police.uk/api/crimes-street/all-crime?poly=",
coords,
"&date=2023-09")
x = GET(url)
crimes <- as_tibble(
fromJSON(httr::content(x, as = "text", encoding = "utf8"),
flatten = T
)
)
# extract data
crimes.loc <- getLonLat(crimes)
crimes.attr <- getAttr(crimes)
crimes_sf = makeSpatial(crimes.loc, crimes.attr)
# and map
tmap_mode("view")
tm_shape(gr[1,])+ tm_borders() +
tm_shape(crimes_sf) + tm_dots(alpha = 0.2, shape = 1, size = 0.1)+
tm_basemap('OpenStreetMap')
Finally this can be put into a loop over all the grid cells. Again note the while loop in the middle to avoid server-side errors.
# define some results tables as before
crimes.loc.tab = vector()
crimes.attr.tab = vector()
for(i in 1:nrow(gr)){
coords = get_poly_coords(gr[i,])
url.i=paste0("https://data.police.uk/api/crimes-street/all-crime?",
"poly=", coords,
"&date=2023-09")
x.i = GET(url.i)
while (x.i$status_code == 500) {
x.i = GET(url.i)
}
crimes.i <- as_tibble(
fromJSON(httr::content(x.i, as = "text", encoding = "utf8"),
flatten = T)
)
crimes.loc.tab <- rbind(crimes.loc.tab, getLonLat(crimes.i))
crimes.attr.tab <- rbind(crimes.attr.tab, getAttr(crimes.i))
# print out a little indicator of progress
cat("grid cell", i, "done \n")
}
## grid cell 1 done
## grid cell 2 done
## grid cell 3 done
## grid cell 4 done
## grid cell 5 done
## grid cell 6 done
## grid cell 7 done
## grid cell 8 done
## grid cell 9 done
## grid cell 10 done
## grid cell 11 done
## grid cell 12 done
## grid cell 13 done
## grid cell 14 done
## grid cell 15 done
## grid cell 16 done
## grid cell 17 done
## grid cell 18 done
## grid cell 19 done
## grid cell 20 done
## grid cell 21 done
## grid cell 22 done
## grid cell 23 done
## grid cell 24 done
## grid cell 25 done
## grid cell 26 done
## grid cell 27 done
## grid cell 28 done
## grid cell 29 done
## grid cell 30 done
## grid cell 31 done
## grid cell 32 done
## grid cell 33 done
## grid cell 34 done
## grid cell 35 done
## grid cell 36 done
## grid cell 37 done
## grid cell 38 done
## grid cell 39 done
## grid cell 40 done
## grid cell 41 done
## grid cell 42 done
And as ever the results can be converted to an sf spatial object and mapped:
## tmap mode set to interactive viewing