3  Accessing UK Air Quality Data

Author

David Carslaw

Abstract
This section outlines the different ways users can access the abundant air quality data in the UK. The main functions provide easy access to hourly data and other statistical summaries such as annual means and data capture rates. Easy access is also provided to site meta data such as longitude, latitude, site type and details of the pollutants measured.

3.1 Accessing data

The UK has a surprisingly large amount of air quality data that is publicly accessible. The main UK AURN archive and the regional archives (England, Scotland, Wales and Northern Ireland), together with Imperial College London’s London Air Quality Network (LAQN), are large and important databases that allow free public access. Storing and managing data in this way has many advantages, including a consistent data format and high-quality underlying methods to process and store the data.

openair has a family of functions that provide users with extensive access to UK air quality data. Ricardo Energy & Environment have provided .RData files (R workspaces) for several important air quality networks in the UK. These files are updated on a daily basis. This approach requires an Internet connection to work. The import functions include:

  • importAURN For importing data from the UK national network called the Automatic Urban and Rural Network (AURN). This is the main UK network.

  • importSAQN For accessing data from the Air Quality Scotland network.

  • importWAQN For accessing data from the Air Quality Wales network.

  • importAQE For accessing data from the Air Quality England network of sites.

  • importNI For accessing data from the Northern Ireland network of sites.

  • importEurope A simplified function that gives basic access to hourly European data, based on Stuart Grange’s saqgetr package (see https://github.com/skgrange/saqgetr). Like the other openair import functions, it requires site code(s) and year(s) to be supplied.

  • importKCL For accessing data from the sites operated by Imperial College London (see note 1), primarily the London Air Quality Network.

Many users download hourly data from the air quality archive at https://www.airquality.co.uk. Most commonly, the data are emailed to the user as .csv files in a fixed format. This is a useful facility but it has some limitations and frustrations, many of which have been overcome by the new way of storing and downloading the data described below.

The import functions have several advantages over the web portal approach, where .csv files are downloaded. First, it is quick to select a range of sites, pollutants and periods (see examples below). Second, storing the data as .RData objects is very efficient: they are about four times smaller than the already small .csv files, so the data download quickly and save bandwidth. Third, the functions completely avoid any need for data manipulation or setting time formats, time zones etc. Finally, it is easy to import many years of data, which makes it possible to download several long time series in one go.

The site codes and pollutant names can be upper or lower case.

Some examples of usage are shown below. First load the packages we need.
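A minimal setup sketch, assuming the examples in this section need only openair for the import functions and dplyr for filter(), glimpse() and the %>% pipe (loading the whole tidyverse would work equally well):

```r
# Assumption: both packages are already installed from CRAN.
library(openair) # importAURN(), importMeta() and the other import functions
library(dplyr)   # filter(), glimpse() and the %>% pipe used later on
```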

3.2 Site Meta Data

The first question is, what sites are available and what do they measure? Users can access the details of air pollution monitoring sites using the importMeta function. The user only needs to provide the network name and (optionally) whether all data should be returned. By default only site type, latitude and longitude are returned.

aurn_meta <- importMeta(source = "aurn")
aurn_meta
# A tibble: 278 × 5
   site                           code  latitude longitude site_type       
   <chr>                          <chr>    <dbl>     <dbl> <chr>           
 1 Aberdeen                       ABD       57.2     -2.09 Urban Background
 2 Aberdeen Erroll Park           ABD9      57.2     -2.09 Urban Background
 3 Aberdeen Union Street Roadside ABD7      57.1     -2.11 Urban Traffic   
 4 Aberdeen Wellington Road       ABD8      57.1     -2.09 Urban Traffic   
 5 Armagh Roadside                ARM6      54.4     -6.65 Urban Traffic   
 6 Aston Hill                     AH        52.5     -3.03 Rural Background
 7 Auchencorth Moss               ACTH      55.8     -3.24 Rural Background
 8 Ballymena Antrim Road          BAAR      54.9     -6.27 Urban Traffic   
 9 Ballymena Ballykeel            BALM      54.9     -6.25 Urban Background
10 Barnsley                       BARN      53.6     -1.48 Urban Background
# … with 268 more rows

To return much more detailed data, including which pollutants are measured at each site and the site start and end dates, add the option all = TRUE.

aurn_meta <- importMeta(source = "aurn", all = TRUE) 
aurn_meta

The example below uses sites on the AURN that measure NO2, but can easily be extended to the other data sources.

To see how many sites measure NO2 in the AURN that are ‘urban traffic’:

aurn_detailed <- importMeta(source = "aurn", all = TRUE)

no2_sites <- filter(
  aurn_detailed,
  variable == "NO2",
  site_type == "Urban Traffic"
)

nrow(no2_sites)
[1] 90
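The same count can be wrapped in a small helper so that it is easy to repeat for other networks. The sketch below is illustrative only: count_sites is a hypothetical helper, not an openair function, and the toy meta data rows are invented so that the code runs without an Internet connection. It assumes the detailed meta data has the columns code, variable and site_type, and it counts unique site codes rather than rows.

```r
# Hypothetical helper (not part of openair): count the sites in a detailed
# meta data table that measure a given pollutant at a given site type.
count_sites <- function(meta, pollutant = "NO2", type = "Urban Traffic") {
  # which() drops any rows where the comparison is NA
  keep <- which(meta$variable == pollutant & meta$site_type == type)
  # unique() guards against a site code appearing in more than one row
  length(unique(meta$code[keep]))
}

# Toy meta data using the assumed column names; the rows are invented.
toy_meta <- data.frame(
  code = c("AA1", "AA1", "BB2", "CC3"),
  variable = c("NO2", "O3", "NO2", "NO2"),
  site_type = c("Urban Traffic", "Urban Traffic",
                "Urban Background", "Urban Traffic")
)

count_sites(toy_meta)

# With openair loaded and an Internet connection, the helper could be applied
# to another network, for example:
# count_sites(importMeta(source = "saqn", all = TRUE))
```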

To import data, use importAURN or the equivalent function for another network. Some examples are below.

## import all pollutants from Marylebone Rd from 2000:2005
mary <- importAURN(site = "my1", year = 2000:2005)

## import nox, no2, o3 from Marylebone Road and Nottingham Centre for 2000
thedata <- importAURN(site = c("my1", "nott"), year = 2000,
                      pollutant = c("nox", "no2", "o3"))

## import over 30 years of Mace Head O3 data!
o3 <- importAURN(site = "mh", year = 1987:2019)

## import hydrocarbon data from Marylebone Road
hc <- importAURN(site = "my1", year = 2008, hc = TRUE)

## Import data from the AQE network (York data in this case)
yk13 <- importAQE(site = "yk13", year = 2018)

And to include basic meta data when importing air pollution data:

kc1 <- importAURN(site = "kc1", year = 2018, meta = TRUE)

glimpse(kc1)
Rows: 8,760
Columns: 17
$ site      <chr> "London N. Kensington", "London N. Kensington", "London N. K…
$ code      <chr> "KC1", "KC1", "KC1", "KC1", "KC1", "KC1", "KC1", "KC1", "KC1…
$ date      <dttm> 2018-01-01 00:00:00, 2018-01-01 01:00:00, 2018-01-01 02:00:…
$ co        <dbl> 0.114872, 0.111043, 0.112000, 0.100512, 0.091897, 0.100512, …
$ nox       <dbl> 8.32519, 8.89934, 9.41967, 9.36584, 7.21277, 7.64339, 10.173…
$ no2       <dbl> 8.11153, 8.54325, 8.99235, 8.93852, 6.94570, 7.26948, 10.013…
$ no        <dbl> 0.13935, 0.23224, 0.27869, 0.27869, 0.17418, 0.24386, 0.1045…
$ o3        <dbl> 70.98040, 67.52118, 69.69982, 70.49810, 71.74542, 70.49810, …
$ so2       <dbl> NA, 2.40953, 2.49812, 2.12606, 2.39181, 2.28551, 2.23236, 2.…
$ pm10      <dbl> 12.425, 7.375, 5.625, 3.200, 3.875, 5.050, 9.400, 12.400, 15…
$ pm2.5     <dbl> 8.892, 4.363, 3.137, 1.792, 2.146, 2.618, 4.575, 6.109, 7.05…
$ ws        <dbl> 5.5, 5.0, 4.8, 4.8, 5.3, 5.3, 4.4, 3.0, 2.6, 1.6, 1.6, 1.1, …
$ wd        <dbl> 263.3, 256.4, 251.0, 246.8, 248.4, 248.0, 245.8, 239.5, 232.…
$ air_temp  <dbl> 5.5, 5.1, 4.9, 4.7, 4.9, 5.0, 5.0, 4.6, 4.2, 3.7, 5.4, 5.7, …
$ latitude  <dbl> 51.52105, 51.52105, 51.52105, 51.52105, 51.52105, 51.52105, …
$ longitude <dbl> -0.213492, -0.213492, -0.213492, -0.213492, -0.213492, -0.21…
$ site_type <chr> "Urban Background", "Urban Background", "Urban Background", …

Including the meta data is useful if you then want to plot the sites on a map, as shown below.

By default, the function returns data where each pollutant is in a separate column. However, it is possible to return the data in a tidy format (column for pollutant name, column for value) by using the option to_narrow:

my1 <- importAURN("my1", year = 2018, to_narrow = TRUE)
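To make the difference between the two layouts concrete, the sketch below reshapes a tiny wide table into the narrow form using base R. The concentrations are invented, and the narrow column names pollutant and value are an assumption based on the description above rather than a guarantee of what importAURN returns.

```r
# Toy wide-format data: one column per pollutant (values are invented).
wide <- data.frame(
  date = as.POSIXct(c("2018-01-01 00:00:00", "2018-01-01 01:00:00"), tz = "GMT"),
  code = "MY1",
  no2 = c(40, 35),
  o3 = c(10, 12)
)

pollutants <- c("no2", "o3")

# Stack one block of rows per pollutant: a column for the pollutant name
# and a column for its value.
narrow <- do.call(rbind, lapply(pollutants, function(p) {
  data.frame(date = wide$date, code = wide$code, pollutant = p, value = wide[[p]])
}))

narrow
```

The narrow table has one row per date and pollutant, which is usually the more convenient shape for grouping and plotting.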

It is also possible to return information on whether the data have been ratified or not using the option ratified (FALSE by default). So, add the option ratified = TRUE if you want this information.

3.3 Plot Sites on a Map

To easily visualise entire monitoring networks, consider using the openairmaps R package. This package can be installed from CRAN, similar to openair.

install.packages("openairmaps")

This package contains the networkMap function, which acts as a wrapper around importMeta and returns a detailed map of the monitoring sites, with many options for customisation. For example, sites can be clustered together to avoid clutter, and an optional “control menu” can be added to filter for certain sites (e.g., different site types, shown below).

library(openairmaps)
networkMap(source = "aurn", cluster = TRUE, control = "site_type")

Figure 3.1: Plotting the AURN using the openairmaps package.

For more information about openairmaps please refer to Appendix D.

3.4 Annual and other statistics

By default, all the functions above return hourly data. However, there is often a need for other statistics, such as annual means over a long period of time. The UK family of functions (but not importKCL) can return data for other averaging times: annual, monthly, daily and, for SO2, 15-minute. The annual and monthly data also provide valuable information on data capture rates. The averaging time is selected with the data_type option. The values data_type can take include:

  • “hourly” This is the default and specific site(s) must be provided.
  • “daily” Daily means returned and specific site(s) must be provided.
  • “monthly” Monthly means returned. No site code is needed because all data for a particular year are returned. Data capture statistics are also given.
  • “annual” Annual means returned. No site code is needed because all data for a particular year are returned. Data capture statistics are also given.
  • “15_min” 15-minute SO2 concentrations returned for a specific site(s).
  • “8_hour” Rolling 8-hour concentrations returned for a specific site(s) for O3 and CO.
  • “24_hour” Rolling 24-hour concentrations returned for a specific site(s) for PM10 and PM2.5.
  • “daily_max_8” Daily maximum of the rolling 8-hour concentrations for O3 and CO.
  • “daqi” Daily Air Quality Index (DAQI). More details of how the index is defined are available from Defra’s UK-AIR website.

Note that for annual and monthly statistics all network data is returned and the site option has no effect.
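For the averaging times that do need a site code, the call follows the same pattern as an hourly import. The sketch below uses a hypothetical wrapper, import_aurn_stat, which is not part of openair; it restricts data_type to four of the site-specific values listed above and passes everything on to importAURN. Defining the wrapper needs neither openair nor an Internet connection, but calling it with a valid data_type needs both.

```r
# Hypothetical wrapper (not part of openair) for site-specific statistics.
import_aurn_stat <- function(site, year,
                             data_type = c("daily", "8_hour", "24_hour", "15_min")) {
  # match.arg() stops with an error if data_type is not one of the choices
  data_type <- match.arg(data_type)
  openair::importAURN(site = site, year = year, data_type = data_type)
}

# Example calls (these need openair and an Internet connection):
# my1_daily <- import_aurn_stat("my1", 2020)           # daily means
# my1_8h    <- import_aurn_stat("my1", 2020, "8_hour") # rolling 8-hour O3 and CO
```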

As an example, to import 5 years of annual mean data from the AURN:

uk_annual <- importAURN(year = 2016:2020, data_type = "annual")

By default, this will return data in “wide” format with a pollutant and its data capture rate in separate columns. Often it is more useful to have “narrow” format data, which can be selected with the to_narrow option. Furthermore, it is also possible to return site meta data (site type, latitude and longitude) at the same time.

Below is an example of obtaining annual mean data for 2020.

uk_2020 <- importAURN(year = 2020, 
                      data_type = "annual",
                      meta = TRUE,
                      to_narrow = TRUE)

uk_2020
# A tibble: 3,268 × 9
   code  site     date                species        value data_capture latitude
   <chr> <chr>    <dttm>              <chr>          <dbl>        <dbl>    <dbl>
 1 ABD   Aberdeen 2020-01-01 00:00:00 o3             45.5         0.610     57.2
 2 ABD   Aberdeen 2020-01-01 00:00:00 o3.daily.max…  57.5        NA         57.2
 3 ABD   Aberdeen 2020-01-01 00:00:00 o3.aot40v      NA          NA         57.2
 4 ABD   Aberdeen 2020-01-01 00:00:00 o3.aot40f      NA          NA         57.2
 5 ABD   Aberdeen 2020-01-01 00:00:00 somo35        502.          0.607     57.2
 6 ABD   Aberdeen 2020-01-01 00:00:00 no              4.60        0.945     57.2
 7 ABD   Aberdeen 2020-01-01 00:00:00 no2            13.5         0.945     57.2
 8 ABD   Aberdeen 2020-01-01 00:00:00 nox            20.5         0.945     57.2
 9 ABD   Aberdeen 2020-01-01 00:00:00 so2            NA          NA         57.2
10 ABD   Aberdeen 2020-01-01 00:00:00 co             NA          NA         57.2
# … with 3,258 more rows, and 2 more variables: longitude <dbl>,
#   site_type <chr>

The pollutants returned include:

unique(uk_2020$species)
 [1] "o3"                 "o3.daily.max.8hour" "o3.aot40v"         
 [4] "o3.aot40f"          "somo35"             "no"                
 [7] "no2"                "nox"                "so2"               
[10] "co"                 "pm10"               "nv10"              
[13] "v10"                "pm2.5"              "nv2.5"             
[16] "v2.5"               "gr10"               "gr2.5"             
[19] "o3.summer"         

Now it is easy, for example, to select annual mean data from 2020 for NO2 with a data capture rate of at least 80%:

uk_2020 %>% 
  filter(species == "no2", data_capture >= 0.8)
# A tibble: 144 × 9
   code  site  date                species value data_capture latitude longitude
   <chr> <chr> <dttm>              <chr>   <dbl>        <dbl>    <dbl>     <dbl>
 1 ABD   Aber… 2020-01-01 00:00:00 no2     13.5         0.945     57.2     -2.09
 2 ABD7  Aber… 2020-01-01 00:00:00 no2     23.6         0.982     57.1     -2.11
 3 ABD8  Aber… 2020-01-01 00:00:00 no2     25.1         0.995     57.1     -2.09
 4 AH    Asto… 2020-01-01 00:00:00 no2      2.81        0.983     52.5     -3.03
 5 ARM6  Arma… 2020-01-01 00:00:00 no2     21.1         0.960     54.4     -6.65
 6 BAAR  Ball… 2020-01-01 00:00:00 no2     15.6         0.893     54.9     -6.27
 7 BALM  Ball… 2020-01-01 00:00:00 no2     10.3         0.993     54.9     -6.25
 8 BAR3  Barn… 2020-01-01 00:00:00 no2     11.9         0.970     53.6     -1.51
 9 BBRD  Birk… 2020-01-01 00:00:00 no2     16.9         0.984     53.4     -3.03
10 BDMA  Brad… 2020-01-01 00:00:00 no2     34.4         0.805     53.8     -1.76
# … with 134 more rows, and 1 more variable: site_type <chr>
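Once the data are filtered in this way, a natural next step is to summarise them by site type. The sketch below does this in base R on a toy table that uses the same column names as the output above; the sites and values are invented so that the code runs offline.

```r
# Toy annual data with the column names shown above (rows are invented).
toy_annual <- data.frame(
  code = c("AA1", "BB2", "CC3", "DD4", "EE5"),
  species = "no2",
  value = c(30, 20, 12, 45, 25),
  data_capture = c(0.95, 0.90, 0.99, 0.60, 0.85),
  site_type = c("Urban Traffic", "Urban Background", "Rural Background",
                "Urban Traffic", "Urban Traffic")
)

# Keep NO2 records with at least 80% data capture
valid <- subset(toy_annual, species == "no2" & data_capture >= 0.8)

# Mean NO2 concentration for each site type
mean_by_type <- aggregate(value ~ site_type, data = valid, FUN = mean)
mean_by_type
```

Site DD4 is excluded by the data capture threshold, so it does not contribute to the urban traffic mean.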

For the AURN, it is also possible to return the DAQI (Daily Air Quality Index) by pollutant to save deriving it.

daqi_2020 <- importAURN(year = 2020, data_type = "daqi", meta = TRUE)
daqi_2020
# A tibble: 148,513 × 10
   code  site             pollutant date                concentration poll_index
   <chr> <chr>            <chr>     <dttm>                      <dbl>      <int>
 1 ABD   Aberdeen         no2       2020-01-01 00:00:00          39.2          1
 2 ABD   Aberdeen         pm10      2020-01-01 00:00:00          10            1
 3 ABD   Aberdeen         pm2.5     2020-01-01 00:00:00           9            1
 4 ABD7  Aberdeen Union … no2       2020-01-01 00:00:00          42.4          1
 5 ABD8  Aberdeen Wellin… no2       2020-01-01 00:00:00          30.6          1
 6 ACTH  Auchencorth Moss o3        2020-01-01 00:00:00          57            2
 7 ACTH  Auchencorth Moss pm10      2020-01-01 00:00:00          14            1
 8 ACTH  Auchencorth Moss pm2.5     2020-01-01 00:00:00          12            2
 9 AGRN  Birmingham Acoc… no2       2020-01-01 00:00:00          23.2          1
10 AGRN  Birmingham Acoc… o3        2020-01-01 00:00:00          35            2
# … with 148,503 more rows, and 4 more variables: measurement_period <chr>,
#   latitude <dbl>, longitude <dbl>, site_type <chr>
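The index is returned per pollutant. A common follow-up is a single daily index per site, which under the DAQI convention is the highest index across the pollutants measured. The sketch below is a base R illustration of that step, assuming the column names shown in the output above; the rows are copied from the first few lines of that output so the code runs without an Internet connection.

```r
# Rows copied from the first lines of the output above (ABD and ACTH only).
toy_daqi <- data.frame(
  code = c("ABD", "ABD", "ABD", "ACTH", "ACTH", "ACTH"),
  pollutant = c("no2", "pm10", "pm2.5", "o3", "pm10", "pm2.5"),
  date = as.Date("2020-01-01"),
  poll_index = c(1L, 1L, 1L, 2L, 1L, 2L)
)

# Overall index per site and day: the maximum index over all pollutants
overall <- aggregate(poll_index ~ code + date, data = toy_daqi, FUN = max)
names(overall)[names(overall) == "poll_index"] <- "daqi"
overall
```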

  1. The data were first accessible when the Environmental Research Group was based at King’s College London.