3 Accessing UK Air Quality Data
3.1 Accessing data
The UK has a surprisingly large amount of air quality data that is publicly accessible. The main UK AURN archive, the regional networks (England, Scotland, Wales and Northern Ireland) and Imperial College London’s London Air Quality Network (LAQN) are large and important databases of information that allow free public access. Storing and managing data in this way has many advantages, including a consistent data format and high quality underlying methods to process and store the data.
openair has a family of functions that provide users with extensive access to UK air quality data. Ricardo Energy & Environment provide .RData files (R workspaces) for several important air quality networks in the UK. These files are updated daily, so an Internet connection is needed for the functions to work. The work of Trevor Davies at Ricardo Energy & Environment in making all the data available is greatly appreciated. The networks include:
importAURN
For importing data from the UK national network, the Automatic Urban and Rural Network (AURN). This is the main UK network.
importSAQN
For accessing data from the Air Quality Scotland network.
importWAQN
For accessing data from the Air Quality Wales network.
importAQE
For accessing data from the Air Quality England network of sites.
importNI
For accessing data from the Northern Ireland network of sites.
importLocal
Import data from locally managed AQ networks in England. These sites are operated in most cases by Local Authorities but may also include monitoring from other programmes, industry and airports. The location and purpose of these sites differs from the national network, which is governed by the strict rules of the air quality directives. As a result there is a broad range of site types, equipment and data quality practices. For more information see here. These data represent information from about 15 different local air quality networks.
importEurope
A simplified version of a function to give basic access to hourly European data based on Stuart Grange’s saqgetr package — see https://github.com/skgrange/saqgetr. The openair function takes a similar approach to the other openair import functions, i.e. it requires site code(s) and year(s) to be supplied.
importKCL
For accessing data from the sites operated by Imperial College London1, primarily including the London Air Quality Network.
Many users download hourly data from the air quality archive at https://www.airquality.co.uk. Most commonly, the data are emailed to the user as .csv files in a fixed format. This is a useful facility but it does have some limitations and frustrations, many of which have been overcome by the new way of storing and downloading the data described below.
There are several advantages over the web portal approach where .csv files are downloaded. First, it is quick to select a range of sites, pollutants and periods (see the examples below). Second, storing the data as .RData objects is very efficient as they are about four times smaller than .csv files (which are already small), meaning the data download quickly and save bandwidth. Third, the function completely avoids any need for data manipulation or setting time formats, time zones etc. Finally, it is easy to import many years of data, which makes it possible to download several long time series in one go.
The site codes and pollutant names can be upper or lower case.
Some examples of usage are shown below. First load the packages we need.
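The examples that follow assume the openair and tidyverse packages are installed; the tidyverse supplies the dplyr functions (e.g. filter() and glimpse()) used later.
# load the packages used throughout this section
library(openair)
library(tidyverse)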
3.2 Site Meta Data
3.2.1 National networks
The first question is, what sites are available and what do they measure? Users can access the details of air pollution monitoring sites using the importMeta function. The user only needs to provide the network name and (optionally) whether all data should be returned and whether certain periods should be considered. By default only the site type, latitude and longitude are returned.
aurn_meta <- importMeta(source = "aurn")
aurn_meta
# A tibble: 281 × 6
site code latitude longitude site_type source
<chr> <chr> <dbl> <dbl> <chr> <chr>
1 Aberdeen ABD 57.2 -2.09 Urban Backgro… aurn
2 Aberdeen Erroll Park ABD9 57.2 -2.09 Urban Backgro… aurn
3 Aberdeen Union Street Roadside ABD7 57.1 -2.11 Urban Traffic aurn
4 Aberdeen Wellington Road ABD8 57.1 -2.09 Urban Traffic aurn
5 Armagh Roadside ARM6 54.4 -6.65 Urban Traffic aurn
6 Aston Hill AH 52.5 -3.03 Rural Backgro… aurn
7 Auchencorth Moss ACTH 55.8 -3.24 Rural Backgro… aurn
8 Ballymena Antrim Road BAAR 54.9 -6.27 Urban Traffic aurn
9 Ballymena Ballykeel BALM 54.9 -6.25 Urban Backgro… aurn
10 Barnsley BARN 53.6 -1.48 Urban Backgro… aurn
# ℹ 271 more rows
Or return much more detailed data that includes which pollutants are measured at each site and the site start / end dates. The option all = TRUE should be added.
aurn_meta <- importMeta(source = "aurn", all = TRUE)
# what comes back?
glimpse(aurn_meta)
Rows: 2,770
Columns: 14
$ code <chr> "ABD", "ABD", "ABD", "ABD", "ABD", "ABD", "ABD", "ABD"…
$ site <chr> "Aberdeen", "Aberdeen", "Aberdeen", "Aberdeen", "Aberd…
$ site_type <chr> "Urban Background", "Urban Background", "Urban Backgro…
$ latitude <dbl> 57.15736, 57.15736, 57.15736, 57.15736, 57.15736, 57.1…
$ longitude <dbl> -2.094278, -2.094278, -2.094278, -2.094278, -2.094278,…
$ variable <chr> "O3", "NO", "NO2", "NOx", "SO2", "CO", "PM10", "NV10",…
$ Parameter_name <chr> "Ozone", "Nitric oxide", "Nitrogen dioxide", "Nitrogen…
$ start_date <dttm> 2003-08-01, 1999-09-18, 1999-09-18, 1999-09-18, 2001-…
$ end_date <chr> "2021-09-20", "2021-09-20", "2021-09-20", "2021-09-20"…
$ ratified_to <dttm> 2021-09-20, 2021-09-20, 2021-09-20, 2021-09-20, 2007-…
$ zone <chr> "North East Scotland", "North East Scotland", "North E…
$ agglomeration <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ local_authority <chr> "Aberdeen City", "Aberdeen City", "Aberdeen City", "Ab…
$ source <chr> "aurn", "aurn", "aurn", "aurn", "aurn", "aurn", "aurn"…
Note that importMeta can import information for several networks at once, e.g. source = c("aurn", "saqn").
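For example (the object name is illustrative):
# site meta data for the AURN and Scottish networks in one data frame
aurn_saqn_meta <- importMeta(source = c("aurn", "saqn"))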
Often it is useful to consider sites that were open in a particular year or were open for a duration of years. This can be done using the year argument. When year is a range such as year = 2010:2020, only sites that were open across that range of years will be returned. This option is especially useful for trend analysis when there might be an interest in extracting only sites that were measuring over the period of interest. Furthermore, if all = TRUE is used, supplying a year (or years) will select only the specific pollutants that were measured during the period of interest.2
For example, to check the number of sites that were open from 2010 to 2022 in the AURN and SAQN combined:
sites_2010_2022 <- importMeta(
source = c("aurn", "saqn"),
year = 2010:2022
)
nrow(sites_2010_2022)
[1] 143
The example below uses sites on the AURN that measure NO2, but can easily be extended to the other data sources.
To see how many sites measure NO2 in the AURN that are ‘urban traffic’:
aurn_detailed <- importMeta(source = "aurn", all = TRUE)
no2_sites <- filter(
aurn_detailed,
variable == "NO2",
site_type == "Urban Traffic"
)
nrow(no2_sites)
[1] 90
importMeta as a way to select sites to import
One of the most useful aspects of importMeta is to use it as a basis to identify site codes to then import data. For example, to import data from the AURN for sites that have been in operation from 2005 to 2020:
sites_2005_2020 <- importMeta(
source = "aurn",
year = 2005:2020
)
all_aq_data <- importAURN(
site = sites_2005_2020$code,
year = 2005:2020
)
To import data, you can use the different versions of importAURN. Some examples are below.
## import all pollutants from Marylebone Rd from 2000:2005
mary <- importAURN(site = "my1", year = 2000:2005)
## import nox, no2, o3 from Marylebone Road and Nottingham Centre for 2000
thedata <- importAURN(site = c("my1", "nott"), year = 2000,
pollutant = c("nox", "no2", "o3"))
## import over 30 years of Mace Head O3 data!
o3 <- importAURN(site = "mh", year = 1987:2019)
## import hydrocarbon data from Marylebone Road
hc <- importAURN(site = "my1", year = 2008, hc = TRUE)
## Import data from the AQE network (York data in this case)
yk13 <- importAQE(site = "yk13", year = 2018)
And to include basic meta data when importing air pollution data:
kc1 <- importAURN(site = "kc1", year = 2018, meta = TRUE)
glimpse(kc1)
Rows: 8,760
Columns: 17
$ site <chr> "London N. Kensington", "London N. Kensington", "London N. K…
$ code <chr> "KC1", "KC1", "KC1", "KC1", "KC1", "KC1", "KC1", "KC1", "KC1…
$ date <dttm> 2018-01-01 00:00:00, 2018-01-01 01:00:00, 2018-01-01 02:00:…
$ co <dbl> 0.114872, 0.111043, 0.112000, 0.100512, 0.091897, 0.100512, …
$ nox <dbl> 8.32519, 8.89934, 9.41967, 9.36584, 7.21277, 7.64339, 10.173…
$ no2 <dbl> 8.11153, 8.54325, 8.99235, 8.93852, 6.94570, 7.26948, 10.013…
$ no <dbl> 0.13935, 0.23224, 0.27869, 0.27869, 0.17418, 0.24386, 0.1045…
$ o3 <dbl> 70.98040, 67.52118, 69.69982, 70.49810, 71.74542, 70.49810, …
$ so2 <dbl> NA, 2.40953, 2.49812, 2.12606, 2.39181, 2.28551, 2.23236, 2.…
$ pm10 <dbl> 12.425, 7.375, 5.625, 3.200, 3.875, 5.050, 9.400, 12.400, 15…
$ pm2.5 <dbl> 8.892, 4.363, 3.137, 1.792, 2.146, 2.618, 4.575, 6.109, 7.05…
$ ws <dbl> 5.5, 5.0, 4.8, 4.8, 5.3, 5.3, 4.4, 3.0, 2.6, 1.6, 1.6, 1.1, …
$ wd <dbl> 263.3, 256.4, 251.0, 246.8, 248.4, 248.0, 245.8, 239.5, 232.…
$ air_temp <dbl> 5.5, 5.1, 4.9, 4.7, 4.9, 5.0, 5.0, 4.6, 4.2, 3.7, 5.4, 5.7, …
$ latitude <dbl> 51.52105, 51.52105, 51.52105, 51.52105, 51.52105, 51.52105, …
$ longitude <dbl> -0.213492, -0.213492, -0.213492, -0.213492, -0.213492, -0.21…
$ site_type <chr> "Urban Background", "Urban Background", "Urban Background", …
By default, the function returns data where each pollutant is in a separate column. However, it is possible to return the data in a tidy format (a column for the pollutant name and a column for the value) by using the option to_narrow:
my1 <- importAURN("my1", year = 2018, to_narrow = TRUE)
It is also possible to return information on whether the data have been ratified or not using the option ratified (FALSE by default). So, add the option ratified = TRUE if you want this information.
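For example, a sketch of re-importing the North Kensington data used above with ratification information included (the object name is illustrative):
# also return ratification information alongside the measurements
kc1_ratified <- importAURN(site = "kc1", year = 2018, ratified = TRUE)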
3.2.2 Local networks
In the case of locally available data, it is useful to know who the data providers are; these are shown below.
# access local meta data to get provider
meta_local <- importMeta("local", all = TRUE)
unique(meta_local$provider)
[1] "Sussex Air Quality Network"
[2] "Air Quality in North Lincolnshire "
[3] "Kent and Medway Air Quality"
[4] "Londonair"
[5] "Essex Air Quality Network"
[6] "Hertfordshire and Bedfordshire Air Quality Network "
[7] "Wiltshire Air Quality"
[8] "UK Air Quality"
[9] "Hampshire Air Quality Network"
[10] "Nottingham Air Quality"
[11] "Heathrow Airwatch"
[12] "Norfolk Air Quality"
[13] "Leicester Council AQ Network"
[14] "Wolverhampton Air Quality"
[15] "Liverpool Air Quality"
3.3 Plot Sites on a Map
To easily visualise entire monitoring networks, consider using the openairmaps R package. This package can be installed from CRAN, similar to openair.
install.packages("openairmaps")
This package contains the networkMap function, which acts as a wrapper around importMeta and returns a detailed map similar to the one produced below, with many options for customisation. For example, sites can be clustered together to avoid clutter, and an optional “control menu” can be added to filter for certain sites (e.g., different site types, shown below).
library(openairmaps)
networkMap(source = "aurn", control = "site_type")
For more information about using openairmaps to build maps of monitoring networks, please refer to the Network Visualisation Page.
3.4 Annual and other statistics
By default, all the functions above return hourly data. However, there is often a need to return data such as annual means over a long period of time. The UK family of functions (but not importKCL) can return data for several averaging times: annual, monthly, daily and, for SO2, 15-minute. The annual and monthly data also provide valuable information on data capture rates. The averaging statistic is selected with the data_type option. The values data_type can take include:
- “hourly” This is the default and specific site(s) must be provided.
- “daily” Daily means are returned and specific site(s) must be provided. Note that in the case of PM10 and PM2.5, daily measurements can be available both as those derived from hourly measurements (using instruments such as TEOM, BAM and FIDAS) and as daily gravimetric measurements such as from a Partisol. In the returned data the gravimetric daily measurements are shown as gr_pm10 and gr_pm2.5, respectively (an example of importing daily data is shown below).
- “monthly” Monthly means are returned. No site code is needed because all data for a particular year are returned. Data capture statistics are also given.
- “annual” Annual means are returned. No site code is needed because all data for a particular year are returned. Data capture statistics are also given.
- “15_min” 15-minute SO2 concentrations are returned for specific site(s).
- “8_hour” Rolling 8-hour concentrations are returned for specific site(s) for O3 and CO.
- “24_hour” Rolling 24-hour concentrations are returned for specific site(s) for PM10 and PM2.5.
- “daily_max_8” Maximum daily rolling 8-hour mean for O3 and CO.
- “daqi” Daily Air Quality Index (DAQI). See here for more details of how the index is defined.
Note that for annual and monthly statistics all network data are returned and the site option has no effect.
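For the site-specific statistics a site code is still required. As a sketch, daily mean data for Marylebone Road (reusing a site code from earlier examples):
# daily mean data for Marylebone Road in 2020
my1_daily <- importAURN(site = "my1", year = 2020, data_type = "daily")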
As an example, to import 5 years of annual mean data from the AURN:
uk_annual <- importAURN(year = 2016:2020, data_type = "annual")
By default, this will return data in “wide” format with a pollutant and its data capture rate in separate columns. Often it is more useful to have “narrow” format data, which can be selected with the to_narrow option. Furthermore, it is also possible to return site meta data (site type, latitude and longitude) at the same time.
Below is an example of obtaining annual mean data for 2020.
uk_2020 <- importAURN(
year = 2020,
data_type = "annual",
meta = TRUE,
to_narrow = TRUE
)
uk_2020
# A tibble: 3,268 × 9
code site date species value data_capture latitude
<chr> <chr> <dttm> <chr> <dbl> <dbl> <dbl>
1 ABD Aberdeen 2020-01-01 00:00:00 o3 45.5 0.610 57.2
2 ABD Aberdeen 2020-01-01 00:00:00 o3.daily.max… 57.5 NA 57.2
3 ABD Aberdeen 2020-01-01 00:00:00 o3.aot40v NA NA 57.2
4 ABD Aberdeen 2020-01-01 00:00:00 o3.aot40f NA NA 57.2
5 ABD Aberdeen 2020-01-01 00:00:00 somo35 502. 0.607 57.2
6 ABD Aberdeen 2020-01-01 00:00:00 no 4.60 0.945 57.2
7 ABD Aberdeen 2020-01-01 00:00:00 no2 13.5 0.945 57.2
8 ABD Aberdeen 2020-01-01 00:00:00 nox 20.5 0.945 57.2
9 ABD Aberdeen 2020-01-01 00:00:00 so2 NA NA 57.2
10 ABD Aberdeen 2020-01-01 00:00:00 co NA NA 57.2
# ℹ 3,258 more rows
# ℹ 2 more variables: longitude <dbl>, site_type <chr>
The pollutants returned include:
unique(uk_2020$species)
[1] "o3" "o3.daily.max.8hour" "o3.aot40v"
[4] "o3.aot40f" "somo35" "no"
[7] "no2" "nox" "so2"
[10] "co" "pm10" "nv10"
[13] "v10" "pm2.5" "nv2.5"
[16] "v2.5" "gr10" "gr2.5"
[19] "o3.summer"
Now it is easy, for example, to select annual mean data from 2020 for NO2 with a data capture rate of at least 80%:
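A sketch of that selection with dplyr (assuming, as shown above, that data_capture is expressed as a fraction; the object name is illustrative):
# keep annual NO2 means with at least 80% data capture
no2_2020 <- filter(
  uk_2020,
  species == "no2",
  data_capture >= 0.8
)
no2_2020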
# A tibble: 144 × 9
code site date species value data_capture latitude longitude
<chr> <chr> <dttm> <chr> <dbl> <dbl> <dbl> <dbl>
1 ABD Aber… 2020-01-01 00:00:00 no2 13.5 0.945 57.2 -2.09
2 ABD7 Aber… 2020-01-01 00:00:00 no2 23.6 0.982 57.1 -2.11
3 ABD8 Aber… 2020-01-01 00:00:00 no2 25.1 0.995 57.1 -2.09
4 AH Asto… 2020-01-01 00:00:00 no2 2.81 0.983 52.5 -3.03
5 ARM6 Arma… 2020-01-01 00:00:00 no2 21.1 0.960 54.4 -6.65
6 BAAR Ball… 2020-01-01 00:00:00 no2 15.6 0.893 54.9 -6.27
7 BALM Ball… 2020-01-01 00:00:00 no2 10.3 0.993 54.9 -6.25
8 BAR3 Barn… 2020-01-01 00:00:00 no2 11.9 0.970 53.6 -1.51
9 BBRD Birk… 2020-01-01 00:00:00 no2 16.9 0.984 53.4 -3.02
10 BDMA Brad… 2020-01-01 00:00:00 no2 34.4 0.805 53.8 -1.76
# ℹ 134 more rows
# ℹ 1 more variable: site_type <chr>
For the AURN, it is also possible to return the DAQI (Daily Air Quality Index) by pollutant to save deriving it.
daqi_2020 <- importAURN(
year = 2020,
data_type = "daqi", meta = TRUE
)
daqi_2020
# A tibble: 148,513 × 10
code site pollutant date concentration poll_index
<chr> <chr> <chr> <dttm> <dbl> <int>
1 ABD Aberdeen no2 2020-01-01 00:00:00 39.2 1
2 ABD Aberdeen pm10 2020-01-01 00:00:00 10 1
3 ABD Aberdeen pm2.5 2020-01-01 00:00:00 9 1
4 ABD7 Aberdeen Union … no2 2020-01-01 00:00:00 42.4 1
5 ABD8 Aberdeen Wellin… no2 2020-01-01 00:00:00 30.6 1
6 ACTH Auchencorth Moss o3 2020-01-01 00:00:00 57 2
7 ACTH Auchencorth Moss pm10 2020-01-01 00:00:00 14 1
8 ACTH Auchencorth Moss pm2.5 2020-01-01 00:00:00 12 2
9 AGRN Birmingham Acoc… no2 2020-01-01 00:00:00 23.2 1
10 AGRN Birmingham Acoc… o3 2020-01-01 00:00:00 35 2
# ℹ 148,503 more rows
# ℹ 4 more variables: measurement_period <chr>, latitude <dbl>,
# longitude <dbl>, site_type <chr>
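As a quick illustration of working with these data, a sketch that counts the number of days at each index value for a single site (the site code is chosen for illustration only):
# days at each DAQI index value for London N. Kensington in 2020
kc1_daqi <- filter(daqi_2020, code == "KC1")
count(kc1_daqi, pollutant, poll_index)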