Import data from the UK Air Pollution Networks
Functions for importing air pollution data from a range of UK networks including the Automatic Urban and Rural Network. Files are imported from a remote server operated by Ricardo that provides air quality data files as R data objects.
importAURN(
  site = "my1",
  year = 2009,
  data_type = "hourly",
  pollutant = "all",
  hc = FALSE,
  meta = FALSE,
  ratified = FALSE,
  to_narrow = FALSE,
  verbose = FALSE,
  progress = TRUE
)
site
Site code of the site to import, e.g. "my1" is Marylebone Road. Several sites can be imported with site = c("my1", "nott"), to import Marylebone Road and Nottingham for example.
year
Year or years to import. To import a sequence of years from 1990 to 2000 use year = 1990:2000. To import several specific years use year = c(1990, 1995, 2000), for example.
data_type
The type of data to return, defined by its averaging period. These include:
"hourly" Default is to return hourly data.
"daily" Daily average data.
"monthly" Monthly average data with data capture information for the whole network.
"annual" Annual average data with data capture information for the whole network.
"15_min" To import 15-minute average SO2 concentrations.
"8_hour" To import 8-hour rolling mean concentrations for O3 and CO.
"24_hour" To import 24-hour rolling mean concentrations for particulates.
"daily_max_8" To import the daily maximum of the rolling 8-hour mean for O3 and CO.
"daqi" To import Daily Air Quality Index (DAQI). See here for more details of how the index is defined.
pollutant
Pollutants to import. If omitted, all pollutants from a site are imported. To import only NOx and NO2, for example, use pollutant = c("nox", "no2").
hc
A few sites have hydrocarbon measurements available, and setting hc = TRUE will ensure hydrocarbon data are imported. By default they are not imported, because most users will not be interested in hydrocarbon data and the resulting data frames are considerably larger.
meta
Should meta data be returned? If TRUE, the site type, latitude and longitude are returned.
ratified
If TRUE, columns are returned indicating when each species was ratified, i.e. quality-checked. Available for hourly data only.
to_narrow
By default the returned data has a column for each pollutant/variable. When to_narrow = TRUE the data are stacked into a narrow format with a column identifying the pollutant name.
verbose
Should the function give messages when downloading files? Default is FALSE.
progress
Show a progress bar when many sites/years are being imported? Defaults to TRUE.
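To make the difference between the default layout and the narrow layout produced by to_narrow = TRUE concrete, here is a minimal base R sketch on made-up data. It does not call the import functions, and the column names `pollutant` and `value` are assumptions chosen for illustration; the names actually returned may differ.

```r
# Toy data in the default wide layout: one column per pollutant.
# The values are made up for illustration only.
wide <- data.frame(
  date = as.POSIXct(c("2020-01-01 00:00", "2020-01-01 01:00"), tz = "GMT"),
  code = "MY1",
  nox  = c(120, 95),
  no2  = c(60, 48)
)

pollutants <- c("nox", "no2")

# Stack the pollutant columns: one row per date/code/pollutant combination.
# `pollutant` and `value` are hypothetical column names.
narrow <- do.call(rbind, lapply(pollutants, function(p) {
  data.frame(
    date      = wide$date,
    code      = wide$code,
    pollutant = p,
    value     = wide[[p]]
  )
}))

print(narrow)
```

The wide frame has two rows and one column per pollutant; the narrow frame has one row per measurement, which is often more convenient for grouping or faceting by pollutant.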
This family of functions has been written to make it easy to import data from across several UK air quality networks. Ricardo have provided .RData files (R workspaces) of all individual sites and years, as well as up to date meta data. These files are updated on a daily basis. This approach requires a link to the Internet to work.
For an up to date list of available sites that can be imported, see
The site codes and pollutant names can be upper or lower case.
There are several advantages over the web portal approach where .csv files
are downloaded. First, it is quick to select a range of sites, pollutants and
periods (see examples below). Second, storing the data as .RData objects is
very efficient as they are about four times smaller than .csv files, which
means the data downloads quickly and saves bandwidth. Third, the function
completely avoids any need for data manipulation or setting time formats,
time zones etc. The function also has the advantage that the proper site name
is imported and used in
The data are imported by stacking sites on top of one another and will have
code (the site code) and
By default, the function returns hourly average data. However, annual,
monthly, daily and 15 minute data (for SO2) can be returned using the option
data_type. Annual and monthly data provide whole network information
including data capture statistics.
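As a sketch of how data_type combines with the other arguments, the snippet below builds an argument list for a daily-average import and only makes the call when it can reasonably succeed. It assumes that importAURN() is provided by the openair package, which this page does not state explicitly, and the years are chosen arbitrarily.

```r
# Arguments for a daily-average import of NOx and NO2 at two sites.
# The site codes and pollutant names are the ones used elsewhere on this page.
args <- list(
  site      = c("my1", "nott"),
  year      = 2000:2001,
  data_type = "daily",
  pollutant = c("nox", "no2"),
  meta      = TRUE
)

# Assumption: importAURN() is provided by the openair package.
# The call needs an Internet connection, so it is only attempted in an
# interactive session where openair is installed.
if (interactive() && requireNamespace("openair", quietly = TRUE)) {
  daily <- do.call(openair::importAURN, args)
  print(head(daily))
}
```

Changing only `data_type` in the list (for example to "annual" or "monthly") is enough to switch the averaging period while keeping the same site and pollutant selection.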
All units are expressed in mass terms for gaseous species (ug/m3 for NO, NO2, NOx (as NO2), SO2 and hydrocarbons; and mg/m3 for CO). PM10 concentrations are provided in gravimetric units of ug/m3 or scaled to be comparable with these units. Over the years a variety of instruments have been used to measure particulate matter and the technical issues of measuring PM10 are complex. In recent years the measurements rely on FDMS (Filter Dynamics Measurement System), which is able to measure the volatile component of PM. In cases where the FDMS system is in use there will be a separate volatile component recorded as 'v10' and non-volatile component 'nv10', which is already included in the absolute PM10 measurement. Prior to the use of FDMS the measurements used TEOM (Tapered Element Oscillating Microbalance) and these concentrations have been multiplied by 1.3 to provide an estimate of the total mass including the volatile fraction.
Some sites report hourly and daily PM10 and/or PM2.5. When
data_type = "daily" and there are both hourly and 'proper' daily measurements
available, these will be returned as e.g. "pm2.5" and "gr_pm2.5"; the former
corresponding to data based on original hourly measurements and the latter
corresponding to daily gravimetric measurements.
The function returns modelled hourly values of wind speed (ws), wind direction
(wd) and ambient temperature (air_temp) if available
(generally from around 2010). These values are modelled using the WRF model
operated by Ricardo.
The BAM (Beta-Attenuation Monitor) instruments that have been incorporated into the network throughout its history have been scaled by 1.3 if they have a heated inlet (to account for loss of volatile particles) and 0.83 if they do not have a heated inlet. The few TEOM instruments in the network after 2008 have been scaled using VCM (Volatile Correction Model) values to account for the loss of volatile particles. The object of all these scaling processes is to provide a reasonable degree of comparison between data sets and with the reference method and to produce a consistent data record over the operational period of the network. However, there may be some discontinuity in the time series associated with instrument changes.
No corrections have been made to the PM2.5 data. The volatile component of FDMS PM2.5 (where available) is shown in the 'v2.5' column.
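The fixed PM10 scaling factors described above can be summarised in a small lookup function. This is only an illustration of the arithmetic: the imported data already include these adjustments, so they should not be applied again. The function name and the `instrument` and `heated_inlet` labels are hypothetical and are not columns returned by the import functions. "TEOM" here means TEOM data from before FDMS was introduced; the post-2008 TEOM data use VCM values rather than a fixed factor and are not covered.

```r
# Illustrative only: imported PM10 data already include these adjustments.
# `instrument` and `heated_inlet` are hypothetical labels for this sketch.
pm10_scale_factor <- function(instrument, heated_inlet = NA) {
  if (instrument == "TEOM") {
    # TEOM prior to FDMS: multiplied by 1.3 to estimate total mass
    return(1.3)
  }
  if (instrument == "BAM") {
    if (is.na(heated_inlet)) {
      stop("heated_inlet must be TRUE or FALSE for BAM")
    }
    # BAM: 1.3 with a heated inlet, 0.83 without
    return(if (heated_inlet) 1.3 else 0.83)
  }
  stop("no fixed factor is described for instrument: ", instrument)
}

raw <- c(20, 35, 50)  # made-up unscaled readings in ug/m3

teom_scaled <- raw * pm10_scale_factor("TEOM")
bam_scaled  <- raw * pm10_scale_factor("BAM", heated_inlet = FALSE)

print(teom_scaled)
print(bam_scaled)
```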
## import all pollutants from Marylebone Rd from 2000:2009
if (FALSE) mary <- importAURN(site = "my1", year = 2000:2009)

## import nox, no2, o3 from Marylebone Road and Nottingham Centre for 2000
if (FALSE) thedata <- importAURN(site = c("my1", "nott"), year = 2000, pollutant = c("nox", "no2", "o3"))

# Import annual data over a period, make it narrow format and return site information
if (FALSE) aq <- importAURN(year = 2010:2020, data_type = "annual", meta = TRUE, to_narrow = TRUE)

# Other functions work in the same way e.g. to import Cardiff Centre data
if (FALSE) cardiff <- importWAQN(site = "card", year = 2020)