For this workshop session we will consider a release of pollution from a point source.

The scenario is that a pollution release from a research plant on the coast has resulted in a pollution plume over the county. Scientists have a model for the pollution density, and want to know which towns will be exposed to a level of pollution over 9000ng (nanograms) per square metre.

We need a number of packages for this.


We'll compute a colour palette for plots, and set it as default for sp graphics:

colours = colorRampPalette(brewer.pal(7, "OrRd")[-1])(50)
sp.theme(set = TRUE, regions = list(col = colours))

All our data files are in a folder called Data at the same level as the current folder:

Datadir = "../Data"
Datafile = function(f) {
    file.path(Datadir, f)

You should make sure Datadir is set to the place you've put the data files. If you have the tcltk package, then you can use as.character(tcltk::tkchooseDirectory()). Don't put a trailing slash or backslash at the end of Datadir or files might not be found.

Plume Model

We will use a simple model for the spread of pollution. The total pollution at a location at distance \(d\) and angle \(\theta\) from the point source is given by:

$$ f = \alpha \exp(-(d/\beta)^2 e^{-\kappa cos(\theta - \phi)^2}) $$

Where \(\alpha\) is the pollution level at distance zero, \(\beta\) is a distance scale factor, \(\phi\) is the major angle of the pollution plume, and \(\kappa\) is the eccentricity. If \(\kappa\) is 0 then the plume becomes a circularly symmetric exponential decay, and as \(\kappa\) increases the plume becomes more stretched in the direction of the \(\phi\) parameter. You can consider \(\kappa\) and \(\phi\) as the wind strength and direction respectively.

We can create a plume function that computes the pollution for a given source src at a number of destinations dst given a set of parameters:

pfunc = function(d2, ang, a, b, k, phi) {
    ## plume value for distance-squared d2
    a * exp(-(d2 * exp(-k * cos(ang - phi))^2/b^2))

plume <- function(src, dst, a, b, k, phi) {
    ## plume value for src at dst
    src = coordinates(src)
    dst = coordinates(dst)
    d2 = (dst[, 1] - src[, 1])^2 + (dst[, 2] - src[, 2])^2
    ang = atan2(dst[, 2] - src[, 2], dst[, 1] - src[, 1])
    pfunc(d2, ang, a, b, k, phi)

Load Data

We will use three data sets - the location of the pollution source, the location and population of towns from a UK gazetteer source, and a polygon boundary of Cumbria obtained from the gadm web site.

Two of the datasets are in lat-long format, so we use spTransform to convert them to Ordnance Survey grid references in metres with a little wrapper function. We then make a quick map with the source as a red dot and the towns as crosses.

toOSGB = function(s) {
    spTransform(s, CRS("+init=epsg:27700"))
cumbria = toOSGB(readOGR(Datadir, "cumbria", verbose = FALSE))
nuclear = toOSGB(readOGR(Datadir, "nuclear", verbose = FALSE))
ukgaz = readOGR(Datadir, "ukgaz", verbose = FALSE)
plot(nuclear, add = TRUE, pch = 19, col = "red")
plot(ukgaz, add = TRUE)
title("Base map")

Compute the Plume

To map the plume we create a 500x500 raster that spans the polygon and then fill it with values computed by the plume function. The call to SpatialPoints(plevel) creates a set of coordinates of all the grid cells at which to compute the plume values. The plume parameters have been given to us by some scientists who tell us that's what they are and that's what we should use. You will notice that the wind has been blowing at an angle of \(\pi/4\), for example. We plot the plume and the other data on a map.
plevel = raster(extent(cumbria), nrows = 500, ncols = 500)
plevel[] = plume(nuclear, SpatialPoints(plevel), 12000, 400, 5, pi/4)
projection(plevel) = proj4string(cumbria)
plot(plevel, col = colours)
plot(cumbria, add = TRUE)
plot(nuclear, add = TRUE, pch = 19, col = "red")
plot(ukgaz, add = TRUE)
title("Plume Spread")

Urban Exposure

We could compute the exposure at each town by working out the distance and angle of each town from the source and calling the plume function. Instead we'll use the extract function to sample from the raster grid at the town locations. Then we can extract the towns that have an exposure over 9000ng/\(m^2\) and put them in a table.
ukgaz$exposure = extract(plevel, ukgaz)
over9000 = ukgaz$exposure > 9000
dangerTowns = data.frame(ukgaz@data[over9000, c("Name", "pop", "exposure")])
                Name  pop exposure
4            Keswick 5508     9063
28        Borrowdale  438     9355
43            Lorton  250     9158
46        Loweswater  209    10350
50        Buttermere  127    10713
97          Gosforth 1230    11135
100         Lamplugh  763     9323
109 Ennerdale Bridge  240    10536
110            Haile  237    10921
112         Ponsonby   92    11961

River Exposure

The authorities want to know which rivers have been exposed to a level over 9000ng/\(m^2\) of pollution. To compute this we will work out the 9000ng/\(m^2\) contour from the plume and intersect it with a river dataset. The following functions compute the polygonal contour of the plume function given a value.
invpfunc <- function(ang, a, b, k, phi, f) {
    ### inverse plume function - compute distance given f
    d2 = -log(f/a) * b^2/exp(-k * cos(ang - phi))^2

contourPlume <- function(src, a, b, k, phi, f) {
    ### return a SpatialPolygon of a plume function at value f
    ### by sweeping round 360 degrees and computing the distance
    ang = seq(0, 2 * pi, len = 360)
    d = invpfunc(ang, a, b, k, phi, f = f)
    xy = coordinates(src)
    polys = SpatialPolygons(list(Polygons(list(Polygon(cbind(xy[, 1] + d * sin(ang), xy[, 2] + d * cos(ang)))),
        ID = 1)))
    proj4string(polys) = proj4string(src)
First we'll load the rivers and lakes data, then compute the 9000 contour and intersect it with the rivers.
rivers = toOSGB(readOGR(Datadir, "rivers", verbose = FALSE))
lakes = toOSGB(readOGR(Datadir, "naturalwater", verbose = FALSE))
nineK = contourPlume(nuclear, 12000, 400, 5, pi/4, 9000)
rivers9k = gIntersection(nineK, rivers, byid = TRUE)

plot(rivers9k, col = "blue", lwd = 3)
plot(rivers, col = "blue", lty = 2, add = TRUE)
plot(nineK, add = TRUE)
plot(lakes, add = TRUE, col = "blue")
title("River sections with >9000 exposure")

Lake Exposure

This region of England contains not only its highest mountain but also its deepest lake. We want to compute the total pollution that lands on the lakes. For this we use polygons from the natural water dataset.
lakes = toOSGB(readOGR(Datadir, "naturalwater", verbose = FALSE))
plot(plevel, col = colours)
plot(cumbria, add = TRUE)
plot(lakes, add = TRUE, col = "#0000FF")
The extract function can now be used to sum the plume within polygons. We multiply by the grid cell area in metres and divide by \(10^9\) to get the total pollution in grams that has gone into the lake.
lakes$pollution = extract(plevel, lakes, sum, small = TRUE) * prod(res(plevel))/1e+09
We can then plot the ten most polluted lakes by total pollution amount. Notice the use of the RColorBrewer and lattice packages to modify the default behaviour of the spplot function. We add sp.text to the panel function to label the lakes - the default label position sits right over the feature so we adjust its position slightly.
top10 = order(lakes$pollution, decreasing = TRUE)[1:10]
spplot(lakes[top10, ], "pollution", panel = function(...) {
    sp.text(coordinates(lakes[top10, ]), lakes$name[top10], adj = 0)
}, main = "Lake exposure")
lakes@data[top10, ]
##     osm_id               name  type pollution
## 34 4580520      Derwent Water water    49.000
## 35 4580524 Bassenthwaite Lake water    45.264
## 44 4582862          Ullswater water    35.408
## 39 4580535    Ennerdale Water water    32.499
## 37 4580533     Crummock Water water    27.251
## 40 4580536         Wast Water water    23.733
## 38 4580534         Buttermere water    10.498
## 36 4580532         Loweswater water     5.870
## 46 4583133      Floutern Tarn water     2.703
## 60 4583896     Greendale Tarn water     2.053

Simulating Sensor Samples

In this section we will briefly 'break the fourth wall' and talk about how we simulate the sample measurements for the next part.

Since there hasn't really been a major pollution incident, we will have to simulate the results of sensor sampling. We could just take the values from the plevel raster, but that would be exact, and statistics is never exact. We could add some independent random noise to the plume raster data for each sensor but that's not very satisfactory either. That would be equivalent to saying our measurement process is very poor - and if we took the reading again at the same point we'd get a different value. This is true in reality - no instrument has infinite precision - but it shouldn't swamp the rest of the variability.

What we'll do instead is create a smoothly varying raster that is the value of a measurement taken at that point. The reason this varies from the plume value could be topology, or land cover, or rainfall, or some other spatially-continuous process. We could also add further independent random noise to these numbers to represent the imprecision of the device.

This function creates a number of points over a grid and then does a crude smoothing using the idw function (of which more later). This is then added to the input raster to create a smoothly varying version of the input. The various parameters can change aspects of the output, but have been set at useful values for the case at hand. After this section of code, the values in raster sensorValues will be the reading of a sensor placed at those locations.

noisyRaster <- function(plevel, crs, npts = 300, idp = 4, a = 3000) {
    pts = spsample(SpatialPoints(plevel), npts, type = "random")
    pts = SpatialPointsDataFrame(pts, data.frame(Z = rnorm(npts, 0, 1)))
    proj4string(pts) = CRS(proj4string(ukgaz))
    fk = idw(Z ~ 1, pts, SpatialPoints(plevel, proj4string = crs), idp = idp)
    fkp = plevel
    fkp[] = fk$var1.pred
    pmax = max(values(plevel))
    return(fkp * a * (plevel/pmax) + plevel)

sensorValues = noisyRaster(plevel, CRS(proj4string(ukgaz)))
## [inverse distance weighted interpolation]

Now we can take the coordinates of the villages, sample from the simulated sensor values at those locations, create a SpatialPolygonsDataFrame and save it to disk:

samples = ukgaz[, "Name"]
samples$sample = extract(sensorValues, samples)
writeOGR(samples, Datadir, "samples", "ESRI Shapefile", overwrite = TRUE)

Great! Now, forget you did all that, and pretend that someone has gone to every village, taken a sensor reading, and put it all into a Shapefile for you. Read on...

Sampling Stations

At each town and village location a sampling system was set up to measure the pollution. The data was supplied as a point Shapefile with the measurements in an attribute called sample. We'll read this in and plot it.
samples = readOGR(Datadir, "samples", verbose = FALSE)
spplot(samples, "sample", main = "Samples from towns/villages")

Inverse Distance Weighting

The automap package has some functions for producing smoothed maps from point-sampled data. Perhaps the simplest method is inverse distance weighting. In this, the value at any point is a weighted sum of the sampled values, where the weight is inversely related to the distance, so that further points have little influence.

The idw function performs inverse distance weighting, and a few more lines convert the predictions to a raster data set.

rIDW = raster(extent(cumbria), ncol = 100, nrow = 100, crs = projection(samples))
sampleFitIDW = idw(sample ~ 1, samples, SpatialPoints(rIDW, proj4string = CRS(proj4string(samples))))
## [inverse distance weighted interpolation]
rIDW[] = sampleFitIDW$var1.pred
title("Inverse Distance Weighting")

This looks rather spotty. We could try adjusting some of the parameters or we could use a more statistically principled method.


Kriging can be thought of as similar to inverse distance weighting, but the effect of distance is derived from inspecting the relationship between sample points and their separation. This enables the construction of a "variogram", and from this both an estimate and a variance of the sample measurement can be made over a grid.

The autoKrige function returns and object that contains the prediction estimates and variances, and the variogram parameters. We first set up an empty rasters, rKrige to define the prediction locations, and call the autoKrige function. Plotting the autoKrige object shows the prediction, the standard error, and the variogram. The variogram is usually an increasing function with distance, showing that points further apart are less likely to be similar. Fitting variograms can be a fine art, with sometimes crucial selections of function parameters, ranges, and binning.

sampleFit = autoKrige(sample ~ 1, samples)
## Warning message: Removed 1 duplicate observation(s) in input_data:
##          coordinates                 Name sample
## 93  (301671, 507141)    St John Beckermet   3414
## 104 (301671, 507141) St Bridget Beckermet   3414
## [using ordinary kriging]

Exceedence Mapping

Because kriging gives us a prediction and standard error, we can make meaningful statements about the predictions. For example, we can compute the probability that the underlying value is over 9000 units by using the pnorm function with the given means and standard deviations.

Then we can map areas that are definitely (\(p>0.95\)) above 9000 in red, those that are definitely (\(p<0.05\)) below 9000 in blue, and the rest of the area in gray, showing this is a `gray area' where we cannot decide with confidence if the value is over 9000 units.

First we'll create two functions to compute these exceedences as a raster and to plot them. We'll code significantly low as zero, significantly high as two, and intermediate as one.

pOver = function(thresh, krigeFit) {
    ### convert kriging output to exceedence

    ### pnorm computes probabilities from a Normal/Gaussian distribution
    pValue = pnorm(thresh, krigeFit$krige_output$var1.pred, krigeFit$krige_output$var1.stdev, lower.tail = FALSE)
    ### create a raster object from the kriging coordinates and the probability
    pExceed = rasterFromXYZ(cbind(coordinates(krigeFit$krige_output), (pValue > 0.05) + (pValue > 0.95)))
    attr(pExceed, "thresh") = thresh

eColours = c("#4040FF", "grey", "red")

plotExceed = function(pe, overlay, region) {
    ### plot the exceedence with a couple of additional overlays
    thresh = attr(pe, "thresh")
    plot(pe, col = eColours, legend = FALSE)
    legend("topright", c(paste("Under ", thresh), "Gray Area", paste("Over ", thresh)), fill = eColours,
        bg = "white")
    plot(overlay, add = TRUE)
    plot(region, add = TRUE)

Now we'll compute the exceedence map from the kriging and plot it.

pK = pOver(9000, sampleFit)
plotExceed(pK, nineK, cumbria)
title("Exceedence Map")

Additional Sampling Locations

Now we have been given the capability to deploy 25 new sensors. Where do we put them? There's little point putting them in the areas where we are certain of the values, so we'll put them within the convex hull of the grey area from the kriging. The spsample function can generate simple point sampling schemes, including random, gridded and the one we'll use here, hexagonal. The hexagonal scheme might not generate exactly 25 sensor points, so we'll add in some extra ones randomly to make up the numbers.

# grey cells are == 1
greyPts = SpatialPoints(xyFromCell(pK, 1:length(pK))[values(pK) == 1 & !is.na(values(pK)),
ch = chull(coordinates(greyPts))
ch = c(ch[length(ch)], ch)
nNew = 25
newPts = spsample(Polygon(coordinates(greyPts[ch, ])), nNew, "hexagonal")
extra = nNew - length(newPts)
if (extra > 0) {
    more = spsample(Polygon(coordinates(greyPts[ch, ])), extra, "random")
    newPts = rbind(newPts, more)
plot(pK, col = eColours, legend = FALSE)
points(newPts, pch = 19)
title("New Sensor Locations")

Faking The Sensor Measurements

Now we have to fake the sensor measurements by sampling from that raster we created.

newPts2 = SpatialPointsDataFrame(newPts, data.frame(sample = extract(sensorValues, newPts)))
proj4string(newPts2) = CRS(proj4string(samples))
writeOGR(newPts2, Datadir, "sensors", "ESRI Shapefile", overwrite = TRUE)

Now we have a Shapefile of simulated measurements at the new sensor locations that we can pretend someone gave us. Read on...

Kriging With The New Data

The new sensor data has been supplied to us in a Shapefile. We'll read this in and merge it with the original settlement measurements. We run the kriging function again. We then re-use our functions to compute and plot the exceedence.

sensors = readOGR(Datadir, "sensors")
## OGR data source with driver: ESRI Shapefile 
## Source: "../Data", layer: "sensors"
## with 25 features and 1 fields
## Feature type: wkbPoint with 2 dimensions
newData = rbind(sensors[, "sample"], samples[, "sample"])
sampleFit2 = autoKrige(sample ~ 1, newData)
## Warning message: Removed 1 duplicate observation(s) in input_data:
##          coordinates sample
## 118 (301671, 507141)   3414
## 129 (301671, 507141)   3414
## [using ordinary kriging]
pK2 = pOver(9000, sampleFit2)
plotExceed(pK2, nineK, cumbria)
title("Exceedence Map")

From this second map you should be able to see the grey area is smaller - we can create a table of pixel count in each category for the two exceedence maps which shows this change:

eTable = cbind(table(values(pK)), table(values(pK2)))
rownames(eTable) = c("Low", "Grey", "High")
colnames(eTable) = c("Settlements", "Plus Sensors")
##      Settlements Plus Sensors
## Low         4426         4393
## Grey         541          490
## High          36          114

Further Ideas

Instead of getting the sensor locations from the spsample function, you could try manually placing the sensors using the locator() function. Another possibility might be that sensors can only be placed on main roads - spsample can take points from lines too!

Lessons Learnt

  • Reading vector data with readOGR()
  • Reading raster data with raster()
  • Transforming vectors with spTransform()
  • Mapping with plot() and spplot()
  • Point- and area-sampling rasters with extract()
  • Vector-vector intersection with gIntersection()
  • Simple kriging with the automap package
  • Converting kriged output to raster format


Because we simulated the sensor measurements, we can actually plot them. This is the underlying surface that the kriging was predicting: