For this workshop session we will consider a release of pollution from a point source.
The scenario is that a pollution release from a research plant on the coast has resulted in a pollution plume over the county. Scientists have a model for the pollution density, and want to know which towns will be exposed to a level of pollution over 9000ng (nanograms) per square metre.
We need a number of packages for this.
library("sp") library("raster") library("rgdal") library("rgeos")
We'll compute a colour palette for plots, and set it as default for sp graphics:
library("RColorBrewer") colours = colorRampPalette(brewer.pal(7, "OrRd")[-1])(50) sp.theme(set = TRUE, regions = list(col = colours))
All our data files are in a folder called Data
at the same level as the current folder:
Datadir = "../Data" Datafile = function(f) { file.path(Datadir, f) }
You should make sure Datadir is set to the place you've put the data files. If you have the tcltk package, you can use as.character(tcltk::tkchooseDirectory()) to choose the folder interactively. Don't put a trailing slash or backslash at the end of Datadir or files might not be found.
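As a convenience, something like the following can set and sanity-check Datadir; the shapefile name in the check is just an example of a file you might expect to find there.

## Optional: pick the data folder interactively (only if the tcltk package is installed)
if (requireNamespace("tcltk", quietly = TRUE)) {
    Datadir = as.character(tcltk::tkchooseDirectory())
}
## Example check - adjust the file name to one you know is in your data folder
file.exists(Datafile("cumbria.shp"))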
We will use a simple model for the spread of pollution. The total pollution at a location at distance \(d\) and angle \(\theta\) from the point source is given by:
$$ f = \alpha \exp\left(-\left(\frac{d}{\beta}\right)^2 e^{-2\kappa \cos(\theta - \phi)}\right) $$
Where \(\alpha\) is the pollution level at distance zero, \(\beta\) is a distance scale factor, \(\phi\) is the major angle of the pollution plume, and \(\kappa\) is the eccentricity. If \(\kappa\) is 0 then the plume becomes a circularly symmetric exponential decay, and as \(\kappa\) increases the plume becomes more stretched in the direction of the \(\phi\) parameter. You can consider \(\kappa\) and \(\phi\) as the wind strength and direction respectively.
We can create a plume function that computes the pollution for a given source src
at
a number of destinations dst
given a set of parameters:
pfunc = function(d2, ang, a, b, k, phi) {
    ## plume value for distance-squared d2
    a * exp(-(d2 * exp(-k * cos(ang - phi))^2/b^2))
}

plume <- function(src, dst, a, b, k, phi) {
    ## plume value for src at dst
    src = coordinates(src)
    dst = coordinates(dst)
    d2 = (dst[, 1] - src[, 1])^2 + (dst[, 2] - src[, 2])^2
    ang = atan2(dst[, 2] - src[, 2], dst[, 1] - src[, 1])
    pfunc(d2, ang, a, b, k, phi)
}
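To get a feel for the anisotropy, we can evaluate pfunc along and across the wind direction, using the parameter values that appear later in the session (a = 12000, b = 400, k = 5, phi = pi/4); the exact distances here are purely illustrative.

## 30km directly downwind the level is still around 9300ng per square metre,
## but 1km across the wind it has dropped to almost nothing (about 23)
pfunc(30000^2, pi/4, 12000, 400, 5, pi/4)
pfunc(1000^2, pi/4 + pi/2, 12000, 400, 5, pi/4)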
We will use three data sets - the location of the pollution source, the location and
population of towns from a UK gazetteer source, and a polygon boundary of Cumbria obtained
from the gadm
web site.
Two of the datasets are in
lat-long format, so we use spTransform
to convert them to Ordnance Survey
grid references in metres with a little wrapper function. We then make a quick
map with the source as a red dot and the towns as crosses.
toOSGB = function(s) {
    spTransform(s, CRS("+init=epsg:27700"))
}
cumbria = toOSGB(readOGR(Datadir, "cumbria", verbose = FALSE))
nuclear = toOSGB(readOGR(Datadir, "nuclear", verbose = FALSE))
ukgaz = readOGR(Datadir, "ukgaz", verbose = FALSE)
plot(cumbria)
plot(nuclear, add = TRUE, pch = 19, col = "red")
plot(ukgaz, add = TRUE)
title("Base map")
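If you want to confirm the reprojection worked, the coordinate reference string and bounding box should now be in metres (eastings and northings) rather than degrees.

## Quick check: OSGB National Grid CRS and coordinates in metres
proj4string(nuclear)
bbox(cumbria)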
Now we create a raster, plevel, covering the extent of Cumbria and fill it with values from the plume function. The call to
SpatialPoints(plevel)
creates a set of coordinates of all the grid cells
at which to compute the plume values. The plume parameters have been
given to us by some scientists who tell us that's what they are and that's what
we should use. You will notice that the wind has been blowing at an angle of \(\pi/4\), for example.
We plot the plume and the other data on a map.
plevel = raster(extent(cumbria), nrows = 500, ncols = 500)
plevel[] = plume(nuclear, SpatialPoints(plevel), 12000, 400, 5, pi/4)
projection(plevel) = proj4string(cumbria)
plot(plevel, col = colours)
plot(cumbria, add = TRUE)
plot(nuclear, add = TRUE, pch = 19, col = "red")
plot(ukgaz, add = TRUE)
title("Plume Spread")
Next we use the extract function
to sample from the raster grid at the town locations. Then we can extract the towns that have an
exposure over 9000ng/\(m^2\) and put them in a table.
ukgaz$exposure = extract(plevel, ukgaz)
over9000 = ukgaz$exposure > 9000
dangerTowns = data.frame(ukgaz@data[over9000, c("Name", "pop", "exposure")])
dangerTowns
                Name  pop exposure
4            Keswick 5508     9063
28        Borrowdale  438     9355
43            Lorton  250     9158
46        Loweswater  209    10350
50        Buttermere  127    10713
97          Gosforth 1230    11135
100         Lamplugh  763     9323
109 Ennerdale Bridge  240    10536
110            Haile  237    10921
112         Ponsonby   92    11961
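If you prefer the table ordered by severity, a quick sort of the data frame works; this is purely cosmetic.

## Order the at-risk settlements by exposure, highest first
dangerTowns[order(dangerTowns$exposure, decreasing = TRUE), ]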
To find where the plume drops to a given level we can invert the plume formula: solving \(f = \alpha \exp(-(d/\beta)^2 e^{-2\kappa\cos(\theta-\phi)})\) for \(d\) gives \(d = \beta\sqrt{-\log(f/\alpha)}\,e^{\kappa\cos(\theta-\phi)}\), so for each angle we can compute the distance at which the plume reaches that value, and sweep round a full circle to build a contour polygon.

invpfunc <- function(ang, a, b, k, phi, f) {
    ### inverse plume function - compute distance given f
    d2 = -log(f/a) * b^2/exp(-k * cos(ang - phi))^2
    sqrt(d2)
}

contourPlume <- function(src, a, b, k, phi, f) {
    ### return a SpatialPolygon of a plume function at value f
    ### by sweeping round 360 degrees and computing the distance
    ang = seq(0, 2 * pi, len = 360)
    d = invpfunc(ang, a, b, k, phi, f = f)
    xy = coordinates(src)
    polys = SpatialPolygons(list(Polygons(list(Polygon(cbind(xy[, 1] + d * sin(ang),
        xy[, 2] + d * cos(ang)))), ID = 1)))
    proj4string(polys) = proj4string(src)
    polys
}
We can then intersect the 9000 contour with the river network to find the stretches of river with exposure over 9000, and plot them along with the lakes.

rivers = toOSGB(readOGR(Datadir, "rivers", verbose = FALSE))
lakes = toOSGB(readOGR(Datadir, "naturalwater", verbose = FALSE))
nineK = contourPlume(nuclear, 12000, 400, 5, pi/4, 9000)
rivers9k = gIntersection(nineK, rivers, byid = TRUE)
plot(rivers9k, col = "blue", lwd = 3)
plot(rivers, col = "blue", lty = 2, add = TRUE)
plot(nineK, add = TRUE)
plot(lakes, add = TRUE, col = "blue")
title("River sections with >9000 exposure")
Next we turn to the lakes, plotting them over the plume raster.

lakes = toOSGB(readOGR(Datadir, "naturalwater", verbose = FALSE))
plot(plevel, col = colours)
plot(cumbria, add = TRUE)
plot(lakes, add = TRUE, col = "#0000FF")
title("Lakes")
The extract
function can now be used to sum the plume within polygons. We multiply by the
grid cell area in metres and divide by \(10^9\) to get the total pollution in grams that has gone into the lake.
lakes$pollution = extract(plevel, lakes, sum, small = TRUE) * prod(res(plevel))/1e+09
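The unit conversion works because extract with sum adds up the ng per square metre values of the cells inside each lake polygon, so multiplying by the area of one cell in square metres gives nanograms, and dividing by \(10^9\) converts nanograms to grams. You can inspect the cell size and area directly.

## Cell dimensions (m) and cell area (m^2) used in the conversion above
res(plevel)
prod(res(plevel))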
We use the RColorBrewer
and lattice
packages to modify the default behaviour
of the spplot
function. We add sp.text
to the panel function to label the
lakes - the default label position sits right over the feature so we adjust its position
slightly.
library("lattice") library("RColorBrewer") top10 = order(lakes$pollution, decreasing = TRUE)[1:10] spplot(lakes[top10, ], "pollution", panel = function(...) { panel.polygonsplot(...) sp.text(coordinates(lakes[top10, ]), lakes$name[top10], adj = 0) }, main = "Lake exposure")
lakes@data[top10, ]
##     osm_id               name  type pollution
## 34 4580520      Derwent Water water    49.000
## 35 4580524 Bassenthwaite Lake water    45.264
## 44 4582862          Ullswater water    35.408
## 39 4580535    Ennerdale Water water    32.499
## 37 4580533     Crummock Water water    27.251
## 40 4580536         Wast Water water    23.733
## 38 4580534         Buttermere water    10.498
## 36 4580532         Loweswater water     5.870
## 46 4583133      Floutern Tarn water     2.703
## 60 4583896     Greendale Tarn water     2.053
In this section we will briefly 'break the fourth wall' and talk about how we simulate the sample measurements for the next part.
Since there hasn't really been a major pollution incident, we will
have to simulate the results of sensor sampling. We could just take
the values from the plevel
raster, but that would be
exact, and statistics is never exact. We could add some independent
random noise to the plume raster data for each sensor but that's not
very satisfactory either. That would be equivalent to saying our measurement
process is very poor - and if we took the reading again at the same point we'd
get a different value. This is true in reality - no instrument has infinite precision -
but it shouldn't swamp the rest of the variability.
What we'll do instead is create a smoothly varying raster that is the value of a measurement taken at that point. The reason this varies from the plume value could be topography, or land cover, or rainfall, or some other spatially-continuous process. We could also add further independent random noise to these numbers to represent the imprecision of the device.
This function creates a number of points over a grid and then does
a crude smoothing using the idw
function (of which more later).
This is then added to the input raster to create a smoothly varying version
of the input. The various parameters can change aspects of the output, but
have been set at useful values for the case at hand. After this section of code,
the values in the raster sensorValues will be the readings of sensors placed
at those locations.
library("automap") noisyRaster <- function(plevel, crs, npts = 300, idp = 4, a = 3000) { pts = spsample(SpatialPoints(plevel), npts, type = "random") pts = SpatialPointsDataFrame(pts, data.frame(Z = rnorm(npts, 0, 1))) proj4string(pts) = CRS(proj4string(ukgaz)) fk = idw(Z ~ 1, pts, SpatialPoints(plevel, proj4string = crs), idp = idp) fkp = plevel fkp[] = fk$var1.pred pmax = max(values(plevel)) return(fkp * a * (plevel/pmax) + plevel) } sensorValues = noisyRaster(plevel, CRS(proj4string(ukgaz)))
## [inverse distance weighted interpolation]
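As an optional check that is not part of the original workflow, you can plot the difference between the simulated sensor field and the true plume to see the smooth variation we have introduced.

## Difference between simulated sensor readings and the underlying plume
plot(sensorValues - plevel, col = colours)
title("Simulated measurement variation")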
Now we can take the coordinates of the villages, sample from the simulated sensor values
at those locations, create a SpatialPointsDataFrame
and save it to disk:
samples = ukgaz[, "Name"]
samples$sample = extract(sensorValues, samples)
writeOGR(samples, Datadir, "samples", "ESRI Shapefile", overwrite = TRUE)
rm(samples)
Great! Now, forget you did all that, and pretend that someone has gone to every village, taken a sensor reading, and put it all into a Shapefile for you. Read on...
The measurements have been supplied as a Shapefile with an attribute called sample. We'll read this in and plot it.
samples = readOGR(Datadir, "samples", verbose = FALSE)
spplot(samples, "sample", main = "Samples from towns/villages")
The automap
package has some functions for producing smoothed maps from
point-sampled data. Perhaps the simplest method is inverse distance weighting.
In this, the value at any point is a weighted sum of the sampled values, where
the weight is inversely related to the distance, so that further points
have little influence.
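As a minimal sketch of the idea (this is not how the idw function is implemented, just an illustration of the weighting), the estimate at a location x0 is the weighted mean of the sampled values, with weights \(1/d^p\) where p is the power parameter (idp in the calls below).

## Toy inverse-distance-weighted estimate at a single location x0
## xy is a two-column matrix of sample coordinates, z the sampled values
idwEstimate = function(x0, xy, z, p = 2) {
    d = sqrt((xy[, 1] - x0[1])^2 + (xy[, 2] - x0[2])^2)
    w = 1/d^p
    sum(w * z)/sum(w)
}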
The idw
function performs inverse distance weighting, and a few more lines
convert the predictions to a raster data set.
library("automap") rIDW = raster(extent(cumbria), ncol = 100, nrow = 100, crs = projection(samples)) sampleFitIDW = idw(sample ~ 1, samples, SpatialPoints(rIDW, proj4string = CRS(proj4string(samples))))
## [inverse distance weighted interpolation]
rIDW[] = sampleFitIDW$var1.pred
plot(rIDW)
title("Inverse Distance Weighting")
This looks rather spotty. We could try adjusting some of the parameters or we could use a more statistically principled method.
Kriging can be thought of as similar to inverse distance weighting, but the effect of distance is derived from inspecting the relationship between sample points and their separation. This enables the construction of a "variogram", and from this both an estimate and a variance of the sample measurement can be made over a grid.
The autoKrige function returns an object that contains the prediction estimates and variances, and the variogram parameters. We call the autoKrige function on the sample data; if no prediction locations are supplied it constructs a prediction grid over the samples itself.
Plotting the autoKrige
object shows the prediction, the standard error, and the
variogram. The variogram is usually an increasing function with distance, showing that points
further apart are less likely to be similar. Fitting variograms can be a fine art, with sometimes
crucial selections of function parameters, ranges, and binning.
sampleFit = autoKrige(sample ~ 1, samples)
## Warning message: Removed 1 duplicate observation(s) in input_data:
##          coordinates                 Name sample
## 93  (301671, 507141)    St John Beckermet   3414
## 104 (301671, 507141) St Bridget Beckermet   3414
## [using ordinary kriging]
plot(sampleFit)
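If you want the numbers behind the variogram panel, the fitted variogram model is stored in the returned object (assuming the usual automap return structure).

## Fitted variogram model (nugget, partial sill, range)
sampleFit$var_model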
Because kriging gives us a prediction and standard error, we can make meaningful statements
about the predictions. For example, we can compute the probability that the underlying value
is over 9000 units by using the pnorm
function with the given means and standard
deviations.
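For example, with purely illustrative numbers, a pixel predicted at 9500 with a standard error of 400 would have:

## Probability the true value exceeds 9000, for a prediction of 9500 with standard error 400
pnorm(9000, mean = 9500, sd = 400, lower.tail = FALSE)  # about 0.89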
Then we can map areas that are definitely (\(p>0.95\)) above 9000 in red, those that are definitely (\(p<0.05\)) below 9000 in blue, and the rest of the area in gray, showing this is a 'gray area' where we cannot decide with confidence if the value is over 9000 units.
First we'll create two functions to compute these exceedences as a raster and to plot them. We'll code significantly low as zero, significantly high as two, and intermediate as one.
pOver = function(thresh, krigeFit) {
    ### convert kriging output to exceedence
    ### pnorm computes probabilities from a Normal/Gaussian distribution
    pValue = pnorm(thresh, krigeFit$krige_output$var1.pred, krigeFit$krige_output$var1.stdev,
        lower.tail = FALSE)
    ### create a raster object from the kriging coordinates and the probability
    pExceed = rasterFromXYZ(cbind(coordinates(krigeFit$krige_output), (pValue > 0.05) +
        (pValue > 0.95)))
    attr(pExceed, "thresh") = thresh
    return(pExceed)
}

eColours = c("#4040FF", "grey", "red")

plotExceed = function(pe, overlay, region) {
    ### plot the exceedence with a couple of additional overlays
    thresh = attr(pe, "thresh")
    plot(pe, col = eColours, legend = FALSE)
    legend("topright", c(paste("Under ", thresh), "Gray Area", paste("Over ", thresh)),
        fill = eColours, bg = "white")
    plot(overlay, add = TRUE)
    plot(region, add = TRUE)
}
Now we'll compute the exceedence map from the kriging and plot it.
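A minimal sketch of that step, mirroring the corresponding call made later for the augmented data (pK is the name the later code expects, and nineK is the 9000 contour computed earlier):

## Exceedence raster for the settlement-only kriging, plotted with the 9000 contour and county
pK = pOver(9000, sampleFit)
plotExceed(pK, nineK, cumbria)
title("Exceedence Map")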
Now we have been given the capability to deploy 25 new sensors. Where do we put them?
There's little point putting them in the areas where we are certain of the values, so
we'll put them within the convex hull of the grey area from the kriging. The
spsample
function can generate simple point sampling schemes,
including random, gridded and the one we'll use here, hexagonal. The hexagonal
scheme might not generate exactly 25 sensor points, so we'll add in some extra
ones randomly to make up the numbers.
# grey cells are == 1
greyPts = SpatialPoints(xyFromCell(pK, 1:length(pK))[values(pK) == 1 & !is.na(values(pK)), ])
ch = chull(coordinates(greyPts))
ch = c(ch[length(ch)], ch)
nNew = 25
newPts = spsample(Polygon(coordinates(greyPts[ch, ])), nNew, "hexagonal")
extra = nNew - length(newPts)
if (extra > 0) {
    more = spsample(Polygon(coordinates(greyPts[ch, ])), extra, "random")
    newPts = rbind(newPts, more)
}
plot(pK, col = eColours, legend = FALSE)
points(newPts, pch = 19)
title("New Sensor Locations")
Now we have to fake the sensor measurements by sampling from that raster we created.
newPts2 = SpatialPointsDataFrame(newPts, data.frame(sample = extract(sensorValues, newPts)))
proj4string(newPts2) = CRS(proj4string(samples))
writeOGR(newPts2, Datadir, "sensors", "ESRI Shapefile", overwrite = TRUE)
rm(newPts2)
Now we have a Shapefile of simulated measurements at the new sensor locations that we can pretend someone gave us. Read on...
The new sensor data has been supplied to us in a Shapefile. We'll read this in and merge it with the original settlement measurements. We run the kriging function again. We then re-use our functions to compute and plot the exceedence.
sensors = readOGR(Datadir, "sensors")
## OGR data source with driver: ESRI Shapefile
## Source: "../Data", layer: "sensors"
## with 25 features and 1 fields
## Feature type: wkbPoint with 2 dimensions
newData = rbind(sensors[, "sample"], samples[, "sample"])
sampleFit2 = autoKrige(sample ~ 1, newData)
## Warning message: Removed 1 duplicate observation(s) in input_data:
##          coordinates sample
## 118 (301671, 507141)   3414
## 129 (301671, 507141)   3414
## [using ordinary kriging]
pK2 = pOver(9000, sampleFit2)
plotExceed(pK2, nineK, cumbria)
title("Exceedence Map")
From this second map you should be able to see the grey area is smaller - we can create a table of pixel count in each category for the two exceedence maps which shows this change:
eTable = cbind(table(values(pK)), table(values(pK2)))
rownames(eTable) = c("Low", "Grey", "High")
colnames(eTable) = c("Settlements", "Plus Sensors")
eTable
##      Settlements Plus Sensors
## Low         4426         4393
## Grey         541          490
## High          36          114
Instead of getting the sensor locations from the spsample
function,
you could try manually placing the sensors using the locator()
function. Another possibility might be that sensors can only
be placed on main roads - spsample
can take points from lines too!
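As a rough illustration of that last idea (the roads layer name here is invented for illustration, not one of the files supplied with the workshop), the call would look something like:

## Hypothetical: sample 25 candidate sensor sites along a main-roads line layer
roads = toOSGB(readOGR(Datadir, "roads", verbose = FALSE))
roadPts = spsample(roads, 25, "random")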
Functions and packages used in this session: readOGR(), raster(), spTransform(), plot() and spplot(), extract(), gIntersection(), and the automap package.