Crime Data Mapping

For this workshop session we'll read in some data and do some basic mapping.

We need a number of packages for this.


All our data files are in a folder called Data at the same level as the current folder:

Datadir = "../Data"
Datafile = function(f) {
    file.path(Datadir, f)

You should make sure Datadir is set to the place you've put the data files. If you have the tcltk package, then you can use as.character(tcltk::tkchooseDirectory()). Don't put a trailing slash or backslash at the end of Datadir or files might not be found.

Read The Street-level Crime Data

The crime data at the street level is supplied in a CSV file each month. We'll read in one month and plot it. The coordinates are Ordnance Survey of Great Britain grid references, which is EPSG code 27700.

streetCrime = read.csv(Datafile("2012-03-cumbria-street.csv"))
##     Month         Falls.within Easting Northing                          Location Crime.type Context
## 1 2012-03 Cumbria Constabulary Cumbria Constabulary  351534   492353          On or near Dowker'S Lane   Burglary      NA
## 2 2012-03 Cumbria Constabulary Cumbria Constabulary  299210   528515            On or near Supermarket   Burglary      NA
## 3 2012-03 Cumbria Constabulary Cumbria Constabulary  351183   492741        On or near Serpentine Road   Burglary      NA
## 4 2012-03 Cumbria Constabulary Cumbria Constabulary  301277   524712          On or near Hallwood Road   Burglary      NA
## 5 2012-03 Cumbria Constabulary Cumbria Constabulary  299416   527779 On or near Sports/Recreation Area   Burglary      NA
## 6 2012-03 Cumbria Constabulary Cumbria Constabulary  361096   478662            On or near Chapel Lane   Burglary      NA
coordinates(streetCrime) = ~Easting + Northing
proj4string(streetCrime) = CRS("+init=epsg:27700")
title("Street-level crime locations")

OSM Background

Can we do a nicer map using the OpenStreetMap package? This gives us a function to download and print map tiles from Open Street Map, and other tile servers. These tiles use the Google Mercator coordinate system, so we have to transform our points to that. The osm() function returns the CRS string for the tiles.

latLongBox = bbox(spTransform(streetCrime, CRS("+init=epsg:4326")))
map = openmap(c(latLongBox[2, 2], latLongBox[1, 1]), c(latLongBox[2, 1], latLongBox[1, 2]),
    type = "osm")
points(spTransform(streetCrime, osm()))

Using ggplot

If we add the coordinates back to the data frame we can use ggplot to produce some nice plots:

streetCrime@data = cbind(streetCrime@data, coordinates(streetCrime))
ggplot(streetCrime@data) + geom_point(aes(x = Easting, y = Northing, colour = Crime.type)) +
    coord_equal() + opts(title = "Street Crime By Type")

Relative Rates

It is clear from the maps that there are more crimes in urban areas, because that's where people live. What we really want to do is look at rates of crime relative to some baseline. We could use population if we had a good measure of baseline population, or we can compare one crime rate against all crimes.

The sparr package provides functions for computing estimates of densities of point patterns using kernel smoothing. It has several methods for computing the kernel bandwidth, including adaptive methods. To use it we have to convert our points into an ppp object from the spatstat package. For the window containing the points we'll use the convex hull. We'll write a function that does the smoothing for a particular crime type. The function will convert the output from bivariate.density into a raster object. We'll also read in some town locations and get the largest of them for some context when mapping.

## Warning message: RGL: unable to open X11 display
## Warning message: error in rgl_init

coords = coordinates(streetCrime)
cHull = as.matrix(coords[rev(chull(coords)), ])
data = ppp(x = coords[, 1], y = coords[, 2], window = owin(poly = cHull), marks = streetCrime$Crime.type)
## Warning message: data contain duplicated points
makeIntensity <- function(data, type) {
    d = bivariate.density(data = data, ID = type, intensity = TRUE, comment = FALSE, pilotH = diff(range(data$x))/10)
    raster(list(x = d$X, y = d$Y, z = d$Zm))

allCrime = makeIntensity(data, NULL)
asbos = makeIntensity(data, "Anti-social behaviour")
## Warning message: more than two distinct ID values detected in 'data'
drugs = makeIntensity(data, "Drugs")
## Warning message: more than two distinct ID values detected in 'data'
ukgaz = readOGR(Datadir, "ukgaz", verbose = FALSE)
topTowns = ukgaz[order(ukgaz$pop, decreasing = TRUE)[1:8], ]

colours = colorRampPalette(c("blue", "gray", "red"))(20)

ASBO ratio

plot(asbos/allCrime, col = colours)
text(topTowns, topTowns$Name)
title("ASBOs as fraction of all crime")

The regional rate of ASBO crime to all crime is 0.5248. We can see that Kendal seems higher and Workington lower than this average value.

Drugs Ratio

plot(drugs/allCrime, col = colours)
text(topTowns, topTowns$Name)
title("Drug offences as fraction of all crime")

This shows almost the opposite pattern. The average drug crime ratio is 0.0333, with Workington over double this, and Kendal much lower.

These kind of 'heat maps' are aften used to show crime maps, house fires, or other point-based data, but without reference to some kind of background population as a denominator they can't display meaningful risk. The sparr package has functions for computing relative risk with error calculations. Read the package documentation for more information.

Police Neighbourhood Areas

UK crime data is organised by areas known as neighbourhoods. The boundaries of the neighbourhoods are available as a shapefile. The coordinates are lat-long, so we convert to the same CRS as the street crime data. In this section we will aggregate the cases to neighbourhood level.

neighbourhoods = spTransform(readOGR(Datadir, "neighbourhoods"), CRS(proj4string(streetCrime)))
## OGR data source with driver: ESRI Shapefile 
## Source: "../Data", layer: "neighbourhoods"
## with 46 features and 3 fields
## Feature type: wkbPolygon with 2 dimensions
plot(streetCrime, add = TRUE)
title("Neighbourhood polygons")

Topological Colouring

Plain regions like in the previous map never look too interesting, but colouring everything one colour isn't much better. Can we colour regions so that two regions with a common boundary don't have the same colour? The theory tells us that we only need at most four colours to do this for a planar map. However, the algorithm to do that is very hard, and our map might not be strictly planar, with islands and holes here and there.

The following function uses poly2nb from the spdep package to compute the adjacency of regions, and then executes a 'greedy' algorithm. For each region it finds the smallest colour not present in any of its neighbours. If all the colours are currently being used by its neighbours then it has to start using a new one.

colourmap <- function(polys, snap) {
    colouring = list()
    length(colouring) = length(polys)
    nbl = poly2nb(polys, snap = snap)
    nextFreeColour = 1
    for (node in 1:length(nbl)) {
        nbrColours = unlist(colouring[nbl[[node]]])
        freeColour = min(setdiff((1:nextFreeColour), nbrColours))
        colouring[[node]] = freeColour
        if (freeColour == nextFreeColour) {
            nextFreeColour = nextFreeColour + 1
colouring = colourmap(neighbourhoods, snap = 100)
palette(brewer.pal(max(colouring), "Set3"))
plot(neighbourhoods, col = colouring)
title("Topology colouring")

This map shouldn't have any two adjacent regions with the same colour. The snap had to be adjusted to make this work, however,because the map has some problems...

Digitizing Problems

Unfortunately the borders are very poorly digitized, and the edges between neighbourhoods do not match up very well. You can see this by zooming in on Carlisle and plotting with semi-transparent colours. This shows up the overlaps and the gaps.

plot(neighbourhoods, xlim = c(336000, 344000), ylim = c(552000, 560000), col = rainbow(10,
    alpha = 0.5), border = NA)
title("Overlapping and misaligned polygons")

Point overlays

We can use the over function to count the number of crimes in each neighbourhood. This function does various things depending on the exact class of object passed to it. It is also fussy about making sure its arguments have the same coordinate system.

Because we know that our polygons aren't well constructed, we'll return the list of polygons that each point is in. We can then count how many by computing the length of the list element. We'll then plot Carlisle and show those points that are not over exactly one region.

crimeN = over(SpatialPoints(streetCrime), SpatialPolygons(neighbourhoods@polygons), returnList = TRUE)
nPoly = unlist(lapply(crimeN, length))
## nPoly
##    0    1    2 
##    6 4287   57 
plot(neighbourhoods, xlim = c(336000, 344000), ylim = c(552000, 560000), col = rainbow(10,
    alpha = 0.5), border = NA)
plot(streetCrime, cex = 0.25, pch = 19, add = TRUE, col = "black")
plot(streetCrime[nPoly > 1, ], add = TRUE, pch = 19, col = "red")
plot(streetCrime[nPoly < 1, ], add = TRUE, pch = 19, col = "blue")
title("Points not in a single polygon")

Points in polygons

Conversely we can overlay polygons on points and get a list of the same length as the number of polygons. Each element contains the indices of points in that polygon. This way we will double-count points that are in more than one polygon, and drop those outside all the polygons.

crimeL = over(SpatialPolygons(neighbourhoods@polygons), SpatialPoints(streetCrime), returnList = TRUE)
# how many crimes per area
unlist(lapply(crimeL, length))
##  [1] 399 129 120  93  66 100  33  31  86 145 156 244  54  62  62  47  35 126  90 115  74 205  41  65 270 122 190  40  92 127  93  81
## [33]   7  38  43  53  32  21  24 152  33 142 195  20  38  10

We can then assign this vector to the neighbourhood object and use it to produce some maps. The default colours in spplot aren't particularly good, so we define a new palette from gray to red and set that.

colours2 = colorRampPalette(c("gray90", "red"))(20)
sp.theme(set = TRUE, regions = list(col = colours2))
neighbourhoods$TotalCrime = unlist(lapply(crimeL, length))
neighbourhoods$TotalCrimeRateK = 1000 * neighbourhoods$TotalCrime/neighbourhoods$Population
spplot(neighbourhoods, "TotalCrimeRateK", main = "Crimes/1000ppl/month from street data")
neighbourhoods@data[order(neighbourhoods$TotalCrimeRateK, decreasing = TRUE)[1:5], c("Name",
    "Population", "TotalCrimeRateK")]
##                               Name Population TotalCrimeRateK
## 24            Carlisle City Centre       2042          132.22
## 23        Botcherby And Durranhill       1334           48.73
## 0                   Barrow Central      13706           29.11
## 26                      St. Aidans       6813           27.89
## 31 Raffles / Newtown / Willowholme       5172           15.66

You'll notice that most of the county is gray - this is because the distribution is very skew. Try taking a log-transform and mapping that.

Area Data

The monthly area crime data is supplied in a CSV file. We can read this in, join it to the polygons by matching the ID and Neighbourhood columns from each object. The map from this data looks broadly similar to the data from the street-level file.

monthCrime = read.csv(Datafile("2012-03-cumbria-neighbourhood.csv"))
## [1] 46 15
monthCrime[1:10, c("Neighbourhood", "All.crime.and.ASB")]
##    Neighbourhood All.crime.and.ASB
## 1         GARS07                33
## 2         GARS06                99
## 3         GARS05                66
## 4         GARS04                88
## 5         GARS03               119
## 6         GARS02               125
## 7         GARS01               407
## 8         GARY01                 7
## 9         GARS09                85
## 10        GARS08                30
neighbourhoods$MonthlyCrime = monthCrime[match(neighbourhoods$ID, monthCrime$Neighbourhood),
neighbourhoods$MonthlyCrimeRate = 1000 * neighbourhoods$MonthlyCrime/neighbourhoods$Population
sp.theme(set = TRUE, regions = list(col = colours2))
spplot(neighbourhoods, "MonthlyCrimeRate", main = "Crimes/1000ppl/month from monthly data")

Land Use/Cover Data

The land use data comes from the EU Corine Land Cover data set. This uses a coordinate system designed for use across Europe. We'll transform our neighbourhoods to this system, and plot them, using two colours for contrast. The land use is a GeoTIFF file, and it comes with a text file describing the land use categories for creating legends.

The land use classification is hierarchical at three levels. We can read in the legend file and display the top level classes.

legend = read.csv(Datafile("legend.csv"), row.names = "GRID_CODE", = TRUE)
## [1] "Artificial surfaces"           "Agricultural areas"            "Forest and semi natural areas" "Wetlands"                     
## [5] "Water bodies"                  "NODATA"                        "UNCLASSIFIED"                 

Then each of those levels divides down into subcategories. For example, the agricultural areas divides into:

unique(legend$LABEL2[legend$LABEL1 == "Agricultural areas"])
## [1] "Arable land"                      "Permanent crops"                  "Pastures"                        
## [4] "Heterogeneous agricultural areas"

and then those classifications are further subdivided, for example:

unique(legend$LABEL3[legend$LABEL2 == "Heterogeneous agricultural areas"])
## [1] "Annual crops associated with permanent crops"                                          
## [2] "Complex cultivation patterns"                                                          
## [3] "Land principally occupied by agriculture, with significant areas of natural vegetation"
## [4] "Agro-forestry areas"                                                                   

We'll now read in the land use file itself and plot it with the raster package.

landUse = raster(Datafile("cumbriaLandUse.tif"))
neighEU = spTransform(neighbourhoods, CRS(projection(landUse)))
plot(neighEU, add = TRUE, border = "white", lwd = 3)
plot(neighEU, add = TRUE, border = "black", lwd = 1)

The colours correspond to the land use categories of LABEL3 above.

Extracting Values

The extract function is the workhorse for getting information out of rasters. We will get the land use code for each of our street crime points and add the information to the object. We use as.character in order to look up from the row names of the legend data frame.

luCrime = unlist(extract(landUse, spTransform(streetCrime, CRS(projection(landUse)))))
luCrime[luCrime == "0"] <- NA
streetCrime$landUse = luCrime
streetCrime$landUseCode3 = legend[as.character(luCrime), "LABEL3"]
streetCrime$landUseCode2 = legend[as.character(luCrime), "LABEL2"]
streetCrime$landUseCode1 = legend[as.character(luCrime), "LABEL1"]
plot(streetCrime, col = legend[streetCrime$landUse, "Colour"], pch = 19)
title("Street level crime by land use")

We can also use ggplot to produce maps by crime type, coloured by broad land use code

ggplot(streetCrime@data) + geom_point(aes(x = Easting, y = Northing, col = landUseCode1)) +
    coord_equal() + facet_wrap(~Crime.type)

Land Use Classification

Some neighbourhoods are clearly more rural than others. By considering the pixels that each polygon covers we can come up with a simple classification scheme.

Polygon-Raster Overlay

The extract function considers all polygons in a Spatial Polygons Data Frame and returns a list where the elements are the pixels in each polygon. We'll then process this list to get a table of land use categories for each neighbourhood.

The lutable function first removes any pixels in the 'missing' category, and then computes the fraction of the pixels in each category. We then make the list into a matrix with row and column names so we can then do a stacked barplot. Note there's not enough room for all the labels, so only a few are shown.

ext = extract(landUse, neighEU)

lutable = function(x, ...) {
    x = x[x != 0]
    tx = table(factor(x, levels = row.names(legend)))/length(x)

clm =, lapply(ext, lutable))
rownames(clm) = legend$LABEL3
colnames(clm) = neighEU$Name
barplot(clm, col = as.character(legend$Colour))

We can show the legend in a separate plot.

par(mar = c(0, 0, 0, 0))
plot(c(0, 1), c(0, 1), type = "n", axes = FALSE, xlab = "", ylab = "")
legend(0, 1, legend = substr(legend$LABEL3, 1, 50), fill = as.character(legend$Colour), ncol = 2,
    title = "Land Use")

This shows our regions are either mostly red (urban) or yellow/green (rural). For our simple classification scheme we'll look at the number of pixels in the region that are in the broad classes defined in LABEL2, and take the class with the most pixels. The classify function here takes as its argument a vector of land use class codes and returns the label of the most common value of LABEL2 for those codes. We'll then run that on the list of pixel values to get a vector classification that we'll add to the spatial polygons.

classify <- function(x) {
    ### x is vector of raster values
    ## remove missing values
    inR = x[x != 0]
    tableValues = sort(table(legend[as.character(inR), "LABEL2"]), decreasing = TRUE)

# example
classify(c(1, 1, 1, 22, 22, 43, 22, 22))
## [1] "Heterogeneous agricultural areas"
neighEU$Class = factor(unlist(lapply(ext, classify)))

Now we'll plot the neighbourhoods with the new classes. To get colours for each of the classes we'll average the colours for the levels in the class:

meanColour <- function(cs) {
    # convert colours to red-green-blue, and average components
    rgbc = apply(col2rgb(as.character(cs)), 1, mean)
    # convert r-g-b back to colours
    rgb(rgbc[1], rgbc[2], rgbc[3], max = 256)

# use tapply to compute mean colours within groups defined by LABEL2
colourTable = tapply(as.character(legend$Colour), as.character(legend$LABEL2), meanColour)
colours3 = colourTable[levels(neighEU$Class)]

sp.theme(set = TRUE, regions = list(col = colours3))
spplot(neighEU, "Class", main = "Classification system")

This map shows us a possible classification of urban and two types of rural neighbourhoods. Notice that the regions on the west coast, although they have some urban content, have more rural land area. The one industrial area in the south contains the shipyards at Barrow.

Crimes and Roads

In this section we'll look at crime locations and roads. We have a shapefile of roads with classification in lat-long coordinates which we'll convert to Ordnance Survey grid reference so we can plot the streetCrime points on it. We'll also define some colours and widths so the plot looks a bit like the OpenStreetMap maps.

mainroads = spTransform(readOGR(Datadir, "mainroads"), CRS("+init=epsg:27700"))
## OGR data source with driver: ESRI Shapefile 
## Source: "../Data", layer: "mainroads"
## with 3486 features and 3 fields
## Feature type: wkbLineString with 2 dimensions
##       motorway  motorway_link        primary   primary_link           road      secondary secondary_link       tertiary 
##            198             75            610              9            193            629              3            945 
##  tertiary_link          trunk     trunk_link 
##             11            757             56 
roadColours = c("#809CC0", "#809CC0", "#EC989B", "#EC989B", "gray", "#FDD5A4", "#FDD5A4",
    "gray", "gray", "#A9DAA9", "#A9DAA9")
roadWidths = c(4, 4, 3, 3, 1, 2, 2, 1, 1, 4)
plot(mainroads, col = roadColours[mainroads$type], lwd = roadWidths[mainroads$type])

Next we can compute the distance from each point to the nearest point on the major roads of the road network. First we extract the motorways, trunk roads, and primary roads, then work with that.

isMajor = mainroads$type %in% c("motorway", "motorway_link", "trunk", "trunk_link", "primary",
majorRoads = mainroads[isMajor, ]
streetCrime$dRoad = unlist(lapply(1:nrow(streetCrime), function(i) {
    gDistance(streetCrime[i, ], majorRoads)

We can use ggplot to produce some graphics showing how the road distance varies for different crimes. A box plot shows the outliers well.

x45 = opts(axis.text.x = theme_text(angle = -45, hjust = 0, vjust = 1))
ggplot(streetCrime@data, aes(x = Crime.type, y = dRoad)) + x45 + geom_boxplot()

What can we get from this? Perhaps only that shoplifting happens close to major roads, but thats because shops tend to be close to major roads.

Space-Time Regional Data

We also have a data file of the crime counts in neighbourhoods for a number of months (We would have more months of data but the neighbourhood boundaries changed, so we can only go back so far). The file has one for each month for each neighbourhood, and lists the number of crimes in each category. This form is ideal for feeding into ggplot to get a quick time-series plot.

allCrime = read.csv(Datafile("allCrime.csv"), = TRUE)
head(allCrime[, 1:5])
##     Month                Force Neighbourhood All.crime.and.ASB Burglary
## 1 2011-09 Cumbria Constabulary        GARS07                41        3
## 2 2011-09 Cumbria Constabulary        GARS06               141        3
## 3 2011-09 Cumbria Constabulary        GARS05                56        0
## 4 2011-09 Cumbria Constabulary        GARS04                82        2
## 5 2011-09 Cumbria Constabulary        GARS03               119        4
## 6 2011-09 Cumbria Constabulary        GARS02                90        2
# plot time series for each nbrhood - leave the legend out
ggplot(allCrime, aes(x = Month, y = Criminal.damage.and.Arson, group = Neighbourhood, colour = Neighbourhood)) +
    geom_line() + opts(legend.position = "none")

To plot maps of criminal damage and arson over time, we'll reformat the data into a matrix of neighbourhood by month and attach it to the neighbourhood map data object. We have to do some work with the months to get them into something that can be used in a formula for the spplot function. We'll just plot the first four months.

criminalDamage = dcast(allCrime, Neighbourhood ~ Month, value.var = "Criminal.damage.and.Arson")

months = names(criminalDamage)[-1]
names(criminalDamage)[-1] = format(as.Date(sprintf("%s-01", months)), "%b.%Y")

criminalDamage = criminalDamage[match(neighbourhoods$ID, criminalDamage$Neighbourhood), ]
neighbourhoods@data = cbind(neighbourhoods@data, criminalDamage)
names(neighbourhoods) <- make.names(names(neighbourhoods))
sp.theme(set = TRUE, regions = list(col = colours2, lwd = 2))
spplot(neighbourhoods, c("Sep.2011", "Oct.2011", "Nov.2011", "Dec.2011"), main = "Criminal Damage")

Static plots like that are fine for publications in dead-tree format, but for online work we can use technology to make animations.


The animation package provides functions for creating frame-based animations. All you need to do is write the code that produces a plot as each frame of the animation.

months = names(criminalDamage)[-1]
ani.options(verbose = FALSE, outdir = getwd())
    ani.options(interval = 0.5, verbose = FALSE)
    for (month in months) {
        sp.theme(set = TRUE, regions = list(col = colours2, lwd = 2))
        print(spplot(neighbourhoods, month, at = seq(0, 55, len = 20), main = month))
}, = "crime_ani", single.opts = "'utf8': false", autoplay = FALSE, interval = 0.5, imgdir = "crime_dir",
    htmlfile = "crimeAni.html", ani.height = 400, ani.width = 600, title = "Crime animation")
## HTML file created at: /home/rowlings/Work/R/UseR/UseR2012/Tutorial/WorkshopsHTML/crimeAni.html

Lessons Learnt

  • Creating spatial point data sets from scratch
  • Making maps with base graphics, ggplot, and spplot
  • Kernel Smoothing with package:sparr
  • Reading in vector data with readOGR
  • Reading in raster data with raster
  • Analysing raster data with extract
  • Transforming coordinates with spTransform
  • Making animations with the animation package