In this talk I'll show how the spatial ecosystem of R has grown, and how it relates to all sorts of other mapping and GIS software.
CRAN is the official R archive. There's lots on there.
The Spatial Task View is essential reading. It describes a lot of the spatial functionality and lists many of the spatial packages.
Here's a list of the packages mentioned in the task view. Can we make sense of this?
Let's start with the fundamental sp package.
Here are the packages that directly depend on sp.
Here are the packages that directly depend on the packages that directly depend on sp. You might notice the layout algorithm popping out a few such as splancs, geoR and spdep. These are packages that have been around a while and have built up a user community.
One more, final dependency step. On the right we can see a cluster of earthquake-related packages. Most of these have been written by one person! He has used a sensible approach of putting basic functionality in one package (that depends on something that depends on something that depends on sp) and then written other packages that depend on his basic functionality.
This is the limit of dependency – there are no packages that depend on the purple packages (that aren't already there).
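If you want to reproduce that walk outwards from sp, here is a minimal sketch using tools::package_dependencies from base R. The helper names (rev_deps, dependency_rings) and the toy package names are my own inventions for illustration, and this is not necessarily how the graphs on the slides were built.

```r
# Packages whose Depends/Imports/LinkingTo mention any of `pkgs`
rev_deps <- function(pkgs, db) {
  deps <- tools::package_dependencies(
    pkgs, db = db,
    which = c("Depends", "Imports", "LinkingTo"),
    reverse = TRUE
  )
  sort(unique(as.character(unlist(deps, use.names = FALSE))))
}

# Walk outwards from a root package, one dependency step per ring,
# stopping when a step finds no packages that are not already seen
dependency_rings <- function(root, db) {
  seen <- root
  frontier <- root
  rings <- list()
  while (length(frontier) > 0) {
    nxt <- setdiff(rev_deps(frontier, db), seen)
    if (length(nxt) == 0) break
    rings[[length(rings) + 1]] <- nxt
    seen <- c(seen, nxt)
    frontier <- nxt
  }
  rings
}

# Toy package database with made-up names, so the sketch runs offline
toy_db <- cbind(
  Package   = c("sp", "pkgA", "pkgB", "pkgC", "pkgD"),
  Depends   = c(NA, "sp", "pkgA", "pkgB", "sp"),
  Imports   = NA,
  LinkingTo = NA
)
rings <- dependency_rings("sp", toy_db)

# Against the real archive (needs a network connection):
# rings <- dependency_rings("sp", available.packages())
```

Each element of rings is one step further away from sp, which corresponds to one colour of package on the slides.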
Looking at these packages, there's a possible classification. Some packages deal with basic geography and geometry, some are general statistics, some have application areas in mind, and some are purely data.
In total there are over 3000 packages on CRAN.
How did scientists build their programs in the old days?
Without the internet, we worked a lot on our own.
We sat at terminals like this, and typed.
If we wanted graphics, the line printer would do this.
Maps were possible with some clever print statements.
High resolution graphics (1024x780 pixels) were available via a graphics terminal like this. There might be one or two in the university computing rooms.
About as useful as an etch-a-sketch.
But computers get better...
But I can take my old C and Fortran code and link them into R so they can be called from R functions. I don't have to rewrite everything into R. Here's a simple C function that returns 3.14, compiled, loaded into R, and called. The R function returns the value of the C function.
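Here is a minimal sketch of that round trip, assuming a working C compiler and the .C interface (the slide may have used a different interface). The names pi314 and c_pi are made up for this example. Functions called through .C return void and hand their result back through a pointer argument.

```r
# Write a tiny C source file to a temporary directory
src <- file.path(tempdir(), "pi314.c")
writeLines(c(
  "void pi314(double *out)",
  "{",
  "    *out = 3.14;",
  "}"
), src)

# Compile it into a shared library with R CMD SHLIB
lib <- file.path(tempdir(), paste0("pi314", .Platform$dynlib.ext))
status <- system2(
  file.path(R.home("bin"), "R"),
  c("CMD", "SHLIB", "-o", shQuote(lib), shQuote(src))
)
stopifnot(status == 0)

# Load the library and wrap the C routine in an ordinary R function
dyn.load(lib)
c_pi <- function() {
  .C("pi314", out = numeric(1))$out
}

c_pi()
```

The wrapper allocates a length-one numeric vector, lets the C code fill it in, and returns it, so callers never see the C side.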
It is possible to take entire libraries of C code and wrap them in R functions. Many of these C libraries were originally only designed to be called from other C code, but R's loading mechanism lets us load them into R (basically because R itself is mostly written in C).
Many of the functions in the first part of this presentation used wrapped C libraries.
So where do these projects come from? OSGeo is an organisation which supports the development of Open Source geospatial software. They provide servers for developers to work together on projects as well as mailing lists, bug tracking etc.
I'll now go through a few of the projects and talk about how I use them and how they work with R.
Qgis is a desktop GIS. You can load, manipulate, map, and save geospatial data.
Sometimes it's better than R for interactive work, so I use it.
Because R and Qgis both use the GDAL library for reading and writing spatial data, you can save vector and raster data in a common format for data exchange.
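As a rough sketch of that exchange, the following writes a point layer as a shapefile and a grid as a GeoTIFF, both of which Qgis can open. It assumes the sp, rgdal and raster packages are installed; rgdal was the GDAL interface at the time of this talk and has since been retired from CRAN, so it needs an installation that still has it. The place names, coordinates and file names are illustrative only.

```r
library(sp)
library(rgdal)
library(raster)

# Two example points in lat-long (illustrative values)
pts <- SpatialPointsDataFrame(
  coords = cbind(lon = c(-2.80, -1.15), lat = c(54.05, 52.95)),
  data = data.frame(name = c("Lancaster", "Nottingham")),
  proj4string = CRS("+proj=longlat +datum=WGS84")
)

out_dir <- file.path(tempdir(), "exchange")
dir.create(out_dir, showWarnings = FALSE)

# Vector data: for shapefiles the dsn is a directory, the layer is the file stem
writeOGR(pts, dsn = out_dir, layer = "towns",
         driver = "ESRI Shapefile", overwrite_layer = TRUE)

# Raster data: a small grid written as a GeoTIFF
r <- raster(nrows = 10, ncols = 10, xmn = -3, xmx = -1, ymn = 52, ymx = 55,
            crs = "+proj=longlat +datum=WGS84")
values(r) <- seq_len(ncell(r))
writeRaster(r, file.path(out_dir, "grid.tif"), format = "GTiff", overwrite = TRUE)
```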
Carson Farmer has written a plugin for Qgis that gives you an R IDE with editing and embedded graphs in Qgis. It also supplies buttons for quick transfer of spatial data from R to Qgis and back. This is an extreme example of integrating R with Qgis.
Qgis plugins can be written to hide complex R functionality from the user, but the process can be a bit complex since it involves going from Qgis to Python code to R and back again.
Using that mechanism I wrote this plugin, 'Arlat', which fits a complex model to some disease data and returns a grid of predictions. The plugin adds an extra menu to Qgis for the user to set various parameters. The complexity of fitting a bivariate binomial spatial model with covariates is not something the users want to know about.
PostGIS is a spatial database. It adds spatial capabilities to the PostgreSQL database system.
You can use it to create a distributed spatial data system, with multiple users from multiple computers.
It's great for handling large data sets and for doing spatial queries (overlays, routing, buffering) that might make R fall over.
Here's a quick intro to what it can do. In standard SQL you select things from tables with queries. In PostGIS you have a spatial SQL, which lets you select things like buffers or overlays from your data.
You can do this from R using the standard database tools. This is a non-spatial query from a spatial database. One of the returned columns is the geometry of each feature, which can be converted into an sp object.
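A minimal sketch of both kinds of query from R, assuming the DBI package with a PostgreSQL driver such as RPostgreSQL. The table and column names (towns, gid, name, geom) and the helper names are placeholders I have made up. ST_AsText and ST_Buffer are standard PostGIS functions; the geometry comes back as WKT text, which a WKT reader such as rgeos::readWKT could turn into an sp object.

```r
# Plain attribute query that also returns each feature's geometry as WKT
attribute_query <- function(table, geom = "geom") {
  sprintf("SELECT gid, name, ST_AsText(%s) AS wkt FROM %s", geom, table)
}

# Spatial query: buffer every feature by `dist` (in the units of the table's CRS)
buffer_query <- function(table, dist, geom = "geom") {
  sprintf("SELECT gid, ST_AsText(ST_Buffer(%s, %s)) AS wkt FROM %s",
          geom, format(dist), table)
}

# Run a query on an open DBI connection and return a data frame
fetch <- function(con, sql) {
  DBI::dbGetQuery(con, sql)
}

attribute_query("towns")
buffer_query("towns", 1000)

# Usage (needs a running PostGIS server; connection details are assumptions):
# con <- DBI::dbConnect(RPostgreSQL::PostgreSQL(), dbname = "gis")
# towns <- fetch(con, attribute_query("towns"))
# first_town <- rgeos::readWKT(towns$wkt[1])   # WKT string to sp object
# DBI::dbDisconnect(con)
```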
Alternatively you can read an entire spatial table into an sp object using readOGR from the rgdal package.
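As a sketch, reading a whole table looks something like this. It assumes rgdal is available and that the underlying GDAL build includes the PostgreSQL driver. The database name, host and layer name are placeholders, and pg_dsn and read_pg_layer are helper names I made up.

```r
# Build a GDAL connection string for a PostGIS database
pg_dsn <- function(dbname, host = "localhost", user = NULL) {
  parts <- c(
    paste0("dbname=", dbname),
    paste0("host=", host),
    if (!is.null(user)) paste0("user=", user)
  )
  paste0("PG:", paste(parts, collapse = " "))
}

# Read one spatial table into an sp object
read_pg_layer <- function(dsn, layer) {
  rgdal::readOGR(dsn = dsn, layer = layer)
}

dsn <- pg_dsn("gis")

# Usage (needs a running PostGIS server):
# towns <- read_pg_layer(dsn, "towns")
# plot(towns)
```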
Again, the use of GDAL-compatible data access methods means that R and Qgis can both interact with a PostGIS spatial database.
There's another way to integrate R with PostGIS, and that is to add R functions to the database server. This hides R functionality from the end user.
As a quick non-spatial example, this is how to define a new function to compute the median of a vector in PostgreSQL. It calls an R function to do the work.
Your SQL experts don't need to know R – they just call median in the same way as they would call mean.
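For reference, the definition can look roughly like the following, adapted from the aggregate example in the PL/R documentation and assuming the PL/R extension is installed in the database. I have wrapped the SQL in R strings so it can be sent over a DBI connection; install_median and the people table are made-up names, and you could equally paste the SQL into psql.

```r
# SQL that defines an R-backed median aggregate via PL/R.
# The body of r_median is R code; arg1 is PL/R's name for the first argument.
create_median_sql <- c(
  "CREATE OR REPLACE FUNCTION r_median(_float8) RETURNS float AS '
     median(arg1)
   ' LANGUAGE 'plr';",
  "CREATE AGGREGATE median (
     sfunc = plr_array_accum,
     basetype = float8,
     stype = _float8,
     finalfunc = r_median
   );"
)

# Send both statements to the server over an open DBI connection
install_median <- function(con) {
  for (stmt in create_median_sql) {
    DBI::dbExecute(con, stmt)
  }
  invisible(TRUE)
}

# Usage (needs a PostgreSQL server with PL/R; names are placeholders):
# install_median(con)
# DBI::dbGetQuery(con, "SELECT median(age) FROM people")
```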
My experimental webmaps package (on R-Forge) can help in the creation of OpenLayers maps from sp-class objects. We convert everything to lat-long coordinates and then create an HTML page with the osmMap function.
Opening that HTML file gives us a fully working interactive web map. Clicking the features pops up the information in the attributes.
I'm hoping to create a dedicated package for OpenStreetMap pages soon.
The webmaps package can also be used to grab map image tiles from web map tile servers, such as OpenStreetMap, and turn them into images. It also gets the coordinates so you can overlay other spatial data.
There are other packages for getting map tiles from web map services. The OpenStreetMap package uses rJava to get them, so it has the fairly heavy requirement of a Java Virtual Machine. Once it works it works well, and it can fetch tiles from a wide range of map servers.
The ggmap package uses ggplot2 to draw its maps, and can also get maps from a wide range of providers. I have written a short function to convert these into raster objects for use outside ggplot functions.
These ggmap objects can be used as underlays to ggplot graphics, such as this plot of crime coloured by category.
You can also do choropleth maps with geom_polygon or geom_map. The problem I had with geom_map was due to not having all my packages upgraded. Always update everything!
Sadly ggplot ignores everything done with sp and spatial data and needs the data in its own format.
OpenStreetMap tiles are created from the original digitised vector data from OpenStreetMap users. Here is a map of some of that data. OSM provide tools for editing this vector data through the web, and it's the main way that OSM data is added.
There is an R package called osmar that reads this raw vector data.
Osmar can read this data and convert it into sp objects for plotting. It can also convert it into igraph objects, which describe the connections between junctions so you can do things like run minimum-distance routing algorithms on them.
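To give a flavour of the routing step, here is a small sketch using igraph on a hand-made road network. The junction names and distances are invented. With real data the graph would come from osmar's conversion (as_igraph in the osmar documentation) rather than from a data frame.

```r
library(igraph)

# Toy road network: each row is a road segment between two junctions,
# with its length in metres as the edge weight (made-up values)
edges <- data.frame(
  from   = c("A", "A", "B", "C"),
  to     = c("B", "C", "D", "D"),
  weight = c(100, 250, 400, 120)
)
g <- graph_from_data_frame(edges, directed = FALSE)

# Shortest route from junction A to junction D by total length
route <- shortest_paths(g, from = "A", to = "D", weights = E(g)$weight)
route_nodes <- as_ids(route$vpath[[1]])

# Total length of that route
route_length <- distances(g, v = "A", to = "D", weights = E(g)$weight)[1, 1]
```

Here the route via C (250 + 120 = 370) beats the route via B (100 + 400 = 500), so route_nodes is A, C, D.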
About this point in every Open-source GIS presentation, someone mentions Google Earth.
It's free as in beer but not free as in speech.
We can work with it, because the file format it uses is an open standard, controlled by the OGC.
So I can use writeOGR from the rgdal package to create a KML file of points, for example.
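A minimal sketch of that, assuming sp and rgdal are installed and that GDAL was built with its KML driver. The points and the file name are illustrative only. KML expects lat-long coordinates, so the points are given a WGS84 CRS before writing.

```r
library(sp)
library(rgdal)

# Example points with a name attribute (illustrative values)
pts <- data.frame(
  name = c("Lancaster", "Nottingham"),
  lon  = c(-2.80, -1.15),
  lat  = c(54.05, 52.95)
)
coordinates(pts) <- ~ lon + lat
proj4string(pts) <- CRS("+proj=longlat +datum=WGS84")

# Write the points as a KML file that Google Earth can open
kml_file <- file.path(tempdir(), "points.kml")
if (file.exists(kml_file)) file.remove(kml_file)
writeOGR(pts, dsn = kml_file, layer = "points", driver = "KML")
```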
I can also use the KML function from raster to create image overlays.
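For example, something along these lines, assuming the raster package. The grid here is random data over a made-up lat-long extent. The raster must be in lat-long coordinates for KML to accept it.

```r
library(raster)

# A small lat-long grid filled with random values (illustrative only)
r <- raster(nrows = 10, ncols = 10, xmn = -3, xmx = -2, ymn = 53, ymx = 54,
            crs = "+proj=longlat +datum=WGS84")
values(r) <- runif(ncell(r))

# Write an image overlay for Google Earth; depending on whether a zip
# program is found, the result is a .kml with a PNG, or a zipped .kmz
overlay_stem <- file.path(tempdir(), "overlay")
KML(r, filename = paste0(overlay_stem, ".kml"), overwrite = TRUE)
```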
If I have to do anything more complex, I can create KML to order. I often use the brew package to create a template that is filled in by the data. Here is an extract showing how a loop in a brew template produces a Placemark for each point in a dataframe.
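A cut-down sketch of such a template, assuming the brew package is installed. The data frame and its column names are made up, and a real template would usually live in its own file rather than in a character vector. Code between <% and %> is run, and <%= ... %> inserts a value into the output.

```r
library(brew)

# Example data: one Placemark will be produced per row (illustrative values)
pts <- data.frame(
  name = c("Lancaster", "Nottingham"),
  lon  = c(-2.80, -1.15),
  lat  = c(54.05, 52.95)
)

# The template: an R loop wrapped around a block of KML
template <- c(
  '<?xml version="1.0" encoding="UTF-8"?>',
  '<kml xmlns="http://www.opengis.net/kml/2.2">',
  '<Document>',
  '<% for (i in seq_len(nrow(pts))) { %>',
  '  <Placemark>',
  '    <name><%= pts$name[i] %></name>',
  '    <Point><coordinates><%= pts$lon[i] %>,<%= pts$lat[i] %></coordinates></Point>',
  '  </Placemark>',
  '<% } %>',
  '</Document>',
  '</kml>'
)

# Fill in the template and write the result to a KML file
kml_out <- file.path(tempdir(), "brewed.kml")
brew(text = paste(template, collapse = "\n"), output = kml_out)
kml_lines <- readLines(kml_out)
```

KML coordinates are written longitude first, then latitude, which is why the template puts lon before lat.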
All these things are part of a whole spatial data infrastructure, from web and desktop down to database servers and filestores.
R can be found in a few places in this ecosystem.
But there's a lot of competition from other languages! You might want to look at Python, C, C++ and Java if you want to understand it all.
You also might want to join the OSGeo-UK group, and come along to the FOSS4G 2013 conference in Nottingham, UK!
So now you've got a good knowledge of the breadth of open-source tools for spatial data. Go out there and make stuff!