Snapping points to lines in R

Snapping points to lines is a common task when dealing with points that have to be located over lines for some specific reason.

One of the most common applications is to locate trajectory data on a road network. As such, it is a georeferencing problem that aims to get enhanced locations of trajectories getting rid of their deviations by using road networks as reference.

This problem, also called point-to-curve matching, is one of the simplest approaches for map-matching algorithms, which can also include topology, probability and fuzzy logic, among others. A complete review of map-matching algorithms is given by (Quddus et al., 2007).

In terms of free and open source software for Geomatics (FOSS4G), libraries such as Java Topology Suite (JTS) and its port to C called GEOS provide classes and functions that can be used to snap points to lines. Some examples on how to use them are provided by P. Ramsey (using SQL and PostGIS functions), Marcello Benigno (using GRASS) and GeoTools User Guide.

However, the R package rgeos, which is the R interface to GEOS, still doesn’t provide those functions and one has to build his own ones. In the rest of this post I will show you my solution, which snaps the trajectory points to the nearest points lying on the road network.

First, import some road data into R (you can download the shapefile from this link, source: OpenStreetMap), don’t forget to adjust the path to the data:

1
2
3

4
5
require("rgdal")

crs = CRS("+proj=tmerc +lat_0=0 +lon_0=9 +k=1 +x_0=3500000 +y_0=0 +ellps=bessel +datum=potsdam +units=m +no_defs")
l = readOGR(dsn="/docs/geodata/munster/osm/", layer="subset_street_graph")
proj4string(l) = crs

Import trajectory data from http://ifgi.uni-muenster.de/~epebe_01/pm10/All4_1a.txt and ensure that both (road network and trajectory) have the same projected reference system (for distance calculation):

 6
 7

 8
 9
10
11

12
x = read.csv("http://ifgi.uni-muenster.de/~epebe_01/pm10/All4_1a.txt")
x$t = as.POSIXct(strptime(do.call(paste, x[c("Date", "Time")]), "%Y/%m/%d %H:%M:%S"))
m = matrix(c(x[,2],x[,1]),ncol=2)
df = data.frame(x$t, x[,5],x[,6])

p = SpatialPointsDataFrame(coords=m,data=df,proj4string=CRS("+proj=longlat +datum=WGS84"))
p = spTransform(p,crs)

Define the functions nearestPointOnSegment, nearestPointOnLine, this way:

13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
nearestPointOnSegment = function(s, p){
    # Adapted from http://pastebin.com/n9rUuGRh
    ap = c(p[1] - s[1,1], p[2] - s[1,2])
    ab = c(s[2,1] - s[1,1], s[2,2] - s[1,2])
    t = sum(ap*ab) / sum(ab*ab)
    t = ifelse(t<0,0,ifelse(t>1,1,t))
    x = s[1,1] + ab[1] * t 
    y = s[1,2] + ab[2] * t
    c(x, y, (x-p[1])^2 + (y-p[2])^2)  # Return nearest point and distance
}

nearestPointOnLine = function(coordsLine, coordsPoints){
    nearest_points = vapply(2:nrow(coordsLine), 
        function(x) 
            nearestPointOnSegment(coordsLine[(x-1):x,], coordsPoints),
                FUN.VALUE=c(0,0,0))

    # Return coordinates of the nearest point in this line  
    nearest_points[1:2, which.min(nearest_points[3,])]  
}

Finally, define the main function called snapPointsToLines:

33
34
35
36
37
38
39
40
41
42
43
44
45
46

47
48
49
50
51
52
53
54
55
56
57
58

59
60
61
62
63
64
65
66
67
snapPointsToLines <- function( points, lines, maxDist=NA, withAttrs=TRUE) {

    require("rgeos")

    if (!is.na(maxDist)) {
        w = gWithinDistance(points, lines, dist=maxDist, byid=TRUE)
        validPoints = apply(w,2,any)
        validLines = apply(w,1,any)
        points = points[validPoints,]
        lines =  lines[validLines,]
    }

    d = gDistance(points, lines, byid=TRUE) 
    nearest_line_index = apply(d, 2, which.min) # Position of each nearest line in lines object 

    coordsLines = coordinates(lines)  
    coordsPoints = coordinates(points)  

    # Get coordinates of nearest points lying on nearest lines
    mNewCoords = vapply(1:length(points), 
        function(x) 
            nearestPointOnLine(coordsLines[[nearest_line_index[x]]][[1]], 
                coordsPoints[x,]), FUN.VALUE=c(0,0))

    # Recover lines' Ids (Ids and index differ if maxDist is given)
    if (!is.na(maxDist)) nearest_line_id = as.numeric(rownames(d)[nearest_line_index])+1 
    else nearest_line_id = nearest_line_index 

    # Create data frame and sp points
    if (withAttrs) df = cbind(points@data, nearest_line_id) 
    else df = data.frame(nearest_line_id, row.names=names(nearest_line_index))

    SpatialPointsDataFrame(coords=t(mNewCoords), data=df, 
        proj4string=CRS(proj4string(points)))
}

Now you can use the snapPointsToLines function with the whole datasets:

68
snappedPoints = snapPointsToLines(p, l)

Or you can establish a maximum distance (in this case in meters) to avoid snapping points that are farther apart:

69
snappedPoints = snapPointsToLines(p, l, 100)

You can also get rid of the original attributes and just get the coordinates of the snapped points and the id of their nearest line:

70
snappedPoints = snapPointsToLines(p, l, 100, FALSE) 

This is what I get executing the line 69 and calling the function plot (Red crosses are original trajectory points and green ones are snapped points):

71
72
73
plot(p,col="red")
plot(snappedPoints,add=TRUE,col="green")
plot(l,add=TRUE)

A more detailed view (Red crosses are original trajectory points and blue ones are snapped points):

A basic way to improve the result is to pass a filtered road network, e.g., only main roads. Of course, it depends on the trajectory. This is the result of snapping the points to a filtered road network (using 100 meters as maximum distance):

As I’m still unfamiliar with R data structures, this solution could be improved in terms of performance. With these test datasets this code is almost twice slower than a PostGIS implementation.

REFERENCES

  • M. Quddus, W. Ochieng and R. Noland. Current map-matching algorithms for transport applications : State-of-the art and future research directions. In Elsevier, Kidlington, ROYAUME-UNI, editor, Transportation research. Part C, Emerging technologies, vol. 15, no5, pages 312-328, 2007.

UPDATE (2012.06.30)

I’m pleased to announce the functions snapPointsToLines, nearestPointOnSegment and nearestPointOnLine have been included into the R package called maptools (v. 0.8-16).

The functions can be called directly after importing maptools into the R session, this way:

# install.packages("maptools") # For installing the package
require("maptools")

# Create both lines and points (see lines 1-5 and 6-12 in the post)

# Finally, call the main function
snappedPoints = snapPointsToLines(p, l, 100)