About this post

ABOUT: This entry was posted March 17, 2008 at 5:38 p.m. It is 1099 words long, which, in case you're curious, translates to about 31 inches. There are currently 0 comments on this post. Click here to add your own.

SUMMARY: In which I describe how to use MySQL's spatial functions and Python to do point-in-polygon detection.

TAGS: Python | Databases | GIS


Spread the love


Recent posts

Sunday, September 28th, 2008
In which I explain the advantages of using a popular fraud-detection tool in reporting.

Monday, March 17th, 2008
In which I describe how to use MySQL's spatial functions and Python to do point-in-polygon detection.

Sunday, July 1st, 2007
Some tidbits I've collected over the last month and a half.

Saturday, May 5th, 2007
Backing up your databases is easy with S3, boto and mysqldump.

Monday, April 30th, 2007
In which I plead with you to learn from our deployment mistakes.

Point-in-poly using MySQL and Python

Posted Monday, March 17th, 2008 at 5:38 p.m.

A couple weeks before the Texas primaries, the Chronicle quietly launched a cool little app that allows readers in our eight-county area to see which political races they can vote in, simply by typing in their address. Some local governments around here offer similar features, but none as lightweight and seamless as what we put together.

I bring this up because the way we implemented this project answers one question the CAR community has chattered about a lot lately: namely, how to run effective point-within-polygon queries using MySQL’s spatial extensions, which don’t yet have the same muscle as competitor PostGIS.

First a little backstory: Our Web guys came to me a last month asking if we could request the data to replicate this system for a voter’s guide project they were working on. I said that system was awful and that we could do better. So I dummied up a proof-of-concept model using Google’s client-side geocoder, MySQL’s spatial functions, some state redistricting data, and a little bit of raw Python for polygon detection.

I shipped the data and code to the online guys, who (I think) used it to help develop the live app, which is backed by a civic database that our neighborhood news and online folks put together.

A couple days later, some chatter started on NICAR-L about how to solve a similar problem – testing whether a point falls within a polygon – using MySQL. This is how I went about solving it.

Spatial MySQL

Testing point/polygon membership in MySQL requires two steps. The first is to use a little spatial SQL to narrow down the list of polygons a point might fall within. There’s no sense testing a point against every polygon in a dataset – that would be horribly slow. Fortunately, using bounding box queries, MySQL’s spatial functions can quickly produce a list of potential matches to work with later.

Spatial SQL to the rescue. As though standard SQL wasn’t bad enough, MySQL has introduced a set of spatial functions for querying and manipulating geospatial data. You can read all about them if you want, but for now we’re only concerned with a function known as MBRContains. MySQL’s spatial functions are somewhat limited in that they don’t actually test for polygon membership. Instead, they test whether your point falls within a minimum bounding rectangle, or MBR, derived from your polygon’s extremes. Something like this:

Notice how the MBR in this picture contains the polygon itself, plus some extra space around it. Because they don’t precisely follow polygon boundaries, it’s possible for more than one MBR to overlap. And because of that overlap, the point we’re testing may not fall within just one MBR. It could fit into the two, three or more, depending on how your map fits together.

That’s fine. If we know our point falls somewhere near Texas, for example, we might want to test it against the boundaries of Oklahoma and Arkansas, but not Alaska and Hawaii. For now, all we need is that “short list”, and that’s what MBRContains will give us:

If all goes well, the cursor at the end of this code block should contain a few candidate polygons that might possibly contain the input point. Now the job has gotten simpler: We just need to see which of those few polygons is the winner.

Polygon testing using algorithms

The new problem is a computer science classic: testing whether a point falls within a polygon. The solution is pretty simple. Picture a sheet of paper with a square drawn at the center. Now pick any point inside that square and draw a straight line from that point, through the right-hand side of the square, and all the way to the edge of the page. Notice how many times it crosses the boundary of the square (one). That number will always be odd.

Now pick a point anywhere outside the square and do the same. Notice how many times this line crosses the boundary of the square (two if the point is immediately to the left of the square; zero if it’s above, below, or to the right). That number is always either even or zero.

This picture sums it up a bit better:

Knowing what we know about those intersections, all we need to do is write a function that takes a point and a polygon as input, draws that line, counts the number of intersections. An odd number means the point is inside the square; even or zero means it’s outside. The function can return True or False. Simple as that. Here how I implemented that function in Python:

The function takes three arguments: a latitude and longitude point, and a polygon in a format known as well-known text, or WKT. If you have a polygon field in your MySQL table, retrieving a WKT representation of that polygon is as simple as using the AsText function.

If you loop through each polygon in your initial MBR-based resultset and run it through this function, you can weed out the points that don't fall within the polygon. The one that returns true is your winner.

Wrapping up

So, to recap:

Step 1.) Narrow down your universe of potential polygons using spatial SQL.

Step 2.) Run comparisons on the subset using the point-in-poly algorithm.

Not too bad when you break it down.

Now a question for any ubergeeks out there: What is the most efficient way to select from a dataset all points that fall within the boundaries of a polygon? Testing every point individually seems cumbersome and inefficient, so I figure there has to be a better way. Suggestions anyone?

Post your comment

Optional