|Sorting a database based on distance|
| 7:54 pm on May 4, 2010 (gmt 0)|
I have a database named coordinates, with several different locations, and their corresponding longitude and latitude (both as a float(10,6) value).
What I would like to do is to take a user's known location ($longitude, $latitude), and then search this database to find the closet matches.
I can do this now by calculating the distance:
$query=mysql_query("SELECT *, (SQRT((69.1*(latitude-$latitude))*(69.1*(latitude-$latitude))+(69.1*(longitude-$longitude)*cos($latitude/57.3))*(69.1*(longitude-$longitude)*cos($latitude/57.3)))*5280) AS distance FROM coordinates ORDER BY distance ASC");
That would output the database, with a new column called distance (in feet). The top of the list would be the closest, and the bottom of the list would be the furthest away.
Right now my database is small. My question is would this be a horrific database query if I had a larger database? And if it is, how can I tame this query? Or, is this something that the computer can easily churn through, and I don't need to worry about it.
|brotherhood of LAN|
| 8:37 pm on May 4, 2010 (gmt 0)|
A larger database would be fine. The geonames database (list of world locations) contains over 7 million records and can return queries very quickly. You can download a copy and try for yourself. Just make sure you have an index on your latitude and longitude columns.
To optimize the query a little, is there a distance limit for the rows you want returned? If so, do something like 'WHERE latitude BETWEEN ($latitude - 1) AND ($latitude + 1)' for both X and Y coordinates. This will eliminate the need to sort the whole table before outputting results.
| 8:42 pm on May 4, 2010 (gmt 0)|
What I do is first pare down the entire table to smaller group of "nearby" locations with something like the following:
WHERE Latitude BETWEEN $latitude-alpha AND $latitude+alpha AND Longitude BETWEEN $longitude-alpha AND $longitude+alpha
Then compute distances on just that subset. (alpha will depend on the "radius" you want yo use for your subset.
| 2:40 am on May 6, 2010 (gmt 0)|
Thank you for your help and reassurance. I set the latitude and longitude columns as indexes, and I started to create latitude and longitude limits for the database. (I also updated my distance formula to one more complex but more precise.)
$query=mysql_query("SELECT *, (big_long_math_formula) AS distance FROM coordinates WHERE latitude BETWEEN ".$searchBox["south"]." AND ".$searchBox["north"]." AND longitude BETWEEN ".$searchBox["west"]." AND ".$searchBox["east"]." ORDER BY distance ASC");
By my use of the WHERE statement, is this what is meant by computing the distances on just the subset?
And now that I limit my results to within this searchBox, I realize that half the box will be unaccounted for if the longitude happens on or near -180 or 180. I'll have to rewrite the statement to account for this. If I detect my search box overlapping this line, would I be able to use something like:
WHERE latitude BETWEEN a AND b AND longitude (BETWEEN x AND 180 OR -180 AND y)
(I'm thinking this would be a non-issue for latitudes near 90 or -90, since they don't wrap around, right?)
| 12:49 pm on May 6, 2010 (gmt 0)|
You would need to use:
WHERE latitude BETWEEN a AND b AND (longitude BETWEEN x AND 180 OR longitude BETWEEN -180 AND y)