Welcome to WebmasterWorld Guest from 220.127.116.11
If you don't need completely up to date data then the geography pages like [census.gov...] will help.
Getting good, up to date data can be expensive from a purchasing and maintenance point of view, if you can live with some changes then I'd stick with the free data.
As for how often data changes, I know that in the UK there are changes to postcodes almost constantly. New housing developments are the obvious ones but there are re-classifications too. It's a nightmare.
voltrader, thats what I was afraid of.
Thank Rusty, that one seems like they do some decent updates and are cheap. As long as they are inexpensive I can afford to test the data. I just didn't want to pay $100-400 for data I can't test.
Now the hard part. If the info is in constant flux, how is everyone updating their records from a DB. If it was constant, each row of the table would have an id and each biz would be assigned one of those id's. But since this information will change, and the changed information will come in another bulk format, thats not really possible. I was thinking to have each biz have their location info in their own seperate table, with the zip/city/etc data in a table of it's own. The zip code data will basically only be used for distance searches. When I go to update the zip/city/etc data, write a small program to do a compare between the new data and old data and flag any rows that have changed or are new. From there it will compare and flag any biz that contains the same location info so they can be manually reviewed and updated if needed.
Anyone using any thing more efficient they could share?
As for how to cope with change, you are in luck. If you are using zipcode data for locations then a re-classification does not change the physical location of the business significantly, you just need to keep on top of the zips that you use to lookup user location. You should still take care with your DB though, always try to keep it up to date (and backed up).
I'm not experienced enough with zipcodes vs postcodes to give you the best way to handle changes in zipcodes but I'm sure there will be someone here who can.
a "local search" service implies you already have the locations but need a lat/long associated with them in order to generate a geospatial search.
See this for more info:
It is a google maps thread about someone asking how to do a geospatial search.
Geocoder.us is one of the free sites available to do free geocoding.
The more ideal solution would be to have people be able to search from a certain radius of a town since that is how people think about location, but that gets more complicated. It's on the list though.
- we did buy a service which included periodic updates, but the niche we were dealing with turned out to include many businesses in rural areas which tended to be undercovered. If you're starting with major metros, you probably won't have to worry about missing data.
- if I had to do it again, I would cross-ref the latest census data with the paid to try to plug holes.
- the big circle calc can be S - L - O - W if you're using a script to search thorugh the matrix of possibilities. Find ways to optimize it. A compiled plug-in?
Just thinking out loud...
If you don't use perl, you can get just the text file with the data here:
[tjmather.com...] (url can change when postal codes are updated)
An easier option would be to simply have a postal code associated with each location and make it a simple "keyword"-based search (if user select "12111," show them results in "12111" only). However, if you add geospatial search, zipcode boundaries won't matter and it can give you the "true" closest locations. Keeping the geospatial search allows for building in the town-based searches you want to add later...towns will just have lat/longs associated with them and the search would happen the exact same way.
If you want some help building this, ping me with details.
Also on the slow aspect, I tried a few different distance calc functions and most where slow,but I found one here [micahcarrick.com...] that can pump out all zips within 100 miles of Dallas in half a second and all zips within 1000 miles in 2 secs.
I read some places that the speed can be increased even more by having the database itself do the calcs, but I haven't gotten that deep into MySQL to be able to do that yet.
The information I have gotten will also allow me to run the calcs by city name, but that needs to wait until my buisness database is properly populated because there will need to be some decent coding to catch mispellings of city names and the ability to suggest the correct ones. I know I can't spell so I can't expect my visitors to :D
If you take the UK as an example, it's not big enough for the curvature of the earth to make hugely significant differences to the calculations so I use the grid coordinate system adopted by many others. This means you can pre-populate x and y coords for all points of interest and then run a very simple database query.
Here is the query I use on a live site. It is able to sort and show the first 25 results from a 600 branch network by distance from the visitor in 0.02 seconds! (it is also matching only those branches that cater for a particular trade, so it's possible to use the $trade variable as a filter to suit your needs)
$sqlquery = "SELECT * FROM yourtable WHERE $trade='yes' ORDER BY (sqrt((Xcoord - $xcust)*(Xcoord - $xcust) + (Ycoord - $ycust)*(Ycoord - $ycust))) LIMIT 25";
The problem with large countries like the US is that the curvature of the earth means that grid coordinates tend not to work well. You can cheat if you are only looking at local results (maybe 100 miles) by splitting up the data into 400 mile segments which overlap by 200 miles. This allows you to create accurate grid coordinates in each area and use the above query. There would have to be a simple calculation to see which 400 mile square is the best to use for any query but run-time calculations would be reduced considerably.
There would be additional work in creating and maintaining the data but you could automate that. If you find your local calculations are loading your server too much because of great circle issues this seems to be a good solution.