Forum Moderators: open
citylistings
id of person
city they're listed in
state they're listed in
appraisers
id of person
name
appraiserinfo
id of person
email, address, phone, etc.
localareas
city
latitude/longitude
I need a query that does the following:
Here's the query I came up with:
SELECT citylistings.zipid, appraisers.name, appraiserinfo.company, appraiserinfo.address, appraiserinfo.city, appraiserinfo.state, appraiserinfo.zip, appraiserinfo.phone, appraiserinfo.fax, acos(sin(localareas.latitude*0.01745329252)*sin(40.4192*0.01745329252) + cos(localareas.latitude*0.01745329252)*cos(40.4192*0.01745329252)*cos(localareas.longitude*0.01745329252 - -111.565*0.01745329252))*3958.75586 AS distance
FROM citylistings, appraisers, appraiserinfo, localareas
WHERE citylistings.city = 'American Fork' AND citylistings.st = 'UT' AND appraisers.zipid = citylistings.zipid AND appraiserinfo.zipid = citylistings.zipid AND localareas.zipcode = appraiserinfo.zip AND localareas.ordinal = 1
ORDER BY appraiserinfo.linkhighlighted DESC, distance;
The query returns the right results, but it takes about three minutes! Here's another query that the PEAR DB_DataObject library sort of "generated" for me:
SELECT *, citylistings.id as listing_id , citylistings.zipid as listing_zipid , citylistings.st as listing_st , citylistings.city as listing_city , citylistings.timelisted as listing_timelisted
, acos(sin(localareas.latitude*0.01745329252)*sin(40.4192*0.01745329252) + cos(localareas.latitude*0.01745329252)*cos(40.4192*0.01745329252)*cos(localareas.longitude*0.01745329252 - -111.565*0.01745329252))*3958.75586 AS distance
FROM appraisers
INNER JOIN zipappraisers.citylistings ON zipappraisers.citylistings.zipid=appraisers.zipid
INNER JOIN zipappraisers.appraiserinfo ON zipappraisers.appraiserinfo.zipid=appraisers.zipid
INNER JOIN zipappraisers.localareas ON zipappraisers.localareas.zipcode=appraiserinfo.zip
WHERE ( localareas.ordinal = 1 ) AND ( ( ( citylistings.st = 'UT' ) AND ( citylistings.city = 'American Fork' )) ) AND ( ( ( appraiserinfo.licensestate = 'UT' ) ) )
ORDER BY appraiserinfo.linkhighlighted DESC, distance
That one takes four seconds which is a great improvement, but it shouldn't take that long. How can I write my query so it does things in the most efficient order?
Presumably, you are calling this from a programming language. Do the math in the programming language!
Why do you have seperate appraisers and apprasierinfo tables? Can an appraiser have multiple appraiserinfo entries? If not, combine the tables.
You do have indices on the appropriate attributes, right?
Why are you doing that crazy math in your SQL?
You do have indices on the appropriate attributes, right?
How much data is in the tables? Are you disk swapping?
I'm not sure why the first version took so much longer, as it's the same query with old-style joins. Guess the optimizer didn't like it.
We don't know what database system is being used here, but MySQL isn't known for having a great optimizer. I'd still lift that calculation from the SQL.
It just strikes me that it's pretty unconventional to do this much math in a SQL statement. So, it's likely that it's a use case that hasn't gotten much attention from the authors of the database system. It's a good idea to just avoid unconventional use cases - to do so is just asking for trouble. Do it when you have to, but avoid it when you can.
Either first create a temporary table, or read the result into the program without first sorting by distance. Do your distance calculation in the program, either storing it in the temporary result table or sorting in-memory. My assumption is that this is a small result set, so either way will be practical.
How many rows do you have in each table?
I'll ask the obvious question- do you have good indexes on your tables? That by itelf will usually speed things drastically.
Also, remember that char/varchar searches are a lot slower than integer searches. So it would also speed things up if you could add a CityID field to citylistings and localareas and search/join on that instead of city.
Other thoughts:
Rethink your table structure
Can you ask your customers for a zip code to plop into the
citylistingstable? If so, add zip code to the lat/long file and link on that instead. A 5-digit numeric entry is going to be a lot less bytes than a 30-character city name, especially if it is of type VARYING CHARACTER (VARCHAR).
Why two tables for appraiser
information?
Seems these two tables could and should be combined. I don't see where the data would require any type of separation and therefore multiple-table management processes? One less table to JOIN as well.
Just some food for thought ...
What use is the birds eye view? This is where mathematical calculation lets you down, its just not sensible :)
In the UK you can buy the pivot table with the real info (distance by road) its based on postcodes and sectors and you just import it into your database, there must be something similar for the US?
These are the slow bits:
citylistings.st = 'UT'
citylistings.city = 'American Fork' appraiserinfo.licensestate = 'UT'
Because you are restricting by text you query does a a table scan of a big table, 3 times! , you really need to change it to this:
citylistings.st_id =1
citylistings.city_id =2
appraiserinfo.licensestate_id =1
By adding City and state lookup tables to the model. Now the lookups are tiny and properly indexed.
Can you create / compile UDF for the calculation ? It might speed it up a bit?
Because you are restricting by text you query does a a table scan of a big table, 3 times!
That's not true. As long as "city" is indexed, there's no reason for a table scan. Granted, it would be better to use an integer code value, but text fields benefit from indexing as well as integer fields. The only time an exhaustive scan is necessary is if you do a search that isn't anchored to the first character. (e.g. a wild-card search that doesn't specify the first character, a substring search not starting with the first character, etc.)