|Why does this query take 80 seconds?|
I have a table with 500,000 rows in it.
- 20 fields
- indexes set up an all the fields in the where
Here's the query
SELECT * FROM table
WHERE (startDate BETWEEN '2012-05-10' AND '2012-05-16')
AND (productCode = 'a1' OR productCode = 'a2' OR productCode = 'a3' OR productCode = 'a4' OR productCode = 'a5' OR productCode = 'a6' OR productCode = 'a7' OR productCode = 'a8' OR productCode = 'a9' OR productCode = 'a10' OR productCode = 'a11' OR productCode = 'a12' OR productCode = 'a13' OR productCode = 'a14') AND value >= '2' AND startDate != '2012-05-13' AND productName != '' AND time = '3' AND hasExtras = '1';
What do you see when you add an "EXPLAIN [dev.mysql.com]" before that SELECT statement?
One of the best ways I've found to speed up a query is creating indexes, which unfortunately you've already done.
I don't know if the "IN" operator will speed things up (although according to the manual, it could), but it does make the query shorter.
Also not sure whether >= and <= are any faster than BETWEEN, but it might be worth an experiment.
Other experiments could be worthwhile, such as moving to the top of the query conditions that will exclude the most records, or conditions that are fastest to evaluate.
Here is a possible rewrite as an example:
SELECT * FROM table
productName != ''
AND value >= '2'
AND time = '3'
AND hasExtras = '1'
AND startDate >= '2012-05-10'
AND startDate <= '2012-05-16'
AND startDate != '2012-05-13'
AND productCode IN ('a1','a2','a3','a4','a5','a6','a7','a8','a9','a10','a11','a12','a13','a14')
A few questions:
1) Do you really need to select *, or can you get away with selecting fewer fields?
2) Is StartDate a VARCHAR field or DATETIME field?
3) Have you tried rebuilding/reorganizing/refreshing (depending on the terminology of your DBMS) your indexes?
4) What's your DB server configuration like? Do you have sufficient memory and CPU? Are there other users/processes on the DB server that could be slowing things down for you?
If this table gets many insertions and deletions, you might also try defragging and compacting it (purging deleted rows) with:
OPTIMIZE TABLE tablename;
I've been assuming MySQL, and only realized from LifeinAsia's comments that that's not necessarily the case. Not sure which of my suggestions might be MySQL-only.
Oh, also noticed the value='2' part. Are all the values in that field numeric? If so, changing the field from a VARCHAR should speed up things a bit.
Thank you very much for the replies.
I think a penny might have just dropped.
Using explain, it appears that the ORDER in which I put things in a query has an effect on how many rows are used in a query. If put say a product id first and then the date range, it makes a big impact on the number of rows queried.
Is that correct? If so then I can drastically speed up my queries by re-writing them.
|it appears that the ORDER in which I put things in a query has an effect on how many rows are used in a query |
That is correct. In general, you want your more-restrictive, index-using conditions to appear first in your WHERE clause.
That way, whichever portions of the select end up not taking advantage of an index are working on a smaller subset of rows.
>> That is correct.
Is this true of other enterprise dbms? I would think the query optimizer would pick up on this, doesn't make sense that it wouldn't.
>> One of the best ways I've found to speed up a query is creating indexes
This will work in most cases if the index is properly set up. However, in cases where a large portion of the data is going to be returned from a query, indexing may actually be detrimental to performance.
|I would think the query optimizer would pick up on this, doesn't make sense that it wouldn't. |
Most of the time it does. :)
If you are doing just a few simple
field_a='x' AND field_b='y' AND field_c='z' then you won't notice any difference no matter what order you put them in. I think that's where the query optimizer automatically "fixes" the order.
However, if you are doing very complex queries then the order of the WHERE can make a difference. (I'm talking about nested ORs, subselects, that sort of thing.)