Forum Moderators: open

Message Too Old, No Replies

MySQL indexing

Some strange things...

         

zoltan

7:09 pm on Dec 13, 2005 (gmt 0)

10+ Year Member



I have a big table with 100,000+ records.
3 fields (int(1)) are indexed + there is a primary key.
field1
field2
field3

When I do a:
"select count(primary) from table where field1 = 1".
The query is extremely fast: 0.06 sec.
When I do a:
"select count(primary) from table where field2 = 1".
The query is extremely fast: 0.04 sec.
When I do a:
"select count(primary) from table where field3 = 1".
The query is extremely fast: 0.00 sec.

So far, everything is OK.

Now, another query:
"select count(primary) from table where field1 = 1 or field2 = 1 or field3 = 1".
This query is slow: 1.96 sec

Why is this query that slow? Any comments?

physics

11:56 pm on Dec 13, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Nice find zoltan. My guess is that mysql isn't using the indexes for the count when you query with the three different fields and the or... just a guess though! Anyone actually have a clue on this?

[edited by: physics at 12:06 am (utc) on Dec. 14, 2005]

physics

12:06 am on Dec 14, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Just found this related info:

mysql> SELECT * FROM tbl_name WHERE col1=val1 AND col2=val2;

If a multiple-column index exists on col1 and col2, the appropriate rows can be fetched directly. If separate single-column indexes exist on col1 and col2, the optimizer tries to find the most restrictive index by deciding which index finds fewer rows and using that index to fetch the rows.


[dev.mysql.com...]

There's more on that page on this topic...

zoltan

7:01 am on Dec 14, 2005 (gmt 0)

10+ Year Member



Actually, separate indexes exist on field1, field2 and field3...
So, it is still a little strange...

zCat

9:04 am on Dec 14, 2005 (gmt 0)

10+ Year Member



I seem to recall having been bitten by a similar problem in MySQL once, The problem with MySQL is that it is blindingly fast for simple things, but once you get above a certain level of complexity, it starts to get tricky...

What does "EXPLAIN select count(primary) from table where field1 = 1 or field2 = 1 or field3 = 1" say?

zoltan

10:52 am on Dec 14, 2005 (gmt 0)

10+ Year Member



Here is the response for the explain:

+-----------+------+----------------------+------+---------+------+--------+------------+
¦ table ¦ type ¦ possible_keys ¦ key ¦ key_len ¦ ref ¦ rows ¦ Extra ¦
+-----------+------+----------------------+------+---------+------+--------+------------+
¦ tablename ¦ ALL ¦ field1,field2,field3 ¦ NULL ¦ NULL ¦ NULL ¦ 104500 ¦ where used ¦
+-----------+------+----------------------+------+---------+------+--------+------------+
1 row in set (0.00 sec)

zCat

11:20 am on Dec 14, 2005 (gmt 0)

10+ Year Member



It's definitely not using an index: if one or more was being used, this would be named in the "key" output column.

A possible workaround would be to use a UNION join.

zoltan

11:30 am on Dec 14, 2005 (gmt 0)

10+ Year Member



Is union working on mysql 3.23.58? Also, how should I use an union?

Thanks in advance.

zCat

12:08 pm on Dec 14, 2005 (gmt 0)

10+ Year Member



Is union working on mysql 3.23.58?

Not sure, it should be in the online docs. I'd upgrade, the 3.23 versions are pretty much history.

Also, how should I use an union?


Select count(primary) from table where field1 = 1
union
Select count(primary) from table where field2 = 1
union
Select count(primary) from table where field3 = 1

although thinking about it, it might lead to a different total than your original query.

zoltan

12:15 pm on Dec 14, 2005 (gmt 0)

10+ Year Member



Yes, it is history... but I found it to be faster and more stable than version 4. Haven't tried with version 5.
Unfortunately, I just read that union can not be used on version 3+. Thanks for your advices.

arran

12:31 pm on Dec 14, 2005 (gmt 0)

10+ Year Member



You are running into this problem because when resolving a query, MySQL can only use one index per table. Even creating a compound index on all three fields wouldn't work as your query uses disjunction.

arran.

zoltan

12:54 pm on Dec 14, 2005 (gmt 0)

10+ Year Member



What would be the best workaround?
I tried using a 4th field (varchar(6)) where I store all the 3 previous fields like this: "1:2:3". Then, tried using a like statement:
select count(primary) from table where field4 like '%1%'

Unfortunately, this query is still long, between 2-3 secs. Any suggestion?

arran

1:10 pm on Dec 14, 2005 (gmt 0)

10+ Year Member



zoltan,

I know it's nasty, but doing 3 separate count() queries then summing the results within your application will probably be faster than the table scan.

arran.

coopster

7:20 pm on Dec 14, 2005 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Running 3 different queries will indeed return a different count than expected when totaling though, as zCat noted. If a row had field1 and field2 both contain the value of 1 then that row would end up being counted twice, not the expected result. Have you tried formatting the query differently? IN expressions are usually slower, but I'm curious if it would use your multi column index?
SELECT COUNT(primary) FROM table WHERE 1 IN(field1,field2,field3);

zoltan

7:51 pm on Dec 14, 2005 (gmt 0)

10+ Year Member



Same results...
How can I actually know if mysql use the indexes or not?

zoltan

7:54 pm on Dec 14, 2005 (gmt 0)

10+ Year Member



Also, I have a complex SQL query. Here is the result of the explain:

+-----------+--------+---------------+---------+---------+----------------------+--------+----------------+
¦ table ¦ type ¦ possible_keys ¦ key ¦ key_len ¦ ref ¦ rows ¦ Extra ¦
+-----------+--------+---------------+---------+---------+----------------------+--------+----------------+
¦ table1 ¦ ALL ¦ field1 ¦ NULL ¦ NULL ¦ NULL ¦ 104540 ¦ Using filesort ¦
¦ table2 ¦ eq_ref ¦ PRIMARY ¦ PRIMARY ¦ 4 ¦ table1.field1 ¦ 1 ¦ ¦
¦ table3 ¦ eq_ref ¦ PRIMARY ¦ PRIMARY ¦ 4 ¦ table1.field2 ¦ 1 ¦ ¦
+-----------+--------+---------------+---------+---------+----------------------+--------+----------------+

Is this query optimal? Any comment here?

sja65

8:59 pm on Dec 14, 2005 (gmt 0)

10+ Year Member



One work around you could try is to make a combined index for field1,field2,field3 and then run 8 queries (or 1 with a union of 8 statements)

select count(primary) from from table where field1=1 and field 2=1 and field3=1
union
select count(primary) from from table where field1=1 and field 2=1 and field3=0
...
until you have all 8 combinations of field1,2,3.

With a combined index of all 3 fields, each query is going to be essentially no time, and 8 times almost 0 is still almost 0. Then you just need to add the 8 numbers together.

zoltan

11:42 am on Dec 20, 2005 (gmt 0)

10+ Year Member



And something strange again.
I store the unixtimestamp of the date when a record is added. Now, I want to find the records added in the last 5 - 10 - 15 etc days. I use this command:

select my_recordid from my_records where postdateunixtimestamp >= UNIX_TIMESTAMP(date_add(curdate(), interval -5 day));

postdateunixtimestamp is INT(14) and has an index.

Everything is OK and the index is used if I check records for the last 189 days (every number that is <= 189):

explain select my_recordid from my_records where postdateunixtimestamp >= UNIX_TIMESTAMP(date_add(curdate(), interval -189 day));
+------------+-------+-----------------------+-----------------------+---------+------+-------+------------+
¦ table ¦ type ¦ possible_keys ¦ key ¦ key_len ¦ ref ¦ rows ¦ Extra ¦
+------------+-------+-----------------------+-----------------------+---------+------+-------+------------+
¦ my_records ¦ range ¦ postdateunixtimestamp ¦ postdateunixtimestamp ¦ 5 ¦ NULL ¦ 16872 ¦ where used ¦
+------------+-------+-----------------------+-----------------------+---------+------+-------+------------+
1 row in set (0.00 sec)

Now, if I use 190 or a bigger number, the index is not used any more:

explain select my_recordid from my_records where postdateunixtimestamp >= UNIX_TIMESTAMP(date_add(curdate(), interval -190 day));
+------------+------+-----------------------+------+---------+------+-------+------------+
¦ table ¦ type ¦ possible_keys ¦ key ¦ key_len ¦ ref ¦ rows ¦ Extra ¦
+------------+------+-----------------------+------+---------+------+-------+------------+
¦ my_records ¦ ALL ¦ postdateunixtimestamp ¦ NULL ¦ NULL ¦ NULL ¦ 60395 ¦ where used ¦
+------------+------+-----------------------+------+---------+------+-------+------------+
1 row in set (0.00 sec)

Why is that? What am I doing wrong? Why mysql does not use the index if the range is bigger?

arran

12:31 pm on Dec 20, 2005 (gmt 0)

10+ Year Member



Why mysql does not use the index if the range is bigger?

Based on column statistics, if MySQL predicts the number of rows returned by a query will equate to a sizable percentage of the entire table, it will ignore the index. The reasoning is that a table scan will work out cheaper than the cost of utilizing the index.

arran.

zoltan

1:04 pm on Dec 20, 2005 (gmt 0)

10+ Year Member



Is there a clear definition of a "sizeable percentage"? Like 50 or 60%?

arran

1:38 pm on Dec 20, 2005 (gmt 0)

10+ Year Member



I don't think it's as clearcut as that. Why not just force the index and compare the performance?

FalseDawn

3:31 pm on Dec 22, 2005 (gmt 0)

10+ Year Member



If you are searching for values x,y,z in fields 1,2,3 then maybe this would work:

define another field (field4) to be the product of
(field1-x), (field2-y), (field3-z)

This will be zero only if field1=x or field2=y or field3=z

and can be indexed. Then you'll just need a simple SELECT blah FROM blah WHERE field4=0

FalseDawn

3:49 pm on Dec 22, 2005 (gmt 0)

10+ Year Member



Of course, this won't help if you are looking for a solution that doesn't require the overhead of maintaining an extra field, and I guess you might as well define the extra field to be 0 if "field1=x or field2=y or field3=z", and 1 otherwise, so the solution is rather pointless anyway :-)

I can't think of any solution that will use an index to do what you are trying to do simply and in 1 query. Interesting, though.