Forum Moderators: coopster

Message Too Old, No Replies

Mysql random rows from big tables

         

MattAU

4:13 am on Jun 11, 2005 (gmt 0)

10+ Year Member



I'm trying to pull a few random rows from a mysql table, which is easy using rand(), however it's extremely slow on bigger tables. Is there a better way to do this with larger amounts of data?

Thanks :)

MattAU

4:43 am on Jun 11, 2005 (gmt 0)

10+ Year Member



Ok, it seems the best way is to use rand() to get a few row_ids and then use a second query to pull the full rows.

Doing this dropped the time taken from 30 to 0.4 seconds using 65k rows, which seems reasonable to me. If anyone's got a better solution, please let me know :)

coopster

12:57 pm on Jun 11, 2005 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Either solution is going to get you the results you want, go with the speedier. The first solution which is using RAND() [dev.mysql.com] in a MySQL query should not be slow though. Have you run an EXPLAIN [dev.mysql.com] over the query to see where your issues lie? Try a simple query over your table to get started.
SELECT * FROM table ORDER BY RAND() LIMIT 10;

It should run fast. Next, add your columns, clauses, etc and run it again. If it is slow, EXPLAIN it. For that matter, EXPLAIN it anyway just to see how you can improve it's performance.

MattAU

2:10 am on Jun 12, 2005 (gmt 0)

10+ Year Member



This is what mysql returns from
EXPLAIN SELECT * FROM rev ORDER BY RAND() LIMIT 10

Array
(
[0] => rev
[table] => rev
[1] => ALL
[type] => ALL
[2] =>
[possible_keys] =>
[3] =>
[key] =>
[4] =>
[key_len] =>
[5] =>
[ref] =>
[6] => 62998
[rows] => 62998
[7] => Using temporary; Using filesort
[Extra] => Using temporary; Using filesort
)

I haven't used mysql much so this doesn't really mean anything to me... As I said before using that select statement takes a long time. If I use "SELECT rev_id FROM rev ORDER BY RAND() LIMIT 10" it takes 1/100th of the time, I presume because there's an index on rev_id.

I guess the easiest (and quickest) way to do this is with a subquery, something like:

SELECT * FROM rev WHERE rev_id IN (
SELECT rev_id FROM rev ORDER BY RAND() LIMIT 10
)

Thanks for your help coopster!