Forum Moderators: coopster

Message Too Old, No Replies

Slow results on date queries

         

brandon0401

6:42 pm on Jan 2, 2008 (gmt 0)

10+ Year Member



Hey everyone..Having a tricky little problem not sure where im going wrong..

Have a cms with a farely large db that gets queried(stories) - around 1.5 million records right now.

I am experiencing very slow results, when I query top storires for the last 30 days.

using mysql with php - in mysql here is its structure:
pub_date int(11) UNSIGNED

here is how I pull it out:

$time=mktime(date("H"),date("i"),date("s"),date("m"),date("d")-30,date("y"));
mysql_free_result($res);
$res1=mysql_query("select id,feed_title, pub_date, feed_detail from link_feeds where pub_date>" . $time . " and (rss_id) in (select rssid from category_rss where postedby='admin' and linktype='text') order by hits desc limit 0,25");
echo mysql_error();
while($row1=mysql_fetch_array($res1))
{
$input=$row1['feed_title'];
}

On a dual xeon 2gb ram--

When I do these loops, it takes significant amt of time as the time goes up, ie -24 hours = 1 second load time, - 7 days 2 seconds, -30 days 5-6 seconds.

What am I doing wrong? is it the way I have my table structure, or the way I have the query setup? Thanks in advance!

whoisgregg

9:41 pm on Jan 2, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



What indexes do you have set on the link_feeds table?

PHP_Chimp

9:54 pm on Jan 2, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



[dev.mysql.com...]
Im guessing you want to index the pub_date column, as this should speed up looking for specific dates.

ALTER TABLE link_feeds ADD INDEX(pub_date);

Assuming no indexes (or something like a serial id column) -
Your script is having to look through c. 1.5 million entries to pick out the ones that you are looking for.
So although you have set a limit of 25 stories mysql will still need to look through all of the database to get those 25 entries.
There may be someone who knows if what I am posing is incorrect, but this is how I believe mysql is working -
If you get 30 stories each day for your first query mysql will look at the data and automatically discount anything older than 24 hours (as well as anything not posted by admin or with linktype not text). So you then have 30 stories that are parsed through the 'order by hits desc limit 0,25' So the top 25 hit stories are taken.
If we then assume a constant 30 stories per day your 7 day record search will return 210 stories that are passed through the 'order by hits desc limit 0,25'...your 30 day query will return 900 stories to be parsed through the final filter.
Assuming each story is 1k then your 900 stories will take up 900k of memory before they are cut down to 25 by the final part of your query.

However to make your $time a little easier to read you could use -


$time=str_to_time('-30 days');
// or even
$time=str_to_time($_POST['pub_date']); // or whatever you are using to allow people to choose the date range for the stories

Also using -

while($row1=[url=http://uk2.php.net/manual/en/function.mysql-fetch-array.php]mysql_fetch_array[/url]($res1, MYSQL_ASSOC))
//

may help to speed up the php side of things, as this will only return the associative indexed array, not the numeric as well. However I dont think that the php side is your bottle neck.

<edit>
Try -


EXPLAIN select id,feed_title, pub_date, feed_detail from link_feeds where pub_date>" . $time . " and (rss_id) in (select rssid from category_rss where postedby='admin' and linktype='text') order by hits desc limit 0,25;

As this may well give you some ideas about how this is actually getting executed. May help :)

[edited by: PHP_Chimp at 9:57 pm (utc) on Jan. 2, 2008]

brandon0401

11:02 pm on Jan 2, 2008 (gmt 0)

10+ Year Member



I already run indexes, it did make it faster, but not that much faster

pub_date INDEX 684607

here is result from explain too

mysql> EXPLAIN select id,feed_title, pub_date, feed_detail from link_feeds where pub_date>" . $time . " and (rss_id) in (select rssid from category_rss where postedby='admin' and linktype='text') order by hits desc limit 0,25;
+----+--------------------+--------------+-----------------+---------------------------+----------+---------+------+---------+-----------------------------+
¦ id ¦ select_type ¦ table ¦ type ¦ possible_keys ¦ key ¦ key_len ¦ ref ¦ rows ¦ Extra ¦
+----+--------------------+--------------+-----------------+---------------------------+----------+---------+------+---------+-----------------------------+
¦ 1 ¦ PRIMARY ¦ link_feeds ¦ range ¦ pub_date ¦ pub_date ¦ 4 ¦ NULL ¦ 1369214 ¦ Using where; Using filesort ¦
¦ 2 ¦ DEPENDENT SUBQUERY ¦ category_rss ¦ unique_subquery ¦ PRIMARY,postedby,linktype ¦ PRIMARY ¦ 2 ¦ func ¦ 1 ¦ Using index; Using where ¦
+----+--------------------+--------------+-----------------+---------------------------+----------+---------+------+---------+-----------------------------+
2 rows in set (0.04 sec)

PHP_Chimp

10:09 am on Jan 3, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



From [dev.mysql.com...]
The main things to look out for are rows where the key column is NULL, where the type column is range, index, or ALL, or where the Extra column contains Using filesort or Using temporary. Such queries should be closely examined for proper index usage as they generally indicate that no index is being used.

You have a few of those items mentioned showing.

There are a load of ideas on the EXPLAIN manual [dev.mysql.com] page that may be worth checking out, as you may need to optimize both tables, or split up your tables even more so that you can use smaller fields to index.

whoisgregg

2:11 pm on Jan 3, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The EXPLAIN command shouldn't include PHP code. The non-date value of ".$time." is not what mysql is normally receiving (at least, hopefully not). Echo out your $res1 value in your script and add an "EXPLAIN " to the front of it and post back that output. :)

 mysql> EXPLAIN select id,feed_title, pub_date, feed_detail from link_feeds where pub_date>[red]" . $time . "[/red] and ...

brandon0401

8:12 pm on Jan 3, 2008 (gmt 0)

10+ Year Member



Thanks for your help guys, I thought that about the php, here is explain with out it - looks to be same though - I dont get why it woudl do that as there is a key on the pub_date - any ideas why not saying key?, thanks.

mysql> explain select id,feed_title, pub_date, feed_detail from link_feeds where pub_date>1198786191 and (rss_id) in (select rssid from category_rss where postedby='admin' and linktype='text') order by hits desc limit 0,25;
+----+--------------------+--------------+-----------------+---------------------------+----------+---------+------+--------+-----------------------------+
¦ id ¦ select_type ¦ table ¦ type ¦ possible_keys ¦ key ¦ key_len ¦ ref ¦ rows ¦ Extra ¦
+----+--------------------+--------------+-----------------+---------------------------+----------+---------+------+--------+-----------------------------+
¦ 1 ¦ PRIMARY ¦ link_feeds ¦ range ¦ pub_date ¦ pub_date ¦ 4 ¦ NULL ¦ 149344 ¦ Using where; Using filesort ¦
¦ 2 ¦ DEPENDENT SUBQUERY ¦ category_rss ¦ unique_subquery ¦ PRIMARY,postedby,linktype ¦ PRIMARY ¦ 2 ¦ func ¦ 1 ¦ Using index; Using where ¦
+----+--------------------+--------------+-----------------+---------------------------+----------+---------+------+--------+-----------------------------+
2 rows in set (0.00 sec)

mysql>

whoisgregg

9:23 pm on Jan 3, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The problem is probably the ordering of the results. It's fetching 149344 records then it has to go through and sort them all by hits.

Simply adding an index to hits and an "AND hits > 10 " could help considerably. (Because you are avoiding a sort of 150,000 records.) For the value of "hits" to hardcode in, just determine what value will, on average, knock out the lowest 75% of records.

Added: Heck, just adding an index to `hits` might be enough to speed it up, according to the manual [dev.mysql.com]. Also, I've reread my post and it might be confusing... just let me know if it makes no sense and I'll try to say it a different way. ;)

brandon0401

10:03 pm on Jan 3, 2008 (gmt 0)

10+ Year Member



thanks for your help, ya ive tore it apart and not been able to make it better - it does seem like the last part is the tricky part that is making it long, its either the date or later..

but I already have index on hits, so not sure what else can do on that too..

I have indexes on all the ones you check on table:

Keyname Type Cardinality Action Field
PRIMARY PRIMARY 1382366 Edit Drop id
rss_id INDEX 4473 Edit Drop rss_id
pub_date INDEX 691183 Edit Drop pub_date
hits INDEX 272 Edit Drop hits

let me know, thanks

brandon0401

10:49 pm on Jan 3, 2008 (gmt 0)

10+ Year Member



Wow, so I added hits > 10 and it speeded it up incredible! Thanks so much, seems so weird, but makes sense now!

whoisgregg

1:14 am on Jan 4, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm glad that worked for you. :)