Forum Moderators: coopster
Have a cms with a farely large db that gets queried(stories) - around 1.5 million records right now.
I am experiencing very slow results, when I query top storires for the last 30 days.
using mysql with php - in mysql here is its structure:
pub_date int(11) UNSIGNED
here is how I pull it out:
$time=mktime(date("H"),date("i"),date("s"),date("m"),date("d")-30,date("y"));
mysql_free_result($res);
$res1=mysql_query("select id,feed_title, pub_date, feed_detail from link_feeds where pub_date>" . $time . " and (rss_id) in (select rssid from category_rss where postedby='admin' and linktype='text') order by hits desc limit 0,25");
echo mysql_error();
while($row1=mysql_fetch_array($res1))
{
$input=$row1['feed_title'];
}
On a dual xeon 2gb ram--
When I do these loops, it takes significant amt of time as the time goes up, ie -24 hours = 1 second load time, - 7 days 2 seconds, -30 days 5-6 seconds.
What am I doing wrong? is it the way I have my table structure, or the way I have the query setup? Thanks in advance!
ALTER TABLE link_feeds ADD INDEX(pub_date);
Assuming no indexes (or something like a serial id column) -
Your script is having to look through c. 1.5 million entries to pick out the ones that you are looking for.
So although you have set a limit of 25 stories mysql will still need to look through all of the database to get those 25 entries.
There may be someone who knows if what I am posing is incorrect, but this is how I believe mysql is working -
If you get 30 stories each day for your first query mysql will look at the data and automatically discount anything older than 24 hours (as well as anything not posted by admin or with linktype not text). So you then have 30 stories that are parsed through the 'order by hits desc limit 0,25' So the top 25 hit stories are taken.
If we then assume a constant 30 stories per day your 7 day record search will return 210 stories that are passed through the 'order by hits desc limit 0,25'...your 30 day query will return 900 stories to be parsed through the final filter.
Assuming each story is 1k then your 900 stories will take up 900k of memory before they are cut down to 25 by the final part of your query.
However to make your $time a little easier to read you could use -
$time=str_to_time('-30 days');
// or even
$time=str_to_time($_POST['pub_date']); // or whatever you are using to allow people to choose the date range for the stories
while($row1=[url=http://uk2.php.net/manual/en/function.mysql-fetch-array.php]mysql_fetch_array[/url]($res1, MYSQL_ASSOC))
//
<edit>
Try -
EXPLAIN select id,feed_title, pub_date, feed_detail from link_feeds where pub_date>" . $time . " and (rss_id) in (select rssid from category_rss where postedby='admin' and linktype='text') order by hits desc limit 0,25;
[edited by: PHP_Chimp at 9:57 pm (utc) on Jan. 2, 2008]
pub_date INDEX 684607
here is result from explain too
mysql> EXPLAIN select id,feed_title, pub_date, feed_detail from link_feeds where pub_date>" . $time . " and (rss_id) in (select rssid from category_rss where postedby='admin' and linktype='text') order by hits desc limit 0,25;
+----+--------------------+--------------+-----------------+---------------------------+----------+---------+------+---------+-----------------------------+
¦ id ¦ select_type ¦ table ¦ type ¦ possible_keys ¦ key ¦ key_len ¦ ref ¦ rows ¦ Extra ¦
+----+--------------------+--------------+-----------------+---------------------------+----------+---------+------+---------+-----------------------------+
¦ 1 ¦ PRIMARY ¦ link_feeds ¦ range ¦ pub_date ¦ pub_date ¦ 4 ¦ NULL ¦ 1369214 ¦ Using where; Using filesort ¦
¦ 2 ¦ DEPENDENT SUBQUERY ¦ category_rss ¦ unique_subquery ¦ PRIMARY,postedby,linktype ¦ PRIMARY ¦ 2 ¦ func ¦ 1 ¦ Using index; Using where ¦
+----+--------------------+--------------+-----------------+---------------------------+----------+---------+------+---------+-----------------------------+
2 rows in set (0.04 sec)
The main things to look out for are rows where the key column is NULL, where the type column is range, index, or ALL, or where the Extra column contains Using filesort or Using temporary. Such queries should be closely examined for proper index usage as they generally indicate that no index is being used.
You have a few of those items mentioned showing.
There are a load of ideas on the EXPLAIN manual [dev.mysql.com] page that may be worth checking out, as you may need to optimize both tables, or split up your tables even more so that you can use smaller fields to index.
mysql> EXPLAIN select id,feed_title, pub_date, feed_detail from link_feeds where pub_date>[red]" . $time . "[/red] and ...
mysql> explain select id,feed_title, pub_date, feed_detail from link_feeds where pub_date>1198786191 and (rss_id) in (select rssid from category_rss where postedby='admin' and linktype='text') order by hits desc limit 0,25;
+----+--------------------+--------------+-----------------+---------------------------+----------+---------+------+--------+-----------------------------+
¦ id ¦ select_type ¦ table ¦ type ¦ possible_keys ¦ key ¦ key_len ¦ ref ¦ rows ¦ Extra ¦
+----+--------------------+--------------+-----------------+---------------------------+----------+---------+------+--------+-----------------------------+
¦ 1 ¦ PRIMARY ¦ link_feeds ¦ range ¦ pub_date ¦ pub_date ¦ 4 ¦ NULL ¦ 149344 ¦ Using where; Using filesort ¦
¦ 2 ¦ DEPENDENT SUBQUERY ¦ category_rss ¦ unique_subquery ¦ PRIMARY,postedby,linktype ¦ PRIMARY ¦ 2 ¦ func ¦ 1 ¦ Using index; Using where ¦
+----+--------------------+--------------+-----------------+---------------------------+----------+---------+------+--------+-----------------------------+
2 rows in set (0.00 sec)
mysql>
Simply adding an index to hits and an "AND hits > 10 " could help considerably. (Because you are avoiding a sort of 150,000 records.) For the value of "hits" to hardcode in, just determine what value will, on average, knock out the lowest 75% of records.
Added: Heck, just adding an index to `hits` might be enough to speed it up, according to the manual [dev.mysql.com]. Also, I've reread my post and it might be confusing... just let me know if it makes no sense and I'll try to say it a different way. ;)
but I already have index on hits, so not sure what else can do on that too..
I have indexes on all the ones you check on table:
Keyname Type Cardinality Action Field
PRIMARY PRIMARY 1382366 Edit Drop id
rss_id INDEX 4473 Edit Drop rss_id
pub_date INDEX 691183 Edit Drop pub_date
hits INDEX 272 Edit Drop hits
let me know, thanks