
Databases Forum

MySQL Optimization
Huge increase in traffic this week, causing MySQL problems.
Jordo needs a drink

5+ Year Member



 
Msg#: 3153256 posted 2:40 am on Nov 11, 2006 (gmt 0)

One of my sites has all of a sudden seen a huge (relatively speaking) increase in traffic this week. I guess Google decided it liked us.

The problem is, the server has gone down twice. Each time, MySQL was probably the cause due to extremely high CPU usage. My stats won't even run because of the CPU load, so I can't even tell how much traffic I'm really getting.

The server is a Linux-based Pentium 4 at 2.8 GHz. I can upgrade, but I'm pretty sure I just need to do some db optimization.

My question is, where do I start? Is there a MySQL optimization checklist I could start going through? If not, can y'all post some suggestions?

 

baxuyen

5+ Year Member



 
Msg#: 3153256 posted 4:40 am on Nov 11, 2006 (gmt 0)

There's a thread in this forum about database optimization by marcus007. He seems to be able to handle about 4 million+ page views per day out of one single server.

I'm not saying what he's doing is easy, but if you read his thread you'll see some of the steps he and others have taken to optimize their databases.

physics

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3153256 posted 5:22 pm on Nov 11, 2006 (gmt 0)

Indexing your columns is usually a big one:
[databasejournal.com...]

Also, make sure you're making the most of the mysql query cache.
[databasejournal.com...]

Check mysql load with mytop (linux command line).
[freshmeat.net...]

Finally, if your database is badly designed, normalizing could help a lot.
[dev.mysql.com...]
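To make the indexing and query-cache suggestions concrete, here's a minimal sketch; the table and column names (articles, category_id, created) are made up for illustration, so adjust them to your own schema:

-- See how a suspect query is executed; "type: ALL" in the output means a full table scan
EXPLAIN SELECT * FROM articles WHERE category_id = 7 ORDER BY created DESC;

-- Add an index on the column used in the WHERE clause
ALTER TABLE articles ADD INDEX idx_category (category_id);

-- Check that the query cache is enabled and how well it is being used
SHOW VARIABLES LIKE 'query_cache%';
SHOW STATUS LIKE 'Qcache%';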

Jordo needs a drink

5+ Year Member



 
Msg#: 3153256 posted 2:09 am on Nov 12, 2006 (gmt 0)

Thanks! Exactly what I was looking for.

physics

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3153256 posted 5:29 am on Nov 12, 2006 (gmt 0)

Good to hear :) Keep us posted on what helps...

freeflight2

10+ Year Member



 
Msg#: 3153256 posted 5:46 am on Nov 12, 2006 (gmt 0)

Often it's due to one or two slow queries.

adding

long_query_time=2
log-slow-queries=/var/lib/mysql/slow-queries.log

to /etc/my.cnf, and then working on tuning the slow queries found in slow-queries.log, can enlighten you.
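Once MySQL is restarted with those settings, you can also keep an eye on the running count from the client; these are the standard MySQL variable/status names:

SHOW VARIABLES LIKE 'long_query_time';
SHOW STATUS LIKE 'Slow_queries';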

physics

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3153256 posted 6:52 am on Nov 12, 2006 (gmt 0)

Good point freeflight2. Here's a link to a slow query log tutorial Jordo:
[databasejournal.com...]

Jordo needs a drink

5+ Year Member



 
Msg#: 3153256 posted 2:50 pm on Nov 13, 2006 (gmt 0)


Indexing your columns is usually a big one:
[databasejournal.com...]

I had already indexed pretty much everywhere I could, but I looked over the tables again and couldn't really see anything else to index.

Also, make sure you're making the most of the mysql query cache.
[databasejournal.com...]

Wasn't sure if I was taking advantage of this, so I checked the config, and I am using it with a 32 MB cache.

Check mysql load with mytop (linux command line).
[freshmeat.net...]

Looked into this, but the CMS already has a screen that pretty much tells me all this. Is there an advantage to installing it that I don't see?

Finally, if your database is badly designed, normalizing could help a lot.
[dev.mysql.com...]

I'm still looking at the tables to make sure they're designed ok.

The slow query log is enabled, but I'm not sure about the long query option. I'm not at a place where I can check the config. But to give y'all an idea of what I'm dealing with, I restarted MySQL last night, and as of this morning I already have 57 slow queries.

My slow query log is full of just about every table, some a lot more than others, but it's pretty diverse, except for 2 queries that really stand out. One is a simple standard query that is run on every page. It's an IP table that I use for a bot trap. All it does is look up the IP and, if it exists, block the bot. It's been logged over 400 times.

The other big-hitting slow query has only been logged twice, but the execution time is very, very high. It's another query that runs a lot, though. So only having 2 logged out of maybe 15k executions a day isn't all that bad, is it?

One thing I realized over the weekend was that I enabled the faster crawl speed for Googlebot a couple of weeks ago. It's currently hitting me at an average of about 16k hits a day, with a maximum of 36k hits in one day. I'm guessing (lol) that this isn't helping when I've also gotten that huge increase in live-person page visits. I did go ahead and change it back to the normal crawl rate via the Google webmaster page, but I'm not sure when it will take effect.

Jordo needs a drink

5+ Year Member



 
Msg#: 3153256 posted 3:34 pm on Nov 13, 2006 (gmt 0)

One more thing. As I mentioned earlier, it is a CMS. It's running about 30-50 queries a page, depending on the type of page. It takes anywhere from 0.7 to 2.7 seconds to run all the queries, and the timing varies: one page could take 0.7 seconds to run the queries, then if I refresh it, it could take 1.56 seconds to run them all.

[edited by: Jordo_needs_a_drink at 3:43 pm (utc) on Nov. 13, 2006]

stajer

10+ Year Member



 
Msg#: 3153256 posted 6:41 pm on Nov 13, 2006 (gmt 0)

Your OP makes me suspect you are running mysql on the same server as the cms and http. Often, as traffic increases, the number 1 thing you can do to improve performance is to move your db to its own server.

Jordo needs a drink

5+ Year Member



 
Msg#: 3153256 posted 7:17 pm on Nov 13, 2006 (gmt 0)

Your OP makes me suspect you are running mysql on the same server as the cms and http. Often, as traffic increases, the number 1 thing you can do to improve performance is to move your db to its own server.

Yes, they are on the same server. I may have to separate them, but I was hoping (maybe foolishly) that I could find an optimization trick or recommendation I hadn't thought of or known about before spending the money.

[edited by: Jordo_needs_a_drink at 7:19 pm (utc) on Nov. 13, 2006]

physics

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3153256 posted 7:43 pm on Nov 13, 2006 (gmt 0)


It's an IP table that I use for a bot trap. All it does is look up the IP and, if it exists, block the bot. It's been logged over 400 times.

Is the IP field indexed? Also it might improve speed if you moved this job off to Apache's httpd.conf instead of embedding it in your pages.

Jordo needs a drink

5+ Year Member



 
Msg#: 3153256 posted 8:19 pm on Nov 13, 2006 (gmt 0)

Is the IP field indexed?

Yes, the IP field is the primary key.

It's a myisam table and here's the create for it...

CREATE TABLE ip_ban (
ip varchar(15) NOT NULL,
date datetime DEFAULT '0000-00-00 00:00:00' NOT NULL,
PRIMARY KEY (ip)
);

It currently has 128 rows.
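Since ip is the primary key, the lookup should already be an index hit; one way to check (the exact lookup query shape is a guess from the description above) is:

EXPLAIN SELECT ip FROM ip_ban WHERE ip = '203.0.113.5';
-- "type: const" and "key: PRIMARY" in the output mean the bot-trap lookup uses the index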

I've thought about using the .htaccess file for it, but I'll look at using the httpd.conf also.

[edited by: Jordo_needs_a_drink at 8:21 pm (utc) on Nov. 13, 2006]

eelixduppy

WebmasterWorld Senior Member, Top Contributor of All Time, 5+ Year Member



 
Msg#: 3153256 posted 9:39 pm on Nov 13, 2006 (gmt 0)

As a side note: Don't forget to check the documentation on Optimization [dev.mysql.com]. :)

Jordo needs a drink

5+ Year Member



 
Msg#: 3153256 posted 5:29 am on Nov 14, 2006 (gmt 0)

As a side note: Don't forget to check the documentation on Optimization. :)

Yup, when all else fails, read the manual. Thanks for the reminder. I started reading it tonight, and it's already helped me with another problem I found in the slow query log. One of my selects has been showing up a lot in the log today, and it happens to be a select I have on every page that shows a random 5 items from a table of about 105k items. After reading the above, it appears that ORDER BY RAND() LIMIT 5 doesn't do so well with more than 1000 or so records, because it pulls all the records and then sorts them. The slow query log even shows 210k rows instead of 105k. I rewrote the query, so hopefully that will help some.

I also started using ip2long for the IP addresses in the ip_ban table used by the bot trap. By doing that, I could change my primary key from varchar to integer, which, I guess, is supposed to be better for key indexing.

Still reading and coding...
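For anyone curious, that ip_ban change might look roughly like this; ip_ban2 is just a scratch name here, and INET_ATON/INET_NTOA are MySQL's built-in equivalents of PHP's ip2long/long2ip:

CREATE TABLE ip_ban2 (
ip INT UNSIGNED NOT NULL,
date DATETIME NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (ip)
);
INSERT INTO ip_ban2 (ip, date) SELECT INET_ATON(ip), date FROM ip_ban;
-- the bot-trap lookup then becomes:
-- SELECT ip FROM ip_ban2 WHERE ip = INET_ATON('203.0.113.5');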

Jordo needs a drink

5+ Year Member



 
Msg#: 3153256 posted 1:56 am on Nov 16, 2006 (gmt 0)

Just a quick post to let y'all know I think I'm in the clear now. I've gone 2 days without a slow query logged, and MySQL CPU usage has been extremely low. In fact, the top MySQL process today was at 0.6% CPU. And I'm now seeing all the queries on a page take 0.04 - 0.7 seconds to run, rather than 0.7 - 2.4 seconds. Crossing my fingers that it stays that way.

Thanks for the advice and reading material.

In addition to the items in my previous posts, I also found some bad coding on my part that sent the results of one query into a while loop that queried the db 5 more times. I easily turned that into just one join query.
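That kind of change, roughly (the table and column names here are made up):

-- before: for each row from the first query, the loop ran something like
--   SELECT description FROM item_detail WHERE item_id = <id from the first result>;
-- after: a single join returns everything in one query
SELECT i.item_id, i.title, d.description
FROM items AS i
JOIN item_detail AS d ON d.item_id = i.item_id
WHERE i.category_id = 7;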

The optimization lessons I learned were:

You need to perform all of the great optimization methods that physics posted:
Indexing your columns is usually a big one:
[databasejournal.com...]

Also, make sure you're making the most of the mysql query cache.
[databasejournal.com...]

Check mysql load with mytop (linux command line).
[freshmeat.net...]

Finally, if your database is badly designed, normalizing could help a lot.
[dev.mysql.com...]

Good point freeflight2. Here's a link to a slow query log tutorial Jordo:
[databasejournal.com...]

But, don't forget to read the manual, because it may be your query coding also...
eelixduppy said:
As a side note: Don't forget to check the documentation on Optimization. :) [dev.mysql.com...]

Because that's where I found most of my probs.

physics

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3153256 posted 12:17 am on Nov 17, 2006 (gmt 0)

Glad to hear it Jordo. I had a similar problem with the random items query. How did you optimize the query? Generate random numbers in your code and use those to pull entries from the db?

AlexK

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3153256 posted 12:56 am on Nov 17, 2006 (gmt 0)

Jordo needs a drink:
the IP field is the primary key.
.
CREATE TABLE ip_ban (
ip varchar(15) NOT NULL,
date datetime DEFAULT '0000-00-00 00:00:00' NOT NULL,
PRIMARY KEY (ip)
);
...
I could change my primary key from varchar to integer which, I guess, is supposed to be better for key indexing

One of the biggies for speed is whether a (complete) table resolves in MySQL to 'fixed' or 'dynamic'. The presence of a varchar field will *always* cause the entire table to be 'dynamic', which automatically means slower access speed. The differences can be dramatic. The downside is that often 'fixed' tables occupy more disk space than 'dynamic' tables.

Another small change which has worked for me is to spot tables which, apart from one or two fields, could be both 'fixed' and unchanging, or rarely changed. I was very surprised to find the speed increases possible by moving problematic fields out into their own tables, and joining the two tables in any searches. I thought (before doing it) that the join would slow the whole thing down, but no, quite the reverse. Text and varchar fields are the obvious candidates for swapping out, but I also made good savings with an integer field used for page stats. The original table was getting accessed a lot by the stats routines, and could not be made 'fixed' because of the presence of text fields. The new stats table was only integers and therefore 'fixed' format, and lightning fast to access.
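A rough sketch of that kind of split, with made-up names; the point is that the new table holds only fixed-width integer columns, so it stays in 'fixed' row format while the text/varchar columns remain in the original table:

CREATE TABLE page_stats (
page_id INT UNSIGNED NOT NULL,
views INT UNSIGNED NOT NULL DEFAULT 0,
PRIMARY KEY (page_id)
);

-- searches that need both halves simply join the two tables
SELECT p.title, s.views
FROM pages AS p
JOIN page_stats AS s ON s.page_id = p.page_id
WHERE p.page_id = 123;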

Added:
The final item is to understand just how much time the new query cache in MySQL 4.1 can save, and to arrange your tables accordingly. Using phpMyAdmin, I've seen, live, a first access of 0.5 sec drop to 0.0001 sec on a second access. The key is to understand that this can only happen when the table has not been updated in between (any table update auto-squelches the cache on that table).

Therefore, look at your table structures to see if it is possible to place frequently-updated fields together within the same table, or group of tables, and--by reverse logic--whether infrequently updated fields can logically be kept together. If the latter are also frequently accessed, you can save pots and pots of time.

[edited by: AlexK at 1:09 am (utc) on Nov. 17, 2006]

Jordo needs a drink

5+ Year Member



 
Msg#: 3153256 posted 4:56 pm on Nov 17, 2006 (gmt 0)

I had a similar problem with the random items query. How did you optimize the query? Generate random numbers in your code and use those to pull entries from the db?

Kinda...

The problem is that the table is very dynamic. Records are added and deleted regularly. So, I couldn't generate a random record ID in PHP and query the table using it because it may not exist anymore.

What I did, which isn't perfect but still works, is generate a random number between 1 and 150. Another portion of my script already does a count of all the records, so I divide that record count by 150, then multiply by the random number to get a high bound and by the random number minus 1 to get a low bound. This gives me a random chunk of fewer than 1000 records to query.

$mult = rand(1, 150); // pick one of 150 chunks
$low = round($numrec/150) * ($mult - 1); // lower bound of the chunk
$high = round($numrec/150) * $mult; // upper bound of the chunk

I then do a query with "WHERE recordID > $low AND recordID < $high ORDER BY RAND() LIMIT 5".

So basically I'm getting random records from random chunks, instead of completely random records.

Even though it did a lot to improve the performance, I know this isn't the best solution and am open to hearing better ways of doing this.

Added:
The final item is to understand just how much time the new query cache in MySQL 4.1 can save, and to arrange your tables accordingly. Using phpMyAdmin, I've seen, live, a first access of 0.5 sec drop to 0.0001 sec on a second access. The key is to understand that this can only happen when the table has not been updated in between (any table update auto-squelches the cache on that table).

Therefore, look at your table structures to see if it is possible to place frequently-updated fields together within the same table, or group of tables, and--by reverse logic--whether infrequently updated fields can logically be kept together. If the latter are also frequently accessed, you can save pots and pots of time.

I've noticed that also, but didn't think about redoing the table structures. Great suggestion!

[edited by: Jordo_needs_a_drink at 5:00 pm (utc) on Nov. 17, 2006]

justageek

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3153256 posted 11:37 pm on Nov 18, 2006 (gmt 0)

I use this to grab 1 random record from a db that has millions of rows and seek time is always 0.0x or better.

SELECT URL
FROM mydb.url_map AS r1
JOIN (SELECT ROUND(RAND() * (SELECT MAX(URL_id) FROM mydb.url_map)) AS id) AS r2
WHERE r1.URL_id >= r2.id
ORDER BY r1.URL_id ASC
LIMIT 1

And if I want to grab a random block I just change the limit to whatever number I want.

JAG

absolutelyme

5+ Year Member



 
Msg#: 3153256 posted 1:35 pm on Nov 19, 2006 (gmt 0)

I think the above random record method does not function 100% where you have gaps in your rows, i.e. for 10 rows with primary keys 1-10, no problem, but for 10 rows with primary keys 1-8 and 10-11, unexpected things will happen if you land on 9.

To fix this (low-probability) bug, simply add an application-level check that repeats the query if it failed (with a statistically unlikely maximum, like three times).

I could be wrong about the above; I have seen this sort of thing happen, though.

I have done a lot of optimisation recently, as I have built a CPU-killing site with more server-side processing per page than probably most sites out there. Randomised content, logins, multiple languages, on-demand server-generated maps and images at multiple resolutions -- you get the picture. In order to cope with it (by the way, CMS/DB on the same server!) I have had to get right into the optimisation game, which can be a huge time drain but has taught me a lot and will ensure that code I build now and in the future is highly scalable.

Some fantastic secrets have been mentioned already; here are some extras off the top of my head.

#1 is a well-tuned MySQL query cache, which is not only MySQL configuration but also application configuration. For example, consider using SQL_NO_CACHE in appropriate queries to keep your query cache from being bogged down by occasional junk such as batch processing (see the sketch after this list). The BEST tool for monitoring this cache is MySQL Administrator. Download it from MySQL.com for free; there are also a couple of other tools under development that are worth having a poke at.

#2 Use more than one level of cache. Make sure the HTTP headers generated by your dynamic sites allow for caching where appropriate. Nowhere is this more important than for server-generated images and, secondly, for complex pages.

#3 Put EVERYTHING in the URL. Don't be tempted to use cookies for any part of user state except login/logout status. A key example here is user language. Despite this causing the number of pages to grow, it means better search engine rankings (I think), easier caching, and generally cleaner and less complex code.

#4 Manually add the use of a memory or filesystem cache to your system for static (parts of) pages after they have been rendered out from the database, etc. This is a total winner. If you use memcache instead of the filesystem, then it also lets you easily balance load between various servers at a later stage should your site grow further.

#5 Use a PHP profiler like APD if you are coding in PHP. Despite being a pain to get installed and used to, it will scare the pants off you how bad your code was before you ran it.

#6 Many people consider moving to lighttpd and FastCGI good for performance; personally I have done it and found it a hugely time-consuming pain in the ass. Even so, I'm sure it has some positive effect.
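The SQL_NO_CACHE point from #1, for example, looks like this (a hypothetical reporting query, not from my site):

SELECT SQL_NO_CACHE COUNT(*), DATE(created)
FROM hits
GROUP BY DATE(created);
-- keeps a one-off batch/report query from pushing useful entries out of the query cache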

Good luck :)

Jordo needs a drink

5+ Year Member



 
Msg#: 3153256 posted 5:54 am on Nov 21, 2006 (gmt 0)

To fix this (low-probability) bug, simply add an application-level check that repeats the query if it failed (with a statistically unlikely maximum, like three times).

I don't think it would be such a low probability for me because, currently, there are about 106k records and the top primary id is 136k.

That's about one out of every three queries generating another query?

tomda

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3153256 posted 7:20 am on Nov 21, 2006 (gmt 0)

Very nice post!

From Databasejournal:
Firstly, note that when you update a table with an index, you have to update the index as well...

Could anyone show an example or explain to me how to update the index when updating the table?

Thank you.

absolutelyme

5+ Year Member



 
Msg#: 3153256 posted 5:44 am on Nov 22, 2006 (gmt 0)

The index is updated automatically by MySQL, the point being that this generates extra system and database load, and should be avoided if possible.

tomda

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3153256 posted 5:57 am on Nov 22, 2006 (gmt 0)

Thanks Absolutelyme...

Please note that the databasejournal articles linked above are a bit outdated (2003).

If you read the article about the slow query log, note that as of MySQL 4.1, "log-long-format" (which logged queries that are not using indexes) is no longer used and "log-queries-not-using-indexes" should be used instead.

justageek

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3153256 posted 5:19 pm on Nov 29, 2006 (gmt 0)

I don't think it would be such a low probability for me because, currently, there are about 106k records and the top primary id is 136k.

That's about one out of every three queries generating another query?

The where clause prevents problems if there are gaps. See where it says 'where r1.url_id >= r2.id'? If there is a gap, it goes to the next highest id, so the query will never fail, and it is lightning fast :-)

JAG

whoisgregg

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3153256 posted 5:35 pm on Nov 29, 2006 (gmt 0)

This is a great thread. :)

My small tip: For IP addresses, switch to an INT(4) field and use ip2long [php.net] and long2ip [php.net] functions (in PHP, or equivalent in your server side language of choice) to speed up the DB query for the IP address.

Jordo needs a drink

5+ Year Member



 
Msg#: 3153256 posted 8:14 pm on Nov 29, 2006 (gmt 0)

The where clause prevents problems if there are gaps. See where it says 'where r1.url_id >= r2.id'?

Hmmm... Thanks! I'll try that.

Jordo needs a drink

5+ Year Member



 
Msg#: 3153256 posted 11:02 pm on Dec 1, 2006 (gmt 0)

The where clause prevents problems if there are gaps. See where it says 'where r1.url_id >= r2.id'?

Hmmm... Thanks! I'll try that.

Tried it, but went back to my other way. Your query is great for one result, but changing the limit to anything more than one results in the first returned row being random and the rest being the next ids in order.

For example, if you have 100 records and use your query with a limit 5, you will end up with:
1. url_id = 35 (Random ID)
2. url_id = 36 (Next ID)
3. url_id = 37 (Next ID)
4. url_id = 38 (Next ID)
5. url_id = 39 (Next ID)
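For what it's worth, one workaround (if you still want this approach for several rows) is to repeat the single-row pick once per random value instead of raising the LIMIT. A rough sketch using the same mydb.url_map names; note that UNION without ALL drops exact duplicates, so it can return fewer rows than requested:

(SELECT r1.URL FROM mydb.url_map AS r1
JOIN (SELECT ROUND(RAND() * (SELECT MAX(URL_id) FROM mydb.url_map)) AS id) AS r2
WHERE r1.URL_id >= r2.id
ORDER BY r1.URL_id ASC LIMIT 1)
UNION
(SELECT r1.URL FROM mydb.url_map AS r1
JOIN (SELECT ROUND(RAND() * (SELECT MAX(URL_id) FROM mydb.url_map)) AS id) AS r2
WHERE r1.URL_id >= r2.id
ORDER BY r1.URL_id ASC LIMIT 1)
-- ...repeat one parenthesised SELECT per row wanted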

foxfox

5+ Year Member



 
Msg#: 3153256 posted 2:43 pm on Dec 2, 2006 (gmt 0)

I want to ask:

Are there any tools that can read your query log, try to give a summary of frequently used queries, and maybe re-run the log to perform benchmarking?
