Help with Load Average - linux guru?

Site is bogging down with CPU idle!

         

spagmoid

3:36 pm on May 29, 2003 (gmt 0)

10+ Year Member



I understand that Load Average is the average number of processes in the run queue. My site activity has grown so that I get 100-150 httpd processes active at once. Last night the LA went up to 50 and the site dropped to a CRAWL. I assumed this meant my CPU was maxxed out - but to my surprise, I still had a 50% idle CPU. I am nowhere near my bandwidth limits. So what on earth could be causing the web server to basically freeze?

All I can figure is there is something blocking the httpd processes from finishing. Something that keeps them in the run state instead of the sleep state.

dingman

9:42 pm on May 29, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Is it a dynamic site, or a static one? The first thing that occurs to me is that things might be bogging down on I/O to the hard drive.

Another thought is to check your memory usage - if something is hogging lots of RAM, you might be swapping to the HD a lot, which slows things down without keeping the CPU busy.

Finally, by "I am nowhere near my bandwidth limits", do you mean that you are nowhere near having transferred the X gb your host allows in a month, or that at the time the server wasn't sending anywhere near X mb/sec? Only the latter matters in this case, and I assume it's what you meant, but I could easily imagine someone getting confused about which is which by hosting marketing.

spagmoid

3:28 pm on May 31, 2003 (gmt 0)

10+ Year Member



Thanks dingman,

It's very dynamic, all PHP.
This is a pretty newbie question but, how would I be able to tell if the hard drive is bogging down?

I'm thinking memory may be the problem now - I only have 512MB ram, and each of my httpd processes is using "85000" VSZ (doesn't that mean 85meg? how is that possible?) and 11000 RSS.
There are also several mysql processes using 36000 VSZ & 9000 RSS each.
How can I reduce Apache & mysql memory usage?

As for bandwidth, I have 400 gig/month xfer allotted and I'm going to use about 60 gig of it this month, so I doubt bursting was causing it.
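A monthly transfer total only gives an average rate, which is the distinction dingman was asking about. A rough sketch of that arithmetic, assuming a 30-day month and decimal units (both assumptions, not figures from the thread):

```python
# Average transfer rate implied by a monthly total.
# Assumptions: a 30-day month and decimal units (1 GB = 1,000,000,000 bytes).
SECONDS_PER_MONTH = 30 * 24 * 60 * 60


def average_rate_kb_per_sec(gigabytes_per_month):
    """Convert a monthly transfer total in GB to an average rate in KB/s."""
    total_bytes = gigabytes_per_month * 1_000_000_000
    return total_bytes / SECONDS_PER_MONTH / 1_000


used = average_rate_kb_per_sec(60)       # expected usage from the post
allotted = average_rate_kb_per_sec(400)  # monthly allowance from the post
print(f"60 GB/month averages about {used:.0f} KB/s")
print(f"400 GB/month averages about {allotted:.0f} KB/s")
```

The average says nothing about the peak: traffic concentrated in a few busy hours can run many times higher than this figure, so the instantaneous rate still has to be measured on the server.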

dingman

8:24 pm on Jun 3, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Where are you getting these VSZ and RSS numbers? None of the utilities I'm using show those units, which makes it hard to know what they really mean.

How you would reduce memory usage depends largely on how your dynamic app works. The first thing that occurs to me is to check that you aren't selecting more records than you need out of the database all over the place. If something is leaking memory, you might also find that configuring Apache to serve fewer requests before killing and re-spawning the slave processes helps. Really, though, optimizing for reduced memory usage is going to be very much dependent on what the code for the dynamic site does.

When the server was crawling, how did the text you typed at a command prompt behave? Was it nice and smooth, like typing locally, or did it show up in bursts?

spagmoid

10:51 pm on Jun 3, 2003 (gmt 0)

10+ Year Member



VSZ and RSS are shown in "top". VSZ is the virtual size (everything the process has mapped), and RSS is the resident size (the part that is actually in RAM).

The text showed up in bursts.. I have tweaked the apache MaxClients down to 50, and got the load avg below 2, but then there's a delay to users because their requests are queued.

There aren't any big database things going on, a lot of these threads are just downloading images, etc. It seems like a big waste for apache to use 10 megs (with 7 of it shared), just for a client to download an image. I was thinking Apache could handle 500+ processes at once with 512meg RAM, if there's no way to get the non-shared size down under a meg this will never happen.. Has anyone done this?

So how do I check my hard drive usage remotely?

spagmoid

2:58 pm on Jun 5, 2003 (gmt 0)

10+ Year Member



Anyone?

dingman

3:27 pm on Jun 5, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sorry, I keep partially writing back to you and then getting interrupted. It looks like you're right about the memory sizes. As for HD access, I'd look for apache processes in states 'S' or 'D' while the server is under load.

Bursty text might be an indication that you're actually at the limits of some portion of the connection between you and the machine in question, though the bottleneck is not necessarily the network card on the server. The only way to check this that comes to mind immediately involves looking at output from /sbin/ifconfig at known intervals and doing a bit of math to figure out how much bandwidth it is using. Depending on the network topology at your host, you might also have traffic suffer from heavy load on nearby machines. (Ethernet performance degrades sharply as you put more machines in a single collision domain. It can suck.)

dingman

3:37 pm on Jun 5, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



As an alternative to /sbin/ifconfig, you can also read the file /proc/net/dev at a known interval and calculate from that, or use 'iftop'. I'm not sure where to find info about transfers to and from disk.
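The "read it at a known interval and do a bit of math" approach could look roughly like the sketch below. It works on two saved copies of /proc/net/dev; the snapshot contents, the interface name eth0 and the 10-second interval are invented for illustration.

```python
def parse_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev contents."""
    counters = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, data = line.split(":", 1)
        fields = data.split()
        # Field 0 is received bytes, field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def rate_kb_per_sec(before, after, iface, interval_s):
    """Average (rx, tx) rate in KB/s between two snapshots."""
    rx = (after[iface][0] - before[iface][0]) / interval_s / 1000
    tx = (after[iface][1] - before[iface][1]) / interval_s / 1000
    return rx, tx


# Hypothetical snapshots taken 10 seconds apart (numbers invented for illustration).
HEADER = (
    "Inter-|   Receive                                                |  Transmit\n"
    " face |bytes    packets errs drop fifo frame compressed multicast|"
    "bytes    packets errs drop fifo colls carrier compressed\n"
)
SNAP_0 = HEADER + "  eth0: 1000000 900 0 0 0 0 0 0 5000000 1200 0 0 0 0 0 0\n"
SNAP_1 = HEADER + "  eth0: 1500000 1400 0 0 0 0 0 0 7000000 2600 0 0 0 0 0 0\n"

rx, tx = rate_kb_per_sec(parse_net_dev(SNAP_0), parse_net_dev(SNAP_1), "eth0", 10)
print(f"eth0: {rx:.1f} KB/s in, {tx:.1f} KB/s out")
```

On a live machine the two snapshots would come from reading /proc/net/dev twice with a sleep in between. The sketch does not handle counters wrapping around, which can happen on busy interfaces.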

drbrain

3:40 pm on Jun 5, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



How many apache modules do you have loaded? Eliminate those you are not using.

mtr can help you determine if there are problems with the connection to your site.

Storyteller

5:18 am on Jun 6, 2003 (gmt 0)

10+ Year Member



spagmoid, iostat and lsof commands can help you figure out what's going on with the I/O system. Consult appropriate man pages for the details. Also, you can run 'ps xa' and post the output here so we can get a clearer picture and maybe point out what's going wrong indeed...

daisho

1:42 pm on Jun 7, 2003 (gmt 0)

10+ Year Member



Do a search for "iftop"; it's a great program that littleman mentioned a while ago. This will help you see if the problem is network I/O or not.

daisho

spagmoid

6:36 pm on Jun 7, 2003 (gmt 0)

10+ Year Member



Thanks everyone, I don't believe this could be a network problem, since during this slowdown I tried downloading a file, and I got 200KB/s without a problem (the limit of my cable modem). But I will try those programs.

I had a ton of processes in state S but none in D that I could see. I assume D means disk access, so that's probably not the problem?
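For reference, 'D' in the STAT column is uninterruptible sleep, which usually means waiting on disk or other I/O, while 'S' is ordinary interruptible sleep, such as an idle worker waiting for a request. On Linux the load average counts processes in both 'R' and 'D'. A small sketch that tallies states from saved `ps -eo stat,comm` output; the sample rows are invented for illustration:

```python
from collections import Counter

# Hypothetical output of `ps -eo stat,comm` (rows invented for illustration).
SAMPLE_PS = """\
STAT COMMAND
S    httpd
S    httpd
D    httpd
R    httpd
Ss   sshd
SW   kswapd
"""


def count_states(ps_output, command):
    """Tally the first letter of the STAT column for one command name."""
    states = Counter()
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[1].strip() == command:
            states[parts[0][0]] += 1
    return states


httpd = count_states(SAMPLE_PS, "httpd")
# Only R and D processes count toward the Linux load average.
load_contributors = httpd["R"] + httpd["D"]
print(dict(httpd), "contributing to load:", load_contributors)
```

A single snapshot can easily miss short I/O waits, so it is probably worth sampling repeatedly while the server is slow before ruling the disk out.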

I've removed some unneeded modules from Apache, here are the ones remaining, are any non-essential?

LoadModule env_module modules/mod_env.so
LoadModule config_log_module modules/mod_log_config.so
LoadModule agent_log_module modules/mod_log_agent.so
LoadModule referer_log_module modules/mod_log_referer.so
LoadModule mime_module modules/mod_mime.so
LoadModule negotiation_module modules/mod_negotiation.so
LoadModule status_module modules/mod_status.so
LoadModule info_module modules/mod_info.so
LoadModule autoindex_module modules/mod_autoindex.so
LoadModule dir_module modules/mod_dir.so
LoadModule cgi_module modules/mod_cgi.so
LoadModule asis_module modules/mod_asis.so
LoadModule imap_module modules/mod_imap.so
LoadModule action_module modules/mod_actions.so
LoadModule userdir_module modules/mod_userdir.so
LoadModule proxy_module modules/libproxy.so
LoadModule alias_module modules/mod_alias.so
LoadModule rewrite_module modules/mod_rewrite.so
LoadModule access_module modules/mod_access.so
LoadModule auth_module modules/mod_auth.so
LoadModule anon_auth_module modules/mod_auth_anon.so
LoadModule db_auth_module modules/mod_auth_db.so
LoadModule digest_module modules/mod_digest.so
LoadModule expires_module modules/mod_expires.so
LoadModule headers_module modules/mod_headers.so
LoadModule usertrack_module modules/mod_usertrack.so
LoadModule setenvif_module modules/mod_setenvif.so
LoadModule ssl_module modules/libssl.so
LoadModule php4_module modules/libphp4.so
LoadModule httpdmon_module /usr/lib/apache/httpdmon.so
LoadModule httpd_defines_module /usr/lib/apache/httpd_defines.so
LoadModule gzip_module modules/mod_gzip.so

I figured out that I can move MaxClients up to 100, after that it starts swapping. During peak, I do get 100 active at once. This MaxClients has improved the slowdown a lot, so I think maybe the problem was when the computer began to swap to disk. Of course, limiting MaxClients causes requests to queue, which causes a more manageable slowdown.

Let me explain a little, this is a site where many people meet and talk, so when they are online they are looking at others and writing to them, and loading graphics intensive pages at a pretty high rate. I was hoping to be able to handle 1,000 online at once on one machine, thinking they would each access a page every 30-60 seconds. So 200-300 processes active at once. But then I realized that browsers send several requests at once, so multiply that by 4 and it's 1,000 processes at once, each taking up 3 meg of unshared RAM. Were my estimates unrealistic, or is there any way to tweak this? How much traffic can you expect to handle with Apache on a 1ghz machine with 512 meg RAM? I don't understand why Apache needs so much RAM for each freakin process. At its core, a web server is really just a simple file server! I would think 500K per process would be plenty.

I REALLY don't want to get into multiple machines just yet..
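One way to sanity-check the estimate above is Little's law: the average number of busy processes equals the request arrival rate times the time each request ties up a process. The sketch below uses the user count, page interval and requests-per-page from the post; the per-request times are assumptions chosen only to show how strongly the answer depends on them.

```python
def busy_workers(users, seconds_between_pages, requests_per_page, seconds_per_request):
    """Little's law: average requests in flight = arrival rate * time per request."""
    requests_per_second = users / seconds_between_pages * requests_per_page
    return requests_per_second * seconds_per_request


# 1,000 users, a page every 30 seconds, 4 requests per page (figures from the post).
# The time each request ties up a process is an assumption, not a measurement.
for seconds_per_request in (0.1, 0.5, 2.0):
    n = busy_workers(1000, 30, 4, seconds_per_request)
    print(f"{seconds_per_request:>4} s per request -> about {n:.0f} busy processes")
```

Under these assumptions the process count is driven mostly by how long each request holds a process, including slow clients and any keep-alive wait, rather than by the number of users alone.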

spagmoid

9:03 pm on Jun 13, 2003 (gmt 0)

10+ Year Member



OK now the load average is 3 (it was 6), with CPU about 40% idle, and plenty of free RAM. Here's the output from top. Any ideas?

--------------------------------------------
4:06pm up 1 day, 45 min, 1 user, load average: 3.00, 3.23, 2.76
146 processes: 144 sleeping, 2 running, 0 zombie, 0 stopped
CPU states: 34.9% user, 5.7% system, 18.5% nice, 40.7% idle
Mem: 1031248K av, 955832K used, 75416K free, 0K shrd, 43620K buff
Swap: 1020116K av, 32932K used, 987184K free 560928K cached

PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND
1938 apache 15 0 9780 9112 6492 S 5.3 0.8 0:17 httpd
14547 apache 15 0 9816 9148 6460 S 3.7 0.8 0:01 httpd
9198 apache 15 0 10440 9772 6504 S 3.5 0.9 0:06 httpd
25323 apache 15 0 10612 9944 6484 S 3.1 0.9 0:30 httpd
31254 apache 15 0 10420 9752 6504 S 3.1 0.9 0:20 httpd
2595 apache 16 0 9900 9232 6492 S 2.7 0.8 0:19 httpd
6230 apache 15 0 9880 9212 6500 S 2.3 0.8 0:10 httpd
31895 apache 15 0 10508 9840 6524 S 2.1 0.9 0:23 httpd
9201 apache 16 0 10700 9.8M 6484 S 2.1 0.9 0:07 httpd
14782 apache 15 0 9616 8948 6412 S 1.9 0.8 0:00 httpd
14788 apache 16 0 9720 9052 6424 S 1.9 0.8 0:00 httpd
3539 apache 15 0 10356 9688 6516 S 1.7 0.9 1:03 httpd
13818 admin 15 0 1260 1260 908 R 1.1 0.1 0:03 top
10319 apache 15 0 9796 9128 6508 S 0.9 0.8 0:03 httpd
309 apache 15 0 10784 9.9M 6508 S 0.3 0.9 0:18 httpd
8561 apache 15 0 10688 9.8M 6504 S 0.1 0.9 0:54 httpd
25358 apache 15 0 10448 9780 6592 S 0.1 0.9 0:26 httpd
303 apache 15 0 10116 9448 6492 S 0.1 0.9 0:20 httpd
1158 apache 15 0 9724 9056 6492 S 0.1 0.8 0:16 httpd
9195 apache 15 0 9740 9072 6500 S 0.1 0.8 0:06 httpd
1 root 15 0 508 456 432 S 0.0 0.0 0:04 init
2 root 15 0 0 0 0 SW 0.0 0.0 0:00 keventd
3 root 34 19 0 0 0 SWN 0.0 0.0 0:00 ksoftirqd_CPU0
4 root 15 0 0 0 0 SW 0.0 0.0 0:03 kswapd
5 root 15 0 0 0 0 SW 0.0 0.0 0:01 bdflush
6 root 15 0 0 0 0 SW 0.0 0.0 0:00 kupdated
7 root 25 0 0 0 0 SW 0.0 0.0 0:00 mdrecoveryd
11 root 15 0 0 0 0 SW 0.0 0.0 0:10 kjournald
88 root 16 0 0 0 0 SW 0.0 0.0 0:00 khubd
180 root 15 0 0 0 0 SW 0.0 0.0 0:00 kjournald
754 root 15 0 584 580 536 S 0.0 0.0 0:00 syslogd
769 root 15 0 1208 492 492 S 0.0 0.0 0:00 klogd
831 named 15 0 1672 1292 1160 S 0.0 0.1 0:00 named
833 named 15 0 1672 1292 1160 S 0.0 0.1 0:00 named
834 named 15 0 1672 1292 1160 S 0.0 0.1 0:00 named
836 named 15 0 1672 1292 1160 S 0.0 0.1 0:00 named
837 named 15 0 1672 1292 1160 S 0.0 0.1 0:00 named
854 root 15 0 1196 1040 1008 S 0.0 0.1 0:00 sshd
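The unshared memory per httpd process can be estimated from this listing as RSS minus SHARE. A minimal sketch, assuming the column order shown in the header row and that a trailing "M" means megabytes; the last function adds up the free, buffer and cache figures from the Mem/Swap lines, since buffers and page cache can be reclaimed by the kernel when processes need the memory.

```python
def to_kb(field):
    """Convert a top memory field such as '9112' or '9.8M' to kilobytes."""
    if field.endswith("M"):
        return float(field[:-1]) * 1024
    return float(field)


def unshared_kb(top_row):
    """RSS minus SHARE for a row laid out as PID USER PRI NI SIZE RSS SHARE ..."""
    cols = top_row.split()
    return to_kb(cols[5]) - to_kb(cols[6])


def reclaimable_kb(free, buffers, cached):
    """Memory that is free or that the kernel can give back on demand."""
    return free + buffers + cached


# Three rows copied from the listing above.
ROWS = [
    "1938 apache 15 0 9780 9112 6492 S 5.3 0.8 0:17 httpd",
    "9201 apache 16 0 10700 9.8M 6484 S 2.1 0.9 0:07 httpd",
    "25323 apache 15 0 10612 9944 6484 S 3.1 0.9 0:30 httpd",
]
per_process = [unshared_kb(row) for row in ROWS]
average = sum(per_process) / len(per_process)
print(f"unshared per httpd: about {average:.0f} KB")
# Mem/Swap header lines above: 75416K free, 43620K buff, 560928K cached.
print(f"free + buffers + cache: {reclaimable_kb(75416, 43620, 560928)} KB")
```

This is only an approximation: the SHARE column counts pages that could be shared, not pages that definitely are.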

cminblues

5:10 am on Jun 19, 2003 (gmt 0)

10+ Year Member



Interesting thing you have 32M of memory swapped..
This _isn't_ good, but the useful part is that you can look for what is responsible, and maybe you'll end up with some clues about the causes of the overload.

To check for I/O problems, have you tried 'hdparm -t'? [better on non-mounted disks]

Also, I recommend 'noatime' entry in /etc/fstab.

spagmoid

5:35 am on Jun 19, 2003 (gmt 0)

10+ Year Member



How can I find out who is swapped out? I assumed it was something rarely used so not a problem..

hdparm -t: yes, my drive is quite fast.

Now, I'm leaning toward it being a mysql locking issue. I do have a ton of queries going on. When writing to a table though, I think it only locks 1 row against reading, so not sure why that would be an issue.

dingman

2:26 pm on Jun 19, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Really, I wouldn't worry about 32mb swapped. Sometimes stuff seems to get used so rarely that it just never gets pulled back from swap after it was put there. If it's practically never getting accessed, who cares if it's in swap instead of RAM? if anything, it'll save time when an active process needs to malloc() a bit more, 'cause the disk writes to make space in real RAM have already happened. I even have a vague recollection of having read something to that effect on LKML, though of course I could be making that up or assuming that the poster knew more than they did.

cminblues

2:20 am on Jun 20, 2003 (gmt 0)

10+ Year Member



Yes, I put too much emphasis on the 'isn't good' side of the issue; my 'tip' was [assuming, e.g., that the swap usage doesn't just stay stuck] to investigate further..

[I remember an HTTP server having some 60MB or so ever swapped, who cares I thought, all well anyway, and then found a resurrected 04 AM cron job doing some crazy/useless/expensive things on the whole /var/log.. :)]

spagmoid

3:02 am on Jun 20, 2003 (gmt 0)

10+ Year Member



Well nobody's still told me how I would find that out..
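On kernels that expose a VmSwap line in /proc/<pid>/status, per-process swap usage can be read directly. That field is much newer than the kernel this thread is about, so treat the sketch below as an illustration for later systems. The status excerpts are invented sample data.

```python
def swap_kb(status_text):
    """Return the VmSwap value in kB from /proc/<pid>/status text, or 0 if absent."""
    for line in status_text.splitlines():
        if line.startswith("VmSwap:"):
            return int(line.split()[1])
    return 0


def rank_by_swap(statuses):
    """Sort {label: status_text} by swapped-out size, largest first."""
    ranked = [(label, swap_kb(text)) for label, text in statuses.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical status excerpts (values invented for illustration).
SAMPLE = {
    "httpd[1938]": "Name:\thttpd\nVmRSS:\t    9112 kB\nVmSwap:\t     120 kB\n",
    "named[831]": "Name:\tnamed\nVmRSS:\t    1292 kB\nVmSwap:\t    4800 kB\n",
    "keventd[2]": "Name:\tkeventd\n",  # kernel threads have no Vm* lines
}
for label, kb in rank_by_swap(SAMPLE):
    print(f"{label:<14} {kb:>6} kB swapped")
```

On a live system the status text would come from reading each /proc/<pid>/status file; processes can exit between listing and reading, so a real script would need to tolerate missing files.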

cminblues

3:34 am on Jun 20, 2003 (gmt 0)

10+ Year Member



You're right he..

Anyway, looking at your top dump, most 'TIME' values of the spawned httpds are too high, and the pids are also a bit strange, but of course this can be due to overload.

Try also a 'netstat -aln > log0.txt' to see who's connected, and doing what.
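A saved netstat log gets long quickly, so it helps to summarize it. The sketch below counts TCP connections per state and per remote address, assuming the usual Proto / Recv-Q / Send-Q / Local Address / Foreign Address / State column layout. The sample lines and addresses are invented.

```python
from collections import Counter

# Hypothetical excerpt of a saved netstat log (addresses invented for illustration).
SAMPLE_LOG = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN
tcp        0      0 10.0.0.5:80             192.0.2.10:51234        ESTABLISHED
tcp        0      0 10.0.0.5:80             192.0.2.10:51235        ESTABLISHED
tcp        0      0 10.0.0.5:80             192.0.2.77:40112        TIME_WAIT
udp        0      0 0.0.0.0:53              0.0.0.0:*
"""


def summarize(netstat_text):
    """Return (connections per state, connections per remote IP) for TCP lines."""
    by_state, by_ip = Counter(), Counter()
    for line in netstat_text.splitlines():
        cols = line.split()
        if len(cols) < 6 or not cols[0].startswith("tcp"):
            continue  # headers, UDP sockets and anything without a state column
        state = cols[5]
        by_state[state] += 1
        if state != "LISTEN":
            by_ip[cols[4].rsplit(":", 1)[0]] += 1
    return by_state, by_ip


by_state, by_ip = summarize(SAMPLE_LOG)
print("states:", dict(by_state))
print("busiest remote addresses:", by_ip.most_common(3))
```

A few addresses holding a large share of the connections would point toward a misbehaving client or an attack; a broad spread points back at ordinary load.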

Maybe a solution is to shorten the timeout values in httpd.conf,
and/or also in /proc/sys/net/[ipv4], and/or doing less dynamic stuff, and _more_ with static pages..
[ask Brett he.. :)]

But I think that if you configure/upgrade well your Apache server, things will go better.

BTW, which version of Apache do you have?
If you haven't upgraded/patched, it's also possible someone is DoSing you.

somerfeld

10:24 am on Jun 22, 2003 (gmt 0)

10+ Year Member



I am no expert; in fact I may not know what I'm talking about. I run a 1.4 GHz P4 with 1 GB RAM. I run a large gallery which can see a good volume of traffic in a short period of time [djgateway.com...] . Why do you have your max clients so low on such a big site? I don't know, man. I have mine at 250 and childs at 1000. If I was at 100 my site would freeze. Your top doesn't look like you're running any mysql? What's your site?

spagmoid

5:45 pm on Jun 22, 2003 (gmt 0)

10+ Year Member



I'm running Apache/1.3.27

MaxClients was 100 because each Apache process is taking up 3.3 megs of unshared memory. So 100 means 330 megs of RAM will be used by Apache, and that's about all I had free of 512 megs. When it starts swapping to disk, it REALLY slows to a crawl. Limiting MaxClients keeps it running faster.
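The sizing rule described above is simple division. A sketch of it, using the 3.3 megs per process and 330 megs of free RAM from the post, and working in decimal kilobytes as a simplification:

```python
def max_clients(ram_for_apache_kb, unshared_kb_per_process):
    """Largest process count whose unshared memory fits in the given RAM."""
    return ram_for_apache_kb // unshared_kb_per_process


def ram_needed_kb(clients, unshared_kb_per_process):
    """Unshared memory consumed by that many processes."""
    return clients * unshared_kb_per_process


UNSHARED_KB = 3_300  # 3.3 megs per process, from the post (decimal KB for simplicity)

print(max_clients(330_000, UNSHARED_KB), "processes fit in 330 megs")
print(ram_needed_kb(250, UNSHARED_KB) // 1000, "megs needed for MaxClients 250")
```

This ignores the shared portion of each process, which is only paid for once, and whatever MySQL and the page cache need, so the real ceiling is likely somewhat lower.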

Now I have 1 gig of RAM, so I can probably set MaxClients to 250. I don't know how anyone could get more running smoothly on a machine with 1 gig RAM, any ideas are welcome. I think it's ridiculous for Apache processes to use up 3 frickin megs of unshared RAM each.

This is a friendship/dating site.
It's very dynamic, every single page has dynamic content (status of mailbox etc). It's not possible to make it static.

Is there any way to limit requests from a single IP? ie, a user opens 3 browser windows at once, or refreshes quickly, and I get 30 requests coming from one computer.

somerfeld

9:18 pm on Jun 22, 2003 (gmt 0)

10+ Year Member



Well, I'm very interested in your problem because I have the same one... still. Eventually I got so pissed off that I hired a programmer to see if it was my scripts. After I found out it had nothing to do with them, I yelled at my ISP admins for telling me it was my scripts. They went into the BIOS and changed some settings. I have no clue what they did, but my load average went down from 20 to 2 under heavy load when they were done. I still have problems when my site gets about 20 visitors clicking away.

cminblues

11:59 pm on Jun 22, 2003 (gmt 0)

10+ Year Member



spagmoid, I'd try one/some of these [not knowing deeply your config etc..]:

- use 'ab' [Apache Benchmark tools], experimenting with various parameters [concurrent requests etc..], files requested, etc.. (of course, in a time window without heavy traffic)

- try to compile in C some of your scripts
[but this depends also on their complexity, your need of frequent updates..]

- try to have the files accessed by your scripts on a separate hd. [and, maybe, also another hd for logs]

- look deeply at your httpd.conf
if you use ModRewrite, for example, it's not so unlikely to have a 'preloaded' script eating a lot of RAM.

- try to use Apache 2 [IMHO it is still a little less stable than 1.3, but performance is truly improved]

ggrot

12:40 am on Jun 23, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I have a site like yours (worse actually), and I can offer you some of the things that I have encountered in the past that could cause this:

a) Log File. Is your log file greater than 2 gigs in size? Linux doesn't handle writes to files greater than 2 gigs very well. If you are burning through 2 gigs of bandwidth a day, you could accumulate 2 gigs of logs in a month I would guess. Delete the log file, restart apache, see what happens.

b) mySQL locks. mySQL's default MyISAM tables don't support row level locking, only table level. So when you make a write to a table, it has to wait for all the reads to finish, and then all the reads have to wait for the write to finish. Queues can pile up fast. The best way to see if this is your problem is to login to the mysql command line and run the command "show processlist;". If everything is going smoothly, you will have <20 processes, and usually only about 5. If you see 200, that's a problem. You can also see the state the processes are in, look for 'locked' and look at the time they have been running. If it is more than a couple seconds, you have a serious locking problem. This is harder to solve. Migrate to postgreSQL or put mySQL on a separate machine with more RAM/faster processor.

c) mySQL swapping. Do you have large (hundreds of megs) databases? If so, you might need to look into changing your mysql settings (my.cnf) as the defaults offer very little RAM allotted to the mySQL processes.

Hope that helps you.

spagmoid

4:50 am on Jun 23, 2003 (gmt 0)

10+ Year Member



My log files are 1.6gig, rotated weekly. I guess I'll have to do something about that soon. I run Urchin daily, does that mean I can delete the logs after a day?

I've never seen more than 5 mySQL processes. Maybe table locks aren't the problem then? I just noticed Table_locks_immediate and Table_locks_waited
They are currently about:
Table_locks_immediate: 11,000,000
Table_locks_waited: 76,000
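To put those two counters in proportion, the share of lock requests that had to wait can be worked out as below. This is a sketch of the arithmetic only; what counts as an acceptable ratio is a judgment call.

```python
def lock_wait_fraction(immediate, waited):
    """Share of table lock requests that could not be granted immediately."""
    return waited / (immediate + waited)


fraction = lock_wait_fraction(11_000_000, 76_000)  # counters from the post
print(f"{fraction:.2%} of table lock requests had to wait")
```

The counters record how often a lock had to wait, not for how long, so a small ratio does not by itself rule out a few long stalls during peak traffic.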

My database is 600 meg, and radical changes to the mysql ram config didn't have a noticeable effect.

spagmoid

3:18 am on Jul 22, 2003 (gmt 0)

10+ Year Member



I've drawn a few conclusions that may be useful to share.
There's no CPU or RAM problem.

I believe this is all due to MySQL database usage. There are no really bad queries, but there are many queries going on (80+/second during peak). Most of these take 1-2ms but a few were taking around 400ms. The problem was not reduced until I optimized ALL of the longer queries down to 5-10ms. Optimizing only a few of them didn't have much effect, which is why I tended not to suspect the database. Things that were helpful were mytop, and enabling the slow query log.

With a little math I can see that my server can in theory only handle about 3x more traffic before I will be in trouble again.
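The kind of math involved can be sketched as total query time demanded per second of wall-clock time. The query rate and timings come from the post; the share of slow queries is not stated there, so the 5% used below is purely an assumption for illustration.

```python
def query_seconds_per_second(queries_per_sec, slow_share, fast_s, slow_s):
    """Total query execution time demanded per second of wall-clock time."""
    fast = queries_per_sec * (1 - slow_share) * fast_s
    slow = queries_per_sec * slow_share * slow_s
    return fast + slow


RATE = 80          # peak queries per second, from the post
SLOW_SHARE = 0.05  # assumption: 1 query in 20 was one of the slow ones
FAST_S = 0.0015    # 1-2 ms, from the post

before = query_seconds_per_second(RATE, SLOW_SHARE, FAST_S, 0.400)
after = query_seconds_per_second(RATE, SLOW_SHARE, FAST_S, 0.0075)
print(f"before optimizing: {before:.3f} s of query time per second")
print(f"after optimizing:  {after:.3f} s of query time per second")
```

Under that assumption the slow queries dominate the total even though they are rare, which would fit the observation that fixing only some of them made little difference. Where queries cannot overlap, for example behind table-level locks, a demand above 1.0 means a backlog builds up.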

Thanks everyone for your suggestions!

martin

4:35 pm on Jul 26, 2003 (gmt 0)

10+ Year Member



Just an idea, if you have many lock waits you might want to make your tables InnoDB instead of MyISAM, at least that would be a lot easier than migrating to PostgreSQL as ggrot suggested.

seindal

7:35 pm on Jul 26, 2003 (gmt 0)

10+ Year Member



What is your setting for MaxRequestsPerChild?
If you have a memory leak, your apache processes might grow quite big. By setting MaxRequestsPerChild to some not too high value, you can force apache to shut down and restart child processes, which will free the leaked memory.

spagmoid

8:13 pm on Jul 26, 2003 (gmt 0)

10+ Year Member



I tried messing with MaxRequestsPerChild, it didn't help - I haven't detected any sort of memory leak. My main problem with Apache is that it takes up so much unshared RAM for each process - like 4 meg. Anybody with insight on reducing that..?

martin

11:00 am on Jul 27, 2003 (gmt 0)

10+ Year Member



I'm sure you don't use all those modules, not that they should take much non-shared RAM but at least it should free some if you don't load them all.