12:01pm up 20 min, 1 user, load average: 196.58, 61.90, 22.30
801 processes: 617 sleeping, 45 running, 139 zombie, 0 stopped
CPU states: 22.1% user, 16.6% system, 0.9% nice, 60.1% idle
Mem: 1031180K av, 636636K used, 394544K free, 0K shrd, 2232K buff
Swap: 2064344K av, 322980K used, 1741364K free 24356K cached
PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND
1653 apache 16 0 9536 7684 6444 D 5.0 0.7 0:01 httpd
5 root 15 0 0 0 0 SW 4.6 0.0 0:01 kswapd
1613 apache 15 0 10440 8328 6456 S 2.7 0.8 0:01 httpd
1584 root 15 0 1412 1352 752 R 2.5 0.1 0:11 top
1517 apache 15 0 8920 6956 6752 S 2.1 0.6 0:01 httpd
7703 admin22 15 0 1968 1968 1936 S 0.9 0.1 0:00 /usr/bin/perl sto
7901 admin12 15 0 1964 1964 1932 S 0.9 0.1 0:00 /usr/bin/perl sto
7904 admin23 15 0 1968 1968 1936 S 0.9 0.1 0:00 /usr/bin/perl sto
7907 admin14 15 0 1872 1872 924 D 0.9 0.1 0:00 /usr/bin/perl sto
7910 admin20 15 0 1964 1964 1932 S 0.9 0.1 0:00 /usr/bin/perl sto
7912 admin12 15 0 1968 1968 1936 S 0.9 0.1 0:00 /usr/bin/perl sto
7914 admin15 15 0 1968 1968 1932 S 0.9 0.1 0:00 /usr/bin/perl sto
7921 admin15 15 0 1964 1964 1932 S 0.9 0.1 0:00 /usr/bin/perl sto
7924 admin15 15 0 1964 1964 1932 S 0.9 0.1 0:00 /usr/bin/perl sto
7909 admin13 15 0 1964 1964 1932 S 0.7 0.1 0:00 /usr/bin/perl sto
1616 apache 15 0 9656 6360 6360 D 0.5 0.6 0:00 httpd
7473 apache 15 0 7516 5876 4672 S 0.5 0.5 0:00 httpd
Your system is probably swapping like mad, which often leaves processes in disk-wait status.
Since disk-wait is normally a short term condition, these processes also count in the calculation of the load, which is the average length of the queue of active processes on the system.
The three load numbers are the average length of the run-queue over 1 minute, 5 minutes and 15 minutes. Your numbers are
load average: 196.58, 61.90, 22.30
so the condition had been building up for some time, starting 20-30 minutes before you discovered it.
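To make the reading of those three numbers concrete, here is a minimal sketch that pulls the load averages out of a `top` or `uptime` header line and compares the short window against the longer ones. The function names are my own for illustration, and the "rising/falling" labels are a rough rule of thumb, not something `top` reports itself.

```python
import re

def parse_load_averages(line):
    """Extract the 1-, 5- and 15-minute load averages from a top/uptime header line."""
    match = re.search(r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", line)
    if match is None:
        raise ValueError("no load average found in line")
    return tuple(float(value) for value in match.groups())

def load_trend(one_min, five_min, fifteen_min):
    """Rough classification: compare the short window against the longer ones."""
    if one_min > five_min > fifteen_min:
        return "rising"
    if one_min < five_min < fifteen_min:
        return "falling"
    return "mixed"

# Header line taken from the top output posted in this thread.
header = "12:01pm up 20 min, 1 user, load average: 196.58, 61.90, 22.30"
loads = parse_load_averages(header)
print(loads, load_trend(*loads))
```

A 1-minute figure far above the 15-minute figure, as here, suggests the queue is still growing; the reverse pattern suggests the server is recovering.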
If the system was swapping itself to pieces, it had also gotten over the worst, because you still have free memory. It could indicate that some process had grown very large, eating up all your memory and causing the system to swap, which slowed everything down and raised the load factors. Then the process exited, or died because it ran out of memory. That freed a lot of memory and allowed the system to swap the other processes back in again.
I got these numbers from here. [en.tldp.org]
Your apache processes are 10MB, with ~6MB shared, so count 5MB for each apache child process. You currently have
which would take up to 2.5GB memory, but you only have 1GB. That is bad. Your absolute upper limit will be MaxClients 200, but since you also have other things on the server, you'll have to go lower than that.
Your perl processes are small and share almost all their memory, so little is needed there, but they seem to take ~1% CPU each, so you don't want too many of them, since they would compete too much for the CPU.
Besides the processes, you must always have spare memory for I/O buffers and cache.
Assuming you have one perl process for each request, my guess would be that you should use MaxClients 100. That'll require up to half your memory for apache alone, and it won't let the perl programs eat up all the CPU. The remaining memory will be for perl and whatever else you have on the server, and for I/O and cache.
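The memory arithmetic above can be sketched as follows. This is only a back-of-the-envelope calculation: the 5 MB per child and the round 1000 MB of RAM are the estimates used in this post, and the function names are made up for illustration.

```python
def max_clients_upper_bound(total_ram_mb, per_child_mb):
    """Largest number of children that fits in RAM if nothing else ran on the box."""
    return int(total_ram_mb // per_child_mb)

def apache_memory_mb(max_clients, per_child_mb):
    """Worst-case unshared memory used when every child slot is busy."""
    return max_clients * per_child_mb

TOTAL_RAM_MB = 1000   # assumption: roughly 1 GB, rounded as in the post
PER_CHILD_MB = 5      # assumption: per-child estimate from the top output

print(max_clients_upper_bound(TOTAL_RAM_MB, PER_CHILD_MB))  # absolute ceiling
print(apache_memory_mb(100, PER_CHILD_MB))                  # MB used at MaxClients 100
```

With these figures the ceiling is 200 children, and MaxClients 100 caps apache at about 500 MB, leaving the other half for perl, other services, buffers and cache.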
Better to be a bit too conservative and have a generally functional server all of the time than to set parameters too high and have an unresponsive server at peak load. Better to serve those you can handle well and maybe lose some than to lose them all because the server is swapping.
6:27pm up 6:45, 1 user, load average: 0.42, 11.31, 12.43
195 processes: 192 sleeping, 2 running, 1 zombie, 0 stopped
right now. Would night be the best time to make the change?
7:56pm up 1 day, 17:22, 1 user, load average: 0.25, 0.32, 0.31
152 processes: 149 sleeping, 3 running, 0 zombie, 0 stopped
CPU states: 11.6% user, 2.9% system, 1.3% nice, 83.9% idle
Mem: 1031180K av, 841932K used, 189248K free, 0K shrd, 48404K buff
Swap: 2064344K av, 41836K used, 2022508K free 437344K cached
Is MaxClients the number of users that can be loading a file at one time? I don't want to set it so low that visitors get a deny message because too many users are trying to access the server at once. I have 28 domains on the server.
You can calculate the capacity of your server (ignoring bandwidth issues) as I have explained earlier in this thread. I think you still have your MaxClients too high, though. I'd set it lower to make sure your server would never, ever swap.
Your demand, as opposed to your server's supply, is a different thing. You say 28 domains, but that is not really a relevant number. I have a server with some 20 sites, but only two are really pulling some traffic. You need the total number of hits your server gets on all sites, split into intervals of, say, one hour. When you have that, find the one hour of the day where you have the most traffic.
Let's say you have N hits in that hour, on all domains together. That translates to N/3600 hits per second for your server, assuming most hits take less than a second to satisfy. They should, if you don't want visitors to lose their patience.
Say your server gets 650,000 hits/day on average, peaking at 1,000,000 on the worst day. Assume the hour with the most load has a tenth of that, 100,000 hits. That is probably a bit high: an even spread over 24 hours would put about 4% in each hour, but traffic is not spread out evenly over the day. The highest I have on a site is a peak hour of 6.5% of total daily traffic.
Divide by 3600, and 100,000 hits/hour is 28 hits/second for the worst hour of the worst day. It is a worst case scenario. Even if you allow for some short term spikes in traffic, a MaxClients of 150 would serve you quite well 99.9% of the time. The remaining 0.1% amounts to less than two minutes daily where the configuration is unable to serve all visitors.
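Here is the same calculation as a small sketch, using the example figures from this post. The function names are hypothetical, and the concurrency estimate (request rate times average time per request) is my own rough addition that relies on the "less than a second per hit" assumption; measure your own service times before trusting it.

```python
def peak_hour_hits(worst_day_hits, peak_share_percent):
    """Hits in the busiest hour, given its share of the day's traffic in percent."""
    return worst_day_hits * peak_share_percent // 100

def hits_per_second(hits_in_hour):
    """Average request rate during that hour."""
    return hits_in_hour / 3600

def concurrent_requests(rate_per_second, seconds_per_request):
    """Rough estimate of simultaneous requests: arrival rate times time in service."""
    return rate_per_second * seconds_per_request

def seconds_unserved_per_day(fraction_served):
    """How long per day the server is over capacity for a given served fraction."""
    return (1 - fraction_served) * 86400

hits = peak_hour_hits(1_000_000, 10)      # worst day, 10% in the peak hour
rate = hits_per_second(hits)
print(round(rate, 1))                      # hits per second in the peak hour
print(round(concurrent_requests(rate, 1.0)))  # assumes 1 second per request
print(round(seconds_unserved_per_day(0.999), 1))
```

Even at a full second per request, the estimated concurrency stays well under a MaxClients of 150, which is why that setting leaves headroom for short spikes.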
Now, your situation might be very different, depending on how long it takes to satisfy a request on average, how many of your hits are for static cacheable files, and how your traffic is distributed over time. But if you take the worst day of the month, and the worst hour of that day, and do the calculation above, you should be able to find the highest number of concurrent clients your server has to withstand. Unless you run some really high traffic sites or have a strange traffic distribution over time, it should be quite simple.
My experience is that it is much better to refuse a few visitors every once in a while than to have a server configuration that can bring the server to its knees by causing it to swap. The lower performance caused by swapping will turn away more visitors for a longer period of time than a very conservatively configured server that occasionally turns away someone, but remains snappy at all times for the ones it serves.