
How hard should you push your server before upgrading?

     
5:49 pm on Mar 17, 2018 (gmt 0)

Senior Member

WebmasterWorld Senior Member Top Contributors Of The Month

joined:Apr 1, 2016
posts:2612
votes: 763


I just moved from sharing space on someone else's server to a VPS. The VPS provides great analytics that let me see CPU, memory and other KPIs. I can also very easily upgrade to plans that provide more disk space, memory, CPUs, etc.

So at my current traffic levels I am pushing the server pretty hard (75% to 95% CPU usage), but I have no idea how hard I can actually push it before it starts causing issues like slower page speeds.

I have very little to no experience managing my own server. Thus far I have two other sites running on VPSes, but those machines have been well oversized for the traffic they get.

How do you judge server performance?
What are performance indicators to watch for?
What are the trade-offs?

Basically, an upgrade is an extra $20 a month. Is there any value in spending the extra cash?
7:49 pm on Mar 17, 2018 (gmt 0)

Junior Member

joined:Feb 22, 2018
posts:146
votes: 22


What are performance indicators to watch for?

On Linux, the load average. [en.wikipedia.org...]

Besides CPU, the bottleneck can also be disk I/O or RAM usage.
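
For example, from the command line (assuming a typical Linux VPS; the numbers below are only an illustration):

uptime
15:02:41 up 12 days, 3:11, 1 user, load average: 0.42, 0.35, 0.30

The three figures are the 1-, 5- and 15-minute load averages; as a rough guide, a sustained load at or above your number of CPU cores means work is queueing. For disk and memory, "vmstat 5" shows I/O wait and swap activity over time, "iostat -x 5" (from the sysstat package) breaks I/O down per disk, and "free -m" shows how much RAM and swap are actually in use.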
3:00 pm on Mar 18, 2018 (gmt 0)

Senior Member from CA 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Nov 25, 2003
posts:1315
votes: 414


Your simple question has a not so simple answer. :)

My personal preference is that CPU usage normally not exceed 50%, the rationale being that it leaves 50% headroom for an unexpected traffic surge (I got slashdotted a few times back in the day). That said, the critical concern is that frequent or lengthy 90-100% usage can cause hardware reliability problems as well as slow or dropped connections for visitors.

Before simply 'upgrading' I'd suggest considering why, and what other potential solutions exist. So step one is to identify bottlenecks. As TravisDGarrett mentions, it's not just processing usage that should be considered but also memory, I/O...
Note: it is a good idea to do this on a regular basis just as a maintenance check to keep things well tuned.

1. it is rarely an OS problem per se; however, it is common for a particular module or service to become a resource hog. Default OS installs often load things that you don't actually need. A quick search typically lists various bits that are suggested for removal or whose default settings should be changed. However, some commonly suggested changes may well impact other services such as backups or security. That said, all popular OSes can be fine-tuned to your specific requirements, often with double-digit improvements.

2. consider adding/maximising/optimising page caching/serving; this may require upgrading/increasing memory.

3. consider offloading the DB (or images, video, etc.) to a separate real or virtual server; may be a better scaling solution than simply going (all on one) larger.

4. consider switching from HTTP/1.1 to HTTP/2 (note: requires HTTPS as well) to both minimise (so increase potential) connections and (typically) increase speed.
Note: a site that has been optimised for HTTP/1.1 will be severely sub-optimal for HTTP/2; it requires an entirely different optimisation setup.

5. consider adding/improving bot blocking (see the sketch after this list). The average small to medium website's traffic is 50-90% bots. A very rough rule of thumb is that half of these are obvious and a good host has already blocked them, dropping the bot traffic to 25-45%. Half of those remaining can be fairly readily identified and blocked with not too much effort and fairly simple tests, leaving 12-23% of traffic as the bots you want (e.g. Googlebot) and bots that are much more difficult to identify.

If one does decide on bot blocking, identifying the bots that are of value to you (e.g. from search engines, ad/media trackers) is as important as identifying those you want to boot. Unfortunately, what was state of the bot art a few years ago is now off the shelf; fortunately, many bot herders are lazy or incompetent. Regardless, the bot war gets harder every year.

6. Etc.
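
On point 5, a minimal sketch of the easy layer of bot blocking in Apache (assuming mod_rewrite is available; the bot names are placeholders, so substitute whatever actually shows up in your own logs and that you have decided you don't want):

# Hypothetical example: return 403 Forbidden to crawlers matched by User-Agent.
# This only catches bots that announce themselves honestly.
<IfModule mod_rewrite.c>
  RewriteEngine On
  RewriteCond %{HTTP_USER_AGENT} (MJ12bot|SemrushBot|AhrefsBot) [NC]
  RewriteRule .* - [F,L]
</IfModule>

User-agent matching only handles the obvious half; the harder-to-identify bots mentioned above take log analysis, rate limiting and behavioural tests.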

A webdev's life: always something, never one size fits all. :)
4:03 pm on Mar 18, 2018 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Sept 25, 2005
posts:2091
votes: 370


It depends on the number of CPU cores you have available. With one core, your max CPU usage will be 100%, and 75-95% would be too close for comfort. With 4 cores, however, since it's 100% per core, that's not even 1/4 of the total processing power (on Linux, anyway).

If you run the top command, you can get a quick idea of what the CPU is doing:
Cpu(s): 1.0%us, 0.6%sy, 0.0%ni, 98.2%id, 0.1%wa, 0.0%hi, 0.0%si, 0.2%st

If the "wa" (i/o wait) percentage is high, your disk can't keep up*. If the "us" (user-space programs) percentage is high, use the > or < key until the list of programs is sorted by %CPU, then see if you can identify the main culprit(s). And "id" stands for % idle, so there a high value is good to have.

(* Possibly because you're running out of RAM and the system is swapping.)
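
To check whether that is happening, "free -m" shows total and used swap, and "vmstat 5" prints the si/so (swap-in/swap-out) columns every five seconds; anything consistently above zero there means the box is actively swapping. (Command names assume a typical Linux install.)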

I don't like to upgrade a server unless it has been optimized (including the site it runs). That means updating software where possible and looking into the efficiency of code and queries, as well as options for memory caching. Focus on the worst offenders first; sometimes it's the small things.
1:44 am on Mar 19, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:9880
votes: 967


If you aren't doing bot management, do try that as soon as possible. It will make the biggest difference immediately. If you ARE managing bots, then look to optimizing RAM usage and caching. These also give the most immediate results.
2:35 am on Mar 19, 2018 (gmt 0)

Senior Member

WebmasterWorld Senior Member Top Contributors Of The Month

joined:Apr 1, 2016
posts:2612
votes: 763


Thanks for all the great input.

The biggest issue I am having is when you say:
If the "wa" (i/o wait) percentage is high,

I don't know what high is, or more specifically, I don't know what too high is. As I'm writing this, I'm in the peak traffic period of a typical day; the "us" figure peaks at 85 or 90, but those peaks are short-lived. Most local peaks reach about 75 every second, and dips are at 60 to 65.

"wa" is at zero, as everything should be happening in memory, except in the event of ram spike when it will swap. But again, how much swapping is acceptable?

As for the topic of bots, I'm not actively blocking or monitoring them, but I casually go over my logs to look for anomalies. So far all I see is voracious crawling by Googlebot, with some interest from the others: Bing, Baidu and Yandex.
3:24 am on Mar 19, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:9880
votes: 967


If that is all you see then you might not be looking hard enough. My bot traffic is AT LEAST 40% at all times and spikes up a bit when a new one shows up (every other day).
10:13 am on Mar 19, 2018 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Sept 25, 2005
posts:2091
votes: 370


You can tell from the list of processes in "top" which programs require the most processing power, and you might be able to better focus your optimization efforts based on that. It could be MySQL, or PHP, or something else, depending on your set-up. Sounds like it's mostly just your website, and not an issue with a slow disk or insufficient memory. Occasional swapping shouldn't be a problem if you have SSD-based storage.

With 85-90% peaks, while technically still within capacity, it's inevitable that you'll hit 100% occasionally, and response times may slow down a little bit. At that point it's quite easy for the server to be toppled over at the hands of a rogue bot or user.

As TravisDGarrett noted, the load average (e.g. "load average: 0.10 [1 min], 0.03 [5 min], 0.01 [15 min]") is also a good indicator, perhaps better than looking at the peaks. I don't let my 5+ minute averages get anywhere near 10%, helps me sleep at night :-)

A monitoring service like Munin or Nginx Amplify might give you a better overall view of your server's health.
3:56 pm on Mar 19, 2018 (gmt 0)

Senior Member

WebmasterWorld Senior Member Top Contributors Of The Month

joined:Apr 1, 2016
posts:2612
votes: 763


@robzilla
What are the units of the load? Is 0.10 == 10%? My understanding is that 1 = 100% of one CPU and 2 = 100% of two CPUs. So are you saying that on a system with 2 CPUs you would consider an upgrade if you found the 5-minute load above 0.20?
5:02 pm on Mar 19, 2018 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Sept 25, 2005
posts:2091
votes: 370


Your understanding is correct. 1 CPU at full capacity = 100% or 1.00, and 2 CPUs at full capacity would equal 200% or 2.00, etc. The percentages can be confusing, but that's just how it's calculated on Linux. Both examples essentially mean you're at 100% of your processing power.

I wouldn't consider upgrading at an avg. load of 0.2 (20%), especially on a system with multiple cores, since that's still well within capacity. As a hypothetical rule (since I never see such high loads; not enough traffic, I guess), I would certainly take action on anything above 70% of total processing power. If I had already optimized everything, then I would be forced to upgrade the server to maintain good site health.

Also, I would prefer to look at the trend rather than any particular moment. A server monitoring service like Munin [demo.munin-monitoring.org] can show you what your average load (as well as memory usage, network throughput, etc.) looks like over time.
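
If you want to do that arithmetic on the server itself, here's a quick sketch in Python (standard library only, Unix systems):

import os

# 1-, 5- and 15-minute load averages, the same numbers uptime/top report
load1, load5, load15 = os.getloadavg()
cores = os.cpu_count()

# A 5-minute load equal to the core count is roughly 100% of capacity
print(f"5-min load {load5:.2f} on {cores} cores = {load5 / cores:.0%} of capacity")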
6:02 pm on Mar 19, 2018 (gmt 0)

Senior Member

WebmasterWorld Senior Member Top Contributors Of The Month

joined:Apr 1, 2016
posts:2612
votes: 763


Munin looks very similar to what is provided for free by my VPS provider. They also offer a pro version at a cost.
8:34 pm on Mar 19, 2018 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Sept 25, 2005
posts:2091
votes: 370


Whatever gets the job done :-) Should give you a good view of resource usage.

Sounds like Linode, by the way. A $20 upgrade means you're now on the 2-core plan, so if you're reading load averages of 0.8-0.9 you're not even using half of the available processing power, i.e. you have a full core to spare, so (if my assumptions are correct) you're not in the danger zone.

Still, if you're expecting growth, see if you can bring that CPU usage down a notch. If you're running PHP, for example, try upgrading to PHP 7.x [webmasterworld.com] if you haven't already. But if you're new to all this, just start out getting familiar with Linux and the command line.
9:55 pm on Mar 19, 2018 (gmt 0)

Senior Member

WebmasterWorld Senior Member Top Contributors Of The Month

joined:Apr 1, 2016
posts:2612
votes: 763


Yes, Linode. No, I'm not reading 0.8 or 0.9; it is actually at 90%. In fact, it's above 1.9 on all three load readings right now. I have been seeing some steady traffic growth over the past few days. I'm not sure if it is due to the Google update that occurred last week, or to my website upgrade that coincided with the Google update, or both. But at the next opportunity I will need to step up to the next plan.

No PHP or MySQL; I'm running Python with a MongoDB backend on Apache. I'm fairly confident that things are running as efficiently as possible; any improvements would be marginal at best. The new version of my site is in AMP, so at some point, if the pages are indexed and cached by Google, that might relieve some load. But I think it's clear I'm red-lining the server pretty hard.
10:13 pm on Mar 19, 2018 (gmt 0)

Junior Member

joined:Feb 22, 2018
posts:146
votes: 22


If your pages are dynamically generated (without caching the result), you can time how long each page takes to generate; that way you can see over time whether generation is getting slower or not.
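
A minimal sketch of that idea in Python (the function names are hypothetical; wrap whatever actually builds your pages):

import logging
import time

logging.basicConfig(level=logging.INFO)

def render_page(request):
    # hypothetical placeholder for your real page-building code
    return "<html>...</html>"

def timed_render(request):
    start = time.perf_counter()
    response = render_page(request)
    elapsed = time.perf_counter() - start
    # log the generation time so the trend is visible in your logs over time
    logging.info("page generated in %.3f s", elapsed)
    return response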

You can also check the crawl statistics in Google Search Console and watch how Googlebot's page download time evolves over time.
10:22 pm on Mar 19, 2018 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Sept 25, 2005
posts:2091
votes: 370


Sounds like good news to me :-) Throw the extra $20 at it and you'll be at roughly 45% of processing power, so yes, there's definitely value in that.

Maybe look into cProfile [docs.python.org] to profile your Python code if you haven't already. I do the same with PHP, huge help.
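
A minimal example of how that can look (build_page is a hypothetical stand-in; profile whatever builds your slowest pages):

import cProfile
import pstats

def build_page():
    # hypothetical stand-in for your real page-generation code
    return "".join(str(i) for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
build_page()
profiler.disable()

# show the 10 most expensive calls, sorted by cumulative time
pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)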