lammert - 5:03 am on Jul 13, 2010 (gmt 0)
All my servers run on Linux, and therefore almost automatically all my web pages are served by Apache.
Apache is a stable and versatile web server and can be configured for almost every web serving task. The newest version, 2.2, even has good functionality for caching, proxying and load balancing. But since the beginning I have been missing one thing: tunability. The best part of Apache is that it can be used for almost everything, but that comes at a cost. The Apache processes tend to grow during runtime as they serve more page requests, and at some point they need to be shut down just to reclaim memory that was allocated only once, hours or days before, by one bulky server-side script. Fortunately this shutdown and restart of Apache processes is automatic and controlled by parameters in the configuration file, but it doesn't sound like an industrial-grade solution.
Another problem with Apache is that in the default prefork mode, a number of processes are spawned which can each serve only one request at a time. On busy websites many of those processes run in parallel and consume huge amounts of memory, while in practice they are doing almost nothing. Many of the larger sites on the web have a hundred or more Apache processes running in parallel. When you look at the activity of these processes with mod_status, you will see what they are doing: almost nothing. On a default web server with KeepAlive connections switched on, at least 80% of the "working" Apache instances are waiting to close a connection. Another bunch of idle processes is waiting for new incoming connections. Only very few are in a productive working state. Switching KeepAlive off will decrease the number of processes waiting for connections to close, but it will increase the number of independent connections coming in from every visiting client, thereby increasing the overall number of active Apache processes.
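To make the mod_status observation concrete, here is a minimal Python sketch that tallies a scoreboard string of the kind mod_status prints. The state letters follow the scoreboard key shown on the mod_status page; the sample string is made up for illustration and is not taken from my server, and I am assuming that the "waiting to close" processes show up mostly as keepalive workers.

```python
from collections import Counter

# Scoreboard key as printed on the Apache mod_status page.
SCOREBOARD_KEY = {
    "_": "waiting for connection",
    "S": "starting up",
    "R": "reading request",
    "W": "sending reply",
    "K": "keepalive (read)",
    "D": "DNS lookup",
    "C": "closing connection",
    "L": "logging",
    "G": "gracefully finishing",
    "I": "idle cleanup of worker",
    ".": "open slot with no current process",
}


def summarize_scoreboard(scoreboard: str) -> dict:
    """Count live workers per state; open slots ('.') are ignored."""
    labels = (SCOREBOARD_KEY.get(ch, "unknown") for ch in scoreboard if ch != ".")
    return dict(Counter(labels))


def busy_fraction(scoreboard: str) -> float:
    """Share of live workers that are reading a request or sending a reply."""
    live = [ch for ch in scoreboard if ch != "."]
    if not live:
        return 0.0
    productive = sum(1 for ch in live if ch in "RW")
    return productive / len(live)


# Hypothetical scoreboard: 16 keepalive, 2 idle, 1 sending, 1 reading, 10 open slots.
SAMPLE = "K" * 16 + "__" + "WR" + "." * 10

if __name__ == "__main__":
    for state, count in summarize_scoreboard(SAMPLE).items():
        print(f"{state}: {count}")
    print(f"productive share: {busy_fraction(SAMPLE):.0%}")
```

In this made-up sample only two of the twenty live workers are doing real work, which is the pattern described above.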
I am currently in a special situation with one of my servers. This server serves one low-traffic website which requires a significant amount of computing resources. Most of the processing is currently done in PHP, and some page requests require the PHP memory limit to be set at 128MB. Most page requests, however, are just for static files like images and CSS files. The result is that after some visits to the site, I have Apache processes of 150MB each sitting in memory, doing not much more than serving image files or waiting for new or closing connections. This is a really silly situation. I like the versatility of the Apache server, but that shouldn't come at the cost of permanently allocating 150MB to serve mostly 10kB files. That is a 1:15000 waste ratio. I have tried to tune this a little by configuring Apache to kill processes as soon as they reach 25 served requests. This helps a lot, because the large script execution happens only every now and then, but practically speaking I am just moving the load from the memory subsystem to the process manager. That is not a solution, it is just a way to obscure the poor resource management architecture of Apache.
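The numbers above can be checked with a few lines of Python. This is only back-of-the-envelope arithmetic: it uses decimal units so that it matches the 1:15000 figure, the helper names are my own, and the request limit of 25 is presumably what prefork's MaxRequestsPerChild directive controls.

```python
import math

MB = 1000 * 1000  # decimal units, to match the 1:15000 figure
KB = 1000

PROCESS_SIZE = 150 * MB  # grown Apache process
TYPICAL_FILE = 10 * KB   # typical static file being served

# Bytes held in memory per byte of payload actually served.
waste_ratio = PROCESS_SIZE // TYPICAL_FILE


def prefork_memory(processes: int, per_process: int = PROCESS_SIZE) -> int:
    """Upper bound on memory if every process has grown to full size."""
    return processes * per_process


def forks_needed(requests: int, limit: int) -> int:
    """Processes that must be started to serve `requests` when each
    process is recycled after `limit` requests."""
    return math.ceil(requests / limit)


if __name__ == "__main__":
    print(f"waste ratio 1:{waste_ratio}")
    print(f"15 full-size processes: {prefork_memory(15) / MB:.0f} MB")
    print(f"process starts per 1000 requests at limit 25: {forks_needed(1000, 25)}")
```

The last figure is the trade-off in a nutshell: recycling after 25 requests keeps the processes small, but it costs a fresh process start for every 25 requests served.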
I don't know what happens if this server gets more traffic. The disk and CPU subsystems are capable of handling a traffic increase of at least a factor of 100, but I will run out of memory long before that. And adding some small high-traffic websites to this server to make better use of its resources is not an option either, because those small sites would still be served by these voluminous Apache instances.
The ideal situation would be one where the web server software can wait for new connections and closing connections without consuming a lot of memory, and only spawns a large subprocess when some significant computing needs to be done, but this is not how Apache operates. Some lightweight web servers using such an approach have been developed in the past, but they all had the problem that they could not be configured for complex web serving situations.
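As a rough illustration of that ideal, here is a toy Python sketch using asyncio: one small process waits on all connections and only starts a separate subprocess when a request needs heavy computing. It is not how Apache or any particular server is implemented. The one-line text protocol, the "heavy" keyword and the stand-in script are all invented for the example.

```python
import asyncio
import sys

# Hypothetical stand-in for a bulky server-side script.
HEAVY_SCRIPT = "print(sum(range(1000)))"


async def run_heavy_job() -> bytes:
    """Run the heavy work in a short-lived subprocess and return its output."""
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", HEAVY_SCRIPT,
        stdout=asyncio.subprocess.PIPE,
    )
    out, _ = await proc.communicate()
    return out.strip()


async def handle(reader, writer):
    """Serve one connection; idle connections cost a coroutine, not a process."""
    line = await reader.readline()
    if line.strip() == b"heavy":
        body = await run_heavy_job()  # memory is freed when the child exits
    else:
        body = b"static"              # cheap request, answered in-process
    writer.write(body + b"\n")
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def demo() -> list:
    """Start the server on a free local port and send two sample requests."""
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    replies = []
    async with server:
        for request in (b"static-file\n", b"heavy\n"):
            reader, writer = await asyncio.open_connection("127.0.0.1", port)
            writer.write(request)
            await writer.drain()
            replies.append((await reader.readline()).strip())
            writer.close()
            await writer.wait_closed()
    return replies


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The point of the design is that the memory for the heavy work lives only in the child process and goes back to the operating system as soon as that child exits, while the process holding the connections stays small.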
But there is hope.
Looking at the June 2010 Web server survey [news.netcraft.com] we see that the market is still dominated by Apache and IIS, but there are two other server software packages worth looking at. One is the in-house developed server software of Google and the other is Nginx.
Obviously Google won't release its own web server software for your website unless you are willing to host on Google Sites, so the only alternative for a Linux environment seems to be Nginx.
Nginx is pronounced "Engine X" and is a server/proxy type of software specifically developed for high-traffic, low-resource environments. Surprisingly, Nginx is developed and maintained by one man, Igor Sysoev, who wrote the software to support the websites of Rambler. Rambler is a Yahoo-like company in Russia which is highly popular in the Internet community of that country.
So instead of being a piece of lightweight web software developed in a bedroom as a student assignment, Nginx was developed from the beginning to be able to cope with high load and complex website structures.
I am so fed up with an average of 15 Apache instances continuously consuming more than a gigabyte of RAM on my server that I will give Nginx a try in the next few days. Maybe the experiment will succeed, maybe not. But in any case it is worth trying a piece of software which was designed to use resources only as needed, instead of being forced to install a huge amount of memory just because some developers decided to use a mediocre resource management architecture.
I'll keep you informed here.