|Switching from Apache to Nginx|
| 5:03 am on Jul 13, 2010 (gmt 0)|
All my servers run on Linux, and therefore almost automatically all my web pages are served by Apache.
Apache is a stable and versatile web server and can be configured for almost every web serving task. The newest version 2.2 even has good functionality for caching, proxying and load-balancing. But from the beginning I have missed one thing, and that is tunability. The best part of Apache is that it can be used for almost everything, but that comes at a cost. The Apache processes tend to grow during runtime as they serve more page requests, and at some point they need to be shut down just to reclaim allocated memory which was used only once, hours or days before, by one bulky server side script. Fortunately this shutdown and restart of Apache processes is automatic and governed by parameters in the configuration file, but it doesn't sound like an industrial-grade solution.
Another problem with Apache is that in the default prefork mode, a number of processes are spawned which can each serve only one request at a time. On busy websites many of those processes run in parallel, consuming huge amounts of memory, but in practice they are doing almost nothing. Many of the larger sites on the web have a hundred or more parallel Apache processes running. When you look at the activity of these processes with mod_status, you will see what they are doing: almost nothing. On a default web server with KeepAlive connections switched on, at least 80% of the "working" Apache instances are waiting to close a connection. Another bunch of idle processes is waiting for new incoming connections. Only very few are in a productive working state. Switching KeepAlive off will decrease the number of processes waiting for connections to close, but it will increase the number of independent connections coming in from every visiting client, thereby increasing the overall number of active Apache processes.
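To put a number on "almost nothing", the worker states that mod_status reports can be tallied. The sketch below assumes you have copied the scoreboard string from the status page (mod_status also prints it on a "Scoreboard:" line in its machine-readable output). The sample string in the usage part is made up for illustration and is not taken from the server discussed here; the function names are hypothetical.

```python
from collections import Counter

# Key to the one-character worker states shown on the mod_status page.
SCOREBOARD_KEY = {
    "_": "waiting for connection",
    "S": "starting up",
    "R": "reading request",
    "W": "sending reply",
    "K": "keepalive (read)",
    "D": "DNS lookup",
    "C": "closing connection",
    "L": "logging",
    "G": "gracefully finishing",
    "I": "idle cleanup of worker",
    ".": "open slot with no current process",
}


def summarize(scoreboard):
    """Count how many slots are in each known state."""
    counts = Counter(scoreboard)
    return {SCOREBOARD_KEY[c]: n for c, n in counts.items() if c in SCOREBOARD_KEY}


def busy_fraction(scoreboard):
    """Share of running processes that are reading a request or sending a reply."""
    running = [c for c in scoreboard if c != "."]
    if not running:
        return 0.0
    productive = sum(1 for c in running if c in "RW")
    return productive / len(running)


if __name__ == "__main__":
    sample = "KKKKKKKK_W...."  # hypothetical scoreboard, for illustration only
    print(summarize(sample))
    print(f"productive share: {busy_fraction(sample):.0%}")
```

Treating only "R" and "W" as productive is a simplification chosen for this sketch; you may want to count logging or DNS lookups as work too.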
I am currently in a special situation with one of my servers. This server serves one low-traffic website which requires a significant amount of computing resources. Most of the processing is currently done in PHP, and some page requests require the PHP memory limit to be set at 128MB. Most page requests, however, are just for static files like images and CSS files. The result is that after some visits to the site, I have Apache processes of 150MB each sitting in memory, doing not much more than serving image files or waiting for new or closing connections. This is a really silly situation. I like the versatility of the Apache server, but that shouldn't come at the cost of permanently allocating 150MB to serve mostly 10kB files. That is a 1:15000 waste ratio. I have tried to tune this a little by configuring Apache to kill processes as soon as they reach 25 served requests, and this helps a lot because the large script execution happens only every now and then, but practically speaking I am just moving the load from the memory subsystem to the process manager. That is not a solution, it is just a way to obscure the poor resource management architecture of Apache.
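The 1:15000 figure is simply the process size divided by the typical file size. A minimal sketch of the arithmetic, assuming decimal units (1 MB = 1000 kB); the function name is made up for illustration:

```python
# Figures from the post; decimal units (1 MB = 1000 kB) are an assumption.
PROCESS_SIZE_KB = 150 * 1000  # resident size of one grown Apache process
TYPICAL_FILE_KB = 10          # typical static file being served


def waste_ratio(process_kb, payload_kb):
    """Memory held by the process per unit of payload it delivers."""
    return process_kb / payload_kb


if __name__ == "__main__":
    print(f"1:{waste_ratio(PROCESS_SIZE_KB, TYPICAL_FILE_KB):.0f}")
```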
I don't know what will happen if this server gets more traffic. The disk and CPU subsystems are capable of handling a traffic increase of at least a factor of 100, but I will run out of memory long before that. Adding some small high-traffic websites to this server for better utilization of resources is not an option either, because those small sites would still be served by these voluminous Apache instances.
The ideal situation would be one where the web server software is able to wait for new connections and closing connections without consuming a lot of memory, and only spawns a large subprocess when some significant computing needs to be done, but this is not how Apache operates. Some lightweight web servers have been developed in the past which use such an approach, but they all had the problem that they could not be configured for complex web server situations.
But there is hope.
Looking at the June 2010 Web server survey [news.netcraft.com] we see that the market is still dominated by Apache and IIS, but there are two other server software packages worth looking at. One is the in-house developed server software of Google and the other is Nginx.
Obviously Google won't release its own web server software for your website unless you are willing to host on Google Sites, so the only alternative for a Linux environment seems to be Nginx.
Nginx is pronounced "Engine X", and is a server/proxy type of software specifically developed to be used in high-traffic, low-resource environments. Surprisingly, Nginx is developed and maintained by one man, Igor Sysoev, who wrote the software to support the websites of Rambler. Rambler is a Yahoo-like company in Russia which is highly popular in the Internet community in that country.
So instead of being a piece of lightweight web software developed in a bedroom as a student assignment, Nginx was developed from the beginning to be able to cope with high load and complex website structures.
I am so fed up with an average of 15 Apache instances continuously consuming more than a gigabyte of RAM on my server that I will give Nginx a try in the next few days. Maybe the experiment will succeed, maybe not. But in any case it is worth trying a piece of software which was designed to use resources only as needed, instead of being forced to install a huge amount of memory just because some developers decided to use a mediocre resource management architecture.
I'll keep you informed here.
| 1:30 pm on Jul 13, 2010 (gmt 0)|
Great, looking forward to it. Have you reviewed or considered lighttpd [lighttpd.net]?
| 2:30 pm on Jul 13, 2010 (gmt 0)|
I looked at lighttpd a few years ago, but that was for a different setup, a high-performance image server. In the current setup I need things like language negotiation. That's not native functionality in Nginx, but I have figured out that with some server side scripting in the Nginx configuration it should be possible to deliver different language files based on request headers such as Accept-Language.
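To make clear what the negotiation has to do, here is a rough sketch of the selection logic in Python. It is not Nginx configuration and not the setup used on this server; it only illustrates picking a language from an Accept-Language style header. The function names and the fallback rule (try the full tag first, then the primary subtag, then a default) are assumptions made for this example.

```python
from typing import List, Set, Tuple


def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Return (language-tag, q) pairs, highest preference first."""
    prefs = []
    for position, item in enumerate(header.split(",")):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        q = 1.0  # a tag without an explicit q-value counts as q=1
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs.append((tag.strip().lower(), q, position))
    # Sort by descending q; ties keep the order used in the header.
    prefs.sort(key=lambda p: (-p[1], p[2]))
    return [(tag, q) for tag, q, _ in prefs]


def choose_language(header: str, available: Set[str], default: str) -> str:
    """Pick the best available language, falling back to the default."""
    for tag, q in parse_accept_language(header):
        if q <= 0:
            continue
        if tag == "*":
            return default
        if tag in available:
            return tag
        primary = tag.split("-")[0]  # e.g. "de-ch" falls back to "de"
        if primary in available:
            return primary
    return default


if __name__ == "__main__":
    print(choose_language("nl,en;q=0.8,de;q=0.5", {"en", "de"}, "en"))
```

The web server would then map the chosen language to the matching file; how that mapping looks depends on how the language files are named on disk.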
| 4:51 am on Jul 26, 2010 (gmt 0)|
The first main problem I have encountered is deciding how the PHP scripts will be run by Nginx.
Plain CGI: The oldest way to call scripts from a web server is the CGI interface. This interface, which was defined in the early nineties, spawns an external process for every script which has to be executed. The interface is simple and inter-process communication is straightforward, with only one process reporting its output back to the calling web server when exiting. The main problem with old-fashioned CGI, however, is the overhead of every process call. During every call a separate process has to be started, the PHP script has to be interpreted, MySQL database connections have to be set up, and when the script exits, all memory has to be reclaimed by the operating system again.
This causes a significant overhead, especially on high-traffic servers. For the specific situation where I want to use Nginx, this might however be a solution, because the main reason to try the test with Nginx is to reduce memory overhead, not process overhead. If a PHP script grows big during execution, it will only occupy its memory during the execution of the script.
FastCGI: As an improvement of the plain CGI interface, the FastCGI interface has been developed. Instead of creating a single process for every script to be executed, the FastCGI interface controls a pool of waiting PHP instances which execute a script whenever the web server receives a request for it. The idea behind it is that there is no overhead anymore for creating and stopping a process and initializing the PHP interpreter. Also connections with a MySQL server can be maintained between script calls, lowering the overhead in the PHP to database communication. The drawback of this approach is that once a PHP instance grows large because it has executed a bulky script which needed a lot of memory to process, that memory will stay occupied until the PHP process is killed.
mod_php: The most commonly used method to execute PHP scripts currently is the mod_php module, which plugs directly into Apache. Instead of executing PHP via an interface, PHP is integrated in the web server. Each Apache web server instance runs one PHP interpreter, which gives the fastest possible response time but also the largest overhead. On a website which mainly serves images and occasionally executes a script, this causes significant overhead in memory usage. Every instance of Apache will eventually grow big because of the script execution, even if it only serves small images most of the time.
It is obvious that mod_php is not available as an Nginx plug-in, but both implementations of plain CGI and FastCGI can be used with this web server.
Compared with Apache, using FastCGI will reduce the number of PHP capable processes sitting in memory, but it won't necessarily reduce the size of each single process. This is because the FastCGI PHP process uses the same technology to stay resident in memory and wait for new requests as the mod_php implementation does. The difference is that each mod_php instance is glued to a web server instance, while with FastCGI the number of available PHP processes is regulated independently of the number of web server processes. The gain in memory usage should therefore come from better utilization of existing PHP processes, rather than a reduction of the size of each individual process.
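A back-of-the-envelope model of that difference is sketched below. The 15 processes and 150MB per grown process are the figures mentioned earlier in this thread; the number of web server processes, their 5MB size and the pool of 3 PHP workers in the FastCGI case are hypothetical values picked only to show the shape of the calculation. The function names are made up as well.

```python
def mod_php_total_mb(web_processes, grown_process_mb):
    """Worst case with mod_php: every web server process embeds PHP and has grown."""
    return web_processes * grown_process_mb


def fastcgi_total_mb(web_processes, light_web_mb, php_workers, grown_php_mb):
    """Worst case with FastCGI: small web processes plus a separate PHP pool."""
    return web_processes * light_web_mb + php_workers * grown_php_mb


if __name__ == "__main__":
    # 15 processes of 150MB: figures from the thread.
    print("mod_php:", mod_php_total_mb(15, 150), "MB")
    # 2 web processes of 5MB and 3 PHP workers: hypothetical values.
    print("FastCGI:", fastcgi_total_mb(2, 5, 3, 150), "MB")
```

The size of a grown PHP process is the same in both cases; only the number of processes that can grow differs.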
Plain CGI execution with Nginx has the advantage that memory is only occupied by PHP during the execution of a script. If a small script is executed, the memory usage of that specific PHP instance will be fairly low, and large scripts will only occupy memory during their execution phase. The main disadvantage, process overhead, matters less in our specific situation. The website deals with a low amount of traffic, but each visit requires a significant amount of server side processing by a few scripts. The relative overhead of starting new processes is therefore low, at least much lower than it is for a regular website which continuously executes a large number of small scripts.
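The trade-off between the two interfaces can be put into a toy simulation. This is a simplified model under stated assumptions, not a measurement: a plain CGI process returns all its memory on exit, a FastCGI worker keeps the largest size it ever reached, and requests are handed to the workers round-robin. The request sizes, pool size and baseline are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    processes_started: int
    idle_resident_mb: int  # memory still held once all requests are finished


def simulate_plain_cgi(request_peaks_mb):
    """One fresh process per request; everything is freed when it exits."""
    return Stats(processes_started=len(request_peaks_mb), idle_resident_mb=0)


def simulate_fastcgi(request_peaks_mb, workers, baseline_mb):
    """A fixed pool started once; each worker keeps its high-water mark."""
    sizes = [baseline_mb] * workers
    for i, peak in enumerate(request_peaks_mb):
        w = i % workers  # round-robin assignment (assumption)
        sizes[w] = max(sizes[w], peak)
    return Stats(processes_started=workers, idle_resident_mb=sum(sizes))


if __name__ == "__main__":
    peaks = [5, 5, 128, 5, 5, 5]  # hypothetical peak memory per request, in MB
    print(simulate_plain_cgi(peaks))
    print(simulate_fastcgi(peaks, workers=2, baseline_mb=10))
```

With these made-up numbers plain CGI starts six processes and holds nothing afterwards, while the FastCGI pool starts two processes and keeps 138MB resident because one worker ran the single large script.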
The road plan as I have set it now is therefore the following.
- Install Nginx on the same server as Apache, on a separate IP address, listening on port 80
- Configure Nginx to use the current PHP base files in a plain CGI configuration
- Measure its performance, both regarding speed and memory usage
- Configure Nginx to use FastCGI
- Measure its performance
- If there is a significant improvement of Nginx over Apache in either of the two configurations, try to use CGI or FastCGI with Apache for the critical scripts and see how a hybrid configuration of Apache (partly mod_php, partly CGI) performs, both memory-wise and in speed.
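For the memory measurements in this plan, one possible approach is to sum the resident set size of all processes with a given name. The sketch below assumes a Linux system with the procps version of ps, where "ps -C NAME -o rss=" prints one RSS value in kB per matching process. The process name in the usage part is an assumption (it may be apache2, httpd, nginx or something else on your system), and the helper names are made up.

```python
import subprocess


def parse_rss_kb(ps_output):
    """Turn the output of 'ps -o rss=' (one number per line) into a list of ints."""
    return [int(value) for value in ps_output.split()]


def summarize_rss(ps_output):
    """Number of processes, total and largest resident size in MB."""
    sizes = parse_rss_kb(ps_output)
    return {
        "processes": len(sizes),
        "total_mb": sum(sizes) / 1024,
        "largest_mb": max(sizes, default=0) / 1024,
    }


def measure(process_name):
    """Run ps for one process name; ps exits non-zero when nothing matches."""
    result = subprocess.run(
        ["ps", "-C", process_name, "-o", "rss="],
        capture_output=True,
        text=True,
        check=False,
    )
    return summarize_rss(result.stdout)


if __name__ == "__main__":
    print(measure("apache2"))  # process name is an assumption
```

Note that RSS counts memory pages shared between processes once for each process, so the total overstates real memory use; it is still good enough for comparing two setups on the same machine.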