I am serving websites from an Apache 2.0 server. These sites consist of a combination of static and dynamically generated content. Because a large percentage of my audience lives in parts of the world where internet speed is slow (dial-up connections with 2 to 3 kByte per second max), I use Apache's compression feature to serve small files whenever possible. This is done with the mod_deflate module available in Apache 2.0. Users of Apache 1.3 could use mod_gzip instead.
Adding on-the-fly compression is a nice feature, but it has some drawbacks. One is that mod_deflate compresses the content every time a browser requests it. This costs time and CPU overhead. For dynamically generated content there is no alternative, but static content is another issue. It doesn't feel good to see thousands of requests per day in your access_log for the same static file, knowing that each request started the compression algorithm to generate exactly the same output.
Furthermore, the mod_deflate module available in Apache 2.0 doesn't generate the optimal compressed file. The results are worse than can be achieved with native gzip compression. I have set the compression level to 1 with the option DeflateCompressionLevel 1, but even when set to 9, the result is worse than regular gzip compression. For the files I tested the difference is somewhere between 10 and 25%. The size difference between compression level 1 and 9 is so small that I decided to use the fastest compression method, thereby reducing CPU overhead and sending output quicker to clients on broadband connections. They would otherwise see an annoying delay when large chunks of data first have to be compressed at a high compression level.
The optimal solution worked out for .js files
I have thought of a solution to the problem mentioned above. It mainly occurs with my external JavaScript files. They are quite large and most of them do not change often. However, some do change often and I don't want to generate a compressed copy of the file every time I make a change. So ideally it would be a system that can handle both pre-compressed files, which were processed with gzip at the maximum compression level, and plain JavaScript files, which have no pre-compressed version and are handled by mod_deflate on the fly.
Content negotiation should be taken into account, because not all browsers accept compressed content. Because I have a separate directory for all my .js files, I tested the following code:
<Directory "/var/www/html/scripts">
ExpiresActive On
ExpiresDefault "access plus 2 hours"
Options +MultiViews
AddEncoding x-gzip .gz
AddType application/x-javascript .gz
BrowserMatch ^Mozilla/4 no-gzip
BrowserMatch \bMSIE !no-gzip
Header append Vary User-Agent
AddOutputFilter DEFLATE js
</Directory>
A small explanation:
The Expires directives are used to tell the browser and intermediate proxies that the retrieved content can be cached for the next two hours. This will cause the browser to use the cached version instead of asking the server whenever a new page from the site is loaded which requires that JavaScript file. During one visitor session, there will normally be no need to retrieve a file multiple times. If however the visitor comes again another day, the setting of 2 hours will make sure he will receive a fresh version of the file.
The +MultiViews option switches automatic file search on whenever a file doesn't exist. For example if the file test.js is requested but doesn't exist, Apache will search for all files with the pattern test.js.* and decide which one to return to the browser.
The AddEncoding and AddType directives are used to tell Apache that all the .gz files in the directory are in fact gzip-encoded JavaScript files. If these lines are not present, Apache will return the default content type for .gz files, application/x-gzip, which the browser does not treat as a script.
The BrowserMatch directives disable compression for old Mozilla/4 browsers with known bugs, and re-enable it for MSIE, which also identifies itself as Mozilla/4. The Header append directive tells intermediate caching proxies that the content may change depending on the browser identification string.
The last directive adds the default mod_deflate compression filter for regular js files found.
Process Flow if .js file exists
If the browser requests a .js file which exists, the browser type is checked to see whether it is one of the old, buggy versions. Apache also checks whether the browser accepts compressed content. If the browser is not excluded and accepts compression, the .js file is compressed by mod_deflate and sent to the browser. If compression is not possible, the script file is sent without modifications.
Process Flow if .js file does not exist
The browser request for a non-existent file (for example test.js) causes the MultiViews system to start. It looks for all files matching the pattern test.js.* and checks which file types are accepted by the browser. Either the plain version (test.js.en) or the gzipped version (test.js.en.gz) is returned.
Remaining questions
The system as mentioned above works. I have however some questions and remarks left for optimizing.
Having only read it twice, and with little time to think about it deeply, I'd just like to offer one very general idea. You might find some benefit from using mod_rewrite's RewriteCond %{REQUEST_FILENAME} -f and RewriteRules with the [T=] (type) flag to solve some of your file-exists and MIME-type variations. You could in fact replace content-negotiation with a series of 'file-exists' checks, with rewrites to the preferred files/filetypes based on the results. This could be 'centrally-controlled' with one or more RewriteMaps.
However, I have no idea what the performance implications would be.
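To make the idea a little more concrete, here is a rough, untested sketch of such a 'file-exists' check. It assumes the rules live in the main server or virtual host context, that the DocumentRoot is /var/www/html so the scripts directory is reachable as /scripts/, and that the pre-compressed copy of test.js is stored next to it as test.js.gz. Those names are only assumptions for illustration.

```apache
RewriteEngine On

# Only consider clients that advertise gzip support
RewriteCond %{HTTP:Accept-Encoding} gzip
# Only rewrite when a pre-compressed copy exists on disk
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI}.gz -f
# Serve the .gz copy, force the JavaScript MIME type, and set no-gzip
# so that mod_deflate leaves the already compressed file alone
RewriteRule ^/scripts/(.+)\.js$ /scripts/$1.js.gz [T=application/x-javascript,E=no-gzip:1,L]
```

The Content-Encoding header would still have to come from the existing AddEncoding x-gzip .gz line, and since the response now depends on Accept-Encoding, caches should be told so with a Vary header. Requests for .js files without a .gz copy fall through to mod_deflate as before.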
Jim
... the mod_deflate module available in Apache 2.0 doesn't generate the optimal compressed file. The results are worse than can be achieved with native gzip compression.
How did you measure the compression results, lammert? Did you use DeflateFilterNote [httpd.apache.org] to place the compression ratio in a note for logging? The compression algorithm (deflate) used by gzip (also zip and zlib) is likely the same [gzip.org], which is why I wonder which method you were using to note the compression percentage differences. There is a note in the zlib FAQ that I found interesting, and this seemed a good place to share it:
Ok, so why are there two different formats? [zlib.net] The gzip format was designed to retain the directory information about a single file, such as the name and last modification date. The zlib format on the other hand was designed for in-memory and communication channel applications, and has a much more compact header and trailer and uses a faster integrity check than gzip.
Apache configure [httpd.apache.org] searches automatically for an installed zlib library if your source configuration requires one (e.g., when mod_deflate is enabled). I was wondering if you had a choice, you know? Guess not. It makes sense though, once you think about what is happening, the page being compressed in memory and all. Guess the contributing developers knew what they were doing, eh? ;-)
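For anyone who wants to log the ratio with DeflateFilterNote as mentioned above, a setup along the lines of the example in the mod_deflate documentation looks like this. The note names (instream, outstream, ratio), the format nickname and the log file name are arbitrary choices, not anything taken from lammert's configuration.

```apache
# Store the byte counts before and after compression, and the ratio, in notes
DeflateFilterNote Input instream
DeflateFilterNote Output outstream
DeflateFilterNote Ratio ratio

# Write request line, output/input bytes and ratio (in percent) to a separate log
LogFormat '"%r" %{outstream}n/%{instream}n (%{ratio}n%%)' deflate
CustomLog logs/deflate_log deflate
```

With this in place each compressed response gets its own line in logs/deflate_log, so the sizes do not have to be derived from the transmitted size in access_log.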
MultiViews versus type-maps -- good question. Apache has definitely made it easy to set up Content Negotiation for a lazy programmer. I know of a large site running MultiViews with major traffic. Expensive? Well, if it is, it is certainly quite difficult to notice. I have never tested an installation with both structures set up to see which content is served or how Apache would determine priority. But I'm guessing as you are, it's either one way or the other because if it finds a type-map it certainly shouldn't need to build one as a fallback. As I said though, I've never tested the theory. As far as maintaining the type-map manually, if you are adding a new file to the directory, wouldn't you just update the type-map at that time as well?
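For reference, a hand-maintained type-map for the test.js example could look roughly like the sketch below. The file name test.js.var is an assumption; the variant names test.js.en and test.js.en.gz are the ones from the first post. I have not tested this.

```apache
# test.js.var: hypothetical type-map listing the two variants of test.js
# (requires "AddHandler type-map .var" in the server configuration)
URI: test.js.en
Content-type: application/x-javascript
Content-language: en

URI: test.js.en.gz
Content-type: application/x-javascript
Content-language: en
Content-encoding: x-gzip
```

Each record is separated by a blank line, and adding a new variant to the directory means adding one more record here.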
I recently installed a new Apache version and expected the ratio to be the same. Therefore I didn't rerun the test. Because of your post I just reran the tests I did previously. Now both gzip and mod_deflate obviously use the same routines. I used the transmitted size in the access_log as the compressed size for mod_deflate.
Original: 23944 bytes
level 1 - mod_deflate: 7133 bytes
level 1 - gzip: 7141 bytes
level 9 - mod_deflate: 5866 bytes
level 9 - gzip: 5858 bytes
The remaining difference in length could well be the extra path information as you suggested.
So this cleared up one problem, the absolute difference between gzip and mod_deflate compression. I am currently running some benchmarks with different setups to see how compression and MultiViews influence performance. As soon as something reasonable comes out I will post it here.