
Apache Web Server Forum

    
High Performance Cache with mod-rewrite?
How to deliver dynamic content statically, plain or compressed if possible
spyder

10+ Year Member



 
Msg#: 1098 posted 12:33 am on Feb 19, 2004 (gmt 0)

Hello everyone,

I'm looking for a high-performance way to deliver cached pages (compressed if possible, uncompressed otherwise) and to regenerate content dynamically whenever it has changed.

Right now, all content of my high-traffic bulletin board is served through a forum.php script, which looks at the $_SERVER['HTTP_ACCEPT_ENCODING'] variable to determine whether to send a plain .html file or a gzipped one if the browser can handle it. The script then checks a cache directory to see whether a cached version is already present. If it is, fine, deliver it. If not, the appropriate script is called to generate the front page, a forum's contents or a thread. Besides delivering the page, each script also stores a plain and a gzipped version in the cache directory.

Whenever a page needs an update, the script simply deletes the corresponding file in the cache directory, and the page will be regenerated the next time it is requested.
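
In (very simplified) code, the flow looks roughly like this; the file naming and the build_forum_page() function are only illustrative:

    // rough sketch of the current forum.php flow; build_forum_page() is hypothetical
    $accept = isset($_SERVER['HTTP_ACCEPT_ENCODING']) ? $_SERVER['HTTP_ACCEPT_ENCODING'] : '';
    $gzipOk = strpos($accept, 'gzip') !== false;

    $dir  = $_SERVER['DOCUMENT_ROOT'] . '/forum/mycache/';
    $key  = trim($_SERVER['REQUEST_URI'], '/');     // e.g. "forum/topic1/123"
    $file = $dir . $key . ($gzipOk ? '.zip' : '.html');

    if (is_file($file)) {                           // cached copy exists: just stream it
        if ($gzipOk) {
            header('Content-Encoding: gzip');
        }
        header('Content-Type: text/html');
        readfile($file);
        exit;
    }

    // nothing cached yet: build the page, store both variants, send the plain one
    // (assumes the subdirectories below mycache/ already exist)
    $html = build_forum_page($_SERVER['REQUEST_URI']);
    file_put_contents($dir . $key . '.html', $html);
    file_put_contents($dir . $key . '.zip', gzencode($html, 2));
    header('Content-Type: text/html');
    echo $html;

    // invalidation after a new post is just as simple:
    // unlink($dir . $key . '.html');
    // unlink($dir . $key . '.zip');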

So far so good, or not?

I think calling a PHP script just to serve a cached file is not the perfect solution, even more so as I have to run PHP as a CGI, not as an Apache module. My investigation into mod_rewrite led me to something like this in my /forum directory:

    RewriteEngine On
    RewriteBase /forum
    RewriteRule ^(.+)/$ mycache/forum/$1.html [T=text/html]
    RewriteRule ^$ mycache/forum.html [T=text/html]

Now

  • /forum/ will be fetched from /mycache/forum.html,
  • /forum/topic1/ from /mycache/forum/topic1.html,
  • /forum/topic1/123/ from /mycache/forum/topic1/123.html, etc.
    (yes, all my URLs look like directories from the outside).

If the requested file in the mycache directory doesn't exist, a 404 error needs to be avoided; I plan to do that by placing another .htaccess file in the mycache directory that handles this case and calls my forum.php script, roughly as in the sketch below.
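
What I have in mind for that fallback .htaccess is something along these lines (just a sketch; the query parameter name is made up):

    RewriteEngine On
    RewriteBase /forum/mycache
    # if the rewritten request points at a cache file that doesn't exist,
    # hand it to forum.php as a normal internal rewrite instead of a 404
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteRule ^(.*)$ /forum/forum.php?page=$1 [L]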

Now my questions:

  1. Is mod_rewrite able to act according to the browser's ability to handle compressed pages? (If not, I can think of a cookie carrying this information from the second page a user requests onwards.)

  2. How should the .htaccess file in the mycache directory hand the request over to my forum.php script without sending the disturbing 404 header?

  3. If question 2 can be handled, what about the performance of this solution compared with something like:

    RewriteCond %{REQUEST_URI} ^/(.+)/$
    RewriteCond path/to/root/forum/mycache/%1.html -f
    RewriteRule ^.*$ mycache/%1.html

The first solution is, imho, the fastest way to an existing cached version, but it needs an ugly 404 handler in case the file doesn't exist. The second way avoids the 404 issue, but I guess the price is a loss of performance, as each file has to be looked up in the mycache directory first.

Both CPU load/RAM limitations and bandwidth concerns apply in my case, so I'm looking for a solution that handles the plain and compressed cache without unnecessarily calling scripts. Unfortunately, I don't have the option to change the configuration of my Apache server, but at least it understands mod_rewrite.

What should I do? Any idea is highly appreciated.

 

jdMorgan

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 1098 posted 3:07 pm on Feb 19, 2004 (gmt 0)

spyder,

Welcome to WebmasterWorld [webmasterworld.com]!

I've never tried what you're doing, but I'll offer a couple of comments.

You may be able to check the browser capabilities with something like:
RewriteCond %{HTTP_ACCEPT_ENCODING} gzip

You can avoid the need for the URL-to-filename conversion rule in your question 3 example by using
RewriteCond %{REQUEST_FILENAME} -f instead of the RewriteCond %{REQUEST_URI} construct shown.

The "-f" check for file exists should not be too much of a problem. After all, the server itself has to check this in order to decide whether to return a 404 or serve the file. It's likely you'll only have performance problems if you use "-F" or "-U", which invoke an internal subrequest and are therefore relatively slow. Set up your rules such that they will match only cacheable subdirectories and filetypes for maximum efficiency.

In addition, you could invoke a script to create a new cached version if the -f check fails. Actually, you'd probably want to go ahead and serve the plain version, but "trigger" a script to create a new cached copy of the file that was requested. All kinds of options here, all based on performance tuning...

I'd be very interested in a summary of what you learn while implementing this project -- and others may be, too. We've had some general discussion of using gzip recently, so maybe the ideas of caching and compressing are finally gaining momentum.

Jim

spyder

10+ Year Member



 
Msg#: 1098 posted 5:55 pm on Feb 19, 2004 (gmt 0)

Thanks Jim,

you've already helped me a great deal. I had read about a slowdown from using RewriteCond %{REQUEST_FILENAME} -f, but I think I was misguided and confused it with "-U", which you mentioned as being slow because it invokes an internal subrequest.

The reason why I use:

    RewriteCond %{REQUEST_URI} ^(.+)/$
    RewriteCond path/to/root/forum/mycache%1.html -f

instead of:

    RewriteCond %{REQUEST_FILENAME} -f

is the following: my %{REQUEST_URI} or %{REQUEST_FILENAME} ends with a slash, because all my URLs look like directories for historical reasons. E.g. I have three URLs to cache:

    /forum/
    /forum/topic1/
    /forum/topic1/123/

I planned to store them as

    /mycache/forum.html
    /mycache/forum/topic1.html
    /mycache/forum/topic1/123.html

instead of

    /mycache/forum/index.html
    /mycache/forum/topic1/index.html
    /mycache/forum/topic1/123/index.html

to avoid the unnecessary creation of subdirectories. Therefore, the trailing slash needs to be cut off before .html is appended. Is there another, sleeker way to do that?
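
The only alternative I can think of (untested, just a sketch) would be to do the capture in the RewriteRule pattern itself and reuse it in the file test, since mod_rewrite evaluates the conditions only after the rule's pattern has matched:

    # $1 is the URL path without the trailing slash, captured by the rule pattern
    RewriteCond /path/to/root/forum/mycache/forum/$1.html -f
    RewriteRule ^(.+)/$ mycache/forum/$1.html [L]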

I tried RewriteCond %{HTTP_ACCEPT_ENCODING} gzip, but it didn't work. On pages 12-13 of the Apache documentation (mod_rewrite.pdf) there is a list of the available server variables:

%{NAME_OF_VARIABLE} where NAME_OF_VARIABLE can be a string taken from the following list:

HTTP headers:

    HTTP_USER_AGENT
    HTTP_REFERER
    HTTP_COOKIE
    HTTP_FORWARDED
    HTTP_HOST
    HTTP_PROXY_CONNECTION
    HTTP_ACCEPT

connection & request:

    REMOTE_ADDR
    REMOTE_HOST
    REMOTE_USER
    REMOTE_IDENT
    REQUEST_METHOD
    SCRIPT_FILENAME
    PATH_INFO
    QUERY_STRING
    AUTH_TYPE

server internals:

    DOCUMENT_ROOT
    SERVER_ADMIN
    SERVER_NAME
    SERVER_ADDR
    SERVER_PORT
    SERVER_PROTOCOL
    SERVER_SOFTWARE

system stuff:

    TIME_YEAR
    TIME_MON
    TIME_DAY
    TIME_HOUR
    TIME_MIN
    TIME_SEC
    TIME_WDAY
    TIME

specials:

    API_VERSION
    THE_REQUEST
    REQUEST_URI
    REQUEST_FILENAME
    IS_SUBREQ

I think %{HTTP_COOKIE} will do the job: not on the first page a user requests (there is no cookie yet), but on every subsequent one after I've set a cookie according to the browser's capabilities.
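
As a sketch (the cookie name "gz" is made up; it would be set by the PHP script once it has seen $_SERVER['HTTP_ACCEPT_ENCODING']), the condition could look like:

    # only use the zipped cache if the browser has been marked as gzip-capable
    RewriteCond %{HTTP_COOKIE} (^|;\s*)gz=1
    RewriteCond %{REQUEST_URI} ^(.*)/$
    RewriteCond /path/to/root/forum/mycache%1.zip -f
    RewriteRule ^.*$ mycache%1.zip [L]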

After searching a little bit more, I stumbled over the apache docs about mod_rewrite with the following subheader: On-the-fly Content-Regeneration ([engelschall.com ]). They call it esoteric :-) but it is pretty much the same idea.

so maybe the ideas of caching and compressing are finally gaining momentum.

You said it. My website is mainly a BBS with 20 posts per page and 5 million page impressions per month at a bandwidth of 50 GB. That works out to about 10 KB per page including graphics and everything.

Sounds impossible? No, it's not.

I use graphics sparingly, external stylesheets extensively (cacheable), no table constructs or other wasted code, all navigation stuffed into an external JavaScript file (cacheable), gzipped whenever possible, and everything works like a charm... except that RAM consumption on a 256 MB Linux server is a little too high, causing "CGI limits reached" errors repeatedly during the rush hour.

I'll keep you updated on my experience.

PS: %{HTTP_ACCEPT} allows testing for graphics formats and, e.g., whether a client accepts Shockwave, but not for gzip support.

spyder

10+ Year Member



 
Msg#: 1098 posted 11:32 pm on Feb 19, 2004 (gmt 0)

Testing for the existence of a cached version and calling a script if it isn't found now works smoothly using:

    RewriteEngine On
    RewriteBase /forum
    RewriteCond %{REQUEST_URI} ^(.*)/$
    RewriteCond path/to/root/forum/mycache%1.html -f
    RewriteRule ^.*$ mycache%1.html

    RewriteRule ^([^/]+)/$ forumlist.php?myforum=$1
    RewriteRule ^([^/]+)/(\d+)/$ thread.php?myforum=$1&mythread=$2

    DirectoryIndex index.php

The next problem: how do I do the same with a gzipped version? To create a gzipped file with PHP, I do:

    $text = "Hello World";

    $zipped = "\x1f\x8b\x08\x00\x00\x00\x00\x00";
    $zipped .= substr(gzcompress($text, 2), 0, -4);

    $fp = fopen("$_SERVER[DOCUMENT_ROOT]/forum/mycache/forum/test.zip", 'w');
    flock ($fp,2);
    fwrite($fp, $zipped);
    flock ($fp,3);
    fclose($fp);

If I want to send it out with php I do:

    header('Content-Encoding: gzip');
    header('Content-Type: text/html');
    // readfile() streams the file straight to the client
    readfile("$_SERVER[DOCUMENT_ROOT]/forum/mycache/forum/test.zip");
    exit;

So how should I do it now? Write the headers into the .zip file? Send the headers via mod_rewrite? Right now I see binary output in my browser if I try to serve the zipped file through mod_rewrite.

Thanks in advance

spyder

10+ Year Member



 
Msg#: 1098 posted 12:19 am on Feb 23, 2004 (gmt 0)

As promised, I just wanted to let everyone know about my "esoteric" cache 'n' zip rewrite rules.

First of all, I finally found a direct way to check the browser's ability to handle gzipped content: it's the %{HTTP:Accept-Encoding} variable I had overlooked before, similar to what Jim had suggested. Secondly, we need to tell the browser about the compressed nature of the content (Content-Encoding) in case we deliver a gzipped file. And finally, we should tell the browser what kind of content (text/html) we deliver.

Putting it all together, we get code like:

    # serve pre-compressed cache files with the right encoding and type
    AddEncoding x-gzip .zip
    AddType text/html .zip
    AddType text/html .txt

    RewriteEngine On
    RewriteBase /forum

    # browser accepts gzip and a zipped cache file exists: deliver it
    RewriteCond %{HTTP:Accept-Encoding} gzip
    RewriteCond %{REQUEST_URI} ^(.*)/$
    RewriteCond /path/to/root/mycache%1.zip -f
    RewriteRule ^.*$ /mycache%1.zip [L]

    # no gzip support, but a plain cache file exists: deliver that instead
    RewriteCond %{HTTP:Accept-Encoding} !gzip
    RewriteCond %{REQUEST_URI} ^(.*)/$
    RewriteCond /path/to/root/mycache%1.txt -f
    RewriteRule ^.*$ /mycache%1.txt [L]

    # nothing cached yet: fall through to the scripts that build the page
    RewriteRule ^([^/]+)/$ forumlist.php?myforum=$1 [L]
    RewriteRule ^([^/]+)/(\d+)/$ thread.php?myforum=$1&mythread=$2 [L]
    ...

Alternatively, one could replace the last two (or more) lines with a redirect to a wrapper script, which decides what to do and takes care of creating new cached versions.

    RewriteRule ^(.*)$ /forum.php?$1 [L]

I'm using these settings now and see a slight but noticeable reduction in response time. More important is the reduction in server load, which has noticeably eased the traffic jam ("CGI limits reached") during my website's rush hours.

Good luck.

PS: I was not able to use %{DOCUMENT_ROOT} in my rewrite rules; instead I had to hardcode /path/to/root/, as the value was quite different. I have no idea why %{DOCUMENT_ROOT} differs from PHP's $_SERVER['DOCUMENT_ROOT']. And %{ENV:DOCUMENT_ROOT} did not contain any value at all.

ruserious

10+ Year Member



 
Msg#: 1098 posted 10:20 am on Mar 10, 2004 (gmt 0)

Is there a special reason why you are not using the caching classes that are available for PHP? There is, for example, jpcache, and there are also two PEAR classes that offer caching functions.
I am very happily using jpcache. Adding it is very easy; with auto_prepend you won't even have to edit your scripts...
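
For reference, with PHP running as an Apache module the prepend can be done in .htaccess with something like this (the path is a placeholder; with PHP as a CGI you would set auto_prepend_file in php.ini instead):

    php_value auto_prepend_file /path/to/jpcache/jpcache.php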

spyder

10+ Year Member



 
Msg#: 1098 posted 3:39 pm on Mar 12, 2004 (gmt 0)

Is there a special reason why you are not using the caching classes that are available for PHP? There is, for example, jpcache, and there are also two PEAR classes that offer caching functions.
I am very happily using jpcache. Adding it is very easy; with auto_prepend you won't even have to edit your scripts...

There are several ways to do that with Perl and PHP. Unfortunately, on my old server both Perl and PHP were invoked as CGIs and not as Apache modules, so on my heavy-traffic forum I had a lot of problems with CGI limits. My assumption was that doing the job without starting PHP for every single page request would improve performance.

Now I have a better server with four times the RAM and Perl/PHP as Apache modules. Looking back, I cannot say for certain whether my caching mechanism improved performance on the old server; the time to judge was just too short. But I am still curious whether serving the cached pages would be a performance win in this situation too.

The downside of the described solution: I cannot control the headers on my managed server when Apache reads the files directly (this would be different if I could change my Apache configuration, but unfortunately I cannot). So some pages get cached on the user side or by proxies in cases where I don't want them to be. Meta tags didn't really help me here, as they are frequently ignored by proxies, for example.

PHP would allow me to influence the headers, so I'm thinking of using the PHP-based mechanisms you mentioned.
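
Something as simple as this sketch should already give me control over the caching headers when a cached copy is pushed through PHP (the file name is just an example):

    // keep proxies and browsers from caching the page themselves
    header('Cache-Control: private, must-revalidate');
    header('Content-Type: text/html');
    readfile($_SERVER['DOCUMENT_ROOT'] . '/forum/mycache/forum/topic1.html');
    exit;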
