A few days ago, I was given the "out of the box" .htaccess you can see below, and I put it at the root of my 80,000 pv/day website.
It seemed to work perfectly, until it overloaded and then crashed my dedicated (Unix/Apache) server.
I ran a lot of long and unsuccessful tests on my PHP scripts before finally discovering that once this .htaccess was removed, server CPU usage immediately fell from 99% to 5%!
Since then, I have been spending my time trying to find out what was so server intensive, and how to build a smoother .htaccess.
Many of you probably know or can imagine what it is like when you have to shut down a cool, addictive new service that you opened to your site members the day before...
Enough talk, here is the .htaccess I was given:
RewriteEngine On
Options +Followsymlinks
RewriteBase /
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^.* - [L,QSA]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([^/]+)/page([^/]+)/?$ /index.php?u=$1&page=$2 [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([^/]+)/profile/?$ /profile.php?u=$1 [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([^/]+)/archive/([^/]+)/([^/]+)/?$ /archive.php?u=$1&y=$2&m=$3 [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([^/]+)/archive/([^/]+)?/?$ /archive.php?u=$1 [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([^/]+)/archive/?$ /archive.php?u=$1 [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([^/]+)/([^/]+)/?$ /entry.php?u=$1&e_id=$2 [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([^/]+)/?$ /index.php?u=$1 [L]
ErrorDocument 404 /404.htm
Here is what I found at the beginning of the official Apache URL rewriting guide [httpd.apache.org]:
The crazy and lazy can even do the following in the top-level .htaccess file of their homedir. But notice that this creates some processing overhead.
RewriteEngine on
RewriteBase /~quux/
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^(.+[^/])$ $1/ [R]
I wonder whether the problem is the use of %{REQUEST_FILENAME} in the top-level directory, or the way it was used in my .htaccess, or something that was badly written. Or could the overload be due to the fact that we have a forum directory with its own .htaccess, which may have caused recursion?
Here is a new .htaccess I wrote to replace the CPU-burning one. As you can see, some things are no longer processed by .htaccess, but by PHP. Don't mind the comments and blank lines; I will delete them when I upload the file to my server. They are only here to help understanding:
RewriteEngine On
Options +Followsymlinks

# checks before all if we are calling a physical file or directory
RewriteCond %{REQUEST_FILENAME} -d [OR]
RewriteCond %{REQUEST_FILENAME} -f
# if yes stop and do nothing
RewriteRule ^.*$ - [L]

# then checks if it is a "/username/archive/2005/4/" like page
RewriteRule ([^/]+)/archive/([^/]+)/([^/]+) archive.php?u=$1&y=$2&m=$3 [L]

# then is it a "/username/something/" or "/username/something" like page
RewriteRule ([^/]+)/([^/]+) index.php?u=$1&task=$2 [L]

# then finally is it the user homepage ( "/username/" or "/username")
RewriteRule ([^/]+) index.php?u=$1 [L]
So what can you say? Can you help? Thanks.
Welcome to WebmasterWorld!
The basic problem here is that there is a check for either "file exists" or "directory exists" before every rule. This results in a call to the filesystem to see if that resource exists for every request. There are other ways to identify the requests which should not be handled by your php scripts.
You might be better off actually excluding all of the resources that you know exist and should not be handled by your scripts, rather than having the server try to find them first. Examples would be images and multimedia files, CSS, external JavaScript files, etc. Just for example:
RewriteEngine On
Options +Followsymlinks
RewriteBase /
RewriteCond $1 \.(gif¦jpe?g¦css¦mpe?g)$ [OR]
RewriteCond $1 ^robots\.txt$ [OR]
RewriteCond %{REQUEST_FILENAME} -d [OR]
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^(.*) - [L]
#
RewriteRule ^([^/]+)/page([^/]+)/?$ /index.php?u=$1&page=$2 [L]
#
RewriteRule ^([^/]+)/profile/?$ /profile.php?u=$1 [L]
#
RewriteRule ^([^/]+)/archive/([^/]+)/([^/]+)/?$ /archive.php?u=$1&y=$2&m=$3 [L]
#
RewriteRule ^([^/]+)/archive/([^/]+)?/?$ /archive.php?u=$1 [L]
#
RewriteRule ^([^/]+)/archive/?$ /archive.php?u=$1 [L]
#
RewriteRule ^([^/]+)/([^/]+)/?$ /entry.php?u=$1&e_id=$2 [L]
#
RewriteRule ^([^/]+)/?$ /index.php?u=$1 [L]
A minor tweak that may improve performance a tiny bit would be to combine rules like
RewriteRule ^([^/]+)/profile/?$ /profile.php?u=$1 [L]
and
RewriteRule ^([^/]+)/archive/?$ /archive.php?u=$1 [L]
into
RewriteRule ^([^/]+)/(archive¦profile)/?$ /$2.php?u=$1 [L]
You can often take advantage of the fact that RewriteConds are not processed unless the pattern of the RewriteRule matches, so more RewriteConds are not necessarily much slower, *unless* the pattern to be matched in the RewriteRule is very ambiguous, such as ".*". See the mod_rewrite documentation for details on RewriteRule and RewriteCond processing order.
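To illustrate the processing order described above, here is a minimal sketch that reuses the archive rule from this thread. Attaching the filesystem check to this particular rule is only an assumption for the sake of the example, not a recommendation for your site:

```apache
RewriteEngine On
RewriteBase /

# The pattern is tested first. Only URLs shaped like "username/archive"
# or "username/archive/" go on to the RewriteCond below, so the
# filesystem lookup is skipped for every other request.
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([^/]+)/archive/?$ /archive.php?u=$1 [L]
```

With a pattern like ^.* in the same position, the RewriteCond would be evaluated on every single request, which is the expensive case.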
I should note here that all of the above is based on questions: Is the majority of requests to your server for images and other included objects? Are there many directories and pages that do exist as separate files and are not handled by php? The amount of improvement you might see from implementing the changes described above will vary greatly depending on the answers to these questions.
It is important to know what the 'mix' of requests to your site looks like, in order to do optimization. The basic idea is to identify those requests which are both very frequent and not handled by your php, and to exit from mod_rewrite as fast as possible if they are requested. Then do the slow filesystem checking later.
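As a rough sketch of that ordering, something like the following could serve as a starting point. The extension list and the choice of which rules come first are assumptions; the right order depends on your own request mix:

```apache
RewriteEngine On
RewriteBase /

# 1. Very frequent requests that never need rewriting: exit at once
#    with a cheap string match and no filesystem access.
RewriteRule \.(php|gif|jpe?g|png|css|js)$ - [L]

# 2. Rewrites that can be recognized from the URL alone.
RewriteRule ^([^/]+)/profile/?$ /profile.php?u=$1 [L]

# 3. The slow filesystem checks come last, and only for the
#    ambiguous catch-all pattern that really needs them.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([^/]+)/?$ /index.php?u=$1 [L]
```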
If you are on a dedicated server, you should certainly inquire into getting the code installed in httpd.conf. Code in httpd.conf is compiled at server start-up as opposed to being interpreted on every HTTP request, and is therefore much more efficient.
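A minimal sketch of what that might look like in httpd.conf follows. The host name example.com and the path /var/www/example are hypothetical placeholders, and your existing access-control settings are assumed to stay as they are:

```apache
# httpd.conf (sketch; example.com and /var/www/example are placeholders)
<VirtualHost *:80>
    ServerName example.com
    DocumentRoot /var/www/example

    <Directory /var/www/example>
        # With overrides disabled, Apache stops looking for
        # .htaccess files in this tree on each request.
        AllowOverride None
    </Directory>

    RewriteEngine On
    # In this context the pattern is matched against the full
    # URL-path, which includes the leading slash.
    RewriteRule ^/([^/]+)/profile/?$ /profile.php?u=$1 [L]
</VirtualHost>
```

Note that the server must be restarted or reloaded for changes in httpd.conf to take effect, unlike .htaccess, which is re-read on every request.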
Note: WebmasterWorld member Andreas Friedrich did a series of benchmark tests that showed that in httpd.conf, it is faster to use individual RewriteConds to process alternate possible patterns, rather than using the (a¦b¦c) construct in a single RewriteCond. Take this into account if you port the code to httpd.conf.
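For illustration, this is how a single alternation might be split into individual conditions for httpd.conf. The shortened extension list is an assumption, and whether the split is actually faster on your server is something to measure rather than take from this sketch:

```apache
# One RewriteCond using alternation:
#   RewriteCond $1 \.(gif|jpe?g|css)$
#
# The same test written as individual conditions chained with [OR]:
RewriteCond $1 \.gif$ [OR]
RewriteCond $1 \.jpe?g$ [OR]
RewriteCond $1 \.css$
RewriteRule ^/(.*) - [L]
```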
Change all broken pipe "¦" characters in the code above to solid pipe characters before use. Posting on this board modifies them.
Jim
I tried to apply all this, and wow... server CPU usage has never been so low!
First, I stopped using .htaccess to check whether files or directories exist.
The most common requests are for files or directories that do not need to be rewritten at this level, so first:
RewriteEngine On
Options +Followsymlinks
RewriteCond $1 \.(php¦gif¦jpe?g¦png¦css¦js¦swf¦txt)$ [OR]
RewriteCond $1 ^(forum¦images¦blogs¦uploads¦include¦lang¦cache)/?$
RewriteRule ^(.*) - [L]
...etc
My whole .htaccess is now ordered by descending popularity.
Actually, I have more than 40 physical directories to check for.
It does not seem to strain the server, even with this big RewriteCond line. Better many dumb, precise matches than heavy processing of ambiguous patterns.
I suppose the order inside (a¦b¦c) also matters within a RewriteCond line.
Anyway, I ordered it as if it is read from left to right and stops as soon as it finds a match.
As you advise, thanks to Andreas Friedrich, I will also consider multiple RewriteConds instead of one big (a¦b¦..z) one once I put this in httpd.conf, after a good one or two days of monitoring server load with the new .htaccess.
If anybody is interested, I will post my server load average stats from .htaccess to httpd.conf processing... but I suppose it's already been done many times here.
I've got so much to read on WebmasterWorld!
In order to run the code in httpd.conf, change the rule from
RewriteRule ^(.*) - [L]
to
RewriteRule ^/(.*) - [L]
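As a sketch of how the whole exclusion block from earlier in the thread might look after that change, assuming it goes into the main server or a VirtualHost section. RewriteBase is left out because it is only valid in per-directory context:

```apache
# httpd.conf, server or <VirtualHost> context (sketch)
RewriteEngine On

# The leading slash sits outside the parentheses, so $1 still holds the
# path without it and these conditions can stay as they were in .htaccess.
RewriteCond $1 \.(gif|jpe?g|css|mpe?g)$ [OR]
RewriteCond $1 ^robots\.txt$ [OR]
# In this context REQUEST_FILENAME has not yet been mapped to a
# filesystem path, so the path is built explicitly for the checks.
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -d [OR]
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -f
RewriteRule ^/(.*) - [L]
```

The other rules need the same treatment: add the leading slash to each pattern, for example ^/([^/]+)/profile/?$ instead of ^([^/]+)/profile/?$.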
Jim