Forum Moderators: phranque

.htaccess server overload and crash

Help building a lighter top-level (root) .htaccess


goldminer

2:17 pm on Apr 24, 2005 (gmt 0)

10+ Year Member



Hello WebmasterWorld! :)

A few days ago, I was given the "out of the box" .htaccess you can see below, and I put it at the root of my website, which serves 80,000 page views per day.
It seemed to work perfectly, until it overloaded and then crashed my dedicated (Unix/Apache) server.
I ran a lot of long and unsuccessful tests on my PHP scripts before I finally discovered that once this .htaccess was removed, server CPU usage immediately fell from 99% to 5%!

Since then, I have been trying to find out what was so server-intensive, and how to build a lighter .htaccess.
Many of you probably know, or can imagine, what it is like to have to shut down a cool, addictive new service that you opened to your site members the day before...

Enough talk. Here is the .htaccess I was given:

RewriteEngine On
Options +Followsymlinks
RewriteBase /
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^.* - [L,QSA]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([^/]+)/page([^/]+)/?$ /index.php?u=$1&page=$2 [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([^/]+)/profile/?$ /profile.php?u=$1 [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([^/]+)/archive/([^/]+)/([^/]+)/?$ /archive.php?u=$1&y=$2&m=$3 [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([^/]+)/archive/([^/]+)?/?$ /archive.php?u=$1 [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([^/]+)/archive/?$ /archive.php?u=$1 [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([^/]+)/([^/]+)/?$ /entry.php?u=$1&e_id=$2 [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([^/]+)/?$ /index.php?u=$1 [L]

ErrorDocument 404 /404.htm


Note that this file was at the root of my website, which runs a blog service where users get site.com/me addresses.
But we also have site.com/forums, site.com/chat and many other existing directories...

Here is what I found at the beginning of the official Apache URL Rewriting Guide [httpd.apache.org]:

The crazy and lazy can even do the following in the top-level .htaccess file of their homedir. But notice that this creates some processing overhead.

RewriteEngine on
RewriteBase /~quux/
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^(.+[^/])$ $1/ [R]

I wonder whether the problem is the use of %{REQUEST_FILENAME} in a top-level directory, or the way it was used in my .htaccess. Or was something badly written? Or was the overload caused by our forum directory, which has its own .htaccess and may have caused recursion?

Here is a new .htaccess I wrote to replace the CPU-burning one. As you can see, some things are no longer handled by .htaccess but by PHP instead. Don't mind the comments and blank lines; I will delete them when I upload the file to my server. They are only here to make it easier to understand:

RewriteEngine On
Options +Followsymlinks

# First, check whether the request is for a physical directory or file
RewriteCond %{REQUEST_FILENAME} -d [OR]
RewriteCond %{REQUEST_FILENAME} -f

# If so, stop and do nothing
RewriteRule ^.*$ - [L]

# Then check whether it is a "/username/archive/2005/4/"-style page
RewriteRule ([^/]+)/archive/([^/]+)/([^/]+) archive.php?u=$1&y=$2&m=$3 [L]

# Then check whether it is a "/username/something/" or "/username/something"-style page
RewriteRule ([^/]+)/([^/]+) index.php?u=$1&task=$2 [L]

# Finally, check whether it is the user's home page ("/username/" or "/username")
RewriteRule ([^/]+) index.php?u=$1 [L]
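For comparison, here is a sketch of the same three rules with the patterns anchored (`^` and `$`) and an optional trailing slash. With the anchors, each rule matches only the URL shapes named in its comment instead of any matching substring of the path. The sketch assumes, like the file above, that it sits in the document root and that archive.php and index.php exist there. It is an untested illustration, not a drop-in replacement.

```apache
RewriteEngine On
Options +FollowSymLinks
RewriteBase /

# Stop at once if the request maps to an existing directory or file
RewriteCond %{REQUEST_FILENAME} -d [OR]
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^ - [L]

# "/username/archive/2005/4" or "/username/archive/2005/4/"
RewriteRule ^([^/]+)/archive/([^/]+)/([^/]+)/?$ archive.php?u=$1&y=$2&m=$3 [L]

# "/username/something" or "/username/something/"
RewriteRule ^([^/]+)/([^/]+)/?$ index.php?u=$1&task=$2 [L]

# "/username" or "/username/"
RewriteRule ^([^/]+)/?$ index.php?u=$1 [L]
```

Take a URL such as /a/b/c that is neither an archive page nor a real file. With the anchors in place, it matches none of the rules and falls through to normal 404 handling. Without them, the second rule would partially match it.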

So what can you say? Can you help? Thanks.

[edited by author to correct the code]

jdMorgan

4:30 pm on Apr 24, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Goldminer,

Welcome to WebmasterWorld!

The basic problem here is that there is a check for either "file exists" or "directory exists" before every rule. This results in a call to the filesystem to see if that resource exists for every request. There are other ways to identify the requests which should not be handled by your php scripts.

You might be better off actually excluding all of the resources that you know exist and should not be handled by your scripts, rather than having the server try to find them first. Examples would be images and multimedia files, CSS, external JavaScript files, etc. Just for example:


RewriteEngine On
Options +Followsymlinks
RewriteBase /
RewriteCond $1 \.(gif|jpe?g|css|js|mpe?g)$ [OR]
RewriteCond $1 ^robots\.txt$ [OR]
RewriteCond %{REQUEST_FILENAME} -d [OR]
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^(.*) - [L]
#
RewriteRule ^([^/]+)/page([^/]+)/?$ /index.php?u=$1&page=$2 [L]
#
RewriteRule ^([^/]+)/profile/?$ /profile.php?u=$1 [L]
#
RewriteRule ^([^/]+)/archive/([^/]+)/([^/]+)/?$ /archive.php?u=$1&y=$2&m=$3 [L]
#
RewriteRule ^([^/]+)/archive/([^/]+)?/?$ /archive.php?u=$1 [L]
#
RewriteRule ^([^/]+)/archive/?$ /archive.php?u=$1 [L]
#
RewriteRule ^([^/]+)/([^/]+)/?$ /entry.php?u=$1&e_id=$2 [L]
#
RewriteRule ^([^/]+)/?$ /index.php?u=$1 [L]

This code skips the file-exists checking entirely when the requested resource IS a media, CSS, JS, or robots.txt file. As soon as one of those first conditions matches, the remaining [OR]ed conditions are not evaluated, so the filesystem checks and all the other rules are skipped. The directory-exists and file-exists checks are done only for a request that does not match one of those specific filetypes. So put the filetype-checking RewriteConds in order from "most-likely-to-match" first to "least-likely-to-match" last, and put the slow, processor-intensive filesystem checks at the end. Note that I removed [QSA], since it's not needed in a rule that makes no substitution.

A minor tweak that may improve performance a tiny bit would be to combine rules like


RewriteRule ^([^/]+)/profile/?$ /profile.php?u=$1 [L]
and
RewriteRule ^([^/]+)/archive/?$ /archive.php?u=$1 [L]

into one rule:

RewriteRule ^([^/]+)/(archive|profile)/?$ /$2.php?u=$1 [L]

However, read on to the end before doing this.

You can often take advantage of the fact that RewriteConds are not processed unless the pattern of the RewriteRule matches, so more RewriteConds are not necessarily much slower, *unless* the pattern to be matched in the RewriteRule is very ambiguous, such as ".*". See the mod_rewrite documentation for details on RewriteRule and RewriteCond processing order.
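Here is a concrete illustration of that evaluation order. mod_rewrite first tests the RewriteRule pattern against the URL path, and evaluates the RewriteConds attached to that rule only if the pattern matches. The two versions below are alternatives to compare, not a rule set to use together. In version A the filesystem check runs on every request, because `^(.*)$` matches everything. In version B it runs only for URLs already shaped like "/name/profile". The profile.php and index.php targets are taken from the rules above.

```apache
# Version A (costly): "^(.*)$" matches every URL,
# so the -f filesystem lookup runs on every request
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ /index.php?u=$1 [L]

# Version B (cheaper): the -f lookup runs only when the URL
# already looks like "/name/profile" or "/name/profile/"
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([^/]+)/profile/?$ /profile.php?u=$1 [L]
```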

I should note here that all of the above depends on a few questions. Are the majority of requests to your server for images and other included objects? Are there many directories and pages that do exist as separate files and are not handled by PHP? How much improvement you see from the changes described above will vary greatly depending on the answers.

It is important to know what the 'mix' of requests to your site looks like, in order to do optimization. The basic idea is to identify those requests which are both very frequent and not handled by your php, and to exit from mod_rewrite as fast as possible if they are requested. Then do the slow filesystem checking later.

If you are on a dedicated server, you should certainly inquire into getting the code installed in httpd.conf. Code in httpd.conf is compiled at server start-up as opposed to being interpreted on every HTTP request, and is therefore much more efficient.
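Below is a minimal sketch of what moving the rules into the server configuration might look like. It assumes the site runs in a virtual host whose document root is /var/www/site; the host name and path are placeholders. Setting `AllowOverride None` for that directory tells Apache not to look for .htaccess files there at all, which removes another per-request filesystem lookup once the rules live in httpd.conf. Be careful, though: this also stops Apache from reading .htaccess files in subdirectories, such as a forum's own .htaccess, so those rules would need to be moved as well.

```apache
<VirtualHost *:80>
    # Placeholder host name and document root; adjust to the real site
    ServerName www.example.com
    DocumentRoot /var/www/site

    RewriteEngine On
    # The RewriteCond/RewriteRule lines from the .htaccess go here,
    # with "^/" at the start of each RewriteRule pattern

    <Directory /var/www/site>
        # Do not search for or read .htaccess files anywhere in this tree
        AllowOverride None
    </Directory>
</VirtualHost>
```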

Note: WebmasterWorld member Andreas Friedrich did a series of benchmark tests which showed that, in httpd.conf, it is faster to use individual RewriteConds to process alternate possible patterns than to use the (a|b|c) construct in a single RewriteCond. Take this into account if you port the code to httpd.conf.
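The two styles being compared look roughly like this (use one or the other, not both). They are written for httpd.conf, hence the "^/" in the rule patterns, and use an extension list similar to the example above. In the second style, mod_rewrite stops evaluating an [OR] chain at the first condition that matches, so the most common extensions still belong at the top. Which form is faster on a given server is worth measuring rather than assuming.

```apache
# Style 1: one RewriteCond with an alternation
RewriteCond $1 \.(gif|jpe?g|css|js|mpe?g)$
RewriteRule ^/(.*) - [L]

# Style 2: the same test as separate RewriteConds chained with [OR];
# the chain is left as soon as one condition matches
RewriteCond $1 \.gif$ [OR]
RewriteCond $1 \.jpe?g$ [OR]
RewriteCond $1 \.css$ [OR]
RewriteCond $1 \.js$ [OR]
RewriteCond $1 \.mpe?g$
RewriteRule ^/(.*) - [L]
```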

Use solid pipe "|" characters in this code. When code is posted on this board, they often get turned into broken pipe "¦" characters, which will not work.

Jim

goldminer

1:14 am on Apr 26, 2005 (gmt 0)

10+ Year Member



Jim, thank you for all the valuable information, the reasoning behind it, and your advice.

I tried to apply all of this, and wow... server CPU usage has never been so low!

First, I stopped using .htaccess for file/directory-exists checking.
Most common requests are for files or directories that do not need to be rewritten at this level, so first:

RewriteEngine On
Options +Followsymlinks
RewriteCond $1 \.(php|gif|jpe?g|png|css|js|swf|txt)$ [OR]
RewriteCond $1 ^(forum|images|blogs|uploads|include|lang|cache)/?$
RewriteRule ^(.*) - [L]

...etc
My whole .htaccess is now ordered by descending popularity.

Actually, I have more than 40 physical directories to check for.
Even with this big RewriteCond line, it does not seem to be server-intensive: these are simple, precise matches rather than ambiguous patterns that are expensive to process.

I suppose the order of the alternatives inside (a|b|c) also matters within a RewriteCond line.
In any case, I ordered them as if they were read from left to right, with matching stopping as soon as a match is found.

As you advised, based on Andreas Friedrich's tests, I'll also consider using multiple RewriteConds instead of one big (a|b|...|z) one once I move this into httpd.conf. First I'll spend one or two days monitoring the server load with the new .htaccess.

If anybody is interested, I will post my server load-average stats for .htaccess versus httpd.conf processing... but I suppose that's already been done many times here.

I've got so much to read on WebmasterWorld!

jdMorgan

1:38 am on Apr 26, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well, I'm glad that helped!

In order to run the code in httpd.conf, change the rule from


RewriteRule ^(.*) - [L]

to

RewriteRule ^/(.*) - [L]
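Two more points are worth checking when porting. First, every RewriteRule pattern needs the same leading slash, not just this one. Second, in server (httpd.conf) context the URL has not yet been mapped to a file, so the mod_rewrite documentation describes %{REQUEST_FILENAME} there as holding the same value as REQUEST_URI. That makes the -d/-f tests look in the wrong place. A sketch of the exit block and one rule follows. It assumes the files are served straight from DOCUMENT_ROOT, with no Alias or per-user directories involved.

```apache
# Server context: build the filesystem path from DOCUMENT_ROOT
# plus the URL path before testing -d / -f
RewriteCond $1 \.(gif|jpe?g|css|js|mpe?g)$ [OR]
RewriteCond $1 ^robots\.txt$ [OR]
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -d [OR]
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -f
RewriteRule ^/(.*) - [L]

# The remaining rules also get a leading slash in their patterns
RewriteRule ^/([^/]+)/profile/?$ /profile.php?u=$1 [L]
```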

You discovered the key: order your patterns first by specificity and then by popularity. The idea is to make the most common rewrites happen, and to exit from mod_rewrite, as soon as possible, without creating problems through ambiguous regular-expression patterns.

Jim