help understanding mod rewrite logs

         

cquezel

3:34 am on Sep 8, 2009 (gmt 0)

10+ Year Member



I'm rather new to mod_rewrite and I am trying to understand the log files (the behaviour of mod_rewrite actually).

I'm using an .htaccess file with the following code:

<IfModule mod_rewrite.c>
RewriteEngine on
RewriteBase /

RewriteRule ^test/(en)/(.*)\.html$ SinglePageDisplayer.php?lang=$1&name=$2&test=/test [PT,L]

RewriteRule ^test/(en)/(.*).pdf$ /$2_$1.pdf [L]
RewriteRule ^test/(en)/(.*).gif$ /$2.gif [L]
RewriteRule ^test/(en)/(.*).css$ /$2.css [L]
RewriteRule ^test/(en)/(.*).jpg$ /$2.jpg [L]
RewriteRule ^test/(en)/(.*).png$ /$2.png [L]
RewriteRule ^test/(en)/(.*).js$ /$2.js [L]

</IfModule>

When I request http://www.example.com.trunk/test/en/company.html, I get the following log file (with lines numbered). My question is: why does the matching process continue after line 4, when mod_rewrite has already found a match at line 3? I get the correct result, but the logic seems a bit inefficient. Of course, I'm pretty sure the problem is in my config.

1) [rid#2139d88/initial] (3) [perdir C:/wamp/www/www.example.com.trunk/www/] add path info postfix: C:/wamp/www/www.example.com.trunk/www/test/en -> C:/wamp/www/www.example.com.trunk/www/test/en/company.html
2) [rid#2139d88/initial] (3) [perdir C:/wamp/www/www.example.com.trunk/www/] strip per-dir prefix: C:/wamp/www/www.example.com.trunk/www/test/en/company.html -> test/en/company.html
3) [rid#2139d88/initial] (3) [perdir C:/wamp/www/www.example.com.trunk/www/] applying pattern '^test/(en)/(.*)\.html$' to uri 'test/en/company.html'
4) [rid#2139d88/initial] (2) [perdir C:/wamp/www/www.example.com.trunk/www/] rewrite 'test/en/company.html' -> 'SinglePageDisplayer.php?lang=en&name=company&test=/test'
5) [rid#2139d88/initial] (3) split uri=SinglePageDisplayer.php?lang=en&name=company&test=/test -> uri=SinglePageDisplayer.php, args=lang=en&name=company&test=/test
6) [rid#2139d88/initial] (3) [perdir C:/wamp/www/www.example.com.trunk/www/] add per-dir prefix: SinglePageDisplayer.php -> C:/wamp/www/www.example.com.trunk/www/SinglePageDisplayer.php
7) [rid#2139d88/initial] (2) [perdir C:/wamp/www/www.example.com.trunk/www/] forcing 'C:/wamp/www/www.example.com.trunk/www/SinglePageDisplayer.php' to get passed through to next API URI-to-filename handler
8) [rid#2139d88/initial] (2) [perdir C:/wamp/www/www.example.com.trunk/www/] trying to replace prefix C:/wamp/www/www.example.com.trunk/www/ with /
9) [rid#2139d88/initial] (5) strip matching prefix: C:/wamp/www/www.example.com.trunk/www/SinglePageDisplayer.php -> SinglePageDisplayer.php
10) [rid#2139d88/initial] (4) add subst prefix: SinglePageDisplayer.php -> /SinglePageDisplayer.php
11) [rid#2139d88/initial] (1) [perdir C:/wamp/www/www.example.com.trunk/www/] internal redirect with /SinglePageDisplayer.php [INTERNAL REDIRECT]
12) [rid#214f548/initial/redir#1] (3) [perdir C:/wamp/www/www.example.com.trunk/www/] strip per-dir prefix: C:/wamp/www/www.example.com.trunk/www/SinglePageDisplayer.php -> SinglePageDisplayer.php
13) [rid#214f548/initial/redir#1] (3) [perdir C:/wamp/www/www.example.com.trunk/www/] applying pattern '^test/(en)/(.*)\.html$' to uri 'SinglePageDisplayer.php'
14) [rid#214f548/initial/redir#1] (3) [perdir C:/wamp/www/www.example.com.trunk/www/] strip per-dir prefix: C:/wamp/www/www.example.com.trunk/www/SinglePageDisplayer.php -> SinglePageDisplayer.php
15) [rid#214f548/initial/redir#1] (3) [perdir C:/wamp/www/www.example.com.trunk/www/] applying pattern '^test/(en)/(.*).pdf$' to uri 'SinglePageDisplayer.php'
16) [rid#214f548/initial/redir#1] (3) [perdir C:/wamp/www/www.example.com.trunk/www/] strip per-dir prefix: C:/wamp/www/www.example.com.trunk/www/SinglePageDisplayer.php -> SinglePageDisplayer.php
17) [rid#214f548/initial/redir#1] (3) [perdir C:/wamp/www/www.example.com.trunk/www/] applying pattern '^test/(en)/(.*).gif$' to uri 'SinglePageDisplayer.php'
18) [rid#214f548/initial/redir#1] (3) [perdir C:/wamp/www/www.example.com.trunk/www/] strip per-dir prefix: C:/wamp/www/www.example.com.trunk/www/SinglePageDisplayer.php -> SinglePageDisplayer.php
19) [rid#214f548/initial/redir#1] (3) [perdir C:/wamp/www/www.example.com.trunk/www/] applying pattern '^test/(en)/(.*).css$' to uri 'SinglePageDisplayer.php'
20) [rid#214f548/initial/redir#1] (3) [perdir C:/wamp/www/www.example.com.trunk/www/] strip per-dir prefix: C:/wamp/www/www.example.com.trunk/www/SinglePageDisplayer.php -> SinglePageDisplayer.php
21) [rid#214f548/initial/redir#1] (3) [perdir C:/wamp/www/www.example.com.trunk/www/] applying pattern '^test/(en)/(.*).jpg$' to uri 'SinglePageDisplayer.php'
22) [rid#214f548/initial/redir#1] (3) [perdir C:/wamp/www/www.example.com.trunk/www/] strip per-dir prefix: C:/wamp/www/www.example.com.trunk/www/SinglePageDisplayer.php -> SinglePageDisplayer.php
23) [rid#214f548/initial/redir#1] (3) [perdir C:/wamp/www/www.example.com.trunk/www/] applying pattern '^test/(en)/(.*).png$' to uri 'SinglePageDisplayer.php'
24) [rid#214f548/initial/redir#1] (3) [perdir C:/wamp/www/www.example.com.trunk/www/] strip per-dir prefix: C:/wamp/www/www.example.com.trunk/www/SinglePageDisplayer.php -> SinglePageDisplayer.php
25) [rid#214f548/initial/redir#1] (3) [perdir C:/wamp/www/www.example.com.trunk/www/] applying pattern '^test/(en)/(.*).js$' to uri 'SinglePageDisplayer.php'
26) [rid#214f548/initial/redir#1] (1) [perdir C:/wamp/www/www.example.com.trunk/www/] pass through C:/wamp/www/www.example.com.trunk/www/SinglePageDisplayer.php

jdMorgan

4:22 am on Sep 8, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Because after any rule is applied, mod_rewrite in .htaccess is re-started.

So even with the [L] flag, every rule is evaluated at least once, and those before any rule that matches are evaluated at least twice.

Jim

cquezel

4:40 am on Sep 8, 2009 (gmt 0)

10+ Year Member



Thank you for the very quick and informative reply.

I'm very surprised by this. Obviously this means that the logic of writing rules is quite different in .htaccess than in a .conf file. In .htaccess I must make sure that no rule after the first matching rule also matches, right? I had also read about the "/" prefix difference in the URI. Are there other differences I should know about?

Does this mean that the [L] flag is useless in .htaccess files?

jdMorgan

2:12 pm on Sep 8, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This has to do with the fact that .htaccess files are processed in the 'fix-up' phase of the API.

Yes, you do have to make sure that your rules are mutually-exclusive, and that the pattern and conditions of each rule do not unexpectedly match the substitution URL-path or file-path of any other rule.

In cases where you see this could happen, it's a matter of explicitly excluding previously-rewritten filepaths or previously-redirected URLs from matching, in order to prevent an 'infinite' loop. This is commonly (and easily) done using a negative-match RewriteCond, examining the requested path for previously-rewritten paths and the server variable THE_REQUEST for previously-redirected URLs. Other useful techniques are to examine REDIRECT_REQUEST_URI or to create and subsequently check your own variable using the [E=Var:Val] flag on RewriteRule.
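As a rough sketch of two of these techniques (not code from this thread): the first block excludes an already-rewritten path with a negative-match RewriteCond, and the second marks the request with [E=...] and tests the mark on the next pass. The target /app.php and the variable name REWRITTEN are made up for illustration. The second block assumes the usual behaviour that a variable set before an internal redirect reappears with a REDIRECT_ prefix. The two blocks are alternatives, so use one or the other.

```apache
# Alternative 1: negative-match RewriteCond on the requested path.
# /app.php is a hypothetical rewrite target. On the second pass the
# requested path is /app.php, so the condition fails and the rule is skipped.
RewriteCond %{REQUEST_URI} !^/app\.php$
RewriteRule ^(.+)$ /app.php?path=$1 [L]

# Alternative 2: set a flag with [E=Var:Val] and check it on the next pass.
# REWRITTEN is a made-up name; after the internal redirect it is assumed
# to be visible as REDIRECT_REWRITTEN.
RewriteCond %{ENV:REDIRECT_REWRITTEN} !=1
RewriteRule ^(.+)$ /app.php?path=$1 [E=REWRITTEN:1,L]
```

Without either guard, the pattern ^(.+)$ would also match app.php itself when mod_rewrite restarts, and the rule would loop.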

A good example of these exclusion techniques can be seen in the (hundreds of) threads here about redirecting client requests for the URL-paths "/index.html" or "/index.php" back to the URL-path "/". In order to prevent looping due to interaction with the action of a DirectoryIndex directive (or another rewriterule) which rewrites requests for the URL-path "/" to the filepath "/index.xyz", it is necessary to use the exclusion logic previously described if the code is to work in .htaccess.
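A sketch of that index redirect for .htaccess, assuming the site is served as www.example.com and the index files are index.html, index.htm or index.php. THE_REQUEST holds the original client request line (for example "GET /index.html HTTP/1.1") and is not changed by internal rewrites, so the condition matches only when the client itself asked for the index file, not when DirectoryIndex mapped "/" to it.

```apache
# Redirect direct client requests for an index file back to its directory.
# www.example.com is a placeholder hostname.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.(html?|php)[?\ ]
RewriteRule ^(([^/]+/)*)index\.(html?|php)$ http://www.example.com/$1 [R=301,L]
```

The backslash-escaped spaces are needed because an unescaped space would end the pattern argument in the directive.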

The [L] flag is certainly not useless. In all cases, it stops mod_rewrite from processing subsequent rules in the current iteration. And in the case of redirects (using the [R=30x] flag), it immediately invokes the redirect. The [L] flag is of most benefit to efficiency in a well-ordered list of rules, but is always useful except on the very last rule in the file.

Aside from this 'looping' problem and the fact that URL-paths 'seen' by RewriteRule in .htaccess are stripped of the path-parts describing the path to 'this' .htaccess file, the only other major caveat I can think of is that RewriteMaps may be used, but not defined, in a .htaccess per-directory context.
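To illustrate the RewriteMap caveat, here is a minimal sketch that assumes you have access to the server config and that the .htaccess file sits in the document root. The map name "lowercase" is arbitrary; int:tolower is one of mod_rewrite's built-in internal map functions.

```apache
# Server config (httpd.conf or a <VirtualHost> block): define the map here.
RewriteMap lowercase int:tolower

# Per-directory (.htaccess): the map can be used, but not defined.
# Redirect any request containing an uppercase letter to its lowercase form.
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule ^(.+)$ /${lowercase:$1} [R=301,L]
```

Once the client follows the redirect, the lowercase URL no longer satisfies the condition, so the rule does not fire again.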

An interesting result of some testing done several years ago by WebmasterWorld member Andreas_Friedrich was that the most-efficient approach to RewriteConds also varies according to context. In server config files, where the code is compiled once when the server is restarted, it is most efficient to use multiple RewriteConds to examine the same server variable for multiple values. In contrast, in the .htaccess context where the code is re-compiled for every HTTP request, it is more efficient to use the 'local OR' operator in a single RewriteCond. To clarify, the two following snippets are logically equivalent, but optimized for each context:


# Server config (e.g. httpd.conf)
RewriteCond %{REQUEST_URI} !^/common/js/
RewriteCond %{REQUEST_URI} ^/admin/ [OR]
RewriteCond %{REQUEST_URI} ^/reports/ [OR]
RewriteCond %{REQUEST_URI} ^/stats/
RewriteRule ^/(.+\.js)$ /common/js/$1 [L]

# Per-directory (.htaccess)
RewriteCond %{REQUEST_URI} !^/common/js/
RewriteCond %{REQUEST_URI} ^/(admin|reports|stats)/
RewriteRule ^(.+\.js)$ /common/js/$1 [L]

(Note also the explicit 'loop stopping' function of the first RewriteCond in each snippet.)

One other thing to keep in mind: It is folly to worry about a few hundred lines of mod_rewrite code in .htaccess on a site that routinely invokes server-side scripts (e.g. PHP or Perl) containing thousands of lines of code. The major risk to server performance is not in the 'second pass' through the mod_rewrite code, but rather in inefficient mod_rewrite coding. Patterns containing multiple ".*" sub-patterns, and RewriteConds doing unnecessary and/or insufficiently-qualified file- or directory-exists checks or reverse-DNS lookups, are much more serious problems.
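A hedged illustration of those two points, using hypothetical targets /page.php and /images/missing.png. Note that the narrower pattern is not strictly equivalent to the broad one: it accepts only a single directory level, which is the point of qualifying it.

```apache
# Broad (shown commented out): two greedy ".*" sub-patterns invite backtracking.
# RewriteRule ^(.*)/(.*)\.html$ /page.php?dir=$1&name=$2 [L]

# Narrower: each sub-pattern stops at the next slash or dot.
RewriteRule ^([^/]+)/([^/.]+)\.html$ /page.php?dir=$1&name=$2 [L]

# Qualified file-exists check: the RewriteRule pattern is tested before its
# RewriteConds, so the filesystem is consulted only for image requests.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule \.(gif|jpg|png)$ /images/missing.png [L]
```

The last rule assumes /images/missing.png actually exists. If it does, the !-f condition fails on the second pass and the rule does not loop.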

Jim

cquezel

12:44 am on Sep 9, 2009 (gmt 0)

10+ Year Member



Thank you Jim for taking the time to write such a detailed and precise response.

Claude