

I thought I knew mod_rewrite

passing an entire URL to a page

         

RyanM

12:31 am on Oct 30, 2006 (gmt 0)




Basically, my website relies on the fact that every URL is a unique identifier within the database.

I have successfully used the following rewrite rule:


RewriteRule (.+).html /index.php?path=/$1.html [nc]

i.e., if the path ends with .html, then send it to the script.

I also had to put in various rules to ignore certain directories:


RewriteCond %{SCRIPT_FILENAME} !.*stats.*
RewriteCond %{SCRIPT_FILENAME} !.*phpMyAdmin.*

etc.

This worked quite successfully. However, with this rewrite rule I can only process pages by the path indexed in their database entry; I cannot pass any GET variables via the URL string, since anything after .html is ignored.

So I thought that:


RewriteRule ^(.*) index.php?path=$1 [NC]

would do the trick, i.e., pass any page into the path. With this approach I should get all GET variables as well as the page path, and I can then process it all in code.

However, I have a little problem: any URL that I enter returns index.php?path=index.php. For instance, /test/index.html should return index.php?path=/test/index.html, but it doesn't.

Please help

RyanM

12:31 am on Oct 30, 2006 (gmt 0)




OK, this works:

RewriteRule (.+).html(.*) /index.php?path=/$1.html$2 [nc]

but I would like something a little cleaner.

jdMorgan

4:44 am on Oct 30, 2006 (gmt 0)




RewriteRule (.+).html /index.php?path=/$1.html [nc]

i.e., if the path ends with .html, then send it to the script.

Actually, the way that really works is: "if the path contains <any character>html, then send it to the script." And since the pattern has no "$" anchor, that text does not even have to be at the end of the path.

You'll need to escape the period if you want it interpreted literally, since "." is a regular-expression token meaning "match any single character." Also, don't vary the case of directives or flags; write them in their documented form, e.g. [NC] rather than [nc]. Try:


RewriteRule (.+)\.html$ /index.php?path=/$1.html [NC,L]
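For anyone who wants to see the difference, the two patterns can be compared with Python's `re` module, which treats these simple constructs the same way as the PCRE engine Apache uses. This is only a sketch of the regex behaviour, not of mod_rewrite itself: the sample paths are hypothetical, written without a leading slash as an .htaccess rule would see them, and `re.IGNORECASE` stands in for [NC].

```python
import re

# The original pattern: "." matches any character, and there is no "$" anchor.
loose = re.compile(r"(.+).html", re.IGNORECASE)
# The suggested pattern: a literal "." and anchored at the end of the path.
strict = re.compile(r"(.+)\.html$", re.IGNORECASE)


def classify(path):
    """Return (loose matches, strict matches) for one URL-path.

    RewriteRule patterns are unanchored, so re.search (not re.fullmatch)
    is the closer analogue.
    """
    return bool(loose.search(path)), bool(strict.search(path))


# Hypothetical sample URL-paths.
samples = [
    "test/index.html",            # an ordinary page: both match
    "test/index.HTML",            # case-insensitive, like [NC]: both match
    "test/indexxhtml",            # any character where the dot should be
    "docs/xhtml-guide/page.php",  # "html" in the middle of the path
]

for path in samples:
    print(path, classify(path))
```

Only the first two paths match the escaped, anchored pattern; the unescaped, unanchored one accepts all four.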

I also had to put in various rules to ignore certain directories:

RewriteCond %{SCRIPT_FILENAME} !.*stats.*
RewriteCond %{SCRIPT_FILENAME} !.*phpMyAdmin.*

These can be shortened to:


RewriteCond %{SCRIPT_FILENAME} !stats
RewriteCond %{SCRIPT_FILENAME} !phpMyAdmin

Since the patterns are unanchored, adding ".*" to the beginning or end changes nothing except to waste CPU time.
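The equivalence can be checked with Python's `re.search`, which, like a RewriteCond pattern, looks for a match anywhere in the string. In this sketch the SCRIPT_FILENAME values are hypothetical, and the "!" prefix is modelled simply as "the condition is true only when the pattern is not found".

```python
import re

padded = re.compile(r".*stats.*")
plain = re.compile(r"stats")


def cond_passes(value, pattern):
    """Model of a negated condition: RewriteCond <value> !<pattern>.

    The condition is true (so the rule may fire) only when the
    pattern is NOT found anywhere in the value.
    """
    return pattern.search(value) is None


# Hypothetical SCRIPT_FILENAME values.
samples = [
    "/var/www/html/stats/index.html",
    "/var/www/html/phpMyAdmin/index.php",
    "/var/www/html/test/index.html",
    "/var/www/html/mystats.html",
]

for value in samples:
    # The padded and plain patterns give the same answer for every value.
    print(value, cond_passes(value, padded), cond_passes(value, plain))
```

Note the last sample: because the pattern is unanchored, "stats" anywhere in the filename excludes it, with or without the ".*".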

So I thought that:

RewriteRule ^(.*) index.php?path=$1 [NC]

would do the trick...


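The reason this failed is that the rule also rewrites its own output. Once a request has been rewritten to index.php?path=<something>, the rules are run again on the new URL-path, "index.php" (the query string is not part of what a RewriteRule pattern is matched against), and that is rewritten to index.php?path=index.php, since there is nothing to exclude index.php itself from being rewritten. Remember that mod_rewrite in an .htaccess context behaves as if it were recursive, so anything that has been rewritten can be rewritten again unless you explicitly prevent it.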
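The loop can be reproduced with a small Python model. It is a deliberate simplification, not mod_rewrite's actual algorithm: it assumes the pattern sees only the URL-path (no leading slash, as in .htaccess, and no query string), that a substitution containing "?" replaces any old query string, and that passes stop once a pass no longer changes the URL. The helper names are made up for this sketch.

```python
import re


def rewrite_once(url, pattern, substitution):
    """Apply one RewriteRule to url in a simplified per-directory model.

    Only the URL-path (the part before "?") is matched; "$1" in the
    substitution is replaced by the first captured group.
    """
    path = url.split("?", 1)[0]
    m = re.search(pattern, path, re.IGNORECASE)
    if m is None:
        return url
    return substitution.replace("$1", m.group(1))


def rewrite_until_stable(url, pattern, substitution, max_passes=10):
    """Feed the rule its own output, as the .htaccess passes do,
    until a pass leaves the URL unchanged (or max_passes is reached)."""
    for _ in range(max_passes):
        new_url = rewrite_once(url, pattern, substitution)
        if new_url == url:
            break
        url = new_url
    return url


# The rule from the question: RewriteRule ^(.*) index.php?path=$1 [NC]
print(rewrite_until_stable("test/index.html", r"^(.*)", "index.php?path=$1"))
# index.php?path=index.php
```

Pass 1 gives index.php?path=test/index.html; pass 2 sees only "index.php" and produces index.php?path=index.php; pass 3 changes nothing, which is the result reported above.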

To fix that problem explicitly, as well as another literal-period problem, use:


RewriteCond $1 !index\.php$
RewriteRule (.*) index.php?path=$1 [NC,L]
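Under the same kind of simplified Python model (URL-path matched without a leading slash or query string, the substitution's "?path=..." replacing any old query string, passes repeated until nothing changes), the RewriteCond exclusion stops the loop after the first rewrite. The substitution is written here without a backslash, since it is plain text rather than a regular expression; the function names are hypothetical.

```python
import re


def apply_fixed_rule(url):
    r"""One pass of the suggested rule in a simplified model:

        RewriteCond $1 !index\.php$
        RewriteRule (.*) index.php?path=$1 [NC,L]
    """
    path = url.split("?", 1)[0]          # the pattern never sees the query string
    captured = re.search(r"(.*)", path).group(1)
    if re.search(r"index\.php$", captured):
        return url                       # condition fails, so the rule is skipped
    return "index.php?path=" + captured


def rewrite_until_stable(url, max_passes=10):
    """Repeat passes until one leaves the URL unchanged."""
    for _ in range(max_passes):
        new_url = apply_fixed_rule(url)
        if new_url == url:
            break
        url = new_url
    return url


print(rewrite_until_stable("test/index.html"))  # index.php?path=test/index.html
```

On the second pass $1 is "index.php", the condition fails, and the URL from the first pass is kept.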

Note the [L] flags I added to your rules. Always include an [L] flag on rewrites or redirects unless you have a good reason not to. Otherwise, the output of each rule will be processed through all subsequent RewriteRules, which is usually a big waste of time.
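The cost of leaving off [L] can be sketched by counting how many patterns get tested in a single pass. In this Python model the second and third rules are hypothetical extras standing in for "all subsequent RewriteRules", Python's \1 plays the role of $1, and later rules see only the new URL-path, not the query string the first rule added. Only one pass is modelled; the .htaccess re-run is ignored.

```python
import re


def make_rules(last):
    """Three rules, all with (last=True) or all without the [L] flag."""
    return [
        (r"(.+)\.html$", r"index.php?path=\1.html", last),  # the .html rule
        (r"(.+)\.htm$", r"\1.html", last),                   # hypothetical
        (r"^stats$", r"stats/", last),                       # hypothetical
    ]


def run_pass(path, rules):
    """Run one pass over an ordered rule list.

    Returns (resulting URL, number of patterns tested).
    """
    query = ""
    tested = 0
    for pattern, substitution, last in rules:
        tested += 1
        m = re.search(pattern, path)
        if m is None:
            continue
        new_path, sep, new_query = m.expand(substitution).partition("?")
        path = new_path
        if sep:      # a "?" in the substitution sets a new query string
            query = new_query
        if last:     # [L]: stop processing the rule list here
            break
    return (path + "?" + query if query else path), tested


print(run_pass("test/index.html", make_rules(last=True)))
print(run_pass("test/index.html", make_rules(last=False)))
```

Both calls end at index.php?path=test/index.html, but without [L] three patterns are tested instead of one; with a long rule set, that difference is paid on every request.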

For more information, see the documents cited in our forum charter [webmasterworld.com] and the tutorials in the Apache forum section of the WebmasterWorld library [webmasterworld.com].

Jim