Forum Moderators: phranque


rewrite php to htm and vice versa

causes problems to existing rules

         

omoutop

12:06 pm on Jul 20, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I have the following rule:

# backward compatibility ruleset for rewriting document.htm to document.php
# when and only when document.php exists
# but no longer document.htm
# parse out basename, but remember the fact
RewriteRule ^(.*)\.htm$ $1 [C,E=WasHTML:yes]
# rewrite to document.php if exists
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^(.*)$ $1.php [S=1]
# else reverse the previous basename cutout
RewriteCond %{ENV:WasHTML} ^yes$
RewriteRule ^(.*)$ $1.htm


But this ruleset breaks some of my other rules.
For example, the following rule:
RewriteRule ^greece/fly-and-drive/index.htm$ fly_drive/index.php [nc]
gives a 403 error:
You don't have permission to access greece/fly-and-drive/index/greece/fly-and-drive/index/greece/fly-and-drive/index/greece/fly-and-drive/index/greece/fly-and-drive/index/greece/fly-and-drive/index/.......

The rule above is placed below the complex first ruleset.
As far as I can tell, the situation improves if I use the [L] flag on all the other rewrite rules.

Can anyone explain why the first set of rules behaves so strangely (it looks like an infinite loop to me)?
And what can I do to improve it?

jdMorgan

5:08 am on Jul 23, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You've hit a nasty bug in mod_rewrite [archive.apache.org] that was supposed to be fixed in Apache 2.x, but wasn't (at least according to some testing I did on real servers a year or more ago).

If more than one rewrite is applied to a requested URL-path in .htaccess, Apache sometimes "duplicates" parts of that path, resulting in the messed-up filepath that you see.

As such, the code in your top rule above won't work, even though it looks just like one of the examples in the Apache URL Rewriting Guide.

The cure is to do two things when coding for .htaccess:

First, always use the [L] flag, and make sure that for any given HTTP request, one and only one rewrite is ever done. This means that you've got to code things such that "if more than one thing needs to be changed, then change all of those things using one and only one RewriteRule."

Second, always order your RewriteRules with all external redirects first, in order from most-specific patterns and conditions to least-specific patterns and conditions, followed by all internal rewrites, again in order from most- to least-specific. A most-specific rule affects only one or a very few URLs, while a least-specific rule affects many URLs.

Basically, if you have a URL on your site that *could* be rewritten by two or more rules, even though those two rules are individually as specific as possible, then put the rule that you want to apply to that URL first, and --as noted above-- end all rules with an [L] flag.
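As a rough illustration of that ordering, a .htaccess file might be laid out like the sketch below. The hostname www.example.com and the paths old-page.htm, new-page.htm, old-dir and new-dir are placeholders invented for this example; they are not rules from this thread.

```apache
RewriteEngine On

# --- External redirects first, most specific to least specific ---
# One single URL (hypothetical)
RewriteRule ^old-page\.htm$ http://www.example.com/new-page.htm [R=301,L]
# Every URL requested on the non-www hostname (hypothetical)
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

# --- Internal rewrites next, again most specific to least specific ---
# Directory and extension both change: do it in one rule, not two chained ones
RewriteRule ^old-dir/(.+)\.htm$ /new-dir/$1.php [L]
```

Every rule ends with [L], and a request that needs two changes (directory and extension here) gets both in a single RewriteRule rather than being passed through two rules in a row.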

I'd suggest:

# Rewrite a specific single .htm URL to a .php file
RewriteRule ^greece/fly-and-drive/index\.htm$ fly_drive/index.php [NC,L]
#
#
# Backward-compatibility rule for rewriting document.htm URLs to document.php files
# when and only when document.php exists but document.htm no longer exists
#
# If requested URL-path does not resolve to an existing .htm file
RewriteCond %{REQUEST_FILENAME} !-f
# and if URL-path with ".htm" removed and ".php" appended resolves to an existing file
RewriteCond %{DOCUMENT_ROOT}/$1.php -f
# then rewrite .htm URL-path to .php file
RewriteRule ^(.+)\.htm$ /$1.php [L]

Here, the complex two-step rewrite is eliminated by independently 'calculating' the server filepath to be checked for 'exists' by taking the server's DocumentRoot and appending the requested URL-path with the ".htm" filetype removed and a ".php" filetype appended. This method won't work on *all* servers, but it will work on many, many servers -- It depends on the server's configuration.
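To make the path calculation concrete, here is the same rule annotated with one worked request. The DocumentRoot value /var/www/html and the URL /tours/athens.htm are assumptions made up for illustration only.

```apache
# Assumed: DocumentRoot is /var/www/html and this .htaccess sits in that directory
# Assumed request: /tours/athens.htm
#
# The RewriteRule pattern is matched first, against "tours/athens.htm",
# so $1 = "tours/athens" is already available to the conditions.

# Is /var/www/html/tours/athens.htm missing? Continue only if it is.
RewriteCond %{REQUEST_FILENAME} !-f
# Does /var/www/html/tours/athens.php exist? Continue only if it does.
RewriteCond %{DOCUMENT_ROOT}/$1.php -f
# Both conditions held: serve /tours/athens.php for this request
RewriteRule ^(.+)\.htm$ /$1.php [L]
```

If the server reports a DOCUMENT_ROOT that is not the real filesystem path of the site (as can happen on some shared-hosting setups), the second condition tests the wrong location and the rule never fires.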

Note that the existence of the ".htm" file is checked first, in accordance with the comments in your original code (Despite those comments, your original code did not in fact do this check).

Note also the leading slash I added on the "$1" in the rule substitution. This may cause problems on a few servers. But use it if you can, because it closes a well-known security hole.

Jim

omoutop

5:48 am on Jul 23, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thank you, Jim.
But I don't think I will continue to use this set of rules, since my Apache is 1.3.
Of course, I will try to follow your suggestions for the rest of the rules and optimize them as best I can.