While working on rewrites (in .htaccess) for a site I encountered a few issues regarding RewriteRule flags and setting variables.
Here is the code; details follow:
RewriteEngine on
# Redirect bare (non-www) domain to www
RewriteCond %{HTTP_HOST} !^www.domain.com$ [NC]
RewriteRule ^(.*) http://www.domain.com/$1 [QSA,R=301,L]
# 1st, rewrite some existing URLs to an SSI script, parsing them for variables using a regex
RewriteCond %{REQUEST_FILENAME} ^.*\/(en¦fr)\/.+\.html$
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^(en¦fr)(/?)([^/.]*)(/?)([^/.]*)(/)([^/]+)\.html$ /script.shtml [E=BASE:$1,E=CAT:$3,E=SUBCAT:$5,E=MIMETYPE:text\/html,QSA,C]
# 2nd, chained rule to send as XHTML if accepted
RewriteCond %{HTTP_ACCEPT} application/xhtml\+xml
RewriteRule ^.*$ - "[T=application/xhtml+xml,E=MIMETYPE:application/xhtml\+xml,QSA,L]"
The issues:
1. When the [C] flag is set on the 1st rule, the following RewriteCond is processed correctly (i.e. the variable 'MIMETYPE' is set to 'app...xml'), but the resulting page is not sent as XHTML, though it should be according to the [T] flag! How strange.
The only way to make it work is not to chain the rules, but that then requires a new RewriteCond to check whether %{REQUEST_FILENAME} is the script filename, so that only this file is sent as XHTML.
Pretty silly, as this extra check would probably cost extra server time.
2. In this latter case, when the variable MIMETYPE is set (first to 'text/html') and then set again later to 'app...xml', a new variable is created and the old one is prefixed with one more REDIRECT_.
This results in two vars, REDIRECT_MIMETYPE=app[...]xml (correct) and REDIRECT_REDIRECT_MIMETYPE=text/html! How can I avoid that?
Hope I was clear. By the way, if you have general tips or suggestions to improve my code, they're welcome! :)
Thanks in advance,
HotShot
Welcome to WebmasterWorld!
I'd suggest you "flatten" the code so that chaining is not needed.
Always check for HTTP_HOST non-blank when doing a negative-pattern redirect. This will prevent redirection looping on HTTP/1.0 client requests.
Do not end-anchor the HTTP_HOST pattern. If you do, any client which appends a port number may be mis-handled, e.g. if HTTP_HOST="www.domain.com:80"
Check for file exists only after checking http-accept, since file-checking is slower than checking a local var.
[QSA] is not needed, since you are not adding a var to the existing query string. The existing query string will be passed-through unchanged.
No need to escape characters in anything but regex patterns.
Use [L] flag to terminate rule processing if either rule matches.
Due to their length, the flags on the rules below appear to be wrapped to a second line. However, they must be on the same line as the RewriteRule in the actual code (forum posting artifact).
Replace all broken pipe "¦" characters with solid pipes before attempting to use this code (forum posting artifact).
RewriteEngine on
#
# Redirect alternate domain to www
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.domain\.com [NC]
RewriteRule (.*) http://www.domain.com/$1 [R=301,L]
#
# Rewrite for non-XML-accept request
RewriteCond %{HTTP_ACCEPT} !application/xhtml\+xml
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^(en¦fr)(/?)([^/.]*)(/?)([^/.]*)(/)([^/]+)\.html$ /script.shtml [E=BASE:$1,E=CAT:$3,E=SUBCAT:$5,E=MIMETYPE:text/html,L]
#
# Rewrite for XML-accept request
RewriteCond %{HTTP_ACCEPT} application/xhtml\+xml
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^(en¦fr)(/?)([^/.]*)(/?)([^/.]*)(/)([^/]+)\.html$ /script.shtml [T=application/xhtml+xml,E=BASE:$1,E=CAT:$3,E=SUBCAT:$5,E=MIMETYPE:application/xhtml+xml,L]
(edit) Another question: what about testing the URL-matching pattern in a RewriteCond rather than in the RewriteRule itself? Could it save some processing? For instance:
RewriteCond %{REQUEST_FILENAME} ^.*/(en¦fr)/
RewriteRule ^..(/?)etc. [E=BASE:%1,etc.]
No, not that I am aware of. I removed "RewriteCond %{REQUEST_FILENAME} ^.*\/(en¦fr)\/.+\.html$" because it appeared to be entirely redundant.
> what about testing URL-matching pattern in a RewriteCond rather than in the RewriteRule itself? Could it save some processing?
No, the RewriteConds are only evaluated if the RewriteRule pattern matches. You want your RewriteRule pattern to be as specific as possible. See the Ruleset Processing [httpd.apache.org] section of the Apache mod_rewrite documentation for an explanation of how RewriteRules and RewriteConds are processed.
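As a rough illustration of that processing order (a sketch only; the simplified pattern below is hypothetical and is not the rule from this thread, and it uses solid pipes), the comments number the steps in the order mod_rewrite performs them:

```apache
# Step 2: tested only after the RewriteRule pattern below has matched.
RewriteCond %{HTTP_ACCEPT} application/xhtml\+xml
# Step 3: tested only if the previous condition passed.
RewriteCond %{REQUEST_FILENAME} -f
# Step 1: the pattern is matched against the URL-path first.
# Step 4: the substitution and flags apply only if every condition passed.
RewriteRule ^(en|fr)/[^/]+\.html$ /script.shtml [L]
```

A request that fails the pattern never reaches the conditions, which is why moving the URL test into a RewriteCond gains nothing.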
The best thing you could do to improve the efficiency of your code is to fix the problem with your URLs that makes the "/" characters optional in your rules -- the ones matched into $2, $4, and $6. A regular URL-path structure would eliminate the need to treat those slashes as optional.
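To illustrate, assuming every URL had one of the regular forms en/page.html, en/cat/page.html or en/cat/sub/page.html (an assumption; the actual site layout is not shown in this thread), the slashes could be made mandatory inside optional groups. This is an untested sketch using solid pipes, and the nested groups shift the back-reference numbers:

```apache
# $1 = language, $3 = category, $5 = sub-category, $6 = page name.
# $2 and $4 hold the directory segments including their trailing slash.
# $3 and $5 should come out empty when the corresponding level is absent.
RewriteRule ^(en|fr)/(([^/.]+)/)?(([^/.]+)/)?([^/]+)\.html$ /script.shtml [E=BASE:$1,E=CAT:$3,E=SUBCAT:$5,L]
```

With this form a path such as "encat/page.html" no longer matches, because each directory level must be followed by its slash.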
Jim
I had missed the point about ruleset processing. I still can't figure out what the first two conds are supposed to do, but I'll dig into the literature... ;)
The big regex is actually used to 'explode' the URL to set numerous variables, then rebuild the various "depths" to run multiple SSI includes, e.g. Base/foo.inc + Base/Cat/foo.inc + Base/Cat/Subcat/foo.inc ...
Since my directory structure is 2 levels deep at most, I have to deal with URIs such as en/, en/cat/ and/or en/cat/sub/. Though you said to make rules as specific as possible, I guess a single RewriteRule/regex is better than 3 successive pattern-match-tests for the 3 possible types of requests... well, maybe not.
Greetings,
HotShot
Anyway, there is a silly bug in (my) Apache: [T] flags next to (or chained with) a regex-powered rewrite are ignored. I spent the whole day trying everything, even the most basic rules. Consider the problem solved; I'll add another final cond/rewrite pair to set the correct MIME type.
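For reference, such a final rule might look like the sketch below. It assumes the script is /script.shtml in the same directory as the .htaccess file, and it relies on the note in the Apache documentation for the [T] flag that, in per-directory context, the type survives only when the substitution is "-", since any other substitution triggers internal re-processing that discards it. If that note applies here, the behaviour described above would be expected rather than a bug. Untested:

```apache
# Runs on the pass that follows the internal rewrite to /script.shtml.
# "-" means "no substitution", so the URL is left alone and [T] can take effect.
RewriteCond %{HTTP_ACCEPT} application/xhtml\+xml
RewriteRule ^script\.shtml$ - [T=application/xhtml+xml,L]
```

The earlier rules would keep their E= flags but drop [T], so the type is set in one place only.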
Learned a lot these last few days. Thanks Jim for your kind assistance. :)