Mod_Rewrite issue : flags

[C] flag disables [T=mimetype]?

         

HotShot

6:51 pm on Feb 14, 2005 (gmt 0)

10+ Year Member



Hi everybody,

While working on rewrites (in .htaccess) for a site, I ran into a few issues with RewriteRule flags and with setting variables.

Here is the code; details follow:


RewriteEngine on
# Redirect bare domain (no www) to www
RewriteCond %{HTTP_HOST} !^www.domain.com$ [NC]
RewriteRule ^(.*) http://www.domain.com/$1 [QSA,R=301,L]
# First, rewrite some existing URLs to an SSI script, parsing out variables with a regex
RewriteCond %{REQUEST_FILENAME} ^.*\/(en¦fr)\/.+\.html$
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^(en¦fr)(/?)([^/.]*)(/?)([^/.]*)(/)([^/]+)\.html$ /script.shtml [E=BASE:$1,E=CAT:$3,E=SUBCAT:$5,E=MIMETYPE:text\/html,QSA,C]
# Second, a chained rule to send the page as XHTML if the client accepts it
RewriteCond %{HTTP_ACCEPT} application/xhtml\+xml
RewriteRule ^.*$ - "[T=application/xhtml+xml,E=MIMETYPE:application/xhtml\+xml,QSA,L]"

The issues:

1. With the [C] flag set on the first rule, the following RewriteCond is processed correctly (i.e. the variable 'MIMETYPE' is set to 'app...xml'), but the resulting page is not sent as XHTML, although the [T] flag says it should be! How strange.
The only way I found to make it work is not to chain the rules, but then a new RewriteCond is needed to check whether %{REQUEST_FILENAME} is the script filename, so that only this file is sent as XHTML.
That seems pretty silly, as the extra check would probably cost extra server time.

2. In this latter case, when MIMETYPE is first set to 'text/html' and then set again to 'app...xml', a new variable is created and the old one gets one more REDIRECT_ prefix.
The result is two variables, REDIRECT_MIMETYPE=app[...]xml (correct) and REDIRECT_REDIRECT_MIMETYPE=text/html! How can I avoid that?

Hope that was clear. By the way, any general tips or suggestions to improve my code are welcome! :)
Thanks in advance,
HotShot

jdMorgan

8:54 pm on Feb 14, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



HotShot,

Welcome to WebmasterWorld!

I'd suggest you "flatten" the code so that chaining is not needed.

Always check that HTTP_HOST is non-blank when doing a negative-pattern redirect. This will prevent redirection looping on HTTP/1.0 client requests.

Do not end-anchor the HTTP_HOST pattern. If you do, any client which appends a port number may be mishandled, e.g. if HTTP_HOST="www.domain.com:80".

Check for file exists only after checking HTTP_ACCEPT, since file-checking is slower than checking a local var.

[QSA] is not needed, since you are not adding a var to the existing query string. The existing query string will be passed through unchanged.

No need to escape characters in anything but regex patterns.

Use [L] flag to terminate rule processing if either rule matches.

Due to their length, the flags on the rules below appear to be wrapped to a second line. However, they must be on the same line as the RewriteRule in the actual code (forum posting artifact).

Replace all broken pipe "¦" characters with solid pipes before attempting to use this code (forum posting artifact).


RewriteEngine on
#
# Redirect alternate domain to www
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.domain\.com [NC]
#
# Rewrite for non-XML-accept request
RewriteCond %{HTTP_ACCEPT} !application/xhtml\+xml
RewriteRule ^(en¦fr)(/?)([^/.]*)(/?)([^/.]*)(/)([^/]+)\.html$ /script.shtml [E=BASE:$1,E=CAT:$3,E=SUBCAT:$5,E=MIMETYPE:text/html,L]
#
# Rewrite for XML-accept request
RewriteCond %{HTTP_ACCEPT} application/xhtml\+xml
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^(en¦fr)(/?)([^/.]*)(/?)([^/.]*)(/)([^/]+)\.html$ /script.shtml [T=application/xhtml+xml,E=BASE:$1,E=CAT:$3,E=SUBCAT:$5,E=MIMETYPE:application/xhtml+xml,L]
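
Note that, as posted, the two HTTP_HOST conditions at the top have no RewriteRule of their own, so they would attach to the first rewrite rule instead of producing a redirect. A minimal sketch of the complete redirect block, assuming the RewriteRule line from the original post was meant to follow them (with [QSA] dropped, per the note above):

```apache
# Redirect any non-www host to www.
# The RewriteRule line is an assumption carried over from the original post.
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.domain\.com [NC]
RewriteRule ^(.*) http://www.domain.com/$1 [R=301,L]
```

The first condition skips requests with an empty Host header, and the unanchored second pattern still accepts "www.domain.com:80".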

Jim

HotShot

9:18 pm on Feb 14, 2005 (gmt 0)

10+ Year Member



OK, thanks Jim for the explanation. I think I got the point. :)
(BTW I suppose you left some code out in your reply?)

(edit) Another question: what about testing URL-matching pattern in a RewriteCond rather than in the RewriteRule itself? Could it save some processing? For instance:


RewriteCond %{REQUEST_FILENAME} ^.*/(en¦fr)/
RewriteRule ^..(/?)etc. [E=BASE:%1,etc.]

jdMorgan

5:32 am on Feb 15, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> (BTW I suppose you left some code out in your reply?)

No, not that I am aware of. I removed "RewriteCond %{REQUEST_FILENAME} ^.*\/(en¦fr)\/.+\.html$" because it appeared to be entirely redundant.

> what about testing URL-matching pattern in a RewriteCond rather than in the RewriteRule itself? Could it save some processing?

No, the RewriteConds are only evaluated if the RewriteRule pattern matches. You want your RewriteRule pattern to be as specific as possible. See the Ruleset Processing [httpd.apache.org] section of the Apache mod_rewrite documentation for an explanation of how RewriteRules and RewriteConds are processed.
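
As a rough illustration of that processing order (the rule below is a simplified, hypothetical stand-in for the real one, written with solid pipes):

```apache
# Order of evaluation for one rule block:
#   1. The RewriteRule pattern is matched against the URL-path first.
#   2. Only on a match are the RewriteConds evaluated, top to bottom.
#   3. If every condition passes, the substitution is applied.
# So a cheap header test can sit before the slower file-exists test,
# and neither runs at all for URLs the pattern rejects.
RewriteCond %{HTTP_ACCEPT} application/xhtml\+xml
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^(en|fr)/[^/]+\.html$ /script.shtml [L]
```

Moving the (en|fr) test into a RewriteCond would therefore only add work, since the condition is reached after the rule pattern has already matched.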

The best thing you could do to improve the efficiency of your code is to fix the problem with your URLs that makes the "/" characters optional in your rules -- the ones matched into $2, $4, and $6. A regular URL-path structure would eliminate the need to treat those slashes as optional.

Jim

HotShot

9:35 am on Feb 15, 2005 (gmt 0)

10+ Year Member



Another very informative reply, thanks a lot!

I had missed the point about ruleset processing. I still can't figure out what the first two Conds are supposed to do, but I'll dig into the literature... ;)

The big regex is actually used to 'explode' the URL to set numerous variables, then rebuild the various "depths" to run multiple SSI includes, e.g. Base/foo.inc + Base/Cat/foo.inc + Base/Cat/Subcat/foo.inc ...
Since my directory structure is 2 levels deep at most, I have to deal with URIs such as en/, en/cat/ and/or en/cat/sub/. Though you said to make rules as specific as possible, I guess a single RewriteRule/regex is better than 3 successive pattern-match-tests for the 3 possible types of requests... well, maybe not.

Greetings,
HotShot

jdMorgan

11:40 pm on Feb 15, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I think three specific rules would be more efficient.
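
A rough sketch of what that might look like, assuming the three URL depths described above (en/, en/cat/ and en/cat/sub/) always use slashes between segments. The HTTP_ACCEPT and file-exists conditions are left out for brevity, and the capture numbering is an assumption that differs from the single-regex version:

```apache
# Depth 0: en/page.html
RewriteRule ^(en|fr)/([^/]+)\.html$ /script.shtml [E=BASE:$1,L]
# Depth 1: en/cat/page.html
RewriteRule ^(en|fr)/([^/.]+)/([^/]+)\.html$ /script.shtml [E=BASE:$1,E=CAT:$2,L]
# Depth 2: en/cat/sub/page.html
RewriteRule ^(en|fr)/([^/.]+)/([^/.]+)/([^/]+)\.html$ /script.shtml [E=BASE:$1,E=CAT:$2,E=SUBCAT:$3,L]
```

Each pattern requires a fixed number of path segments, so at most one of the three can match a given request and the order of the rules does not matter.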

Jim

HotShot

12:11 am on Feb 16, 2005 (gmt 0)

10+ Year Member



OK, that makes sense.

Anyway, there is a silly bug in (my) Apache: [T] flags next to (or chained with) a regex-powered rewrite are ignored. I spent the whole day trying everything, even the most basic rules. Consider the problem solved: I'll add another, final RewriteCond/RewriteRule pair to set the correct MIME type.
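
A minimal sketch of such a final rule, assuming the .htaccess file sits in the document root so that the rewritten request is seen as script.shtml. The mod_rewrite documentation for the [T] flag notes that, in per-directory context, a MIME type set by a rule whose substitution is anything other than "-" is lost during internal re-processing, which would be consistent with the behavior described above:

```apache
# Runs on the pass that follows the internal rewrite to /script.shtml.
# The "-" substitution leaves the URL unchanged, so no further
# internal redirect happens and the forced type is kept.
RewriteCond %{HTTP_ACCEPT} application/xhtml\+xml
RewriteRule ^script\.shtml$ - [T=application/xhtml+xml,L]
```

This rule would also apply to direct requests for script.shtml, which may or may not be wanted.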

Learned a lot these last few days. Thanks Jim for your kind assistance. :)