I'm trying to create an httpd file to handle the typical canonical URL issues. I have seen posts from jdMorgan and others about what this might look like in mod_rewrite. I like jdMorgan's posts on the topic (like [webmasterworld.com...]); I have read through them and understood 98% of what he was doing, but they all seem to use the [E=variable:value] flag, which unfortunately ISAPI Rewrite does not yet support.
If you had to implement the following set of rewrite rules without using [E=var:value] to set environment variables in order to pass values from RewriteRule to RewriteRule, how would you recommend doing it?
http://example.com --> http://www.example.com/ (fix www and trailing '/')
http://example.com/ --> http://www.example.com/ (fix www)
http://example.com/default.asp --> http://www.example.com/ (fix www and drop default.asp)
http://example.com/x/y/default.asp --> http://www.example.com/x/y/ (fix www and drop default.asp)
http://www.example.com --> http://www.example.com/ (fix trailing '/')
http://www.example.com/default.asp --> http://www.example.com/ (drop default.asp)
http://www.example.com/x/y/default.asp --> http://www.example.com/x/y/ (drop default.asp)
The only way I can think to do this is to have a RewriteRule and associated RewriteConds for each of the above cases. However, I want to minimize the number of 301s. I don't want to add 'www.' if it's missing and 301, only to then need to drop the 'default.asp' and 301 again. Even this presents a challenge: without environment variables, I have no way to flag that a single 301 should be issued later, after further RewriteRules have run and everything about the URL that needs to change HAS been changed. So the ordering of the above rules will be important.
Although not optimal from a performance perspective I am sure, I thought I would create a series of generic RewriteCond statements that check for specific things which I could then piece together to deal with a single case. For instance:
RewriteCond %{HTTPS} (off) to check for HTTP
RewriteCond %{HTTPS} (on) to check for HTTPS
RewriteCond %{HTTP_HOST} ^(?!www\.)(.+) to check does NOT begin w/ 'www.' and create a backreference to HTTP_HOST
RewriteCond %{HTTP_HOST} ^(www\.)(.+) to check DOES begin w/ 'www.' and create a backreference to HTTP_HOST minus the 'www.'
Also RewriteCond for things like:
REQUEST_URI ends in '/default.asp'
REQUEST_URI does NOT end in '/default.asp'
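As a sketch only, those two REQUEST_URI checks might look like this in mod_rewrite syntax. The [NC] flag (ignore case) is an assumption, and neither line has been verified against ISAPI Rewrite:

```apache
# REQUEST_URI ends in '/default.asp' (case-insensitive matching is an assumption)
RewriteCond %{REQUEST_URI} /default\.asp$ [NC]
# REQUEST_URI does NOT end in '/default.asp'; the leading '!' negates the whole pattern
RewriteCond %{REQUEST_URI} !/default\.asp$ [NC]
```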
Though not necessarily the most efficient method of rule writing, it would then seem easy to piece together a rule using the above RewriteCond statements. I'm seeing an httpd file that is similar to the following pseudocode:
# if http and HTTP_HOST is missing 'www.' and URI ends in '/default.asp'
RewriteCond to check if HTTP
RewriteCond to check if HTTP_HOST is missing 'www.' - create backreference to non-www HOST
RewriteCond to check if REQUEST_URI ends in '/default.asp'
RewriteRule to add www to non-www HOST backreference and drop default.asp and append QUERY_STRING [R=301,L]
# if http and HTTP_HOST is missing 'www.' and URI is NULL (homepage no trailing '/')
RewriteCond to check if HTTP
RewriteCond to check if HTTP_HOST is missing 'www.' - create backreference to non-www HOST
RewriteCond to check if REQUEST_URI is NULL (^$)
RewriteRule to add www to non-www HOST backreference and add '/' and append QUERY_STRING [R=301,L]
# if http and HTTP_HOST is missing 'www.'
RewriteCond to check if HTTP
RewriteCond to check if HTTP_HOST is missing 'www.' - create backreference to non-www HOST
RewriteRule to add www to non-www HOST backreference and add REQUEST_URI and append QUERY_STRING [R=301,L]
etc.
Am I totally off base? Admittedly, I'm a noob at rewrites LOL... I know there are lots of concerns around how long it takes to process the rules. I'm sure there is an easier way to minimize the number of 301s required to clean up a single URL (preferably only 1) while still minimizing the processing time of the request. But not having the [E=var:value] flag has thrown me a curve in that regard.
Any assistance would be appreciated.
Thanks in advance!
[edited by: ZydoSEO at 7:56 pm (utc) on May 15, 2008]
However, you could also handle some of the rarer-error cases simply by putting your rules in most-specific to least-specific order. For example, if you handle all of the common cases first and then end up with, say, HTTPS and a missing www subdomain, you can redirect to [www...] regardless of the subdomain. In other words, only the really badly messed-up URLs would actually result in multiple redirects.
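A minimal sketch of that ordering, written in Apache mod_rewrite syntax. It assumes a single canonical host, www.example.com, served over plain HTTP; the patterns are illustrative and have not been tested on ISAPI Rewrite:

```apache
RewriteEngine On

# Most specific first: any request for default.asp, on any host.
# Redirecting straight to the canonical host fixes a missing 'www.'
# and drops default.asp in the same 301.
RewriteRule ^/?(([^/]+/)*)default\.asp$ http://www.example.com/$1 [NC,R=301,L]

# Least specific last: anything else that arrived on a non-canonical host.
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule ^/?(.*)$ http://www.example.com/$1 [R=301,L]
```

Because the first rule has no host condition, it covers both example.com/x/y/default.asp and www.example.com/x/y/default.asp with one redirect each, so no flag needs to be carried from rule to rule. In Apache the original query string is carried over when the substitution contains none. On Apache in per-directory context, the first rule usually also gets a THE_REQUEST condition so that an internal index-file lookup does not trigger the redirect again; whether that is needed under ISAPI Rewrite is something to verify.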
Be aware that, at least in Apache, the "!" NOT operator is part of mod_rewrite and not part of the regular expression. Therefore, it cannot be used inside a pattern; it can only be applied to an entire pattern. Furthermore, you cannot back-reference a negated pattern. Apache 1.3, which uses POSIX regular expressions, also does not support "minimum greediness" or other special regex flags available in scripting languages; Apache 2.x uses PCRE, which does.
You'll probably need to use two RewriteConds:
RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteCond %{HTTP_HOST} ^([^.]+)\.example\.com
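The same two-condition idea can be sketched for the generic "add www." case from the question, replacing the lookahead with a negated condition followed by a capturing one. This is an untested illustration in Apache syntax; the http:// scheme and the R=301 target are assumptions:

```apache
# Condition 1: host is not already 'www.'-prefixed (negated, so it captures nothing)
RewriteCond %{HTTP_HOST} !^www\. [NC]
# Condition 2: matches any non-empty host and captures it as %1
RewriteCond %{HTTP_HOST} ^(.+)$
# %1 comes from the last matched RewriteCond, $1 from the RewriteRule pattern
RewriteRule ^/?(.*)$ http://www.%1/$1 [R=301,L]
```

If the Host header can carry a port, capturing with ^([^:]+) instead of ^(.+)$ would leave the port out of the redirect target.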
Jim
Though not ideal, it's nice to know that I wasn't way out in left field as to my remaining choices for solutions. Hopefully ISAPI Rewrite will soon support the [E=var:value] flag.
I enjoy all of your posts BTW. They are always very informative. You are an awesome resource for everyone here at WebmasterWorld.
[edited by: ZydoSEO at 12:37 pm (utc) on May 16, 2008]
1) The ! can only apply to an entire pattern (and is not part of the regex pattern itself). I noticed in the log that even when my RewriteCond was RewriteCond %{HTTP_HOST} !^(www\.) [NC] the pattern was logged as 'www\.' and
2) As you said, it does not support creating backreferences on lines where you use the '!' operator.
I certainly wish there were a list somewhere of these little 'quirks'.
Thanks again for the help.