Fixing canonicalization issues w/ URLs w/out the [E=var:value] flag

I'm using ISAPI Rewrite on MS platform which does not support [E=x:y] yet

         

ZydoSEO

7:54 pm on May 15, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



So unfortunately my company is a Microsoft shop - W2k3S, IIS, asp/aspx, SQL*Server. We recently purchased ISAPI Rewrite 3.x, which uses the same syntax and offers almost all of the same functionality as mod_rewrite. There are a few differences, however: it does not currently support a couple of the flags available in mod_rewrite.

I'm trying to create an httpd file to handle the typical canonical URL issues. I have seen posts from jdMorgan and others about what this might look like in mod_rewrite. I like jdMorgan's posts on the topic (like [webmasterworld.com...]). I have read through them and understood 98% of what he was doing, but they all seem to use the [E=variable:value] flag, which unfortunately ISAPI Rewrite does not yet support.

If you had to implement the following set of rewrite rules without using [E=var:value] to set environment variables in order to pass values from RewriteRule to RewriteRule, how would you recommend doing it?

http://example.com --> http://www.example.com/ (fix www and trailing '/')
http://example.com/ --> http://www.example.com/ (fix www)
http://example.com/default.asp --> http://www.example.com/ (fix www and drop default.asp)
http://example.com/x/y/default.asp --> http://www.example.com/x/y/ (fix www and drop default.asp)
http://www.example.com --> http://www.example.com/ (fix trailing '/')
http://www.example.com/default.asp --> http://www.example.com/ (drop default.asp)
http://www.example.com/x/y/default.asp --> http://www.example.com/x/y/ (drop default.asp)

The only way I can think to do this is to have a RewriteRule and associated RewriteConds for each of the above cases. However, I want to minimize the # of 301s. I don't really want to add 'www.' if it's missing and 301, only to possibly then need to drop the 'default.asp' and 301 again. Without environment variables, I can't flag partway through the rules that a single 301 should be issued later, once everything about the URL that needs to be changed HAS been changed. So the ordering of the above rules will be important.

Although not optimal from a performance perspective I am sure, I thought I would create a series of generic RewriteCond statements that check for specific things which I could then piece together to deal with a single case. For instance:

RewriteCond %{HTTPS} (off) to check for HTTP
RewriteCond %{HTTPS} (on) to check for HTTPS

RewriteCond %{HTTP_HOST} ^(?!www\.)(.+) to check does NOT begin w/ 'www.' and create a backreference to HTTP_HOST
RewriteCond %{HTTP_HOST} ^(www\.)(.+) to check DOES begin w/ 'www.' and create a backreference to HTTP_HOST minus the 'www.'

Also RewriteCond for things like:

REQUEST_URI ends in '/default.asp'
REQUEST_URI does NOT end in '/default.asp'

Though not necessarily the most efficient method of rule writing, it would then seem easy to piece together a rule using the above RewriteCond statements. I'm seeing an httpd file that is similar to the following pseudocode:

# if http and HTTP_HOST is missing 'www.' and URI ends in '/default.asp'
RewriteCond to check if HTTP
RewriteCond to check if HTTP_HOST is missing 'www.' - create backreference to non-www HOST
RewriteCond to check if REQUEST_URI ends in '/default.asp'
RewriteRule to add www to non-www HOST backreference and drop default.asp and append QUERY_STRING [R=301,L]

# if http and HTTP_HOST is missing 'www.' and URI is NULL (homepage no trailing '/')
RewriteCond to check if HTTP
RewriteCond to check if HTTP_HOST is missing 'www.' - create backreference to non-www HOST
RewriteCond to check if REQUEST_URI is NULL (^$)
RewriteRule to add www to non-www HOST backreference and add '/' and append QUERY_STRING [R=301,L]

# if http and HTTP_HOST is missing 'www.'
RewriteCond to check if HTTP
RewriteCond to check if HTTP_HOST is missing 'www.' - create backreference to non-www HOST
RewriteRule to add www to non-www HOST backreference and add REQUEST_URI and append QUERY_STRING [R=301,L]

etc.

Am I totally off base? Admittedly, I'm a noob at rewrites LOL... I know there are lots of concerns around how long it takes to process the rules. I'm sure there is an easier way to minimize the # of 301s required to clean up a single URL (preferably only 1) while still minimizing the time it takes to process the request. But not having the [E=var:value] flag has thrown me a curve in that regard.

Any assistance would be appreciated.

Thanks in advance!

[edited by: ZydoSEO at 7:56 pm (utc) on May 15, 2008]

jdMorgan

8:43 pm on May 15, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The method you outline above, "exploding" all possible cases into separate sets of rules, is the only option capable of avoiding multiple "stacked" redirects if you can't create vars.

However, you could also handle some of the rarer-error cases simply by putting your rules in most-specific to least-specific order: For example, if you handle all of the common cases first, and then end up with say HTTPS and a missing www subdomain, then you can redirect to [www...] regardless of the subdomain. In other words, only the really badly messed-up URLs would actually result in multiple redirects.
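As a rough illustration of that ordering, here is a minimal sketch in mod_rewrite syntax. It has not been tested on ISAPI Rewrite and rests on several assumptions: per-directory (.htaccess-style) matching, where the path seen by RewriteRule has no leading slash; a site served over plain HTTP; and example.com standing in for the real hostname. mod_rewrite carries the original query string over to the redirect target when the substitution has none of its own, so nothing is appended explicitly here.

```apache
RewriteEngine On

# Most specific first: non-www host AND a trailing default.asp.
# Both problems are fixed by a single 301.
RewriteCond %{HTTPS} ^off$
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^(.*/)?default\.asp$ http://www.example.com/$1 [NC,R=301,L]

# Non-www host, any other path (including the home page, which clients
# request as "/" even when the typed URL has no trailing slash).
RewriteCond %{HTTPS} ^off$
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

# Only reached when the host check above did not fire:
# just default.asp needs to go.
RewriteCond %{HTTPS} ^off$
RewriteRule ^(.*/)?default\.asp$ http://www.example.com/$1 [NC,R=301,L]
```

With this order, a request for http://example.com/x/y/default.asp is caught by the first block and goes to http://www.example.com/x/y/ in one hop. Whether the server's internal default-document lookup re-enters these rules varies by platform, so it is worth checking for redirect loops before relying on a sketch like this.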

Be aware that --at least in Apache-- the "!" NOT operator is part of mod_rewrite, and not part of regular-expressions. Therefore, it cannot be used inside a pattern. It can only be applied to an entire pattern. Furthermore, you cannot back-reference a negated pattern. Apache also does not support the "minimum greediness" or other special regex flags available in scripting languages.

You'll probably need to use two RewriteConds


RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteCond %{HTTP_HOST} ^([^.]+)\.example\.com

to create a back-reference to a NOT-www hostname.
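Applied to the original goal of prepending 'www.' to whatever host was requested, the same two-condition idea might look like the sketch below. It assumes mod_rewrite semantics, where %1 refers to the group captured by the last matching RewriteCond, and per-directory matching for the RewriteRule path. It has not been tested on ISAPI Rewrite.

```apache
# The negated condition only tests; it cannot capture anything.
RewriteCond %{HTTP_HOST} !^www\. [NC]
# The second, non-negated condition matches any non-empty host and captures it.
RewriteCond %{HTTP_HOST} ^(.+)$
# %1 is the host captured above; $1 is the requested path.
RewriteRule ^(.*)$ http://www.%1/$1 [R=301,L]
```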

Jim

ZydoSEO

12:35 pm on May 16, 2008 (gmt 0)




Thanks for the reply Jim. I had planned on ordering the rules from most specific to most general. That is why I had mentioned above that the ordering of the rules would be very important and in my example I tried to list them in that order.

Though not ideal, it's nice to know that I wasn't way out in left field as to my remaining choices for solutions. Hopefully ISAPI Rewrite will soon support the [E=var:value] flag.

I enjoy all of your posts BTW. They are always very informative. You are an awesome resource for everyone here at WebmasterWorld.

[edited by: ZydoSEO at 12:37 pm (utc) on May 16, 2008]

ZydoSEO

2:33 pm on May 16, 2008 (gmt 0)




PS: We were correct on both counts Jim! ISAPI Rewrite appears to work the same as mod_rewrite in that:

1) The ! can only apply to an entire pattern (and is not part of the regex pattern itself). I noticed in the log that even when my RewriteCond was RewriteCond %{HTTP_HOST} !^(www\.) [NC] the pattern was logged as 'www\.' and

2) As you said, it does not support creating back-references on lines where you use the '!' operator.

I certainly wish there were a list somewhere of these little 'quirks'.

Thanks again for the help.

jdMorgan

2:52 pm on May 16, 2008 (gmt 0)




You could make a list, since both of these points are mentioned in the documentation... :)

Jim