
Forum Moderators: Ocean10000 & incrediBILL & phranque


Help deciphering this canonicalization code

   
2:28 am on Feb 26, 2013 (gmt 0)

5+ Year Member



Hi,

I just took over development of a site and am having trouble figuring out the canonicalization rewrite rules (not my forte). I've pasted the code below.

The first bit seems straightforward enough: it strips out index.html. The second part redirects non-www to www and also appears to redirect to a URL with no trailing slash: www.example.com (no "/"). I can't actually confirm the trailing-slash part, though (header sniffers don't show the / redirect).

Does there appear to be a trailing slash redirect in place? (btw, this code is years old, so open to better, more efficient options).

If anyone could help, I'd really appreciate it. :)


----------------------------

RewriteEngine on
RewriteBase /

# REDIRECT INDEX.HTML
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]+/)*)index\.html\ HTTP/
RewriteRule index\.html$ http://www.example.com/%1 [R=301,L]

# REDIRECT TO WWW with no trailing slash?
RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
5:37 am on Feb 26, 2013 (gmt 0)

lucy24 (WebmasterWorld Senior Member)



Where are you getting the slash or non-slash from? The rule captures the entire request, whatever it is, and reattaches it to the correct form of the domain name.

The condition that tests the domain name, incidentally, is only half correct. It should be
!^(www\.example\.com)?$ for "exactly this form or exactly nothing".

Since this is all happening in htaccess, it's .* rather than .+ because the front page is {null}. There's no leading slash.
6:10 am on Feb 26, 2013 (gmt 0)

5+ Year Member



Hi lucy, thanks so much for the reply. The whole trailing slash issue came about from the original dev. He mentioned he thought the 2nd part of the modrewrite essentially did the following:
example.com/ --> www.example.com

Thus, I wanted to make sure all internal linking fell in line.

Knowing very little mod_rewrite, not seeing a "/" at the end of
RewriteRule (.*) http://www.example.com/$1(no "/" here) [R=301,L]
... I thought that might've been what he was referring to.


If you have the time, I guess that brings up 2 final questions...

1) Is there any part of our rewrite code that does affect the trailing slash of the root URL (removing or adding a "/")?

2) Appreciate the code suggestion -- just to confirm, do you mean change the RewriteCond line FROM:

RewriteCond %{HTTP_HOST} !^www\.example\.com

TO:

RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$


=========================

Again, thanks SO much for your time and help! :)
7:27 pm on Feb 26, 2013 (gmt 0)

lucy24 (WebmasterWorld Senior Member)



#1 I don't know about "any part of" your code. I can only say that the two Rules you quoted have no effect on the slash.

The canonical name for any directory including the top level (front page) ends in / slash. But in the case of the front page it does not matter so much, because the browser itself will generally supply a missing / in the same way that it supplies "http" et cetera to the front of a typed-in URL. And things that happen in the browser don't come through as redirects so you need not trouble yourself about them.

#2 Yes, that's the optimal wording. You need the closing anchor to get rid of any requests that have a port number at the end, and you need the "make the whole package optional" question mark because HTTP 1.0 doesn't send the "Host:" header at all. Disclaimer: Relata refero. Based on my own header logs, 1.0 either doesn't send a visible header at all, or it sends the whole package including the Host: line.
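Put together, the tightened host rule might look like the sketch below. It assumes the thread's placeholder hostname www.example.com and that the code lives in the root .htaccess; the sample Host values in the comments (including the made-up www.example.com.other.tld) are illustrations only.

```apache
# Host value                -> pattern matches? -> result
# www.example.com           -> yes              -> left alone
# (empty or missing)        -> yes              -> left alone
# example.com               -> no               -> 301 to www
# www.example.com:80        -> no               -> 301 to www (port dropped)
# www.example.com.other.tld -> no               -> 301 to www
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
# In .htaccess the leading slash is stripped before matching, so a front-page
# request leaves $1 empty and the target is http://www.example.com/
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
```

Without the closing anchor, the last two sample values would count as canonical, because the pattern would only check how the hostname begins.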
12:37 am on Feb 27, 2013 (gmt 0)

g1smd (WebmasterWorld Senior Member)



Yes, it is correct that there is no slash after $1.

For a root request, when the code is in htaccess, $1 is blank.
3:46 am on Feb 27, 2013 (gmt 0)

phranque (WebmasterWorld Administrator)



If you are concerned about the trailing slash, you should also look at the (mod_dir) DirectorySlash directive:
http://httpd.apache.org/docs/2.2/mod/mod_dir.html#directoryslash
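For reference, the directive takes a simple On/Off value, and On is the documented default, so a line like the one below would normally only be needed if it had been switched off somewhere. A minimal sketch, assuming mod_dir is loaded and AllowOverride permits the directive in .htaccess:

```apache
# mod_dir: when a request names a directory without the trailing slash,
# answer with a redirect to the same URL plus "/".
DirectorySlash On
```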
5:35 am on Feb 27, 2013 (gmt 0)

lucy24 (WebmasterWorld Senior Member)



But mod_dir only applies if you're working with the name of a directory-- including the root --in the first place. Your domain-name-canonicalization redirect applies to everything.

You could put in yet another RewriteRule involving -d in the Condition and [^./]+$ in the body of the rule, but most people would consider it more trouble than it's worth. All you're doing is saving a duplicate redirect in the case where the original request was wrong in these two specific ways-- and how often does that happen? Show me a missing directory slash and I'll show you the MJ12bot ;)
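One way such a rule could look is sketched below. This is only an illustration: the exact pattern is an assumption filled in from the description above, the hostname is the thread's placeholder, and the rule is meant to sit ahead of the host rule in the root .htaccess.

```apache
# Only act when the requested path maps to an existing directory...
RewriteCond %{REQUEST_FILENAME} -d
# ...and the final path segment has no dot and no trailing slash.
# Adds the slash and the canonical hostname in a single 301.
RewriteRule ^(([^/]+/)*[^./]+)$ http://www.example.com/$1/ [R=301,L]
```

Note that this would also fire when the hostname is already correct, in which case it simply does the job mod_dir would otherwise have done.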