Apache Web Server Forum

    
Help deciphering this canonicalization code
patrick89




msg:4548869
 2:28 am on Feb 26, 2013 (gmt 0)

Hi,

I just took over development of a site and am having trouble figuring out the canonicalization rewrite rules (not my forte). I've pasted the code below.

The first bit seems straightforward enough: it strips out index.html. The second part redirects non-www TO www and also appears to redirect to NO trailing slash: www.example.com (no "/"). I can't actually confirm the trailing slash part, though (header sniffers don't show the / redirect).

Does there appear to be a trailing slash redirect in place? (btw, this code is years old, so open to better, more efficient options).

If anyone could help, I'd really appreciate it. :)


----------------------------

RewriteEngine on
RewriteBase /

# REDIRECT INDEX.HTML
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]+/)*)index\.html\ HTTP/
RewriteRule index\.html$ http://www.example.com/%1 [R=301,L]

# REDIRECT TO WWW with no trailing slash?
RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

 

lucy24




msg:4548894
 5:37 am on Feb 26, 2013 (gmt 0)

Where are you getting the slash or non-slash from? The rule captures the entire request, whatever it is, and reattaches it to the correct form of the domain name.

That condition, incidentally, is only half correct. It should be
!^(www\.example\.com)?$ for "exactly this form or exactly nothing".

Since this is all happening in htaccess, it's .* rather than .+ because the front page is {null}. There's no leading slash.

patrick89




msg:4548897
 6:10 am on Feb 26, 2013 (gmt 0)

Hi lucy, thanks so much for the reply. The whole trailing slash issue came about from the original dev. He mentioned he thought the 2nd part of the modrewrite essentially did the following:
example.com/ --> www.example.com

Thus, I wanted to make sure all internal linking fell in line.

Knowing very little mod_rewrite, not seeing a "/" at the end of
RewriteRule (.*) http://www.example.com/$1(no "/" here) [R=301,L]
... I thought that might've been what he was referring to.


If you have the time, I guess that brings up 2 final questions...

1) Is there any part of our rewrite code that does affect the trailing slash of the root URL (removing or adding a "/")?

2) Appreciate the code suggestion -- just to confirm, do you mean change the RewriteCond line FROM:

RewriteCond %{HTTP_HOST} !^www\.example\.com

TO:

RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$


=========================

Again, thanks SO much for your time and help! :)

lucy24




msg:4549152
 7:27 pm on Feb 26, 2013 (gmt 0)

#1 I don't know about "any part of" your code. I can only say that the two Rules you quoted have no effect on the slash.

The canonical name for any directory, including the top level (front page), ends in a slash (/). But in the case of the front page it does not matter so much, because the browser itself will generally supply a missing / in the same way that it supplies "http" et cetera to the front of a typed-in URL. And things that happen in the browser don't come through as redirects, so you need not trouble yourself about them.

#2 Yes, that's the optimal wording. You need the closing anchor to catch requests that have a port number appended to the hostname, and you need the question mark, which makes the whole group optional, because HTTP 1.0 doesn't require the "Host:" header at all. Disclaimer: relata refero (I'm only passing on what I've been told). Based on my own header logs, a 1.0 request either doesn't send any visible headers at all, or it sends the whole package including the Host: line.
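
As a rough sketch of how the revised condition behaves, here is the hostname rule from the original post with only the RewriteCond changed. The hostnames in the comments are illustrative examples, not values taken from anyone's logs.

```apache
# Redirect every request whose Host header is not exactly the canonical name.
# The pattern matches only "www.example.com" or an empty Host value;
# the leading "!" negates it, so the rule fires for everything else:
#   example.com            -> redirected
#   www.example.com:80     -> redirected (the closing anchor rejects the port)
#   www.example.com        -> left alone
#   (no Host header)       -> left alone (the group is optional)
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
```

With the original, unanchored condition, a Host value such as www.example.com:80 would have passed as already canonical.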

g1smd




msg:4549267
 12:37 am on Feb 27, 2013 (gmt 0)

Yes, it is correct that there is no slash after $1.

For a root request, when the code is in htaccess, $1 is blank.
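
To make that concrete, here is a sketch of what the capture holds for a few requests, assuming the rules sit in the .htaccess file at the document root. The sample request paths are made up for illustration, and the commented-out last line is only there for contrast with server-level configuration.

```apache
# .htaccess in the document root (per-directory context).
# The leading slash is stripped before the pattern is tested:
#   GET /            ->  $1 is empty        ->  http://www.example.com/
#   GET /dir/        ->  $1 is "dir/"       ->  http://www.example.com/dir/
#   GET /page.html   ->  $1 is "page.html"  ->  http://www.example.com/page.html
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

# For contrast only: in httpd.conf or a <VirtualHost> block the pattern
# sees the leading slash, so it would have to be consumed explicitly.
# RewriteRule ^/(.*) http://www.example.com/$1 [R=301,L]
```

Either way the slash after the hostname comes from the substitution itself, so the redirect target for the front page always ends in "/".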

phranque




msg:4549307
 3:46 am on Feb 27, 2013 (gmt 0)

If you are concerned about the trailing slash, you should also look at the (mod_dir) DirectorySlash directive:
http://httpd.apache.org/docs/2.2/mod/mod_dir.html#directoryslash
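
For reference, a minimal sketch of that directive. On is the documented default, so the line normally does not need to appear in a configuration at all; it is written out here only to make the behaviour explicit.

```apache
# mod_dir: when a request names an existing directory without the
# trailing slash (e.g. /dir), redirect to the same URL with the slash
# appended (/dir/). "On" is the default setting.
DirectorySlash On
```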

lucy24




msg:4549333
 5:35 am on Feb 27, 2013 (gmt 0)

But mod_dir only applies if you're working with the name of a directory (including the root) in the first place. Your domain-name-canonicalization redirect applies to everything.

You could put in yet another RewriteRule involving -d in the Condition and [^./]+$ in the body of the rule, but most people would consider it more trouble than it's worth. All you're doing is saving a duplicate redirect in the case where the original request was wrong in these two specific ways-- and how often does that happen? Show me a missing directory slash and I'll show you the MJ12bot ;)
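
For anyone who does want it, the extra rule described above might look something like the following. This is an untested sketch, not a recommendation: it assumes the rules live in the root .htaccess, and it would need to sit before the hostname rule so that it gets the first chance at the request.

```apache
# Hypothetical combined fix: if the request has no trailing slash, has no
# dot in its final path segment, and maps to a real directory, send it
# straight to the canonical hostname with the slash appended. That covers
# "wrong hostname" and "missing directory slash" in a single redirect.
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^(([^/]+/)*[^./]+)$ http://www.example.com/$1/ [R=301,L]

# The general hostname rule follows and handles everything else.
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
```

The pattern is tested before the condition, so the filesystem check only runs for requests that already look like a slashless directory name.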

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
© Webmaster World 1996-2014 all rights reserved