| Help deciphering this canonicalization code
|
patrick89

msg:4548869 | 2:28 am on Feb 26, 2013 (gmt 0) | Hi, I just took over development of a site and am having trouble figuring out the canonicalization rewrite rules (not my forte). I've pasted the code below. The first bit seems straight-forward enough, stripping out index.html. The second part redirects non-www TO www and also appears to be redirecting to NO trailing slash: www.example.com (no "/"). I actually can't confirm the trailing slash part though (header sniffers don't show the / redirect). Does there appear to be a trailing slash redirect in place? (btw, this code is years old, so open to better, more efficient options). If anyone could help, I'd really appreciate it. :)

----------------------------

RewriteEngine on
RewriteBase /

# REDIRECT INDEX.HTML
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]+/)*)index\.html\ HTTP/
RewriteRule index\.html$ http://www.example.com/%1 [R=301,L]

# REDIRECT TO WWW with no trailing slash?
RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
|
lucy24

msg:4548894 | 5:37 am on Feb 26, 2013 (gmt 0) | Where are you getting the slash or non-slash from? The rule captures the entire request, whatever it is, and reattaches it to the correct form of the domain name. The condition, incidentally, is only half correct. It should be !^(www\.example\.com)?$ for "exactly this form or exactly nothing". Since this is all happening in htaccess, the pattern is .* rather than .+ because the front page is {null}. There's no leading slash.
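
As a rough illustration of the point above, the comments below trace what (.*) captures for a few requests when the rule sits in the root .htaccess file. The path /about/team.html is a made-up example, not something from the site in question.

```apache
# In .htaccess the pattern is tested against the path with its leading slash removed.
#   request for /                 -> $1 is empty          -> http://www.example.com/
#   request for /about/           -> $1 = about/          -> http://www.example.com/about/
#   request for /about/team.html  -> $1 = about/team.html -> http://www.example.com/about/team.html
# The slash in the target is the one written before $1; nothing is added or removed at the end.
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
```

With .+ instead of .* the pattern would not match the empty path, so a request for the front page on the wrong hostname would not be redirected.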
|
patrick89

msg:4548897 | 6:10 am on Feb 26, 2013 (gmt 0) | Hi lucy, thanks so much for the reply. The whole trailing slash issue came about from the original dev. He mentioned he thought the 2nd part of the mod_rewrite code essentially did the following: example.com/ --> www.example.com. Thus, I wanted to make sure all internal linking fell in line. Knowing very little mod_rewrite, and not seeing a "/" at the end of RewriteRule (.*) http://www.example.com/$1 (no "/" here) [R=301,L] ... I thought that might've been what he was referring to. If you have the time, I guess that brings up 2 final questions...

1) Is there any part of our rewrite code that does affect the trailing slash of the root URL (removing or adding a "/")?

2) Appreciate the code suggestion -- just to confirm, do you mean change the RewriteCond line
FROM: RewriteCond %{HTTP_HOST} !^www\.example\.com
TO: RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$

Again, thanks SO much for your time and help! :)
|
lucy24

msg:4549152 | 7:27 pm on Feb 26, 2013 (gmt 0) | #1 I don't know about "any part of" your code. I can only say that the two Rules you quoted have no effect on the slash. The canonical name for any directory including the top level (front page) ends in / slash. But in the case of the front page it does not matter so much, because the browser itself will generally supply a missing / in the same way that it supplies "http" et cetera to the front of a typed-in URL. And things that happen in the browser don't come through as redirects so you need not trouble yourself about them.

#2 Yes, that's the optimal wording. You need the closing anchor so that requests with a port number at the end of the hostname also get redirected, and you need the "make the whole package optional" question mark because an HTTP/1.0 request may arrive with no "Host:" header at all. Disclaimer: Relata refero. Based on my own header logs, 1.0 either doesn't send a visible header at all, or it sends the whole package including the Host: line.
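
To make the effect of the closing anchor and the question mark concrete, here is a small sketch of the revised hostname rule with the outcome for a few Host values noted in comments. The Host values listed are illustrative and the fragment has not been tested against the poster's server.

```apache
# Host header value        redirected?
#   www.example.com        no  (pattern matches, so the negated condition fails)
#   (empty or absent)      no  (the optional group matches the empty string)
#   example.com            yes
#   www.example.com:80     yes (the $ anchor rejects the port suffix)
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
```

Leaving the empty Host alone is deliberate: a client that never sends a Host header would send none on the redirected request either, so redirecting it would most likely just loop.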
|
g1smd

msg:4549267 | 12:37 am on Feb 27, 2013 (gmt 0) | Yes, it is correct that there is no slash after $1. For a root request, when the code is in htaccess, $1 is blank.
|
phranque

msg:4549307 | 3:46 am on Feb 27, 2013 (gmt 0) | if you are concerned about the trailing slash you should also look at the (mod_dir) DirectorySlash Directive: http://httpd.apache.org/docs/2.2/mod/mod_dir.html#directoryslash
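
For reference, a minimal sketch of the directive mentioned above. On is the documented default, so the line only makes the behaviour explicit; the /docs directory is a hypothetical example.

```apache
# mod_dir: a request for /docs, where docs is a real directory,
# is answered with a redirect to /docs/ (trailing slash added).
# On is the default, so this line is optional.
DirectorySlash On
```

This is separate from the hostname redirect: it only comes into play when the requested path names an existing directory and lacks its trailing slash.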
|
lucy24

msg:4549333 | 5:35 am on Feb 27, 2013 (gmt 0) | But mod_dir only applies if you're working with the name of a directory-- including the root --in the first place. Your domain-name-canonicalization redirect applies to everything. You could put in yet another RewriteRule involving -d in the Condition and [^./]+$ in the body of the rule, but most people would consider it more trouble than it's worth. All you're doing is saving a duplicate redirect in the case where the original request was wrong in these two specific ways-- and how often does that happen? Show me a missing directory slash and I'll show you the MJ12bot ;)
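
The extra rule described above might look roughly like the following. This is an untested sketch: it assumes the rule lives in the root .htaccess and is placed before the hostname rule, and it widens the [^./]+$ idea into an anchored pattern so the whole path can be captured.

```apache
# Only act when the requested path maps to an existing directory.
RewriteCond %{REQUEST_FILENAME} -d
# Path ends in a segment with no dot and no trailing slash:
# send it to the canonical hostname with the slash added, in a single redirect.
RewriteRule ^(([^/]+/)*[^./]+)$ http://www.example.com/$1/ [R=301,L]
```

Without it, a slashless directory request on the wrong hostname takes two redirects (hostname first, then the slash), which is the duplicate being weighed against the extra complexity.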
|