Forum Moderators: phranque

htaccess 301s

Canonicalize hostname and /index pages?

         

bode

9:12 pm on Feb 1, 2009 (gmt 0)

10+ Year Member



I have the following code in my .htaccess file for redirecting non-www to www on an Apache server:

Options +FollowSymlinks
RewriteEngine on
rewritecond %{http_host} ^example.com [nc]
rewriterule ^(.*)$ http://www.example.com/$1 [r=301,nc]

That works fine; however, www.../index.html is not being redirected to www.example.com/

Also, when I click the homepage links (index.html) in my internal site pages, they arrive at www.example.com/index.html

All my incoming links from external sites arrive at www.example.com/

Will this cause a duplicate content penalty in Google, and should I have www.example.com/index.html redirected?

[edited by: jdMorgan at 11:17 pm (utc) on Feb. 1, 2009]
[edit reason] example.com [/edit]

g1smd

9:38 pm on Feb 1, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You need an additional rule for the index URLs. That rule goes before the one you have now.

That new rule will also need to force www for just those requests so that you don't have a redirection chain.

Code for that has been posted several times in recent weeks in this forum.

For your existing rule you also need to fix the casing of the various parts of the rule as per the Apache docs.

Additionally, you need to link to "/" and to "/folder/" within your site, and not to URLs that include the index file filename.
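
As a rough sketch of the casing fix only, the existing rule might look like the following, with directive names, the server variable, and the flags written the way the Apache mod_rewrite documentation shows them. The escaped dot and the added L flag are assumptions about what else needs tidying, and the separate index rule still has to be placed before this one.

```apache
Options +FollowSymLinks
RewriteEngine On
# Match the bare hostname; the dot is escaped so it matches a literal "."
RewriteCond %{HTTP_HOST} ^example\.com [NC]
# Redirect to the www hostname and stop processing further rules (L)
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```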

jdMorgan

11:14 pm on Feb 1, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



And what about redirecting "http://this-is-a-fraudulent-site.example.com" to www.example.com to avoid having the former appear in search as the result of an "unkind" competitor's link? :)

What about appended FQDN indicators or port numbers? -- www.example.com.:80 is perfectly valid but non-canonical...

I'd suggest:


Options +FollowSymlinks
RewriteEngine on
#
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index\.(s?html?|php[456]?)(\?[^\ ]*)?\ HTTP/
RewriteRule ^(([^/]+/)*)index\.(s?html?|php[456]?)$ http://www.example.com/$1 [R=301,L]
#
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

In the first rule, THE_REQUEST is the request line sent by the client (e.g. browser or search robot) exactly as it appears in your raw server logs. It is checked here to be sure that the /index file was requested directly by the client and not as the result of the action of mod_dir rewriting a request for "/" to the actual index file. This prevents the 'infinite' rewrite/redirect loop that would otherwise occur.

The pattern in the RewriteCond and RewriteRule accepts a request for /index.html, /index.shtm, /index.php, /index.php5, etc. in *any* directory, and the rule redirects the client to "/" in that same directory.

The patterns above use solid pipe characters ("|"). If they display as broken pipe "¦" characters, replace them with solid pipe characters before use; posting on this forum can modify the pipe characters.

In the second rule we require an *exact* match on the hostname or a blank hostname. Accepting a blank hostname prevents putting your server into an infinite redirect loop if you ever get a request from an HTTP/1.0 client, which is not required to send a "Host" header and often won't.
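
For anyone adapting the first rule, here is an annotated reading of its condition pattern, written as comments above the directive. It is only a sketch: the sample request line is made up for illustration, and the optional "?" after the PHP version digit is assumed from the description above, which says plain /index.php is accepted.

```apache
# Sample THE_REQUEST value (hypothetical):
#   GET /folder/index.html?x=1 HTTP/1.1
#
#   ^[A-Z]+\             request method (GET, HEAD, POST...) and a space
#   /([^/]+/)*           leading slash, then zero or more "folder/" levels
#   index\.              the literal text "index."
#   (s?html?|php[456]?)  htm, html, shtm, shtml, php, php4, php5 or php6
#   (\?[^\ ]*)?          an optional query string
#   \ HTTP/              a space, then the start of the protocol token
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index\.(s?html?|php[456]?)(\?[^\ ]*)?\ HTTP/
```

Because the condition tests the original request line rather than the rewritten path, an internal mod_dir mapping of "/" to the index file never matches it.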

Jim

[edited by: jdMorgan at 2:08 am (utc) on Feb. 2, 2009]

bode

1:25 am on Feb 2, 2009 (gmt 0)

10+ Year Member



Brilliant thanks! Worked a treat.

The solid pipe is a capital I. CTRL and \

First time I've come across that code which addresses all the variations. Is it still necessary to change my site's links pointing to the homepage? Or does this code look after that?

Cheers

g1smd

1:57 am on Feb 2, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Links should contain the URL that you want users to 'see' and 'use'.

It is those links that 'define' URLs.

See my post above.

jdMorgan

2:06 am on Feb 2, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes, it's necessary. The purpose of this code is to "clean up," and not to "repair."

Run a very tight ship, and you won't find yourself posting in the "sudden mysterious search rankings drop" threads here... :)

Basically, any kind of sloppiness in your linking or your server's handling of requests that "aren't quite right" is an opportunity for another Webmaster to make matters worse for you, whether unintentionally or otherwise. If an "imperfect request" can be unambiguously detected and corrected, then 301-redirect it to the correct URL. If not, a 404-Not Found response is likely most appropriate.

Note to all readers: The solid pipe character mentioned above varies in both presentation and in required keyboard keying based on your browser and operating system character-set and character-encoding. The desired character is the one with hexadecimal character code 7C in US-ASCII and UTF-8 (written %7C when URL-encoded).

Jim