I have been working through the WebmasterWorld pages on getting some of my sites correctly canonicalised - some great advice from jdMorgan, then I picked up a thread on the /index rewrite to start cutting off the enemy at the pass. Best forum around for this stuff.
The following is my current (simple) Mod rewrite, and I am still confused as to why the capitalisation in the domain doesn't get forced to lower case.
I assumed that www.EXAMPLE.COM would be forced to www.example.com - doesn't seem to work that way.
Rule 1 below deals with the index.htm(l) problem nicely, Rule 2 with the www. vs. non-www very well, and in combination they also work.
But I still have capitalisation issues - I don't mean inside the site or with individual URLs - different issue. I mean the capitalisation of the domain name and TLD.
I have read and think I understand the capitalisation discussion on webmasterworld, so I don't think that is the issue. I have also messed with adding [nc] to different lines and still can't make headway.
Options +FollowSymlinks
RewriteEngine on
rewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.html?\ HTTP/ [nc]
rewriteRule ^index\.html?$ http://www.example.com/ [R=301,L]
rewriteCond %{HTTP_HOST} !^www\.example\.com$
rewriteRule (.*) http://www.example.com/$1 [R=301,L]
Any ideas from the experts? Have I misunderstood how far you can go here?
Thanks
I suggest using the Live HTTP Headers extension to Firefox to actually observe the HTTP request and response headers. This is not only a great test tool, but also a good learning tool. It allows you to see the raw requests your browser sends, and to view the server response headers.
Jim
[added]
Do not end-anchor the domain in this line:
rewriteCond %{HTTP_HOST} !^www\.example\.com$
the line should read either
RewriteCond %{HTTP_HOST} !^www\.example\.com
-or-
RewriteCond %{HTTP_HOST} !^www\.example\.com(:[0-9]+)?$
This will prevent the rule from failing if a port number is appended to the domain -- a perfectly-valid possibility.
[/added]
[edited by: jdMorgan at 8:25 pm (utc) on May 8, 2007]
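To make the effect of the optional port group concrete, here is a rough sketch of how the second rule would treat a few Host header values. The sample hosts are illustrative assumptions, not taken from any real log, and the outcomes assume that the pattern is compared case-sensitively because no [NC] flag is given.

```apache
# Host header sent by the client   -> does the rule redirect?
#   www.example.com                -> no  (already canonical)
#   www.example.com:80             -> no  (port allowed by "(:[0-9]+)?")
#   example.com                    -> yes (missing "www.")
#   WWW.EXAMPLE.COM                -> yes (no [NC], so the match is case-sensitive)
RewriteCond %{HTTP_HOST} !^www\.example\.com(:[0-9]+)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
```

Browsers typically lower-case the hostname before sending the request, so the last case may never reach the server from a browser. That is one reason to check the raw request headers before concluding that the rule is not firing.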
I also posted this in error in another area
[webmasterworld.com...]
and had some interaction with AjiNIMC and Tedster.
I have been reading your material
[webmasterworld.com...]
A guide to fixing duplicate content & URL issues on Apache
and was concerned with capitalisation - I now understand (I think) that the main capitalisation issue is after the "/", not in the domain name.
So www.MyFavouriteDomain.com (which looks better in print for humans than www.myfavouritedomain.com) is fine to use, and would resolve with any lower case/uppercase mixture anyway, but
www.MyFavouriteDomain.com/NextPage.htm
needs to be converted to a lower case version such as
www.MyFavouriteDomain.com/nextpage.htm
when designing or naming pages.
My last remaining question (in the other thread) is why, for example, www.google.COM would produce a PR of zero, whereas www.google.com produces a PR of 10 (in the google toolbar) - this apparent removal of PR also occurs in my own domain.
Do you have any thoughts? AjiNIMC suggested a possible bug in the google toolbar.
Thanks again
Bryan
Options +FollowSymlinks
RewriteEngine on
#
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.html?\ HTTP/ [NC]
RewriteRule ^index\.html?$ http://www.example.com/ [R=301,L]
#
RewriteCond %{HTTP_HOST} !^www\.example\.com(:[0-9]+)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
It's good to worry about canonicalization, but the above two rules will take care of 99% of actually-common problems.
Other things to look at:
If (and only if) your site uses static URLs, you may want to remove spurious query strings from requests.
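As a minimal sketch of the query-string cleanup, assuming every page on the site is a static .htm or .html file and nothing depends on URL parameters, a rule along these lines could be placed ahead of the domain rule. The file-extension pattern is an assumption; adjust it to suit the site.

```apache
# Sketch only: assumes static .htm/.html pages that never need a query string.
# Fire only when the request actually carries a non-empty query string.
RewriteCond %{QUERY_STRING} .
# The trailing "?" on the target discards the original query string.
RewriteRule ^(.*\.html?)$ http://www.example.com/$1? [R=301,L]
```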
If you see a lot of bad links from forums that can include a period following an auto-linked URL at the end of a sentence, then you might want to address those, too.
One example is [webmasterworld.com....]
Another is [webmasterworld.com...] <- Note that the periods are auto-linked by the forum software, and will appear in the URL.
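A possible rule for the trailing-period case, offered as a sketch rather than a tested recipe. It assumes that no legitimate URL on the site ends in a period.

```apache
# Sketch only: assumes no real URL on the site ends with ".".
# Capture everything up to the last non-period character, drop the stray
# period(s) that follow, and redirect to the cleaned URL.
RewriteRule ^(.*[^.])\.+$ http://www.example.com/$1 [R=301,L]
```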
You posted that you'd read the "canonicalization guide" thread in our forum library; these issues are covered in that thread.
Jim