Forum Moderators: Robert Charlton & goodroi

URL Canonicalization

Penalty in Google or other search engine?

         

Suman

5:11 am on May 23, 2006 (gmt 0)

10+ Year Member



What is URL canonicalization? Does canonicalization cause a penalty in Google?

asusplay

8:32 pm on May 23, 2006 (gmt 0)

10+ Year Member



From my understanding, it's different versions of a website that can sit on a single domain or page.

For example, to most of us [domain.com...] and [domain.com...] would be the same thing, but Google sees them as different domains, so you now need to redirect your non-www version to your www version with 301 redirects.

Also your homepage can be indexed as www.domain.com/index.asp and google sees it as being different to your homepage www.domain.com and Google can give you a duplicate penalty. This has happened to some of my sites I think and because I am on shared hosting I can't fix it. Google is splitting ends with this. :(
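To make the duplicates described above concrete, here is a minimal Python sketch that maps the www/non-www and default-document variants onto one preferred URL. It is only an illustration, not anything Google publishes: the choice of the www host as the preferred one, the `index.asp` default document, and the function name `canonical_url` are all assumptions.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical settings, chosen only for this illustration.
PREFERRED_HOST = "www.domain.com"
BARE_HOST = "domain.com"
DEFAULT_DOCUMENTS = ("index.asp",)


def canonical_url(url: str) -> str:
    """Map the www/non-www and default-document variants to one URL."""
    parts = urlsplit(url)
    host = parts.netloc.lower()  # hostnames are case-insensitive
    if host == BARE_HOST:
        host = PREFERRED_HOST
    path = parts.path or "/"  # treat an empty path as the root
    for doc in DEFAULT_DOCUMENTS:
        if path.endswith("/" + doc):
            path = path[: -len(doc)]  # drop the file name, keep the slash
    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))


variants = [
    "http://domain.com",
    "http://www.domain.com/",
    "http://www.domain.com/index.asp",
]
# All three variants collapse to a single canonical URL.
print({canonical_url(u) for u in variants})
```

A 301 redirect does the same job on the server: every variant answers with the one preferred address instead of serving the page again.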

bobby_boy

8:09 am on May 24, 2006 (gmt 0)

10+ Year Member



Just seen today my main homepage URL has gone from www.abc.com to abc.com. This has coincided with me losing my main rankings. We have never used this URL in any links so I can't understand why Google is now choosing it (it also has a lower PR than the www. version).

I read on the boards here that the only way to remedy this is to do the 301 redirect. Before I do this I just wanted to check that it was still the right way of doing things. This site has been 4 years in the making and was starting to rank really well before today. I'd hate to screw it all up by doing something wrong now.

Thanks in advance for any advice.

trinorthlighting

4:19 pm on May 24, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Jeez,

You would think with all the very smart engineers working with google, they could fix the issue on their end easily and not have us webmasters do thousands of 301's

How come msn does not have issues like this?

texasville

4:36 pm on May 24, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>>>>This has happened to some of my sites I think and because I am on shared hosting I can't fix it. Google is splitting ends with this. :( <<<<<

Contact your hosting service and explain it to them. I did and they did a 301 redirect for their entire system. I think it took them no time at all.

>>>You would think with all the very smart engineers working with google, they could fix the issue on their end easily and not have us webmasters do thousands of 301's <<<

Try MILLIONS...and I really think it is arrogance on G's part. Part of a subtle plan to get us to jump thru their hoops. It conditions us to run around and do whatever they say. Eventually we may have to make a choice between G and all other search engines, as they may demand that we do something that is incompatible with other SEs.

Stefan

5:53 pm on May 24, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



We have never used this URL in any links so I can't understand why Google is now choosing it (it also has a lower PR than the www. version).

Someone linked to that URL, G followed it, and when they got there, if you have relative links, they followed internal navigation and thought all the pages are non-www. They got 200's on them because of the way your server is configured. If the non-www link was from a site with high-PR, it caused the new version to outrank the proper version. Visible PR only updates occasionally, so you can't tell that way.
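The relative-link effect described above can be sketched in a few lines of Python. This only shows how a relative link resolves against whichever host the page was fetched from; the link `products/widget.html` and the helper name `resolve_links` are made up for the example.

```python
from urllib.parse import urljoin


def resolve_links(page_url: str, links: list[str]) -> list[str]:
    """Resolve each link against the URL the page was fetched from."""
    return [urljoin(page_url, link) for link in links]


# Hypothetical internal link, written the relative way.
relative_links = ["products/widget.html"]

# A crawler arriving on the non-www host stays on the non-www host...
print(resolve_links("http://abc.com/", relative_links))
# ...while one arriving on the www host stays on the www host.
print(resolve_links("http://www.abc.com/", relative_links))
```

So a single inbound link to the non-www homepage is enough for a crawler to discover a complete non-www copy of the site, unless the server redirects it.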

trader

6:05 pm on May 24, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



...so you now need to redirect your non www version to your www version with 301 redirects.

I know the experts (including Google's Matt Cutts in his blog) also recommend doing it that way, but I have always wondered: why do it that way? Why not go www to non-www?

Lately I have been eliminating the www when submitting to the SE's and also in my links.

The reason is I am doing some educated guessing that more and more websurfers are typing into the address bar the non-www domain vs in the past when most probably used the www for typeins (myself included when surfing using the WWW not too long ago).

This is due to many thinking the www was required for the site to work but most people now realize it most always works both ways, and eliminating the www means less time spent typing and lower chance of typo errors. Any thoughts on this?

tedster

6:15 pm on May 24, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



but most people now realize it most always works both ways

That's not my experience. I still see loads of calls for www.subdomain.example.com -- most people who are not working on the web (even part-time "webmasters") regularly still automatically type in the www.

trader

6:36 pm on May 24, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Right, I should have said some or many people, not most.

I am sure you are right Tedster but I think the trend is changing direction. A survey showing the percentages doing it the 2 ways would be quite interesting, especially comparing between a few yrs ago and today.

Anyway, does it make any real difference if we go in either direction as far as the forwarding and links are concerned or is it really a non-issue? In other words, is there a valid reason for the website owner to only use the www version?

TerrCan123

7:06 pm on May 24, 2006 (gmt 0)

10+ Year Member



I just noticed when I check the cache of my homepage mysite.com it shows the cache of www.mysite.com

I know this because I just changed some relative links on my homepage [also says it at the top of the page], maybe this is how to tell which google considers the index page, by the cache? I can't find a cache of mysite.com, only www.mysite.com [so my site is OK is what I am saying].

[Edit] Interesting, I just found they do have an April cache of mysite.com with the old linking so this doesn't work.

g1smd

7:37 pm on May 24, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Set up the 301 redirect.

That is the most basic WebHosting 101 stuff.

Shurik

8:48 pm on May 24, 2006 (gmt 0)

10+ Year Member



Lately I see an increasing number of googlebot requests for URIs with the wrong capitalization (all lower case). I answer them with error code 410. I suspect it is adversely affecting my ranking in Google.

Does anyone know how Google handles capitalization in URLs? According to RFC 1945, the hostname portion of a URL is case-insensitive but the rest of it is case-sensitive. Are Google's canonicalization rules compliant with the HTTP RFC? What should be the proper server response for an incorrectly capitalized URL?
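For reference, the comparison rule as stated in this post (hostname case-insensitive, everything after it case-sensitive) can be sketched in Python as below. This is only an illustration of that rule; it says nothing about how Google actually compares URLs, and the function name `same_url` is made up.

```python
from urllib.parse import urlsplit


def same_url(a: str, b: str) -> bool:
    """Compare two URLs: host ignoring case, path and query exactly."""
    pa, pb = urlsplit(a), urlsplit(b)
    return (
        pa.scheme.lower() == pb.scheme.lower()
        and pa.netloc.lower() == pb.netloc.lower()  # host: case-insensitive
        and (pa.path or "/") == (pb.path or "/")    # path: case-sensitive
        and pa.query == pb.query                    # query: case-sensitive
    )


# Differs only in host capitalization: same URL under this rule.
print(same_url("http://WWW.Domain.com/Abc.html", "http://www.domain.com/Abc.html"))
# Differs in path capitalization: two distinct URLs under this rule.
print(same_url("http://www.domain.com/abc.html", "http://www.domain.com/Abc.html"))
```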

eeek

8:57 pm on May 24, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Lately I see an increasing number of googlebot requests for URIs with the wrong capitalization (all lower case).

Is the domain part or elsewhere?

Shurik

9:19 pm on May 24, 2006 (gmt 0)

10+ Year Member



Elsewhere. Example:
www.domain.com/abc.html vs. www.domain.com/Abc.html
When googlebot asks for the first URL my server replies 410, in the second case it gets 200.

TerrCan123

9:22 pm on May 24, 2006 (gmt 0)

10+ Year Member



I tried doing a 301 redirect with .htaccess with the following-

Redirect 301 / [mydomain.com...]

and it works for the homepage. However when I type in
mydomain.com/internalpage.htm it doesn't forward to www.mydomain.com/internalpage.htm but sits there and loads forever. Any way to fix that now?

g1smd

9:22 pm on May 24, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Sites using Apache are immune to capitalisation problems. All filenames are case sensitive.

Sites using IIS have major problems to contend with. Capitalisation is ignored by the server, so any capitalisation in an inbound link will return a file from the site, creating massive duplicate content problems when all the variations are indexed.

Shurik

9:55 pm on May 24, 2006 (gmt 0)

10+ Year Member



Capitalization is ignored by the IIS server creating massive duplicate content problems...

G1smd, are you saying that I can kill all my corporate competitors who just happen to run IIS servers by mass linking to them with upper case links? I wish it was that simple :)

BTW IIS 6.0 is a great platform and you can do anything you want with it if you know what you are doing.

TerrCan123

9:55 pm on May 24, 2006 (gmt 0)

10+ Year Member



OK I changed .htaccess to this and now it works correctly. Hope that works with Google-

Options +FollowSymLinks
RewriteEngine on
RewriteCond %{HTTP_HOST} ^yourdomain\.com
RewriteRule ^(.*)$ [yourdomain.com...] [R=permanent,L]

[Edit] I also used a server header checker tool to make sure all pages were doing a 301. Works good!

[edited by: TerrCan123 at 10:17 pm (utc) on May 24, 2006]
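For anyone without a header checker tool handy, a rough equivalent can be sketched in Python with the standard library. The helper names `check_redirect` and `is_permanent_redirect_to` are made up for this example. It sends a single HEAD request, does not follow the redirect, and reports the status code and `Location` header so you can confirm the response really is a 301.

```python
import http.client


def check_redirect(host: str, path: str = "/", port: int = 80,
                   timeout: float = 10.0):
    """Send one HEAD request and return (status, Location header).

    http.client never follows redirects, so a 301 is reported as a 301.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def is_permanent_redirect_to(result, expected_prefix: str) -> bool:
    """True if result is a 301 whose Location starts with expected_prefix."""
    status, location = result
    return (
        status == 301
        and location is not None
        and location.startswith(expected_prefix)
    )
```

Usage, assuming the rule is meant to send the bare host to the www host: `is_permanent_redirect_to(check_redirect("yourdomain.com", "/internalpage.htm"), "http://www.yourdomain.com/")`. Run it against a few internal pages as well as the homepage, since a rule can work for `/` and still fail elsewhere.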

g1smd

10:12 pm on May 24, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



That is the puppy. It redirects site-wide. Just what you need.

trader

1:43 am on May 25, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Perhaps g1smd (who I know is a long-time expert around the web) or Tedster could answer this: does it make any real difference which direction we go (www or non-www) as far as the forwarding and links are concerned, or is it really a non-issue? Is there a valid reason for the website owner to only use the www and not the non-www?

tedster

2:42 am on May 25, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



No real difference, in my view. As you said, a non-issue.

bobby_boy

8:19 am on May 25, 2006 (gmt 0)

10+ Year Member



Hi,

Just thought it was worth posting back. We added the 301 redirect on all non www pages yesterday morning and today the site is back in its previous rankings as if nothing had happened (showing the correct www version of the homepage).

The only sign of the non www is when I search directly for the URL but now the www version is showing on a URL search too which was not the case yesterday.

Maybe it's a coincidence, but the 301 redirect is the only thing we changed.

TerrCan123

6:10 pm on May 25, 2006 (gmt 0)

10+ Year Member



Just looking through the pages of my site indexed in google I see that all the 900 pages or so are www.mysite.com/content.htm pages and the only one that doesn't start with www is the index page in the results. It shows they have mysite.com not www.mysite.com.

Anyway I fixed that now with .htaccess so in a month or two hopefully it will filter through to the results. I see they cached the mysite.com page 4 days ago so I only hope that isn't the start of the next indexing!

g1smd

8:37 pm on May 25, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



>> choose non-www or www?

My take is that you buy a domain at domain.com and then get your various services online at ftp.domain.com and mail.domain.com and www.domain.com and newtechnology.domain.com, etc.

I usually assume that a website URL begins www.. but there are many exceptions of course.