It wouldn't be beyond Google's boffs, if they really wanted to sort it out, as well as free up some of their index space, to design a simple text document to be placed in the root directory:
e.g.
googlewww.txt
www true
notwww false
or some similar syntax. It is not a 301, just an instruction to Google to delete a dupe / amend an incorrect listing on the fly. It could even be added to the existing robots.txt within a commented out line.
Think it's worth stickying GoogleGuy with this?
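To make the idea concrete, here is a rough sketch of how a crawler might read such a file. No such file format exists; the `www true` / `notwww false` syntax is just the one proposed above, and the function names `parse_host_preference` and `preferred_host` are made up for illustration.

```python
def parse_host_preference(text):
    """Parse the hypothetical 'www true / notwww false' syntax from the post."""
    prefs = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Also accept the directive inside a commented-out robots.txt line.
        if line.startswith("#"):
            line = line.lstrip("#").strip()
        parts = line.split()
        if (len(parts) == 2
                and parts[0] in ("www", "notwww")
                and parts[1] in ("true", "false")):
            prefs[parts[0]] = parts[1] == "true"
    return prefs


def preferred_host(domain, prefs):
    """Return the host to keep, or None if the file is missing or ambiguous."""
    if prefs.get("www") and not prefs.get("notwww"):
        return "www." + domain
    if prefs.get("notwww") and not prefs.get("www"):
        return domain
    return None
```

With the example file above, `preferred_host("example.com", parse_host_preference("www true\nnotwww false"))` would pick the www host; an empty or contradictory file gives `None`, so nothing would be deleted on a guess.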
GoogleGuy many people have asked you to comment on this. Any chance of doing so - pleeease :o)
It is obviously a SERIOUS issue and one which should be addressed ASAP.
Added (Sorry SyntheticUpper! I was typing this when yours went in but at least we are of the same opinion, i.e. that G should fix this.)
I had problems in Dom/Esm with that www stuff because of one incoming link without the www. I got it changed, then everything was fine until about three months ago. Google suddenly decided to start crawling site.org. As far as I can tell, there are no incoming links in that form... I think I might have caused the problem myself via the toolbar, checking to see if there was anything in that form. Ironic, eh?
I have shared hosting, the company claims I can't do a server-side redirect, so I figured, I'll force the bloody bot to find nothing but www.site.org. It took several days pasting absolute links into all 200 pages. It worked though. I now have googlebot occasionally check for the index page on site.org and that's it. I get about 25% of the site crawled daily, all with a www on the front of the URL.
I kept meaning to mention this rather crude but effective approach before... finally got around to it.
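For anyone who would rather not paste absolute links into 200 pages by hand, something along these lines could do the rewriting. It is only a sketch: it assumes the preferred host is www.site.org as in the post above, that links use double-quoted `href` attributes, and the helper name `absolutize_links` is made up.

```python
import re
from urllib.parse import urljoin

# Assumption: the preferred host is www.site.org, as in the post above.
BASE = "http://www.site.org/"

# Assumption: links are written as href="..." with double quotes.
HREF = re.compile(r'href="([^"]*)"')


def absolutize_links(html, page_path=""):
    """Rewrite relative href values so they point at the www host."""
    page_url = urljoin(BASE, page_path)

    def fix(match):
        target = match.group(1)
        # urljoin leaves already-absolute URLs (other sites, mailto:) untouched.
        return 'href="%s"' % urljoin(page_url, target)

    return HREF.sub(fix, html)
```

`page_path` is the path of the page being rewritten, so that links relative to a subdirectory resolve correctly. Links that are already absolute, including any that point at the non-www host, are left alone and would still need checking by hand.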
I need to understand one thing a little better, because I also see example.com for my site as well as the usual www.example.com, and I don't want to get hit by the disasters discussed in this thread:
What does it take for Google to view example.com as the correct site and www.example.com as a duplicate? Does it simply take somebody mistakenly linking to example.com, which would cause Google to spider it? (Right now I have a white PR bar for my example.com)
I use only absolute URLs in my own internal links, so maybe I'm safe. I don't know what 301's and 404's and such things are, but I guess I'll have to learn.
Does it simply take somebody mistakenly linking to example.com, which would cause Google to spider it?
Yep.
In my case, I don't seem to have any incoming links to site.org. Jah only knows where Google got it from (me and the toolbar, maybe), so the change from relative to absolute links cleaned things up. I don't know how much help this would be if you have incoming links from high PR pages to the wrong URL.
BUT: if people are seeing extreme problems, please drop me a report (either email to webmaster [at] google.com or via a spam report) with the keyword urlcanonicalization as one word. I'm happy to pass those reports on to the crawl group to make sure that they group them together, and check out if our canonicalization has developed any problems.
Better yet, they should actually read about DNS to realise that example.org is a subdomain of org, and www.example.org is in turn a subdomain of example.org
Now who thinks all .com sub-domains are the same, thankfully G et al do not
A very selective interpretation on your part.
Did I say that all .com sub-domains are the
same? No, I did not.
Clearly, in the present case where the alias and
canonical name resolve to the same ip, and
the right hand hierarchy matches to at least two
levels, and the content is the same,
then it is a strong indication that these
are one and the same site. For this situation
to result in a duplicate penalty is clearly not
the fault of the site.
DNS&BIND, ch. 15, demonstrates a lookup on
by setting the query type to "cname" in nslookup.
The returned data, should the name be an alias,
returns the canonical properly noted. The code
to do this is widely published in source form,
as it is the code for nslookup.
I am suggesting that if and only if:
alias ip == canonical ip
&&
second level == second level
&&
alias content == canonical content
then Google should realise that this is not
a duplicate
I am not suggesting that all sites on a single
ip should be treated as the same. Although,
as dns is, generally, a means of mapping a
name to a numeric address, and that numeric
address is the smallest unit of resolution, an
argument could be made for that case, name based
hosting of RFC 2616 notwithstanding.
Once again, for the umpteenth time,
this is not rocket science.
++
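The three-part test proposed above (same IP, same second level, same content) could be sketched roughly like this. It is not how Google works, just the suggestion written out. The `resolve` and `fetch` callables are hypothetical helpers supplied by the caller, and `second_level` is a naive guess that takes the last two labels, so it is wrong for names like example.co.uk.

```python
import hashlib


def second_level(host):
    """Naive guess: the last two labels. Wrong for names like example.co.uk."""
    return ".".join(host.lower().rstrip(".").split(".")[-2:])


def looks_like_same_site(alias, canonical, resolve, fetch):
    """Apply the three conditions from the post above.

    resolve(host) -> set of IP strings and fetch(host) -> bytes are
    hypothetical helpers passed in by the caller, not a real crawler API.
    """
    same_ip = resolve(alias) == resolve(canonical)
    same_domain = second_level(alias) == second_level(canonical)
    same_content = (hashlib.sha256(fetch(alias)).digest()
                    == hashlib.sha256(fetch(canonical)).digest())
    return same_ip and same_domain and same_content
```

All three conditions must hold, so two unrelated sites sharing an IP on a virtual host are not merged: they fail the second-level and content checks.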
This .htaccess entry worked for me ...
Options +FollowSymLinks
RewriteEngine on
RewriteCond %{HTTP_HOST} !^www\.mydomain\.co\.uk
RewriteRule (.*) [mydomain.co.uk...] [R=permanent,L]
Note that there is a space between ...HOST} and !^www\.mydomain, which may not appear in this post.
BOL!
RewriteEngine On
Options +FollowSymlinks
RewriteBase /
RewriteCond %{HTTP_HOST} ^example\.com
RewriteRule ^(.*)$ [example.com...] [R=permanent]
it works! thanks to all!
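For anyone unsure what these rules actually do, the logic of the first form (redirect anything that is not the www host) amounts to roughly the following. This is only an illustration in Python, not something to put on a server; `canonical_redirect` is a made-up name and www.example.com is an assumed canonical host.

```python
def canonical_redirect(host, path, canonical_host="www.example.com"):
    """Return (status, location) for a request, or None if no redirect is needed.

    canonical_host is an assumed value; substitute your own www host.
    """
    # Host headers are case-insensitive and may carry a port.
    hostname = host.lower().split(":")[0]
    if hostname == canonical_host:
        return None
    if not path.startswith("/"):
        path = "/" + path
    # 301 corresponds to R=permanent in the rewrite rules.
    return 301, "http://%s%s" % (canonical_host, path)
```

A request for `/page.html` on example.com gets a 301 to the same path on www.example.com, while a request that already uses the www host passes straight through.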
canonicalization
Does this imply that our DNS should be set up with the www. as a CNAME (canonical name)? Perhaps people should check with their hosting company about how their DNS is set up; this seems like a firm indication not to set up subdomains as A records (as per DNS & BIND by O'Reilly).
I tried a 301, but all my pages dropped and I chickened out. Do you think I could ask Google to manually remove some not-www pages - will this result in the www pages disappearing too?
My not-www and www pages are exactly the same pages - how can I apply robots.txt to one and not the other?
Most websites are hosted virtually, i.e. they share an IP address with other sites. Web servers do not confuse content installed on the same server, because each host header value (or group of values) is served from its own directory.
Though I am not familiar with Apache, I am almost sure you can set it up to serve different content for your www and non-www site, if both have been created in DNS. My earlier post would give you an idea of how to do that on IIS. That is not my suggestion, though. Mine:
I would not offer a site at example.org, so as to force everyone to use/link to www.example.org
Even if no one links to example.org, bots may crawl it because of toolbar-enabled visits, or even a mention of the non-www form in an email address. The latter sounds strange, but Google did that to one site imo
very useful posts by Jim and Pageoneresults, please read them
very useful posts by Jim and Pageoneresults, please read them
Agreed but there is a distinction between Jim's post and GoogleGuy's that is significant.
Jim:
They just "dumb down" the process, set up both domain variants with A records, and keep mum on the subject.
GoogleGuy:
BUT: if people are seeing extreme problems, please drop me a report (either email to webmaster [at] google.com or via a spam report) with the keyword urlcanonicalization as one word.
I have never had a problem with www versus non-www listings and do not employ a 301 redirect as a solution to the problem mentioned. However, my DNS is set up with canonical (CNAME) records rather than host (A) records.
I just did a DNS lookup on www and non-www versions of the site that I have a problem with and these are the results.
www.domainname.co.uk
Type: A
Class: IN
TTL: 43200
Answer: *.171.193.8
domainname.co.uk
Type: A
Class: IN
TTL: 43200
Answer: *.171.193.8
Exactly the same!
Does this say anything to anyone here or is there something else I should be looking for?
Best wishes
Sid
PS It's an IIS server and I have very few options other than moving to another server.
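The lookup above can be repeated from a script. The sketch below uses Python's standard `socket.gethostbyname_ex`, which returns the primary name, a list of aliases, and a list of addresses. Treating a differing primary name as a sign of a CNAME is an assumption about typical resolver behaviour, not a guarantee, and `compare_hosts` is a made-up helper name.

```python
import socket


def compare_hosts(bare, www, lookup=socket.gethostbyname_ex):
    """Look up both names and report whether they share addresses and
    whether the resolver reports www as an alias of another name."""
    _, _, bare_ips = lookup(bare)
    www_name, _, www_ips = lookup(www)
    return {
        "same_ip": set(bare_ips) == set(www_ips),
        # Assumption: with a CNAME, resolvers typically return the target
        # as the primary name instead of the name that was queried.
        "www_is_alias": www_name.lower().rstrip(".") != www.lower().rstrip("."),
    }
```

For the results posted above (two A records with the same address), this would report the same IP and no alias. The `lookup` parameter is there so the function can be tried without touching the network.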
If you want to get into the details, I'd suggest posting in the Microsoft forum here.
Although the causes of the two problems are different, as I see it, the solutions should be essentially the same.
Kaled.
Here is a discussion of how to redirect a domain such as example.com to www.example.com on IIS.
Thanks for that Roger.
The problem is that I have absolutely no way of gaining access at that sort of level. I have a couple of sites on this server which date back to when I had someone working on an ASP solution for a couple of projects I had on the go. He was a reseller and I just got him to add a couple of sites really cheaply for me to use in experiments. One of these sites did well in SERPs and started sending my main site significant amounts of referrals, enough to make it well worthwhile moving to a server running Apache, which I at least understand to a limited extent.
Those referrals will no doubt now disappear as the site has dropped from Google, so I may as well bite the bullet.
Best wishes
Sid
I have
www.example.com
www.example.co.uk
The DNS for both domains is pointing to my server, and there I have an Apache host setting for .co.uk and the same for .com, with the ".com" suffix instead of .co.uk
Now obviously both pages are the same,
www.example.co.uk has a PR of 6, and www.example.com has a PR of 5. Am I best off doing the 301 in the .htaccess file for the site, so that Google is told to go to only one address?
Thanks.