Forum Moderators: open
Kind of confused that the www.widgetscatchy.com site had a PR5, so I checked the incoming links, and for some reason when I check the links to this site it shows www.widgets.com's links instead of its own. Even when listing the site, Google states 'Searched for pages linking to AYdabadfa:www.widgets.com/' instead of 'Searched for pages linking to AY4cSZStU-0J:www.widgetscatchy.com/'
The sites are using the same hosting company, but they are two completely separate accounts and have completely different content.
Why has Google amalgamated these two sites' links? I'm just slightly worried that Googlebot will drop both sites from the index if it decides that the two sites are the same.
Any ideas guys?
Cheers
Chris Holgate
There are plenty of reasons to have a bunch of parked domains - misspellings, alternate languages, synonyms, or related domain names purchased for speculative purposes or future development. If unused for original content, these domains might as well be pointed somewhere as opposed to returning an error (to catch type-in traffic), so Trawler pointed them to his primary domain. Trawler's dilemma was that Google decided to index them all as copies of his primary site, which was highly undesirable.
Hope that clarifies things...
<added>Whoops, inadvertently added troll food... Brett posted before I finished...:)</added>
In the case of whois and logs, could this mean that Google is harvesting links even when they are not encapsulated in an href tag?
RewriteEngine on
# Skip requests that arrive without a Host header
RewriteCond %{HTTP_HOST} !^$
# Redirect every host except the canonical one (mydomain.com is a placeholder)
RewriteCond %{HTTP_HOST} !^www\.mydomain\.com$ [NC]
RewriteRule ^(.*)$ http://www.mydomain.com/$1 [R=301,L]
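In plain terms, a rule like the one above amounts to a Host-header check: any request that does not arrive on the canonical host gets a 301 to the same path on that host. A minimal Python sketch of that logic (the domain names are placeholders, not from the thread):

```python
# Sketch of the canonical-host redirect logic, assuming "www.mydomain.com"
# is the one host that should serve content and every other parked or
# vanity domain points at the same server.
CANONICAL = "www.mydomain.com"

def redirect_target(host: str, path: str):
    """Return (status, location) for a redirect, or None to serve the page."""
    if host.lower() == CANONICAL:
        return None                              # already canonical
    return (301, f"http://{CANONICAL}{path}")    # permanent redirect

print(redirect_target("parkeddomain.com", "/widgets"))
# → (301, 'http://www.mydomain.com/widgets')
print(redirect_target("www.mydomain.com", "/widgets"))
# → None
```

Because the check is on the incoming Host header, one copy of the rule covers every domain parked on that server.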
---
We checked the logs and determined that 8 or 10 of the domains do have old links. We followed them backwards to stale pages (would you believe most of them were 1997 content).
We purchased these domains about 5 years ago and they have been parked ever since. I can see that if Google followed the stale pages it might have indexed the 8 or 10 domains with links, but that doesn't explain how the 226 domains with no links got indexed as well.
There is no way we exposed a list of these that a spider could follow. We were very careful in that regard.
Taking this one step further.
We run a search site with a thousand or so domains pointing to it. The site is not search-engine friendly; in other words, we disallow all spiders, as the traffic from the domains is sufficient for our purposes.
Lots of bots visit us, no problem, no one indexes. UNTIL Dec 6th. Guess who! GOOGLE. Even with a disallow, the site got indexed. I don't understand it; we ran that site for the past 4 years and never had a problem there with any search engine. Until now.
I believe that Google has run amok with regard to redirects.
The only possible explanation I can think of is that Google is looking for the disallow on the entrance domain, not the destination domain. Without it, they index.
A-- backwards, IMHO, but then again, what do I know?
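The entrance-vs-destination distinction above can be made concrete with Python's stdlib robots.txt parser. A hedged sketch (the hostnames are hypothetical): the parked entrance domain has no robots.txt rules at all, while the destination site disallows everything, so a crawler that only consults the entrance domain gets the wrong answer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical setup: the parked "entrance" domain serves no robots.txt
# rules, while the destination site disallows every crawler.
entrance_rules = RobotFileParser()
entrance_rules.parse([])            # no rules: everything is allowed

destination_rules = RobotFileParser()
destination_rules.parse([
    "User-agent: *",
    "Disallow: /",
])

# Checking only the entrance domain says "go ahead"; re-checking on the
# destination host after the redirect gives the correct answer.
print(entrance_rules.can_fetch("Googlebot", "http://entrance.example/"))
# → True
print(destination_rules.can_fetch("Googlebot", "http://destination.example/"))
# → False
```

A crawler that honors the disallow should re-evaluate robots.txt against whatever host it actually lands on after the redirect.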
I believe it's the whois, and google is experimenting with it just as they are with the serps on the various data centers.
Generally:
Always make sure that you do not confuse the bots; it's so easy to do. If you have more than one domain, be absolutely sure that you only use one, and that all others redirect to it using a 301, unless you are forced to do something else for some odd reason. If so, make an effort to change that odd reason. Never link to any of your vanity domains yourself (especially not from the site that the vanity domain redirects back to).
There's no room in a search engine for two URLs for one website; if there are more, all but one of them have got to go.
DNS redirects
If you make a combined DNS setup, with several domains' A records pointing at the same IP address, make sure you have those 301s in place; otherwise the results are like a 302 at best and random at worst. I personally prefer to have one A record only, and make the rest CNAMEs for that domain; in my experience Google can interpret this similarly to a 301.
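A sketch of that DNS layout in zone-file syntax (names and IP are placeholders; the 301 rule on the web server stays in place regardless):

```
; One A record for the canonical host only
www.mydomain.com.    IN  A      203.0.113.10

; Vanity/parked hostnames alias the canonical host instead of
; carrying their own A records
www.myvanity.com.    IN  CNAME  www.mydomain.com.
www.mymisspell.com.  IN  CNAME  www.mydomain.com.
```

The CNAMEs make the relationship between the domains explicit at the DNS level, but they are not a substitute for the HTTP 301.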
<minor rant>Please do take these matters seriously. There's no way you would ever want to mess with more than one domain unless you are absolutely confident that you know what you are doing, are utterly consistent, and check and double-check for errors in your setup as well as Google's handling of it.</minor rant>
-------------------------------------
RewriteEngine on
# Redirect every host except the canonical one (mydomain.com is a placeholder)
RewriteCond %{HTTP_HOST} !^www\.mydomain\.com$ [NC]
RewriteRule ^(.*)$ http://www.mydomain.com/$1 [R=301,L]
-------------------------------------
Link 1: Moving websites to a new central domain [webmasterworld.com]
Link 2: Site change of URL [webmasterworld.com]
Link 3: Apache Web Server forum [webmasterworld.com...]
Hope this helps.
/claus
(*)Note: Removed the check for an empty HTTP_HOST; I don't understand the point of doing that.
[edited by: claus at 12:37 am (utc) on Dec. 31, 2003]
Perhaps you have it set up so that a domain without a robots.txt is mirroring/proxying the content, or something like that? There are a thousand ways to do these things, and then some.
Also, some cases of the 302 Google bug involve 302-redirecting links from redirect scripts on domains that disallow Google (when these links are copied to or mirrored on other domains that allow Googlebot). Such links should get ignored, as they can't be followed; I don't know if Googlebot does this yet, but it ought to.
/claus
>> Such links should get ignored, as they can't be followed - i don't know if gbot does this yet, but it ought to.
----
I think you hit the nail right on the head. This is exactly what I think part of the problem is. But!
No matter how you cut the cake, even with a 302 bug, if the destination site has a disallow, it should not be indexed. No matter what! That is also a bug.
Even this still doesn't explain how they got the 226 domains that had no links at all. That, to me, walks like, smells like, and most likely is, the whois and their experiments.
There is NO WAY that this problem (to the extent that it now exists) was present before Florida. If it had been, we would have seen it across the 5,000 or so domains we manage. For the last five years of my experience, parked redirects set up with proper scripting on the server were never indexed. Now, it seems they are.
It's a bug that came with Florida, and one that needs to be fixed fast or they really will have a trashed index.
There is no way I am going to 301 5,000 domains. As far as I am concerned, my disallow is telling them to stay the hell out of the site; if they come in, that's their problem.
Why should any webmaster have to set up multiple 301s to appease the gods when one simple disallow does (should, and did in the past do) the same?
In reality, we really don't want their indexing anyway.
Google - Go Away!
>> Why should any webmaster have to set up multiple 301s to appease the gods when one simple disallow does (should and did in the past) the same?
If you use the 301 method detailed above, it will redirect ANY of your parked domains. You do not have to redirect them individually. Three lines of code, that's it.
>> If using the 301 method detailed above it will redirect ANY of your parked domains. You do not have to redirect them individually. 3 lines of code, thats it.
---
Yes, but on hundreds of servers!
Why should I? As far as the rest of the search-engine world goes, disallow means stay out. Other SEs respect it; why should Google be any different?
They honor it from a straight-through link, why not from a 302?
And, to my knowledge, they always honored the disallow (on a 302) prior to Florida. Why not now?
It's their problem not mine.
If they want to place 5,000 pages of the same content into their database, that just goes to show you how screwed up they really are.
>> Other SEs respect it
In our experience the other SEs have been very poor at eliminating parked domains; Google, however, was always excellent at it. Since Florida that has not been the case. Our problem stems from some links we created before we knew what we were doing. They will be removed.
>> make the rest CNAME's for that domain - in my experience Google can interpret this similar to a 301
The CNAME thing is not the reason; I always had real 301s on those domains as well, so it has nothing to do with using CNAME instead of A records. The 301 rule does it in both cases. I'm sorry I added a bit of confusion here.
>> no way I am going to 301 - 5000 domains
There are a lot of ways you can configure this. As vrtlw wrote, you just have to put the code on one server: the one that holds the destination domain (or the destination IP). All the 4,999 others that point to it (if that is the case) should do so by pointing to the same IP (using the DNS). Then, when a redirect domain is hit, the content on the IP from the destination domain is served. This means that the redirect rule will get executed and the redirect domain will switch to the destination domain in the browser.
Of course, if you want the redirect domain to stay in the address bar instead of the destination domain, you cannot use this method. But in that case you will be operating 5,000 identical sites instead of one site with 5,000 vanity domains; Google tends to believe what the address line says, imho, and then you'll have to have 5,000 robots.txt files instead. Anyway, there are so many ways you can configure things, but that's a separate discussion.
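For what it's worth, serving distinct robots.txt files for many hosts on one server doesn't require 5,000 physical files. One common trick (sketched here as an assumption, in the same mod_rewrite style as the snippets above) is a host-conditional rewrite with a fallback:

```
RewriteEngine on
# Serve a per-host robots file, e.g. robots-parkeddomain.com.txt,
# if one exists for the requesting host...
RewriteCond %{DOCUMENT_ROOT}/robots-%{HTTP_HOST}.txt -f
RewriteRule ^robots\.txt$ /robots-%{HTTP_HOST}.txt [L]
# ...otherwise the default /robots.txt is served as usual.
```

That keeps the "one site, many hostnames" setup manageable, though as claus says, whether you want that setup at all is a separate discussion.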
>> how they got the 226 domains that had no links at all
Some of them had old links from way back in 1997, you wrote... also, Google has backups of old/past indexes, afaik. I don't believe they use whois/DNS across the board, but they might use it in some selected special cases and even run experiments.
/claus
Thank you for your enlightened posts on this subject, I find them most interesting and informative.
Yes, there are many ways to structure a redirect.
One of the largest domain holders in the world (in excess of 125,000 domains) runs a search portal much like what we are doing.
It is my understanding, although I cannot confirm this first-hand, and don't take this as concrete factual info, that they use straight 302 redirects.
Did they have problems with Google after Florida? I believe so. Never before, to my understanding, but I believe so after.
Not anymore: as they now use approved SE methods (disallow) on the site that receives the redirect, and they simply informed Google to remove the indexed content from its database or face a copyright lawsuit.
In reality, every time Google indexes a site that contains a disallow, it is infringing on another's copyrighted material. After all, they (the SE) are using the content (in their results display) to further their monetary gains.
That's exactly why the disallow mechanism was put in place: foremost as a way to protect copyrighted information from being indexed and used by others, and secondarily, to simplify the indexing of the web.
Now if Google doesn't understand that, or doesn't want to understand that, then perhaps they deserve to get their database trashed and, who knows, even get bombarded with lawsuits in the process.
I am not trying to play devil's advocate here, but if I have a disallow on a site, no one, not even the great Google gods, has a right to index the content, UNDER ANY URL.
It really is as simple as that. 302 bugs aside, that has nothing to do with Google's and others' blatant disregard for the disallow condition.
If it is a bug, then they have both the moral and legal obligation to correct it. Not the webmasters of this world.
My customer got this one parked domain to make it easier for him to give it to people when he's out on the weekend. He's not a spammer; he just can't spell his original domain after a couple of bottles of Royal Crown. Go figure!
Thanks vrtlw and claus, I think that rewrite fixed the problem.