|Google using Whois data for Custom Search set-up?|
OK, this may not be new, or even very interesting to many, so apologies if it has been discussed before, but it took me by surprise.
I own a network of sites, some of which I run admin for other people but all share my whois info. A few of these are registered in WMT and a few are not associated with Google in any way. I don't use GA at all incidentally.
Today I was setting up a Google Custom Search for selected sites. I added a few, set it up and tested it, then went back into Google to add more sites. When I clicked ADD SITE and started typing, a suggestion box dropped down, nothing odd there. What was odd - to me - was that the list of sites it was suggesting were exclusively mine. Not only that but it was a complete list of all the *developed* sites I own.
After the surprise wore off, I thought OK, so somewhere along the line I must have inadvertently notified Google of each one. However, the list contains sites I only launched 2 weeks ago, and it includes two particular oddities:
Firstly, one of the sites is not a developed website per se. It is a domain I use for serving banners from a database to save bandwidth. Google must have followed an IMG link because it has attempted to index the non-existent front page (the SERP extract shows a title of "Index of /" and a description of "Apache mod_fcgid/2.3.5 mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/184.108.40.20635 Server"), but I have never used this domain in any way, shape or form other than to store images.
Secondly, the list of sites in the suggest dropdown are all simply root domains, listed like http://www.bluewidgets.com/ (etc). But one of them is a page on a site (http://www.example.com/something.php).
Well, we know Google is a registrar, so I guess maybe it isn't that surprising. I'd just never seen any obvious flags before now to suggest they mine and utilise that data. I can't think of any other way they could associate all these sites.
As far as network links go I'm not concerned at all - I link only where one site is relevant to users anyway but I am wondering what other impacts this belated 'discovery' might have? Any thoughts?
[edited by: tedster at 1:20 pm (utc) on Sep 13, 2010]
[edit reason] Make the example domains display clearly [/edit]
Something else to add to that, for which I don't have a definitive answer.
The sites that showed in the suggestion box were all relevant to the market vertical I was setting up the search engine for. My sites in other verticals were not suggested. I can only put that down to the fact I described the search engine using the market terminology.
One minor note: the one domain in that vertical I have privacy protected in whois didn't show up in the list.
I suggest that you try doing a Google search for your name (in quotes) exactly as it appears in the registration records. See if any of the Whois pages that contain the registration data show up in the SERPs.
Yes they do. Although I have just noticed another domain I have privacy protected since Day 1 is appearing in the Suggested Sites list.
|As far as network links go I'm not concerned at all - I link only where one site is relevant to users anyway but I am wondering what other impacts this belated 'discovery' might have? Any thoughts? |
I've read that Google allows a limited amount of interlinking between your own sites, but that excessive interlinking of a large number of sites with common ownership could trigger penalties. I don't know for sure if this is the case, or if it is, just how much interlinking is allowed before the sites would be penalized.
|When I clicked ADD SITE and started typing, a suggestion box dropped down, nothing odd there. What was odd - to me - was that the list of sites it was suggesting were exclusively mine. |
You are a lucky dog. :)
When I start typing, the drop-down suggestion box lists a competitor. On top of that, when I stop typing and hit the 'search' button, Google asks 'Do you mean [same competing website]?'...
It may be whois info included but it is not only that. It might be interlinking though. This competitor is linking to our website on some back door pages!
|It may be whois info included but it is not only that. It might be interlinking though. This competitor is linking to our website on some back door pages! |
It crossed my mind too, but I think if it was interlinking I would have seen at least one site in the list that wasn't mine (from about 15 sites) but there were no exceptions. My interlinking isn't prolific, nor is it likely to identify the complete network I wouldn't have thought.
What's the hosting situation on these sites?
Varied. They are spread over 4 or 5 ISPs, some on dedicated servers, some US based, some UK. A real mashup. We have CoreIX, GoDaddy, Hostgator, Pipex and one or two others in play. Some domains are .co.uk, some are .com and some are .net, plus there's one .info, and one (the latest) is an experiment with .co.
Actually, a quick update: I just went back in to check on those extensions and noticed the suggestions list has changed - it's shorter. It still primarily consists of my sites, but gmail.com and a site I have never heard of, but which is still relevant to the niche, also now appear in the list. There are also two more URLs of subpages from one of my sites in there. Oh, and I absolutely don't use, link to or reference gmail.com from anywhere, by the way.
I thought it might be shorter because it had removed sites I added earlier, but some of the suggestions are ones I previously added.