Ok this may not be new, or even very interesting to many, so apologies if it has been discussed before but it took me by surprise.
I own a network of sites, some of which I run admin for other people but all share my whois info. A few of these are registered in WMT and a few are not associated with Google in any way. I don't use GA at all incidentally.
Today I was setting up a Google Custom Search for selected sites. I added a few, set it up and tested it, then went back into Google to add more sites. When I clicked ADD SITE and started typing, a suggestion box dropped down, nothing odd there. What was odd - to me - was that the list of sites it was suggesting were exclusively mine. Not only that but it was a complete list of all the *developed* sites I own.
After the surprise wore off, I thought OK, so somewhere along the line I must have inadvertently notified Google of each one. However, the list contains sites I only launched 2 weeks ago, and it includes two particular oddities:
Firstly, one of the sites is not a developed website per se. It is a domain I use for serving banners from a database to save bandwidth. Google must have followed an IMG link because it has attempted to index the non-existent front page (the SERP extract shows a title of "Index of /" and a description of "Apache mod_fcgid/2.3.5 mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/22.214.171.12435 Server"), but I have never used this domain in any way, shape or form other than to store images.
Secondly, the list of sites in the suggest dropdown are all simply root domains, listed like http://www.bluewidgets.com/ (etc). But one of them is a page on a site (http://www.example.com/something.php).
Well, we know Google is a registrar, I guess, so maybe it isn't that surprising. I'd just never seen any obvious flags before now to suggest they mine and utilise that data. I can't think of any other way they could associate all these sites.
As far as network links go I'm not concerned at all - I link only where one site is relevant to users anyway but I am wondering what other impacts this belated 'discovery' might have? Any thoughts?
[edited by: tedster at 1:20 pm (utc) on Sep 13, 2010]
[edit reason] Make the example domains display clearly [/edit]