I've seen lots of newly registered domains appearing in Google too - my guess is that either Google is deliberately visiting new domains, or else there is a site somewhere linking to new registrations and Google picks them up from that.
I'm guessing that Google is simply following links from pages on domain registration sites that sometimes contain lists of newly-registered domains.
>> How did Google find them?
Google toolbar would be the most obvious as I'm not too sure what their arrangements might be with overseas registrars.
I would agree with that, the Google spybar(cough) toolbar would seem to be the most obvious.
I've seen sites that categorically have not been visited with a toolbar get in the results, so although this might explain some sites, there must also be another reason.
Sorry, I don't know for certain, but I don't think any domain registration company would publish a list of the domains registered in the last month or year. If one did, and it were a static page, anybody could find it easily. If not, Google's spider most probably won't find it.
THERE IS SOMETHING ELSE ;-)
>> categorically have not been visited with a toolbar get in the results
Very possible for .com / .net data as Google is definitely using registration data for expired domains and probably new domains too.
I'll drop a line to my local tld registrar and find out if the Men from the Plex have been knocking on their doors for registration data.
experienced, the data cannot come from a registration company (there are way too many to negotiate with). It's being sourced higher up in the food chain or from a specialist third party.
>>I've seen sites that categorically have not been visited with a toolbar get in the results, so although this might explain some sites, there must also be another reason.
>>there is a site somewhere linking to new registrations and Google picks them up from that.
That's exactly what's happening. There's a bot that Googlebot seems to be hot on the trail of within no time. At least that's how it appears to be.
[edited by: Marcia at 11:12 am (utc) on Aug. 12, 2004]
On my last chat with the G-man, he said as much, of course not explicitly. Don't be looking for links; links and toolbars aren't the only weapons in Googlebot's arsenal.
Basically they have a philosophy of aggressively finding domains/sites/pages on the net by any means.
|It's being sourced higher up in the food chain or from a specialist third party. |
One with access to the databases.
I registered a new domain two weeks ago - no hosting, just the domain, then did nothing. The URL is now in Google's index and leads to a page starting with the usual "Hey, it worked! The SSL/TLS-aware Apache webserver was successfully installed on this website... etc etc"
I'll now be had for duplicate (multiplicate?) content, I imagine.
>>duplicate (multiplicate?) content
Couldn't be as bad as "Under Construction" with the little guy with the pick-axe. ;)
Patrick, did you visit with the toolbar?
Marcia, no, I didn't, except an hour or so ago (Firefox with PR toolbar) after I read this thread.
I have had the exact thing happen with some of my domains.
I don't have the google tool bar..
I didn't try to search for the domain on g-ogle.
I told nobody of the domains, and there were no pre-existing links etc etc to the domains.
The domains didn't have a specific host, that is they only had a domain forwarding service setup as I built them to my "liking".... thus I could see traffic and 404 robot errors.
Within a week after registering the names, G-ogle came by, triggering 404s. It sort of pissed me off because the domains were still being built and the information was not "pure" for a spider.
The only thing I could guess at was the registrar was leaking the domains to .. 'somebody' .
OR when attempting to register a domain, the search for availability, would trigger something related to g-ogle... but that doesn't make much sense either.
The company I used to register the domains is a subsidiary of V- sign.
It's a great big conspiracy and Matt Sludge is on the case as we speak ;-))
edited and added this tidbit for ciml's post below:
I didn't set up hosting before registering.
Domain forwarding was done at the registrar company. You know the type of registrar... "purchase your domain and you get forwarding and 1 email account" etc etc..
[edited by: kahuna at 1:30 pm (utc) on Aug. 12, 2004]
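For anyone wanting to check their own logs for the kind of Googlebot 404s described above, here is a rough sketch. It assumes the server writes the Apache "combined" log format; the two sample lines and the function name `googlebot_404s` are made up for illustration. It only matches on the user-agent string, which anyone can fake, so treat a hit as a hint rather than proof.

```python
import re

# Combined Log Format (assumed):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def googlebot_404s(lines):
    """Return (time, request) pairs for 404s served to a Googlebot user agent."""
    hits = []
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that are not in combined format
        if m.group("status") == "404" and "googlebot" in m.group("agent").lower():
            hits.append((m.group("time"), m.group("request")))
    return hits


# Invented sample lines (documentation IP range), not real log data.
sample = [
    '192.0.2.10 - - [12/Aug/2004:09:15:02 +0000] "GET /robots.txt HTTP/1.0" '
    '404 208 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.44 - - [12/Aug/2004:09:20:41 +0000] "GET / HTTP/1.1" '
    '200 1043 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"',
]

for when, request in googlebot_404s(sample):
    print(when, request)
```

In practice you would pass an open log file instead of `sample`, since the function just iterates over lines.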
Maybe the domain registrar has started optimising the site for you. It may be a very special service that they provide.
How nice of them...
Google have had oodles of data on domains for a long time. Register a com/net/org after you've set up the hosting for it, and you'll see Google in the logs (along with server and domain research organisations).
Does anyone remember this?
|<<< GoogleGuy roots around on a messy desk looking for the piece of paper that lists the exhaustive history for all domains in the world. >>> |
>> Google have had oodles of data on domains for a long time
Would have always thought that was the case until Matt Cutts said "domain registry data was too expensive" or something to that effect.
Actually Google *is* known to use the toolbar to discover new sites, as do other companies with their own toolbar - read the privacy disclosures.
If you install the toolbar for Google and other search engines you will find that they eventually show up at new domains that you have not submitted anywhere yet.
Note that you also leave a trail of where you have been on the internet if you do not block referers in whatever browser you use. So if you are on your new website testing it, and you go to another website that publishes its "most recent referers" list - voilà, you will be linked on the web for Google to find you.
Last but not least, they don't have to buy the domain name lists from registries. Any respectable company can go into contract with the domain registrars directly to download the core database nightly. All they have to do is watch for changes and then check the new sites; it's fairly simple. Several whois databases do this already, such as "whois source", who will visit your new domain a few days after it's registered!
So what is the benefit to Google to index (and spider?) URLs that have no web site content now, and may never have web site content?
I've got about a dozen domains that I registered 5-6 years ago and may never develop. Why would Google want those "domain parked here" pages in their index?
Guess I'll go check if G knows about those domains....
Google gets the updated zone files plus updates more than once a day. Most of the search engines have agreements with the big registrars to get the zone files. This is no fuss for them - easy peasy and often free.
This is the way some se's
>So what is the benefit to Google to index
> (and spider?) URLs that have no web site content now,
To get the fresh stuff and to track updates to the domain.
That is especially true of Google who knows when an expired domain has changed hands.
Yes - this really is no mystery for new sites. Google gets the zone files. They aren't that hard to get access to. Anyone that runs an ISP can do so (or at least used to be able to).
There is no reason to use the toolbar info when you can just download the data every day.
Anybody can get hold of a list of the new registrations of .com, .net, .org. Google doesn't get all the zone files. ;)
Yeah the ccTLD registrars do not give such free access to their zone files. However doing daily updates on com/net/org/info/biz is trivial. It would take approximately an hour on a desktop PC to generate a list of new and deleted cnoib domains. From there, it is a simple question of feeding the list to a small pre-indexing program. It is all just a set of very simple SQL statements but the database size is about 30G. Tracking the transits (domains moving between nameservers) takes a few hours but it all should be easy enough even for the turnip fields of PhDs in Google. :)
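To give an idea of what those "very simple SQL statements" might look like, here is a toy sketch using Python's built-in `sqlite3`. Everything in it is an assumption for illustration: the table names `prev` and `curr`, the function `diff_snapshots`, the sample domains, and the simplification of one nameserver per domain (real zone files list several NS records per domain and are vastly bigger). It is not how Google or anyone else is known to do it.

```python
import sqlite3


def diff_snapshots(yesterday, today):
    """Compare two {domain: nameserver} snapshots.

    Returns (new_domains, deleted_domains, transits), where transits is a
    list of (domain, old_ns, new_ns) for domains that changed nameserver.
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE prev (domain TEXT PRIMARY KEY, ns TEXT)")
    db.execute("CREATE TABLE curr (domain TEXT PRIMARY KEY, ns TEXT)")
    db.executemany("INSERT INTO prev VALUES (?, ?)", list(yesterday.items()))
    db.executemany("INSERT INTO curr VALUES (?, ?)", list(today.items()))

    # In today's snapshot but not yesterday's: newly registered (or restored).
    new = [row[0] for row in db.execute(
        "SELECT c.domain FROM curr c LEFT JOIN prev p ON p.domain = c.domain "
        "WHERE p.domain IS NULL ORDER BY c.domain")]

    # In yesterday's snapshot but not today's: dropped from the zone.
    deleted = [row[0] for row in db.execute(
        "SELECT p.domain FROM prev p LEFT JOIN curr c ON c.domain = p.domain "
        "WHERE c.domain IS NULL ORDER BY p.domain")]

    # Present in both, but the nameserver changed: a "transit".
    transits = db.execute(
        "SELECT c.domain, p.ns, c.ns FROM curr c "
        "JOIN prev p ON p.domain = c.domain "
        "WHERE p.ns <> c.ns ORDER BY c.domain").fetchall()

    db.close()
    return new, deleted, transits


# Invented sample data using reserved example names.
yesterday = {
    "example.com": "ns1.host-a.example",
    "example.net": "ns1.host-a.example",
    "example.org": "ns1.host-b.example",
}
today = {
    "example.com": "ns1.host-a.example",
    "example.org": "ns1.host-c.example",
    "example.info": "ns1.host-a.example",
}

new, deleted, transits = diff_snapshots(yesterday, today)
print("new:", new)
print("deleted:", deleted)
print("transits:", transits)
```

The `new` list is what would be fed to a pre-indexing program; at the real scale you would load the snapshots into an on-disk database rather than memory.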
|Anybody can get hold of a list of the new registrations of .com, .net, .org. Google doesn't get all the zone files. |