How did Google find these domains?

Matt_James

9:57 am on Aug 12, 2004 (gmt 0)

About two months ago, I registered the domains for a new company: .net, .com, .co.uk, .org, etc.
The domains are not yet in use, but hosting is set up. There will be no content for at least another two months. They all produce a 403 error, but they are all listed on Google when searching for the company name. How did Google find them?

Receptional Andy

10:09 am on Aug 12, 2004 (gmt 0)

I've seen lots of newly registered domains appearing in Google too - my guess is that either Google is deliberately visiting new domains, or else there is a site somewhere linking to new registrations and Google picks them up from that.

ToneLeMoan

10:19 am on Aug 12, 2004 (gmt 0)

I'm guessing that Google is simply following links from pages on domain registration sites that sometimes contain lists of newly-registered domains.

shri

10:22 am on Aug 12, 2004 (gmt 0)

>> How did Google find them?

The Google toolbar would be the most obvious explanation, as I'm not too sure what their arrangements might be with overseas registrars.

glitterball

10:31 am on Aug 12, 2004 (gmt 0)

I would agree with that: the Google spybar (cough) toolbar would seem to be the most obvious.

Receptional Andy

10:34 am on Aug 12, 2004 (gmt 0)

I've seen sites that categorically have not been visited with a toolbar get in the results, so although this might explain some sites, there must also be another reason.

experienced

10:35 am on Aug 12, 2004 (gmt 0)

Hi,

Sorry, I don't know the reality, but I don't think any domain registration company publishes a list of the domains newly registered in the last month or year. If one did, and it was a static page, anybody could find it very easily. If not, then Google most probably would not find the domains through its spider.

THERE IS SOMETHING ELSE ;-)

EXP...

shri

11:00 am on Aug 12, 2004 (gmt 0)

>> categorically have not been visited with a toolbar get in the results

Very possible for .com/.net data, as Google is definitely using registration data for expired domains and probably for new domains too.

I'll drop a line to my local TLD registrar and find out if the Men from the Plex have been knocking on their doors for registration data.

shri

11:03 am on Aug 12, 2004 (gmt 0)

experienced, the data cannot come from a registration company (there are way too many to negotiate with). It's being sourced higher up in the food chain or from a specialist third party.

Marcia

11:10 am on Aug 12, 2004 (gmt 0)

>>I've seen sites that categorically have not been visited with a toolbar get in the results, so although this might explain some sites, there must also be another reason.

There is.

>>there is a site somewhere linking to new registrations and Google picks them up from that.

That's exactly what's happening. There's a bot that Googlebot seems to be hot on the trail of within no time. At least that's how it appears to be.

[edited by: Marcia at 11:12 am (utc) on Aug. 12, 2004]

killroy

11:10 am on Aug 12, 2004 (gmt 0)

In my last chat with the G-man, he said as much, though of course not explicitly. Don't just look for links: links and toolbars aren't the only weapons in Googlebot's arsenal.

Basically they have a philosophy of aggressively finding domains/sites/pages on the net by any means.

Marcia

11:14 am on Aug 12, 2004 (gmt 0)

shri,

>>It's being sourced higher up in the food chain or from a specialist third party.

One with access to the databases.

Patrick Taylor

11:39 am on Aug 12, 2004 (gmt 0)

I registered a new domain two weeks ago - no hosting, just the domain, then did nothing. The URL is now in Google's index and leads to a page starting with the usual "Hey, it worked! The SSL/TLS-aware Apache webserver was successfully installed on this website... etc etc"

I'll now be had for duplicate (multiplicate?) content, I imagine.

Marcia

11:58 am on Aug 12, 2004 (gmt 0)

>>duplicate (multiplicate?) content

Couldn't be as bad as "Under Construction" with the little guy with the pick-axe. ;)

Patrick, did you visit with the toolbar?

Patrick Taylor

12:17 pm on Aug 12, 2004 (gmt 0)

Marcia, no, I didn't, except an hour or so ago (Firefox with PR toolbar) after I read this thread.

Patrick

kahuna

12:31 pm on Aug 12, 2004 (gmt 0)

I have had exactly the same thing happen with some of my domains.

I don't have the Google toolbar.
I didn't try to search for the domain on G-ogle.
I told nobody about the domains, and there were no pre-existing links etc. to them.

The domains didn't have a specific host; they only had a domain forwarding service set up while I built them to my "liking", so I could see traffic and robot 404 errors.

Within a week of registering the names, G-ogle came by, triggering 404s. It sort of pissed me off because the domains were still being built and the information was not "pure" for a spider.

The only thing I could guess at was that the registrar was leaking the domains to... 'somebody'.
Or, when attempting to register a domain, the availability search would trigger something related to G-ogle... but that doesn't make much sense either.

The company I used to register the domains is a subsidiary of V-sign.

It's a great big conspiracy and Matt Sludge is on the case as we speak ;-))

Edited to add this tidbit for ciml's post below:

I didn't set up hosting before registering.
Domain forwarding was done at the registrar company. You know the type of registrar... "purchase your domain and you get forwarding and 1 email account" etc. etc.

[edited by: kahuna at 1:30 pm (utc) on Aug. 12, 2004]

bts111

12:42 pm on Aug 12, 2004 (gmt 0)

Maybe the domain registrar has started optimising the site for you. It may be a very special service that they provide.

How nice of them...

he he

ciml

1:12 pm on Aug 12, 2004 (gmt 0)

Google have had oodles of data on domains for a long time. Register a com/net/org after you've set up the hosting for it, and you'll see Google in the logs (along with server and domain research organisations).

Does anyone remember this?

[webmasterworld.com...]

<<< GoogleGuy roots around on a messy desk looking for the piece of paper that lists the exhaustive history for all domains in the world. >>>
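ciml's point about seeing Google in the logs (and kahuna's robot 404s above) can be checked with a few lines of Python. This is a minimal sketch, assuming Apache's "combined" log format and identifying crawlers purely by a substring of the User-Agent header; the function name `crawler_hits` is hypothetical. A user-agent string is trivial to fake, so a match is only a hint that the visitor really was Googlebot.

```python
import re
from collections import Counter

# Apache "combined" format (assumed):
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def crawler_hits(lines, agent_marker="Googlebot"):
    """Count (status, path) pairs for requests whose User-Agent contains agent_marker."""
    hits = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if m is None or agent_marker not in m.group("agent"):
            continue  # unparseable line, or not the crawler we are looking for
        parts = m.group("request").split()
        path = parts[1] if len(parts) >= 2 else "-"
        hits[(m.group("status"), path)] += 1
    return hits
```

Pass it any iterable of lines, for example an open log file: `crawler_hits(open("access_log"))` (the file name is only an example). On a parked domain like Matt_James's, you would expect mostly 403 or 404 entries here.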

shri

2:22 pm on Aug 12, 2004 (gmt 0)

>> Google have had oodles of data on domains for a long time

I would always have thought that was the case until Matt Cutts said "domain registry data was too expensive" or something to that effect.

amznVibe

2:43 pm on Aug 12, 2004 (gmt 0)

Actually Google *is* known to use the toolbar to discover new sites, as do other companies with their own toolbar - read the privacy disclosures.

If you install the toolbar for Google and other search engines you will find that they eventually show up at new domains that you have not submitted anywhere yet.

Note that you also leave a trail of where you have been on the internet if you do not block referers in whatever browser you use. So if you are on your new website testing it, and you go to another website that publishes its "most recent referers" list, voila: you will be linked on the web for Google to find you.

Last but not least, they don't have to buy the domain name lists from registries; any respectable company can enter into a contract with the domain registrars directly to download the core database nightly. All they have to do is watch for changes and then check the new sites; it's fairly simple. Several whois databases already do this, such as "whois source", which will visit your new domain a few days after it's registered!
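The "watch for changes and then check the new sites" step amznVibe describes boils down to a set difference between two daily snapshots. A minimal sketch, assuming each snapshot is simply a list of domain names (how the list is obtained, and in what format, is not covered here); `diff_snapshots` and `seed_urls` are hypothetical names, and queuing both the bare and `www.` hostnames is an assumption about what a crawler might try first.

```python
def normalise(names):
    """Lower-case names and strip surrounding whitespace and any trailing root dot."""
    return {n.strip().rstrip(".").lower() for n in names if n.strip()}


def diff_snapshots(yesterday, today):
    """Return (added, removed) domain names between two daily snapshots."""
    old, new = normalise(yesterday), normalise(today)
    return sorted(new - old), sorted(old - new)


def seed_urls(domains):
    """URLs a crawler might queue for a first visit to each newly seen domain."""
    return [f"http://{prefix}{d}/" for d in domains for prefix in ("", "www.")]
```

Normalising first matters because "Example.COM." and "example.com" are the same delegation; without it, every case or trailing-dot difference would look like a brand new registration.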

pleeker

5:24 pm on Aug 12, 2004 (gmt 0)

So what is the benefit to Google to index (and spider?) URLs that have no web site content now, and may never have web site content?

I've got about a dozen domains that I registered 5-6 years ago and may never develop. Why would Google want those "domain parked here" pages in their index?

Guess I'll go check if G knows about those domains....

Brett_Tabke

5:26 pm on Aug 12, 2004 (gmt 0)

Google gets the updated zone files plus updates more than once a day. Most of the search engines have agreements with the big registrars to get the zone files. This is no fuss for them - easy peasy and often free.

This is the way some se's

>So what is the benefit to Google to index
> (and spider?) URLs that have no web site content now,

To get the fresh stuff and to track updates to the domain.

That is especially true of Google, which knows when an expired domain has changed hands.

Chris_R

6:21 pm on Aug 12, 2004 (gmt 0)

Yes - this really is no mystery for new sites. Google gets the zone files. They aren't that hard to get access to. Anyone that runs an ISP can do so (or at least used to be able to).

There is no reason to use the toolbar info when you can just download the data everyday.

cabbie

6:34 pm on Aug 12, 2004 (gmt 0)

Anybody can get hold of a list of the new registrations of .com, .net and .org. Google doesn't get all the zone files. ;)

jmccormac

4:02 am on Aug 20, 2004 (gmt 0)

>>Anybody can get hold of a list of the new registrations of .com, .net and .org. Google doesn't get all the zone files.

Yeah, the ccTLD registrars do not give such free access to their zone files. However, doing daily updates on com/net/org/info/biz is trivial. It would take approximately an hour on a desktop PC to generate a list of new and deleted cnoib (com/net/org/info/biz) domains. From there, it is a simple matter of feeding the list to a small pre-indexing program. It is all just a set of very simple SQL statements, but the database size is about 30 GB. Tracking the transits (domains moving between nameservers) takes a few hours, but it all should be easy enough even for the turnip fields of PhDs in Google. :)

Regards...jmcc
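The "transits" jmccormac mentions (domains moving between nameservers) can be found by comparing the NS records in two zone-file snapshots. This is a rough sketch under simplifying assumptions: master-file lines of the form `<owner> [ttl] [class] NS <nameserver>`, an origin passed in by the caller rather than read from `$ORIGIN`, and no handling of parenthesised multi-line records. The function names are hypothetical, and at the 30 GB scale he describes you would stream into a database rather than hold everything in Python dicts.

```python
def qualify(name, origin):
    """Turn a master-file name into a lower-case absolute name without the trailing dot."""
    if name == "@":
        return origin.lower()
    if name.endswith("."):
        return name[:-1].lower()
    return f"{name}.{origin}".lower()


def parse_ns_records(lines, origin="com"):
    """Map each delegated domain to the set of nameservers listed for it."""
    delegations = {}
    owner = None
    for raw in lines:
        line = raw.split(";", 1)[0]  # drop comments
        if not line.strip() or line.startswith("$"):
            continue  # skip blank lines and directives such as $ORIGIN or $TTL
        tokens = line.split()
        if not line[0].isspace():
            # A line starting in column 0 names a new owner; an indented line
            # reuses the previous owner, as in RFC 1035 master files.
            owner = qualify(tokens[0], origin)
            tokens = tokens[1:]
        upper = [t.upper() for t in tokens]
        if owner is None or "NS" not in upper:
            continue  # not an NS record
        i = upper.index("NS")
        if i + 1 < len(tokens):
            delegations.setdefault(owner, set()).add(qualify(tokens[i + 1], origin))
    return delegations


def transits(old, new):
    """Domains present in both snapshots whose nameserver set changed."""
    return sorted(d for d in old.keys() & new.keys() if old[d] != new[d])
```

New and deleted domains fall out of the same two maps as the difference of their key sets, e.g. `new.keys() - old.keys()`.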