How Did Google Find Me?

Ariel

7:36 pm on Sep 28, 2002 (gmt 0)

I have a domain with one page and almost no content. It is not linked in any fashion to any other site, and it has not been submitted, yet googlebot found it. I thought Google mainly follows links? How did it find this domain/page?

heini

7:39 pm on Sep 28, 2002 (gmt 0)

Toolbar installed?

Ariel

7:42 pm on Sep 28, 2002 (gmt 0)

Yes, but how is my browser connected to a server/domain somewhere?

JonB

7:47 pm on Sep 28, 2002 (gmt 0)

Maybe when you typed the URL in the toolbar, it sent this info to Google and they sent Googlebot.

GoogleGuy

8:05 pm on Sep 28, 2002 (gmt 0)

Could be a lot of ways, Ariel. We're always working on improving our breadth and freshness. You might also find a few suggestions here:
[google.com...]

heini

8:31 pm on Sep 28, 2002 (gmt 0)

Ariel, do you have any links from that domain/page going out?
GG, did I get that right? The explanation you're pointing to assumes Ariel's page links to some other page, where his page turns up in the logfile, which happens to be open to crawling and got crawled by one of Google's bots, which then followed the link back to Ariel's page?

Air

8:41 pm on Sep 28, 2002 (gmt 0)

Welcome to WebmasterWorld Ariel!

As you can see from the responses, there are a number of ways your site could have been found by Googlebot. Another possibility is that your domain name had some pre-existing links (if you recently registered it).

heini

8:49 pm on Sep 28, 2002 (gmt 0)

Oops, where are my manners: Welcome to the board, Ariel!

Sasquatch

1:57 am on Sep 29, 2002 (gmt 0)

I noticed that googlebot crawled my site within a week after the hosting company transferred it. If it is your own domain I would bet that they just work their way through all the domains and check the home pages.

Ariel

3:42 am on Sep 29, 2002 (gmt 0)

Thank you everyone for your responses. Just a few notes to add. This domain/page only had the words "coming soon..." in a table. There were no meta tags, no title tag, nothing.

This was not a pre-owned domain either.

It had no links coming in or going out - NONE! Truthfully, I totally forgot about the domain or those few words I put on it. It was only when I was using a new weblog program that I discovered google found it.

I must conclude that googlebot is crawling web SERVERS and jumping from folder to folder, not just link to link.

Also, I must wonder whether Google strategically crawls the folders of domain name hosting companies as well. They get the dns info and check the webserver directories for that domain.

From the cradle to the grave, Google watches each domain. Anyhow, if it were possible, I think this would be a plausible scenario to keep it "fresh" as GG likes to call it.

Thank you all for the kind reception, I am impressed by the traffic and calibre of responses.

[edited by: Ariel at 3:54 am (utc) on Sep. 29, 2002]

Air

3:50 am on Sep 29, 2002 (gmt 0)

>They get the dns info and check the webserver directories for that domain.

I haven't seen Google do this, but I have seen Fast/Alltheweb follow up with a visit shortly after delegating DNS, same with visits from Cyveillance, so certainly there are some possibilities outside of what's been mentioned so far for picking up new sites that apparently have no exposure to the world.

GoogleGuy

9:22 am on Sep 29, 2002 (gmt 0)

All of these approaches are possible, plus there's always the form to add a url to Google. Just curious, was this a .com/.net/.org, or a "foreign" domain?

gsx

9:32 am on Sep 29, 2002 (gmt 0)

Google toolbar (with the PageRank indicator) sends information to Google about the page you are viewing in IE. That's every page you visit. You agreed to it when you chose the toolbar with the PR; they track your every move. When you think about it, they have to: it must send the information back to Google so it can send the PR back to your toolbar. What they do with that information, we don't fully know. Perhaps they use it for spidering, maybe for calculating PR for very high traffic sites, or to measure the length of a person's visit (which could be done with the toolbar).

If you install the toolbar without the PR indicator, no information is sent back to Google (apparently).

Visit Thailand

9:38 am on Sep 29, 2002 (gmt 0)

The Google toolbar could be one, but the obvious reason is in GoogleGuy's link: as soon as you go somewhere else on the web from that page, there could be a referrer log, which is then crawled, and voila! There are no secrets on the web!

Ariel

4:41 am on Oct 2, 2002 (gmt 0)

This was a .net domain.

No secrets? What about ignorance? I am trying to figure this out and currently admit ignorance on how googlebot could find a page that has no links, was not submitted via the submit url form, and was not referred to in log files. I never visited the page while it sat dormant; it was only AFTER I used a new log file analyzer that I discovered, months later, that googlebot had paid a visit.

Let me give another example, for that was not the first. I have another domain, a .com, which has NOTHING there, NO FILES, yet google found it. Somehow, through an email (cgi-bin) if that makes any sense?

I would like to ask once again, does anyone here have a working knowledge that googlebots jump from DIRECTORY to DIRECTORY and file to file including CGI-BIN, etc? This would include the directories on domain name registrars and web hosting companies.

Thank you for your attention.

Sasquatch

7:22 am on Oct 2, 2002 (gmt 0)

Your domain name is a public record. Working through all the domains once a month to see if they have appeared would be quite trivial.

Jumping between directories on a server without following links would be considered rude, and would be of little value to google.
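The domain walk described above could look roughly like this. It is only a minimal Python sketch, assuming a plain list of registered domain names is already in hand; the `homepage_candidates` helper and the sample names are hypothetical, and nothing here reflects how Google actually works.

```python
def homepage_candidates(domains):
    """Turn bare domain names into home page URLs a crawler could try."""
    urls = []
    seen = set()
    for name in domains:
        # Normalise case, whitespace and a trailing dot before comparing.
        name = name.strip().lower().rstrip(".")
        if not name or name in seen:
            continue
        seen.add(name)
        # Try both the bare name and the common "www" host.
        urls.append("http://" + name + "/")
        urls.append("http://www." + name + "/")
    return urls


if __name__ == "__main__":
    # Hypothetical input; a real list would come from public registry data.
    for url in homepage_candidates(["Example.net", "example.com", "example.net"]):
        print(url)
```

Only the home page is requested for each name, which fits the point that no directory hopping is needed to notice a new domain.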

nutsandbolts

9:55 am on Oct 2, 2002 (gmt 0)

Well, I have the toolbar installed. Google has visited my new site just the once but it has no links going to it as it's still in development and it wasn't an old domain name. In fact, it's now got a password on the site.

But, call me daft. I don't mind if the Toolbar finds sites through my surfing - saves me submitting it!

ciml

11:04 am on Oct 2, 2002 (gmt 0)

Ariel:
> ...working knowledge that googlebots jump from DIRECTORY to DIRECTORY and file to file including CGI-BIN, etc...

Google cannot navigate the internal file system of your Web server.

We know that they do find URLs by:

* Links (including another site's 'referrer' report)
* The Google submission form

We know that they could find URLs by:

* Google Toolbar with 'advanced features'
* Domain registries' WHOIS data

If your hosting provider has a customer list, or if you have content at example.com/directory/whatever.html and example.com/directory/ is a standard directory contents listing, then Google may find your page via the links.
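To illustrate the directory listing case: a standard listing page is ordinary HTML full of links, so any crawler that reaches it can pick up every file in it without touching the file system. A minimal Python sketch, assuming the listing HTML has already been fetched (the `LinkCollector` class and `links_in_listing` helper are made up for illustration):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects absolute URLs from every <a href="..."> in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for key, value in attrs:
            if key == "href" and value:
                # Resolve relative hrefs against the listing's own URL.
                self.links.append(urljoin(self.base_url, value))


def links_in_listing(base_url, html):
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    # Hypothetical listing page for example.com/directory/
    page = '<a href="../">Parent Directory</a> <a href="whatever.html">whatever.html</a>'
    for link in links_in_listing("http://example.com/directory/", page):
        print(link)
```

The crawler still only follows links here; the listing page simply supplies links the site owner never wrote by hand.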

Brett_Tabke

11:13 am on Oct 2, 2002 (gmt 0)

Browsers leak referrals. If you are viewing your web page and then type in a new link, click a bookmark, or click a personal links bar link to Google, there is a slight chance that your browser leaked a referring string. In turn, it is known that Google crawls its own referring logs from time to time.
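For anyone wondering what a leaked referrer looks like on the receiving end, here is a rough Python sketch that pulls referrer URLs out of access log lines. It assumes the common Apache "combined" log layout; the `referrers` helper and the sample lines are hypothetical.

```python
import re

# Request, status, size, referrer, user agent as laid out in the
# "combined" log format (an assumption about the server's config).
LOG_PATTERN = re.compile(r'"[^"]*" (\d{3}) (\S+) "([^"]*)" "([^"]*)"')


def referrers(log_lines):
    """Return the distinct referrer URLs found, in first-seen order."""
    found = []
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if not match:
            continue
        referrer = match.group(3)
        # "-" means the browser sent no referrer at all.
        if referrer and referrer != "-" and referrer not in found:
            found.append(referrer)
    return found


if __name__ == "__main__":
    sample = [
        '10.0.0.1 - - [02/Oct/2002:11:13:00 +0000] "GET / HTTP/1.0" 200 512 '
        '"http://example.net/" "Mozilla/4.0 (compatible; MSIE 6.0)"',
        '10.0.0.2 - - [02/Oct/2002:11:14:00 +0000] "GET / HTTP/1.0" 200 512 '
        '"-" "Mozilla/4.0 (compatible; MSIE 6.0)"',
    ]
    print(referrers(sample))
```

Any site that publishes a report built from output like this is effectively publishing links to the pages its visitors came from.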

mikeputnam

12:36 pm on Oct 2, 2002 (gmt 0)

I wonder if Google ever harvests websites via broadcast port 80 pings?

nmap -sP *.*.*.* -> nmap -p 80 *.*.*.* > add to crawl list...

Ariel

5:22 pm on Oct 3, 2002 (gmt 0)

O.K., good responses. I believe we have enough evidence that google is searching the public whois databases and following those up for a time. (this is important for webmasters to know so they can put up at least a title tag on that domain the day it is purchased.)

Next topic: what more does googlebot need to do other than just read log files? That is, if it wants to be "fresh?"

So my questions are:

1. Do we know for a fact that googlebot routinely reads log files? (this can help webmasters determine who is inflating their log files - that's really not a suggestion.)

2. Does googlebot, or any of the others for that matter, monitor how much traffic a page gets? And if so, how if not by the log files?

Thanks in advance.

Trisha

6:36 pm on Oct 4, 2002 (gmt 0)

Google just found a site of mine that has no links to it. Oh well, I knew it might happen eventually, but was hoping it wouldn't be found until it was really ready to be seen. I check the toolbar every day, hoping that it will still be grey - but today it was all white, with PR0!

My best guess is that it was found through log files also.