Forum Moderators: open
This was not a pre-owned domain either.
It had no links coming in or going out - NONE! Truthfully, I had totally forgotten about the domain and the few words I put on it. It was only when I was using a new weblog program that I discovered Google had found it.
I must conclude that googlebot must be crawling web SERVERS, jumping from folder to folder and not just from link to link.
I also wonder whether Google strategically crawls the folders of domain name hosting companies as well: it gets the DNS info and checks the webserver directories for that domain.
From the cradle to the grave, Google watches each domain. Anyhow, if it were possible, I think this would be a plausible scenario to keep it "fresh" as GG likes to call it.
Thank you all for the kind reception, I am impressed by the traffic and calibre of responses.
[edited by: Ariel at 3:54 am (utc) on Sep. 29, 2002]
I haven't seen Google do this, but I have seen Fast/Alltheweb follow up with a visit shortly after delegating DNS, and the same with visits from Cyveillance, so there are certainly some possibilities outside of what's been mentioned so far for picking up new sites that apparently have no exposure to the world.
If you install the toolbar without the PR indicator, no information is sent back to Google (apparently).
No secrets? What about ignorance? I am trying to figure this out and currently admit ignorance on how googlebot could find a page that had no links, was not submitted via the submit-URL form, and was never referred to in log files, for I never visited it until AFTER it sat dormant; only months later, after I used a new log file analyzer, did I discover that googlebot had paid a visit.
Let me give another example, for that was not the first. I have another domain, a .com, which has NOTHING there, NO FILES, yet Google found it. Somehow through an email (cgi-bin), if that makes any sense?
I would like to ask once again: does anyone here have working knowledge that googlebots jump from DIRECTORY to DIRECTORY and file to file, including CGI-BIN, etc.? This would include the directories of domain name registrars and web hosting companies.
Thank you for your attention.
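For what it's worth, the way the earlier post discovered the visit - running a log analyzer over old access logs - can be sketched in a few lines. This is only an illustration: the log line below is made up, and the format assumed is Apache's "combined" log format.

```python
import re

# A made-up Apache "combined" log line of the kind a log analyzer parses.
LOG_LINE = ('66.249.66.1 - - [29/Sep/2002:03:54:00 +0000] '
            '"GET /index.html HTTP/1.0" 200 1043 "-" '
            '"Googlebot/2.1 (+http://www.googlebot.com/bot.html)"')

# Pull out the request path (from the first quoted field) and the
# user-agent (the last quoted field on the line).
pattern = re.compile(r'"(?P<method>\S+) (?P<path>\S+) [^"]*"'
                     r'.*"(?P<agent>[^"]*)"$')
m = pattern.search(LOG_LINE)
is_googlebot = m is not None and "Googlebot" in m.group("agent")
print(m.group("path"), is_googlebot)  # /index.html True
```

A real log analyzer does the same thing at scale: match the user-agent field against known bot signatures and report which URLs each bot fetched and when.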
Jumping between directories on a server without following links would be considered rude, and would be of little value to Google.
But, call me daft. I don't mind if the Toolbar finds sites through my surfing - saves me submitting it!
Google cannot navigate the internal file system of your Web server.
We know that they do find URLs by:
* Links (including another site's 'referrer' report)
* The Google submission form
We know that they could find URLs by:
* Google Toolbar with 'advanced features'
* Domain registries' WHOIS data
If your hosting provider has a customer list, or if you have content at example.com/directory/whatever.html and example.com/directory/ is a standard directory contents listing, then Google may find your page via the links.
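That directory-listing case is worth spelling out, because the listing itself is just an ordinary page of links. Here is a minimal sketch of how a crawler harvests URLs from one; the autoindex HTML below is a made-up sample in the style Apache generates, not fetched from a real server.

```python
from html.parser import HTMLParser

# A typical auto-generated directory listing (Apache-style autoindex).
# If example.com/directory/ returns a page like this, every file in the
# folder is linked, and any crawler that reaches the listing finds them.
AUTOINDEX_HTML = """
<html><head><title>Index of /directory</title></head><body>
<h1>Index of /directory</h1>
<ul>
<li><a href="../">Parent Directory</a></li>
<li><a href="whatever.html">whatever.html</a></li>
<li><a href="notes.txt">notes.txt</a></li>
</ul>
</body></html>
"""

class LinkExtractor(HTMLParser):
    """Collect href targets, the same way a crawler harvests URLs."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value != "../":
                    self.links.append(value)

parser = LinkExtractor()
parser.feed(AUTOINDEX_HTML)
print(parser.links)  # ['whatever.html', 'notes.txt']
```

So no special server access is needed: as long as the directory has no index page and autoindexing is on, the listing is crawlable by ordinary link-following.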
Next topic: what more does googlebot need to do other than just read log files? That is, if it wants to be "fresh?"
So my questions are:
1. Do we know for a fact that googlebot routinely reads log files? (This could help webmasters determine who is inflating their log files - that's not really a suggestion.)
2. Does googlebot, or any of the others for that matter, monitor how much traffic a page gets? And if so, how, if not by the log files?
Thanks in advance.
My best guess is that it was found through log files also.