This was just a rumor/theory though.
I updated a site I work on yesterday and voted for the pages. They're still not in the index and have no fresh tags, so I believe this rumor to be false.
WBF
I had the same experience. An unfinished site with no inbound links, which I visited with the toolbar on, had a few pages that were indexed after the last deep crawl.
Referral statistics pages could explain it. Perhaps your site has outgoing links and you clicked one to test it. Many sites keep their stats pages in a place where they can be crawled, so your visit shows up as a referrer in the other site's log. That log is crawled by Google, which then follows the "backlink" to your site.
If you don't want your pages indexed, put a robots.txt file in the web root directory of your site containing:
User-agent: *
Disallow: /
When you're ready to go live, remove the robots.txt file, or better yet, change it to:
User-agent: *
Disallow:
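If you want to sanity-check the two files above before uploading them, here is a minimal sketch using Python's standard urllib.robotparser module. The helper name can_fetch, the "ExampleBot" user agent, and the example.com URL are placeholders of mine, not anything from a real crawler, and a real 'bot may parse the file somewhat differently.

```python
from urllib.robotparser import RobotFileParser

# The two robots.txt variants from the post above.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
ALLOW_ALL = "User-agent: *\nDisallow:\n"


def can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical crawler name and URL, for illustration only.
url = "http://www.example.com/page.html"
print(can_fetch(BLOCK_ALL, "ExampleBot", url))  # False
print(can_fetch(ALLOW_ALL, "ExampleBot", url))  # True
```

An empty Disallow: line means "nothing is disallowed", which is why the second file opens the whole site.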
However, even with a Disallow: / directive in robots.txt, if Googlebot finds a link to a page, it may list the page by URL in the search results. No title, no description, just the URL. The page won't come up for any keyword searches, but it will come up in the "More results from <yourdomain>" listing.
If you want to stop that, you can do one of two things:
1) Put a <meta name="robots" content="noindex,nofollow"> tag on each page you don't want indexed or followed (you can also use a variant of that tag for the other index/follow combinations).
2) Where that would be unwieldy - say for a large site under development - a relatively simple solution is to make a special page for robots, and put the <meta name="robots" content="noindex,nofollow"> tag on it. Then transparently redirect all robot requests for all pages on your site to that special robots page. When you're ready to go live, remove the redirect, and remove the special robots page. Note for the suspicious: Yes, this is cloaking, but there is no intent to deceive visitors - since there shouldn't be any visitors yet!
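Option 2 can be sketched roughly as follows, assuming the site happens to be served through a Python WSGI stack (the same idea is more commonly done with server rewrite rules). Everything named here is hypothetical: ROBOT_TOKENS is a short illustrative list of user-agent substrings, not an exhaustive one, and is_robot, make_app and ROBOTS_PAGE are names I made up for the example.

```python
# Assumption: an illustrative, incomplete list of crawler user-agent substrings.
ROBOT_TOKENS = ("googlebot", "teoma", "slurp")

# The special robots page, carrying the noindex,nofollow tag.
ROBOTS_PAGE = (
    b"<html><head>"
    b'<meta name="robots" content="noindex,nofollow">'
    b"</head><body></body></html>"
)


def is_robot(user_agent: str) -> bool:
    """Crude user-agent sniffing: True if any known token appears."""
    ua = user_agent.lower()
    return any(token in ua for token in ROBOT_TOKENS)


def make_app(site_app):
    """Wrap a WSGI app so robots always get the special robots page."""
    def app(environ, start_response):
        if is_robot(environ.get("HTTP_USER_AGENT", "")):
            headers = [
                ("Content-Type", "text/html"),
                ("Content-Length", str(len(ROBOTS_PAGE))),
            ]
            start_response("200 OK", headers)
            return [ROBOTS_PAGE]
        # Everyone else sees the real (unfinished) site.
        return site_app(environ, start_response)
    return app
```

Going live then amounts to serving site_app directly again instead of make_app(site_app).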
The meta robots tag approach is required to tell Google and Ask Jeeves/Teoma "don't mention this page at all." In my experience, most other 'bots will treat a robots.txt Disallow as a "don't mention it at all" directive, but Google and AJ/Teoma interpret the Standard for Robots Exclusion literally: as directed, they don't fetch the page, but if they find a link to it, the URL of the page will be listed in their results. There are good and bad points to either approach, but like it or not, that's how it works.
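To check that the meta robots tag on a page says what you think it says, here is a small sketch that pulls the directives out of a page's HTML with Python's standard html.parser. The class and function names (RobotsMetaParser, robots_directives) are my own, and this only illustrates reading the tag; it makes no claim about how any particular search engine processes it.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Attribute values can be None when an attribute has no value.
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        for part in content.split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html: str) -> set:
    """Return the set of robots directives found in html."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><meta name="robots" content="noindex,nofollow"></head></html>'
print(sorted(robots_directives(page)))  # ['nofollow', 'noindex']
```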
The specification: A standard for Robot Exclusion [robotstxt.org].
Validate your robots.txt file here [searchengineworld.com].
HTH,
Jim