Msg#: 3540946 posted 8:17 am on Jan 5, 2008 (gmt 0)
How is it possible that a site is crawled by Google without being submitted for indexing? My site was not submitted to any of the search engines, and it has no incoming links, yet it has been crawled by Google and cached... Why does this happen? And what should I do now if I don't want Google to crawl this site again?
The consensus is that you need to block your site on the server if it's under development or if you don't want it indexed. A lack of inbound links is not enough: publicly available referrer logs and server stats pages are likely to get spidered by Googlebot. Use password protection or the noindex robots meta tag on pages you want kept out.
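A minimal sketch of the robots meta tag approach mentioned above, placed in the head of each page you want kept out of the index:

```html
<!-- Goes inside the <head> of every page to be excluded.
     "noindex" asks engines not to index the page;
     "nofollow" asks them not to follow its links. -->
<meta name="robots" content="noindex, nofollow">
```

Note that, like robots.txt, this only works with well-behaved crawlers; it is a request, not access control, so password protection is still the only real guarantee.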
Msg#: 3540946 posted 2:43 pm on Jan 5, 2008 (gmt 0)
"This site has no incoming links" is not something you can always say with certainty. You may not have placed any links yourself, but others might link to the site without your knowledge, and such links are not always discoverable through a search engine index. The biggest mistake you made was not using a robots.txt file to block Google. No damage done, though: you can still tell G to go away, and eventually it will drop all the pages.
Msg#: 3540946 posted 5:39 pm on Jan 5, 2008 (gmt 0)
Do you have the Google Toolbar installed with PageRank display enabled? If so, that's a sure way to get Googlebot's attention. You need to use your robots.txt to block bots, or throw a redirect to a holding page until you are ready to open for business. I wouldn't recommend the error-page route, though, unless you want to throw Google for a loop: it may never come back!
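One way to sketch the temporary redirect described above, assuming an Apache server with mod_rewrite enabled (the filename under-construction.html is a hypothetical placeholder, not something from the original post):

```apache
# .htaccess - send every request to a holding page while the site
# is under development. The RewriteCond excludes the holding page
# itself, preventing an infinite redirect loop.
RewriteEngine On
RewriteCond %{REQUEST_URI} !^/under-construction\.html$
RewriteRule ^ /under-construction.html [R=302,L]
```

A 302 (temporary) redirect is used rather than a 301 so that search engines do not treat the move as permanent.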
To block all (good) bots just make a robots.txt file and enter:
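```
# Block all compliant crawlers from the entire site
User-agent: *
Disallow: /
```

Keep in mind that robots.txt is advisory: well-behaved bots like Googlebot honor it, but it does not actually secure the content. For that, you still need password protection on the server.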