amazonaws.com plays host to a wide variety of bad bots


Pfui - 11:49 pm on Feb 15, 2009 (gmt 0)


I'm not sure I understand your point, sorry.

Technically, there's nothing in robots.txt that prevents any bot from doing whatever the heck its runners program it to do.

But a blanket "Disallow: /" means Do Not Crawl Here. Go Away. Now. And that Disallow includes the root page, because "/" is a path prefix that matches every URL on the site. Even if the root page is retrieved at basically the same time as robots.txt (as is often the case), there still should be no caching or referencing of the root page's data.

Yeah. And if wishes were horses... :)
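
For what it's worth, here is a minimal sketch of how a compliant bot would read a blanket Disallow, using Python's standard urllib.robotparser module. The bot name "ExampleBot" and the example.com URLs are placeholders made up for illustration, and the helper name is_allowed is hypothetical. It only shows what a polite crawler is supposed to conclude; nothing here stops a bot that ignores robots.txt.

```python
from urllib.robotparser import RobotFileParser

# A blanket rule: every path on the site is off limits to every bot.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if robots_txt permits agent to fetch url.

    Hypothetical helper for illustration; it parses the rules from a
    string instead of fetching them over the network.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# "ExampleBot" and example.com are assumed names, not from the thread.
root_ok = is_allowed(ROBOTS_TXT, "ExampleBot", "http://example.com/")
page_ok = is_allowed(ROBOTS_TXT, "ExampleBot", "http://example.com/page.html")

print("root page allowed:", root_ok)
print("inner page allowed:", page_ok)
```

Both checks come back disallowed, the root page included, because the rule path "/" is matched as a prefix of the requested path.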


Thread source: http://www.webmasterworld.com/search_engine_spiders/3828718.htm