And Now Google's Doing It. JS Stats Show GoogleBot


g1smd - 12:43 am on May 15, 2011 (gmt 0)


Googlebots doing their stuff without reference to robots.txt
Or using a previously cached copy, perhaps.

Several years ago a comment was made here that if you want a new URL to not be crawled, make sure it is disallowed at least 24 hours before the URL is live and/or linked to.

You'd think that if a link pointing to a new URL is discovered within a site, the robots.txt file would be fetched some time between discovery and attempted spidering.
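
For anyone wanting to sanity-check this before launch, here is a rough sketch in Python, not anything Google publishes. It uses the standard library's urllib.robotparser to confirm a new URL is actually disallowed, and separately checks that the Disallow rule has been in place for the 24 hours mentioned above. The robots.txt content, the /new-section/ path, the example.com URLs, and both function names are made up for illustration.

```python
from datetime import datetime, timedelta
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the path is invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /new-section/
"""


def is_disallowed(robots_txt, url, user_agent="Googlebot"):
    """Return True if the given robots.txt text blocks user_agent from url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network fetch is needed.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


def safe_to_publish(disallow_added_at, go_live_at, margin=timedelta(hours=24)):
    """Return True if the Disallow rule was live at least `margin` before launch.

    The 24-hour default is the rule of thumb from this thread, assumed here
    to cover a crawler working from a previously cached robots.txt.
    """
    return go_live_at - disallow_added_at >= margin


if __name__ == "__main__":
    print(is_disallowed(ROBOTS_TXT, "http://example.com/new-section/page.html"))
    print(safe_to_publish(datetime(2011, 5, 13, 12, 0), datetime(2011, 5, 15, 0, 0)))
```

This only tells you what the rules say and how long they have been up. It cannot tell you which copy of robots.txt a given bot is actually working from.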


Thread source: http://www.webmasterworld.com/search_engine_spiders/4312058.htm
Brought to you by WebmasterWorld: http://www.webmasterworld.com