Search Engines / Search Engine Spider and User Agent Identification
And Now Google's Doing It. JS Stats Show GoogleBot
- 12:43 am on May 15, 2011
Googlebot's doing its stuff without reference to robots.txt.
Or it's using a previously cached copy, perhaps.
Several years ago a comment was made here that if you want a new URL to not be crawled, you should make sure it is disallowed in robots.txt at least 24 hours before the URL goes live and/or gets linked to.
You'd think that if a link pointing to a new URL is discovered within a site, the robots.txt file would be fetched some time between discovery and attempted spidering.
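For anyone wanting to check what a well-behaved crawler *should* conclude from a given robots.txt, Python's standard library ships a parser. This is just an illustrative sketch; the paths, domain, and disallow rule below are made up, not from the thread.

```python
import urllib.robotparser

# Hypothetical robots.txt contents, disallowing a new section
# before its URLs go live (the scenario discussed above).
robots_lines = [
    "User-agent: *",
    "Disallow: /new-section/",
]

rp = urllib.robotparser.RobotFileParser()
rp.parse(robots_lines)

# A compliant Googlebot should skip the disallowed path...
print(rp.can_fetch("Googlebot", "https://example.com/new-section/page.html"))  # False
# ...but remain free to fetch everything else.
print(rp.can_fetch("Googlebot", "https://example.com/other/page.html"))  # True
```

In a real check you'd point `set_url()` at the live robots.txt and call `read()` instead of feeding lines to `parse()` — though, as the post notes, what the file says and what the bot actually does (or which cached copy it's working from) are two different questions.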