And Now Google's Doing It. JS Stats Show GoogleBot


Samizdata - 11:42 pm on May 14, 2011 (gmt 0)


Googlebots doing their stuff without reference to robots.txt

Or using a previously cached copy, perhaps.

To borrow a line from Johnny Depp, the rules of the pirate code (Robots Exclusion Protocol) are only guidelines. There are no sanctions for misbehaviour.
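To make the "only guidelines" point concrete, here is a rough sketch of what compliance looks like from the crawler's side, using Python's standard urllib.robotparser. The robots.txt contents, the "ExampleBot" name, and the example.com URLs are all made up for illustration. The check runs entirely inside the bot's own code, so a bot that skips it (or works from a stale cached copy) is never stopped by the server.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, invented for this illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # Parse the rules from text; a real crawler would fetch them over HTTP
    # and might reuse a cached copy for some time.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleBot" and the URLs below are placeholders, not real crawlers or pages.
    for url in (
        "http://example.com/index.html",
        "http://example.com/private/stats.js",
    ):
        # A polite bot fetches only when this prints True; nothing on the
        # server side enforces the answer.
        print(url, allowed("ExampleBot", url))
```

If a file really must stay out of reach, that has to be enforced server-side (authentication, or blocking by user agent or IP), because nothing in robots.txt does it for you.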

It's all something of a charade; we just do what we can to get the right files in the index (and keep the wrong ones out). To quote keyplyr again:

I'm talking real world. G, Y, M$ all crawl disallowed files.

There are worse bots to worry about - ones that offer no potential benefits.

...


Thread source: http://www.webmasterworld.com/search_engine_spiders/4312058.htm
Brought to you by WebmasterWorld: http://www.webmasterworld.com