MSN's many cloaked bots. Again.


AlexK - 10:58 am on Jul 5, 2011 (gmt 0)


Every page that would otherwise get a Status 200, but has not changed since the requester last fetched it, is given a 304.
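For anyone who wants to see the 304 logic spelled out, here is a minimal sketch of a conditional GET check. It is not the code running on my site: the function name `conditional_status` and the idea of comparing the page's last-modified time against the `If-Modified-Since` request header are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_status(last_modified, if_modified_since=None):
    """Return 304 if the page is unchanged since the client's copy, else 200.

    last_modified must be a timezone-aware datetime (hypothetical helper).
    """
    # HTTP dates have one-second resolution, so drop microseconds first.
    last_modified = last_modified.replace(microsecond=0)
    if if_modified_since:
        try:
            client_copy = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            client_copy = None  # unparseable header: fall through to a full 200
        if client_copy is not None and client_copy.tzinfo is None:
            client_copy = client_copy.replace(tzinfo=timezone.utc)
        if client_copy is not None and last_modified <= client_copy:
            return 304
    return 200


# Example: the bot sends back the date it was given on its previous visit.
page_changed = datetime(2011, 7, 1, 12, 0, tzinfo=timezone.utc)
header = format_datetime(page_changed, usegmt=True)
print(conditional_status(page_changed, header))                      # unchanged page
print(conditional_status(page_changed + timedelta(days=1), header))  # page edited since
```

A well-behaved daily bot therefore costs almost nothing on pages that have not changed.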

The blocked pages in these reports are *not* given a 200. They are given a 403 (fast scrape) or a 503 (slow scrape), plus a tiny text notice. Blocked bots are then refused for a week (with a slightly bigger text notice). If they attempt another fast scrape during that week, the timer is reset to zero.
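The block-and-reset behaviour described above can be sketched roughly as follows. This is an illustration only, not the actual blocker: the class name `BotBlocker`, keying on IP address, and all four threshold values are hypothetical (the real limits are not given here), and I have assumed a blocked bot keeps getting the same status that triggered its block.

```python
from collections import defaultdict, deque

WEEK = 7 * 24 * 3600
# Hypothetical thresholds; the real ones are not stated in this thread.
FAST_LIMIT, FAST_WINDOW = 10, 5        # more than 10 hits in 5 s  -> fast scrape
SLOW_LIMIT, SLOW_WINDOW = 300, 3600    # more than 300 hits in 1 h -> slow scrape


class BotBlocker:
    def __init__(self):
        self.hits = defaultdict(deque)   # ip -> timestamps of recent requests
        self.blocked = {}                # ip -> (block start time, status to send)

    def check(self, ip, now):
        """Return the HTTP status for a request from ip at time now (seconds)."""
        hits = self.hits[ip]
        hits.append(now)
        while hits and now - hits[0] >= SLOW_WINDOW:
            hits.popleft()               # forget requests older than the long window
        fast = sum(1 for t in hits if now - t < FAST_WINDOW) > FAST_LIMIT
        slow = len(hits) > SLOW_LIMIT

        if ip in self.blocked:
            start, status = self.blocked[ip]
            if fast:
                # Fast scrape while already blocked: the week starts over.
                self.blocked[ip] = (now, 403)
                return 403
            if now - start < WEEK:
                return status            # still inside the one-week refusal
            del self.blocked[ip]         # block has expired
        if fast or slow:
            status = 403 if fast else 503
            self.blocked[ip] = (now, status)
            return status
        return 200


blocker = BotBlocker()
burst = [blocker.check("192.0.2.1", t * 0.1) for t in range(12)]  # 12 hits in ~1 s
next_day = blocker.check("192.0.2.1", 24 * 3600)                  # still refused
after_block = blocker.check("192.0.2.1", 8 * 24 * 3600)           # week is over
```

In this sketch the first ten requests of the burst get a 200, the eleventh trips the fast-scrape limit, and every further rapid request restarts the week.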

Periodicity varies according to the bot. G (Google) has visited daily for 9 years (and was well-behaved until yesterday). Others vary.

I've investigated throttling rapid access, but have not activated it (apart from the bot-blocker, which is a rapid-access blocker, of course). Fast bots run out of pages to request pretty quickly, anyway.

I would suggest that the main difference between my site and (most?) others is that mine tests for abusive activity, and then blocks, records and reports it. If you do not test for it, how can you know whether it is happening or not?


Thread source: http://www.webmasterworld.com/search_engine_spiders/4182830.htm
Brought to you by WebmasterWorld: http://www.webmasterworld.com