---- MSNBot has become a constant Fast-Scraper


AlexK - 1:48 am on Dec 29, 2011 (gmt 0)


tangor, I do not think that you have thought this through. I've been stopping abuse from bots for several years now, and reporting it for 18 months. Here are some recent facts from the last year for you to consider:

It used to be typical for 30, 40, 50 or more bots to try to scrape my site each day. That has finally dropped dramatically - Dec 28 saw just 7 bots, and *that* is now typical (human hits remain unchanged).

The fastest bot caught was Technicolor [forums.modem-help.co.uk], at 403 pages per second. Imagine if that started happening on your site 50 times a day; consider also that fibre & Gigabit networks are rolling out daily - it is not so unlikely a prospect as we may think.

The worst in terms of volume was a BigPond bot [forums.modem-help.co.uk]. It tried to take 137,836 pages before it finally stopped. How long before that sucks up all your (supposedly) unlimited bandwidth if 50 of those hit you each day?
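To put those two figures side by side, here is a rough back-of-envelope sketch. The rate and page count are the ones quoted above; the average page size (ASSUMED_PAGE_KB) is purely my assumption for illustration, and combining the Technicolor speed with the BigPond volume is a hypothetical worst case, not something either bot actually did.

```python
# Figures quoted in the post.
FASTEST_RATE_PAGES_PER_SEC = 403   # Technicolor bot
BIGPOND_PAGES = 137_836            # pages the BigPond bot tried to take
HITS_PER_DAY = 50                  # the "50 of those each day" scenario

# ASSUMPTION (not from the post): average page weight in kilobytes.
ASSUMED_PAGE_KB = 30


def seconds_to_take(pages: int, rate_per_sec: float) -> float:
    """Time needed to request `pages` pages at a constant rate."""
    return pages / rate_per_sec


def daily_gigabytes(pages_per_hit: int, hits_per_day: int, page_kb: float) -> float:
    """Bandwidth per day in decimal GB (1 GB = 1,000,000 KB)."""
    return pages_per_hit * hits_per_day * page_kb / 1_000_000


# Hypothetical: a BigPond-sized scrape running at the Technicolor rate.
secs = seconds_to_take(BIGPOND_PAGES, FASTEST_RATE_PAGES_PER_SEC)
pages_per_day = BIGPOND_PAGES * HITS_PER_DAY
gb_per_day = daily_gigabytes(BIGPOND_PAGES, HITS_PER_DAY, ASSUMED_PAGE_KB)

print(f"One scrape at full speed: {secs / 60:.1f} minutes")
print(f"Pages requested per day:  {pages_per_day:,}")
print(f"Bandwidth per day:        {gb_per_day:.1f} GB (at {ASSUMED_PAGE_KB} KB/page)")
```

Change ASSUMED_PAGE_KB to your own site's average page weight to see how quickly an "unlimited" allowance would go.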

The average number of pages stopped by my site's routines across 18 months is 4,500 pages each day (now dropping).

According to AWStats, last November humans took 74.16 GB from my site, whilst bots took 818.49 GB. Please allow me to remind you that the Modem-Help site has defence in depth against abusive scrapers: the top-25 worst ASNs are blocked at the firewall, and both fast- & slow-scrapers get blocked almost before they can start. Yet bots still took about 11 times as much bandwidth as humans in October & November.
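For anyone unsure what "blocking a fast-scraper" means in practice, here is a minimal sliding-window sketch. It is NOT the routine running on Modem-Help; the class name, the thresholds (max_hits, window_secs) and the example IP addresses are all hypothetical, chosen only to illustrate the idea of counting recent hits per IP and blocking once a limit is passed.

```python
from collections import defaultdict, deque


class FastScraperDetector:
    """Hypothetical per-IP sliding-window limiter (illustration only)."""

    def __init__(self, max_hits: int = 10, window_secs: float = 5.0):
        self.max_hits = max_hits          # assumed threshold
        self.window_secs = window_secs    # assumed window length
        self._hits = defaultdict(deque)   # ip -> timestamps of recent hits
        self.blocked = set()              # IPs that tripped the limit

    def allow(self, ip: str, now: float) -> bool:
        """Record a hit at time `now`; return False if the IP is blocked."""
        if ip in self.blocked:
            return False
        hits = self._hits[ip]
        hits.append(now)
        # Forget hits that have fallen out of the window.
        while hits and now - hits[0] >= self.window_secs:
            hits.popleft()
        if len(hits) > self.max_hits:
            self.blocked.add(ip)
            del self._hits[ip]
            return False
        return True


detector = FastScraperDetector(max_hits=10, window_secs=5.0)

# A bot requesting 403 pages within one second (the rate quoted earlier).
bot_served = sum(detector.allow("203.0.113.9", i / 403) for i in range(403))

# A human requesting one page every 10 seconds.
human_served = sum(detector.allow("198.51.100.7", t * 10.0) for t in range(20))

print(f"bot pages served:   {bot_served} of 403")
print(f"human pages served: {human_served} of 20")
```

A short window like this only catches fast-scrapers. Slow-scrapers stay under any per-second limit, so catching them would need a second, much longer window (hours rather than seconds) with its own threshold.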

My site barely creeps into being 'medium-size' in terms of visits. With all due respect to yourself, I would suggest that there are blinkers on your vision that you would do well to drop. This is far more of an issue than you seem to consider it to be.

It is bad enough having to cope with script-kiddies, bored corporate desk-jockeys & spam-criminals trying to download my million-page site each day. To then have to factor in supposedly reputable Search-Engines as equivalent abusers just takes the biscuit.


Thread source: http://www.webmasterworld.com/search_engine_spiders/4401159.htm