MSNBot has become a constant Fast-Scraper


AlexK - 1:26 am on Dec 30, 2011 (gmt 0)


g1smd:
if individual IPs are exceeding specified limits

That has come to be the topic of this thread.

The point is that there are *no* specified limits. 'Crawl-delay' is now null & void. In hindsight, I think it was a good idea, but the bots ignored it, so it has fallen into the bin-bucket.
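For anyone who has not met the directive, here is a minimal sketch of what 'Crawl-delay' looks like in a robots.txt and how a well-behaved bot could read it, using Python's standard `urllib.robotparser`. The robots.txt content, the 10-second value and the helper name `advertised_delay` are made up for illustration; they are not taken from my site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: asks msnbot to wait 10 seconds between requests.
ROBOTS_TXT = """\
User-agent: msnbot
Crawl-delay: 10

User-agent: *
Disallow:
"""


def advertised_delay(robots_txt, user_agent):
    """Return the Crawl-delay (seconds) advertised for user_agent, or None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


if __name__ == "__main__":
    print(advertised_delay(ROBOTS_TXT, "msnbot"))
    print(advertised_delay(ROBOTS_TXT, "googlebot"))
```

Note that this only reports what the site asks for. Nothing in it makes a bot comply, which is exactly the problem.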

My site is built for humans, not bots. I do not mind bots crawling it if they behave themselves, and it helps if there is a payback for the site. At the moment, that is true only for Google. In terms of payback, all the rest are a waste of time on my site. In spite of that, there are no restrictions if they behave themselves.

So, the site is built for humans. No human can view 3 pages per second. No human can even request 3 pages per second without software assistance, in which case they become a bot. Hence, the trip parameter for abuse is set at 3 pages per second, and that seems reasonable to me as a base parameter for behaviour when browsing my site. But is it accurate?
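To make the 3 pages per second trip concrete, here is a rough sketch of a per-IP check using a sliding one-second window. It is not the script that runs on my site: the class name `PageRateTrip`, the method `is_abusive`, and the choice to trip on the third request inside the window (rather than the fourth) are all assumptions made for illustration.

```python
from collections import defaultdict, deque


class PageRateTrip:
    """Flag an IP once it requests max_pages pages within window_seconds."""

    def __init__(self, max_pages=3, window_seconds=1.0):
        self.max_pages = max_pages            # assumed trip point: 3 pages
        self.window_seconds = window_seconds  # assumed window: 1 second
        self._hits = defaultdict(deque)       # ip -> recent request times

    def is_abusive(self, ip, now):
        """Record a page request from ip at time now (seconds); return True if tripped."""
        hits = self._hits[ip]
        hits.append(now)
        # Discard requests that have fallen out of the sliding window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        return len(hits) >= self.max_pages


if __name__ == "__main__":
    trip = PageRateTrip()
    # Three requests inside one second: the third one trips the check.
    for t in (0.0, 0.3, 0.6):
        print("fast", t, trip.is_abusive("192.0.2.1", t))
    # One request per second never trips it.
    for t in (0.0, 1.0, 2.0):
        print("slow", t, trip.is_abusive("192.0.2.2", t))
```

Timestamps are passed in rather than read from the clock so the behaviour is easy to check by hand. A real deployment would also need to expire idle IPs and decide what happens after a trip (block, delay, or log).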

tangor has stated an attitude to this, and I understand it totally at one level. Without the SEs my income is toast, and I'm sure that is true for many. My site needs worldwide exposure, and I cannot summon the marketing finances at that level. Even WebmasterWorld has been forced to co-exist with the bots, and I'm certain that most sites are in the same position. The question then is the nature of that co-existence. Are we to say "sure honey, anything you like", or are there acceptable limits to behaviour? And if so, what are they?


Thread source: http://www.webmasterworld.com/search_engine_spiders/4401159.htm
Brought to you by WebmasterWorld: http://www.webmasterworld.com