services or appliances to block rogue bots - enterprise level?

bot blocker

kb73

4:30 am on Oct 20, 2011 (gmt 0)

Searching through the forums here, I see that a lot of generic scripts and custom solutions are being used to block 'rogue bots' from entering sites. We have our own custom solution that does this and, for the most part, it works OK. However, it's becoming very onerous to manage and is far from perfect.

We have trialed a 'web application firewall' with our hosting service, but it has limitations. There is virtually no reporting (i.e., we have to take their word that they are blocking bad bots and letting all the good guys through), so we have no proof that it actually works.

We have also looked at other services like [siteblackbox.com...], which seems OK on the surface.

If anyone has any experience with, or knows of, any alternative solutions, either SaaS or appliance, it would be very interesting to hear your thoughts.

Thanks, Karl

NB: For what it's worth, we regularly serve a few million page views/day to humans and can serve upwards of 10 million page views/day to bots (both search engines and others).

incrediBILL

4:20 pm on Oct 23, 2011 (gmt 0)

If anyone has any experience or knows of any alternative solutions, either SaaS or appliance it would be very interesting to hear your thoughts.


I'm not sure why anyone would want a SaaS bot blocker, especially with the volume of traffic you claim to have. There are certain things that need to be done server side to keep up with the sheer speed of requests, IMO. I'd have to know more about the implementation details to make any real recommendations.

Depending on how the SaaS bot blocker is implemented, you'll run into one of three problems: a huge database of details saved on your server for speed; a wait on a remote server to respond before servicing each page request, which can be a performance killer; or a few free pages given away initially while a delayed SaaS request responds, which can easily be exploited to scrape a site.
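
To make those three trade-offs concrete, here is a rough Python sketch. Everything in it is made up for illustration: remote_verdict() is a hypothetical stand-in for whatever call a SaaS vendor exposes, the class names are invented, and the 203.0.113.x addresses are documentation-only IPs playing the part of bad bots. Network latency is not modelled; the point is only where each design pays its cost.

```python
from typing import Dict, List, Set

BAD_PREFIX = "203.0.113."  # documentation-only range standing in for bad bots


def remote_verdict(ip: str) -> bool:
    """Hypothetical SaaS call: True means 'block'. Real latency is not modelled."""
    return ip.startswith(BAD_PREFIX)


class LocalDatabaseBlocker:
    """Option 1: verdicts are synced to the web server ahead of time.
    Lookups are fast, but the whole data set has to live on your server."""

    def __init__(self, blocked_ips: Set[str]) -> None:
        self.blocked_ips = blocked_ips

    def allow(self, ip: str) -> bool:
        return ip not in self.blocked_ips


class BlockingRemoteBlocker:
    """Option 2: every page request waits for the remote verdict."""

    def __init__(self) -> None:
        self.remote_calls = 0  # each one would be a network round trip

    def allow(self, ip: str) -> bool:
        self.remote_calls += 1
        return not remote_verdict(ip)


class DeferredRemoteBlocker:
    """Option 3: serve immediately, apply the remote verdict once it arrives."""

    def __init__(self) -> None:
        self.verdicts: Dict[str, bool] = {}
        self.pending: List[str] = []
        self.free_pages: Dict[str, int] = {}  # pages served before any verdict

    def allow(self, ip: str) -> bool:
        if ip in self.verdicts:
            return not self.verdicts[ip]
        if ip not in self.pending:
            self.pending.append(ip)
        self.free_pages[ip] = self.free_pages.get(ip, 0) + 1
        return True  # no verdict yet, so the page is given away

    def deliver_verdicts(self) -> None:
        """Simulate the delayed SaaS responses finally coming back."""
        for ip in self.pending:
            self.verdicts[ip] = remote_verdict(ip)
        self.pending.clear()


if __name__ == "__main__":
    bot = "203.0.113.7"
    deferred = DeferredRemoteBlocker()
    served = sum(deferred.allow(bot) for _ in range(5))
    deferred.deliver_verdicts()
    print(served, deferred.allow(bot))  # prints: 5 False
```

In the deferred case the bot gets five pages before it is blocked, which is the scraping loophole; a scraper that rotates addresses could repeat that indefinitely. For scale, 10 million bot page views/day averages out to roughly 116 requests per second, so in the blocking case any per-request remote round trip is paid about that often.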

I'd opt for complete server-side protection, either as an appliance (I don't know of any) or as a plug-in script.