Banning bots/scrapers through real-time logfile analysis?

Any tools available, or do I write my own?

         

MichaelBluejay

11:09 pm on Nov 16, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I have a 1000-page static site that I'd like to stop serving to useless bots. I can't ban by user agent, because many bots don't announce themselves as bots. I'd also rather not wait until I check my web stats the next day to find which IPs hit my site hard, because by then the damage has already been done.

My feeling is that to deal with this problem in semi-realtime, I should set up a Perl script that analyzes my logfile to identify hungry bots, and have cron run it automatically every 5-10 minutes. When the script identifies a hungry bot in the logfile, it bans it by writing a deny directive for that IP to my .htaccess file. After some period of time (a day, week, or month), another script lifts the ban, so as not to keep banning legitimate users on the same ISP as the bot owner.
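The approach described above could be sketched roughly as follows. This is only a minimal illustration in Python rather than Perl, and it makes several assumptions that are not in the original post: the logfile uses Apache's common/combined format, "hungry" means more than `max_hits` requests inside a recent time window, and bans are written as classic `Deny from` lines (newer Apache versions use a different access-control syntax). The function names, the threshold, and the ban length are all hypothetical placeholders.

```python
import re
from collections import Counter
from datetime import datetime, timedelta

# Leading fields of an Apache common/combined log line (assumed format):
# 203.0.113.5 - - [16/Nov/2006:23:05:00 +0000] "GET / HTTP/1.1" 200 512
LOG_LINE = re.compile(r"^(\S+) \S+ \S+ \[([^\]]+)\]")
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def find_hungry_ips(lines, now, window=timedelta(minutes=10), max_hits=300):
    """Return IPs with more than max_hits requests in the window ending at now.

    `now` must be a timezone-aware datetime so it can be compared with
    the timestamps parsed from the log.
    """
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that do not look like access-log entries
        ip, stamp = match.groups()
        try:
            when = datetime.strptime(stamp, TIME_FORMAT)
        except ValueError:
            continue  # skip entries with an unparseable timestamp
        if now - window <= when <= now:
            hits[ip] += 1
    return sorted(ip for ip, count in hits.items() if count > max_hits)


def update_bans(bans, hungry_ips, now, ban_length=timedelta(days=1)):
    """Drop expired bans, then add new ones.

    `bans` maps IP -> time the ban started. An IP that is already banned
    keeps its original start time, so its ban still expires on schedule.
    """
    active = {ip: start for ip, start in bans.items() if now - start < ban_length}
    for ip in hungry_ips:
        active.setdefault(ip, now)
    return active


def render_deny_block(bans):
    """Render one 'Deny from <ip>' line per active ban, sorted by IP string."""
    return "\n".join(f"Deny from {ip}" for ip in sorted(bans))
```

A cron job built on this sketch would read the recent part of the logfile, call `find_hungry_ips`, merge the result into a saved ban table with `update_bans`, and rewrite a clearly marked section of .htaccess with the output of `render_deny_block`. Keeping the ban start times in a separate table, rather than in .htaccess itself, means one script can both add and lift bans.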

Am I on the right track for how to identify & ban bots & scrapers quickly? If so, do good tools for this already exist, or am I stuck with the daunting task of writing my own? I'm wary of parsing logfiles.

bill

1:41 am on Nov 17, 2006 (gmt 0)




There's quite a bit of reading on this topic:
Those are just a few...

MichaelBluejay

1:52 am on Nov 17, 2006 (gmt 0)




Thanks, but those threads seem to have nothing to do with what I was actually asking about.