My plan for dealing with this problem in semi-realtime is a Perl script that analyzes my logfile to identify hungry bots, run automatically every 5-10 minutes from crontab. When the script finds hungry bots in the logfile, it bans them by writing a "Deny from" directive for each offending IP to my .htaccess file. After some period of time (a day, week, or month), another script lifts the ban, so that I don't end up permanently banning legitimate users on the same ISP as the bot owner.
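Here is a rough sketch of the detect, ban, and expire cycle described above. It is written in Python rather than Perl, but the logic maps directly onto a Perl script. It assumes Apache's common/combined log format, and it uses Apache 2.2-style access directives ("Order", "Allow from", "Deny from"); on Apache 2.4 those need mod_access_compat or a rewrite to "Require" syntax. The function names, the 10-minute window, the 300-request threshold and the 7-day ban length are all hypothetical placeholders, not tested recommendations. Persisting the ban list between cron runs (for example in a small JSON file) and writing .htaccess atomically are left out.

```python
import re
from collections import Counter
from datetime import datetime, timedelta, timezone

# Prefix of an Apache common/combined log line: client IP, ident, user, [timestamp]
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\]')
TIME_FMT = "%d/%b/%Y:%H:%M:%S %z"  # e.g. 01/Jan/2024:11:55:00 +0000


def hungry_ips(lines, now=None, window=timedelta(minutes=10), threshold=300):
    """Return the set of IPs with more than `threshold` requests within `window` of `now`."""
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - window
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip malformed lines instead of crashing the cron job
        ip, stamp = m.groups()
        try:
            ts = datetime.strptime(stamp, TIME_FMT)  # timezone-aware, from the log's offset
        except ValueError:
            continue
        if ts >= cutoff:
            counts[ip] += 1
    return {ip for ip, n in counts.items() if n > threshold}


def update_bans(bans, new_ips, now, duration=timedelta(days=7)):
    """Drop expired bans, then ban (or re-ban) each IP in `new_ips` until now + duration.

    `bans` maps IP -> expiry datetime; a new dict is returned.
    """
    current = {ip: expiry for ip, expiry in bans.items() if expiry > now}
    for ip in new_ips:
        current[ip] = now + duration
    return current


def render_htaccess(bans):
    """Render the auto-managed .htaccess section (Apache 2.2-style directives)."""
    out = ["# BEGIN auto-bans", "Order Allow,Deny", "Allow from all"]
    out += [f"Deny from {ip}" for ip in sorted(bans)]
    out.append("# END auto-bans")
    return "\n".join(out)
```

Each cron run would read the recent part of the log, call hungry_ips, merge the result with update_bans, and rewrite only the section between the BEGIN/END markers. Expiry then happens automatically on the next run, so no separate "lift the ban" script is needed. Keeping the ban list in its own data file, rather than parsing it back out of .htaccess, avoids having to reparse your own output.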
Am I on the right track for identifying and banning bots and scrapers quickly? If so, do good tools for this already exist, or am I stuck with the daunting task of writing my own? I'm wary of parsing logfiles.