I needed a way to block inconsiderate bots, many of which identify themselves as standard browsers, so an .htaccess blacklist wasn't helping. Besides, a blacklist would need to be updated every time a new bad bot was spotted.
I came up with a small bit of PHP code to put at the start of a script. It detects rapid, repeated accesses from a particular IP address and then blocks that IP until the bombardment stops...
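$itime = 10; // Minimum average number of seconds between visits
$ipenalty = 60; // Seconds before a blocked visitor is allowed back
$imaxvisit = 42; // Maximum visits allowed within $itime*$imaxvisit seconds
$iplogdir = "/sites/my.site.com/iplog/"; // Must be writable by the web server
// Last 2 hex digits of the MD5 of the IP address: one of 256 possible filenames
$ipfile = substr(md5($_SERVER["REMOTE_ADDR"]), -2);
$oldtime = 0;
if (file_exists($iplogdir.$ipfile)) $oldtime = filemtime($iplogdir.$ipfile);
$time = time();
// New or long-idle visitor: start counting from the current time
if ($oldtime < $time) $oldtime = $time;
// Each visit moves the file's time stamp $itime seconds further ahead
$newtime = $oldtime + $itime;
// Time stamp too far ahead of the clock: apply the penalty and refuse the request
if ($newtime >= $time + $itime*$imaxvisit)
{
    touch($iplogdir.$ipfile, $time + $itime*($imaxvisit-1) + $ipenalty);
    header("HTTP/1.0 503 Service Temporarily Unavailable");
    header("Connection: close");
    header("Content-Type: text/html");
    echo "<html><body><p><b>Server under heavy load</b><br>";
    echo "Please wait $ipenalty seconds and try again</p></body></html>";
    exit();
}
// Otherwise just record the visit
touch($iplogdir.$ipfile, $newtime);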
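To make the $ipfile line above a little more concrete, here is a minimal sketch of how an IP address maps to a log filename. The helper name ip_bucket() and the sample addresses (taken from the reserved documentation ranges) are my own assumptions for illustration and are not part of the script itself.

```php
<?php
// Hypothetical helper (not in the original script): returns the log-file
// name the script would use for a given IP address.
function ip_bucket(string $ip): string
{
    // Last two hex digits of the MD5 hash: "00" through "ff", so 256 buckets.
    return substr(md5($ip), -2);
}

// Sample addresses from the documentation ranges, used only for illustration.
foreach (["192.0.2.1", "192.0.2.2", "198.51.100.7"] as $ip) {
    echo $ip, " -> ", ip_bucket($ip), "\n";
}
```

Because there are only 256 possible names, two unrelated visitors can land on the same file and will then be counted together. On a busy site that may be a reason to keep more hex digits, at the cost of more files in $iplogdir.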
Notes...
$iplogdir needs to be a directory that is writable by the web server. $itime is the minimum number of seconds between visits, averaged over $itime*$imaxvisit seconds. So in the above example, a visitor isn't blocked for visiting the script several times in the first 10 seconds, as long as they make fewer than about 42 visits within 420 seconds. $ipenalty is the number of seconds a visitor has to wait before they are allowed back.
How it works...
For each visitor, an MD5 hash is made of their IP address, and the last 2 hex digits of the hash are used to pick one of 256 possible filenames. If this is a new visitor, or a visitor who hasn't been seen for a while, the time stamp of the file is set to the current time; otherwise they must have been a recent visitor, and the time stamp is increased by $itime. If they start loading the script more often than once every $itime seconds, the time stamp on their IP's hashed filename increases faster than the actual time does. If the time stamp gets too far ahead of the current time, they're branded a bad visitor and the penalty is applied by pushing the time stamp on their file even further ahead. $itime, $ipenalty and $imaxvisit can be tweaked to fit your own traffic patterns.
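If it helps to see the time stamp arithmetic without touching the file system, here is a rough sketch that replays the same logic as a plain function. The function name throttle_step(), the starting time 1000000 and the in-memory $stamp variable (standing in for the file's modification time) are assumptions made for this illustration only, and it assumes PHP 7.1 or later for the array destructuring.

```php
<?php
// Hypothetical pure-function version of the time-stamp logic, for
// illustration only. $stamp plays the role of the log file's mtime.
// Returns [bool $blocked, int $newStamp].
function throttle_step(int $stamp, int $now, int $itime = 10,
                       int $ipenalty = 60, int $imaxvisit = 42): array
{
    if ($stamp < $now) {
        $stamp = $now;            // new or long-idle visitor: start from "now"
    }
    $newStamp = $stamp + $itime;  // each visit pushes the stamp forward
    if ($newStamp >= $now + $itime * $imaxvisit) {
        // Too far ahead of the clock: block and push the stamp further out.
        return [true, $now + $itime * ($imaxvisit - 1) + $ipenalty];
    }
    return [false, $newStamp];
}

// Simulate a burst: repeated requests all arriving in the same second.
$now = 1000000;   // arbitrary starting time, in seconds
$stamp = 0;       // no log file yet
$visit = 0;
do {
    $visit++;
    [$blocked, $stamp] = throttle_step($stamp, $now);
} while (!$blocked);

echo "Blocked on visit $visit\n";   // prints "Blocked on visit 42"
```

With the default settings, a burst like this is refused on the 42nd request. Any further request during the next 60 seconds is refused as well and restarts the penalty, which is why the block only lifts once the bombardment stops.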
Hope someone else finds my script useful. :) If you have any questions, ask away...
________________________
updated version
Blocking badly behaved bots [webmasterworld.com]