maximillianos - 8:10 pm on Mar 15, 2010 (gmt 0)
[edited by: phranque at 6:27 am (utc) on Mar 16, 2010]
Here is the meat of it:
Cronjob runs the following command (every 15 minutes for me):
tail -50000 /etc/httpd/logs/access_log | grep 'GET /filename1.cgi\|GET /filename2.cgi\|GET /filename3.cgi' | awk '{print $1}' | sort | uniq -c | sort -n | tail -50 | /root/jobs/ip_scan_nomail.pl | /root/jobs/ddos_scan.pl
The -50000 says to look at the last 50,000 lines of the access_log. You can tweak it to your tastes.
The filename1.cgi, filename2.cgi, etc. are the web pages I wanted to filter on in the log files. I wasn't interested in hits to non-content pages, so I specified the big-ticket pages that scrapers are always after.
The tail -50 says we want to analyze the top 50 IP addresses found in the log files. Because sort -n sorts ascending, the last 50 lines are the 50 busiest IPs.
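If you want to sanity-check the counting stage before wiring it to cron, here is a minimal sketch of just that part of the pipeline. The function names top_ips and sample_log and the sample log lines are made up for illustration (the IPs are documentation-range addresses); it also uses grep -E and tail -n in place of the \| and tail -50 forms above, which should behave the same way.

```shell
#!/bin/sh
# Hypothetical helper: count hits per client IP for the watched pages.
# Reads access-log lines on stdin; $1 is how many of the busiest IPs to keep.
top_ips() {
    grep -E 'GET /(filename1|filename2|filename3)\.cgi' |
        awk '{print $1}' |   # first field of the access log is the client IP
        sort | uniq -c |     # collapse duplicates into "count ip" lines
        sort -n |            # ascending by count
        tail -n "$1"         # keep the busiest N
}

# Made-up sample log lines, for illustration only.
sample_log() {
    cat <<'EOF'
203.0.113.7 - - [15/Mar/2010:20:00:01 +0000] "GET /filename1.cgi HTTP/1.1" 200 512
203.0.113.7 - - [15/Mar/2010:20:00:02 +0000] "GET /filename2.cgi HTTP/1.1" 200 512
198.51.100.4 - - [15/Mar/2010:20:00:03 +0000] "GET /filename1.cgi HTTP/1.1" 200 512
198.51.100.4 - - [15/Mar/2010:20:00:04 +0000] "GET /images/logo.gif HTTP/1.1" 200 90
203.0.113.7 - - [15/Mar/2010:20:00:05 +0000] "GET /filename1.cgi HTTP/1.1" 200 512
EOF
}

sample_log | top_ips 50
```

Each output line is a hit count followed by an IP, with the busiest IP on the last line. The image request is ignored because it does not match the page filter.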
Since I already had my ip_scan script, I used it to generate the input for my ddos_scan script. You can easily combine both scripts if you wish. Both sources are included below (minus the sendmail functions, which are typically server specific). I get the GeoIP data file from maxmind.com and have a monthly cron job pull the latest file.
ip_scan_nomail.pl:
#!/usr/bin/perl
#-- take input of top IPs and do a lookup... report suspicious activity via e-mail alert
require '/var/www/html/constants.pl';
# DNS lookup:
use Socket;
# Display some geo info
use Geo::IP;
my $gi = Geo::IP->open("/usr/local/share/GeoIP/GeoLiteCity.dat", GEOIP_STANDARD);
while (defined($line = <STDIN>)) {
$hostname = "";
chomp($line);
$line =~ s/^\s+//; #remove leading spaces
$line =~ s/\s+$//; #remove trailing spaces
@data = split(/ /, $line);
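my $record = $gi->record_by_name($data[1]);
# The GeoIP lookup returns undef for unknown addresses, so fall back to a placeholder
$country = ($record && $record->country_name) || "Unknown";
$iaddr = inet_aton($data[1]);
# Reverse DNS can fail; leave the hostname empty so ddos_scan.pl can count it as a bad lookup
$hostname = $iaddr ? gethostbyaddr($iaddr, AF_INET) : "";
$hostname = "" unless defined $hostname;
$output = "Lookup: $hostname\nBot IP: $data[1]\nReads: $data[0]\nCountry: $country\n\n";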
print $output;
}
exit;
ddos_scan.pl:
#!/usr/bin/perl
#-- take the output of ip_scan_nomail.pl, block heavy scrapers and alert on a possible DDoS...
require '/var/www/html/constants.pl';
$cnt = 0;
$bad_lookup_cnt = 0;
while (defined($line = <STDIN>)) {
chomp($line);
$line =~ s/^\s+//; #remove leading spaces
$line =~ s/\s+$//; #remove trailing spaces
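$line =~ s!\s+! !g; #collapse runs of whitespace into a single space
# Split on the first colon only; this also handles an empty "Lookup:" line left by a failed reverse lookup
@data = split(/:\s*/, $line, 2);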
# Track the number of sources outside the US and Canada...
if($data[0] eq "Country" && $data[1] ne "United States" && $data[1] ne "Canada"){
$cnt = $cnt + 1;
$country_list = $country_list . $data[1] . "\n";
}
#Also check for scrapers... store lookup from this batch
if($data[0] eq "Lookup"){
$tmp_lookup = $data[1];
$tmp_lookup =~ tr/A-Z/a-z/;
# If by chance our ip lookup service is acting up... let's not ban ip's right now... so keep track of bad lookups
if(!$data[1] || $data[1] eq " "){
$bad_lookup_cnt = $bad_lookup_cnt + 1;
}
}
if($data[0] eq "Bot IP"){
$tmp_ip = $data[1];
}
# If not a big search bot, warn if reads are high...
if($data[0] eq "Reads" && $data[1] > 200 && $bad_lookup_cnt < 30){
if($tmp_lookup !~ /(google)|(msn)|(yahoo)|(amazon)|(ask)/){
# Block the scraper for now... and email admin
system("/usr/local/sbin/apf", "-d", "$tmp_ip");
if ( $? == -1 )
{
$result = "APF Command failed: $!\n";
}
else
{
$result = "APF block executed: $tmp_ip";
}
# Send text msg
&send_text_2;
$tmp_lookup = "";
$result = "";
$tmp_ip = "";
}
}
}
# If high number of non-US sources hitting the site, send an alert... potential problem/ddos
if($cnt > 20){
$msg = "High number ($cnt) of international bots hitting the server right now...";
&send_mail;
&send_text;
}
exit;
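Before letting the script call apf for real, it can help to see which IPs the blocking rule would pick. The following is a rough dry-run sketch of that rule in shell and awk, not the Perl script itself: the function names would_block and sample_records and the sample data are made up, and it only assumes the "Lookup / Bot IP / Reads / Country" record layout printed by ip_scan_nomail.pl. It prints the IPs and blocks nothing.

```shell
#!/bin/sh
# Hypothetical dry run: print the IPs the blocking rule would pick, without calling apf.
# Reads "Lookup / Bot IP / Reads / Country" records on stdin; $1 is the read limit (default 200).
would_block() {
    awk -F': *' -v limit="${1:-200}" '
        $1 == "Lookup" {
            host = tolower($2)
            if (host == "") bad++      # failed reverse lookup
        }
        $1 == "Bot IP" { ip = $2 }
        $1 == "Reads" {
            # Same conditions as the Perl script: high reads, lookups mostly working,
            # and the hostname not matching one of the big search bots.
            if ($2 + 0 > limit && bad < 30 && host !~ /google|msn|yahoo|amazon|ask/)
                print ip
            host = ""; ip = ""
        }
    '
}

# Made-up sample records, for illustration only.
sample_records() {
    cat <<'EOF'
Lookup: crawl-66-249-66-1.googlebot.com
Bot IP: 66.249.66.1
Reads: 900
Country: United States

Lookup: host.example.net
Bot IP: 203.0.113.7
Reads: 450
Country: Germany

Lookup: quiet.example.org
Bot IP: 198.51.100.4
Reads: 12
Country: Canada

EOF
}

sample_records | would_block    # prints 203.0.113.7
```

The googlebot entry is skipped despite its high read count, and the third IP stays under the limit, so only the second IP would be blocked. Passing a different limit as the first argument makes it easy to try other thresholds against real data.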