Neat little unix/linux command to list top IPs accessing your site


maximillianos - 8:10 pm on Mar 15, 2010 (gmt 0)


Here is the meat of it:

A cron job runs the following command (every 15 minutes for me):

tail -50000 /etc/httpd/logs/access_log | grep 'GET /filename1.cgi\|GET /filename2.cgi\|GET /filename3.cgi' | awk '{print $1}' | sort | uniq -c | sort -n | tail -50 | /root/jobs/ip_scan_nomail.pl | /root/jobs/ddos_scan.pl


The -50000 says to look at the last 50,000 lines of the access_log. You can tweak it to your tastes.

The filename1.cgi, filename2.cgi, etc. are the web pages I wanted to filter my search on in the log files. I wasn't interested in hits to non-content pages, so I specified the big-ticket pages that scrapers are always after.

The tail -50 says we want to analyze the top 50 IP addresses found in the log files. Because sort -n puts the hit counts in ascending order, the last 50 lines are the 50 busiest IPs.
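If you want to try the counting step on its own before wiring it into cron, here is a minimal sketch of the same pipeline wrapped in a shell function. The name top_ips and its arguments are made up for this example and are not part of the original job. It also swaps in grep -E with escaped dots in place of the \| form used above, which should match the same requests.

```shell
# top_ips LOGFILE [LINES] [TOP]   (hypothetical helper, not part of the original job)
# Stages: take the last LINES lines, keep requests for the pages of interest,
# pull out the client IP (first field), count hits per IP, sort by count
# ascending, and keep the TOP busiest. Prints "<hits> <ip>", busiest last.
top_ips() {
    tail -n "${2:-50000}" "$1" |
        grep -E 'GET /(filename1|filename2|filename3)\.cgi' |
        awk '{print $1}' |
        sort | uniq -c |
        sort -n |
        tail -n "${3:-50}"
}
```

With that defined, something like top_ips /etc/httpd/logs/access_log 50000 50 should print the same list that the cron command feeds into the two scripts.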

Since I already had my ip_scan script, I used its output as the input for my ddos_scan script. You can easily combine both scripts if you wish. Both sources are included below (minus the sendmail functions, which are typically server specific). The GeoIP file comes from maxmind.com, and a cron job runs monthly to pull the latest file.

ip_scan_nomail.pl:

#!/usr/bin/perl

#-- take input of top IPs and do a DNS and GeoIP lookup for each... this "nomail" version prints the results instead of sending an e-mail alert

require '/var/www/html/constants.pl';

# DNS lookup:
use Socket;

# Display some geo info
use Geo::IP;

my $gi = Geo::IP->open("/usr/local/share/GeoIP/GeoLiteCity.dat", GEOIP_STANDARD);

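while (defined($line = <STDIN>)) {

    $hostname = "";

    chomp($line);
    $line =~ s/^\s+//; #remove leading spaces
    $line =~ s/\s+$//; #remove trailing spaces

    # Each input line from uniq -c is "<count> <ip>"
    @data = split(/\s+/, $line);
    next unless defined $data[1]; #skip blank or malformed lines

    # GeoIP lookup; $record can be undef when the IP is not in the database
    my $record = $gi->record_by_name($data[1]);
    my $country = ($record && defined $record->country_name) ? $record->country_name : "";

    # Reverse DNS lookup; $hostname stays empty when the IP has no PTR record
    $iaddr = inet_aton($data[1]);
    $hostname = gethostbyaddr($iaddr, AF_INET) if defined $iaddr;
    $hostname = "" unless defined $hostname;

    $output = "Lookup: $hostname\nBot IP: $data[1]\nReads: $data[0]\nCountry: $country\n\n";
    print $output;
}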

exit;
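
The hand-off between the two scripts is plain text, one short record per IP. Purely to illustrate that format, here is a rough shell sketch that turns the uniq -c lines into the Lookup / Bot IP / Reads part of a record. The names resolve_ptr and format_records are hypothetical, the resolver assumes a glibc-style getent that does a reverse lookup when given an address, and the Country line is left out because it needs the GeoIP database.

```shell
# resolve_ptr IP   (hypothetical helper)
# Assumes `getent hosts <ip>` does a reverse lookup and prints "<ip> <hostname>";
# prints nothing when no name is found.
resolve_ptr() {
    getent hosts "$1" | awk '{print $2; exit}'
}

# format_records   (hypothetical helper)
# Reads "<count> <ip>" lines on stdin and prints one record per IP,
# followed by a blank line, in the layout the Perl script above prints.
format_records() {
    while read -r count ip; do
        [ -n "$ip" ] || continue    # skip blank or malformed lines
        printf 'Lookup: %s\nBot IP: %s\nReads: %s\n\n' \
            "$(resolve_ptr "$ip")" "$ip" "$count"
    done
}
```

Note that read already drops the leading padding that uniq -c adds, which is the job the two s/// lines do in the Perl version.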




ddos_scan.pl:

#!/usr/bin/perl

#-- take the output of ip_scan_nomail.pl, block suspected scrapers and report possible ddos activity via email...

require '/var/www/html/constants.pl';

$cnt = 0;
$bad_lookup_cnt = 0;

while (defined($line = <STDIN>)) {

    chomp($line);
    $line =~ s/^\s+//; #remove leading spaces
    $line =~ s/\s+$//; #remove trailing spaces
    $line =~ s!\s+! !g; #collapse runs of whitespace into a single space

    # Split "Label: value" into two fields. The limit of 2 keeps an empty
    # value (a failed lookup arrives as "Lookup:") as an empty string.
    @data = split(/:\s*/, $line, 2);
    next unless @data == 2; #skip the blank lines between records


    # Track the number of sources outside the US and Canada...
    if($data[0] eq "Country" && $data[1] ne "" && $data[1] ne "United States" && $data[1] ne "Canada"){
        $cnt = $cnt + 1;
        $country_list = $country_list . $data[1] . "\n";
    }


    #Also check for scrapers... store lookup from this batch
    if($data[0] eq "Lookup"){
        $tmp_lookup = $data[1];
        $tmp_lookup =~ tr/A-Z/a-z/;

        # If by chance our ip lookup service is acting up... let's not ban ip's right now... so keep track of bad lookups
        if(!$data[1] || $data[1] eq " "){
            $bad_lookup_cnt = $bad_lookup_cnt + 1;
        }
    }

    if($data[0] eq "Bot IP"){
        $tmp_ip = $data[1];
    }

    # If not a big search bot, warn if reads are high...
    if($data[0] eq "Reads" && $data[1] > 200 && $bad_lookup_cnt < 30){
        if($tmp_lookup !~ /(google)|(msn)|(yahoo)|(amazon)|(ask)/){

            # Block the scraper for now... and email admin
            system("/usr/local/sbin/apf", "-d", $tmp_ip);
            if ( $? == -1 )
            {
                # apf could not be started at all
                $result = "APF Command failed: $!\n";
            }
            elsif ( $? != 0 )
            {
                # apf ran but reported an error
                $result = "APF Command exited with status " . ($? >> 8) . " for $tmp_ip\n";
            }
            else
            {
                $result = "APF block executed: $tmp_ip";
            }

            # Send text msg
            &send_text_2;

            $tmp_lookup = "";
            $result = "";
            $tmp_ip = "";

        }
    }

}

# If a high number of sources outside the US and Canada are hitting the site, send an alert... potential problem/ddos
if($cnt > 20){
    $msg = "High number ($cnt) of international bots hitting the server right now...";
    &send_mail;
    &send_text;
}

exit;
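
Before letting the script call apf on a live server, it can help to see which IPs it would pick. Here is a rough dry-run sketch in shell and awk that applies the same 200-read threshold and hostname whitelist to the records and only prints the candidates. The function names would_block and count_foreign are made up for this example, nothing here touches the firewall, and the sketch ignores the bad-lookup safety counter.

```shell
# would_block [LIMIT]   (hypothetical dry-run helper; never calls apf)
# Reads ip_scan records on stdin and prints "<ip> <reads>" for every IP that
# exceeds LIMIT reads and whose reverse lookup is not a whitelisted crawler.
would_block() {
    awk -F': *' -v limit="${1:-200}" '
        $1 == "Lookup" { host = tolower($2) }
        $1 == "Bot IP" { ip = $2 }
        $1 == "Reads" && $2 + 0 > limit + 0 &&
            host !~ /google|msn|yahoo|amazon|ask/ { print ip, $2 }
    '
}

# count_foreign   (hypothetical helper)
# Counts records whose Country is neither "United States" nor "Canada".
count_foreign() {
    awk -F': *' '
        $1 == "Country" && $2 != "" && $2 != "United States" && $2 != "Canada" { n++ }
        END { print n + 0 }
    '
}
```

Piping the output of ip_scan_nomail.pl into would_block instead of ddos_scan.pl should list what a real run would try to block. Keep in mind the whitelist is a plain substring match, so a hostname that merely contains "ask" or "msn" also passes.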

[edited by: phranque at 6:27 am (utc) on Mar 16, 2010]
[edit reason] disabled graphic smileys ;) [/edit]


Thread source: http://www.webmasterworld.com/webmaster/4071661.htm
Brought to you by WebmasterWorld: http://www.webmasterworld.com