Forum Moderators: open
But I'd say that a well-mannered spider should read robots.txt every time. In any case I just banned Ask Jeeves from my site (brings very little traffic anyway).
You want an automated script, that you ban in robots.txt, but link to invisibly.... and if any bad guys run the script, BAM- they are banned?
Yep, I got that.
Here is the code for the script:
#!/usr/local/bin/perl
# Name this script trap.pl, upload it in ASCII mode to your cgi-bin and set the file permissions to CHMOD 755.
# This is the only variable that needs to be modified. Replace it with the absolute path to your root directory.
$rootdir = "/path/to/root/dir";
# Grab the IP of the bad bot
$visitor_ip = $ENV{'REMOTE_ADDR'};
# WAP gateways read everything; we do not want to ban them, so send a polite page!
if ($visitor_ip =~ /^216\.239\.33\.5$|^216\.239\.35\.4$/) {
print "Content-type: text/html\n\n";
print "<html>\n";
print "<head>\n";
print "<title>Forward On</title>\n";
print "</head>\n";
print "<body>\n";
print "<p><b>Please <A HREF=\"http://www.yourdomain.com/\">Click Here</A> to continue!</b></p>\n";
print "</body>\n";
print "</html>\n";
exit;
}
else {
# Escape the dots so the IP can be used as a regex in SetEnvIf
$visitor_ip =~ s/\./\\\./g;
# Set Date
$date = scalar localtime ( time );
# Open .htaccess file
open(HTACCESS,"".$rootdir."/\.htaccess") Ζ die $!;
@htaccess = <HTACCESS>;
close(HTACCESS);
# Write banned IP to .htaccess file
open(HTACCESS,">".$rootdir."/\.htaccess") Ζ die $!;
print HTACCESS "SetEnvIf Remote_Addr \^".$visitor_ip."\$ ban\n# $date\n";
foreach $deny_ip (@htaccess) {
print HTACCESS $deny_ip;
}
close(HTACCESS);
# Serve the "Access Denied" page to the banned bot
print "Content-type: text/html\n\n";
print "<html>\n";
print "<head>\n";
print "<title>Access Denied!</title>\n";
print "</head>\n";
print "<body>\n";
print "<p><b>Access Denied!</b></p>\n";
print "<A HREF=\"http://www.imdb.com/harvest_me/\"> </A>\n";
print "</body>\n";
print "</html>\n";
exit;
}
################END OF SCRIPT
I found this script on this site and have modified it for my own use. I added the bit so it would NOT ban WAPs; if you find anyone else getting banned that shouldn't be, add their IP to that section (and let me know!). I also added a time stamp, which I found helpful for cleaning it out every week. (I would recommend emptying it weekly or so; a bloated .htaccess slows down the server.) I also spotted IMDB.com's spider trap. It's fun, so I linked the bad spider to that. More fun!
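A rough sketch of that weekly cleanup, assuming the ban lines and date comments are exactly the ones trap.pl writes (the file names here are placeholders, and you should run this against a copy of your real .htaccess first):

```shell
# Demo on a throwaway file so nothing real is touched.
printf '%s\n' \
  'SetEnvIf Remote_Addr ^66\.117\.32\.7$ ban' \
  '# Mon Mar 15 12:00:00 2004' \
  'allow from all' > /tmp/htaccess.demo

# Strip the SetEnvIf ban lines and the date comments the trap prepends,
# leaving the rest of the file intact.
grep -v -e '^SetEnvIf Remote_Addr ' -e '^# ' /tmp/htaccess.demo
```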
OK, there HAS to be an .htaccess file in your root directory, and this cgi file HAS to go into the root directory, and you HAVE to be able to execute CGI in your root.
Put this in your .htaccess:
<Files ~ "^.*$">
order allow,deny
allow from all
deny from env=ban
</Files>
#################END htaccess
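For reference, after a couple of hits the top of the .htaccess file would look something like this (the IPs and dates here are made up):

```
SetEnvIf Remote_Addr ^66\.117\.32\.7$ ban
# Mon Mar 15 12:00:00 2004
SetEnvIf Remote_Addr ^10\.0\.0\.5$ ban
# Tue Mar 16 08:30:00 2004
<Files ~ "^.*$">
order allow,deny
allow from all
deny from env=ban
</Files>
```

Each SetEnvIf line sets the "ban" environment variable when the visitor's IP matches, and the `deny from env=ban` rule then blocks them.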
Save the script as xxxxx.cgi and add the name of the script to the Disallow section of robots.txt. (You might want to wait a week before uploading the script; some spiders do not re-read robots.txt on every visit.)
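Assuming you named the script trap.cgi and put it in your root, the robots.txt entry would be something like:

```
User-agent: *
Disallow: /trap.cgi
```

Well-behaved spiders will skip the URL; the bad guys who ignore robots.txt are the ones who get caught.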
Then just put an invisible link or two to the script, and away you go. I have a couple other tricks for suckering them bad spiders, sticky me for those!
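One common way to make the link "invisible" is a 1x1 transparent image with empty alt text (the image path here is a hypothetical placeholder):

```
<a href="/trap.cgi"><img src="/images/clear.gif" width="1" height="1" border="0" alt=""></a>
```

Human visitors won't see or click it, but page-scraping bots that follow every href will.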
dave
Here's a PHP take on the same trap: it splices a deny rule for the visitor's IP into .htaccess and emails you the details of the hit.
<?php
// Read the existing .htaccess, copying everything up to and
// including the "allow from all" line.
$fd = fopen(".htaccess", "r");
$file = "";
$line = fgets($fd, 4096);
while ((substr($line, 0, 10) != "allow from") and (!feof($fd))) {
    $file = $file . $line;
    $line = fgets($fd, 4096);
}
$file = $file . $line;
// Copy any existing "deny from" lines...
$line = fgets($fd, 4096);
while ((substr($line, 0, 9) == "deny from") and (!feof($fd))) {
    $file = $file . $line;
    $line = fgets($fd, 4096);
}
// ...then add a new deny rule for the current visitor.
$file = $file . "deny from " . $_SERVER["REMOTE_ADDR"] . "\n";
$file = $file . $line;
// Copy the rest of the file unchanged.
while (!feof($fd)) {
    $line = fgets($fd, 4096);
    $file = $file . $line;
}
fclose($fd);
// Write the updated rules back out.
$fd = fopen(".htaccess", "w");
fwrite($fd, $file);
fclose($fd);
$message = "PHP-file: " . $_SERVER["PHP_SELF"] . "\n";
$message = $message . "IP-Adress: " . $_SERVER ["REMOTE_ADDR"] . "\n";
$message = $message . "HTTP_REFERER: " . $_SERVER ["HTTP_REFERER"] . "\n";
$message = $message . "User agent: " . $_SERVER ["HTTP_USER_AGENT"] . "\n";
$postdt = date ("j.n.Y - H:i");
$message = $message . $postdt . "\n";
mail("your$email.address", "Web site attack", $message);
?>