
Forum Moderators: coopster & jatar k & phranque


modified "bad-bot" script blocks site downloads

Explains how we've blocked FrontPage from downloading content.

     
9:07 pm on Dec 28, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 20, 2002
posts:735
votes: 1


Back in June, "Key_Master" posted a script for banning malicious bots:

Ban malicious visitors with this Perl script [webmasterworld.com]

I followed up a few weeks ago with some additional questions, and received some additional code:

bad-bot script: follow-up? [webmasterworld.com]

This is the "trap.cgi" script that we ended up using:


#!/usr/bin/perl

$basedir = $ENV{DOCUMENT_ROOT};
$htafile = "/\.htaccess";
$termsfile = "/file_to_send_instead_of_what_they_want\.htm";

# Form full pathname to .htaccess file
$htapath = "$basedir"."$htafile";

# Form full pathname to terms.htm file
$termspath = "$basedir"."$termsfile";

# Get the bad-bot's IP address, convert to regular-expressions
#(regex) format by escaping all periods.
$remaddr = $ENV{REMOTE_ADDR};
$remaddr =~ s/\./\\\./gi;

# Get User-agent & current time
$usragnt = $ENV{HTTP_USER_AGENT};
$date = scalar localtime(time);

# Open the .htaccess file and wait for an exclusive lock. This
# prevents multiple instances of this script from running past
# the flock statement, and prevents them from trying to read and
# write the file at the same time, which would corrupt it.
# When .htaccess is closed, the lock is released.
#
# Open existing .htaccess file in r/w append mode, lock it, rewind
# to start, read current contents into array.
open(HTACCESS,"+>>$htapath") || die $!;
flock(HTACCESS,2);
seek(HTACCESS,0,0);
@contents = <HTACCESS>;

# Empty existing .htaccess file, then write new IP ban line and
# previous contents to it
truncate(HTACCESS,0);
print HTACCESS ("SetEnvIf Remote_Addr \^$remaddr\$ getout \# $date $usragnt\n");
print HTACCESS (@contents);

# close the .htaccess file, releasing lock - allow other instances
# of this script to proceed.
close(HTACCESS);

# Write html output to server response
if (open(TERMS,"< $termspath"))
{
# Copy the terms.htm file as output here.
print ("Content-type: text/html\n\n");
seek(TERMS,0,0);
@contents = <TERMS>;
print (@contents);

# close the terms.htm file.
close(TERMS);
}
else
{
# if we can't open terms.htm, output a canned error message
print "Content-type: text/html\n\n";
print "<html><head><title>Fatal Error</title></head>\n";
print "<body text=\"#000000\" bgcolor=\"#FFFFFF\">\n";
print ("SetEnvIf Remote_Addr \^$remaddr\$ getout \# $date $usragnt\n");
print (@contents);
print "<p>Fatal error</p></body></html>\n";
}

# trying to send an e-mail message
open(MAIL, "|/usr/sbin/sendmail -t") || die
"Content-type: text/plain\n\nCan't open /usr/sbin/sendmail!";
print MAIL "To: myname\@mydomain\.com\n";
print MAIL "From: myname\@mydomain\.com\n";
print MAIL "Subject: You caught another one!\n";
print MAIL "The ip address \^$remaddr\$ has been banned on $date \n";
print MAIL "The associated user agent was $usragnt\n";
close(MAIL);

exit;

Note that, in addition to banning the web-site-snagger, the script also sends me an e-mail, letting me know of the "hit". On my server, Perl CGI scripts must be named "*.cgi", not "*.pl"; your server's configuration may differ.

Note also that the path to "sendmail" may be different on your server.
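If you're not sure where sendmail lives on your host, a quick throwaway check like the one below will tell you which of a few common locations exist and are executable (the paths listed are just common guesses, not a complete list; ask your host if none of them match). Hard-code the one that works into the script.


#!/usr/bin/perl
# One-off helper: report which common sendmail locations exist on this server.
# The paths below are guesses only -- check with your host if none are found.
foreach $path ("/usr/sbin/sendmail", "/usr/lib/sendmail", "/usr/bin/sendmail") {
print "$path looks usable\n" if -x $path;
}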

FrontPage (my particular nemesis) does not appear to follow CGI calls, so placing a false CGI link in the page was not effective. Instead, we uploaded a 1x1 transparent GIF and hyperlinked it to a nonexistent "decoy_false_page_name.htm" page. To help protect the innocent, the link is configured as:


<a href="decoy_false_page_name.htm" onmouseover="window.status='Burglar Alarm'; return true;" onclick="return false;">
<img src="../images_folder/oddly_named_graphic.gif" alt="" border="0" WIDTH="1" HEIGHT="1"></a></td>

The only reason anyone would follow the link is that he is using site-grabbing software. When he does, the grabber tries to follow this first link on the page and is redirected via the .htaccess file. The relevant coding in the .htaccess file is:


# Block bad-bots using lines written by bad_bot.pl script above
SetEnvIf Request_URI "^(/403.*\.htm|/robots\.txt|/file_instead_of_what_they_want\.htm)$" allowsome
<Files *>
order deny,allow
deny from env=getout
allow from env=allowsome
</Files>

Redirect /decoy_false_page_name.htm http://www.mydomain.com/cgi-bin/trap.cgi
Redirect /lower_directory/decoy_false_page_name.htm http://www.mydomain.com/cgi-bin/trap.cgi

Also, the following was added to the robots.txt file:


User-Agent: *
Disallow: /decoy_false_page_name.htm

The structure is:


public_html
.htaccess
robots.txt
403_error_message.htm
file_to_send_instead_of_what_they_want.htm
cgi-bin_directory/trap.cgi
images_folder/oddly_named_graphic.gif
lower_directory/protected files containing decoy_false_page_name.htm call

As a test, I tried to download pages from my site using FrontPage, and could get only the text of the particular page I aimed at. As soon as FrontPage tries to follow links, it encounters the trap and is banned. I receive an e-mail shortly thereafter. So everything seems to be working properly.

Thanks to all who helped, and special thanks to "Key_Master" for providing the original script.

[edited by: jatar_k at 5:00 am (utc) on Dec. 29, 2002]

2:41 am on Dec 29, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


stapel,

Nice write-up!

Are the two unindented lines in the code snippet below intended "for testing only"?


.# if we can't open terms.htm, output a canned error message
. print "Content-type: text/html\n\n";
. print "<html><head><title>Fatal Error</title></head>\n";
. print "<body text=\"#000000\" bgcolor=\"#FFFFFF\">\n";
.print ("SetEnvIf Remote_Addr \^$remaddr\$ getout \# $date $usragnt\n");
.print (@contents);
. print "<p>Fatal error</p></body></html>\n";

Jim

2:53 am on Dec 29, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 20, 2002
posts:735
votes: 1


You're good! <smile> My husband had put those in for testing purposes, and we didn't notice until after posting that they were still in there. We've since deleted the lines.

By the way, the script went "live" late this morning, and I just caught the first bum (from Tacoma, Washington) trying to download my site. Ha!

4:59 am on Dec 29, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


Stapel,

I just caught the first bum ...trying to download my site. Ha!

Fun, huh? :)

Having an automated method to stymie troublemakers has saved me a lot of time. After installing secure forms-based e-mail and the bad-bot trap script on my sites, I found I had a lot more free time to do constructive things, rather than "standing guard" over the sites all the time.

A note for new users: Install the robots.txt exclusion described above several days (even a week) before "going live" with the script. Many legitimate robots don't read a fresh copy of robots.txt every time they access your site; give them some time to learn that they shouldn't swallow the bait.

Jim

4:41 pm on Dec 31, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


Another note for users:

Make sure you edit the script and change all broken vertical pipe "¦" characters to solid vertical pipe "|" characters (the one on your keyboard). The WebmasterWorld posting software seems to modify them.
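For example, once the pipes are restored, the two "open" lines in the script above should read:


open(HTACCESS,"+>>$htapath") || die $!;

open(MAIL, "|/usr/sbin/sendmail -t") || die
"Content-type: text/plain\n\nCan't open /usr/sbin/sendmail!";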

Jim

5:21 pm on Dec 31, 2002 (gmt 0)

New User

10+ Year Member

joined:Dec 10, 2002
posts:24
votes: 0


Well, it works: if I go to the "html" file it bans me, BUT if you download the site with BlackWidow you do NOT get banned, even with it set to follow links.

5:31 pm on Dec 31, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


Check the log files and see if BlackWidow swallowed the bait. If not, give it something it likes. :)

Jim

5:35 pm on Dec 31, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 20, 2002
posts:735
votes: 1


I don't have access to the actual log files anymore (with the switch to a new server, the directory structures and permissions have been changed). What would BlackWidow "like"?

(Please pardon my ignorance.) Thank you for your help!

2:56 pm on Feb 10, 2003 (gmt 0)

New User

10+ Year Member

joined:Feb 10, 2003
posts:4
votes: 0


Hi. Thank you for pointing me to this thread. I have two remarks and a question:

1. The 1x1 graphic you use is 807 bytes with a 256-color palette. This is unnecessary; a black-and-white one takes just 43 bytes.

2. With that approach, .htaccess must be set to world-write permission. Is that not dangerous? Have you addressed that issue?

3. It seems that in my case a redirect to a CGI is being ignored. I use a lot of redirection, mostly with RedirectMatch, and all of it works fine. But this one redirect to a CGI script never seems to have any effect. What might be the matter?

Thank you,
Alexander

3:50 pm on Feb 10, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 20, 2002
posts:735
votes: 1


1) True. I have updated the image with one that is 42 bytes.

2) Actually, I have .htaccess set to 644, which means only "owner" has write permissions. And the .htaccess file contains the following lines, which prevent viewing in any case:


<Files .htaccess>
order deny,allow
deny from all
</Files>
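If you want extra insurance, the trap script itself can reset the permissions right after it rewrites the file. A minimal sketch, reusing the $htapath variable already defined in trap.cgi (it would go just after the close(HTACCESS) line):


# Optional: force .htaccess back to 644 (owner read/write, world read-only)
# after the ban line has been written.
chmod(0644, $htapath);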

3) I have no idea why Redirect isn't working for you, especially if other Redirects are working. I've had many problems with commands not being respected on my host's servers, but I've never had a problem with Redirects. Try emulating the language in my original post:

Redirect /decoy_false_page_name.htm http://www.mydomain.com/cgi-bin/trap.cgi

8:12 pm on Feb 10, 2003 (gmt 0)

New User

10+ Year Member

joined:Feb 10, 2003
posts:4
votes: 0


2) Yep, 644 works fine.

3) The redirection does not work, for whatever reason. It's not just to the CGI. I posted a question to my hosts; for now, I link directly to the CGI script. Caught myself a couple of times, too.

Great!

9:04 pm on Feb 11, 2003 (gmt 0)

New User

10+ Year Member

joined:Feb 10, 2003
posts:4
votes: 0


3) I do not know exactly why my redirection

Redirect /decoy_false_page_name.htm http://www.mydomain.com/cgi-bin/trap.cgi

did not work, but the following


RedirectMatch decoy_false_page_name http://www.mydomain.com/cgi-bin/trap.cgi

does. I am past this hurdle now.

The whole trick provides protection from "bad bots." Now, what about the good ones? Is there anything that could be done to prevent any kind of bulk downloading?

Thank you,
Alexander Bogomolny

9:06 pm on Feb 11, 2003 (gmt 0)

New User

10+ Year Member

joined:Feb 10, 2003
posts:4
votes: 0


(By good bots I mean those that consult robots.txt.)



amendment added by jatar_k for jdMorgan

Some users have reported that the script posted above causes errors on some Apache servers, because the script appends a comment containing the timestamp and user-agent to the directive line it writes to .htaccess.

Some servers are configured such that having a comment on a line containing an .htaccess directive causes server errors.

To prevent this problem, change the "print HTACCESS" line of the script as shown in the snippet below, so that the date and user-agent go into a stand-alone comment line first, followed by the SetEnvIf directive on the next line. This eliminates the problem. Alternatively, you can simply omit the comment portion entirely.

# Empty existing .htaccess file, then write new IP ban line and
# previous contents to it
truncate(HTACCESS,0);
print HTACCESS ("\# $date $usragnt\nSetEnvIf Remote_Addr \^$remaddr\$ getout\n");
print HTACCESS (@contents);

Jim

[edited by: jatar_k at 4:12 pm (utc) on Aug. 20, 2003]

 
