Modified "bad-bot" perl script from stapel/jdMorgan/Key_Master

Some of the features don't work

         

Maleville

5:35 pm on Jun 27, 2003 (gmt 0)

10+ Year Member



Back to stapel's modified "bad-bot" script at [webmasterworld.com...]: I am having some trouble with it.
What did I modify?

- I made a simple HTML page named "terms.htm" containing a text-only error message for the user.
- My 403 error file is named "erreur403.php".
- The "no file" is named "nowaylink.htm".

My .htaccess file is as below:

# Begin Block bad-bots
SetEnvIf Request_URI "^(/erreur403.*\.php|/robots\.txt|/terms\.htm)$" allowsome
<Files *>
order deny,allow
deny from env=getout
allow from env=allowsome
</Files>
# End block bad-bots
Options +FollowSymLinks -Indexes
ErrorDocument 401 /erreur401.php
ErrorDocument 403 /erreur403.php
ErrorDocument 404 /erreur404.php
ErrorDocument 405 /erreur405.php
# Begin ban bots by USER_AGENT
AuthPAM_Enabled off
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^-?$ [OR]
RewriteCond %{HTTP_USER_AGENT} .*almaden* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} .*eCatch* [NC,OR]
RewriteCond %{REMOTE_ADDR} ^194\.7\.191\.8$ [OR]
#
#...........and so on until:
#
RewriteCond %{REMOTE_ADDR} ^217\.167\.123\.15$
RewriteRule ^.*$ [other-domain...] [R,L]
# End ban bots by USER_AGENT
# Begin block direct links
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?my-domain\.net/ [NC]
RewriteRule \.(rm|ram|au|mid|wav|js|jar|cgi|pl)$ [other-domain...] [NC,R,L]
# End block direct links
# Begin Redirect false nowaylink.htm to trap.cgi
RewriteRule ^nowaylink\.htm$ /cgi-bin/trap\.cgi [L]
# End Redirect false nowaylink.htm to trap.cgi

I put the "RewriteRule ^nowaylink..." at the end of the .htaccess because I wanted to ban people who don't have a permanent IP and who use a bad bot such as FrontPage to download my site one time only. They get a 403 error returned by the RewriteRule.
Those whose software sends an unrecognized or false USER_AGENT are trapped by the "bad-bot" Perl script.

In the "stapel" script I changed only:

$termsfile = "/terms\.htm";

and I kept:

$htadir = $ENV{DOCUMENT_ROOT};
$termspath = "$htadir"."$termsfile";

from jdMorgan

and the end of the script is:

open(TERMS,"< $termspath")
print ("Content-type: text/html\n\n");
seek(TERMS,0,0);
@contents = <TERMS>;
print (@contents);
close(TERMS);

exit;

But this doesn't work.

With this, it works:

print ("Content-type: text/html\n\n");
print ("<html><head><title>Fatal Error</title></head>\n");
print ("<body text=\"#000000\" bgcolor=\"#FFFFFF\">\n");
print ("<p align=\"center\"><u><font size=\"5\" color=\"#FF0000\"><strong>NO WAY</strong></font></u></p>\n");
print ("<p><strong>blah blah blah</strong></p>\n");
print ("</body></html>\n");

exit;

What is wrong?

jatar_k

5:00 pm on Jun 28, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Can anyone offer a little insight to Maleville?

When you say "But this doesn't work", does it give you errors or any other info, or does it do nothing?

Key_Master

5:24 pm on Jun 28, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Never mind.

jdMorgan

5:46 pm on Jun 28, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Maleville,

The version you are trying to modify has lots of "bells and whistles" and is therefore more complicated to get working.

I suggest you test the original version [webmasterworld.com] first.

HTH,
Jim

Maleville

1:42 am on Jun 29, 2003 (gmt 0)

10+ Year Member



I am sorry. I forgot to say that when I add:

open(TERMS,"< $termspath")
print ("Content-type: text/html\n\n");
seek(TERMS,0,0);
@contents = <TERMS>;
print (@contents);
close(TERMS);

I get a 500 Internal Server Error.

O.K. I am going to test the original version, but I would have preferred that the one I put on my web site worked.

jdMorgan

2:15 am on Jun 29, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Might be an invalid TERMS path, or it may have written something invalid to your .htaccess file.
Modify the code to print the TERMS path and check it.
Examine your .htaccess file, and see if the first line in it, written by this script, is valid.
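
A possible way to run that check is sketched below. Note also that, as posted, the line open(TERMS,"< $termspath") has no terminating semicolon; if the real script is written the same way, Perl reports a syntax error and the CGI fails before doing anything, which would typically show up as a 500 error. The helper name read_terms and the '.' fallback directory are assumptions made for this illustration and are not part of the stapel/jdMorgan script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of the stapel/jdMorgan script: returns the
# contents of the file at $path, or undef if the file cannot be opened.
sub read_terms {
    my ($path) = @_;
    # Every statement ends with a semicolon, and a failed open is handled
    # explicitly instead of being ignored.
    open(my $fh, '<', $path) or return;
    my @contents = <$fh>;
    close($fh);
    return join('', @contents);
}

# Same path construction as in the thread. The '.' fallback is an assumption
# so that the sketch also runs from a shell, where DOCUMENT_ROOT is not set.
my $htadir    = defined $ENV{DOCUMENT_ROOT} ? $ENV{DOCUMENT_ROOT} : '.';
my $termsfile = '/terms.htm';
my $termspath = $htadir . $termsfile;

print "Content-type: text/html\n\n";
my $page = read_terms($termspath);
if (defined $page) {
    print $page;
} else {
    # Debug output only: show the path that was tried and the system error.
    print "<html><body><p>Cannot open $termspath: $!</p></body></html>\n";
}
```

Running the script from the command line first (perl -c trap.pl checks the syntax only) usually shows such errors directly, without going through the web server.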

Jim

Maleville

4:29 pm on Jun 29, 2003 (gmt 0)

10+ Year Member



Thank you Jim for your answer.

1) The TERMS path is $termsfile = "/terms\.htm";
It is in the same place as .htaccess and erreur403.php: the root directory.
trap.pl is in cgi-bin/trap.pl

What I don't know is the meaning of "TERMS" in open(TERMS,"< $termspath"), because this is the first time I have looked at a Perl/CGI script. But there is always a first time.

trap.pl (cgi) doesn't write anything to .htaccess, because the script stops when it reaches "open(TERMS,"< $termspath")" and the lines that follow.

As I said, with:
print ("Content-type: text/html\n\n");
and so on at the end, the script works.

The only thing wrong is that I receive 200 mails when a bad bot visits my web site, because my erreur403.php sends a mail to warn me. It seems to be the fault of this line from above:
SetEnvIf Request_URI "^(/erreur403.*\.php|/robots\.txt|/terms\.htm)$" allowsome

2) Instead of:

print ("Content-type: text/html\n\n");
print ("<html><head><title>Fatal Error</title></head>\n");
print ("<body text=\"#000000\" bgcolor=\"#FFFFFF\">\n");
print ("<p align=\"center\"><u><font size=\"5\" color=\"#FF0000\"><strong>NO WAY</strong></font></u></p>\n");
print ("<p><strong>blah blah blah</strong></p>\n");
print ("</body></html>\n");

is it possible to include the PHP mail script I have in erreur.php?

3) I have another question; don't be afraid, it is a small one. When you write a URL in .htaccess, do you have to put \. wherever there is an extension in the script, for example:

RewriteRule ^.*$ [members\.lycos\.fr...] [R,L]

instead of:
RewriteRule ^.*$ [members.lycos.fr...] [R,L]

Another example:
ErrorDocument 403 /erreur403\.php
instead of
ErrorDocument 403 /erreur403.php

Maleville

12:10 pm on Jul 5, 2003 (gmt 0)

10+ Year Member



Please.

Maleville

5:44 am on Jul 21, 2003 (gmt 0)

10+ Year Member



Jim, are you on vacation?

Please, it would be kind if you could post a very short answer.

jdMorgan

6:28 am on Jul 21, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Maleville,

A quick answer...

Actually, I *am* on vacation - almost. I haven't left yet, though. Basically, I have missed seeing this thread on the active list, and so I didn't know it was still active...

When you write a URL in .htaccess, do you have to put \. wherever there is an extension in the script, for example:
RewriteRule ^.*$ [members\.lycos\.fr...] [R,L]

instead of:
RewriteRule ^.*$ [members.lycos.fr...] [R,L]


No, you must not do that. You only put the "\" in front of "special" characters which have meaning to the regular-expressions parser. The "\" tells the regex parser to NOT interpret the special character as a "command", but rather to treat it as a literal. Example: In regular expressions, "." means "match any single character." So ^a.c$ matches "abc" or "aac" or "acc", etc. So, if you are looking to match a period, you must "escape" the period with a backslash, as in "a\.c". This would then match the string "a.c".
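
The difference between the two patterns can be tried out with a small Perl sketch. It uses Perl's own regex engine, which is assumed here to behave like the mod_rewrite one for these simple patterns; the helper name matches is made up for the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper for illustration: 1 if $string matches $pattern, else 0.
sub matches {
    my ($pattern, $string) = @_;
    return $string =~ /$pattern/ ? 1 : 0;
}

# Compare the unescaped pattern ^a.c$ with the escaped pattern ^a\.c$.
for my $s ('abc', 'a.c', 'ac') {
    printf "%-4s ^a.c\$ => %d   ^a\\.c\$ => %d\n",
        $s, matches('^a.c$', $s), matches('^a\.c$', $s);
}
```

The unescaped pattern accepts both "abc" and "a.c", the escaped one accepts only "a.c", and neither accepts "ac".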

Now, the reason you don't need to escape periods in the "target" URL is that the target URL is not parsed by the regular-expressions parser - It only looks at the pattern on the left side of the RewriteRule. The target URL is called the substitution in the mod_rewrite documentation, and is not examined by the regex parser.

Therefore, the changed version won't work. Just use:
RewriteRule .* [members.lycos.fr...] [R,L]

The same goes for this:


Another example:
ErrorDocument 403 /erreur403\.php
instead of
ErrorDocument 403 /erreur403.php

The ErrorDocument directive does not make use of regular-expressions, and therefore, you should not "escape" the periods.

Ref: Regular Expressions tutorial [etext.lib.virginia.edu]

HTH,
Jim

Maleville

6:36 am on Jul 22, 2003 (gmt 0)

10+ Year Member



Thank you very much for your answer, clear as usual.