207.42.75.170

DavidT

5:00 am on May 20, 2003 (gmt 0)

10+ Year Member



I have a number of links pointing to my site from giftnet.org. Periodically I get referrals from there, just taking index.html from each page where my link is placed, so I assume they are link checking.

The same thing happened today: four straight hits to index from four different pages where my link is, but 20 minutes later the same IP came back and ripped through all the main pages, taking up to 10 pages a second. I don't know what they are playing at, if it really is them. And I wish I could figure out that spider trap thingy......

jdMorgan

5:23 am on May 20, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



DavidT,

If you have questions about that spider trap thingy, ask them. The author and many users inhabit these forums, and attacks like the one you describe last no more than 2 or 3 GETs on our sites.

Jim

DavidT

12:59 pm on May 21, 2003 (gmt 0)

10+ Year Member



Much of the process of setting up the spider trap I can follow (gathered from here: [webmasterworld.com...] and also here: [webmasterworld.com...]), but I hesitate over the changes to .htaccess. I understand that something like this goes in:

# Block bad-bots using lines written by bad_bot.pl script above
SetEnvIf Request_URI "^(/403.*\.html|/robots\.txt)$" allowsome
<Files *>
order deny,allow
deny from env=getout
allow from env=allowsome
</Files>

My present htaccess file begins like this:
RewriteEngine On
Options -Indexes
Options +FollowSymlinks

Then follows a large number of redirects and RewriteCond's. The final lines of the document are:
RewriteCond %{REMOTE_ADDR} ^65\.102\.17\.(3[2-9]|[4-6][0-9]|7[0-1]|8[89]|9[0-5]|10[4-9]|11[01])$
RewriteRule ^.* - [F,L]

ErrorDocument 404 [mysite.com...]
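As an aside, an octet-range alternation like the one in that last RewriteCond can be sanity-checked by running it against every possible final octet (an illustrative check written with solid pipes, not part of the trap itself):

```shell
# Print every final octet the alternation matches; expect 32-71, 88-95 and 104-111
pattern='^(3[2-9]|[4-6][0-9]|7[0-1]|8[89]|9[0-5]|10[4-9]|11[01])$'
seq 0 255 | grep -E "$pattern"
```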

I am wondering where exactly I should put the spider trap lines, what effect (if any) this will have on the existing entries, and whether it will all 'go together'.

jdMorgan

2:14 pm on May 21, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



DavidT,

> I am wondering where exactly I should put the spider trap lines, what effect (if any) this will have on the existing entries, and whether it will all 'go together'.

Just put the new code you quoted at the beginning of your .htaccess file. It can actually go anywhere, as long as it's not inserted between a RewriteCond and its following RewriteRule; it's just neater at the top. I do note a problem with the order of your Options and RewriteEngine directives, though, so here's the whole thing in order, with the Options combined as well.


# Block bad-bots using lines written by bad_bot.pl script above
SetEnvIf Request_URI "^(/403.*\.html|/robots\.txt)$" allowsome
<Files *>
order deny,allow
deny from env=getout
allow from env=allowsome
</Files>
Options -Indexes +FollowSymlinks
RewriteEngine on

The script will insert additional lines of code preceding the code above, and that is both OK and necessary. You will end up with something like this after the first 'bot is trapped and the script modifies your .htaccess:

# Fri Mar 28 04:59:03 2003 Opera/5.02 (Windows 98; U) [en]
SetEnvIf Remote_Addr ^203\.152\.30\.38$ getout
#
# Block bad-bots using lines written by bad_bot.pl script above
SetEnvIf Request_URI "^(/403.*\.html|/robots\.txt)$" allowsome
<Files *>
order deny,allow
deny from env=getout
allow from env=allowsome
</Files>
Options -Indexes +FollowSymlinks
RewriteEngine On

Subsequent 'bot catches will cause new SetEnvIf lines to be added before the one shown above. Each one sets the environment variable "getout" if the visitor's IP address matches. The subsequent deny code then tests the getout variable and denies access if it is set.

If you wish, you can temporarily change the deny line to


deny from env=nevermind

for testing. Then, the whole package will function up to the very last step, and you can observe how it works by using your own browser to request trap objects on your site, checking to see what gets written to .htaccess each time. But since the variable tested by deny is temporarily renamed, the code won't actually ban you. When you are done testing, manually remove the SetEnvIf lines you caused to be created by testing, and then change the environment variable tested by deny back to "getout."
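In other words, during testing the deny block would look like this (a sketch of the same block posted above, with only the variable name changed):

```apache
<Files *>
order deny,allow
# "nevermind" is never set by the script, so trapped visitors
# are still logged to .htaccess but not actually denied
deny from env=nevermind
allow from env=allowsome
</Files>
```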

Keep a backup copy of your original .htaccess; if you ban yourself while testing or something else goes wrong, just re-upload the backup .htaccess using FTP, and that will get you running again.

However, after correct installation and set-up, this thing works and works very well.

HTH,
Jim

wilderness

4:34 pm on May 21, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Jim,
Your sticky mailbox FULL
Don

DavidT

7:55 pm on May 21, 2003 (gmt 0)

10+ Year Member



Not quite there yet. I'll preface all this with the apology that I only bought my first computer 18 months ago and had to ask the men who delivered it how to TURN IT ON, so I appreciate you bearing with me.

Using the 'nevermind' directive to test things, clicking the link to the disallowed file brings up my custom 404 error page, and no entry is written in .htaccess.

What I have done is create an unobtrusive link to /about.cgi?id=13.

In htaccess is this line: RedirectPermanent /about.cgi?id=13 [mysite.com...]

I am using trap.cgi as the filename because with trap.pl the icon in the upload program doesn't look right, as if it doesn't recognise the extension. Despite this I tried a few times with trap.pl as the name, with the same result.

I'm wondering particularly about these lines in trap.pl:
# Form full pathname to .htaccess file
$htapath = "$htadir"."$htafile";

Is there anything I need to change here, given that key_master's version specifies a full path like this:
# This is the only variable that needs to be modified. Replace it with the absolute path to your root directory.
$rootdir = "/home/www/your_root_directory";

DavidT

3:05 am on May 22, 2003 (gmt 0)

10+ Year Member



Actually, much of the above can probably be disregarded. I had another go, but now clicking the link to the disallowed about.cgi file brings up a 500 error page.

Could be any number of reasons I suppose.

(The error log says 'Premature end of script headers' for trap.cgi.)

jdMorgan

4:44 am on May 22, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> In htaccess is this line: RedirectPermanent /about.cgi?id=13 [mysite.com...]

That will generate an external 301 redirect, which is not what you want. Use mod_rewrite to do a "silent" internal redirect instead:


RewriteCond %{QUERY_STRING} ^id=13$
RewriteRule ^about\.cgi /cgi-bin/trap.cgi [L]

The documentation in the .pl file is correct; you should not have to change anything except where noted. Do heed the warnings about editing the broken "¦" characters, though! You must change them to solid vertical pipes or Perl will not accept them; that alone can cause the problem you are seeing, among many other things. However, the code I posted with the file-locking mods was validated and tested before I posted it, and I am running the exact same version on several sites. Stapel's version adds e-mail notification, but I did not test that version. key_master's original version was also thoroughly tested.
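One quick, illustrative way to make that broken-bar fix after downloading (the line below is just a sample; trap.pl as the filename is an assumption):

```shell
# Replace broken vertical bars (¦) with solid pipes (|); demonstrated on one
# sample line here. On the real file you would run: sed -i 's/¦/|/g' trap.pl
echo 'SetEnvIf Request_URI "^(/403.*\.html¦/robots\.txt)$" allowsome' | sed 's/¦/|/g'
```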

HTH,
Jim