Forum Moderators: coopster & phranque


Blocking bad bots that masquerade as ordinary browsers


Bluestreak

1:38 am on Jun 6, 2002 (gmt 0)

10+ Year Member



I have a bot trap and a ban list, using mod_rewrite, currently set up on my site. This has been pretty effective at stopping malicious bots by user-agent name, but it hasn't done diddly against annoying crawlers that put an ordinary browser name in their user-agent field.

To counter this I've created a directory that is expressly disallowed in my robots.txt file. What I want to do is install a script that detects any attempt to access that directory and immediately bans the offending visitor. Unfortunately, I use a hosting service, so I don't have access to the server, meaning I can't install Perl modules or mess with ipchains, etc. Is there a standalone script available that could accomplish this, or am I ---- outta luck? :)
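
A minimal Python sketch of why this kind of trap works, assuming a hypothetical trap directory named `/trap/`: a crawler that honours robots.txt never requests anything under it, so any visitor that does has ignored the rules. It uses only the standard library's `urllib.robotparser`; the robots.txt text and URLs are illustrative.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that disallows a trap directory for every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /trap/
"""

def compliant_crawler_may_fetch(url: str, user_agent: str = "AnyBot") -> bool:
    """Return True if a robots.txt-respecting crawler would fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    for url in ("http://example.com/index.html", "http://example.com/trap/page.html"):
        print(url, compliant_crawler_may_fetch(url))
```

A well-behaved crawler gets `True` for the ordinary page and `False` for anything under `/trap/`, which is why a hit on the trap is a reliable sign of a rule-breaking bot, whatever its user-agent string claims.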

One workaround I was thinking about is to password-protect the directory, and then use a script to ban a visitor when that directory returns a 401 error. That might be an alternative way of stopping deep crawlers. What do you think? If this is viable, what password protection script would you recommend? I'm not looking for a major password management script, just a simple script to accomplish what I outlined above.

Thanks in advance for any advice!

Key_Master

1:44 am on Jun 6, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>>>One workaround I was thinking about is to password protect the directory, and then use a script to ban a visitor when that directory returns a 401 error.

Why even do that? Make up a directory name, disallow it in robots.txt, and ban any IP that requests that directory. The directory doesn't even need to exist.
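
The ban-on-request logic can be sketched in Python as below. The trap prefix `/trap/` and the function name are hypothetical, and the in-memory set only stands in for real storage: a CGI script on a shared host exits after every request, so a working version would have to persist banned IPs somewhere the server consults on each hit (for example, .htaccess). Only the requested path is inspected, which is why the directory never has to exist.

```python
TRAP_PREFIX = "/trap/"  # hypothetical directory name, disallowed in robots.txt
banned_ips = set()      # in-memory only: forgotten when the process restarts

def handle_request(ip: str, path: str) -> int:
    """Return an HTTP status code, banning any IP that requests the trap."""
    if ip in banned_ips:
        return 403  # already banned: refuse every page on the site
    if path.startswith(TRAP_PREFIX):
        banned_ips.add(ip)  # ignored robots.txt, so ban this IP site-wide
        return 403
    return 200

if __name__ == "__main__":
    print(handle_request("192.0.2.10", "/index.html"))  # 200
    print(handle_request("192.0.2.10", "/trap/"))       # 403, IP is now banned
    print(handle_request("192.0.2.10", "/index.html"))  # 403
```

The key design point is the first check: once an IP has touched the trap, every later request is refused, not just requests for the trap directory.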

Use $ENV{'REQUEST_URI'} instead of $ENV{'DOCUMENT_URI'} to grab the URL the browser requested.
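
Key_Master's tip refers to Perl's `%ENV`; a Python CGI script sees the same environment through `os.environ`. A minimal sketch, assuming Apache runs the script as CGI: `REQUEST_URI` (an Apache addition rather than one of the standard CGI/1.1 variables) holds the URL the browser actually asked for, including any query string, and `REMOTE_ADDR` holds the client IP. The helper name is hypothetical.

```python
import os
from typing import Mapping, Tuple

def requested_path(environ: Mapping[str, str]) -> Tuple[str, str]:
    """Return (client IP, requested path) from a CGI-style environment.

    REQUEST_URI is the raw URL the browser requested (path plus any query
    string); the query string is cut off so only the path is compared.
    """
    ip = environ.get("REMOTE_ADDR", "")
    uri = environ.get("REQUEST_URI", "/")
    return ip, uri.partition("?")[0]

if __name__ == "__main__":
    ip, path = requested_path(os.environ)
    print(ip, path)
```

Taking the mapping as a parameter, instead of reading `os.environ` inside the function, keeps the helper easy to test with a hand-built dictionary.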

Bluestreak

1:56 am on Jun 6, 2002 (gmt 0)

10+ Year Member



Wow, how much simpler can you get? The code you gave, do I put that in my .htaccess file?

Keep in mind I know Perl and PHP like I know women, which is zilch :D

mdharrold

2:02 am on Jun 6, 2002 (gmt 0)

10+ Year Member



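Yes,
RewriteEngine on
Options +FollowSymLinks
RewriteBase /
# Server variables such as REQUEST_URI belong in a RewriteCond, not in the RewriteRule pattern
RewriteCond %{REQUEST_URI} ^/blocking_directory/
RewriteRule .* /ban_page.html [L]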
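
Whichever directive carries it, the regex matters: a pattern anchored with a trailing `$` matches only the bare directory URL, so a bot that requests a file inside the directory slips past. The Python sketch below compares the two forms using the standard `re` module; Apache's regexes are PCRE, which treats these simple anchors the same way. The sample URLs are illustrative.

```python
import re

# Anchored at both ends versus anchored at the start only.
EXACT = re.compile(r"^/blocking_directory/$")
PREFIX = re.compile(r"^/blocking_directory/")

for uri in ("/blocking_directory/", "/blocking_directory/page.html", "/index.html"):
    print(f"{uri:32} exact={bool(EXACT.search(uri))} prefix={bool(PREFIX.search(uri))}")
```

Only the prefix pattern matches `/blocking_directory/page.html`; neither matches `/index.html`.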

Right, Keymaster?

Still learning.

Key_Master

2:56 am on Jun 6, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yep, mdharrold, that would block the request, but unfortunately (without a server-side script of some sort) the IP can't be permanently banned from the site.

Bluestreak, maybe somebody will come along and give you some info on obtaining a script that will automatically add banned IPs to your .htaccess file. Then you can figure out all sorts of creative ways to ban the bad guys. Or maybe if you know enough Perl you can put your own together.
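
A minimal Python sketch of the "add banned IPs to .htaccess" idea, assuming your host allows Apache 1.3/2.2-style access control (`Order`/`Allow`/`Deny`) in .htaccess; Apache 2.4 prefers `Require` directives instead. The function name is hypothetical, and it works on the file's text so the caller decides how to read and write the file (locking against concurrent writes is left out).

```python
import ipaddress

def add_ban(htaccess_text: str, ip: str) -> str:
    """Return .htaccess text with a 'Deny from <ip>' line appended.

    The IP is validated first so request data can never inject arbitrary
    directives, and an IP that is already banned is not added twice.
    """
    ipaddress.ip_address(ip)  # raises ValueError for anything that isn't an IP
    rule = f"Deny from {ip}"
    if rule in (line.strip() for line in htaccess_text.splitlines()):
        return htaccess_text  # already banned: leave the file unchanged
    if htaccess_text and not htaccess_text.endswith("\n"):
        htaccess_text += "\n"
    return htaccess_text + rule + "\n"

if __name__ == "__main__":
    text = "Order Allow,Deny\nAllow from all\n"
    text = add_ban(text, "192.0.2.10")
    text = add_ban(text, "192.0.2.10")  # second call is a no-op
    print(text)
```

With `Order Allow,Deny` and `Allow from all`, every visitor is let in except the addresses listed in `Deny from` lines, which is the usual way such ban lists were kept.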

Bluestreak

2:58 am on Jun 6, 2002 (gmt 0)

10+ Year Member



Are you kidding me? All I needed was one line in .htaccess? Man, I spent the last week and a half searching and searching, reading about modules and ipchains and complex scripts, this, that and the other thing, and all I needed was one line in .htaccess??? Yeesh! :D

Let me confirm one thing though, does that line simply ban access to that particular directory, or does it ban access to all of the site when a crawler attempts forbidden access?

Bluestreak

3:21 am on Jun 6, 2002 (gmt 0)

10+ Year Member



I don't mind if I can't automate permanent banning; I just need a temporary solution for now, and I can always do a permanent ban manually, assuming I understood correctly. Can you confirm whether this line prevents further access to the whole site, or does it simply block that particular "forbidden" directory?

Key_Master

3:28 am on Jun 6, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you're referring to mdharrold's suggestion, it would only ban the request to that directory. The rest of your site would continue to remain vulnerable.

Key_Master

3:36 am on Jun 6, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



By the way, what type of bot trap are you using? It may be possible to configure it to ban a request.

Bluestreak

3:46 am on Jun 6, 2002 (gmt 0)

10+ Year Member



Just the plain old Wpoison bot trap, for feeding fake email addresses to harvesters.

So I'm assuming I can't ban access to the whole site based on a crawler's attempt to access the forbidden directory? Looks like I'm back to square one.

Key_Master

3:54 am on Jun 6, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Cheer up! I'll write a Perl script and post it here. It won't be fancy or elaborate but it'll get the job done. Give me an hour or two to get it ready.

Bluestreak

4:30 am on Jun 6, 2002 (gmt 0)

10+ Year Member



Wow, I appreciate your effort. If you have Paypal I could reimburse you for your time. :D

mdharrold

4:52 am on Jun 6, 2002 (gmt 0)

10+ Year Member



I am staying up late to see how this turns out.
I have submitted my guess to KeyMaster, but I am going to wait for the final answer.

Giddy as a geek, mdharrold.

Key_Master

5:28 am on Jun 6, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The script can be found here.

[webmasterworld.com...]

>>If you have Paypal I could reimburse you for your time.

It's public domain software, and public domain software is free. Appreciate the thought, though. :)

mdharrold

5:33 am on Jun 6, 2002 (gmt 0)

10+ Year Member



I was right and can now go to bed.
Thanks Keymaster, I am so tired.

Brett_Tabke

7:03 am on Jun 6, 2002 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Or, if you have the tech, put a one-character link or a 1px link on the page that leads to a verboten directory, and ban all unauthorised visitors that hit that directory. It's the primo use for cloaking.
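
A rough Python sketch of the hidden-link variant. Everything named here is a hypothetical placeholder: the trap path, the 1x1 image path, and the allowlist standing in for "authorised" visitors (for example your own IP, or crawlers you have chosen to trust). Humans won't notice a one-pixel link, but bots that blindly follow every link will, and the trap path should still be disallowed in robots.txt so polite crawlers stay out.

```python
from html import escape

TRAP_PATH = "/trap/"           # hypothetical; also Disallow it in robots.txt
ALLOWED_IPS = {"203.0.113.5"}  # hypothetical allowlist of authorised visitors

def trap_link_html(trap_path: str = TRAP_PATH) -> str:
    """Return a 1x1-pixel image link pointing at the trap directory."""
    return (f'<a href="{escape(trap_path)}">'
            '<img src="/images/pixel.gif" width="1" height="1" alt=""></a>')

def should_ban(ip: str, path: str) -> bool:
    """Ban any visitor that follows the hidden link, unless it is allow-listed."""
    return path.startswith(TRAP_PATH) and ip not in ALLOWED_IPS

if __name__ == "__main__":
    print(trap_link_html())
    print(should_ban("198.51.100.9", "/trap/"))
```

The allowlist check keeps the trap from catching visitors you have deliberately decided to let through.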