To counter this I've created a directory which is expressly disallowed in the robots.txt file. What I want to do is install a script that will detect when a visitor attempts to access that directory, and immediately ban the offender. Unfortunately, I use a hosting service so I don't have access to the server, meaning I can't install Perl modules or mess with ipchains, etc. I'm wondering if there's a standalone script available that could accomplish this, or am I ---- outta luck? :)
One workaround I was thinking about is to password-protect the directory, and then use a script to ban a visitor when a request to that directory returns a 401 error. That might be an alternative way to stop deep crawlers. What do you think? If this is viable, what password protection script would you recommend? I'm not looking for a major password management script, just a simple script to accomplish what I outlined above.
Thanks in advance for any advice!
Why even do that? Make up a directory name, disallow it in robots.txt, and ban any IP that requests that directory. The directory doesn't even need to exist.
Use $ENV{'REQUEST_URI'} instead of $ENV{'DOCUMENT_URI'} to grab the URL the browser requested.
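For anyone following along, here is a rough sketch of that check. It is written in Python purely for illustration (the scripts discussed in this thread are Perl CGI), and the trap path `/bot-trap/` and the function name `is_trap_request` are made up for the example. It assumes the server sets `REQUEST_URI` and `REMOTE_ADDR` in the CGI environment, as Apache does.

```python
import os

# Hypothetical trap directory; it would be listed under Disallow in robots.txt.
TRAP_PREFIX = "/bot-trap/"


def is_trap_request(environ, trap_prefix=TRAP_PREFIX):
    """Return True if the requested URL falls inside the trap directory."""
    # REQUEST_URI is the URL the browser actually asked for,
    # including any query string, so strip the query part first.
    uri = environ.get("REQUEST_URI", "")
    path = uri.split("?", 1)[0]
    # Match both "/bot-trap/..." and the bare "/bot-trap".
    return path.startswith(trap_prefix) or path == trap_prefix.rstrip("/")


if __name__ == "__main__":
    if is_trap_request(os.environ):
        # A real script would record or ban this address here.
        print("ban", os.environ.get("REMOTE_ADDR", "unknown"))
```

Since the directory doesn't need to exist, a script like this would typically be wired up as the site's custom 404 handler, so it sees every request for a missing URL.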
Bluestreak, maybe somebody will come along and give you some info on obtaining a script that will automatically add banned IPs to your .htaccess file. Then you can figure out all sorts of creative ways to ban the bad guys. Or maybe if you know enough Perl you can put your own together.
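To give an idea of what such a script boils down to, here is a minimal sketch, again in Python for illustration rather than Perl. The function name `ban_ip` is hypothetical, and it assumes the old-style Apache `Deny from` directive and a host that allows `.htaccess` overrides. It does no file locking, so treat it as a starting point only.

```python
import ipaddress


def ban_ip(ip, htaccess_path=".htaccess"):
    """Append 'Deny from <ip>' to the file unless that IP is already listed.

    Returns True if a line was added, False if the IP was already banned.
    """
    # Validate the address first so junk never gets written to .htaccess;
    # ipaddress.ip_address raises ValueError on anything that isn't an IP.
    ip = str(ipaddress.ip_address(ip))
    line = f"Deny from {ip}"

    try:
        with open(htaccess_path) as fh:
            content = fh.read()
    except FileNotFoundError:
        content = ""

    if line in (existing.strip() for existing in content.splitlines()):
        return False

    # Make sure the new directive starts on its own line.
    prefix = "" if content == "" or content.endswith("\n") else "\n"
    with open(htaccess_path, "a") as fh:
        fh.write(prefix + line + "\n")
    return True
```

Calling `ban_ip("192.0.2.7")` twice adds the line once and returns False the second time. Directives in a `.htaccess` file apply to the directory it sits in and everything below it, so where the file lives decides how much of the site the ban covers.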
Let me confirm one thing, though: does that line simply ban access to that particular directory, or does it ban access to the whole site when a crawler attempts forbidden access?
[webmasterworld.com...]
>>If you have Paypal I could reimburse you for your time.
It's public domain software, and public domain software is free. Appreciate the thought though. :)