Gorufu, littleman, Air, SugarKane? Do you guys see any errors or better ways to do this? Anybody got a bot to add before I stick this in every site I manage?
Feel free to use this on your own site and start blocking bots too.
(The top part of the file is left out.)
# Deny web access to the .htaccess file itself
<Files .htaccess>
deny from all
</Files>
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
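RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
RewriteRule ^.* - [F,L]
# Block requests referred from iaea.org. A RewriteRule pattern is tested
# against the requested URL path, not a full http:// URL, so the old negated
# pattern "!^http://..." matched every request anyway; ".*" says so directly.
# The old "$" anchor only caught a bare "http://www.iaea.org" referer with no path.
RewriteCond %{HTTP_REFERER} ^http://www\.iaea\.org [NC]
RewriteRule .* - [F,L]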
I have also tried giving the main .htaccess the full URL of the 403 document, e.g.:
ErrorDocument 403 http://www.mydomain.org/403.shtml
Still no deal. I know I'm doing something wrong and that this has to be rewritten, but I'm stuffed if I know how. :) I have an idea from the code in this discussion, but I'd prefer not to 500 my entire site by stuffing up my .htaccess. ;) Ideas and suggestions welcome.
I've read this thread, but I still have a problem. I have:
ErrorDocument 401 /error/errorbot.php3?error=401
ErrorDocument 403 /error/errorbot.php3?error=403
ErrorDocument 404 /error/errorbot.php3?error=404
ErrorDocument 500 /error/errorbot.php3?error=500
How do I call the 403 document, errorbot.php3?error=403, from the RewriteRule:
{....}
RewriteCond %{HTTP_USER_AGENT} ^Zeus [OR]
RewriteCond %{HTTP_REFERER} ^http://www.iaea.org$
RewriteRule ^.* - [F,L]
I tried some of the suggestions, but I still get:
"Additionally, a 403 Forbidden error was encountered while trying to use an ErrorDocument to handle the request."
Can I change the RewriteRule so that it uses the ErrorDocument, and if so, how?
Greetings, roel
Think about this: you've decided to ban these bots or IPs from viewing ANY file on your site... and now you want to serve them your custom error file, a file they are banned from seeing.
It can be done, and 'how' was discussed elsewhere in this thread (I believe...although I can't find it now).
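One way to do what roel asks, sketched under the assumption that every error page lives under /error/ as in his ErrorDocument lines: exclude that path from the block. For a local ErrorDocument, Apache makes an internal request for the error page that still carries the bot's User-Agent and Referer, so without the exclusion that request is forbidden too, which is where the "Additionally, a 403 Forbidden error..." message comes from. The exclusion condition has no [OR] flag, so mod_rewrite reads the three conditions as "not the error page AND (Zeus OR iaea.org)". The REQUEST_URI condition is an addition of this sketch; the other two conditions are roel's, unchanged.

```apache
ErrorDocument 403 /error/errorbot.php3?error=403

RewriteEngine on
# Never block the error pages themselves, or serving the 403 page
# triggers a second 403 for the same client
RewriteCond %{REQUEST_URI} !^/error/
# Then block if either the User-Agent or the Referer matches
RewriteCond %{HTTP_USER_AGENT} ^Zeus [OR]
RewriteCond %{HTTP_REFERER} ^http://www.iaea.org$
RewriteRule .* - [F,L]
```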
BUT....the theme of this thread has been how to get bad bots off of your site as quickly and efficiently as possible...minimizing the load on your server and your bandwidth.
SO... why do you want to serve up a custom error page? I also have custom error pages (pretty ones, complete with my navigation links), which I serve to misguided humans who may need and benefit from a little help. But who thinks bad bots are actually reading their 'helpful' error pages? :-) Why be 'friendly' to them and waste YOUR resources? Why not dispatch them as quickly as possible? :-)
Hello ALL!
Very helpful thread! I did much of this before ever discovering this forum, so naturally I did a few things a little differently. I'll give some examples of what I've done, and perhaps you'll give me some feedback on doing things one way vs. another.
I've frequently seen this condition in this forum:
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
I use this instead:
RewriteCond %{HTTP_USER_AGENT} ^(.*)WebBandit(.*) [NC,OR]
The (.*) says I don't care where in the UA string it appears; it's gone! The [NC] makes the match case-insensitive.
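A side note on that condition: mod_rewrite patterns are regular expressions that match anywhere in the tested string unless you anchor them with ^ or $, so ^(.*)WebBandit(.*) and a bare WebBandit match exactly the same User-Agents; the ^(.*) and (.*) only add capture groups nobody uses. A sketch of the shorter form, with EmailSiphon included only to show how the [OR] chain continues:

```apache
RewriteEngine on
# Unanchored and case-insensitive: matches "WebBandit" in any letter case,
# anywhere in the User-Agent string
RewriteCond %{HTTP_USER_AGENT} WebBandit [NC,OR]
RewriteCond %{HTTP_USER_AGENT} EmailSiphon [NC]
RewriteRule .* - [F]
```

Either spelling carries the same trade-off: an unanchored pattern also catches any client whose User-Agent merely contains the word, which the ^-anchored ^[Ww]eb[Bb]andit condition would not.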
I'm currently using this rule:
RewriteRule ^(.*) - [F]
But, I have played around with this one:
RewriteRule ^(.*) [127.1.1.1...] [R=permanent,L]
I know this last rule will take longer to return an error to a browser, but will either rule move the load off my server more quickly than the other? I realize that if the banned UA came from a machine running a server, this rule 'might' create a problem on that machine... but, gee, that's their problem, right? :-)
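For comparison, the two alternatives from the post above, sketched as one block where only one rule is active at a time. The loopback address is illustrative only, since the redirect target in the post is truncated; R=permanent and R=301 mean the same thing. With [F], Apache answers the current request with a 403 and stops processing rules; with the redirect, it answers with a 301 whose Location header names the new address, and what happens next depends on whether the client follows it. Both are decided during the same rewrite pass, so the difference in server load is likely small.

```apache
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} WebBandit [NC]
# Option A: answer with 403 Forbidden; the request ends here
RewriteRule .* - [F]
# Option B: to use it, comment out option A and uncomment the line below.
# It answers with a 301 pointing at a loopback address, so a bot that
# follows redirects ends up requesting its own machine.
#RewriteRule .* http://127.0.0.1/ [R=301,L]
```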
Any thoughts, pro or con would be appreciated!
"Sugarplum employs a combination of Apache's mod_rewrite URL rewriting rules and Perl code. It combines several anti-spambot tactics, including fictitious (but RFC822-compliant) email address poisoning, injection with the addresses of known spammers (let them all spam each other), deterministic output, and 'teergrube' spamtrap addressing."
Hi,
I'm using another way to fool spammers' spiders:
<snip>
This is _not_ a guestbook - you have been warned ;-)
Greetings from Germany,
Marcel.
A minor aside: some time ago, when I was first tackling mod_rewrite, I thought I'd discovered a minor bug (I can't remember what it was, except that it turned out to be a bug in my logic). In total naivety, I sent the good man Ralf S. Engelschall an email, only to get "Mail delivery failed: returning message to sender". The reason being, <sigh>Mr Engelschall was getting too much spam</sigh>.
P.S.
I liked andreasfriedrich's "If you care about freedom be permissive, if you are paranoid be restrictive."
And a tiny bit of irony, including self-irony, on the definition of an expert: x is an uncertain quantity and spurt is a drip under pressure ;-).
Another post closer to being a "preferred member".
Mouse over it - this crowd will just love this URL [diveintomark.org]! Thanks, Mark.
1) What are the bot companies doing with all the data they take?
2) Isn't it illegal?
3) I've checked my webstats but all I see are lists of IP addresses and unknown names. How can I tell what is good and what is bad? None of the names published here seem to be in my list. (Some obvious search engines are though.)
4) To reiterate a previous post, what's to stop all robots from announcing themselves as legitimate browsers? (See point 2!)
5) It's only a matter of time before bots can decipher JavaScript URLs too. Is it even worth trying to fight them when extra bandwidth is fairly cheap, so long as it doesn't impact genuine users?