Gorufu, littleman, Air, SugarKane? Do you guys see any errors or better ways to do this? Has anybody got a bot to add before I stick this in every site I manage?
Feel free to use this on your own site and start blocking bots too.
(the top part is left out)
<Files .htaccess>
deny from all
</Files>
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck\.internetseer\.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
RewriteRule ^.* - [F]
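RewriteCond %{HTTP_REFERER} ^http://www\.iaea\.org$
# A RewriteRule pattern is matched against the URL path, never the full URL,
# so a pattern starting with http:// cannot match; match every path instead.
RewriteRule ^.* - [F]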
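A more compact way to write this kind of list, as a sketch only: several names can share one condition through regex alternation, which leaves fewer [OR] flags to forget. It covers just a subset of the agents listed above and has not been tested against real traffic.

```apache
RewriteEngine on
RewriteBase /
# Alternation inside one condition replaces a chain of separate [OR] lines.
RewriteCond %{HTTP_USER_AGENT} ^(EmailSiphon|EmailWolf|EmailCollector|ExtractorPro|CherryPicker) [OR]
RewriteCond %{HTTP_USER_AGENT} ^(Teleport|Wget|LinkWalker|psbot)
# Return 403 Forbidden for any request whose user-agent matched above.
RewriteRule ^.* - [F]
```

The last condition carries no [OR], exactly as in the long version; a stray [OR] on the final line would make the rule apply to everyone.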
But there's a new issue: has anybody ever heard of a user-agent calling itself "GraphicBrain.com"?
This special agent seems to download the whole site (which - in theory - I don't mind) but it produces such long logfile entries that my logfile analyzer crashes :-(
Example of ONE(!) logfile line:
212.113.xx.yy - - [01/Aug/2002:06:13:40 +0200] "GET / HTTP/1.0" 200 7931 "-" "GraphicBrain.com" "visitid=3D48ADCB000031EB604E6FEE; KeyWordCookie=GIFTS%2CFLOWERS%2CTRAVEL; ASPSESSIONIDGGGGQRGV=GAOLJNCCJGCBKNPENHBGALON; ASPSESSIONIDGGGQGGDP=HKPPOOGDHBLFPPCILOFEAOHK; ASPSESSIONIDGQGQGNFK=ENFBNHFAEGFHDBNAAAINIKPO; ASPSESSIONIDGQQQGVUY=OJDHAPPBPLCDABCNFPGBJNAL; ASPSESSIONIDGGGQGVUY=HBDOOEECBEBPBADJPMFACJLD; ARPT=IQKKVWSINT3CKMYJ; ASPSESSIONIDQGGGQHOQ=NLAMCPEADENDOOMECNBCAPDO; CFGLOBALS=HITCOUNT%3D1%23LASTVISIT%3D%7Bts+%272002%2D07%2D31+23%3A53%3A13%27%7D%23 TIMECREATED%3D%7Bts+%272002%2D07%2D31+23%3A53%3A13%27%7D%23; CFID=426530; CFTOKEN=39863655; ASPSESSIONIDQGQGQMGG=MIHOHNGDKMOANCPDMCJNKKKE; ASPSESSIONIDQGQGGLCG=HEJLGBDCEGMHCKMEEIECDOBB; ASPSESSIONIDQGQGGWUC=EKDLICBAJCHGOCJIADFCHDLP; RQFW={9762A7AC-D44A-4B43-AA6D-6688B4D7C48B}; ASPSESSIONIDGGQQQMTK=DFNEFLLBHKAPIBACMJOKOCBH; ASPSESSIONIDQGGQGOBG=GFFNPGMCMGBKDOJICDCGMEMF; WEBTRENDS_ID=212.113.82.197-2086124832.29505808; EGSOFT_ID=212.113.82.197-591707536.29505809; SappiUserID=471577; ASPSESSIONIDQQQQQJCO=MBPICOPCHEHEAFOJHMPBLFBE; ASPSESSIONIDGGQQGOOY=OCPLHOPCPHDLHOGHKCAJLECE; ASPSESSIONIDQQQGQGAB=HGMMAHPBONOHNEGIJICNGLCL"
[edited by: jatar_k at 5:07 pm (utc) on Nov. 15, 2002]
[edit reason] fixed side scroll [/edit]
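One possible way to keep such entries out of the log without blocking the agent, sketched under assumptions: flag its requests with an environment variable and exclude flagged requests from the access log. The variable name dontlog, the log path and the format nickname combined are assumptions (use whatever format nickname your config already defines), and CustomLog belongs in httpd.conf at server or virtual host level, not in .htaccess.

```apache
# Flag every request whose user-agent starts with "GraphicBrain.com".
SetEnvIfNoCase User-Agent ^GraphicBrain\.com dontlog
# Log all requests except the flagged ones.
CustomLog logs/access_log combined env=!dontlog
```

If the cookies are not needed in the log at all, dropping the cookie field from the LogFormat line would shorten every entry, not just this agent's.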
in reply to message number 161 [webmasterworld.com...]
Well, as soon as I put the .htaccess in the subdirectory of the virtual server, Apache won't reload the config or restart; it exits with an error.
The requested URI should be on the same virtual server. Config-wise, I started from Apache's default config, and almost all of my modifications were in the virtual hosts section.
I'm a bit reluctant to post uncensored config files and logfile excerpts in this public space, but to the best of my knowledge (which may not be much) I think I set it up right.
I guess it's only one little configuration routine which is faulty or missing.
Since I have full root privileges, I'm not limited to .htaccess but can make changes to other parts of the config as well. As I mentioned in another post, I'm only trying to block email harvesters.
So what would be your recommendation?
Andreas found out that I had this directive:
<Files index.html>
Options -FollowSymLinks +Includes
</Files>
in my httpd.conf. Even though, according to the documentation, the "Options" line should be ignored there, it actually isn't.
After removing "-FollowSymLinks" from the statement, everything works as expected.
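This fits what the mod_rewrite documentation says about per-directory rules: they only work where FollowSymLinks is enabled. A sketch of the corrected section, plus an explicit way to enable the option for the directory that holds the .htaccess (the path is a placeholder, not taken from the actual config):

```apache
# Corrected section: FollowSymLinks is no longer switched off for index.html.
<Files index.html>
    Options +Includes
</Files>

# Placeholder path; per-directory rewriting needs FollowSymLinks enabled here.
<Directory "/path/to/your/site">
    Options +FollowSymLinks
</Directory>
```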
If you're lucky enough to run your own mailserver under your own control, you can add a second line of defense: using realtime blacklists (sometimes also called realtime blocklists, or RBLs) in your mailserver lets you block potential spam at the moment the spammer tries to deliver it to you. On EACH incoming email, the mailserver checks at least one of these RBLs. If the sender's IP address is listed, delivery is cancelled instantly, even BEFORE the mail data is transferred to your server. There's a multitude of RBLs out there. Our server checks EACH incoming message against 5 different RBLs. Some of our users, including myself, post-check their messages again against other RBLs. I, for example, have all messages coming from Russia/China/Korea/Malaysia etc. tagged with the prefix "**SPAM**". This second (and third) line of defense makes life a lot easier!
And now the $1,000,000 prize question is: what stops the programmer of one of these bots from "stealing" the user-agent string of, say, IE 5.0?
Am I right in thinking that a bot camouflaging itself as IE 5.0 would be COMPLETELY invisible to .htaccess rewrite rules that test the user-agent?
I've been reading through this thread and having used htaccess to secure areas of other websites I thought I'd test out the concepts on a dormant web site on my server.
But when I add the file which contains :-
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.* - [F,L]
I find I can't get into any of the pages.
The site is a virtual site on a server I have root access to and I've checked the httpd.conf to see that rewriteengine is on in each of the virtual sites.
Can anybody suggest what I'm doing wrong?
Andy
Later update
I've checked my server logs and I'm getting the message:-
RewriteEngine not allowed here
So now I'm really confused
Later still update
Solved it: I needed to amend access.conf to allow overrides for FileInfo (AllowOverride FileInfo).
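For anyone hitting the same "RewriteEngine not allowed here" message: mod_rewrite directives in .htaccess need the FileInfo override. A minimal sketch of the kind of section involved; the directory path is a placeholder, and on setups without a separate access.conf the section would sit in httpd.conf instead.

```apache
# Placeholder path: use the document root of the virtual site.
<Directory "/path/to/your/site">
    # Let .htaccess files use RewriteEngine, RewriteCond and RewriteRule.
    AllowOverride FileInfo
    # Per-directory rewriting also needs symlink following enabled.
    Options +FollowSymLinks
</Directory>
```

Apache has to be restarted or reloaded before a change like this takes effect.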
I'm very impressed with the knowledge shown in this thread! I've read it at least once but I still have a question.
What if you want to ban certain countries using RewriteCond? How do I do that?
Right now I'm using:
deny from .at
deny from .bg
etc...
The problem with that is that it even denies my error pages, so I'd like to switch over to RewriteCond instead so that I can give them a page with a reason why they can't reach my site.
Another question... does anyone know how I test to see if the country ban is working correctly? wannabrowser.com works great for referrers but has no provisions for testing from offshore or from a specific IP location.
Thanks for the help.
Cheers,
Dennis
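A rough, untested sketch of how the two deny lines might look with RewriteCond. Assumptions: /blocked.html is a made-up name for the explanation page, and the server actually resolves client hostnames (HostnameLookups), since otherwise REMOTE_HOST may hold only the IP address. Addresses without a country-code reverse DNS name will slip through either way.

```apache
RewriteEngine On
# Keep the explanation page itself reachable, otherwise the rule would loop.
RewriteCond %{REQUEST_URI} !^/blocked\.html$
# Match client hostnames ending in .at or .bg (the two examples above).
RewriteCond %{REMOTE_HOST} \.(at|bg)$ [NC]
# Serve the explanation page instead of the requested one.
RewriteRule ^.* /blocked.html [L]
```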
SetEnvIf Remote_Addr ^12\.40\.85\. getout
SetEnvIfNoCase User-Agent ^Microsoft.URL getout
<Limit GET POST>
order allow,deny
allow from all
deny from env=getout
</Limit>
This is working fine, but how can I show a custom error message without implementing all of this using mod_rewrite? Also, how can I do a redirect if getout is set? Thanks.
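One way this might be done without mod_rewrite, as a sketch: point ErrorDocument at a local page and clear the flag for that page, so the denied visitor can still fetch it. The page name /getout.html is an assumption; the remaining lines are the ones from the post.

```apache
SetEnvIf Remote_Addr ^12\.40\.85\. getout
SetEnvIfNoCase User-Agent ^Microsoft.URL getout
# Unset the flag for the error page itself so blocked visitors can read it.
SetEnvIf Request_URI ^/getout\.html$ !getout
# Send this local page with every 403 response.
ErrorDocument 403 /getout.html
<Limit GET POST>
order allow,deny
allow from all
deny from env=getout
</Limit>
```

For the redirect, giving ErrorDocument a full URL (one starting with http://) instead of a local path makes Apache send the client a redirect to that address rather than the 403 page.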