Gorufu, littleman, Air, SugarKane: do you guys see any errors or better ways to do this? Anybody got a bot to add before I stick this in every site I manage?
Feel free to use this on your own site and start blocking bots too.
(The top part is left out.)
<Files .htaccess>
deny from all
</Files>
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
RewriteRule ^.* - [F]
RewriteCond %{HTTP_REFERER} ^http://www\.iaea\.org$
RewriteRule .* - [F]
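Before pasting this into every site, it can help to run a few sample User-Agent strings through the same patterns. Below is a minimal Python sketch that does that. It assumes Python's `re` treats these simple patterns the same way Apache's regex engine does (true for patterns this basic, but not guaranteed in general), and `is_blocked` is just a hypothetical helper name.

```python
import re

# Patterns copied verbatim from the RewriteCond lines above.
# mod_rewrite matches case-sensitively unless [NC] is given, so no flags here.
BAD_AGENTS = [
    r"^EmailSiphon", r"^EmailWolf", r"^ExtractorPro", r"^Mozilla.*NEWT",
    r"^Crescent", r"^CherryPicker", r"^[Ww]eb[Bb]andit", r"^WebEMailExtrac.*",
    r"^NICErsPRO", r"^Teleport", r"^Zeus.*Webster", r"^Microsoft.URL",
    r"^Wget", r"^LinkWalker", r"^sitecheck.internetseer.com", r"^ia_archiver",
    r"^DIIbot", r"^psbot", r"^EmailCollector",
]
COMPILED = [re.compile(p) for p in BAD_AGENTS]


def is_blocked(user_agent: str) -> bool:
    """True if any pattern matches, like a chain of [OR]-joined RewriteConds."""
    return any(p.search(user_agent) for p in COMPILED)


if __name__ == "__main__":
    for ua in [
        "Wget/1.8.2",
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)",
        "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)",
    ]:
        print(is_blocked(ua), ua)
```

Two things show up quickly when testing this way. Unescaped dots, as in ^sitecheck.internetseer.com and ^Microsoft.URL, match any character, so ^Microsoft.URL also matches "MicrosoftXURL". And every pattern is anchored with ^, so a grabber whose User-Agent starts with "Mozilla", like the HTTrack string in the sample list, slips through.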
The trap.pl script blocks their IP address from further access to your website. This is very safe, since "_vti_inf.html" is only requested by FrontPage.
Works great!
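trap.pl itself isn't quoted in this thread. As a rough illustration of the idea only (not the actual script), here is a Python sketch of what such a trap does when the bait file is requested: take the visitor's address (in a CGI script this would come from the REMOTE_ADDR environment variable) and append a deny line to .htaccess. It assumes the web server user can write to .htaccess and that the server understands Apache 1.3/2.2-style `deny from` lines (Apache 2.4 needs mod_access_compat for those); `trap` is a hypothetical function name.

```python
import ipaddress
import tempfile
from pathlib import Path


def trap(ip: str, htaccess: Path) -> bool:
    """Append 'deny from <ip>' to an .htaccess file.

    Returns True if a new line was written, False if the IP was already listed.
    """
    # Parse first: anything that is not a literal IPv4/IPv6 address raises
    # ValueError, so a bad value can never inject other directives.
    line = f"deny from {ipaddress.ip_address(ip)}"
    existing = htaccess.read_text() if htaccess.exists() else ""
    if line in (ln.strip() for ln in existing.splitlines()):
        return False  # already blocked
    with htaccess.open("a") as fh:
        if existing and not existing.endswith("\n"):
            fh.write("\n")  # keep the new directive on its own line
        fh.write(line + "\n")
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        ht = Path(d) / ".htaccess"
        ht.write_text("<Files .htaccess>\ndeny from all\n</Files>\n")
        print(trap("192.0.2.10", ht))  # True: new entry written
        print(trap("192.0.2.10", ht))  # False: already present
        print(ht.read_text())
```

Validating the address before writing is the important design choice here: whatever lands in .htaccess is interpreted by Apache on every request, so only a parsed IP should ever be written there.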
"Edge" said:
When FrontPage first accesses a web site, the file _vti_inf.html is requested.
I'm still waiting to hear from my host (this being the weekend) about whether "mod_rewrite" is available to me, but, in the meantime, I know that "Redirect" works. So could I do a Redirect, something like:
Redirect /_vti_inf.html [purplemath.com...]
...to get rid of the FrontPage bums?
-----ten minutes later-----
I just tried the above line in my .htaccess file, and FrontPage was still able to download whatever it wanted from Purplemath into one of my other "webs". *sigh*
So tell me more about this script thingy...?
First, install Apache::BlockAgents for each virtual host (VH), and have them all point to the same bad_agent.txt; that way you only have one file to update for all hosts. (Note that every copy of this I have found on the web has Perl errors in it; you will have to tweak the code to make it work at all.)
Second, make a copy of BlockAgents, modify the code a bit to handle IPs instead of agents, rename it BlockIP (or something!), and make a master bad_ip.txt file.
Third, get that trap.pl script and modify it to write to bad_ip.txt rather than .htaccess. I further modified trap so that it date/time-stamps each entry, so I can clean the file out every week.
This method is REALLY fast and painless once set up (although setup is a B***H!). It works across all your VHs, and if someone gets trapped on one VH, they get locked out of all of them!
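dave's BlockAgents/BlockIP handlers are mod_perl modules that aren't shown here. As a hedged sketch of only the shared, date-stamped bad_ip.txt part of his setup, here is Python that appends stamped entries, loads the blocked set, and does the weekly clean-out. The `IP<TAB>timestamp` line format and the function names are assumptions for illustration, not dave's actual format.

```python
import ipaddress
import tempfile
from datetime import datetime, timedelta
from pathlib import Path
from typing import Optional, Set

# Hypothetical format for the shared bad_ip.txt: one "IP<TAB>timestamp" per line.


def add_entry(path: Path, ip: str, now: Optional[datetime] = None) -> None:
    """Append a date/time-stamped entry, like the modified trap described above."""
    if now is None:
        now = datetime.now()
    ip = str(ipaddress.ip_address(ip))  # reject anything that isn't an IP
    with path.open("a") as fh:
        fh.write(f"{ip}\t{now.isoformat(timespec='seconds')}\n")


def load_blocked(path: Path) -> Set[str]:
    """Return the set of IPs a BlockIP-style handler would refuse."""
    if not path.exists():
        return set()
    with path.open() as fh:
        return {line.split("\t")[0] for line in fh if line.strip()}


def prune(path: Path, max_age: timedelta = timedelta(days=7),
          now: Optional[datetime] = None) -> int:
    """Drop entries older than max_age (the weekly clean-out); return how many remain."""
    if now is None:
        now = datetime.now()
    keep = []
    with path.open() as fh:
        for line in fh:
            if not line.strip():
                continue
            _ip, stamp = line.rstrip("\n").split("\t")
            if now - datetime.fromisoformat(stamp) <= max_age:
                keep.append(line)
    path.write_text("".join(keep))
    return len(keep)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        bad = Path(d) / "bad_ip.txt"
        start = datetime(2002, 9, 20, 12, 0, 0)
        add_entry(bad, "192.0.2.1", now=start)
        add_entry(bad, "192.0.2.2", now=start + timedelta(days=6))
        print(sorted(load_blocked(bad)))
        print(prune(bad, now=start + timedelta(days=8)))
        print(sorted(load_blocked(bad)))
```

Keeping one file for every virtual host is what lets a trap on one VH lock the visitor out of all of them; the timestamp is what makes the weekly clean-out possible, which matters because dial-up addresses get reassigned to other users.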
dave
This won't always work. I had this one today; it grabbed a few hundred pages from my beloved site:
p5084d1b1.dip.t-dialin.net - - [25/Sep/2002:13:34:40 +0200] "GET /_omitted.htm HTTP/1.0" 200 2373 www.mydomain.net "_omitted.htm" "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)" "-"
So, this might be better as far as I can see:
RewriteCond %{HTTP_USER_AGENT} .*httrack.* [NC,OR]
Besides, HTTrack seems to respect robots.txt.
You are right in mentioning that the matching should be case-insensitive (the NC flag). The '.*', however, is not necessary, since in
RewriteCond %{HTTP_USER_AGENT} httrack [NC,OR]
the pattern is not anchored anywhere (neither at the start nor at the end of the string), so the engine will try to match it anywhere in the string. With '^httrack' the pattern is anchored at the beginning, and with 'httrack$' at the end of the string. Only if you anchored your pattern at both the start and the end would you need the '.*' to match httrack in a string that is not just 'httrack'; the pattern would then have to look like this: '^.*httrack.*$'. Note that such a pattern does not make sense unless you want to grab the substrings before and after httrack.
To sum up, here is a chart of the four options mentioned above. NA = not anchored; BA = anchored at beginning; EA = anchored at end; MS = modified suggestion.
1. achttrackac (NA: +; BA: -; EA: -; MS: +;)
2. htTracKacac (NA: +; BA: +; EA: -; MS: +;)
3. aacaHttrack (NA: +; BA: -; EA: +; MS: +;)
Note that '.*' also matches the empty string '', since the quantifier * matches zero or more times (greedily, i.e. as many times as possible).
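Andreas's chart is easy to reproduce. The Python sketch below runs the three test strings against the four patterns, using `re.IGNORECASE` in place of the [NC] flag and `re.search` for mod_rewrite's unanchored matching. It assumes Python's `re` agrees with Apache's regex engine on patterns this simple; `chart` is a hypothetical helper name.

```python
import re

# The three test strings from the chart above.
SAMPLES = ["achttrackac", "htTracKacac", "aacaHttrack"]

# The four patterns compared in the chart; [NC] becomes re.IGNORECASE.
PATTERNS = {
    "NA": r"httrack",      # not anchored
    "BA": r"^httrack",     # anchored at the beginning
    "EA": r"httrack$",     # anchored at the end
    "MS": r".*httrack.*",  # modified suggestion
}


def chart(samples, patterns):
    """Return {sample: {label: matched?}}, searching anywhere in the string."""
    return {
        s: {label: re.search(p, s, re.IGNORECASE) is not None
            for label, p in patterns.items()}
        for s in samples
    }


if __name__ == "__main__":
    for sample, row in chart(SAMPLES, PATTERNS).items():
        cells = "; ".join(f"{k}: {'+' if v else '-'}" for k, v in row.items())
        print(f"{sample} ({cells})")
    # '.*' also matches the empty string:
    print(repr(re.fullmatch(r".*", "").group()))
```

The +/- values printed for each string agree with the chart, and the last line prints '' because '.*' matches the empty string.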
Andreas
I'm trying to ban one site from getting to me. I want to redirect its visitors to a page called /robots.php.
So I tried this:
rewriteEngine On
rewriteCond {HTTP_REFERER} ^http://(www\.)?domain.com [NC,OR]
RewriteRule ^.*$ /robots.php [L]
but that seems to block everyone. What am I doing wrong?
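Two details in the config above look likely to cause this. First, {HTTP_REFERER} is missing its leading %, so mod_rewrite compares the literal text "{HTTP_REFERER}" against the pattern instead of the visitor's Referer header. Second, the only condition carries [OR]. As far as I can tell from mod_rewrite's behavior, a condition flagged [OR] that fails simply moves on to the next condition, so when it is the last (here, the only) one, the chain falls through and the rule fires for every request. The sketch below is a simplified Python model of that chaining, not Apache's actual code; `Cond` and `rule_applies` are hypothetical names, and TestStrings are assumed to be already expanded.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Cond:
    """One RewriteCond, with its TestString already expanded."""
    test_string: str
    pattern: str
    nocase: bool = False  # [NC]
    ornext: bool = False  # [OR]

    def matches(self) -> bool:
        flags = re.IGNORECASE if self.nocase else 0
        return re.search(self.pattern, self.test_string, flags) is not None


def rule_applies(conds: List[Cond]) -> bool:
    """Simplified model of how mod_rewrite walks a RewriteCond chain."""
    i = 0
    while i < len(conds):
        c = conds[i]
        ok = c.matches()
        if c.ornext:
            if ok:
                # Skip the remaining members of this OR group.
                while i < len(conds) and conds[i].ornext:
                    i += 1
            # A failed [OR] condition just falls through to the next one,
            # even when there is no next one.
        elif not ok:
            return False
        i += 1
    return True


if __name__ == "__main__":
    visitor_referer = "http://www.google.com/search?q=math"
    # As posted: no '%', so the TestString is this literal text, plus a trailing [OR].
    as_posted = [Cond("{HTTP_REFERER}", r"^http://(www\.)?domain.com",
                      nocase=True, ornext=True)]
    # With '%' restored (Apache substitutes the Referer) and [OR] dropped.
    fixed = [Cond(visitor_referer, r"^http://(www\.)?domain\.com", nocase=True)]
    print(rule_applies(as_posted))  # True: the rule fires for this visitor
    print(rule_applies(fixed))      # False: only domain.com referrals are redirected
```

If this model is right, the fix is to restore the % and drop the trailing [OR], something like:
RewriteEngine On
RewriteCond %{HTTP_REFERER} ^http://(www\.)?domain\.com [NC]
RewriteRule ^.*$ /robots.php [L]
Here domain.com still stands in for the site being banned.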