Gorufu, littleman, Air, SugarKane? You guys see any errors or better ways to do this....anybody got a bot to add....before I stick this in every site I manage.
Feel free to use this on your own site and start blocking bots too.
(the top part is left out)<Files .htaccess>
deny from all
</Files>
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
RewriteRule ^.* - [F]
RewriteCond %{HTTP_REFERER} ^http://www.iaea.org$
RewriteRule !^http://[^/.]\.your-site.com.* - [F]
As someone else stated in a previous post if I recall....This has been one hell of a read. I read every single post in one sitting tonight. To hell with books!
I just wanted to humorously/seriously note that all these attempts at getting rid of the bad bots/spammers etc.... they (the bad people) may all end up reading this post after they try a search on google to find out why their "system" isn't working any more, LOL. All the efforts you've all put into this post will possibly be read and this will help all the spammers. However the harder you make something to detour begginners, and robots that wouldn't read these forums anyway (unless robots get so smart that they can read forums), the better.
Maybe this forum should be encrypted in some arabic language that no one can read and only " good" people get the de-encryption software to read it. But to figure who who is bad and who is good, we are back to square one again...lol.
I find this forum possibly the most interesting and mind excersizing forum I've ever visited!
So you only need to put the .htaccess in your root directory. What if the robots enter from another area?
If it's entering, say, directly at www.example.com/deep/path/to/some/file.html the server will look for an .htaccess file in each directory, starting at the root, process any information it finds, before sending the page.
XBitHack on
Options +FollowSymLinks
RewriteEngine On
RewriteCond %{HTTP_REFERER} iaea\.org [OR] # spambot
RewriteCond %{HTTP_USER_AGENT} "DTS Agent" [OR] # OD
RewriteCond %{HTTP_USER_AGENT} "Fetch API Request" [OR] # OD
RewriteCond %{HTTP_USER_AGENT} "Indy Library" [OR] # spambot
RewriteCond %{HTTP_USER_AGENT} "LINKS ARoMATIZED" [OR] # rude bot
RewriteCond %{HTTP_USER_AGENT} "Microsoft URL Control" [OR] # spambot
RewriteCond %{HTTP_USER_AGENT} "^DA \d\.\d+" [OR] # OD
RewriteCond %{HTTP_USER_AGENT} "^Download" [OR] # OD
RewriteCond %{HTTP_USER_AGENT} "^Internet Explore" [OR] # spambot
RewriteCond %{HTTP_USER_AGENT} "^Mozilla/4.0$" [OR] # dumb bot
RewriteCond %{HTTP_USER_AGENT} "^Mozilla/\?\?$" [OR] # formmail attacker
RewriteCond %{HTTP_USER_AGENT} "compatible ; MSIE 6.0" [OR] # spambot (note extra space before semicolon)
RewriteCond %{HTTP_USER_AGENT} "efp@gmx\.net" [OR] # rude bot
RewriteCond %{HTTP_USER_AGENT} "mister pix" [NC,OR] # rude bot
RewriteCond %{HTTP_USER_AGENT} ^Atomz [OR] # rude bot
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^Custo [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EasyDL/\d\.\d+ [OR] # OD
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlickBot [OR] # rude bot
RewriteCond %{HTTP_USER_AGENT} ^FrontPage [OR] # stupid user trying to edit my site
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} ^HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^IE\ \d\.\d\ Compatible.*Browser$ [OR] # spambot
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Indy\ Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^MSIECrawler [OR] # IE’s "make availableoffline" mode
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft\ URL\ Control [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^NG [OR] # unknown bot
RewriteCond %{HTTP_USER_AGENT} ^NPBot [OR] # NameProtect spybot
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^PersonaPilot [OR] # rude bot
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^Sqworm [OR] # rude bot
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SurveyBot [OR] # rude bot
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^TurnitinBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^[A-Z]+$ [OR] # spambot
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} anarchie [NC,OR] # OD
RewriteCond %{HTTP_USER_AGENT} cherry.?picker [NC,OR] # spambot
RewriteCond %{HTTP_USER_AGENT} crescent [NC,OR] # OD
RewriteCond %{HTTP_USER_AGENT} e?mail.?(collector¦magnet¦reaper¦siphon¦sweeper¦harvest¦collect¦wolf) [NC,OR] # spambot
RewriteCond %{HTTP_USER_AGENT} express [NC,OR] # OD
RewriteCond %{HTTP_USER_AGENT} extractor [NC,OR] # OD
RewriteCond %{HTTP_USER_AGENT} flashget [NC,OR] # OD
RewriteCond %{HTTP_USER_AGENT} getright [NC,OR] # OD
RewriteCond %{HTTP_USER_AGENT} go.?zilla [NC,OR] # OD
RewriteCond %{HTTP_USER_AGENT} grabber [NC,OR] # OD
RewriteCond %{HTTP_USER_AGENT} httrack [NC,OR] # OD
RewriteCond %{HTTP_USER_AGENT} imagefetch [OR] # rude bot
RewriteCond %{HTTP_USER_AGENT} net.?(ants¦mechanic¦spider¦vampire¦zip)[NC,OR] # OD
RewriteCond %{HTTP_USER_AGENT} nicerspro [NC,OR] # spambot
RewriteCond %{HTTP_USER_AGENT} ninja [NC,OR] # Download Ninja OD
RewriteCond %{HTTP_USER_AGENT} offline [NC,OR] # OD
RewriteCond %{HTTP_USER_AGENT} snagger [NC,OR] # OD
RewriteCond %{HTTP_USER_AGENT} tele(port¦soft) [NC,OR] # OD
RewriteCond %{HTTP_USER_AGENT} vayala [OR] # dumb bot, doesn’t know how tofollow links, generates lots of 404s
RewriteCond %{HTTP_USER_AGENT} web.?(auto¦bandit¦collector¦copier¦devil¦downloader¦fetch¦hook¦mole¦miner¦mirror¦reaper¦sauger¦sucker¦site¦snake¦stripper¦weasel¦zip) [NC,OR] # ODs
RewriteCond %{REMOTE_ADDR} "^63\.148\.99\.2(2[4-9]¦[3-4][0-9]¦5[0-5])$" [OR] # Cyveillance spybot
RewriteCond %{REMOTE_ADDR} ^12\.148\.196\.(12[8-9]¦1[3-9][0-9]¦2[0-4][0-9]¦25[0-5])$ [OR] # NameProtect spybot
RewriteCond %{REMOTE_ADDR} ^12\.148\.209\.(19[2-9]¦2[0-4][0-9]¦25[0-5])$ [OR] # NameProtect spybot
RewriteCond %{REMOTE_ADDR} ^64\.140\.49\.6([6-9])$ [OR] # Turnitin spybot
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule!err_¦robots\.txt - [F,L]
[edited by: jatar_k at 4:36 pm (utc) on April 1, 2003]
[edit reason] sidescroll, had to shrink a line [/edit]
The meaning of the "F" and "L" flags was discussed (much) earlier in this post. The "!err_¦robots.\txt" means, that the redirection is valid for ALL files EXCEPT for files beginning with "err_" (in my cas those are my error documents like err_403.html) and robots.txt (in order to give a bot a chance to see where it is not wanted).