A Close to Perfect .htaccess Ban List - Part 2


wkitty42 - 4:28 am on Jun 17, 2003 (gmt 0)

My bot list is rather large... I don't know how accurate it is, though. I can say that I don't have the problems today that I had a while back.

As for posting it, I'm not sure of the best way to make it available. I could link it from my site or just post it in a message. I'm sure there are plenty of corrections or optimizations that could be made to it, though... hmm...
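Here is one possible optimization. It is a sketch only and is not taken from the live file. It assumes Apache 1.3/2.x with mod_rewrite and mod_access, and the Deny lines also need AllowOverride Limit. Two things change:

- Several single-name user-agent tests from the list below are folded into one alternation.
- The regex IP ranges used for Cyveillance and NameProtect are written as CIDR blocks.

The five bot names are just a sample from the list.

===== snip =====

# several user agents tested in one condition instead of one line each
RewriteCond %{HTTP_USER_AGENT} (BlackWidow|EmailCollector|EmailSiphon|EmailWolf|WebZIP) [NC]
RewriteCond %{REQUEST_URI} !^/badUA\.html [NC]
RewriteRule .* /badUA.html [L]

# the Cyveillance and NameProtect IP ranges as CIDR blocks:
# 63.148.99.224-255 = /27, 12.148.196.128-255 = /25, 12.148.209.192-255 = /26
Order Allow,Deny
Allow from all
Deny from 63.148.99.224/27
Deny from 12.148.196.128/25
Deny from 12.148.209.192/26

# Apache 2.4 equivalent of the Deny lines (mod_authz_core):
# <RequireAll>
#   Require all granted
#   Require not ip 63.148.99.224/27 12.148.196.128/25 12.148.209.192/26
# </RequireAll>

===== snip =====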

OK, take it with the understanding that you have to decide which bots you want to allow on your site. Some of the ones I have blocked you may want to allow; others you may want to block. I can't say that this list is all-inclusive or that I haven't messed something up somewhere along the line. Also note that some of the rules and their comments come from others who have posted here and on other forums. I am thankful for their contributions but, sadly, I don't have any notes as to who they were ;-(
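Here is a sketch of what that choice looks like in the list below. You let a bot back in by commenting out its line with #, the way libwww-perl, ia_archiver and wget already are.

The catch is the [OR] chain. mod_rewrite ORs a condition with the next one only when the first carries the OR flag, so the last user-agent line must not have it. If you disable that last line, drop OR from the new last one. Otherwise the chain swallows the !^/badUA\.html test below it, and every other request on the site gets rewritten to /badUA.html. Zeus.*Webster is used here purely as an example.

===== snip =====

# wget is allowed again: its condition is commented out
# RewriteCond %{HTTP_USER_AGENT} wget [NC,OR]
RewriteCond %{HTTP_USER_AGENT} XGET [NC,OR]
# Zeus.*Webster is allowed again, so Yandex is now the last condition
# of the chain and must lose its OR flag
RewriteCond %{HTTP_USER_AGENT} Yandex [NC]
# RewriteCond %{HTTP_USER_AGENT} Zeus.*Webster [NC]
RewriteCond %{REQUEST_URI} !^/badUA\.html [NC]
RewriteRule .* /badUA.html [L]

===== snip =====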

===== snip =====

Options +FollowSymLinks
RewriteEngine on
RewriteBase /

# this ruleset is to "stop" stupid attempts to use MS IIS exploits on us
# NIMDA
RewriteCond %{REQUEST_URI} /(cmd|root|shell)\.exe$ [NC,OR]
RewriteCond %{REQUEST_URI} /(admin|httpodbc)\.dll$ [NC]
RewriteRule .* /cgi-bin/nonimda.cmd [L,E=HTTP_USER_AGENT:NIMDA_EXPLOIT,T=application/x-httpd-cgi]

# CODERED
RewriteCond %{REQUEST_URI} /default\.(ida|idq)$ [NC,OR]
RewriteCond %{REQUEST_URI} /.*\.printer$ [NC]
RewriteRule .* /cgi-bin/nocode-r.cmd [L,E=HTTP_USER_AGENT:CODERED_EXPLOIT,T=application/x-httpd-cgi]

# this ruleset is for formmail script abusers...
RewriteCond %{REQUEST_URI} formmail\.(pl|cgi)$ [NC,OR]
RewriteCond %{REQUEST_URI} mailto\.(exe|cgi)$ [NC]
RewriteRule .* /cgi-bin/nofrmml.cmd [L,E=HTTP_USER_AGENT:FORMMAIL_EXPLOIT,T=application/x-httpd-cgi]

# Cyveillance is a spybot that scours the web for copyright violations and “damaging information” on
# behalf of clients such as the RIAA and MPAA. Their robot spoofs its User-Agent to look like Internet
# Explorer, and it completely ignores robots.txt. I have banned it by IP address.
RewriteCond %{REMOTE_ADDR} "^63\.148\.99\.2(2[4-9]|[3-4][0-9]|5[0-5])$"
RewriteRule .* - [F]

# There is another email harvester which always claims to be referred from http://www.iaea.org/.
# You may have seen this in your own referrer pages.
# I have banned it by referrer.
RewriteCond %{HTTP_REFERER} iaea\.org [NC]
RewriteRule .* - [F]

# NameProtect peddles their “online brand monitoring” to unsuspecting and gullible companies
# looking for people to sue. Despite the claims on their robot information page, they do not
# respect robots.txt; in fact, they spoof their User-Agent in multiple ways to avoid detection.
# I have banned them by User-Agent and IP address.
RewriteCond %{REMOTE_ADDR} ^12\.148\.196\.(12[8-9]|1[3-9][0-9]|2[0-4][0-9]|25[0-5])$ [OR]
RewriteCond %{REMOTE_ADDR} ^12\.148\.209\.(19[2-9]|2[0-4][0-9]|25[0-5])$ [OR]
RewriteCond %{HTTP_USER_AGENT} NPBot [NC]
RewriteRule .* - [F]

# this ruleset is for unwanted user agents... possibly email harvesters
RewriteCond %{HTTP_USER_AGENT} ^[A-Z]+$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.Browse\s [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.Eval [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.Surf [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Harvest [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*HTTrack [NC,OR]
# RewriteCond %{HTTP_USER_AGENT} ^.*libwww-perl [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*LWP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*prospector [NC,OR]
RewriteCond %{HTTP_USER_AGENT} AsiaNetBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ASSORT [NC,OR]
RewriteCond %{HTTP_USER_AGENT} attache [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ATHENS [NC,OR]
RewriteCond %{HTTP_USER_AGENT} autohttp [NC,OR]
RewriteCond %{HTTP_USER_AGENT} bew [NC,OR]
RewriteCond %{HTTP_USER_AGENT} BlackWidow [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Bot\ mailto:craftbot@yahoo.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Bullseye [NC,OR]
RewriteCond %{HTTP_USER_AGENT} CherryPicker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ChinaClaw [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Crescent [NC,OR]
RewriteCond %{HTTP_USER_AGENT} curl [NC,OR]
RewriteCond %{HTTP_USER_AGENT} devsoft's\ http\ component [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Deweb [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Digimarc [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Digger [NC,OR]
RewriteCond %{HTTP_USER_AGENT} digout4uagent [NC,OR]
RewriteCond %{HTTP_USER_AGENT} DIIbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} DISCo [NC,OR]
RewriteCond %{HTTP_USER_AGENT} dloader\(NaverRobot\) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Download\ Demon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} eCatch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ecollector [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Educate\ Search [NC,OR]
RewriteCond %{HTTP_USER_AGENT} EirGrabber [NC,OR]
RewriteCond %{HTTP_USER_AGENT} EmailCollector [NC,OR]
RewriteCond %{HTTP_USER_AGENT} EmailSiphon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} EmailWolf [NC,OR]
RewriteCond %{HTTP_USER_AGENT} EO\ Browse [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Express\ WebPictures [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ExtractorPro [NC,OR]
RewriteCond %{HTTP_USER_AGENT} EyeNetIE [NC,OR]
RewriteCond %{HTTP_USER_AGENT} fastlwspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} FEZhead [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Fetch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} FlashGet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Franklin\ Locator [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Full\ Web\ Bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Getleft [NC,OR]
RewriteCond %{HTTP_USER_AGENT} GetRight [NC,OR]
RewriteCond %{HTTP_USER_AGENT} GetURL [NC,OR]
RewriteCond %{HTTP_USER_AGENT} GetWebPage [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Go!Zilla [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Gozilla [NC,OR]
RewriteCond %{HTTP_USER_AGENT} go-ahead-got-it [NC,OR]
RewriteCond %{HTTP_USER_AGENT} GrabNet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Grafula [NC,OR]
RewriteCond %{HTTP_USER_AGENT} HMView [NC,OR]
RewriteCond %{HTTP_USER_AGENT} HTML\ Works [NC,OR]
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
# RewriteCond %{HTTP_USER_AGENT} ia_archiver [NC,OR]
RewriteCond %{HTTP_USER_AGENT} IBM_Planetwide [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Image\ Stripper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Image\ Sucker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} IncyWincy [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Industry\ Program [NC,OR]
RewriteCond %{HTTP_USER_AGENT} InterGET [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Internet\ Explore\ 5\.x [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Internet\ Ninja [NC,OR]
RewriteCond %{HTTP_USER_AGENT} InternetSeer\.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Irvine [NC,OR]
RewriteCond %{HTTP_USER_AGENT} JetCar [NC,OR]
RewriteCond %{HTTP_USER_AGENT} JOC\ Web\ Spider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} KWebGet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} larbin [NC,OR]
RewriteCond %{HTTP_USER_AGENT} leech [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mass\ Downloader [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MCspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Microsoft\ URL [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MIDown\ tool [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mirror [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Missauga\ Locator [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Missigua\ Locator [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mister\ PiX [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Monster [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mozilla.*NEWT [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mozilla\/3\.0\.\+Indy\ Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mozilla\/3.Mozilla\/2\.01 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mozilla\/4\.0$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mozzilla [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MSIECrawler [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Navroad [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NearSite [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NetAnts [NC,OR]
RewriteCond %{HTTP_USER_AGENT} netattache [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NetCarta [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NetSpider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Net\ Vampire [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NetZIP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NICErsPRO [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Octopus [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Offline\ Explorer [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Offline\ Navigator [NC,OR]
RewriteCond %{HTTP_USER_AGENT} OpaL [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Openfind [NC,OR]
RewriteCond %{HTTP_USER_AGENT} OpenTextSiteCrawler [NC,OR]
RewriteCond %{HTTP_USER_AGENT} PackRat [NC,OR]
RewriteCond %{HTTP_USER_AGENT} PageGrabber [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Papa\ Foto [NC,OR]
RewriteCond %{HTTP_USER_AGENT} pavuk [NC,OR]
RewriteCond %{HTTP_USER_AGENT} pcBrowser [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Plucker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Production\ Bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Program\ Shareware [NC,OR]
RewriteCond %{HTTP_USER_AGENT} PushSite [NC,OR]
RewriteCond %{HTTP_USER_AGENT} RealDownload [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ReGet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} RepoMonkey [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Rover [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Rsync [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Siphon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ScoutAbout [NC,OR]
RewriteCond %{HTTP_USER_AGENT} searchterms\.it [NC,OR]
RewriteCond %{HTTP_USER_AGENT} semanticdiscovery [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Shai [NC,OR]
RewriteCond %{HTTP_USER_AGENT} sitecheck [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SiteSnagger [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SmartDownload [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Spegla [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SpiderBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SuperBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SuperHTTP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Surfbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SurfWalker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} tAkeOut [NC,OR]
RewriteCond %{HTTP_USER_AGENT} tarspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Teleport\ Pro [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Telesoft [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Templeton [NC,OR]
RewriteCond %{HTTP_USER_AGENT} UtilMind [NC,OR]
RewriteCond %{HTTP_USER_AGENT} VoidEYE [NC,OR]
RewriteCond %{HTTP_USER_AGENT} w3mir [NC,OR]
RewriteCond %{HTTP_USER_AGENT} web.by.mail [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebBandit [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebCopier [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebCopy [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebEMailExtrac [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Web\ Image\ Collector [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Web\ Sucker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebAuto [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebCopier [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebFetch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebMiner [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebReaper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebSauger [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Website\ eXtractor [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Website\ Quester [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebSnake [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebStripper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} webvac [NC,OR]
RewriteCond %{HTTP_USER_AGENT} webwalk [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebWhacker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebZIP [NC,OR]
# RewriteCond %{HTTP_USER_AGENT} wget [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WhosTalking [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Widow [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WUMPUS [NC,OR]
RewriteCond %{HTTP_USER_AGENT} www\.pl [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Xaldon\ WebSpider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} XGET [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Yandex [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Zeus.*Webster [NC]
# RewriteCond %{HTTP_USER_AGENT} test [NC]
RewriteCond %{REQUEST_URI} !^/badUA\.html [NC]
RewriteRule .* /badUA.html [L,E=HTTP_USER_AGENT:BAD_USER_AGENT]

# this ruleset is to stop blank user agents with blank referrers
RewriteCond %{HTTP_REFERER} ^-?$
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule .* /cgi-bin/noagent.cmd [L,T=application/x-httpd-cgi]

===== snip =====

There are quite a few rules in there, so watch out for hosing your server. I got mine caught in endless loops several times while converting this from site-wide (inside httpd.conf) to per-directory (.htaccess). I was glad I run my own server ;-)
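If you hit the same loops, one option is to refuse bad requests with [F] instead of rewriting them. This sketch assumes you serve /badUA.html as the error page and don't need the /cgi-bin/*.cmd scripts, which aren't part of this post.

[F] makes no substitution, so nothing is re-injected. The pass-through rule goes right after RewriteBase, before any blocking rules, so the error page itself is never blocked. Only two bots and two worm patterns from the list are shown.

===== snip =====

ErrorDocument 403 /badUA.html

# let the error page itself through untouched: "-" means no substitution
# and [L] stops processing here, so no later rule can block or rewrite it
RewriteRule ^badUA\.html$ - [L]

# IIS worm probes: refuse them instead of handing them to a CGI
RewriteCond %{REQUEST_URI} /(cmd|root|shell)\.exe$ [NC,OR]
RewriteCond %{REQUEST_URI} /default\.(ida|idq)$ [NC]
RewriteRule .* - [F]

# unwanted user agents get a 403 and see /badUA.html as the error page
RewriteCond %{HTTP_USER_AGENT} EmailSiphon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} EmailWolf [NC]
RewriteRule .* - [F]

===== snip =====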

A final note: watch for missing spaces. There must be a space before every flags block such as [NC,OR] and before the ! of a negated pattern. Every ¦ must also be replaced by the solid vertical pipe (|) on your keyboard. This site strips out extra spaces and tabs and replaces the solid vertical pipe with a broken one, so you'll have to watch for these things when copying rules from here.
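For example, here are the formmail rules from the list as this site tends to display them, followed by how they must look in your .htaccess. The mangled lines are commented out so the fragment can be pasted as-is.

===== snip =====

# as displayed on the forum: no space before the flags, broken bar instead of a pipe
# RewriteCond %{REQUEST_URI} formmail\.(pl¦cgi)$[NC,OR]
# RewriteCond %{REQUEST_URI} mailto\.(exe¦cgi)$[NC]

# as it must appear in your .htaccess
RewriteCond %{REQUEST_URI} formmail\.(pl|cgi)$ [NC,OR]
RewriteCond %{REQUEST_URI} mailto\.(exe|cgi)$ [NC]
RewriteRule .* /cgi-bin/nofrmml.cmd [L,E=HTTP_USER_AGENT:FORMMAIL_EXPLOIT,T=application/x-httpd-cgi]

===== snip =====

Without the space, Apache treats [NC,OR] as part of the regex itself. The flags are lost and the pattern no longer matches what you intended. With the broken bar, (pl¦cgi) looks for that literal character instead of meaning "pl or cgi".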

FWIW: the above is taken directly, with no modification, from the .htaccess file of one of my main sites, which is live and online with these rules at this time.

HTH


Thread source: http://www.webmasterworld.com/apache/205.htm