Does anyone have an updated .htaccess file for bad spiders, bots, etc.?
The last post to "A Close to perfect .htaccess ban list - Part 3" was in April of 2003... a tad outdated to say the least.
I know toolman and superman were very helpful in that thread... which was awesome.
You can PM me if you like instead of posting the list here to save some bandwidth...
slats
[edited by: jdMorgan at 3:50 am (utc) on Nov. 4, 2004]
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} Bot\ mailto:craftbot@yahoo.com [OR]
RewriteCond %{HTTP_USER_AGENT} ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} HMView [OR]
RewriteCond %{HTTP_USER_AGENT} HTTrack [OR]
RewriteCond %{HTTP_USER_AGENT} Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} larbin [OR]
RewriteCond %{HTTP_USER_AGENT} LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} Offline\ Navigator [OR]
RewriteCond %{HTTP_USER_AGENT} PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} RealDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} Siphon [OR]
RewriteCond %{HTTP_USER_AGENT} SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} Wget [OR]
RewriteCond %{HTTP_USER_AGENT} Widow [OR]
RewriteCond %{HTTP_USER_AGENT} Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} Zeus
RewriteRule .* - [F,L]
I appreciate all the effort that has gone into the previous discussions on this [webmasterworld.com] but I was wondering if someone can sticky me - or post if it's allowed - an up-to-date list of bad and useless bots?
I guess a more attractive alternative to the actual mod_rewrite code would be definitions / descriptions of currently active bots?
RewriteCond %{HTTP_USER_AGENT} !EmailProtect [NC]
RewriteCond %{HTTP_USER_AGENT} ^(BlackWidow¦Crescent¦Disco.?¦ExtractorPro¦HTML.?Works¦Franklin.?Locator) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Green\ Research¦Harvest¦HLoader¦http.?generic¦Industry.?Program) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(IUPUI.?Research.?Bot¦Mac.?Finder¦NetZIP¦NICErsPRO¦NPBot¦PlantyNet_WebRobot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Production.?Bot¦Program.?Shareware¦Teleport.?Pro¦TurnitinBot¦TE¦VOBSUB¦VoidEYE) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(WebBandit¦WebCopier¦Websnatcher¦Website\ Extractor¦WEP.?Search¦Wget¦Zeus) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} cherry.?picker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} e?mail.?(collector¦extractor¦magnet¦reaper¦search¦siphon¦sweeper¦harvest¦collect¦wolf) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} \.\.\.\.\.\..?¦Educate.?Search¦Full.?Web.?Bot¦Indy.?Library¦IUFW.?Web [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Cowbot¦Downloader¦httrack¦larbin¦NaverRobot¦QuepasaCreep¦Siphon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} efp@gmx\.net [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^P\.Arthur\ 1\.1$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Miss.*g.*.?Locat.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.?URL.?Control [NC,OR]
# Phoney User_Agents used by email harvesters
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/3\.0\ \(compatible\)$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/4\.0\ \(compatible\ ;\ MSIE.? [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/4\.0\ \(compatible;\ MSIE\ 5\.0;\ Windows\ NT\)$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/4\.0\ \(compatible;\ MSIE\ 5\.00;\ Windows\ 98$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/6\.0\ \(compatible;\ MSIE\ 6\.0;\ Windows\ NT\ 5\.2\)$ [NC,OR]
RewriteCond %{REQUEST_URI} (MSOffice/cltreq\.asp¦_vti_bin/owssvr\.dll¦_vti_bin/_vti_aut/fp30reg\.dll¦_mem_bin¦MSADC¦sumthin) [NC,OR]
# RewriteCond %{REQUEST_URI} ~\!\^~\!\^~\!\.html [OR]
RewriteCond %{HTTP_REFERER} q=guestbook [NC,OR]
RewriteCond %{HTTP_REFERER} iaea\.org [NC]
# Above is last condition ^
RewriteRule .* - [F]
As has been mentioned before (in the aforementioned ban list threads), you will need to retype the broken pipes as solid pipes before using these directives. Each of the above directives belongs on its own continuous line, although some were word wrapped by this Forum. Carriage returns are only allowed when starting a new condition, rule, comment, or blank line. Comments begin with # and should be on separate lines from the directives, to avoid possible 500 server errors.
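As a minimal sketch of those formatting rules, assuming the code is pasted into an .htaccess file on a server with mod_rewrite enabled, two of the conditions above would look like this once the broken pipes are retyped as solid pipes, each directive sits on one unbroken line, and the comment has a line to itself:

```apache
RewriteEngine On
# Comment on its own line, never tacked onto the end of a directive
RewriteCond %{HTTP_USER_AGENT} ^(WebBandit|WebCopier|Websnatcher|Website\ Extractor|WEP.?Search|Wget|Zeus) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} cherry.?picker [NC]
RewriteRule .* - [F]
```

Only the last condition before the RewriteRule drops the OR flag.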
I have also left out my personal allowance for blocked agents to view my custom error pages and other files to which I might redirect them, such as poison or banning scripts. These allowances would go on the last line, before the - [F] command, as in:
RewriteRule !^(docs/403\.html¦robots\.txt¦other-allowed-files) - [F]
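A sketch of how such an allowance might fit together, assuming an .htaccess file in the document root and a custom error page at /docs/403.html (the path used in the example rule). The ErrorDocument line and the single Wget condition are illustrative stand-ins, not part of the list above:

```apache
# Serve a custom page for 403 responses (path is an assumption)
ErrorDocument 403 /docs/403.html

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Wget [NC]
# Forbid everything except the error page and robots.txt,
# so a blocked agent can still fetch those two files
RewriteRule !^(docs/403\.html|robots\.txt) - [F]
```

Without the exclusion, the request for the error page itself would also be forbidden, and the server would fall back to its default 403 message.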
Wiz
[edited by: jdMorgan at 3:38 pm (utc) on Oct. 21, 2004]
[edit reason] Fixed side-scroll [/edit]
RewriteCond %{HTTP_USER_AGENT} !EmailProtect [NC,OR]
wouldn't let me get to my site. Gives me a "Forbidden".
RewriteCond %{HTTP_USER_AGENT} !EmailProtect [NC]
Wiz
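The difference comes down to how mod_rewrite chains conditions: consecutive RewriteCond lines are ANDed unless a line carries [OR], and an [OR] joins a line to the one that follows it. A cut-down sketch, using two agents from the list as stand-ins for the full set:

```apache
RewriteEngine On
# No [OR] here: this line must be true AND at least one line
# in the OR-group below must also be true
RewriteCond %{HTTP_USER_AGENT} !EmailProtect [NC]
RewriteCond %{HTTP_USER_AGENT} Wget [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Zeus [NC]
RewriteRule .* - [F]
```

With [NC,OR] on the first line instead, the negated test is true for every visitor whose User-Agent does not contain EmailProtect, which is almost everyone, so the rule fires and returns a 403 regardless of the remaining conditions.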
# Phoney User_Agents used by email harvesters
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/3\.0\ \(compatible\)$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/4\.0\ \(compatible\ ;\ MSIE.? [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/4\.0\ \(compatible;\ MSIE\ 5\.0;\ Windows\ NT\)$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/4\.0\ \(compatible;\ MSIE\ 5\.00;\ Windows\ 98$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/6\.0\ \(compatible;\ MSIE\ 6\.0;\ Windows\ NT\ 5\.2\)$ [NC,OR]
Are these user agents ALWAYS email harvesters? I read somewhere that these may be people who fake their browser user agent to get around what they consider to be annoying JavaScript browser detection on websites. I haven't had any luck finding any good threads on this topic on WebmasterWorld.
Most people do not modify their browser's ID string, or even know that this is possible. Those who possess this knowledge and use it are cloaking their activities, for some reason.
I added these UAs after analyzing my logs by user agent and what they requested. My conclusion is that if these are human visitors, they have either purposely misidentified their browser or they are using a program that has such a UA. In either case, I don't want them wasting my bandwidth.
Wiz