I appreciate all the effort that has gone into the previous discussions on this topic [webmasterworld.com], but could someone sticky me (or post, if it's allowed) an up-to-date list of bad and useless bots?
Perhaps more useful than the mod_rewrite code itself would be definitions or descriptions of the bots that are currently active?
RewriteCond %{HTTP_USER_AGENT} !EmailProtect [NC]
RewriteCond %{HTTP_USER_AGENT} ^(BlackWidow|Crescent|Disco.?|ExtractorPro|HTML.?Works|Franklin.?Locator) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Green\ Research|Harvest|HLoader|http.?generic|Industry.?Program) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(IUPUI.?Research.?Bot|Mac.?Finder|NetZIP|NICErsPRO|NPBot|PlantyNet_WebRobot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Production.?Bot|Program.?Shareware|Teleport.?Pro|TurnitinBot|TE|VOBSUB|VoidEYE) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(WebBandit|WebCopier|Websnatcher|Website\ Extractor|WEP.?Search|Wget|Zeus) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} cherry.?picker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} e?mail.?(collector|extractor|magnet|reaper|search|siphon|sweeper|harvest|collect|wolf) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} \.\.\.\.\.\..?|Educate.?Search|Full.?Web.?Bot|Indy.?Library|IUFW.?Web [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Cowbot|Downloader|httrack|larbin|NaverRobot|QuepasaCreep|Siphon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} efp@gmx\.net [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^P\.Arthur\ 1\.1$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Miss.*g.*.?Locat.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.?URL.?Control [NC,OR]
# Phoney User_Agents used by email harvesters
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/3\.0\ \(compatible\)$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/4\.0\ \(compatible\ ;\ MSIE.? [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/4\.0\ \(compatible;\ MSIE\ 5\.0;\ Windows\ NT\)$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/4\.0\ \(compatible;\ MSIE\ 5\.00;\ Windows\ 98$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/6\.0\ \(compatible;\ MSIE\ 6\.0;\ Windows\ NT\ 5\.2\)$ [NC,OR]
RewriteCond %{REQUEST_URI} (MSOffice/cltreq\.asp|_vti_bin/owssvr\.dll|_vti_bin/_vti_aut/fp30reg\.dll|_mem_bin|MSADC|sumthin) [NC,OR]
# RewriteCond %{REQUEST_URI} ~\!\^~\!\^~\!\.html [OR]
RewriteCond %{HTTP_REFERER} q=guestbook [NC,OR]
RewriteCond %{HTTP_REFERER} iaea\.org [NC]
# Above is last condition ^
RewriteRule .* - [F]
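For readers new to the flag logic: mod_rewrite combines consecutive RewriteCond lines with an implicit AND, and the OR flag joins a condition to the one after it. The list above therefore reads as "the user agent is not EmailProtect AND (any one of the following patterns matches)", which is why the final condition carries [NC] without OR. A trimmed sketch of the same structure, assuming it stands alone in an .htaccess file where mod_rewrite is available; RewriteEngine On is an addition not shown in the list above, and the two agent patterns are taken from it:

```apache
# mod_rewrite must be switched on before any RewriteCond/RewriteRule takes effect
RewriteEngine On
# No OR flag: this condition is ANDed with the group that follows
RewriteCond %{HTTP_USER_AGENT} !EmailProtect [NC]
# OR flag: these conditions are alternatives; only the last one drops OR
RewriteCond %{HTTP_USER_AGENT} ^(WebCopier|Wget) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} cherry.?picker [NC]
# Match any path, leave the URL unchanged ("-") and answer 403 Forbidden
RewriteRule .* - [F]
```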
As mentioned in the earlier ban-list threads, this forum displays the solid pipe (|) as a broken pipe (¦), so retype any broken pipes as solid pipes before using these directives. Each directive above belongs on a single continuous line; any wrapping you see was added by the forum. Start a new line only to begin a new condition, rule, comment, or blank line. Comments begin with # and must sit on their own lines, separate from the directives, to avoid possible 500 server errors.
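A short fragment illustrating these line rules, using conditions from the list above with the pipes typed as solid pipes. The wrong form is itself commented out so the fragment stays inert; in a live file, extra text after the flags on a directive line is likely to make Apache reject the configuration with a 500 error. This is a piece of a longer condition chain, not a complete rule set:

```apache
# Right: the comment sits on its own line above the directive
# Phoney User_Agents used by email harvesters
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/3\.0\ \(compatible\)$ [NC,OR]

# Right: one continuous line, with | (not a broken pipe) between alternatives
RewriteCond %{HTTP_USER_AGENT} ^(WebBandit|WebCopier|Websnatcher) [NC,OR]

# Wrong (disabled here): a comment trailing the flags on the same line
# RewriteCond %{HTTP_USER_AGENT} ^Mozilla/3\.0\ \(compatible\)$ [NC,OR] # harvester
```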
I have also left out my own exception that lets blocked agents view my custom error pages and any other files I might redirect them to, such as poison or banning scripts. That exception goes on the last line, in front of - [F], like this:
RewriteRule !^(docs/403\.html|robots\.txt|other-allowed-files) - [F]
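For context, the exception matters once a custom 403 page is configured: Apache fetches that page with an internal request that passes through the same .htaccess rules, so without the exception the error page itself would be forbidden. A sketch, assuming the page is served from /docs/403.html as in the pattern above and that the host permits ErrorDocument in .htaccess; other-allowed-files is still a placeholder, and a single condition stands in for the full list:

```apache
# Custom page sent with 403 responses (assumed location, matches the exception below)
ErrorDocument 403 /docs/403.html

RewriteEngine On
# Stand-in for the full list of conditions above
RewriteCond %{HTTP_USER_AGENT} ^(WebCopier|Wget) [NC]
# Forbid every path except the error page, robots.txt and other exempt files
RewriteRule !^(docs/403\.html|robots\.txt|other-allowed-files) - [F]
```

In a per-directory (.htaccess) context at the document root, the RewriteRule pattern is matched against the path without its leading slash, which is why the exception reads docs/403\.html while ErrorDocument takes /docs/403.html.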
Wiz
[edited by: jdMorgan at 3:38 pm (UTC) on Oct. 21, 2004; reason: fixed side-scroll]