Forum Moderators: phranque


A Close to perfect .htaccess ban list - Part 4

Anyone have an update of the actual list please

         

fish_eye

3:23 am on Oct 5, 2004 (gmt 0)

10+ Year Member



It's been some time since I updated my banned robots list and was wondering if much has changed in the last 12 months. I assume it has.

I appreciate all the effort that has gone into the previous discussions on this [webmasterworld.com], but I was wondering if someone could sticky me - or post here, if it's allowed - an up-to-date list of bad and useless bots?

I guess an even more useful alternative to the raw mod_rewrite code would be definitions / descriptions of the currently active bots?

Umbra

9:13 am on Oct 12, 2004 (gmt 0)

10+ Year Member



Rather than slog through that massive thread... is there a one-stop comprehensive up-to-date source for this evolving ban list? (Either one particular message or perhaps a website?)

fish_eye

1:01 pm on Oct 12, 2004 (gmt 0)

10+ Year Member



It would be nice, yes, but I guess it would be prone to abuse.

I did find more info in the robots.txt forum and also in the search engine spiders forum, but they only go part of the way - they identify the good guys, not the bad ones.

Wizcrafts

3:14 pm on Oct 21, 2004 (gmt 0)

10+ Year Member



This is what I am currently using, after studying my server logs and those of others:

RewriteCond %{HTTP_USER_AGENT} !EmailProtect [NC]

RewriteCond %{HTTP_USER_AGENT} ^(BlackWidow¦Crescent¦Disco.?¦ExtractorPro¦HTML.?Works¦Franklin.?Locator) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Green\ Research¦Harvest¦HLoader¦http.?generic¦Industry.?Program) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(IUPUI.?Research.?Bot¦Mac.?Finder¦NetZIP¦NICErsPRO¦NPBot¦PlantyNet_WebRobot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Production.?Bot¦Program.?Shareware¦Teleport.?Pro¦TurnitinBot¦TE¦VOBSUB¦VoidEYE) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(WebBandit¦WebCopier¦Websnatcher¦Website\ Extractor¦WEP.?Search¦Wget¦Zeus) [NC,OR]

RewriteCond %{HTTP_USER_AGENT} cherry.?picker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} e?mail.?(collector¦extractor¦magnet¦reaper¦search¦siphon¦sweeper¦harvest¦collect¦wolf) [NC,OR]

RewriteCond %{HTTP_USER_AGENT} \.\.\.\.\.\..?¦Educate.?Search¦Full.?Web.?Bot¦Indy.?Library¦IUFW.?Web [NC,OR]

RewriteCond %{HTTP_USER_AGENT} Cowbot¦Downloader¦httrack¦larbin¦NaverRobot¦QuepasaCreep¦Siphon [NC,OR]

RewriteCond %{HTTP_USER_AGENT} efp@gmx\.net [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^P\.Arthur\ 1\.1$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Miss.*g.*.?Locat.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.?URL.?Control [NC,OR]
# Phoney User_Agents used by email harvesters
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/3\.0\ \(compatible\)$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/4\.0\ \(compatible\ ;\ MSIE.? [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/4\.0\ \(compatible;\ MSIE\ 5\.0;\ Windows\ NT\)$ [NC,OR]

RewriteCond %{HTTP_USER_AGENT} ^Mozilla/4\.0\ \(compatible;\ MSIE\ 5\.00;\ Windows\ 98$ [NC,OR]

RewriteCond %{HTTP_USER_AGENT} ^Mozilla/6\.0\ \(compatible;\ MSIE\ 6\.0;\ Windows\ NT\ 5\.2\)$ [NC,OR]

RewriteCond %{REQUEST_URI} (MSOffice/cltreq\.asp¦_vti_bin/owssvr\.dll¦_vti_bin/_vti_aut/fp30reg\.dll¦_mem_bin¦MSADC¦sumthin) [NC,OR]

# RewriteCond %{REQUEST_URI} ~\!\^~\!\^~\!\.html [OR]
RewriteCond %{HTTP_REFERER} q=guestbook [NC,OR]
RewriteCond %{HTTP_REFERER} iaea\.org [NC]
# Above is last condition ^
RewriteRule .* - [F]


There are a lot of user agents from the "Close To Perfect Ban List" that are not in my list, because they haven't visited my websites, or are captured by my wildcard terms, or because I don't consider them to be a problem or threat. Conversely, there are some in my list that others may not wish to ban at all, such as NaverBot.
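To show what I mean by wildcard terms: in these patterns, `.?` matches zero or one of any character, so a single condition covers several spellings of the same agent. Here is one of my conditions rewritten on its own, with the pipes already solid (remember this forum breaks them):

```apache
# ".?" matches zero or one character of any kind, so this one condition
# catches "EmailCollector", "email collector", "E-Mail Siphon", etc.
RewriteCond %{HTTP_USER_AGENT} e?mail.?(collector|extractor|siphon) [NC,OR]
```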

As has been mentioned before (in the aforementioned ban list threads), you will need to retype the broken pipes into solid pipes before using these directives. All of the above directives are on their own continuous lines, but were word-wrapped by this forum. Carriage returns are only allowed when starting a new condition, rule, comment, or blank line. Comments begin with # and should be on separate lines from the directives, to avoid possible 500 server errors.
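In case the shape of the whole block is unclear from the wrapped listing above, here is a minimal skeleton (solid pipes, and with made-up agent names standing in for a real ban list): every condition carries [OR] except the very last one, and a single RewriteRule closes the block.

```apache
# Skeleton of a user-agent ban block - the agent names here are
# placeholders, not recommendations.
RewriteEngine On
# Each condition ends in [NC,OR] (case-insensitive, OR with the next)...
RewriteCond %{HTTP_USER_AGENT} ^(BadBot|EvilScraper) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} harvest [NC,OR]
# ...except the last condition, which must drop the OR.
RewriteCond %{HTTP_REFERER} spam-referrer\.example [NC]
# Forbid (403) any request that matched the chain above.
RewriteRule .* - [F]
```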

I have also left out my personal allowance for blocked agents to view my custom error pages and other files to which I might redirect them, such as poison or banning scripts. This exclusion goes in the pattern of the final RewriteRule, before the - [F] part, as in:
RewriteRule !^(docs/403\.html¦robots\.txt¦other-allowed-files) - [F]
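With solid pipes, and keeping just the two concrete files from the example (other-allowed-files stands for whatever else a given site chooses to exempt), that closing rule would read:

```apache
# Let banned agents still fetch the custom 403 page and robots.txt;
# any other request from a matched agent is forbidden.
RewriteRule !^(docs/403\.html|robots\.txt) - [F]
```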

Wiz

[edited by: jdMorgan at 3:38 pm (utc) on Oct. 21, 2004]
[edit reason] Fixed side-scroll [/edit]