homepage Welcome to WebmasterWorld Guest from 54.167.185.110
register, free tools, login, search, pro membership, help, library, announcements, recent posts, open posts,
Become a Pro Member

Visit PubCon.com
Home / Forums Index / Code, Content, and Presentation / Apache Web Server
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL & phranque

Apache Web Server Forum

    
My Updated Bad Bot List - 103
bat bot list for htaccess
florida_guy




msg:1520717
 1:42 am on Jan 17, 2003 (gmt 0)

Here's my latest bad bot list. Post if there are more, or if any shouldn't be there.

I know there's a couple that are duplicates with the [NC} option inplace...

RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Bullseye.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^BDFetch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPickerSE [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPickerElite [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent.*Internet.*ToolPak.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*[Ee]mail.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archive [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^internetseer.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^InternetSeer.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^microsoft.url [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*Indy [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^MSFrontPage [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*NEWT.*ActiveX [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Siphon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^SurfWalker.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport*28 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Telesoft [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^TV33_Mercator [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Cc]ollector [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb.*[Mm]ole [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Ww]easel [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [NC,OR]
RewriteCond %{REQUEST_URI} formmail.* [NC,OR]
RewriteCond %{REQUEST_URI} FormMail.*
RewriteRule ^.*$ byebye.php [L]

 

jdMorgan




msg:1520718
 2:13 am on Jan 17, 2003 (gmt 0)

Floridaguy,

Since the following are not end-anchored, only the first one is needed to match any of the three UAs.

RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPickerSE [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPickerElite [NC,OR]

Same thing for the "Crescent" entries and others which which follow.

Note too that using a construct like [Ww]eb [NC,OR] is redundant; The [Ww] allows a match on an uppercase "W" or a lowercase "w", and then the [NC] flag says, "match the pattern regardless of case." So the [Ww] part is unnecessary.

You can compress these rules down quite a bit. Here's an example of all of your listings which are start-anchored with "Web" and are not end-anchored:

RewriteCond %{HTTP_USER_AGENT} ^Web(Bandit¦Collector¦.*Mole¦Weasel¦EMailExtrac¦\ Image\ Collector¦\ Sucker) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Web(Auto¦Copier¦Fetch¦Reaper¦Sauger¦site\ eXtractor¦Stripper¦Whacker¦ZIP) [NC,OR]

You could do it all on one line, too. I only broke it into two lines to prevent wrapping problems here in the forum. More info here in this Regular expressions tutorial [etext.lib.virginia.edu].

Note that you must replace all the "¦" characters above with the one on your keyboard.

Jim

florida_guy




msg:1520719
 2:29 am on Jan 17, 2003 (gmt 0)

I'm a newbie to mod_rewrite, thanks for the lesson. I see how the 'Web' example works, quite logical actually. And, I now understand the end-anchor too. The '[Ww]eb [NC,OR]' example I found just after I posted.

Thanks again for your post!
Doug

Global Options:
 top home search open messages active posts  
 

Home / Forums Index / Code, Content, and Presentation / Apache Web Server
rss feed

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
Home ¦ Free Tools ¦ Terms of Service ¦ Privacy Policy ¦ Report Problem ¦ About ¦ Library ¦ Newsletter
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved