htaccess and backward slashes

Om108

5:34 pm on Sep 11, 2004 (gmt 0)

10+ Year Member



When banning offline browsers whose name has a space, how is that name typed?

For example with Website Extractor, if the ban looks like this:

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^(Website\ Extractor|Wget|Zeus) [NC,OR]

or this:

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^Website\ Extractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.*$ /ban.shtml [L,R=301]

Here is what happens: under hotlink protection in my control panel, names without a backslash are listed in full, but a name that contains a backslash is shown only up to the backslash.

For Website Extractor, the ban is shown as:

^Website\

Instead of: ^Website\ Extractor

If there is NO space following the backward slash, it will list the entire name as: ^Website\Extractor

So, if it shows as: ^Website\ only, does it still ban Website Extractor?

Thanks.

wilderness

10:07 pm on Sep 11, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Actually, you can save yourself some future unwanted spidering by other grabbers by shortening the line to:

RewriteCond %{HTTP_USER_AGENT} ^Web [OR]

That will deny access to any user agent that begins with Web.
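Put together, a minimal .htaccess ruleset built around the shortened pattern might look like the sketch below. It assumes the ban page is /ban.shtml, as in the first post, and it adds two things not suggested in the thread: [NC], so the user-agent match ignores letter case, and a REQUEST_URI condition so that the redirected request for /ban.shtml does not match the rule again and redirect in a loop.

```apache
RewriteEngine On
# Never redirect the ban page itself; without this, a banned agent asking for
# /ban.shtml would be redirected to /ban.shtml again, in a loop.
# No [OR] here, so this condition is ANDed with the OR-group below.
RewriteCond %{REQUEST_URI} !^/ban\.shtml$
# Any user agent that begins with "Web", in any letter case
RewriteCond %{HTTP_USER_AGENT} ^Web [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [NC,OR]
# The last user-agent condition carries no [OR]; a trailing [OR] here
# can make the rule apply to every request
RewriteCond %{HTTP_USER_AGENT} ^Zeus [NC]
RewriteRule .* /ban.shtml [R=301,L]
```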

Om108

10:17 pm on Sep 11, 2004 (gmt 0)

10+ Year Member



Actually, banning website extractor does not work~ not sure what I am doing wrong.

Wilderness, what would you recommend for shortening or correcting my list:

Options +FollowSymLinks
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^1ClickPicGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^1Click\ ImageExtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^1Click\ Image\ Extractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^1KeyTools\ GetEmAll\ KybIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Aaron's\ WebVacuum [OR]
RewriteCond %{HTTP_USER_AGENT} ^Aarons\ WebVacuum [OR]
RewriteCond %{HTTP_USER_AGENT} ^Aha [OR]
RewriteCond %{HTTP_USER_AGENT} ^AmiPic [OR]
RewriteCond %{HTTP_USER_AGENT} ^AmiPic\ Lite [OR]
RewriteCond %{HTTP_USER_AGENT} ^Anarchie [OR]
RewriteCond %{HTTP_USER_AGENT} ^Advanced\ Pic\ Hunter [OR]
RewriteCond %{HTTP_USER_AGENT} ^BackStreet [OR]
RewriteCond %{HTTP_USER_AGENT} ^BackWeb [OR]
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bookmark\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^Check&Get [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPickerSE [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPickerElite [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent\ Internet\ ToolPak [OR]
RewriteCond %{HTTP_USER_AGENT} ^Custo [OR]
RewriteCond %{HTTP_USER_AGENT} ^Dart [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^DittoSpyder [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo\ Pump [OR]
RewriteCond %{HTTP_USER_AGENT} ^Disco\ Pump [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Express [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Master [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Wonder [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^E-conomiser [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^Exposed! [OR]
RewriteCond %{HTTP_USER_AGENT} ^Exposed\! [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ Web\ Image\ Grabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FAST\-WebCrawler [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^FreshDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetEmAll\ KybIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grab-a-Site [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^hloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} ^HTTrack [OR]
RewriteCond %{HTTP_USER_AGENT} ^HTTP\ Weazel [OR]
RewriteCond %{HTTP_USER_AGENT} ^HTTPWeazel [OR]
RewriteCond %{HTTP_USER_AGENT} ^iaea [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archive [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Index.dat [OR]
RewriteCond %{HTTP_USER_AGENT} ^Index.dat\ Viewer [OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^iSoloWeb [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JoBo [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Kazaa [OR]
RewriteCond %{HTTP_USER_AGENT} ^KaZaA [OR]
RewriteCond %{HTTP_USER_AGENT} ^KybIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^KybIE\ GetEmAll [OR]
RewriteCond %{HTTP_USER_AGENT} ^lachesis [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^iGette [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Leech [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^LIKSE\ HTML\ Viewer [OR]
RewriteCond %{HTTP_USER_AGENT} ^LightningDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^Link\ Valet [OR]
RewriteCond %{HTTP_USER_AGENT} ^linkextractorpro [OR]
RewriteCond %{HTTP_USER_AGENT} ^lwp-trivial [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^MemoWeb [OR]
RewriteCond %{HTTP_USER_AGENT} ^Memo\ Web [OR]
RewriteCond %{HTTP_USER_AGENT} ^MetaProducts\ Download\ Express [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^miixpc [OR]
RewriteCond %{HTTP_USER_AGENT} ^MM3-WebAssistant [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*DnloadMage [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*WebCapture [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*DreamPassport [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*DnloadMage [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*AspTear [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mr.\ Hot [OR]
RewriteCond %{HTTP_USER_AGENT} ^MSIECrawler [OR]
RewriteCond %{HTTP_USER_AGENT} ^MSFrontPage [OR]
RewriteCond %{HTTP_USER_AGENT} ^MyGetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAttache [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetMechanic [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Nitro\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^NPBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Oe\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline [OR]
RewriteCond %{HTTP_USER_AGENT} ^offlinedownloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Commander [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
RewriteCond %{HTTP_USER_AGENT} ^openfind [OR]
RewriteCond %{HTTP_USER_AGENT} ^OS-or-CPU [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^Pic\ Hunter [OR]
RewriteCond %{HTTP_USER_AGENT} ^PicHunter [OR]
RewriteCond %{HTTP_USER_AGENT} ^Picture\ Ace [OR]
RewriteCond %{HTTP_USER_AGENT} ^Power\ Siphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^Pockey [OR]
RewriteCond %{HTTP_USER_AGENT} ^prowebwalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^QRVA [OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Rip\ Clip [OR]
RewriteCond %{HTTP_USER_AGENT} ^RipClip [OR]
RewriteCond %{HTTP_USER_AGENT} ^Robofox [OR]
RewriteCond %{HTTP_USER_AGENT} ^SavePicNoAsk [OR]
RewriteCond %{HTTP_USER_AGENT} ^SavePicNoAsk\ PRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Scooter [OR]
RewriteCond %{HTTP_USER_AGENT} ^Shareaza [OR]
RewriteCond %{HTTP_USER_AGENT} ^Siphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Site-Thief [OR]
RewriteCond %{HTTP_USER_AGENT} ^Slurp [OR]
RewriteCond %{HTTP_USER_AGENT} ^slysearch [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^SQ\ Webscanner [OR]
RewriteCond %{HTTP_USER_AGENT} ^Stamina [OR]
RewriteCond %{HTTP_USER_AGENT} ^Star\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^steeler [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SurfOffline [OR]
RewriteCond %{HTTP_USER_AGENT} ^SurfSaver [OR]
RewriteCond %{HTTP_USER_AGENT} ^SurfSaver\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^szukacz [OR]
RewriteCond %{HTTP_USER_AGENT} ^takeout [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Telesoft [OR]
RewriteCond %{HTTP_USER_AGENT} ^TFTP\ Server [OR]
RewriteCond %{HTTP_USER_AGENT} ^titan [OR]
RewriteCond %{HTTP_USER_AGENT} ^turingos [OR]
RewriteCond %{HTTP_USER_AGENT} ^TurnitinBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^TV33_Mercator [OR]
RewriteCond %{HTTP_USER_AGENT} ^Ultra-Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^UtilMind\ HTTPGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Cloner [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Leech [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebShare\ Meta\ Finder [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ VCR [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ ZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web2Map [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAssistant [OR]
RewriteCond %{HTTP_USER_AGENT} ^Webbandit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCloner [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebDownloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebHook [OR]
RewriteCond %{HTTP_USER_AGENT} ^Webleech [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebMiner [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebMirror [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebRecorder [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSnake [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebShare [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebTunnel [OR]
RewriteCond %{HTTP_USER_AGENT} ^Webupd [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebVac [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebVacuum [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebVCR [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^www-collector-e [OR]
RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^xenu\ link\ sleuth [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.*$ REDIRECT-TO-URL [L,R=301]

I downloaded Website Extractor, and I can still download files from my site even though its name is in the list.

Others, like Link Valet, are kept out~ so I am not quite sure what is wrong. Any help would be greatly appreciated. Thanks.

~Angel

SkyDog

12:28 am on Sep 12, 2004 (gmt 0)

10+ Year Member



It may be better to set an environment variable and then deny access based on it, rather than using mod_rewrite, e.g.:
#sets the environment variable "banme" for all user agents that begin with "web" (any case)
BrowserMatchNoCase ^web banme
Deny from env=banme

Om108

12:59 am on Sep 12, 2004 (gmt 0)

10+ Year Member



SkyDog~ can you give me an example of how I would do what you suggested~ and how it would look like in my htaccess file?

I have no clue what you are talking about! ((laughs))

wilderness

1:21 am on Sep 12, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You may remove 36 lines that begin with "Web" using the line I previously provided.

When you use the extractor on your own site, does the UA begin with Web?
If so, then you have other errors in your htaccess preventing the deny.

You have plenty of unnecessary duplication in your lines.

The following removes much replication.
There are some other syntax errors as well. In some places

RewriteCond %{HTTP_USER_AGENT} ^1Click [OR]
RewriteCond %{HTTP_USER_AGENT} ^1KeyTools [OR]
RewriteCond %{HTTP_USER_AGENT} ^Aaron [OR]
RewriteCond %{HTTP_USER_AGENT} ^Aha [OR]
RewriteCond %{HTTP_USER_AGENT} ^AmiPic [OR]
RewriteCond %{HTTP_USER_AGENT} ^Anarchie [OR]
RewriteCond %{HTTP_USER_AGENT} ^Advanced\ Pic\ Hunter [OR]
RewriteCond %{HTTP_USER_AGENT} ^Back [OR]
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bookmark\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^Check [OR]
RewriteCond %{HTTP_USER_AGENT} ^Cherry [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^Custo [OR]
RewriteCond %{HTTP_USER_AGENT} ^Dart [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^DittoSpyder [OR]
RewriteCond %{HTTP_USER_AGENT} ^Disco\ Pump [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^E-conomiser [OR]
RewriteCond %{HTTP_USER_AGENT} ^Email [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Exposed [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express [OR]
RewriteCond %{HTTP_USER_AGENT} ^Extractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FAST\-WebCrawler [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^FreshDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^Get [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grab-a-Site [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^hloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} ^HTTrack [OR]
RewriteCond %{HTTP_USER_AGENT} ^HTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^iaea [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archive [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Index [OR]
RewriteCond %{HTTP_USER_AGENT} ^Inter [OR]
RewriteCond %{HTTP_USER_AGENT} ^iSoloWeb [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JoBo [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Kazaa [OR]
RewriteCond %{HTTP_USER_AGENT} ^KybIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^lachesis [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^iGette [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Leech [OR]
RewriteCond %{HTTP_USER_AGENT} ^LIKSE\ HTML\ Viewer [OR]
RewriteCond %{HTTP_USER_AGENT} ^LightningDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^Link\ Valet [OR]
RewriteCond %{HTTP_USER_AGENT} ^link [OR]
RewriteCond %{HTTP_USER_AGENT} ^lwp-trivial [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^Memo [OR]
RewriteCond %{HTTP_USER_AGENT} ^MetaProducts\ Download\ Express [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^miixpc [OR]
RewriteCond %{HTTP_USER_AGENT} ^MM3-WebAssistant [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*DnloadMage [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*WebCapture [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*DreamPassport [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*DnloadMage [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*AspTear [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mr.\ Hot [OR]
RewriteCond %{HTTP_USER_AGENT} ^MSIECrawler [OR]
RewriteCond %{HTTP_USER_AGENT} ^MSFrontPage [OR]
RewriteCond %{HTTP_USER_AGENT} ^MyGetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Nitro\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^NPBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Oe\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline [OR]
RewriteCond %{HTTP_USER_AGENT} ^offline [OR]
RewriteCond %{HTTP_USER_AGENT} ^openfind [OR]
RewriteCond %{HTTP_USER_AGENT} ^OS-or-CPU [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^Pic [OR]
RewriteCond %{HTTP_USER_AGENT} ^Power\ Siphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^Pockey [OR]
RewriteCond %{HTTP_USER_AGENT} ^prowebwalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^QRVA [OR]
RewriteCond %{HTTP_USER_AGENT} ^Real [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Rip [OR]
RewriteCond %{HTTP_USER_AGENT} ^Robofox [OR]
RewriteCond %{HTTP_USER_AGENT} ^Save [OR]
RewriteCond %{HTTP_USER_AGENT} ^Scooter [OR]
RewriteCond %{HTTP_USER_AGENT} ^Shareaza [OR]
RewriteCond %{HTTP_USER_AGENT} ^Siphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^Site [OR]
RewriteCond %{HTTP_USER_AGENT} ^Slurp [OR]
RewriteCond %{HTTP_USER_AGENT} ^slysearch [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^SQ\ Webscanner [OR]
RewriteCond %{HTTP_USER_AGENT} ^Stamina [OR]
RewriteCond %{HTTP_USER_AGENT} ^Star\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^steeler [OR]
RewriteCond %{HTTP_USER_AGENT} ^Super [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surf [OR]
RewriteCond %{HTTP_USER_AGENT} ^szukacz [OR]
RewriteCond %{HTTP_USER_AGENT} ^takeout [OR]
RewriteCond %{HTTP_USER_AGENT} ^Tele [OR]
RewriteCond %{HTTP_USER_AGENT} ^TFTP\ Server [OR]
RewriteCond %{HTTP_USER_AGENT} ^titan [OR]
RewriteCond %{HTTP_USER_AGENT} ^turingos [OR]
RewriteCond %{HTTP_USER_AGENT} ^TurnitinBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^TV33_Mercator [OR]
RewriteCond %{HTTP_USER_AGENT} ^Ultra-Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^UtilMind\ HTTPGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web [OR]
RewriteCond %{HTTP_USER_AGENT} ^WEB [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^www-collector-e [OR]
RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^xenu\ link\ sleuth [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
end of quote.

Some of your lines are very confusing, and others may even be the reason your file is not working.

Some basics you seem to be missing are the anchors: "^" means "begins with", "$" means "ends with", and no anchor at all means "contains".
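For illustration, the anchoring forms look like this in RewriteCond lines. The names come from the list above; the comments describe how each regex behaves, and the "Teleport Pro/1.29" string is a hypothetical user agent, not a claim about what that tool actually sends. These are conditions only and would sit above a RewriteRule, as in the lists in this thread.

```apache
# Begins with: the user agent must start with "Wget"
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
# Ends with: the user agent's last characters must be "Zeus"
RewriteCond %{HTTP_USER_AGENT} Zeus$ [OR]
# Contains: no anchor, so "NEWT" may appear anywhere, e.g. after "Mozilla/4.0 ("
RewriteCond %{HTTP_USER_AGENT} NEWT [OR]
# Is exactly: both anchors; a hypothetical "Teleport Pro/1.29" would NOT match,
# because extra characters follow "Pro"
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro$
```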

The forum charter has pointer links explaining their use. REQUIRED reading.

Your Mozilla lines are confusing to me. I would just use the "contains" form (no anchor) and omit the Mozilla and other junk.

A good link and very long reading for methods is:
[webmasterworld.com...]

Om108

3:01 am on Sep 12, 2004 (gmt 0)

10+ Year Member



Wilder-ness~ Thanks. It worked. About time too! lol

wilderness

4:05 am on Sep 12, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Om,
You still have some work to do.

In this line:
RewriteCond %{HTTP_USER_AGENT} ^FAST\-WebCrawler [OR]

You have the dash/hyphen escaped.
If that is the correct method,
then ALL the following need corrections as well:
RewriteCond %{HTTP_USER_AGENT} ^E-conomiser [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grab-a-Site [OR]
RewriteCond %{HTTP_USER_AGENT} ^lwp-trivial [OR]
RewriteCond %{HTTP_USER_AGENT} ^MM3-WebAssistant [OR]
RewriteCond %{HTTP_USER_AGENT} ^OS-or-CPU [OR]
RewriteCond %{HTTP_USER_AGENT} ^www-collector-e [OR]

The following is also incorrect: the backslash should precede the period (rather than follow it):
RewriteCond %{HTTP_USER_AGENT} ^Mr.\ Hot [OR]

Jim and many of the others here are more knowledgeable about these expressions than I am. Perhaps they can assist.

Om108

5:00 am on Sep 12, 2004 (gmt 0)

10+ Year Member



The "Fast\-Webcrawler" was a typo. It was not supposed to have the backward slash.

But that does raise another interesting question, especially with Mr. Hot and Mr. Cool ~ the names contain a period after "Mr". How do I ban a name with a period in it, like "Mr. Hot" or "Mr. Cool"?

Thanks again, Wilderness.

wilderness

9:53 am on Sep 12, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Use a single line that matches any user agent beginning with Mr.
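Two ways to write this, as a sketch: the broad line is what wilderness describes; the narrow line is an alternative that escapes the period and the space so that only "Mr. Hot" and "Mr. Cool" match. Use one or the other, and drop the [OR] if it ends up as the last condition. The unescaped ^Mr.\ Hot from the original list does still match "Mr. Hot", because an unescaped period matches any single character; it is just looser (it would also match a user agent starting with "Mrs Hot").

```apache
# Broad: any user agent that begins with "Mr"
RewriteCond %{HTTP_USER_AGENT} ^Mr [OR]

# Narrow alternative: "\." is a literal period, "\ " a literal space,
# and (Hot|Cool) accepts either name
RewriteCond %{HTTP_USER_AGENT} ^Mr\.\ (Hot|Cool) [OR]
```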

wilderness

11:12 am on Sep 12, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Om,
I've looked at my files and I do have the hyphen/dash escaped.
Additionally, I have the exclamation point escaped, which you use in some lines.

Here's the charter link to Expressions:
[etext.lib.virginia.edu...]

SkyDog

2:59 pm on Sep 12, 2004 (gmt 0)

10+ Year Member



SkyDog~ can you give me an example of how I would do what you suggested~ and how it would look like in my htaccess file?

Just change all the rewrite conditions to BrowserMatch lines, then add a space and "banme"; this sets the environment variable "banme" if the user agent matches the given regular expression. In your directory block (change the path-name here to the actual system path of your document root), access is denied whenever "banme" is set. I believe this uses less overhead than mod_rewrite. Here is an example that will block all user agents that begin with "web", e.g.:


BrowserMatchNoCase ^web banme
<Directory path-name>
Order deny,allow
Deny from env=banme
</Directory>
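SkyDog's example is written for the main server configuration; <Directory> sections are not allowed inside .htaccess files. A minimal .htaccess version might look like the sketch below, assuming the host's AllowOverride settings permit these directives (FileInfo for BrowserMatchNoCase, Limit for Order/Deny). The commented lines show the likely equivalent on Apache 2.4, where Require is the newer access-control syntax.

```apache
# Set "banme" for any user agent that starts with "web", in any case
BrowserMatchNoCase ^web banme

# Apache 1.3/2.0/2.2 syntax: Deny is checked first, then Allow;
# with this Order, anything not denied is allowed by default
Order Deny,Allow
Deny from env=banme

# Apache 2.4 equivalent (mod_authz_core), instead of the two lines above:
# <RequireAll>
#     Require all granted
#     Require not env banme
# </RequireAll>
```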

jdMorgan

5:20 pm on Sep 12, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Angel,

Welcome to WebmasterWorld!

The reason your control panel stats display is only showing partial user-agent strings has nothing to do with the code in your .htaccess file. It is a property of the stats display code itself.

Hyphens and underscores do not need to be escaped. The following characters must be escaped if you wish to match them literally; otherwise, they have special meaning to regular expressions:

^$%*.+(){}[]|\?"

! must be escaped if it begins an unanchored pattern.
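A few escaped patterns, as a sketch, using names from the list above. The \!Zilla line is a hypothetical pattern included only to show the unanchored case described here, where a leading "!" would otherwise be read by mod_rewrite as "does not match".

```apache
# "." is a wildcard unless escaped; "\." matches only a literal period
RewriteCond %{HTTP_USER_AGENT} ^Index\.dat [OR]
# Here "!" is not the first character of the pattern, so it would also
# work unescaped; escaping it is harmless
RewriteCond %{HTTP_USER_AGENT} ^Go\!Zilla [OR]
# Hypothetical unanchored pattern beginning with "!": without the backslash,
# mod_rewrite would treat "!" as negation ("user agent does NOT contain Zilla")
RewriteCond %{HTTP_USER_AGENT} \!Zilla [OR]
# Hyphens and underscores need no escaping outside [...]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^TV33_Mercator
```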

In your access restriction list, you may have anchoring and case-matching problems. If you wish to match a user-agent that does not start with the pattern you provide, then omit the start anchor. For example, if the real user-agent starts with Mozilla, but contains "Mr. Pix", then your pattern should be "Mr\.\ Pix", not "^Mr\.\ Pix", otherwise it won't match because the "^" requires the user-agent to start with "Mr" rather than "Mozilla".

Furthermore, that pattern won't match the user-agent "Mr. pix", because the match is case-sensitive. If you wish to use a case-insensitive match, then add the [NC] flag to the RewriteCond, as in


RewriteCond %{HTTP_USER_AGENT} Mr\.\ Pix [NC,OR]

Regular-expression pattern matching is a flexible and very powerful tool that must be mastered along with mod_rewrite. Wilderness has provided a link to a concise tutorial on the subject which includes both the special character descriptions and the pattern-anchoring information.

Since your goal is to get what you have working, I suggest you stick with it, rather than changing methods. In most cases, the mod_access method is entirely equivalent to the mod_rewrite method, so it's your choice. But if one doesn't work, the other won't either, given the same pattern to match.

One more comment: This access restriction code will be executed for every HTTP request to your server. That means for every page, and for every image, script, CSS stylesheet, etc. that page contains that is requested from your server. For this reason, it's a good idea to keep the list as short as possible. After running it for a while you may wish to delete the lines for user-agents that never visit your site. A pragmatic view of the balance between efficiency and access control is needed.

Jim

Om108

5:51 am on Sep 13, 2004 (gmt 0)

10+ Year Member



Thank You Wilderness, Skydog and jdMorgan. Thanks for all the help.

I only recently became aware of Offline Browsers downloading my entire site, so it has been a crash course in learning how to limit this activity. Going on 30 GB of monthly transfer! But this may also be due to the sheer volume of visitors I get, which is very high, and the fact that my site is primarily an images/pictures site.

Anything to save bandwidth! ((laughs))~~ Thanks Guys.

Om108

6:03 am on Sep 13, 2004 (gmt 0)

10+ Year Member



jdMorgan~ what would you suggest for the Mozilla ban? I have:

RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*DnloadMage [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*WebCapture [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*DreamPassport [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*DnloadMage [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*AspTear [OR]

Is there a way to shorten this list? Is it configured properly? I'd appreciate any suggestions.

Would you also suggest the [NC,OR] in place of the [OR] especially considering there is really no way for me to know if I have the correct case on the banned sites?

Thanks again.

jdMorgan

3:54 pm on Sep 13, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Number one priority is, "Does what you have work?" After that, you can worry about tweaking it long-term. A lot of suggestions will come down to a matter of personal style or preference. In that light, your Mozilla user-agent patterns look OK. You can add [NC] to your patterns as you wish. Again this is a trade-off between your convenience and the server's processing time of each HTTP request.
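As an example of what [NC] changes, compare the two lines below. The first post spells the name "Website Extractor", while the full list spells it "Website\ eXtractor"; without [NC], each pattern matches only its own capitalization. Flag order inside the brackets does not matter, so [NC,OR] and [OR,NC] are the same.

```apache
# Case-sensitive: matches "Website Extractor" but not "Website eXtractor"
RewriteCond %{HTTP_USER_AGENT} ^Website\ Extractor [OR]
# Case-insensitive: matches "Website Extractor", "Website eXtractor", "WEBSITE EXTRACTOR", ...
RewriteCond %{HTTP_USER_AGENT} ^Website\ Extractor [NC,OR]
```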

As to case, pattern-anchoring, and user-agent validity questions, one way to check all these patterns is to search for them by name, adding "user-agent" to refine the search. You'll find other sites' logs and stat files, and pages on discussion forums mentioning the user-agents. By carefully checking what the actual raw user-agent string variations are, you'll be able to determine case and anchoring requirements. Generally, stats files (like what you see in your control panel) are much less useful than raw server logs which show the actual logged requests.

The problem with lists like our Close to perfect .htaccess ban list [webmasterworld.com] is that they contain regular-expression patterns and not full user-agent strings. You therefore rely on whoever posted the code to have got the anchoring and case correct. This is not always true, however, and "majority rule" is not useful, either; most of these access control lists posted on the Web contain errors. So, if access control is important to you, the best approach is to verify each pattern yourself. As in my previous post, it's best to hit the high-runners first, and those vary by site.

Jim

Wizcrafts

11:13 pm on Sep 13, 2004 (gmt 0)

10+ Year Member



Om108 asked;

RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*DnloadMage [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*WebCapture [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*DreamPassport [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*DnloadMage [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*AspTear [OR]

Is there a way to shorten this list?


Yes, there is. Try this shorthand method to group similar names into a single line (this assumes that you want to require that the User-Agent starts with Mozilla, followed by any characters and/or spaces, then any of the names within the parentheses):

RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*(newt|dnloadmage|webcapture|dreampassport|asptear) [OR,NC]

The above is shown word-wrapped, but should be on only one line in .htaccess!

If you don't need the Mozilla at the beginning of the User-Agent, simply list the group-rule like this:


RewriteCond %{HTTP_USER_AGENT} newt|dnloadmage|webcapture|dreampassport|asptear [OR,NC]

If you copy these lines from a forum page that displays the pipe as a broken bar (¦), change each one back to a solid pipe (|). You can add more banned names to this group list by adding a pipe, then the name. If a name contains a space, list it like this:

Educate.?Search|Full.?Web.?Bot|Indy.?Library

IHTH, Wiz
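The .? between words in the example above matches zero or one character of any kind, so Indy.?Library would also match "IndyLibrary" or "Indy-Library". If you want exactly one space, escape it instead. A sketch of the same three names as a complete condition (drop the [OR] if it is the last condition before the RewriteRule):

```apache
# "\ " matches exactly one literal space; (a|b|c) matches any one of the names
RewriteCond %{HTTP_USER_AGENT} (Educate\ Search|Full\ Web\ Bot|Indy\ Library) [NC,OR]
```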