Moderators: coopster & jatar k & phranque

Perl Server Side CGI Scripting Forum

This 243 message thread spans 9 pages.
A Close to perfect .htaccess ban list
toolman




msg:441824
 3:30 am on Oct 23, 2001 (gmt 0)

Here's the latest rendition of my favorite ongoing artwork....my beloved .htaccess file. I've become quite fond of my little buddy, the .htaccess file, and I love the power it gives me to exclude vermin, pestoids and undesirable entities from my web sites.

Gorufu, littleman, Air, SugarKane? You guys see any errors or better ways to do this....anybody got a bot to add....before I stick this in every site I manage.

Feel free to use this on your own site and start blocking bots too.

(the top part is left out)

<Files .htaccess>
deny from all
</Files>
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
RewriteRule ^.* - [F]
RewriteCond %{HTTP_REFERER} ^http://www.iaea.org$
RewriteRule !^http://[^/.]\.your-site.com.* - [F]
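
A note for anyone reading this on a newer server: "deny from all" is the access syntax of the Apache versions current when this was posted. On Apache 2.4 it only works if mod_access_compat is loaded. A minimal sketch of the native 2.4 form of the same protection, assuming mod_authz_core is available (it normally is):

```apache
# Apache 2.4+ equivalent of the <Files .htaccess> block above:
# refuse every request for the .htaccess file itself.
<Files ".htaccess">
    Require all denied
</Files>
```

The rewrite rules themselves are not affected by this change.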

 

TheLynxEffect




msg:441825
 3:51 am on Oct 23, 2001 (gmt 0)

Nice! Thanks for sharing that really cool info toolman. I can't spot any other bots at the moment.


Brett_Tabke




msg:441826
 7:24 pm on Oct 23, 2001 (gmt 0)

Very nice TM. How much speed difference can you notice on each page view?

toolman




msg:441827
 8:14 pm on Oct 23, 2001 (gmt 0)

>>>How much speed difference can you notice on each page view.

Couldn't say I notice any at all. The part above this though could determine that...if I run everything through the PHP parser I expect a hit. Usually I run AddHandlers for SSIs and have never noticed a slowdown.

BTW I pieced this together from snippets others posted here on the board.

sugarkane




msg:441828
 8:26 pm on Oct 23, 2001 (gmt 0)

Another one might be

RewriteCond %{HTTP_USER_AGENT} .*almaden.* [OR]

ggrot




msg:441829
 8:51 pm on Oct 23, 2001 (gmt 0)

I use .htaccess to remap third level domains to various directories based on HTTP_HOST. What happens if two sets of RewriteConds apply to two separate RewriteRules (i.e., I place some of these blocking lines above my third level domain remaps in my .htaccess file)?
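
A minimal sketch of how this is expected to play out, going by the documented behavior that a run of RewriteCond lines applies only to the first RewriteRule that follows it. The host name sub.example.com and the /sub/ directory are placeholders for illustration, not anything from the thread:

```apache
RewriteEngine on

# Block 1: these conditions belong only to the rule directly below them.
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
RewriteRule .* - [F]

# Block 2: a separate condition set for a separate rule.
# sub.example.com and /sub/ are hypothetical names.
RewriteCond %{HTTP_HOST} ^sub\.example\.com$ [NC]
# Skip requests already inside /sub/ to avoid rewriting in a loop.
RewriteCond %{REQUEST_URI} !^/sub/
RewriteRule (.*) /sub/$1 [L]
```

With the blocking rule first, a banned user agent gets its 403 before the domain remap is ever considered, and ordinary visitors fall through to the remap untouched.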

Beagle




msg:441830
 11:21 am on Oct 24, 2001 (gmt 0)

Hi Toolman, nice compilation of nasty bots! Have you tried putting the rewrite rules in httpd.conf? They would run fastest there, although you noted that there was no noticeable speed difference as it is.
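
A sketch of what that might look like, assuming a name-based virtual host in httpd.conf (the ServerName is a placeholder, and only two of the user agents are shown). The server reads httpd.conf once at startup, while a .htaccess file is looked up and parsed on every request, which is where any speed difference would come from:

```apache
# httpd.conf: parsed once when the server starts.
<VirtualHost *:80>
    # Placeholder host name.
    ServerName www.example.com

    RewriteEngine on
    RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
    RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
    # Any request from a matching user agent gets a 403.
    RewriteRule .* - [F]
</VirtualHost>
```

This needs access to the main server configuration and a restart or reload after each change, so on shared hosting the .htaccess version is usually the only option.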

Thanks again for sharing it with us!

toolman




msg:441831
 2:37 pm on Oct 24, 2001 (gmt 0)

I found another UA for InternetSeer

RewriteCond %{HTTP_USER_AGENT} ^InternetSeer.com [OR]

Not sure what the difference is but this one is the one that comes by every fifteen minutes as my competition tries to fool me into thinking I have more traffic than I do. Now it's easily filtered as a 403.

Long live mod_rewrite :)

idiotgirl




msg:441832
 3:24 pm on Oct 24, 2001 (gmt 0)

toolman- have you been looking over my shoulder at 2 am? I thought *I* had some kind of unhealthy fixation with .htaccess. Guess not. And it may even be healthy, after all.

I've been going back and forth from a kind of banbot.cgi that reads a banned.txt file, to just drawing a line in the sand and doing the full-on mod_rewrite at the top level to initiate a trickle down effect on the sub domains I host.

What I've been toying with is a combination of my banned.txt file automatically updating my .htaccess file - using grep to insert/add/delete lines depending on what is in banned.txt. It's pretty easy to update my banned.txt file either by hand or with a little interface program I wrote - but I'm 'grappling with grep' to insert my lines in the correct place in the .htaccess file. I'm in the dark with grep. Grep vexes me. Grep makes my stomach hurt.

Has anyone else considered this, or is it too much work? I thought it would give me some flexibility, and kill two birds with one stone. In fact, at 2 am I think it's a brilliant idea. Then again, I don't get out much.

franklin dematto




msg:441833
 6:04 am on Oct 30, 2001 (gmt 0)

toolman, mind translating that for those of us who are mod_rewrite impaired?

mivox




msg:441834
 10:09 pm on Jan 8, 2002 (gmt 0)

Dredging this thread out of the depths of time.

Could someone please translate this line:

RewriteRule !^http://[^/.]\.your-site.com.* - [F]

Just wondering exactly what's happening there....

idiotgirl




msg:441835
 12:46 am on Jan 9, 2002 (gmt 0)

RewriteRule !^http://[^/.]\.your-site.com.* - [F]
is shorthand for "Get the hell out and don't come back 'cuz you aren't viewing a darned thing from this (my domain) today and as far as I'm concerned you get the big 'F' meaning - I (my domain) does not exist to you."

At least, that's my understanding. Apache has all that neat stuff posted. I forget most of it - always have to refer back.

"I'm not a smart man, Jenny" - Forrest Gump aka idiotgirl
<added>not a sig - just how I feel today,</added>

toolman




msg:441836
 1:43 am on Jan 9, 2002 (gmt 0)

mivox, that's blocking that screen scraper from iaea.org

pshea




msg:441837
 5:57 pm on Jan 9, 2002 (gmt 0)

Thank you Toolman for the list.

I have added these to my htaccess which I have never really fooled around with before. Having now added these, can you tell me what I can expect?

Will its effect be a "lack of" data? Meaning, if these bots are excluded, (a) my logs will be smaller, (b) fewer email harvesters will visit, leading to less junk email, and (c) there will be less load on the server. Have I got its benefits right?

toolman




msg:441838
 6:07 pm on Jan 9, 2002 (gmt 0)

You can expect a slight performance hit on your server...nothing major.

I really don't worry too much about email harvesters as I don't put email addresses on my site. The ones that irritate me are the site rippers. This is the latest version.

I know it could be shortened so if you're a unix geek please quit snickering and help us on the regex stuff. Thanks for your support ;)

RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^MSFrontPage [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*Indy [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^InternetSeer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^Ping [OR]
RewriteCond %{HTTP_USER_AGENT} ^Link [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
RewriteRule ^.* - [F]
RewriteCond %{HTTP_REFERER} ^http://www.iaea.org$
RewriteRule !^http://[^/.]\.yo-do-main.net.* - [F]

rcjordan




msg:441839
 6:22 pm on Jan 9, 2002 (gmt 0)

>expect

I installed TM's htaccess about 2 months ago, along with a trial run of a script to email me when one of these tripped an error code. Luckily, I decided to run it on a single site rather than 40 of them. I was deluged by error notifications; I had to repoint it to an error form to save my inbox. Expect to be surprised.

BTW, I now have it on all sites and server performance does seem to be slightly improved.

bird




msg:441840
 6:58 pm on Jan 9, 2002 (gmt 0)

RewriteRule !^http://[^/.]\.your-site.com.* - [F]

  • ! If the requested URL is NOT of the following form:

    1. ^ directly at the beginning of the string
    2. http:// this string literally
    3. [^/.] one character that is not a slash or a dot (probably meant to read [^/.]+ for "one or more of those")
    4. \. a literal dot (escaped)
    5. your-site.com this string literally (almost, as the unescaped dot will match any arbitrary character)
    6. .* any trailing characters (or none)

  • - don't rewrite the URL
  • [F] return a "403 forbidden" to the client

This means that the rule would theoretically be applied to all requests that ask your server for a page from a different domain than "your-site.com", given that they show the www.iaea.org referrer. In other words, the pattern probably doesn't do what its author had in mind.

Reality, however, is slightly different. ;) The string passed to the RewriteRule only contains the path component of the URL without the hostname. This is the reason why the technically pointless pattern still gives the desired result and simply denies any request where the RewriteCond matches. The rule will by definition never see a string that starts with "http://", but only the URL path (and inside a .htaccess file even the leading "/" is stripped before matching). Since the negated pattern can never match, the rule always fires.

If in doubt, I'd simply lump the RewriteCond for iaea together with the others in the upper list and get rid of the second RewriteRule. The "^.*" of the first RewriteRule achieves the same result in a much simpler way, by saying "apply this rule to URLs that contain any sequence of characters, or none".
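
A minimal sketch of that consolidation, with the user agent list shortened to three entries for brevity. Escaping the dots in the referrer and dropping the trailing "$" (so that referrers with a path after the host name also match) are assumptions added here, not part of the original file:

```apache
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector [OR]
# The referrer test joins the same OR chain; the last condition takes no [OR].
RewriteCond %{HTTP_REFERER} ^http://www\.iaea\.org
# One rule for the whole list: matches any path and returns 403 Forbidden.
RewriteRule .* - [F]
```

Every condition except the last one carries [OR]; forgetting that on a newly added line turns the chain into an AND at that point and the block silently stops working.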

Crazy_Fool




msg:441841
 12:00 am on Jan 12, 2002 (gmt 0)

toolman
i found a few new UAs in my logs for the last couple of months. don't know much about them but you might like to keep an eye on them in case they are pests. i've posted the list in the spider identification forum at [webmasterworld.com...]

DrOliver




msg:441842
 11:00 am on Mar 5, 2002 (gmt 0)

Hi all, this is my first post, and it is a question...

I still don't get it. Do I have to replace "your-site.com" and/or "http://www.iaea.org" with my actual URL or do I leave this as it is?

This is a snippet of the code:
RewriteCond %{HTTP_REFERER} ^http://www.iaea.org$
RewriteRule !^http://[^/.]\.your-site.com.* - [F]

I hope I will be able to deliver some solutions to other topics in return soon, as I am mostly a designer and quite good in X/HTML and CSS, rather than in programming and server technologies.

So I'd be happy if anyone could blow away the fog

Brett_Tabke




msg:441843
 9:21 am on Mar 6, 2002 (gmt 0)

Leave it as is. That iaea referrer is part of some abusive bot that we've all banned. It uses iaea as a referrer. You will find it coming in from all kinds of IPs in Southeast Asia - easiest to ban the referrer.

DrOliver




msg:441844
 2:52 pm on Mar 6, 2002 (gmt 0)

RewriteRule !^http://[^/.]\.your-site.com.* - [F]

Thanx for the info. Still I am not quite sure about the quoted line above. Do I leave this also as it is or do I replace "your-site.com" with my actual URL? Sorry if it sounds like I'm stupid...

gethan




msg:441845
 2:55 pm on Mar 6, 2002 (gmt 0)

your-site.com is replaced by your domain name.

Welcome to WmW

Brett_Tabke




msg:441846
 2:57 pm on Mar 6, 2002 (gmt 0)

Yep, I see - that was a 2 parter. Thanks Gethan.

DrOliver




msg:441847
 3:32 pm on Mar 6, 2002 (gmt 0)

Okay, got that. Thanx for the welcome. I do feel better now as I have my first of what I hope to be more useful posts in the "Browsers, HTML, and Web Page Design"-forum.

Superman




msg:441848
 5:27 am on Mar 19, 2002 (gmt 0)
I see most people send their bots to a 403 error page, but since I am more concerned with the so-called "Offline Browsers" or "Sitegrabbers," I redirect them to a gay site that used to SPAM me ...

Here is my current .htaccess file with all the Offline Browsers I've come across so far ...

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^attach [OR]
RewriteCond %{HTTP_USER_AGENT} ^BackWeb [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bandit [OR]
RewriteCond %{HTTP_USER_AGENT} ^BatchFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Buddy [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Copier [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\mailto:craftbot@yahoo.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^DA [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo\Pump [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\Wonder [OR]
RewriteCond %{HTTP_USER_AGENT} ^Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^Drip [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FileHound [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetSmart [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^gotit [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} ^HTTrack [OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^Iria [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC [OR]
RewriteCond %{HTTP_USER_AGENT} ^JustView [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^lftp [OR]
RewriteCond %{HTTP_USER_AGENT} ^likse [OR]
RewriteCond %{HTTP_USER_AGENT} ^Magnet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mag-Net [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^Memo [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mirror [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZip [OR]
RewriteCond %{HTTP_USER_AGENT} ^Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^Pockey [OR]
RewriteCond %{HTTP_USER_AGENT} ^Pump [OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^Reaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Recorder [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Siphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^Snake [OR]
RewriteCond %{HTTP_USER_AGENT} ^SpaceBison [OR]
RewriteCond %{HTTP_USER_AGENT} ^Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport [OR]
RewriteCond %{HTTP_USER_AGENT} ^Vacuum [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\Image\Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website [OR]
RewriteCond %{HTTP_USER_AGENT} ^Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Whacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon
RewriteRule /*$ http://www.yourdomain.com [L,R]

The RewriteRule line can be changed to send the bot to any site you want ...

Feel free to copy it or give me suggestions if there is anything I need to add or remove ...

brotherhood of LAN




msg:441849
 5:32 am on Mar 19, 2002 (gmt 0)

OK I see you are all keen on your HT Access files

I sooooo want to copy and paste what you have all written so far......

"where" does a .htaccess file GO? I don't run my own server, but wouldn't mind getting up to scratch. I recognise some of those user agents from my stats.

keyplyr




msg:441850
 7:17 am on Mar 19, 2002 (gmt 0)

>"where" does a .htaccess file GO?

The file is created as a text file. Name it htaccess.txt and upload to your root directory. Then use your FTP client to rename it .htaccess (notice it starts with a dot)

gethan




msg:441851
 9:14 am on Mar 19, 2002 (gmt 0)

brotherhood_of_LAN - you will need to be running Apache with mod_rewrite enabled to benefit from any of this code. HTH

bird




msg:441852
 4:27 pm on Mar 19, 2002 (gmt 0)

I highly recommend that anyone trying to implement something like this read up on regular expressions, and study the mod_rewrite documentation very carefully. I see lots of advice and many examples here that will never work, often repeated even after having been corrected. Neither topic is trivial, and putting up incorrect rewriting rules may do your site more harm than putting up correct ones will help it. One thing to remember is not to confuse regular expressions with shell wildcards. Those two things work very differently, even if they serve similar purposes.

Just a few examples:

RewriteCond %{HTTP_USER_AGENT} ^Offline\Explorer [OR]

The "\E" sequence is meaningless. What was probably meant is "\ E", with a space between the backslash and the E. The format of the RewriteCond entries is whitespace delimited. This means that if your pattern includes any whitespace, you need to escape it. The sequence "\ " (backslash-space) does exactly this, and avoids the normal "end of pattern" meaning of the whitespace.

RewriteCond %{HTTP_USER_AGENT} ^Siphon [OR]

The "^" always matches the beginning of the search string. So this rule matches any UA that starts with "Siphon...". However, the real UA that you want to catch here starts with "EmailSiphon...", which will not get caught with the above pattern. In short, if you want to match a substring out of the middle of the UA string, don't use the "^".

RewriteCond %{HTTP_USER_AGENT} ^NetZip [OR]

The UA that you want to catch here is "NetZIP" (or at least you also want to catch that one). However, in the normal case, the RewriteCond pattern will perform a case-sensitive match. If you want to get case-insensitive matches, use the NoCase flag: [NC,OR] instead of just [OR]

RewriteRule /*$ [yourdomain.com...] [L,R]

The "/*" sequence has the meaning of "zero or more slashes". Is this really what was intended? The correct pattern for this situation has been outlined several times in this thread.

And finally, I'd like once again to emphasize the most important advice that I can give in this context: Don't use any rewrite rules on your site that you don't understand yourself in all their consequences. Mod_rewrite is a very powerful tool, but also a very dangerous one.
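
The four points above can be sketched as a short, corrected rule set. This is an illustration only, reduced to the three user agents discussed; www.example.com is a placeholder for whatever external site the bots should be sent to:

```apache
RewriteEngine on
# "\ " (backslash-space) keeps the space inside the pattern.
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
# No leading ^, so "Siphon" matches anywhere, including "EmailSiphon".
RewriteCond %{HTTP_USER_AGENT} Siphon [OR]
# NC ignores case, so NetZip, NetZIP and netzip all match.
RewriteCond %{HTTP_USER_AGENT} ^NetZip [NC]
# .* matches every request path; the target host is a placeholder.
RewriteRule .* http://www.example.com/ [R,L]
```

Replacing the last line with "RewriteRule .* - [F]" returns a 403 instead of redirecting. Either way, test with a spoofed user agent before deploying, and make sure a redirect target is never the protected site itself, or the banned client will loop.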

Superman




msg:441853
 9:54 pm on Mar 19, 2002 (gmt 0)

Bird,

Nice catch on the \space ... you are right of course. I converted it from having a . in place of the whitespace and left out the trailing space after the \

As far as the rewrite ... this format works perfectly. It redirects the Offline Browser to the new page every time ... maybe it is technically incorrect, but it does what it is intended to.

The /* is used in every RewriteRule I've ever seen when redirecting to another site, so it must be there for a reason? ... the $ is sometimes left out. I don't know the technicalities of it all, but I know it works.

Perhaps the confusion is in the fact that the page I was redirecting to was changed by the moderator ... it should not be "www.yoursite.com", it should be "www.site-you-are-sending-the-bot-to.com"

I've tested this .htaccess with many of the Offline Browsers on the list. For the ones I did not download and install, I tested the UA name in Teleport Pro's Agent spoofer field .. It blocks and redirects as advertised.

Obviously the list of UAs can be modified/replaced with whatever agents you wish ... these are just the ones that have shown up in my logs.

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^attach [OR]
RewriteCond %{HTTP_USER_AGENT} ^BackWeb [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bandit [OR]
RewriteCond %{HTTP_USER_AGENT} ^BatchFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^Buddy [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Copier [OR]
RewriteCond %{HTTP_USER_AGENT} ^DA [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo\ Pump [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Wonder [OR]
RewriteCond %{HTTP_USER_AGENT} ^Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^Drip [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FileHound [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetSmart [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^gotit [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} ^HTTrack [OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^Iria [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC [OR]
RewriteCond %{HTTP_USER_AGENT} ^JustView [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^lftp [OR]
RewriteCond %{HTTP_USER_AGENT} ^likse [OR]
RewriteCond %{HTTP_USER_AGENT} ^Magnet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mag-Net [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^Memo [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mirror [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZip [OR]
RewriteCond %{HTTP_USER_AGENT} ^Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^Pockey [OR]
RewriteCond %{HTTP_USER_AGENT} ^Pump [OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^Reaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Recorder [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Siphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^Snake [OR]
RewriteCond %{HTTP_USER_AGENT} ^SpaceBison [OR]
RewriteCond %{HTTP_USER_AGENT} ^Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Vacuum [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website [OR]
RewriteCond %{HTTP_USER_AGENT} ^Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Whacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon
RewriteRule /*$ [site-you-are-sending-the-bot-to.com...] [L,R]

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved