Perl Server Side CGI Scripting Forum

This 243 message thread spans 9 pages; this is page 2.
A Close to perfect .htaccess ban list
toolman
msg:441824 - 3:30 am on Oct 23, 2001 (gmt 0)

Here's the latest rendition of my favorite ongoing artwork....my beloved .htaccess file. I've become quite fond of my little buddy, the .htaccess file, and I love the power it gives me to exclude vermin, pestoids, and undesirable entities from my web sites.

Gorufu, littleman, Air, SugarKane? You guys see any errors or better ways to do this....anybody got a bot to add....before I stick this in every site I manage?

Feel free to use this on your own site and start blocking bots too.

(the top part is left out)

<Files .htaccess>
deny from all
</Files>
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
RewriteRule ^.* - [F]
RewriteCond %{HTTP_REFERER} ^http://www.iaea.org$
RewriteRule !^http://[^/.]\.your-site.com.* - [F]

 

bird
msg:441854 - 10:59 pm on Mar 19, 2002 (gmt 0)

>The /* is used in every RewriteRule I've ever seen when redirecting to another site, so it must be there for a reason?

You might want to look at "every RewriteRule" again. The standard pattern for matching everything is "^.*$", where the "." stands for "any arbitrary character", and ".*" stands for "zero or more arbitrary characters". The "^" always matches the beginning of the test string, and the "$" the end. You are right that "/*" (= "zero or more slashes") has the same effect in the end, but semantically it doesn't make much sense. You're not confusing the computer here, you're confusing yourself... ;)

>For the ones I did not download and install, I tested the UA name in Teleport Pro's Agent spoofer field .. It blocks and redirects as advertised.

The pattern "^Siphon" will NOT match "EmailSiphon". It can't. And this is one of the most popular address harvesters around, so you really want to get it right. There may be other examples like this in your list, but for most of them you have defined redundant rules that still catch them.

Just for your information, you have several entries in your list that block legitimate search engine spiders or human operated web browsers. Others are for software that people use to check if the links to your site that they have placed on their pages are still valid. Not everything that doesn't start with "Mozilla..." is automatically a rogue robot.

Key_Master
msg:441855 - 11:37 pm on Mar 19, 2002 (gmt 0)

>>>Not everything that doesn't start with "Mozilla..." is automatically a rogue robot.

In fact some of the WORST bots I have encountered use normal Mozilla agents.

The days when rogue bots identify themselves are dwindling. These types of agents are being blocked more often as webmasters learn what these bots are up to and how easy it is to block them. Software developers are catching on and quietly making their products more evasive.

Superman
msg:441856 - 11:49 pm on Mar 19, 2002 (gmt 0)

Bird, thanks for the clarification. I will change the rewrite.

Siphon showed up in my logs as just that "Siphon", so that's what is blocked. I will add EmailSiphon as well.

oilman
msg:441857 - 12:33 am on Mar 20, 2002 (gmt 0)

>I see lots of advice and many examples here that will never work, often repeated even after having been corrected. Both topics are not trivial at all, and putting up incorrect rewriting rules may do your site more harm than putting up correct ones will help it.

bird - while I appreciate your help, could you maybe point out the other mistakes as well? You say there are many and you make an example of a few but leave the rest of the many just hanging out there. I've learned a ton about the rewrite rules so far and would love to round out my knowledge some more.

keyplyr
msg:441858 - 1:23 am on Mar 20, 2002 (gmt 0)

>Just for your information, you have several entries in your list that block legitimate search engine spiders or human operated web browsers

Bird, would you specify which on Superman's list are "legitimate" or "human operated browsers"?

Thanks

bird
msg:441859 - 3:40 am on Mar 20, 2002 (gmt 0)

>bird - while I appreciate your help, could you maybe point out the other mistakes as well? You say there are many and you make an example of a few but leave the rest of the many just hanging out there.

I didn't really count them, so maybe I should have said "several" instead of "many". I think I already commented on the biggest misunderstandings; maybe I'll have a more systematic go at it. Not today though...

But as I already said: I would strongly advise against using any mod_rewrite rules just because someone (including myself) told you they worked in a certain way. This thing is a Swiss army knife on steroids and with nuclear power (luckily I have first-hand training with original Swiss army knives myself, just not with the nuclear bit). What works on one site may render the next one inaccessible. You don't need to understand all its quirks or every regular expression. Just make sure that you really understand those you use.

>would you specify which on Superman's list are "legitimate" or "human operated browsers"

That will teach me to keep vague statements to myself in the future...
I just went through that list to check those I hadn't heard about yet, so I had to scan my logs for the names. About half of them never visited any of my sites; some turned up with one or two accesses. Then there are the well-known offenders, and this is what remains for me:

Bot mailto:craftbot@yahoo.com - Reads and respects robots.txt, although it fetches the pages a bit fast. At least it tries to be well behaved, though its purpose is unclear.

larbin - A research tool used by several sites, some more legitimate than others. Might be useful to block "@unspecified.mail", as this is the default signature (see the sketch at the end of this post). There are several threads around here about it.
Wget - An all-purpose download tool. Make up your own mind or check your logs for IPs.
SpaceBison - I *think* this is the Proxomitron, a personal filtering proxy.

Pockey - Typical access pattern of a normal browser.
WebWhacker - Either a normal browser, or a single-page downloader (with images). The pattern from Superman's list won't match this one, btw; the full UA is "Mozilla/3.0 (WebWhacker)".

WebSauger
FileHound
FlashGet - All signs of download managers (only fetch special files).

RealDownload
SmartDownload - Download managers/plug-ins (only fetch PDF files).

Obviously, those are simply the conclusions I draw from what I find in my own log files. Your mileage may vary. I was too lazy to dig for old threads that might contain information about some of them, so if in doubt, better do that yourself... ;)
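
Following up on the larbin note above, a condition matching that default signature anywhere in the User-Agent string would look something like this (a sketch only; pair it with whatever blocking rule you already use, and adjust to taste):

# catch larbin's default "@unspecified.mail" signature anywhere in the UA
RewriteCond %{HTTP_USER_AGENT} @unspecified\.mail [NC]
RewriteRule ^.*$ - [F]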

Superman
msg:441860 - 5:43 am on Mar 20, 2002 (gmt 0)
I've now pretty much verified everything on my list, tweaked it, and removed the redundancies. Ninety percent of the list are Offline Browsers.

A few things were left unidentified, such as craftbot, but I put it there because nobody anywhere can say what it does ... I don't like such things poking around my site. A few months ago it was hitting me every day, although I have not seen it in a while.

larbin is a bot that can be downloaded and used by anybody, and it can be configured to do bad things. It often shows up followed by any number of different email addresses.

WebWhacker is an offline browser for sure; my logs have it id'd as "WebWhacker"

WebSauger is a German version of HTTrack, which is a full-on Offline Browser.

Some things on the list are truly evil ... VoidEYE is a hacker tool to exploit vulnerabilities in CGI scripts.

tAkeOut was hard to ID ... the only references to it were on Japanese pages ... I finally got a screenshot of it, and it looks just like an Offline Browser.

I'm not going to go through all of them, but everything on the list was researched at the following sites, and if you have a question about something you will find it on one of them:

http://www.jafsoft.com/searchengines/webbots.html

http://www.zdnet.com/downloads/

http://webmasterfolder.com/learningfolder/bandwidth.phtml

http://google.com

One thing to remember about blocking the Offline Browsers is that virtually all of them let you change the Agent type to any number of different things, so blocking them by name can be easily circumvented. Still, 90 percent of the time surfers are only after images when they use offline browsers, so it's a good idea to block them anyway, especially if you have a site with a lot of images.

This is my htaccess as it now stands:

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} ^HTTrack [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Siphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.*$ http://www.site-where-you-want-to-send-the-bot [L,R]

keyplyr
msg:441861 - 6:32 am on Mar 20, 2002 (gmt 0)
>RewriteRule ^.*$ http://www.site-where-you-want-to-send-the-bot [L,R]

It's not my intention to marathon this thread, but would someone remind me why I would wish to lay these pests on someone else?

Superman
msg:441862 - 7:01 am on Mar 20, 2002 (gmt 0)

I do it because the Offline Browser people irritate me ... they are bandwidth hogs simply out for pictures. Plus I send them to a gay site that used to spam me all the time ... the irony being that the spammer gets hit, and the Offline Browser losers get pics of nude men. Whatever, it amuses me.

Of course you can change the redirect to just give them a standard error page.

RewriteRule ^.* - [F]

Crazy_Fool
msg:441863 - 11:01 pm on Mar 20, 2002 (gmt 0)

So, if I want to use toolman's or Superman's code to keep bots etc. out, all I need to do is save the code above in my root as .htaccess? The file starts with "RewriteEngine on" and finishes with the rewrite rule? Nothing before or after?

And what about "RewriteBase /" as used on toolman's second line? Is that necessary? Superman hasn't used it ....

Macguru
msg:441864 - 2:12 am on Mar 21, 2002 (gmt 0)

>>why I would wish to lay these pests on some one else?

Because spammers spamming each other is funny.

Edge
msg:441865 - 6:01 pm on Mar 21, 2002 (gmt 0)

And I thought I had a huge .htaccess file, Superman. Looks like mine will be as long as yours...

I have been thinking for some time now that I should implement a script that limits the number of page views for a unique visitor. This script would help stop all those folks who change their user agent name to grab an offline copy of my website, as well as other unidentified email bots etc. It would still allow all the good spiders and other similar folks all the access they want. Currently, my average visitor views about 5.9 pages per visit. With this knowledge I could limit a visit to, say, 20 page views a day before I redirect them. The script would redirect to a "Become a Member" page that would require registration. All the registered folks would be allowed unlimited access.

What am I missing?

Superman
msg:441866 - 9:57 pm on Mar 21, 2002 (gmt 0)

Crazy_Fool,

The script works perfectly as is ... I'm certainly no expert on htaccess, but I've tested it extensively while implementing others' ideas into mine. I've honestly never seen the RewriteBase / anywhere but here ... maybe it is technically correct, I don't know. It works fine without it though.

I have learned that there are multiple ways to do these things, and also that the slightest error in the file can screw up everything. For example, I once left out the space before the [OR] on one of the lines and the script did not block anything.

Edge,

That is actually only a small portion of my htaccess ... I have another one in my images folder to prevent people hotlinking my pics, another one in my members directory for password authentication and that blocks many IP addresses hackers have used to try to bust in ... all proxy servers.
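A hotlink block of that sort usually looks something like the sketch below (placed in the images directory's .htaccess; example.com is a placeholder for your own domain, and the extension list is only illustrative):

RewriteEngine On
# allow requests with no referrer (direct requests, some proxies)
RewriteCond %{HTTP_REFERER} !^$
# allow your own pages; example.com is a placeholder
RewriteCond %{HTTP_REFERER} !^http://(www\.)?example\.com [NC]
RewriteRule \.(gif|jpe?g|png)$ - [F]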

I like your idea, but I would not know how to implement it ... it's a good idea though.

Bogglesworld
msg:441867 - 12:15 pm on Mar 22, 2002 (gmt 0)

Superman, tool, et al. Thanks a bunch. I went ahead and did everything you recommended here. Final question:
How can I check whether I did it correctly?
So far nothing seems wrong anyway?

To WebmasterWorld: Thanks. I have made good use of your glossary and the forums. What do you think of a "dictionary of spiders"? Or is that too much work for something that is not really necessary? I used Superman's list myself, but I wonder if there are any that I shouldn't have blocked?

Edge
msg:441868 - 1:58 pm on Mar 22, 2002 (gmt 0)

Bogglesworld,

Try Teleport Pro; you can change the user agent to anything you want to test your site. I suggest that you first test your site with a successful download before you try a blocked download.

toolman
msg:441869 - 3:31 pm on Mar 22, 2002 (gmt 0)

>>>>How can I check whether I did it correctly?

I like SamSpade.org...a common tool lots of us use here. It will let you change your UA so you can test, and also do HEAD requests to see what kind of server someone is running on. A handy tool to use for diagnosing server troubles as well.

WOW. This has really bloomed. I'm glad to see so many people finding the rewrite script handy. I can't really take credit for it, as I just pieced together what littleman, Air, Gorufu and others posted for individual situations. I'm not so concerned with the email bots as I have no email addresses in my sites...except for that IndyLibrary one. That thing will shred your site in 2 seconds flat.

I'm more concerned with things like FrontPage and other "theft" bots. One of the really cool aspects of that script is the blocking of that annoying iaea.org screen scraper or whatever it is. I don't think we've really figured out precisely what it is doing (it's certainly raised my awareness of atomic issues ;) ).

I'm just like the rest of you...learning regex as I go. It's a good time for some of the *nix geeks to shine. This has really brought out one of the strengths of WMW....the collective experience of webmasters pitching in to achieve a common goal.

Visi
msg:441870 - 4:26 am on Mar 24, 2002 (gmt 0)

New to this, but I have a "website quester" hitting my site a lot, always at the same time. Is this a bot, or someone downloading the site? Any advice appreciated, and also some direction on a good reference site about the robots file, if one exists, so I can learn about it.

Thanks

Superman
msg:441871 - 5:20 am on Mar 24, 2002 (gmt 0)
That's the offline browser Website Extractor.

http://www.esalesbiz.com/extra/

I'd add it to my htaccess above to block it. It usually shows up in my logs as Website eXtractor, but I see others get it as Website Quester ... simply blocking all agents beginning with Website will take care of it.

RewriteCond %{HTTP_USER_AGENT} ^Website [OR]

knight
msg:441872 - 8:33 pm on Apr 23, 2002 (gmt 0)

What is the Microsoft.URL agent and how is it harmful?

Brendan

richlowe
msg:441873 - 10:45 pm on Apr 23, 2002 (gmt 0)

Anyone know how to do this with IIS?

Visi
msg:441874 - 12:30 am on Apr 24, 2002 (gmt 0)

Toolman, tried out the script, but it doesn't block out the FrontPage ripper? Also tried it on your index page to see if it's just me. Oops, can download your profile page too? Any ideas here?

Thanks

Key_Master
msg:441875 - 12:46 am on Apr 24, 2002 (gmt 0)

Toolman, I nominate the website listed in your profile for "Most Obnoxious Site On The Net". Too bad it can't be added to .htaccess as punishment for bad spiders. :)

hanuman
msg:441876 - 2:52 am on Jun 10, 2002 (gmt 0)

RewriteCond %{HTTP_USER_AGENT} ^WebZIP

I am afraid that this line won't work anymore.
WebZip can be configured to show different UA identities.

Does anyone know how to block WebZip? I lose over 1 gig/month of bandwidth to WebZippers :(

Nick_W
msg:441877 - 6:46 pm on Jun 10, 2002 (gmt 0)

Okay toolman

RewriteCond %{HTTP_REFERER} ^http://www.iaea.org$
RewriteRule !^http://[^/.]\.yo-do-main.net.* - [F]

Sorry for the dumb questions, but I'm guessing that my site name would be put in instead of iaea.org, and that yo-do-main.net is some site that doesn't exist?

Could you clarify for me before I add this to my .htaccess?

Many thanks

Nick

toolman
msg:441878 - 8:30 pm on Jun 10, 2002 (gmt 0)

That is blocking that annoying spider that uses the iaea.org site as its referring URL. The yo-domain part should be changed to... well, yo domain. ;)

You can leave this out altogether...I like to give that bot a 403, and Alexa too. I never asked them to stick me in their DBs in the first place.
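
For what it's worth, the negated pattern in that RewriteRule never matches a local path (request paths don't start with "http://"), so the rule effectively fires for every request whose referrer is the iaea.org front page. A plainer way to write roughly the same block would be:

# forbid any request referred by the iaea.org front page
RewriteCond %{HTTP_REFERER} ^http://www\.iaea\.org/?$ [NC]
RewriteRule ^.*$ - [F]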

Nick_W
msg:441879 - 8:33 pm on Jun 10, 2002 (gmt 0)

Right, got the yo domain bit ;)

But what is the iaea.org bit for? How do we know the spider is coming from there?

I'm more than a little confused on that bit ;-)

Nick

n7qvc
msg:441880 - 3:25 pm on Jun 28, 2002 (gmt 0)
OK, I'm not sure what I have done.. but I added this to my .htaccess file..

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} ^HTTrack [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Siphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.*$ http://www.valli.com[L,R]

and I get a misconfiguration error 500.
I can remove this information and resend it to the server and everything works fine, so it's not the way I save it. I used Notepad. I did read that your server must support this code. How can I know if it does.. Love the forum.. been reading a lot here.. Thank you WebmasterWorld..

Key_Master
msg:441881 - 5:23 pm on Jun 28, 2002 (gmt 0)

You can't escape a space. Try placing the user agent within quotation marks.

Example:

RewriteCond %{HTTP_USER_AGENT} "^Web Sucker" [OR]

Superman
msg:441882 - 3:51 am on Jun 29, 2002 (gmt 0)

N7QVC,

Ignore what Key_Master said. Your .htaccess is fine. I tested it on my server and it worked perfectly.

Your server needs to be running Apache for .htaccess to work.

-Superman-

Superman
msg:441883 - 4:05 am on Jun 29, 2002 (gmt 0)
On second thought, your RewriteRule line is messed up:

RewriteRule ^.*$ http://www.valli.com[L,R]

It should be:

RewriteRule ^.*$ http://www.valli.com [L,R]

Note the space between the URL and [L,R]

I don't know if that will make a difference, as it doesn't on my server, but that could be the problem.

-Superman-

Pushycat
msg:441884 - 4:45 am on Jun 29, 2002 (gmt 0)

>richlowe asked
>Anyone know how to do this with IIS?

I test for these agents in global.asa in the Session_OnStart event and send them to an explanation page that has no links it can follow.

Then I use a browscap.ini file, which you can get from my website, that has a special section for website strippers and other nasties.

You can get this browscap.ini file and soon some sample code from my personal website.
