A Close to perfect .htaccess ban list
toolman

Msg#: 687 posted 3:30 am on Oct 23, 2001 (gmt 0)

Here's the latest rendition of my favorite ongoing artwork... my beloved .htaccess file. I've become quite fond of my little buddy, the .htaccess file, and I love the power it gives me to exclude vermin, pestoids and undesirable entities from my web sites.

Gorufu, littleman, Air, SugarKane? You guys see any errors or better ways to do this... anybody got a bot to add... before I stick this in every site I manage?

Feel free to use this on your own site and start blocking bots too.

(the top part is left out)

<Files .htaccess>
deny from all
</Files>
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
RewriteRule ^.* - [F]
RewriteCond %{HTTP_REFERER} ^http://www.iaea.org$
RewriteRule !^http://[^/.]\.your-site.com.* - [F]

 

Edge

Msg#: 687 posted 2:09 pm on Sep 21, 2002 (gmt 0)

When FrontPage first accesses a web site, the file _vti_inf.hmtl is requested. I set up a trap script via SSI in the html file (_vti_inf.hmtl); search for trap.pl on WebmasterWorld.

The trap.pl script blocks their IP address from further access to your website. This is very safe since "_vti_inf.hmtl" is only requested by FrontPage.

Works great!
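For anyone wiring this up, here is a minimal sketch of the idea; the paths and the SSI setup are my assumptions (Apache 1.3 style syntax), not the exact trap.pl setup from the other thread:

# .htaccess: have Apache parse .html files for SSI, so the decoy page can run the trap
Options +Includes
AddHandler server-parsed .html

# inside _vti_inf.html, an SSI directive invokes the trap script (hypothetical path):
# <!--#include virtual="/cgi-bin/trap.pl" -->

# trap.pl would then append a line like this to .htaccess for the offending IP:
# deny from 192.0.2.1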

mundonet

Msg#: 687 posted 8:36 pm on Sep 21, 2002 (gmt 0)

Edge: what if we are using FP to upload (Publish)? Doesn't FP request the file to determine what needs updating? Will we ban ourselves?

stapel

Msg#: 687 posted 11:40 pm on Sep 21, 2002 (gmt 0)

"Edge" said:
When Frontpage first accesses a web site, the file _vti_inf.hmtl is requested.

I'm still waiting to hear from my host (this being the weekend) about whether "mod_rewrite" is available to me, but, in the meantime, I know that "Redirect" works. So could I do a Redirect, something like:

Redirect /_vti_inf.hmtl [purplemath.com...]

...to get rid of the FrontPage bums?

-----ten minutes later-----

I just tried the above line in my .htaccess file, and FrontPage was still able to download whatever it wanted from Purplemath into one of my other "webs". *sigh*

So tell me more about this script thingy...?

Edge

Msg#: 687 posted 12:04 am on Sep 22, 2002 (gmt 0)

Oops, did I say "_vti_inf.hmtl"? I really meant "_vti_inf.html".

Sorry about that.

stapel

Msg#: 687 posted 12:49 am on Sep 22, 2002 (gmt 0)

Duh! I didn't even notice the misspelling when I did the cut-n-paste.

But I just tried again, using the proper spelling, and it still didn't work.

Oh, well. About that script you mentioned...?

carfac

Msg#: 687 posted 4:28 pm on Sep 22, 2002 (gmt 0)

For those of you who have multiple domains, want this in httpd.conf instead of .htaccess, and have root access, I have a solution!

First, install Apache::BlockAgents for each VH, and have them all point to the same bad_agent.txt; that way you only have one file to update for all hosts. (Note that all copies of this I have found on the web have Perl errors in them; you will have to tweak that code to make it work at all.)
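A rough sketch of the per-VH wiring in httpd.conf; the PerlSetVar name and the paths here are from memory, so check them against your copy of the module:

<VirtualHost 10.0.0.1>
ServerName www.example.com
# every vhost points at the same shared agent list
PerlSetVar BlockAgentFile /usr/local/apache/conf/bad_agent.txt
<Location />
PerlAccessHandler Apache::BlockAgent
</Location>
</VirtualHost>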

Then, make a copy of BlockAgents, modify the code a bit to handle IPs instead of agents, rename it BlockIP (or something!) and make a master bad_ip.txt file.

Third, get that trap.pl script, and modify it to write to bad_ip.txt rather than .htaccess. I further modified trap.pl so it date/time stamps each entry, so I can clean it out every week.

This method is REALLY fast, and painless once set up (although set-up is a B***H!). It will work across all your VHs, and if someone gets to one VH, they get locked out of all of them!

dave

bull

Msg#: 687 posted 12:32 pm on Sep 25, 2002 (gmt 0)

RewriteCond %{HTTP_USER_AGENT} httrack [OR]

won't always work. I had this one today; it grabbed some hundred pages from my beloved site:

p5084d1b1.dip.t-dialin.net - - [25/Sep/2002:13:34:40 +0200] "GET /_omitted.htm HTTP/1.0" 200 2373 www.mydomain.net "_omitted.htm" "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)" "-"

So, this might be better as far as I can see:
RewriteCond %{HTTP_USER_AGENT} .*httrack.* [NC,OR]
Besides, HTTrack seems to respect robots.txt.
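If so, a plain robots.txt entry may be enough for the well-behaved copies (the user-agent token here is my guess):

User-agent: HTTrack
Disallow: /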

1host

Msg#: 687 posted 3:06 am on Sep 26, 2002 (gmt 0)


I guess none of the code discussed here will work without mod_rewrite. What alternative is there if my server doesn't have mod_rewrite installed?

I'd really like to block these fiends as well :)

thx
tom

andreasfriedrich

Msg#: 687 posted 1:25 pm on Sep 26, 2002 (gmt 0)

So, this might be better as far as I can see:
RewriteCond %{HTTP_USER_AGENT} .*httrack.* [NC,OR]

You are right in mentioning that the matching should be case insensitive (the NC flag). The '.*', however, is not necessary, since in

RewriteCond %{HTTP_USER_AGENT} httrack [NC,OR]
the pattern is not anchored anywhere (start|end of string). The engine will try to match the pattern anywhere in the string.

With '^httrack' the pattern is anchored at the beginning, with 'httrack$' at the end of the string. If you anchored your pattern at both the start and the end, you would need the '.*' to match httrack in a string that is not just 'httrack'; the pattern would need to look like this: '^.*httrack.*$'. Note that this pattern does not make sense unless you wanted to capture the substrings before and after the httrack.

To sum up, here is a chart of how the four options mentioned above fare against three sample strings ('+' = matches, '-' = does not). NA = not anchored; BA = anchored at beginning; EA = anchored at end; MS = modified suggestion.

1. achttrackac (NA: +; BA: -; EA: -; MS: +;)
2. htTracKacac (NA: +; BA: +; EA: -; MS: +;)
3. aacaHttrack (NA: +; BA: -; EA: +; MS: +;)

Note that '.*' will also match the empty string '', since the quantifier '*' greedily matches 0 or more times.
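In RewriteCond terms, these two lines are therefore equivalent, which is why the '.*' buys you nothing:

RewriteCond %{HTTP_USER_AGENT} httrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*httrack.*$ [NC,OR]

The second form only makes the regex engine do extra work for the same matches.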

Andreas

58sniper

Msg#: 687 posted 1:44 pm on Sep 26, 2002 (gmt 0)

Okay, this brings up a somewhat related question...

I'm trying to ban one site from getting to me. I want to redirect to a page called /robots.php

So I tried this:

rewriteEngine On
rewriteCond {HTTP_REFERER} ^http://(www\.)?domain.com [NC,OR]
RewriteRule ^.*$ /robots.php [L]

but that seems to block everyone. What am I doing wrong?

andreasfriedrich

Msg#: 687 posted 2:02 pm on Sep 26, 2002 (gmt 0)

what alternative is there if my server doesn't have mod_rewrite installed?

You could use mod_access and mod_setenvif which are compiled and loaded into the server by default. They should be available unless you or your hosting company removed them.

Deny [httpd.apache.org] is used to restrict access to the server based on hostname, IP address, or environment variables. Hostname and IP won't help here, so we need a way to set environment variables depending on the User-Agent. SetEnvIf [httpd.apache.org] allows us to do just that. Preferably we would like the matching to be case insensitive. Luckily the Apache developers provided a directive to do just that: SetEnvIfNoCase [httpd.apache.org].

Now we need to put those pieces together.

SetEnvIfNoCase User-Agent EmailSiphon AC_FORBIDDEN
SetEnvIfNoCase User-Agent EmailWolf AC_FORBIDDEN
SetEnvIfNoCase User-Agent Crescent AC_FORBIDDEN
SetEnvIfNoCase User-Agent LinkWalker AC_FORBIDDEN
SetEnvIfNoCase User-Agent EmailCollector AC_FORBIDDEN
Order Allow,Deny
Allow from all
Deny from env=AC_FORBIDDEN

As with the regular expression in the RewriteCond directive you could just use one SetEnvIfNoCase [httpd.apache.org] like this:

SetEnvIfNoCase User-Agent EmailSiphon|EmailWolf|Crescent|LinkWalker|EmailCollector AC_FORBIDDEN
Order Allow,Deny
Allow from all
Deny from env=AC_FORBIDDEN

where everything from SetEnvIfNoCase to AC_FORBIDDEN would need to be in a single line.

Andreas

andreasfriedrich

Msg#: 687 posted 2:20 pm on Sep 26, 2002 (gmt 0)

What am I doing wrong?

Lose the OR flag and add a % sign in front of {HTTP_REFERER}.
You don't need the pattern in the RewriteRule to be anchored.

RewriteEngine On
RewriteCond %{HTTP_REFERER} ^http://(www\.)?domain.com [NC]
RewriteRule .* /robots.php [L]

Andreas

58sniper

Msg#: 687 posted 2:45 pm on Sep 26, 2002 (gmt 0)

OK. I had figured out the OR issue myself, but the % is what got it to work correctly.

Thanks!

58sniper

Msg#: 687 posted 3:56 pm on Sep 26, 2002 (gmt 0)

On to the next issue -

Can anyone tell me what "RewriteCond: bad flag delimiters" means (other than the obvious)? As soon as I plug the following into my .htaccess, I get 500 errors, and "RewriteCond: bad flag delimiters" shows up in the error_log.

RewriteCond %{HTTP_USER_AGENT} ^Mozilla* [OR]
RewriteCond %{HTTP_USER_agent} .*almaden.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^Anarchie [OR]
RewriteCond %{HTTP_USER_agent} ^ASPSeek [OR]
RewriteCond %{HTTP_USER_agent} ^attach [OR]
RewriteCond %{HTTP_USER_agent} ^autoemailspider [OR]
RewriteCond %{HTTP_USER_agent} ^BackWeb [OR]
RewriteCond %{HTTP_USER_agent} ^Bandit [OR]
RewriteCond %{HTTP_USER_agent} ^BatchFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]
RewriteCond %{HTTP_USER_agent} ^Buddy [OR]
RewriteCond %{HTTP_USER_agent} ^bumblebee [OR]
RewriteCond %{HTTP_USER_agent} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_agent} ^CICC [OR]
RewriteCond %{HTTP_USER_agent} ^Collector [OR]
RewriteCond %{HTTP_USER_agent} ^Copier [OR]
RewriteCond %{HTTP_USER_agent} ^Crescent [OR]
RewriteCond %{HTTP_USER_agent} ^DA [OR]
RewriteCond %{HTTP_USER_agent} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_agent} ^DISCo\Pump [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_agent} ^Download\ Wonder [OR]
RewriteCond %{HTTP_USER_agent} ^Downloader [OR]
RewriteCond %{HTTP_USER_agent} ^Drip [OR]
RewriteCond %{HTTP_USER_agent} ^DSurf15a [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_agent} ^EasyDL/2.99 [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_agent} ^EmailCollector [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_agent} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_agent} ^FileHound [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_agent} ^GetSmart [OR]
RewriteCond %{HTTP_USER_agent} ^gigabaz [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go\!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_agent} ^gotit [OR]
RewriteCond %{HTTP_USER_agent} ^Grabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_agent} ^grub-client [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} ^HTTrack [OR]
RewriteCond %{HTTP_USER_agent} ^httpdown [OR]
RewriteCond %{HTTP_USER_AGENT} .*httrack.* [NC,OR]
RewriteCond %{HTTP_USER_agent} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_agent} ^Indy*Library [OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_agent} ^InternetLinkagent [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_agent} ^InternetSeer.com [OR]
RewriteCond %{HTTP_USER_agent} ^Iria [OR]
RewriteCond %{HTTP_USER_agent} ^JBH*agent [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_agent} ^JustView [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_agent} ^LexiBot [OR]
RewriteCond %{HTTP_USER_agent} ^lftp [OR]
RewriteCond %{HTTP_USER_agent} ^Link*Sleuth [OR]
RewriteCond %{HTTP_USER_agent} ^likse [OR]
RewriteCond %{HTTP_USER_agent} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_agent} ^Mag-Net [OR]
RewriteCond %{HTTP_USER_agent} ^Magnet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_agent} ^Memo [OR]
RewriteCond %{HTTP_USER_agent} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_agent} ^Mirror [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_agent} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_agent} ^Mozilla*MSIECrawler [OR]
RewriteCond %{HTTP_USER_AGENT} ^MS\ FrontPage* [OR]
RewriteCond %{HTTP_USER_agent} ^MSIECrawler [OR]
RewriteCond %{HTTP_USER_agent} ^MSProxy [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetMechanic [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_agent} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_agent} ^Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
RewriteCond %{HTTP_USER_agent} ^Openfind [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_agent} ^Ping [OR]
RewriteCond %{HTTP_USER_agent} ^PingALink [OR]
RewriteCond %{HTTP_USER_agent} ^Pockey [OR]
RewriteCond %{HTTP_USER_agent} ^psbot [OR]
RewriteCond %{HTTP_USER_agent} ^Pump [OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
RewriteCond %{HTTP_USER_agent} ^Reaper [OR]
RewriteCond %{HTTP_USER_agent} ^Recorder [OR]
RewriteCond %{HTTP_USER_AGENT} ^QRVA [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Scooter [OR]
RewriteCond %{HTTP_USER_agent} ^Seeker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Siphon [OR]
RewriteCond %{HTTP_USER_agent} ^sitecheck.internetseer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_agent} ^SlySearch [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_agent} ^Snake [OR]
RewriteCond %{HTTP_USER_agent} ^SpaceBison [OR]
RewriteCond %{HTTP_USER_agent} ^Stripper [OR]
RewriteCond %{HTTP_USER_agent} ^Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_agent} ^Szukacz [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_agent} ^URLSpiderPro [OR]
RewriteCond %{HTTP_USER_agent} ^Vacuum [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_agent} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web Downloader [OR]
RewriteCond %{HTTP_USER_agent} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebHook [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebMiner [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebMirror [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_agent} ^Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_agent} ^Whacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_agent} ^x-Tractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_agent} ^Xenu [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.*$ /robots.php [L]

58sniper

Msg#: 687 posted 4:38 pm on Sep 26, 2002 (gmt 0)

I believe part of the problem is with:
RewriteCond %{HTTP_USER_AGENT} .*almaden.* [OR]

So I changed it to:
RewriteCond %{HTTP_USER_AGENT} almaden [OR]

I also determined that some of the problem was with:
RewriteCond %{HTTP_USER_AGENT} ^Web Downloader [OR]
The space wasn't escaped.
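The escaped form would presumably be:

RewriteCond %{HTTP_USER_AGENT} ^Web\ Downloader [OR]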

This appears to have resolved the problems.

bull

Msg#: 687 posted 5:25 pm on Sep 26, 2002 (gmt 0)

RewriteCond %{HTTP_USER_agent} ^EmailCollector [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_agent} ^EmailWolf [OR]

so IMHO can be reduced to

RewriteCond %{HTTP_USER_agent} email [NC,OR]

as I don't see any legitimate UA with "email" in its name.

58sniper

Msg#: 687 posted 5:58 pm on Sep 26, 2002 (gmt 0)

Yeah, I'm going to consolidate. I can probably do the same with "download", "grab", "bot" and "spider".

andreasfriedrich

Msg#: 687 posted 6:09 pm on Sep 26, 2002 (gmt 0)

GoogleBot :o

Andreas

58sniper

Msg#: 687 posted 8:05 pm on Sep 26, 2002 (gmt 0)

Ya know, this has got me thinking....

Wouldn't it be easier to just write the .htaccess on what to allow, instead of what to deny?

stapel

Msg#: 687 posted 8:30 pm on Sep 26, 2002 (gmt 0)

I think this would require way too many listings of what to allow; and what would happen when a new browsing product came out that you didn't know about yet?

As depressingly long as these "deny" lists can get, I think they're still the better way to go.

Eliz.

andreasfriedrich

Msg#: 687 posted 8:50 pm on Sep 26, 2002 (gmt 0)

Wouldn't it be easier to just write the .htaccess on what to allow, instead of what to deny?

As with the distinction between rights and liberties in basic human rights, and with the question of your default policy in firewalls, this is a question of what you want to prevail in new cases and in cases of doubt.

Monarchies, even constitutional ones, are often liberties-based: everything is forbidden (not only in the sense of criminal law) until allowed. E.g., this was the case in the UK, which changed slightly with the passing of the Human Rights Act 1998. In countries with rights-based legal systems, everything is allowed until explicitly forbidden. Article 2 of the German constitution states that everybody can do as he pleases. The 10th Amendment to the US constitution states that all powers not delegated to the US or the states remain with the people.

If a new situation arises, it is forbidden in a liberties-based system, while it is allowed in a rights-based one. The former is far more restrictive than the latter, and not in accordance with the image of an autonomous, intelligent individual or prospective purposive agent.

The same goes for the question of whether you want to deny everybody and allow only certain user agents. It is always quite helpful to think about the consequences of an error in your system (Fehlerfolge).

Deny all: if you miss a UA (i.e. you don't allow access), that UA will not be able to access your site.
Allow all: if you miss a UA (i.e. you don't block it), that UA will be allowed.

Which consequence is worse depends on your default policy: would you rather have people see your content or not? In the former case it is better to allow access to a UA that you don't want to allow; in the latter case it is better to block more UAs rather than fewer.

If you care about freedom, be permissive; if you are paranoid, be restrictive.
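To make the restrictive flavour concrete, here is a minimal sketch of an allow-list; the permitted agents are only examples:

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} !^Mozilla [NC]
RewriteCond %{HTTP_USER_AGENT} !^Googlebot [NC]
RewriteRule .* - [F]

The conditions are ANDed, so only a UA matching neither allowed pattern gets the 403.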

Andreas

bull

Msg#: 687 posted 8:47 pm on Oct 1, 2002 (gmt 0)

hi,

testing my htaccess I get the following error (excerpt):

You don't have permission to access /
on this server.<P>
<P>Additionally, a 403 Forbidden
error was encountered while trying to use an ErrorDocument to handle the request

although I specified
ErrorDocument 404 /err404.htm
ErrorDocument 403 /err403.htm

and this text above is not my err403 page. So UA xy also gets a 403 when fetching the error page, I think.

my last lines in htaccess:
RewriteCond %{HTTP_USER_AGENT} ^PortalBSpider
RewriteRule ^.* - [F,L]
RewriteCond %{HTTP_REFERER} ^http://www.iaea.org$
RewriteRule !^http://[^/.]\.mydomain.net.* - [F]

any ideas? please help.

jdMorgan

Msg#: 687 posted 9:25 pm on Oct 1, 2002 (gmt 0)

bull,

Yes, you have (with your rewrite) also blocked access to your custom 403 page, resulting in a cascade of 403 errors. It sounds like your server intervened to prevent an infinite loop.

Note also that the "http://(.*\.)?mydomain.net" part of your URL is not "visible" to RewriteRules; they only "see" the path name part of the URI, the part of the URL after "mydomain.net/". Use RewriteConds if you need to test the domain.
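As a sketch of what that looks like (the host name is just a placeholder), a domain test belongs in a condition:

RewriteCond %{HTTP_HOST} !^www\.mydomain\.net [NC]
RewriteRule .* - [F]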

I assume that you were testing the PortalBSpider rule... If so, here's a way to correct the problem:

RewriteCond %{HTTP_USER_AGENT} ^PortalBSpider
RewriteRule !^err403\.htm$ - [F,L]

This says, "redirect PortalBSpider requests for all files that are not named err403.htm"

-
You could also use:
RewriteRule !^err40[34]\.htm$ - [F,L]
to also allow your err404 file, if you so desired.

I'm not absolutely sure about the path to your custom 403 page. If the above doesn't work, try

RewriteRule !^/err403\.htm$ - [F,L]
or
RewriteRule !err403\.htm$ - [F,L]

You could also add a RewriteCond %{REQUEST_URI} !err403.htm$ instead of doing the err403.htm file exception in the RewriteRule itself, but since there are likely many RewriteCond's [OR]'ed together above the one you've shown, the and/or logic of the conditions is easy to confuse if you're not careful.
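That alternative would look something like this (a sketch, untested):

RewriteCond %{HTTP_USER_AGENT} ^PortalBSpider
RewriteCond %{REQUEST_URI} !err403\.htm$
RewriteRule .* - [F,L]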

Clicking "Submit" and hoping there are no typos again!... Hope this helps,
Jim

carfac

Msg#: 687 posted 9:36 pm on Oct 1, 2002 (gmt 0)

One way around this would be something like this. I use this to OK ANY requests (no matter who) for robots.txt:

RewriteRule /robots.txt$ - [NC,L]

Make sure that is at the TOP of your httpd.conf container.

Anyway, I assume you could also add something like:

RewriteRule /path/to/403.html$ - [NC,L]

to prevent a loop...

dave

jdMorgan

Msg#: 687 posted 10:11 pm on Oct 1, 2002 (gmt 0)

Yes,
For those who haven't seen it before, the "-" means "do nothing", [NC] means "NoCase - Ignore uppercase/lowercase", and the [L] means "Last - If this rule matches, then don't process any more rewrites." So this is a good way to bypass all following rewrite rules for specific cases.

So now we have three methods for getting around the problem shown here. :)

Jim

bull

Msg#: 687 posted 5:38 am on Oct 2, 2002 (gmt 0)

dave, jim -- thanks!

RewriteRule !^err403\.htm$ - [F,L] works perfectly

have a nice day,
jan

Annii

Msg#: 687 posted 8:49 pm on Oct 4, 2002 (gmt 0)

Please can one of you guys who understands these things confirm that I have this in the right order?

I don't really understand how each of the different rules works, and so I want to be sure that my rules don't conflict with each other in any way.

First Part
----------
ErrorDocument 404 /404.htm
ErrorDocument 400 /404.htm
ErrorDocument 403 /404.htm
ErrorDocument 501 /404.htm
ErrorDocument 502 /404.htm
ErrorDocument 503 /404.htm

2nd Part
--------
<FilesMatch "htm([l])*$">
ForceType application/x-httpd-php
</FilesMatch>

3rd Part
--------
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} ^HTTrack [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^InternetSeer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*Indy [OR]
RewriteCond %{HTTP_USER_AGENT} ^MSFrontPage [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^Ping [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^webcollage [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Siphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.* - [F,L]
RewriteCond %{HTTP_REFERER} ^http://www.iaea.org$
RewriteRule !^http://[^/.]\.your-site.com.* - [F]

I've never used the rewrite part (i.e. part 3 above) before, and am worried that if I implement it, it could interact badly with parts 1 and 2.

Thanks
Anni

jdMorgan

Msg#: 687 posted 9:09 pm on Oct 4, 2002 (gmt 0)

Annii,

As you have this set up now, your custom 403 page cannot be fetched by any banned user-agent, because 404.htm is also blocked. I would recommend creating a separate custom 403 page called 403.htm, changing your ErrorDocument directive to

ErrorDocument 403 /403.htm

and changing the last 5 lines of your rewrite ruleset to:

RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus [OR]
RewriteCond %{HTTP_REFERER} ^http://www.iaea.org
RewriteRule !^403.htm$ - [F,L]

This blocks the user-agents and referers you have listed from accessing your site, except that it allows them to fetch 403.htm when redirected there by the RewriteRule and your ErrorDocument directive.

Also, I don't really see any point in having a custom 400 error document, since malformed requests are unlikely to come from anything or anyone that you want to help (robots won't follow a 400-series error redirect anyway).

I think you can write the FilesMatch pattern like this:
<FilesMatch "html?$">
which will work better, unless you want to accept requests for filetypes of .htmll, htmlll, htmllll, etc. The question mark just means "match 0 or 1 of the preceding character" - in your case, match "htm" or "html".
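Applied to your second part, the whole block would then read:

<FilesMatch "html?$">
ForceType application/x-httpd-php
</FilesMatch>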

Other than the above, your .htaccess looks fine to me.

Jim

58sniper

Msg#: 687 posted 10:18 pm on Oct 4, 2002 (gmt 0)
Okay. I've streamlined my list as much as I can. If anyone spots any errors, can you point them out?

ErrorDocument 401 /error.php?eid=401
ErrorDocument 403 /error.php?eid=403
ErrorDocument 404 /error.php?eid=404
ErrorDocument 500 /error.php?eid=500

< i have a whole bunch of RedirectPermanent lines here >

RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?macombsheriff.com.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://(dev\.)?macombsheriff.com.*$ [NC]
RewriteRule \.(gif|jpg|zip|pdf)$ http://www.runningwolf.com/dev/apology.gif [R,L]

# RewriteCond %{HTTP_USER_AGENT} ^Mozilla* [OR]
RewriteCond %{HTTP_USER_AGENT} almaden [OR]
RewriteCond %{HTTP_USER_AGENT} ^Anarchie [OR]
RewriteCond %{HTTP_USER_AGENT} ^ASPSeek [OR]
RewriteCond %{HTTP_USER_AGENT} ^attach [OR]
RewriteCond %{HTTP_USER_AGENT} ^BackWeb [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bandit [OR]
RewriteCond %{HTTP_USER_AGENT} ^BatchFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Buddy [OR]
RewriteCond %{HTTP_USER_AGENT} ^bumblebee [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^CICC [OR]
RewriteCond %{HTTP_USER_AGENT} ^Collector [OR]
RewriteCond %{HTTP_USER_AGENT} copier [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^DA [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} download [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Drip [OR]
RewriteCond %{HTTP_USER_AGENT} ^DSurf15a [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EasyDL/2.99 [OR]
RewriteCond %{HTTP_USER_AGENT} email [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FileHound [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetSmart [OR]
RewriteCond %{HTTP_USER_AGENT} ^gigabaz [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^gotit [OR]
RewriteCond %{HTTP_USER_AGENT} grab [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^grub-client [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} ^HTTrack [OR]
RewriteCond %{HTTP_USER_AGENT} ^httpdown [OR]
RewriteCond %{HTTP_USER_AGENT} .*httrack.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Indy*Library [OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^InternetLinkagent [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^InternetSeer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^Iria [OR]
RewriteCond %{HTTP_USER_AGENT} ^JBH*agent [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JustView [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^lftp [OR]
RewriteCond %{HTTP_USER_AGENT} ^Link*Sleuth [OR]
RewriteCond %{HTTP_USER_AGENT} ^likse [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mag-Net [OR]
RewriteCond %{HTTP_USER_AGENT} ^Magnet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Memo [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mirror [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla*MSIECrawler [OR]
RewriteCond %{HTTP_USER_AGENT} FrontPage [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^MSIECrawler [OR]
RewriteCond %{HTTP_USER_AGENT} ^MSProxy [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetMechanic [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} offline [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Openfind [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^Ping [OR]
RewriteCond %{HTTP_USER_AGENT} ^PingALink [OR]
RewriteCond %{HTTP_USER_AGENT} ^Pockey [OR]
RewriteCond %{HTTP_USER_AGENT} ^Pump [OR]
RewriteCond %{HTTP_USER_AGENT} ^Reaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Recorder [OR]
RewriteCond %{HTTP_USER_AGENT} ^QRVA [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Scooter [OR]
RewriteCond %{HTTP_USER_AGENT} ^Seeker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Siphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SlySearch [OR]
RewriteCond %{HTTP_USER_AGENT} ^Snake [OR]
RewriteCond %{HTTP_USER_AGENT} ^SpaceBison [OR]
RewriteCond %{HTTP_USER_AGENT} spider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Szukacz [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Vacuum [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebHook [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebMiner [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebMirror [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Whacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^x-Tractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xenu [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.*$ /robots.php [L]

jdMorgan

Msg#: 687 posted 10:45 pm on Oct 4, 2002 (gmt 0)

All,

I think it would be beneficial to this forum to eliminate the long lists of User-agents in posts - just include the first two and last two, for example, when discussing structural issues. If the subject returns to a comprehensive listing we can include them all, but for structural issues, it just wastes space and makes the thread hard to follow...

Including these long lists of User-agents is beginning to approach a code review, and we're not supposed to do those here. Also, we're supposed to refrain from including valid URLs for our sites.

Please don't get me wrong - I just don't want to see this thread - which I have found to be quite useful - closed due to violations of the WebmasterWorld TOS.

Thanks,
Jim

jdMorgan

Msg#: 687 posted 10:49 pm on Oct 4, 2002 (gmt 0)

58sniper,

But since it's already there...

That first UA you've got commented out may have been meant to block "^Mozzilla*", a misspelled and bogus user-agent. Blocking the "two z's" version is a good idea; blocking the common version certainly isn't! :)
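That is, presumably something like:

RewriteCond %{HTTP_USER_AGENT} ^Mozzilla [OR]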

Jim
