A Close to perfect .htaccess ban list

     
3:30 am on Oct 23, 2001 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 25, 2000
posts:1786
votes: 0


Here's the latest rendition of my favorite ongoing artwork... my beloved .htaccess file. I've become quite fond of my little buddy, the .htaccess file, and I love the power it gives me to exclude vermin, pestoids, and other undesirable entities from my web sites.

Gorufu, littleman, Air, SugarKane? You guys see any errors or better ways to do this... anybody got a bot to add... before I stick this in every site I manage?

Feel free to use this on your own site and start blocking bots too.

(the top part is left out)

<Files .htaccess>
deny from all
</Files>
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
RewriteRule ^.* - [F]
RewriteCond %{HTTP_REFERER} ^http://www\.iaea\.org
RewriteRule ^.* - [F]

4:45 am on June 29, 2002 (gmt 0)

Junior Member

10+ Year Member

joined:May 21, 2002
posts:50
votes: 0


>richlowe asked
>Anyone know how to do this with IIS?

I test for these agents in global.asa in the Session_OnStart event and send them to an explanation page that has no links it can follow.

Then I use a browscap.ini file that you can get from my website that has a special section for website strippers and other nasties.

You can get this browscap.ini file and soon some sample code from my personal website.

11:30 pm on Aug 17, 2002 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 31, 2002
posts:43
votes: 0


Hi everyone!

Is there anything similar to a .htaccess file for non-Apache NT/Win2K servers (IIS)? Please note that I am NOT a server expert, so if I say something stupid, please forgive me!

Thanks in advance,
Snark

11:39 pm on Aug 17, 2002 (gmt 0)

Junior Member

10+ Year Member

joined:May 21, 2002
posts:50
votes: 0


Welcome snark. There is no file per se like .htaccess, but if you have access to the server you can ban people by IP via the IIS manager. Alternatively, you can try asking your host to use my browscap.ini file and then follow the example code on my website - the URL is in my profile - to ban them.

11:58 pm on Aug 17, 2002 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 31, 2002
posts:43
votes: 0


Hi Pushycat,

Oh, this is wonderful. I've just been to your web site. I don't think that blocking by I.P. would work, since I want to block some e-mail harvesters. I suppose I could keep an eye on their I.P. address and if it's always the same, then block it in IIS. But I'm definitely going to look into the browscap.ini on your site and go from there. What a relief !

Thanks again,
Snark

6:14 am on Aug 19, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:May 29, 2002
posts:1819
votes: 0


This is a great thread, thanks!

Before I add this to my .htaccess I want to make sure this last entry from SuperMan is valid and that there are no legitimate search engine robots among them.

I am most interested in stopping any bad bots and email harvesters.

Does this list stop most of the major players? Does it also stop that atomic energy agency (iaea.org) one? I also need to check that it does not stop any potential search engines, including Alexa.

[1]RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} ^HTTrack [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Siphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.* - [F]

11:21 am on Sept 6, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Dec 29, 2001
posts:1089
votes: 21


I've been monitoring this thread for some time now and, as user-agents deserving to be blocked are introduced, I block them.

I noticed today that 95.6% of my visitors are using Explorer and Netscape, and the Google bot accounts for about 2% (97.6% total). I'm beginning to wonder if it would be easier to allow only folks who are using selected browsers to visit my site, instead of trying to block all the undesired ones. Maybe I would redirect users of the unacceptable browsers to a page telling them I only support Explorer and Netscape.

Thoughts on this?

11:59 am on Sept 6, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 22, 2002
posts:1782
votes: 0


That's a great idea.

  • But then you should only support the latest versions of those browsers.
  • Do not support browsers running on Windows boxes, since Windows is insecure and people shouldn't be using it anyway.
  • On second thought, do not even support MacOS, since those Apple guys want to make money and that's not something that ought to be supported.
  • You might consider banning Linux users too, since there might be some security issues there as well.
  • Mozilla? I don't think so. Too heavy; it needs fast computers with lots of RAM, which consume lots of energy. We cannot have that in the US anymore, with Kyoto and all.
  • Lynx? Yeah, that's OK. However, ...

I don't think that's a good idea. I do understand the need to ban those email harvesters and offline browsers, but allowing only known browsers is not the way to go.

12:22 pm on Sept 9, 2002 (gmt 0)

New User

10+ Year Member

joined:July 24, 2002
posts:22
votes: 0


Hi everyone, SuperMan in particular :)

Could someone please provide a nice .htaccess list and, let's say, update it here every one or two months?

By the way, there are two other bots I'm concerned about. One is called turnitinbot, from turnitin.com, and the other one was also one of these brand-control bots; I see them showing up more and more. Shouldn't we include them as well?

3:37 pm on Sept 9, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member nick_w is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Feb 4, 2002
posts:5044
votes: 0


So, in short: Do I just plonk this in my .htaccess file?

[1]RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} ^HTTrack [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Siphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.* - [F]

Nick

5:06 pm on Sept 9, 2002 (gmt 0)

Moderator from US 

WebmasterWorld Administrator martinibuster is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Apr 13, 2002
posts:14096
votes: 170


I'd like to know too. Do we just copy and paste that code in?
6:27 pm on Sept 9, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


Yes, you can cut 'n paste that code into your .htaccess file - at your own risk.

A couple of points first, though...

The [1] at the beginning of the first line is spurious, and should be removed, leaving the line reading:

RewriteEngine On

As it stands, this example code will generate a 403-Forbidden response. You can also configure it to respond with other error codes, or with permanent or temporary redirects to other pages on your own site or elsewhere. I strongly encourage you to read the documentation [httpd.apache.org] on mod_rewrite, whether you plan to "tweak" this example code or not. As stated in the documentation, mod_rewrite is a powerful tool; and as such, it is also a dangerous tool. Some time spent "reading the fine manual" may save you a lot of grief in the future.
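
For instance, here is a rough sketch of the redirect variant; the page name badbots.html is purely a placeholder, and this assumes the same RewriteCond user-agent list shown earlier sits directly above the rule. The rule excludes the explanation page itself from matching, so the redirect cannot loop:

# ... the RewriteCond user-agent lines go here, as in the examples above ...
# send matched agents to a (hypothetical) explanation page instead of returning 403
RewriteRule !^badbots\.html$ /badbots.html [R=302,L]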

By changing the final line to:
RewriteRule ^.* - [F,L]
you can minimize interactions with following rewrite rulesets, and also minimize CPU overhead for processing. The "L" tells mod-rewrite that this is the last rule that needs to be processed in this case, and to stop rewriting as soon as it is processed.

You can customize the 403-Forbidden page returned to the bad-bot (keeping in mind that at some time, as you modify this, you might introduce an error and catch an innocent person instead) to explain what happened and what to do about it. To do this, add:
ErrorDocument 403 /my403.html
at the beginning of the example code, and then create a custom 403 error page (called "my403.html" in this example.)

All RewriteCond's in this example are case-sensitive. This leaves it open to a few more errors as you maintain the file. To make the pattern-match case-insensitive, change the [OR] flag at the end of each line to [NC,OR]. Note also that the [OR] must not be included on the very last RewriteCond - the one directly preceding the RewriteRule. If it is, you'll lock up your server, and you and your users will get 500-Server Error responses to all requests. (After changing anything in your .htaccess file, it's a very good idea to access your own site, and make sure it still works!)
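
As a minimal illustration of those two points (the agent names here are just placeholders taken from the lists above):

RewriteCond %{HTTP_USER_AGENT} ^webzip [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^teleport [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^wget [NC]
RewriteRule ^.* - [F,L]

Every condition except the last carries [NC,OR]; the last one carries only [NC].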

All RewriteCond's in this example assume that the user-agent string starts with the pattern of characters shown (that's what the "^" character means). Some user-agent strings do not start with the "bad-bot" user agent string; they start with something common like "Mozilla/3.01" and then contain the bad-bot identification further on in the string. To catch these guys, you will need to remove the starting text anchor "^" from the pattern match string. This makes the pattern matching less efficient, and should only be done if necessary.

Here's one example that I know needs to be changed:
RewriteCond %{HTTP_USER_AGENT} Indy.Library [NC,OR]

Note that I removed the starting "^", so that it will ban any user-agent with "Indy Library" anywhere in its user-agent string, and that I will accept any character - including a space - after "Indy".

Again - Yes, you can cut 'n paste this into your .htaccess file - at your own risk. I recommend that you minimize this risk by reading the mod_rewrite documentation.

Hope this helps,
Jim

6:38 pm on Sept 9, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member nick_w is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Feb 4, 2002
posts:5044
votes: 0


Helps a lot!

Thanks for taking the time to go through that with us Jim ;-)

Nick

9:03 pm on Sept 9, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


Forgot something, though...

I've posted this before, but just in case:

mod_rewrite (like many related Apache modules) depends on "regular expressions" for pattern-matching. You can find a short and useful tutorial here [etext.lib.virginia.edu] on the University of Virginia Library Web site.

This is a big help in figuring out ^(what\ all\ the\ strange\ characters\ in\ mod_rewrite\ directives\ mean|how\ to\ write\ them\ correctly)\.$
;)

Jim

9:22 pm on Sept 9, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 22, 2002
posts:1782
votes: 0


This is a big help in figuring out ^(what\ all\ the\ strange\ characters\ in\ mod_rewrite\ directives\ mean|how\ to\ write\ them\ correctly)\.$

Shouldn't it read: This is a big help in figuring out (?:(?:^.*what\ all\ the\ strange\ characters\ in\ mod_rewrite\ directives\ mean.*how\ to\ write\ them\ correctly)|(?:^.*how\ to\ write\ them\ correctly.*what\ all\ the\ strange\ characters\ in\ mod_rewrite\ directives\ mean))\.$

12:41 am on Sept 10, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 22, 2002
posts:1782
votes: 0


Some of you may wonder about the performance impact of having such a large .htaccess file. I did some simple benchmark tests.

Setup

  • System

    Server: Apache/1.3.26 (Unix) mod_ssl/2.8.10
    OpenSSL/0.9.6d PHP/4.2.1

    Linux version 2.2.19-7.0.16
    Detected 467741 kHz processor.
    Memory: 257496k/262080k available (1076k kernel code,
    416k reserved, 3020k data, 72k init, 0k bigmem)
    128K L2 cache (4 way)
    CPU: L2 Cache: 128K
    CPU: Intel Celeron (Mendocino) stepping 05

  • benchmark script
    #!/usr/bin/perl

    use LWP::UserAgent;
    use LWP::Simple;
    use Time::HiRes qw(gettimeofday);

    $url = "http://server/root/test.html";
    foreach $agent (qw(BlackWidow Zeus AaronCarter)) {
        for ($j = 0; $j < 10; $j++) {
            $ua = new LWP::UserAgent;
            $ua->agent($agent);
            $t0 = gettimeofday;

            # Request document and parse it as it arrives
            for (my $i = 1; $i < 100; $i++) {
                $res = $ua->request(HTTP::Request->new(GET => $url),
                                    sub { });
            }
            $t{$agent} += gettimeofday - $t0;
        }
        $t{$agent} = $t{$agent} / ($j + 1);
    }

    print map { $_, ' needed ', $t{$_}, ' seconds.', "\n" } sort keys %t;

  • stress script
    #!/usr/bin/perl

    use LWP::UserAgent;

    $url = "http://server/www.pension-schafspelz.de/";
    $ua = new LWP::UserAgent;

    # Request document and parse it as it arrives
    while (true) {
        $res = $ua->request(HTTP::Request->new(GET => $url),
                            sub { });
    }

  • .htaccess with single RewriteCond directive
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} BlackWidow|Bot\ mailto:craftbot@yahoo.com|ChinaClaw|DISCo|Download\ Demon|eCatch|EirGrabber|EmailSiphon|Express\ WebPictures|ExtractorPro|EyeNetIE|FlashGet|GetRight|Go!Zilla|Go-Ahead-Got-It|GrabNet|Grafula|HMView|HTTrack|Image\ Stripper|Image\ Sucker|InterGET|Internet\ Ninja|JetCar|JOC\ Web\ Spider|larbin|LeechFTP|Mass\ Downloader|MIDown\ tool|Mister\ PiX|Navroad|NearSite|NetAnts|NetSpider|Net\ Vampire|NetZIP|Octopus|Offline\ Explorer|Offline\ Navigator|PageGrabber|Papa\ Foto|pcBrowser|RealDownload|ReGet|Siphon|SiteSnagger|SmartDownload|SuperBot|SuperHTTP|Surfbot|tAkeOut|Teleport\ Pro|VoidEYE|Web\ Image\ Collector|Web\ Sucker|WebAuto|WebCopier|WebFetch|WebReaper|WebSauger|Website\ eXtractor|WebStripper|WebWhacker|WebZIP|Wget|Widow|Xaldon\ WebSpider|Zeus
    RewriteRule .* - [F,L]

  • .htaccess with multiple RewriteCond directives

    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} BlackWidow [OR]
    RewriteCond %{HTTP_USER_AGENT} Bot\ mailto:craftbot@yahoo.com [OR]
    RewriteCond %{HTTP_USER_AGENT} ChinaClaw [OR]
    RewriteCond %{HTTP_USER_AGENT} DISCo [OR]
    RewriteCond %{HTTP_USER_AGENT} Download\ Demon [OR]
    RewriteCond %{HTTP_USER_AGENT} eCatch [OR]
    RewriteCond %{HTTP_USER_AGENT} EirGrabber [OR]
    RewriteCond %{HTTP_USER_AGENT} EmailSiphon [OR]
    RewriteCond %{HTTP_USER_AGENT} Express\ WebPictures [OR]
    RewriteCond %{HTTP_USER_AGENT} ExtractorPro [OR]
    RewriteCond %{HTTP_USER_AGENT} EyeNetIE [OR]
    RewriteCond %{HTTP_USER_AGENT} FlashGet [OR]
    RewriteCond %{HTTP_USER_AGENT} GetRight [OR]
    RewriteCond %{HTTP_USER_AGENT} Go!Zilla [OR]
    RewriteCond %{HTTP_USER_AGENT} Go-Ahead-Got-It [OR]
    RewriteCond %{HTTP_USER_AGENT} GrabNet [OR]
    RewriteCond %{HTTP_USER_AGENT} Grafula [OR]
    RewriteCond %{HTTP_USER_AGENT} HMView [OR]
    RewriteCond %{HTTP_USER_AGENT} HTTrack [OR]
    RewriteCond %{HTTP_USER_AGENT} Image\ Stripper [OR]
    RewriteCond %{HTTP_USER_AGENT} Image\ Sucker [OR]
    RewriteCond %{HTTP_USER_AGENT} InterGET [OR]
    RewriteCond %{HTTP_USER_AGENT} Internet\ Ninja [OR]
    RewriteCond %{HTTP_USER_AGENT} JetCar [OR]
    RewriteCond %{HTTP_USER_AGENT} JOC\ Web\ Spider [OR]
    RewriteCond %{HTTP_USER_AGENT} larbin [OR]
    RewriteCond %{HTTP_USER_AGENT} LeechFTP [OR]
    RewriteCond %{HTTP_USER_AGENT} Mass\ Downloader [OR]
    RewriteCond %{HTTP_USER_AGENT} MIDown\ tool [OR]
    RewriteCond %{HTTP_USER_AGENT} Mister\ PiX [OR]
    RewriteCond %{HTTP_USER_AGENT} Navroad [OR]
    RewriteCond %{HTTP_USER_AGENT} NearSite [OR]
    RewriteCond %{HTTP_USER_AGENT} NetAnts [OR]
    RewriteCond %{HTTP_USER_AGENT} NetSpider [OR]
    RewriteCond %{HTTP_USER_AGENT} Net\ Vampire [OR]
    RewriteCond %{HTTP_USER_AGENT} NetZIP [OR]
    RewriteCond %{HTTP_USER_AGENT} Octopus [OR]
    RewriteCond %{HTTP_USER_AGENT} Offline\ Explorer [OR]
    RewriteCond %{HTTP_USER_AGENT} Offline\ Navigator [OR]
    RewriteCond %{HTTP_USER_AGENT} PageGrabber [OR]
    RewriteCond %{HTTP_USER_AGENT} Papa\ Foto [OR]
    RewriteCond %{HTTP_USER_AGENT} pcBrowser [OR]
    RewriteCond %{HTTP_USER_AGENT} RealDownload [OR]
    RewriteCond %{HTTP_USER_AGENT} ReGet [OR]
    RewriteCond %{HTTP_USER_AGENT} Siphon [OR]
    RewriteCond %{HTTP_USER_AGENT} SiteSnagger [OR]
    RewriteCond %{HTTP_USER_AGENT} SmartDownload [OR]
    RewriteCond %{HTTP_USER_AGENT} SuperBot [OR]
    RewriteCond %{HTTP_USER_AGENT} SuperHTTP [OR]
    RewriteCond %{HTTP_USER_AGENT} Surfbot [OR]
    RewriteCond %{HTTP_USER_AGENT} tAkeOut [OR]
    RewriteCond %{HTTP_USER_AGENT} Teleport\ Pro [OR]
    RewriteCond %{HTTP_USER_AGENT} VoidEYE [OR]
    RewriteCond %{HTTP_USER_AGENT} Web\ Image\ Collector [OR]
    RewriteCond %{HTTP_USER_AGENT} Web\ Sucker [OR]
    RewriteCond %{HTTP_USER_AGENT} WebAuto [OR]
    RewriteCond %{HTTP_USER_AGENT} WebCopier [OR]
    RewriteCond %{HTTP_USER_AGENT} WebFetch [OR]
    RewriteCond %{HTTP_USER_AGENT} WebReaper [OR]
    RewriteCond %{HTTP_USER_AGENT} WebSauger [OR]
    RewriteCond %{HTTP_USER_AGENT} Website\ eXtractor [OR]
    RewriteCond %{HTTP_USER_AGENT} WebStripper [OR]
    RewriteCond %{HTTP_USER_AGENT} WebWhacker [OR]
    RewriteCond %{HTTP_USER_AGENT} WebZIP [OR]
    RewriteCond %{HTTP_USER_AGENT} Wget [OR]
    RewriteCond %{HTTP_USER_AGENT} Widow [OR]
    RewriteCond %{HTTP_USER_AGENT} Xaldon\ WebSpider [OR]
    RewriteCond %{HTTP_USER_AGENT} Zeus
    RewriteRule .* - [F,L]

Results

  • htaccess, multiple RewriteCond, idle server

    [af@server SuchmaschinenTricks]$ ./bm
    AaronCarter needed 2.03904145414179 seconds.
    BlackWidow needed 1.89269917661493 seconds.
    Zeus needed 1.90201771259308 seconds.
    [af@server SuchmaschinenTricks]$ ./bm
    AaronCarter needed 2.05427220734683 seconds.
    BlackWidow needed 1.90449017828161 seconds.
    Zeus needed 1.91795318776911 seconds.
    [af@server SuchmaschinenTricks]$ ./bm
    AaronCarter needed 2.04453534429724 seconds.
    BlackWidow needed 1.89828474955125 seconds.
    Zeus needed 1.90684572133151 seconds.

  • httpd.conf, multiple RewriteCond, idle server

    [af@server SuchmaschinenTricks]$ ./bm
    AaronCarter needed 1.48258856209842 seconds.
    BlackWidow needed 1.41852938045155 seconds.
    Zeus needed 1.4474944526499 seconds.
    [af@server SuchmaschinenTricks]$ ./bm
    AaronCarter needed 1.47204467383298 seconds.
    BlackWidow needed 1.40937690301375 seconds.
    Zeus needed 1.42638698491183 seconds.
    [af@server SuchmaschinenTricks]$ ./bm
    AaronCarter needed 1.49393899874254 seconds.
    BlackWidow needed 1.42769226160916 seconds.
    Zeus needed 1.44513262401928 seconds.

  • .htaccess, single RewriteCond, idle server

    [af@server SuchmaschinenTricks]$ ./bm
    AaronCarter needed 1.77475028688257 seconds.
    BlackWidow needed 1.69021615115079 seconds.
    Zeus needed 1.59830655834892 seconds.
    [af@server SuchmaschinenTricks]$ ./bm
    AaronCarter needed 1.76538353616541 seconds.
    BlackWidow needed 1.68273590911518 seconds.
    Zeus needed 1.5909228108146 seconds.
    [af@server SuchmaschinenTricks]$ ./bm
    AaronCarter needed 1.77456200122833 seconds.
    BlackWidow needed 1.69423974644054 seconds.
    Zeus needed 1.60414087772369 seconds.

  • httpd.conf, single RewriteCond, idle server

    [af@server SuchmaschinenTricks]$ ./bm
    AaronCarter needed 1.50137218562039 seconds.
    BlackWidow needed 1.43611000884663 seconds.
    Zeus needed 1.45189526948062 seconds.
    [af@server SuchmaschinenTricks]$ ./bm
    AaronCarter needed 1.48307919502258 seconds.
    BlackWidow needed 1.41925389116461 seconds.
    Zeus needed 1.43655191768299 seconds.
    [af@server SuchmaschinenTricks]$ ./bm
    AaronCarter needed 1.49346754767678 seconds.
    BlackWidow needed 1.43283208933744 seconds.
    Zeus needed 1.44978855956684 seconds.

  • .htaccess, multiple RewriteCond, server under stress

    K:\SuchmaschinenTricks>perl bm
    AaronCarter needed 6.25990908796137 seconds.
    BlackWidow needed 5.3813636302948 seconds.
    Zeus needed 5.76372727480802 seconds.

  • httpd.conf, multiple RewriteCond, server under stress

    K:\SuchmaschinenTricks>perl bm
    AaronCarter needed 5.58627272735943 seconds.
    BlackWidow needed 5.23381818424572 seconds.
    Zeus needed 5.10827272588556 seconds.

  • .htaccess, single RewriteCond, server under stress

    K:\SuchmaschinenTricks>perl bm
    AaronCarter needed 6.03227272900668 seconds.
    BlackWidow needed 5.1229090907357 seconds.
    Zeus needed 5.46418181332675 seconds.

  • httpd.conf, single RewriteCond, server under stress

    K:\SuchmaschinenTricks>perl bm
    AaronCarter needed 5.32499999349768 seconds.
    BlackWidow needed 4.55199999159033 seconds.
    Zeus needed 5.22927273403515 seconds.

Conclusion

  • Use a single RewriteCond directive in your .htaccess files.
  • Use multiple RewriteCond directives in your httpd.conf file.

.htaccess files have to be read and parsed on every single request. Parsing and compiling the many separate RewriteCond directives takes longer than parsing the single directive with one long regular expression. Parsing is the bottleneck in this scenario.

The httpd.conf file is read once, when the server starts up, so the regular expressions are compiled only once. Multiple short, simple REs then execute faster than one single, complex one. Execution time is the factor that matters most in this case.

[edited by: jatar_k at 4:35 pm (utc) on Sep. 22, 2002]
[edit reason] stopped side scroll [/edit]

1:06 am on Sept 10, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Apr 21, 2001
posts:2489
votes: 0


This has been one hell of a read. I've got my copy of the .htaccess file and will drop it on the server for my next site. Impressive.

Who needs books!

1:08 am on Sept 10, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


andreasfriedrich,

Excellent data...

I wonder what the result would be performance-wise with four RewriteCond lines: One each for start-anchored, end-anchored, fully-anchored, and unanchored pattern strings. I use this method to keep the patterns organized by type, and to keep them neat and easy to maintain.
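
Roughly, that organization looks something like the sketch below; the patterns shown are only placeholders to illustrate the four groups, not my actual list:

# start-anchored: the agent string begins with the pattern
RewriteCond %{HTTP_USER_AGENT} ^(WebZIP|WebStripper|Wget) [NC,OR]
# end-anchored: the agent string ends with the pattern
RewriteCond %{HTTP_USER_AGENT} craftbot@yahoo\.com$ [NC,OR]
# fully-anchored: the agent string is exactly the pattern
RewriteCond %{HTTP_USER_AGENT} ^Zeus$ [NC,OR]
# unanchored: the pattern may appear anywhere in the agent string
RewriteCond %{HTTP_USER_AGENT} Indy.Library [NC]
RewriteRule ^.* - [F,L]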

Thanks for the test - Very useful.

Jim

1:22 am on Sept 10, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Aug 10, 2001
posts:1551
votes: 10


Interesting comparison Andreas, thanks!

If I'm reading your code correctly, then you're demonstrating two things:

a) The response time difference between the two .htaccess files is roughly 10%.

b) Each call takes between 0.002 and 0.006 seconds, depending on the load of the machine.

Combining those two, we're talking about an average additional overhead caused by multiple RewriteConds of 0.0004 seconds (1/2500 seconds) per request, on a somewhat aged machine.

Since you're always fetching the same file from the disk cache, the remaining server side overhead is very low (in contrast to a real life situation, where each request may cause a different set of files to be loaded from disk first), so we can assume that the difference is indeed caused by the different rule sets. The comments in your code talk about parsing the HTML, but I don't understand enough Perl to see if that really happens.

In summary, I'd prefer the maintenance friendly multiple-rule version any time, if it only costs me such a small price in terms of response time. Of course, I'm not serving millions of requests per day, so your mileage may vary.

1:37 am on Sept 10, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 22, 2002
posts:1782
votes: 0


comments in your code talk about parsing the HTML, but I don't understand enough Perl to see if that really happens.

No, there is no parsing going on, since it's irrelevant for the benchmarking.

on a somewhat aged machine

Don't insult my trusty old Linux box. I can't be held responsible for any DoS attacks it launches on its own ;)

Since you're always fetching the same file from the disk cache, the remaining server side overhead is very low [...], so we can assume that the difference is indeed caused by the different rule sets

That was the idea behind the admittedly artificial setup.

I'd prefer the maintenance friendly multiple-rule version any time

As is quite often the case, it's a trade-off between speed and maintainability.

9:36 am on Sept 11, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 11, 2002
posts:2024
votes: 0


Hi,

I have followed this thread for quite some time, since our server gets harassed by those "evil bots" as well. However, I couldn't quite decide to take action until jdMorgan's posting, which was almost a how-to.

However, having done this I ran into my first trouble, because my Apache 1.3 says:

Options FollowSymLinks or SymLinksIfOwnerMatch is off which implies that RewriteRule directive is forbidden

Simply adding "FollowSymLinks on" at the top of the .htaccess doesn't work.

Any advice?

On a side note: I found some packages like "Sugarplum" and "robotcop" which promise to automate some of the functions intended by this .htaccess. Any experiences with these?

10:09 am on Sept 11, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 22, 2002
posts:1782
votes: 0


The correct directive to enable the FollowSymLinks feature in your .htaccess file would be
Options [httpd.apache.org] +FollowSymLinks

For that to work you need to have at least

AllowOverride [httpd.apache.org] Options

privileges. Those are set in <Directory> sections in the server config or virtual host context.
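
Putting the two pieces together, a rough sketch (the <Directory> path is just a placeholder, and I am assuming the FileInfo privilege is also needed for the mod_rewrite directives themselves):

# in the per-directory .htaccess file
Options +FollowSymLinks
RewriteEngine On

# in httpd.conf (server config or virtual host context) the host grants the privilege
<Directory "/path/to/your/docroot">
    AllowOverride Options FileInfo
</Directory>
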
3:08 pm on Sept 12, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Aug 26, 2001
posts:1076
votes: 0


I've added the .htaccess list to a new site I've just built, but I've set it to redirect bad bots to a PHP page that lists all the email addresses of companies that have sent me spam. Hopefully the email harvesters will pick up all those addresses and add the spammers to other spam lists. Maybe they'll end up spamming each other into submission?

When spam comes in, I check the actual company site to get their genuine email addresses and then manually add them to a MySQL database. This means only genuine mailboxes get listed on the page, and not the Yahoo or Hotmail addresses the spam is often sent from.

I've also added a simple browser check to the PHP page so that if an IE / Netscape / Opera user visits the page they will only see a normal forbidden message.... Well, that's the theory, but I've not been able to test it yet. I just need a "browser" or something that will let me set the UA to whatever I want.....

11:56 pm on Sept 12, 2002 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 31, 2002
posts:43
votes: 0


This question is for Pushycat:

Thanks for the browscap.ini & sample IIS code, which I got from your site! I want to make sure I understand their use. I implement the browscap.ini (or whatever parts of it I want), and then I implement the code in global.asa for each robot I want to ban? Or just the one block of code gets revised to include each robot to be banned? (Or -- is every robot in the browscap.ini banned, so I should only include those which I wish to ban?) My partner is much more of a web programmer than I am and would take care of this, but I want to make sure I understand what needs to be done first!

Thanks a lot,
Snark

10:50 am on Sept 13, 2002 (gmt 0)

New User

10+ Year Member

joined:July 3, 2002
posts:39
votes: 0


Hi all,
Thanks so much for the info in this thread,
I use .htaccess and have edited the list here a bit. I want to include it in my existing .htaccess file which has a couple of extra rules in it. Will the following work ok? Is it in the right order etc?

ErrorDocument 404 /404.htm
ErrorDocument 400 /404.htm
ErrorDocument 403 /404.htm
ErrorDocument 501 /404.htm
ErrorDocument 502 /404.htm
ErrorDocument 503 /404.htm

<FilesMatch "htm([l])*$">
ForceType application/x-httpd-php
</FilesMatch>

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} ^HTTrack [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^InternetSeer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*Indy [OR]
RewriteCond %{HTTP_USER_AGENT} ^MSFrontPage [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^Ping [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^webcollage [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Siphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.* - [F,L]
RewriteCond %{HTTP_REFERER} ^http://www\.iaea\.org
RewriteRule ^.* - [F]

Thanks

Anni

Arcie

2:29 am on Sept 15, 2002 (gmt 0)

Inactive Member
Account Expired

 
 


Hi Toolman, nice compilation of nasty bots! Have you tried sticking the rewriter in httpd.conf? It would run fastest there, although you noted that there was no noticeable speed difference as it is.

First post!

Since I run a virtual server with a number of different domains, it seems to me it would make more sense to put my list of forbidden UAs in the httpd.conf file, rather than try to replicate them in .htaccess on each domain's document root. Are there any caveats or special directions I should follow before I proceed?

Thanks!

Randy

[edited by: jatar_k at 12:04 am (utc) on Sep. 16, 2002]
[edit reason] no sigs please [/edit]

11:39 am on Sept 15, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 22, 2002
posts:1782
votes: 0


Hi Randy and welcome to Webmaster World [webmasterworld.com].

As shown in post #77 [webmasterworld.com] putting your RewriteRules in httpd.conf is indeed faster and the way to go when you have access to it.

However, this will not solve the problem of applying those rules to all virtual servers. You cannot just put the rewriting code in the main section and expect it to work for all virtual servers. For an explanation on this see API Phases [httpd.apache.org] in the mod_rewrite URL Rewriting Engine documentation.

So, after [...] Apache has determined the corresponding server (or virtual server) the rewriting engine starts processing of all mod_rewrite directives from the per-server configuration in the URL-to-filename phase.

my emphasis
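
One way around that, assuming your mod_rewrite supports the RewriteOptions directive, is to define the ruleset once in the main server section and have every virtual host explicitly inherit it (the host name, address and agents below are placeholders):

# main server configuration: define the ban rules once
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [NC]
RewriteRule ^.* - [F,L]

<VirtualHost 10.0.0.1>
    ServerName www.example.com
    # each virtual host switches the engine on and pulls in the main server's rules
    RewriteEngine On
    RewriteOptions inherit
</VirtualHost>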

There's also a thread How (and Where) best to control access [webmasterworld.com] that you might want to read on this topic. If you have mod_perl, you might want to use the solution mentioned in this thread. Ask carfac [webmasterworld.com] for the modified version of BlockAgent.

And as a side note: do not drop any URLs, and do not use a signature.

1:52 am on Sept 16, 2002 (gmt 0)

New User

joined:Sept 16, 2002
posts:39
votes: 0


Hi,
First, thank you for having this great place, where I was able to learn more in the last two weeks than in the 6 months since I decided to have my first site.
I am not yet very familiar with .htaccess, and when I try to modify it, it always gives me an error.
There is text in it that was left there by my host:

# -FrontPage-

IndexIgnore .htaccess */.?* *~ *# */HEADER* */README* */_vti*

DirectoryIndex index.html index.htm index.php index.phtml index.php3

# AddType application/x-httpd-php .phtml
# AddType application/x-httpd-php .php3
# AddType application/x-httpd-php .php
#
# Action application/x-httpd-php "/php/php.exe"
# Action application/x-httpd-php-source "/php/php.exe"
# AddType application/x-httpd-php-source .phps

<Limit GET POST>
order deny,allow
deny from all
allow from all
</Limit>
<Limit PUT DELETE>
order deny,allow
deny from all
</Limit>

AuthName www.XXXXXX.com
AuthUserFile /www/XXXXXX/_vti_pvt/service.pwd
AuthGroupFile /www/XXXXXX/_vti_pvt/service.grp

Should I remove this before pasting bans or simply add?

Thank you

1:07 pm on Sept 20, 2002 (gmt 0)

New User

10+ Year Member

joined:Sept 20, 2002
posts:28
votes: 0


Dunno if this helps, but I've found this list: [psychedelix.com...]
3:09 pm on Sept 20, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 20, 2002
posts:735
votes: 1


I keep having my stuff stolen (by teachers, not students), and when I can tell what they used, they downloaded my site using FrontPage.

Can I use this Rewrite stuff to block FrontPage from downloading my site? (I know the educators can still get my stuff from their browser's cache, etc, etc, but it would be nice to make them work at stealing, rather than having it be so easy, ya know?)

Thanks!

[edited by: jatar_k at 4:44 pm (utc) on Mar. 13, 2003]

4:30 pm on Sept 20, 2002 (gmt 0)

New User

10+ Year Member

joined:Sept 20, 2002
posts:28
votes: 0


Hmmm...PHPINFO shows FrontPage 2002 (XP) as
Mozilla/2.0 (compatible; MS FrontPage 5.0)

Dunno if that helps.
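
If that is the string to catch, something along these lines should do it; the pattern is left unanchored (no leading "^") because the agent string starts with "Mozilla/2.0", and [NC] makes the match case-insensitive:

RewriteCond %{HTTP_USER_AGENT} MS\ FrontPage [NC]
RewriteRule ^.* - [F,L]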
