
Perl Server Side CGI Scripting Forum

A Close to perfect .htaccess ban list
toolman




msg:441824
 3:30 am on Oct 23, 2001 (gmt 0)

Here's the latest rendition of my favorite ongoing artwork....my beloved .htaccess file. I've become quite fond of my little buddy, the .htaccess file, and I love the power it allows me to exclude vermin, pestoids and undesirable entities from my web sites.

Gorufu, littleman, Air, SugarKane? You guys see any errors or better ways to do this....anybody got a bot to add....before I stick this in every site I manage.

Feel free to use this on your own site and start blocking bots too.

(the top part is left out)

<Files .htaccess>
deny from all
</Files>
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
RewriteRule ^.* - [F]
RewriteCond %{HTTP_REFERER} ^http://www.iaea.org$
RewriteRule !^http://[^/.]\.your-site.com.* - [F]

 

Pushycat




msg:441884
 4:45 am on Jun 29, 2002 (gmt 0)

>richlowe asked
>Anyone know how to do this with IIS?

I test for these agents in global.asa in the Session_OnStart event and send them to an explanation page that has no links it can follow.

Then I use a browscap.ini file that you can get from my website that has a special section for website strippers and other nasties.

You can get this browscap.ini file and soon some sample code from my personal website.

snark




msg:441885
 11:30 pm on Aug 17, 2002 (gmt 0)

Hi everyone!

Is there anything similar to a .htaccess file for non-Apache NT/Win2K servers (IIS)? Please note that I am NOT a server expert, so if I say something stupid, please forgive me!

Thanks in advance,
Snark

Pushycat




msg:441886
 11:39 pm on Aug 17, 2002 (gmt 0)

Welcome snark. There is no file per se like .htaccess, but if you have access to the server you can ban people by IP via the IIS manager. Alternatively, you can try asking your host to use my browscap.ini file and then follow the example code on my website - the URL is in my profile - to ban them.

snark




msg:441887
 11:58 pm on Aug 17, 2002 (gmt 0)

Hi Pushycat,

Oh, this is wonderful. I've just been to your web site. I don't think that blocking by I.P. would work, since I want to block some e-mail harvesters. I suppose I could keep an eye on their I.P. address and if it's always the same, then block it in IIS. But I'm definitely going to look into the browscap.ini on your site and go from there. What a relief !

Thanks again,
Snark

Visit Thailand




msg:441888
 6:14 am on Aug 19, 2002 (gmt 0)

This is a great thread, thanks!

Before I add this to my .htaccess I want to make sure this last entry from SuperMan is valid and that there are no valid search engine robots among them.

I am most interested in stopping any bad bots and email harvesters.

Does this list stop most of the major players? Does it also stop that atomic energy site, iaea.org? I also need to check that it does not stop any potential search engines, including Alexa.

[1]RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} ^HTTrack [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Siphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.* - [F]

Edge




msg:441889
 11:21 am on Sep 6, 2002 (gmt 0)

I've been monitoring this thread for some time now and, as user-agents deserving to be blocked are introduced, I block them.

I noticed today that 95.6% of my visitors are using Explorer and Netscape, and the Google bot consumes about 2% (total 97.6%). I'm beginning to wonder if it would be easier to only allow folks who are using selected browsers to visit my site, instead of trying to block all the undesired ones. Maybe I would redirect users of unacceptable browsers to a page telling them I only support Explorer and Netscape.

Thoughts on this?

andreasfriedrich




msg:441890
 11:59 am on Sep 6, 2002 (gmt 0)

That's a great idea.

  • But then you should only support the latest versions of those browsers.
  • Do not support browsers running on Windows boxes, since Windows is insecure and people shouldn't be using it anyway.
  • On second thought, do not even support MacOS, since those Apple guys want to make money and that's not something that ought to be supported.
  • You might consider banning Linux users too, since there might be some security issues there as well.
  • Mozilla? I don't think so. Too heavy, needs fast computers with lots of RAM which consume lots of energy. We cannot have that in the US anymore, with Kyoto and all.
  • Lynx? Yeah, that's ok, however, ...

I don't think that's a good idea. I do understand the need to ban those email harvesters and offline browsers, but allowing only known browsers is not the way to go.

zooros




msg:441891
 12:22 pm on Sep 9, 2002 (gmt 0)

hi everyone, superman in particular :)

could someone please provide a nice .htaccess list and, let's say, update it here every one or two months?

by the way - there are two other bots I'm concerned about -
one is called turnitinbot, from turnitin.com,
and the other one was also one of these brand control bots -
I see them showing up more and more -
shouldn't we include them as well?

Nick_W




msg:441892
 3:37 pm on Sep 9, 2002 (gmt 0)

So, in short: Do I just plonk this in my .htaccess file?

[1]RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} ^HTTrack [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Siphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.* - [F]

Nick

martinibuster




msg:441893
 5:06 pm on Sep 9, 2002 (gmt 0)

I'd like to know too. Do we just copy and paste that code in?

jdMorgan




msg:441894
 6:27 pm on Sep 9, 2002 (gmt 0)

Yes, you can cut 'n paste that code into your .htaccess file - at your own risk.

A couple of points first, though...

The [1] at the beginning of the first line is spurious, and should be removed, leaving the line reading:

RewriteEngine On

As it stands, this example code will generate a 403-Forbidden response. You can also configure it to respond with other error codes, or with permanent or temporary redirects to other pages on your own site or elsewhere. I strongly encourage you to read the documentation [httpd.apache.org] on mod_rewrite, whether you plan to "tweak" this example code or not. As stated in the documentation, mod_rewrite is a powerful tool; and as such, it is also a dangerous tool. Some time spent "reading the fine manual" may save you a lot of grief in the future.
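
For illustration, a minimal sketch of the redirect variant; the page name /bot-policy.html and the guard condition are only placeholders, not part of the original post:

# Instead of returning 403-Forbidden, send matched user-agents a
# temporary (302) redirect to an explanation page. The extra condition
# keeps the rule from redirecting the explanation page to itself.
RewriteCond %{REQUEST_URI} !^/bot-policy\.html$
RewriteRule .* /bot-policy.html [R=302,L]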

By changing the final line to:
RewriteRule ^.* - [F,L]
you can minimize interactions with following rewrite rulesets, and also minimize CPU overhead for processing. The "L" tells mod-rewrite that this is the last rule that needs to be processed in this case, and to stop rewriting as soon as it is processed.

You can customize the 403-Forbidden page returned to the bad-bot (keeping in mind that at some time, as you modify this, you might introduce an error and catch an innocent person instead) to explain what happened and what to do about it. To do this, add:
ErrorDocument 403 /my403.html
at the beginning of the example code, and then create a custom 403 error page (called "my403.html" in this example.)

All RewriteCond's in this example are case-sensitive. This leaves it open to a few more errors as you maintain the file. To make the pattern-match case-insensitive, change the [OR] flag at the end of each line to [NC,OR]. Note also that the [OR] must not be included on the very last RewriteCond - the one directly preceding the RewriteRule. If it is, you'll lock up your server, and you and your users will get 500-Server Error responses to all requests. (After changing anything in your .htaccess file, it's a very good idea to access your own site, and make sure it still works!)
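
As a small illustration of that change (a sketch only, with placeholder patterns), the tail end of the ruleset would then look like this:

# [NC] makes each match case-insensitive; note that the last
# RewriteCond carries no [OR], and the rule uses [F,L] as above.
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus [NC]
RewriteRule ^.* - [F,L]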

All RewriteCond's in this example assume that the user-agent starts with the pattern of characters shown (that's what the "^" character means). Some user-agent strings do not start with the "bad-bot" user agent string; they start with something common like "Mozilla/3.01" and then contain the bad-bot identification further on in the string. To catch these guys, you will need to remove the starting text anchor "^" from the pattern match string. This makes the pattern matching less efficient, and should only be done if necessary.

Here's one example that I know needs to be changed:
RewriteCond %{HTTP_USER_AGENT} Indy.Library [NC,OR]

Note that I removed the starting "^", so that it will ban any user-agent with "Indy Library" anywhere in its user-agent string, and that I will accept any character - including a space - after "Indy".

Again - Yes, you can cut 'n paste this into your .htaccess file - at your own risk. I recommend that you minimize this risk by reading the mod_rewrite documentation.

Hope this helps,
Jim

Nick_W




msg:441895
 6:38 pm on Sep 9, 2002 (gmt 0)

Helps a lot!

Thanks for taking the time to go through that with us Jim ;-)

Nick

jdMorgan




msg:441896
 9:03 pm on Sep 9, 2002 (gmt 0)

Forgot something, though...

I've posted this before, but just in case:

mod-rewrite (and many related Apache modules) depend on "regular expressions" for pattern-matching. You can find a short and useful tutorial here [etext.lib.virginia.edu] on the University of Virginia Library Web site.

This is a big help in figuring out ^(what\ all\ the\ strange\ characters\ in\ mod_rewrite\ directives\ mean|how\ to\ write\ them\ correctly)\.$
;)

Jim

andreasfriedrich




msg:441897
 9:22 pm on Sep 9, 2002 (gmt 0)

This is a big help in figuring out ^(what\ all\ the\ strange\ characters\ in\ mod_rewrite\ directives\ mean|how\ to\ write\ them\ correctly)\.$

Shouldn't it read: This is a big help in figuring out (?:(?:^.*what\ all\ the\ strange\ characters\ in\ mod_rewrite\ directives\ mean.*how\ to\ write\ them\ correctly)|(?:^.*how\ to\ write\ them\ correctly.*what\ all\ the\ strange\ characters\ in\ mod_rewrite\ directives\ mean))\.$

andreasfriedrich




msg:441898
 12:41 am on Sep 10, 2002 (gmt 0)

Some of you may wonder about the performance impact of having such a large .htaccess file. I did some simple benchmark tests.

Setup

  • System

    Server: Apache/1.3.26 (Unix) mod_ssl/2.8.10
    OpenSSL/0.9.6d PHP/4.2.1

    Linux version 2.2.19-7.0.16
    Detected 467741 kHz processor.
    Memory: 257496k/262080k available (1076k kernel code,
    416k reserved, 3020k data, 72k init, 0k bigmem)
    128K L2 cache (4 way)
    CPU: L2 Cache: 128K
    CPU: Intel Celeron (Mendocino) stepping 05

  • benchmark script
    #!/usr/bin/perl

    use LWP::UserAgent;
    use LWP::Simple;
    use Time::HiRes qw(gettimeofday);

    $url = "http://server/root/test.html";
    foreach $agent (qw(BlackWidow Zeus AaronCarter)) {
        for ($j = 0; $j < 10; $j++) {
            $ua = new LWP::UserAgent;
            $ua->agent($agent);
            $t0 = gettimeofday;

            # Request document and parse it as it arrives
            for (my $i = 1; $i < 100; $i++) {
                $res = $ua->request(HTTP::Request->new(GET => $url),
                                    sub { });
            }
            $t{$agent} += gettimeofday - $t0;
        }
        $t{$agent} = $t{$agent} / ($j + 1);
    }

    print map { $_, ' needed ', $t{$_}, ' seconds.', "\n" } sort keys %t;

  • stress script
    #!/usr/bin/perl

    use LWP::UserAgent;

    $url = "http://server/www.pension-schafspelz.de/";
    $ua = new LWP::UserAgent;

    # Request document and parse it as it arrives
    while (1) {
        $res = $ua->request(HTTP::Request->new(GET => $url),
                            sub { });
    }

  • .htaccess with single RewriteCond directive
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} BlackWidow|Bot\ mailto:craftbot@yahoo.com|ChinaClaw|DISCo|Download\ Demon|eCatch|EirGrabber|EmailSiphon|Express\ WebPictures|ExtractorPro|EyeNetIE|FlashGet|GetRight|Go!Zilla|Go-Ahead-Got-It|GrabNet|Grafula|HMView|HTTrack|Image\ Stripper|Image\ Sucker|InterGET|Internet\ Ninja|JetCar|JOC\ Web\ Spider|larbin|LeechFTP|Mass\ Downloader|MIDown\ tool|Mister\ PiX|Navroad|NearSite|NetAnts|NetSpider|Net\ Vampire|NetZIP|Octopus|Offline\ Explorer|Offline\ Navigator|PageGrabber|Papa\ Foto|pcBrowser|RealDownload|ReGet|Siphon|SiteSnagger|SmartDownload|SuperBot|SuperHTTP|Surfbot|tAkeOut|Teleport\ Pro|VoidEYE|Web\ Image\ Collector|Web\ Sucker|WebAuto|WebCopier|WebFetch|WebReaper|WebSauger|Website\ eXtractor|WebStripper|WebWhacker|WebZIP|Wget|Widow|Xaldon\ WebSpider|Zeus
    RewriteRule .* - [F,L]

  • .htaccess with multiple RewriteCond directives

    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} BlackWidow [OR]
    RewriteCond %{HTTP_USER_AGENT} Bot\ mailto:craftbot@yahoo.com [OR]
    RewriteCond %{HTTP_USER_AGENT} ChinaClaw [OR]
    RewriteCond %{HTTP_USER_AGENT} DISCo [OR]
    RewriteCond %{HTTP_USER_AGENT} Download\ Demon [OR]
    RewriteCond %{HTTP_USER_AGENT} eCatch [OR]
    RewriteCond %{HTTP_USER_AGENT} EirGrabber [OR]
    RewriteCond %{HTTP_USER_AGENT} EmailSiphon [OR]
    RewriteCond %{HTTP_USER_AGENT} Express\ WebPictures [OR]
    RewriteCond %{HTTP_USER_AGENT} ExtractorPro [OR]
    RewriteCond %{HTTP_USER_AGENT} EyeNetIE [OR]
    RewriteCond %{HTTP_USER_AGENT} FlashGet [OR]
    RewriteCond %{HTTP_USER_AGENT} GetRight [OR]
    RewriteCond %{HTTP_USER_AGENT} Go!Zilla [OR]
    RewriteCond %{HTTP_USER_AGENT} Go-Ahead-Got-It [OR]
    RewriteCond %{HTTP_USER_AGENT} GrabNet [OR]
    RewriteCond %{HTTP_USER_AGENT} Grafula [OR]
    RewriteCond %{HTTP_USER_AGENT} HMView [OR]
    RewriteCond %{HTTP_USER_AGENT} HTTrack [OR]
    RewriteCond %{HTTP_USER_AGENT} Image\ Stripper [OR]
    RewriteCond %{HTTP_USER_AGENT} Image\ Sucker [OR]
    RewriteCond %{HTTP_USER_AGENT} InterGET [OR]
    RewriteCond %{HTTP_USER_AGENT} Internet\ Ninja [OR]
    RewriteCond %{HTTP_USER_AGENT} JetCar [OR]
    RewriteCond %{HTTP_USER_AGENT} JOC\ Web\ Spider [OR]
    RewriteCond %{HTTP_USER_AGENT} larbin [OR]
    RewriteCond %{HTTP_USER_AGENT} LeechFTP [OR]
    RewriteCond %{HTTP_USER_AGENT} Mass\ Downloader [OR]
    RewriteCond %{HTTP_USER_AGENT} MIDown\ tool [OR]
    RewriteCond %{HTTP_USER_AGENT} Mister\ PiX [OR]
    RewriteCond %{HTTP_USER_AGENT} Navroad [OR]
    RewriteCond %{HTTP_USER_AGENT} NearSite [OR]
    RewriteCond %{HTTP_USER_AGENT} NetAnts [OR]
    RewriteCond %{HTTP_USER_AGENT} NetSpider [OR]
    RewriteCond %{HTTP_USER_AGENT} Net\ Vampire [OR]
    RewriteCond %{HTTP_USER_AGENT} NetZIP [OR]
    RewriteCond %{HTTP_USER_AGENT} Octopus [OR]
    RewriteCond %{HTTP_USER_AGENT} Offline\ Explorer [OR]
    RewriteCond %{HTTP_USER_AGENT} Offline\ Navigator [OR]
    RewriteCond %{HTTP_USER_AGENT} PageGrabber [OR]
    RewriteCond %{HTTP_USER_AGENT} Papa\ Foto [OR]
    RewriteCond %{HTTP_USER_AGENT} pcBrowser [OR]
    RewriteCond %{HTTP_USER_AGENT} RealDownload [OR]
    RewriteCond %{HTTP_USER_AGENT} ReGet [OR]
    RewriteCond %{HTTP_USER_AGENT} Siphon [OR]
    RewriteCond %{HTTP_USER_AGENT} SiteSnagger [OR]
    RewriteCond %{HTTP_USER_AGENT} SmartDownload [OR]
    RewriteCond %{HTTP_USER_AGENT} SuperBot [OR]
    RewriteCond %{HTTP_USER_AGENT} SuperHTTP [OR]
    RewriteCond %{HTTP_USER_AGENT} Surfbot [OR]
    RewriteCond %{HTTP_USER_AGENT} tAkeOut [OR]
    RewriteCond %{HTTP_USER_AGENT} Teleport\ Pro [OR]
    RewriteCond %{HTTP_USER_AGENT} VoidEYE [OR]
    RewriteCond %{HTTP_USER_AGENT} Web\ Image\ Collector [OR]
    RewriteCond %{HTTP_USER_AGENT} Web\ Sucker [OR]
    RewriteCond %{HTTP_USER_AGENT} WebAuto [OR]
    RewriteCond %{HTTP_USER_AGENT} WebCopier [OR]
    RewriteCond %{HTTP_USER_AGENT} WebFetch [OR]
    RewriteCond %{HTTP_USER_AGENT} WebReaper [OR]
    RewriteCond %{HTTP_USER_AGENT} WebSauger [OR]
    RewriteCond %{HTTP_USER_AGENT} Website\ eXtractor [OR]
    RewriteCond %{HTTP_USER_AGENT} WebStripper [OR]
    RewriteCond %{HTTP_USER_AGENT} WebWhacker [OR]
    RewriteCond %{HTTP_USER_AGENT} WebZIP [OR]
    RewriteCond %{HTTP_USER_AGENT} Wget [OR]
    RewriteCond %{HTTP_USER_AGENT} Widow [OR]
    RewriteCond %{HTTP_USER_AGENT} Xaldon\ WebSpider [OR]
    RewriteCond %{HTTP_USER_AGENT} Zeus
    RewriteRule .* - [F,L]

Results

  • htaccess, multiple RewriteCond, idle server

    [af@server SuchmaschinenTricks]$ ./bm
    AaronCarter needed 2.03904145414179 seconds.
    BlackWidow needed 1.89269917661493 seconds.
    Zeus needed 1.90201771259308 seconds.
    [af@server SuchmaschinenTricks]$ ./bm
    AaronCarter needed 2.05427220734683 seconds.
    BlackWidow needed 1.90449017828161 seconds.
    Zeus needed 1.91795318776911 seconds.
    [af@server SuchmaschinenTricks]$ ./bm
    AaronCarter needed 2.04453534429724 seconds.
    BlackWidow needed 1.89828474955125 seconds.
    Zeus needed 1.90684572133151 seconds.

  • httpd.conf, multiple RewriteCond, idle server

    [af@server SuchmaschinenTricks]$ ./bm
    AaronCarter needed 1.48258856209842 seconds.
    BlackWidow needed 1.41852938045155 seconds.
    Zeus needed 1.4474944526499 seconds.
    [af@server SuchmaschinenTricks]$ ./bm
    AaronCarter needed 1.47204467383298 seconds.
    BlackWidow needed 1.40937690301375 seconds.
    Zeus needed 1.42638698491183 seconds.
    [af@server SuchmaschinenTricks]$ ./bm
    AaronCarter needed 1.49393899874254 seconds.
    BlackWidow needed 1.42769226160916 seconds.
    Zeus needed 1.44513262401928 seconds.

  • .htaccess, single RewriteCond, idle server

    [af@server SuchmaschinenTricks]$ ./bm
    AaronCarter needed 1.77475028688257 seconds.
    BlackWidow needed 1.69021615115079 seconds.
    Zeus needed 1.59830655834892 seconds.
    [af@server SuchmaschinenTricks]$ ./bm
    AaronCarter needed 1.76538353616541 seconds.
    BlackWidow needed 1.68273590911518 seconds.
    Zeus needed 1.5909228108146 seconds.
    [af@server SuchmaschinenTricks]$ ./bm
    AaronCarter needed 1.77456200122833 seconds.
    BlackWidow needed 1.69423974644054 seconds.
    Zeus needed 1.60414087772369 seconds.

  • httpd.conf, single RewriteCond, idle server

    [af@server SuchmaschinenTricks]$ ./bm
    AaronCarter needed 1.50137218562039 seconds.
    BlackWidow needed 1.43611000884663 seconds.
    Zeus needed 1.45189526948062 seconds.
    [af@server SuchmaschinenTricks]$ ./bm
    AaronCarter needed 1.48307919502258 seconds.
    BlackWidow needed 1.41925389116461 seconds.
    Zeus needed 1.43655191768299 seconds.
    [af@server SuchmaschinenTricks]$ ./bm
    AaronCarter needed 1.49346754767678 seconds.
    BlackWidow needed 1.43283208933744 seconds.
    Zeus needed 1.44978855956684 seconds.

  • .htaccess, multiple RewriteCond, server under stress

    K:\SuchmaschinenTricks>perl bm
    AaronCarter needed 6.25990908796137 seconds.
    BlackWidow needed 5.3813636302948 seconds.
    Zeus needed 5.76372727480802 seconds.

  • httpd.conf, multiple RewriteCond, server under stress

    K:\SuchmaschinenTricks>perl bm
    AaronCarter needed 5.58627272735943 seconds.
    BlackWidow needed 5.23381818424572 seconds.
    Zeus needed 5.10827272588556 seconds.

  • .htaccess, single RewriteCond, server under stress

    K:\SuchmaschinenTricks>perl bm
    AaronCarter needed 6.03227272900668 seconds.
    BlackWidow needed 5.1229090907357 seconds.
    Zeus needed 5.46418181332675 seconds.

  • httpd.conf, single RewriteCond, server under stress

    K:\SuchmaschinenTricks>perl bm
    AaronCarter needed 5.32499999349768 seconds.
    BlackWidow needed 4.55199999159033 seconds.
    Zeus needed 5.22927273403515 seconds.

Conclusion

  • Use a single RewriteCond directive in your .htaccess files.
  • Use multiple RewriteCond directives in your httpd.conf file.

.htaccess files have to be read each and every time a request is made. It takes longer to parse and compile the multiple RewriteCond directives than the single directive with one long regular expression. Parsing is the bottleneck in this scenario.

The httpd.conf file is read once, when the server starts up, and the regular expressions are compiled once. Multiple short and simple REs execute faster than a single, complex one. Execution time is the factor that matters most in this case.

[edited by: jatar_k at 4:35 pm (utc) on Sep. 22, 2002]
[edit reason] stopped side scroll [/edit]

caine




msg:441899
 1:06 am on Sep 10, 2002 (gmt 0)

This has been one hell of a read. I've got my copy of the .htaccess file and will drop it on the server for my next site. Impressive.

Who needs books!

jdMorgan




msg:441900
 1:08 am on Sep 10, 2002 (gmt 0)

andreasfriedrich,

Excellent data...

I wonder what the result would be performance-wise with four RewriteCond lines: One each for start-anchored, end-anchored, fully-anchored, and unanchored pattern strings. I use this method to keep the patterns organized by type, and to keep them neat and easy to maintain.
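
A minimal sketch of what such a four-way split might look like; the patterns below are placeholders rather than anyone's actual list:

# Start-anchored patterns (user-agent begins with the string)
RewriteCond %{HTTP_USER_AGENT} ^(BlackWidow|WebZIP|Wget) [NC,OR]
# End-anchored patterns (user-agent ends with the string)
RewriteCond %{HTTP_USER_AGENT} craftbot@yahoo\.com$ [NC,OR]
# Fully-anchored patterns (the pattern is the entire user-agent)
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/3\.0\ \(compatible\)$ [NC,OR]
# Unanchored patterns (the string may appear anywhere)
RewriteCond %{HTTP_USER_AGENT} (Indy\ Library|NEWT) [NC]
RewriteRule .* - [F,L]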

Thanks for the test - Very useful.

Jim

bird




msg:441901
 1:22 am on Sep 10, 2002 (gmt 0)

Interesting comparison Andreas, thanks!

If I'm reading your code correctly, then you're demonstrating two things:

a) The response time difference between the two .htaccess files is roughly 10%.

b) Each call takes between 0.002 and 0.006 seconds, depending on the load of the machine.

Combining those two, we're talking about an average additional overhead caused by multiple RewriteConds of 0.0004 seconds (1/2500 seconds) per request, on a somewhat aged machine.

Since you're always fetching the same file from the disk cache, the remaining server side overhead is very low (in contrast to a real life situation, where each request may cause a different set of files to be loaded from disk first), so we can assume that the difference is indeed caused by the different rule sets. The comments in your code talk about parsing the HTML, but I don't understand enough Perl to see if that really happens.

In summary, I'd prefer the maintenance friendly multiple-rule version any time, if it only costs me such a small price in terms of response time. Of course, I'm not serving millions of requests per day, so your mileage may vary.

andreasfriedrich




msg:441902
 1:37 am on Sep 10, 2002 (gmt 0)

comments in your code talk about parsing the HTML, but I don't understand enough Perl to see if that really happens.

No, there is no parsing going on, since it's irrelevant for the benchmarking.

on a somewhat aged machine

Don't insult my trusty old Linux box. I can't guarantee it won't launch any DoS attacks on its own ;)

Since you're always fetching the same file from the disk cache, the remaining server side overhead is very low [...], so we can assume that the difference is indeed caused by the different rule sets

That was the idea behind the admittedly artificial setup.

I'd prefer the maintenance friendly multiple-rule version any time

As is quite often the case, it's a tradeoff between speed and maintainability.

pmkpmk




msg:441903
 9:36 am on Sep 11, 2002 (gmt 0)

Hi,

I've followed this thread for quite some time, since our server gets harassed by those "evil bots" as well. However, I couldn't quite decide to take action until jdMorgan's posting, which was almost a how-to.

However, having done this, I ran into my first trouble, because my Apache 1.3 says:

Options FollowSymLinks or SymLinksIfOwnerMatch is off which implies that RewriteRule directive is forbidden

Simply adding a "FollowSymLinks on" at the top of the .htaccess doesn't work.

Any advice?

On a side note: I found some packages like "Sugarplum" and "Robotcop" which promise to automate some of the functions intended by this .htaccess. Any experiences with these?

andreasfriedrich




msg:441904
 10:09 am on Sep 11, 2002 (gmt 0)

The correct directive to enable the FollowSymLinks feature in your .htaccess file would be
Options [httpd.apache.org] +FollowSymLinks

For that to work you need to have at least

AllowOverride [httpd.apache.org] Options
privileges. Those are set in the server config or virtual host context.
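
A minimal sketch of how the two pieces fit together; the directory path is only a placeholder:

# In httpd.conf (server config or <VirtualHost> context).
# FileInfo is included because mod_rewrite directives in .htaccess need it.
<Directory "/www/your-site/htdocs">
    AllowOverride Options FileInfo
</Directory>

# Then at the top of the per-directory .htaccess file:
Options +FollowSymLinks
RewriteEngine On
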
Crazy_Fool




msg:441905
 3:08 pm on Sep 12, 2002 (gmt 0)

I've added the .htaccess list to a new site I've just built, but I've set it to redirect bad bots to a PHP page that lists all the email addresses of companies that have sent me spam. Hopefully the email harvesters will pick up all those email addresses and add the spammers to other spam lists. Maybe they'll end up spamming each other into submission?

When spam comes in, I check the actual company site to get their genuine email addresses, then manually add those addresses to a MySQL database. This means only genuine mailboxes get listed on the page, and not the Yahoo or Hotmail addresses the spam is often sent from.

I've also added a simple browser check to the PHP page so that if an IE / Netscape / Opera user visits the page they will only see a normal forbidden message .... well, that's the theory, but I've not been able to test it yet. I just need a "browser" or something that will let me set the UA to whatever I want .....
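
A minimal sketch of that kind of setup; the script name spamtrap.php and the sample agents are placeholders, not the actual configuration described above:

RewriteEngine On
# Send matched harvesters to the address-poisoning page instead of a 403.
RewriteCond %{HTTP_USER_AGENT} (EmailSiphon|EmailCollector|ExtractorPro) [NC]
# The negated pattern excludes the trap page itself so the rewrite cannot loop.
RewriteRule !^spamtrap\.php$ /spamtrap.php [L]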

snark




msg:441906
 11:56 pm on Sep 12, 2002 (gmt 0)

This question is for Pushycat:

Thanks for the browscap.ini & sample IIS code, which I got from your site! I want to make sure I understand their use. I implement the browscap.ini (or whatever parts of it I want), and then I implement the code in global.asa for each robot I want to ban? Or just the one block of code gets revised to include each robot to be banned? (Or -- is every robot in the browscap.ini banned, so I should only include those which I wish to ban?) My partner is much more of a web programmer than I am and would take care of this, but I want to make sure I understand what needs to be done first!

Thanks a lot,
Snark

Annii




msg:441907
 10:50 am on Sep 13, 2002 (gmt 0)

Hi all,
Thanks so much for the info in this thread,
I use .htaccess and have edited the list here a bit. I want to include it in my existing .htaccess file which has a couple of extra rules in it. Will the following work ok? Is it in the right order etc?

ErrorDocument 404 /404.htm
ErrorDocument 400 /404.htm
ErrorDocument 403 /404.htm
ErrorDocument 501 /404.htm
ErrorDocument 502 /404.htm
ErrorDocument 503 /404.htm

<FilesMatch "htm([l])*$">
ForceType application/x-httpd-php
</FilesMatch>

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} ^HTTrack [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^InternetSeer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*Indy [OR]
RewriteCond %{HTTP_USER_AGENT} ^MSFrontPage [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^Ping [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^webcollage [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Siphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.* - [F,L]
RewriteCond %{HTTP_REFERER} ^http://www.iaea.org$
RewriteRule !^http://[^/.]\.your-site.com.* - [F]

Thanks

Anni

Arcie




msg:441908
 2:29 am on Sep 15, 2002 (gmt 0)

Hi Toolman, nice compilation of nasty bots! Have you tried sticking the rewriter in httpd.conf? It would run fastest there, although you noted that there was no noticeable speed difference as it is.

First post!

Since I run a virtual server with a number of different domains, it seems to me it would make more sense to put my list of forbidden UAs in the httpd.conf file, rather than try to replicate them in .htaccess on each domain's document root. Are there any caveats or special directions I should follow before I proceed?

Thanks!

Randy

[edited by: jatar_k at 12:04 am (utc) on Sep. 16, 2002]
[edit reason] no sigs please [/edit]

andreasfriedrich




msg:441909
 11:39 am on Sep 15, 2002 (gmt 0)

Hi Randy and welcome to Webmaster World [webmasterworld.com].

As shown in post #77 [webmasterworld.com] putting your RewriteRules in httpd.conf is indeed faster and the way to go when you have access to it.

However, this will not solve the problem of applying those rules to all virtual servers. You cannot just put the rewriting code in the main section and expect it to work for all virtual servers. For an explanation on this see API Phases [httpd.apache.org] in the mod_rewrite URL Rewriting Engine documentation.

So, after [...] Apache has determined the corresponding server (or virtual server) the rewriting engine starts processing of all mod_rewrite directives from the per-server configuration in the URL-to-filename phase.

my emphasis

There's also a thread How (and Where) best to control access [webmasterworld.com] that you might want to read on this topic. If you have mod_perl you might want to use the solution mentioned in this thread. Ask carfac [webmasterworld.com] for the modified version of BlockAgent.
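
A minimal sketch of both halves, assuming placeholder addresses and names; RewriteOptions inherit is worth checking against the mod_rewrite documentation for your Apache version:

# Main server section of httpd.conf:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (EmailSiphon|WebZIP|Zeus) [NC]
RewriteRule .* - [F,L]

# Each virtual host must either repeat the rules or pull them in:
<VirtualHost 10.0.0.1>
    ServerName www.example.com
    RewriteEngine On
    # Inherit the rewrite configuration of the main server.
    RewriteOptions inherit
</VirtualHost>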

And as a side note: do not drop any URLs, and do not use a signature.

veneerz




msg:441910
 1:52 am on Sep 16, 2002 (gmt 0)

Hi,
First, thank you for having this great place, where I was able to learn more in the last two weeks than in the six months since I decided to have my first site.
I am not yet very familiar with .htaccess, and when I try to modify it, it always gives me an error.
There is text in it that was left there by my host:

# -FrontPage-

IndexIgnore .htaccess */.?* *~ *# */HEADER* */README* */_vti*

DirectoryIndex index.html index.htm index.php index.phtml index.php3

# AddType application/x-httpd-php .phtml
# AddType application/x-httpd-php .php3
# AddType application/x-httpd-php .php
#
# Action application/x-httpd-php "/php/php.exe"
# Action application/x-httpd-php-source "/php/php.exe"
# AddType application/x-httpd-php-source .phps

<Limit GET POST>
order deny,allow
deny from all
allow from all
</Limit>
<Limit PUT DELETE>
order deny,allow
deny from all
</Limit>

AuthName www.XXXXXX.com
AuthUserFile /www/XXXXXX/_vti_pvt/service.pwd
AuthGroupFile /www/XXXXXX/_vti_pvt/service.grp

Should I remove this before pasting in the bans, or simply add to it?

Thank you

58sniper




msg:441911
 1:07 pm on Sep 20, 2002 (gmt 0)

Dunno if this helps, but I've found this list: [psychedelix.com...]

stapel




msg:441912
 3:09 pm on Sep 20, 2002 (gmt 0)

I keep having my stuff stolen (by teachers, not students), and, when I can tell what they used, it turns out they downloaded my site using FrontPage.

Can I use this Rewrite stuff to block FrontPage from downloading my site? (I know the educators can still get my stuff from their browser's cache, etc, etc, but it would be nice to make them work at stealing, rather than having it be so easy, ya know?)

Thanks!

[edited by: jatar_k at 4:44 pm (utc) on Mar. 13, 2003]

58sniper




msg:441913
 4:30 pm on Sep 20, 2002 (gmt 0)

Hmmm...PHPINFO shows FrontPage 2002 (XP) as
Mozilla/2.0 (compatible; MS FrontPage 5.0)

Dunno if that helps.
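
If that string is accurate, a minimal sketch of a rule to catch it; since the user-agent begins with "Mozilla/2.0", the pattern cannot be anchored with "^":

RewriteEngine On
# Match "MS FrontPage" anywhere in the user-agent string.
RewriteCond %{HTTP_USER_AGENT} MS\ FrontPage [NC]
RewriteRule .* - [F,L]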

Edge




msg:441914
 2:09 pm on Sep 21, 2002 (gmt 0)

When FrontPage first accesses a web site, the file _vti_inf.html is requested. I set up a trap script via SSI in that HTML file (_vti_inf.html); search for trap.pl on WebmasterWorld.

The trap.pl script blocks their IP address from further access to your website. This is very safe, since "_vti_inf.html" is only requested by FrontPage.

Works great!
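
For reference, a minimal .htaccess sketch that lets the trap file be parsed for SSI; the actual exec call lives inside _vti_inf.html itself, and the details depend on your server's SSI setup:

# Enable server-side includes in this directory,
# and have only the trap file parsed for SSI directives.
Options +Includes
<Files _vti_inf.html>
    SetHandler server-parsed
</Files>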
