A Close to perfect .htaccess ban list - Part 3
More tips and tricks for banning those pesky "problem bots!"
txbakers
 7:38 pm on Oct 13, 2003 (gmt 0)

Continued from A close to perfect .htaccess ban list - Part 2 [webmasterworld.com]

Whee - what a great discussion.

[edited by: Marcia at 11:23 pm (utc) on Oct. 13, 2003]

[edited by: jdMorgan at 12:24 am (utc) on Nov. 19, 2003]
[edit reason] Corrected URL [/edit]

 

amznVibe
 4:22 pm on Oct 18, 2003 (gmt 0)

So using these new great condensed rules to detect, block and optionally trap the bad bots, is there a way to pass to the trap.cgi what set of rules caused the trap to trip?

For example if I use this, is there some way to pass to the email that trap sends me what actually happened? (last few lines of the log?)
# Forbid requests for exploits & annoyances - and TRAP
#
# Bad requests
RewriteCond %{REQUEST_METHOD} !^(GET|HEAD|POST) [NC,OR]
# CodeRed
RewriteCond %{REQUEST_URI} ^/default\.(ida|idq) [NC,OR]
RewriteCond %{REQUEST_URI} ^/.*\.printer$ [NC,OR]
# Email
RewriteCond %{REQUEST_URI} (mail.?form|form|form.?mail|mail|mailto)\.(cgi|exe|pl|asp|php)$ [NC,OR]
# GuestBook
RewriteCond %{REQUEST_URI} (guestbook)\.(cgi|exe|pl|asp|php)$ [NC,OR]
# MSOffice
RewriteCond %{REQUEST_URI} ^/(MSOffice|_vti) [NC,OR]
# Nimda
RewriteCond %{REQUEST_URI} /(admin|cmd|httpodbc|nsiislog|root|shell)\.(dll|exe) [NC,OR]
# Various
RewriteCond %{REQUEST_URI} ^/(bin/|cgi/|cgi\-local/|sumthin) [NC,OR]
RewriteCond %{THE_REQUEST} ^GET\ http [NC,OR]
RewriteCond %{REQUEST_URI} /sensepost\.exe [NC]
# RewriteRule .* - [F]
RewriteRule .* /cgi-bin/trap.cgi [L]
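
One way to get at "which rule tripped" (a sketch only, using a made-up TRAP_REASON variable and just two of the groups above) would be to split the conditions into smaller themed blocks and tag each one with mod_rewrite's E= flag; after the internal rewrite the variable may reach the CGI with a REDIRECT_ prefix, so the script should check both:

# CodeRed probes
RewriteCond %{REQUEST_URI} ^/default\.(ida|idq) [NC,OR]
RewriteCond %{REQUEST_URI} ^/.*\.printer$ [NC]
RewriteRule .* /cgi-bin/trap.cgi [E=TRAP_REASON:codered,L]

# Nimda probes
RewriteCond %{REQUEST_URI} /(admin|cmd|httpodbc|nsiislog|root|shell)\.(dll|exe) [NC]
RewriteRule .* /cgi-bin/trap.cgi [E=TRAP_REASON:nimda,L]

and then in the trap script:

$reason = $ENV{'TRAP_REASON'} || $ENV{'REDIRECT_TRAP_REASON'} || "unknown";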

claus
 8:05 pm on Oct 18, 2003 (gmt 0)

This seems like a good opportunity to point out that it's not a good idea to copy blindly from this lengthy thread.

By all means it's a great thread and it has tons of good information. Go ahead and use it - just don't copy anything that you are not entirely sure about. Do research. All kinds of people have posted - they all face different issues, and they all have separate reasons for posting what they did. If you are not 100% sure about a thing - ask. Either here or in the other relevant forums.

This thread generally concentrates on making it work. There are not many questions asked about motives and reasons. Some people will want to ban things that other people (competitors) make their living from - and not all bots are bad for everyone. A question is never stupid; a copy can easily become very stupid. Don't just copy and paste, unless:

(1) you know exactly what you are doing, and what you're not doing [AND]
(2) you know that what you are doing is also the right thing to do [AND]
(3) you know that what you are *not* doing is also the right thing to avoid.

Even when there's nothing questionable or unusual at all, you still need to know exactly what you are doing and why. Otherwise you could cause trouble for yourself and others. That is: every single line needs to have a reason, and you need to know that reason personally. You also need to know that the line applies to your specific situation.

There is no such thing as a one-size-fits-all.


Here's an innocent example from amznVibe's(*) post above:

-------------------------------
RewriteCond %{REQUEST_URI} ^/(bin/|cgi/|cgi\-local/|sumthin) [NC,OR]
-------------------------------

The quite normal directory name "/cgi-bin/" is not matched here. Instead, the unusual name "/sumthin" is matched.

What does this mean? Does it mean that whoever uses this line does not even know the name of his/her own cgi folder(!?) No, it tells me that this is example code that has not been adapted to specific use. This ("sumthin") must be an example - just like writing "www.example.com" when referring to some domain. Don't copy. Adapt it to your own specific use instead.

If you have a folder named "/sumthin/", or a file named "/sumthing-else.html", then it's pretty obvious that visitors requesting it will get banned. But there's something else: it's probably intentional here to match the special case of cgi folders that are not on this particular domain.

So, if you copy this, and run valid scripts on your own server from a folder named /cgi/, then you'll be putting legitimate visitors into the bot-trap. And you don't want to do that. Further, if your bot trap is also located in that folder, then you'll be messing up seriously, and you seriously don't want to do that.

Now, that was an innocent example. As this thread is now in three parts, you will find examples that are worse, even much worse. So: don't copy blindly. Do research. Always adapt to your own needs.

/claus


(*) AmznVibe, I'm sure you know what you're doing and I'm not questioning your motives or anything else; this is only aimed at new readers of this amazing thread. It was a good illustration that even perfectly valid lines can cause trouble if applied under dissimilar conditions.
jdMorgan
 10:15 pm on Oct 18, 2003 (gmt 0)

Not to take away from the very important point of claus' post above, but requests for /sumthin are the result of some kind of server probe. IIRC, it's a worm that infects Linux servers.

From this morning's "loser" log:

61.78.109.21 - - [18/Oct/2003:09:45:19 -0400] "GET /sumthin HTTP/1.0" 403 234 "-" "-"

...Served it a nice, tasty, low-calorie 403-Forbidden. :)

<added>Actually, to turn this post around to a more "reinforcing" direction, the code claus cites above *cannot* be used on one of my servers, because the "standard" location for user scripts is /cgi-local, which would match one of the patterns in the posted code. If I installed that line as originally posted, it might (depending on the contents and placement of the corresponding RewriteRule in my .htaccess file) actually disable one of my most important scripts! So, again, claus' advice is very sound: Don't just copy and paste this stuff if you don't know what each line means.</added>
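
For anyone adapting that line, a minimal sketch of one adaptation (the /cgi-local/ path is just this server's layout; substitute your own): drop the alternative that matches your real script directory and, to be safe, exclude that directory ahead of the OR'ed conditions:

# Never trap requests for your own script directory (path is illustrative)
RewriteCond %{REQUEST_URI} !^/cgi-local/
# Remaining alternatives, minus the one that matches your own layout
RewriteCond %{REQUEST_URI} ^/(bin/|cgi/|sumthin) [NC,OR]
RewriteCond %{REQUEST_URI} /sensepost\.exe [NC]
RewriteRule .* /cgi-bin/trap.cgi [L]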

Jim

Wizcrafts
 10:45 pm on Oct 18, 2003 (gmt 0)

On October 18, 2003, AmznVibe queried our human databases on this issue:

So using these new great condensed rules to detect, block and optionally trap the bad bots, is there a way to pass to the trap.cgi what set of rules caused the trap to trip?
For example if I use this, is there some way to pass to the email that trap sends me what actually happened? (last few lines of the log?)

If I grok your question correctly, you are asking how to include the variables that triggered the trap script in the email alert it sends to you. If this is what you want, I have exactly what you are looking for! The following works for me, on my server:

#!/usr/bin/perl -w

$remreq = $ENV{REQUEST_URI};
$remaddr = $ENV{REMOTE_ADDR};
$usragnt = $ENV{HTTP_USER_AGENT} || "The UA is blank";
$referer = $ENV{'HTTP_REFERER'} || "there is no referer";
$date = scalar localtime(time);
$remmeth = $ENV{REQUEST_METHOD};
$remhost = $ENV{'HTTP_HOST'};

open(MAIL, "|/usr/sbin/sendmail -t") || die "Content-type: text/text\n\nCan't open /usr/sbin/sendmail!";
print MAIL "To: xxx\@yyy\.zzz\n";
print MAIL "From: xxx\@yyy\.zzz\n";
print MAIL "Subject: You caught another one!\n\n";
print MAIL "The following 'intruder' was caught by the \"Bot Trap\" and has been added to the ban env in .htaccess:\n\n";
print MAIL "The ip address: $remaddr was listed on $date \n";
print MAIL "The file requested was: $remreq\n";
print MAIL "The method used was: $remmeth\n";
print MAIL "The intruder's user agent was: $usragnt\n";
print MAIL "The document was referred by: $referer\n";
print MAIL "The Host Server is was $remhost\n";
close(MAIL);
exit;

This sends me an email as soon as the trap is sprung, which includes the date and time, the intruder's IP, the name of the file requested, the method (GET, POST, CONNECT, etc.), the intruder's user agent (or a note if it is blank), the referrer (or a note if it is blank), and the host from which the email was sent.

I hope this helps. You may have different paths to Perl and Sendmail. I also obfuscated my to and from email addresses in the example. You will need to input your own. Also, the vertical pipes (¦¦) are broken on this forum and should be retyped with your keyboard.

Wiz

claus
 11:46 pm on Oct 18, 2003 (gmt 0)

>> requests for /sumthin are the result of some kind of server probe

That was just one great example of the importance of research :)

>> is there some way to pass to the email that trap sends me what actually happened?

AFAIK, when you do an internal rewrite like the example you posted above, the Environment Variables for the request will get passed on. So you could just get the Environment Variables in the script.

Here's a little snippet that prints all of them out alphabetically as raw text; one variable-value pair on each line. You could include this in the relevant section of the bot-trap you are using, ie. print these in the email:
--------------------------------------------------------------------------
foreach $key (sort keys(%ENV)) {print "$key: $ENV{$key}\n";}
--------------------------------------------------------------------------

/claus


Added: I just saw Wizcrafts' example and realized that the above snippet will not print the timestamp. It's in the mail header anyway, but you might want it in the mail body as well. Using "MAIL" as above:
-------------------------------------
$date = scalar localtime(time);
print MAIL "Timestamp: $date\n";
foreach $key (sort keys(%ENV)) {
print MAIL "$key: $ENV{$key}\n";
}

-------------------------------------

amznVibe
 3:51 am on Oct 19, 2003 (gmt 0)

LoL, I love the overreaction, but I know better than to take it personally :)

Before I used the code, not only did I search for "sumthin", I also Googled for "cgi-local" here, and found they were valid terms to block (in my case ;) ).

What you both missed, but I saw and left in there, is "sensepost", which neither Google nor the internal search can find anything on (in WebmasterWorld). I chose to leave it in because "it couldn't hurt".

In fact I did customize the script. The original script had nothing about "guestbook" in there, which many sites don't use but are scanned for, and now that I have posted it, I realize I can also add |htm to the guestbook line. Can I add "|htm?" or should I use "|htm|html"?

But keep up the great work and sharing. I still learned more even from the reaction!

amznVibe
 4:06 am on Oct 19, 2003 (gmt 0)

Wizcrafts, thanks for the code. I already email myself most of that data except for the REQUEST_METHOD and the REQUEST_URI, because for some reason I thought I would get the trap.cgi's own info. I forgot that mod_rewrite should actually make the original information available because it's not a real redirect (right?). Thanks for the ideas.

By the way, I have found that $ENV{'REMOTE_HOST'} never seems to work on my server (I get a blank response), but I found that this code works for me:

use Socket;

$remote_addr = $ENV{'REMOTE_ADDR'};
# reverse-DNS lookup of the visitor's IP
$iaddr = inet_aton($remote_addr);
$remote_host = gethostbyaddr($iaddr, AF_INET);
# escape the dots so the IP can be dropped into .htaccess as a regex
$remote_addr =~ s/\./\\\./g;

That way I get the reverse DNS for REMOTE_ADDR (before the last line reformats the IP for the .htaccess file).

jdMorgan
 4:10 am on Oct 19, 2003 (gmt 0)

amznVibe,

None of what I wrote was directed at you, but rather at people who pick up information in this thread out of context; in that case, I believe it *is* important to make the point that while some of the user-agents in the list are "definitely bad," others must be viewed in perspective: They may be good or bad, depending on the specific Web site, the market segment it's in, etc.

Best,
Jim

claus
 11:35 am on Oct 19, 2003 (gmt 0)

AmznVibe, it wasn't an (over)reaction to your specific post - it was just a few words of warning to others who might copy the contents of any post without knowing for sure what each line means. Nothing personal at all :)

>> guestbook... (htm/l)?

First, htm will not catch html, as you have an end anchor ("$") in that line, so you would need both. Second, it might be better to use this line instead:

RewriteCond %{REQUEST_URI} guestbook [NC,OR]

Why? Because you intend to match requests for a URL containing the word "guestbook", so you should focus on the important part. Otherwise you would still miss the ".shtml" extension, and then the ".php4" extension, and then the ".jsp" extension, and then the ".cfm" extension, and so on. This example matches "guestbook" anywhere in the URL; extensions do not matter.

For others: this line implies that if you request a guestbook URL, you will end up on the "banned bad-bots" list. This might not be a good idea for everyone, especially not if you run a guestbook.

>> mod_rewrite should actually make the original information available because its not a real redirect (right?)

Right. See post #6 - you can even use the snippet I provided to check it.

Just add the Perl shebang line (#!/path/to/perl) before it, save it under a file name ending in ".pl" or ".cgi", and chmod it to 755. Then make an internal rewrite from some odd filename to this file. Enter the odd filename in your browser's address bar to get a list of all environment variables for that request.
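
A minimal sketch of that setup (the file name and the "odd" name below are made up for illustration only):
--------------------------------------------------------------------------
#!/usr/bin/perl -w
# envdump.cgi - print every environment variable for this request as plain text
print "Content-type: text/plain\n\n";
foreach $key (sort keys(%ENV)) {
print "$key: $ENV{$key}\n";
}
--------------------------------------------------------------------------
and in .htaccess:
--------------------------------------------------------------------------
RewriteEngine On
RewriteRule ^some-odd-name\.html$ /cgi-bin/envdump.cgi [L]
--------------------------------------------------------------------------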

/claus


Not many forums have threads that go on for so long that you can make your 1,000th post in the same thread that originally made you discover the forum (my first post was probably in this thread as well). A big "thank you" to everyone who ever posted here. :)
amznVibe
 12:40 pm on Oct 19, 2003 (gmt 0)

Actually, since I want any extension but the URI pretty much has to start with guestbook, I probably should do this, no?
RewriteCond %{REQUEST_URI} ^guestbook [NC,OR]

What I (re)did in the end was just extend the MSOffice line:

RewriteCond %{REQUEST_URI} ^/(MSOffice|_vti|guestbook) [NC,OR]

(btw, even if someone has a guestbook, they would be foolish to call the html page or the cgi "guestbook", because that's like keeping your formmail cgi named "formmail.cgi" and being surprised when you get spambot attacks)

congrats on 1000 posts, I bet at least 90% or more of them really helped folks... thanks! (is claus short for santa-claus? ;) )

claus
 4:25 pm on Oct 19, 2003 (gmt 0)

The environment variable REQUEST_URI is relative to the domain name; it includes the leading slash. If you want a start anchor, you should include the slash, like this:

RewriteCond %{REQUEST_URI} ^/guestbook [NC,OR]

This will also match directories, like this: http://example.com/guestbook/index.php

If you just want to match "guestbook+dot+some extension" then you could include the dot in the rewrite condition:

RewriteCond %{REQUEST_URI} ^/guestbook\. [NC,OR]

/claus
I wasn't creative enough to find a nick, and I keep forgetting those anyway whenever I use them ;)

amznVibe
 1:54 pm on Oct 21, 2003 (gmt 0)

Okay, I have another issue, based on a good idea that was posted in this thread (well, in part 1 or 2).
How do I allow exceptions to this ruleset:
# Forbid if blank (or "-") Referer *and* UA
RewriteCond %{HTTP_REFERER} ^-?$
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule .* - [F]

where I need an exception either for one specific cgi (not the preferred solution, but acceptable) or based on the rDNS of the visitor's IP (my host doesn't seem to provide the Remote_Host env variable; is that common because of rDNS overhead, or am I doing something wrong?)

you can get the background on why I have this problem here [webmasterworld.com]

jdMorgan
 2:25 pm on Oct 21, 2003 (gmt 0)

amznVibe,

> overhead for rDNS

Yes, it is often slow enough to visibly slow down your site. Use %{REMOTE_ADDR} instead - just look up the IP address range for the host you want to allow, and plug that into a RewriteCond in your ruleset.
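
A sketch of that exception, using a made-up range (look up the real one for the service you want to let through):

# Let the known service through even when it sends a blank Referer and UA
RewriteCond %{REMOTE_ADDR} !^216\.113\.188\.
RewriteCond %{HTTP_REFERER} ^-?$
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule .* - [F]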

Jim

amznVibe
 2:43 pm on Oct 21, 2003 (gmt 0)

Well, PayPal has warned me that the IP is subject to change and may float, so I might just have to allow access to that one cgi instead. Sigh. I guess I'll have to rely on security via the obscurity of the cgi's name being unknown.
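
A sketch of that per-script exception (the script name here is hypothetical):

# Never apply the blank-Referer-and-UA ban to the notification handler itself
RewriteCond %{REQUEST_URI} !^/cgi-bin/ipn-handler\.cgi$
RewriteCond %{HTTP_REFERER} ^-?$
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule .* - [F]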

Amiganiac
 9:25 pm on Oct 21, 2003 (gmt 0)

I'm new to this... but why perform all the checks for every access? Wouldn't it be better to just check when a 404 occurs, and let the 404-error script decide what to do? I guess this should reduce server load a little... or am I wrong?

DrJOnes
 9:36 am on Nov 2, 2003 (gmt 0)

Hi all.
Thank you so much for all the info posted. I've been reading a lot on building my .htaccess file. Wow, it's fascinating to see the amount of info I have learned in three days...

One question.
I have a problem with a bot on an HTML page hosted on GeoCities. How do I block requests coming from that page?

For example, let's say the bot is placed on :
http://www.example.com/annoyingsite/bot.html

What would be the syntax in my htaccess to block any request coming from that html page?

Thank you,

DrJOnes

[EDIT] While I'm at it, I have another question:

If I place a general .htaccess file in my website root (where the main index.html file is located), do the rules apply to all subfolders, even password-protected folders? For instance, I have added a set of rules that block bots and site grabbers in the .htaccess in my root. I have another .htaccess file located in a member directory (/members/.htaccess). Do I need to re-insert all the block rules in that .htaccess file as well? Thanks.

[edited by: jdMorgan at 8:02 pm (utc) on Nov. 2, 2003]
[edit reason] Examplified URL and delinked [/edit]

Wizcrafts
 5:28 pm on Nov 2, 2003 (gmt 0)

For example, let's say the bot is placed on :
http://www.example.com/annoyingsite/bot.html

What would be the syntax in my htaccess to block any request coming from that html page?

I believe that would be something like this (but I could be mistaken):

RewriteCond %{HTTP_REFERER} ^http://www\.example\.com/annoyingsite/bot\.html [NC]
RewriteRule .* - [F]

In answer to question two: your root .htaccess applies to all subdirectories. If you place an .htaccess directive in a subdirectory, it applies only to that directory and its children. If similar but conflicting rules appear in both, the subdirectory's .htaccess is merged last, so its directives take precedence within that directory.
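
One caveat for mod_rewrite specifically: a subdirectory .htaccess that starts its own rewrite ruleset does not inherit the parent's rewrite rules unless it asks to, for example (a sketch, using the /members/ path from the question):

# In /members/.htaccess - pull in the parent directory's rewrite rules as well
RewriteEngine On
RewriteOptions Inherit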

Wiz

[edited by: jdMorgan at 8:04 pm (utc) on Nov. 2, 2003]
[edit reason] Examplified and delinked URL [/edit]

DrJOnes
 6:08 pm on Nov 2, 2003 (gmt 0)

Thanks, I appreciate the tip and also the clarification on question 2. I have added the code to my .htaccess file and will monitor my stats over the next few days to see if it blocked the URL successfully.

Just to make sure I understood correctly, the ban rules inserted in the root .htaccess will also be effective in all subdirectories, even user protected subdirectories that contain a new .htaccess, correct?

DrJOnes

Wizcrafts
 6:32 pm on Nov 2, 2003 (gmt 0)

Just to make sure I understood correctly, the ban rules inserted in the root .htaccess will also be effective in all subdirectories, even user protected subdirectories that contain a new .htaccess, correct?

DrJOnes


That is correct! And, welcome to WebmasterWorld!

I usually experiment with rules in specially created subdirectories to test my brand-new rules, then add them to the root .htaccess once they have proved to be safe. Before learning that, I had blocked access to my own website with bad commands in .htaccess. An example of a simple mistake that can break your website is forgetting to escape the blank space and parentheses in a RewriteCond pattern, such as in this example:
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/3\.0 (compatible)$ [NC,OR]
This will lock out all visitors. The blank space and both parentheses should be escaped, thusly:
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/3\.0\ \(compatible\)$ [NC,OR]

Wiz

amznVibe
 8:40 am on Nov 5, 2003 (gmt 0)

Hmm, how do I patch in IP exceptions to the user agent blocks?
For example if I want to make an exception for 123.123.123.123 for this list:

RewriteCond %{HTTP_USER_AGENT} ^[CDEFPRS](Browse|Eval|Surf) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Demo|Full.?Web|Lite|Production|Franklin|Missauga|Missigua).?(Bot|Locat) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (efp@gmx\.net|hhjhj@yahoo\.com|lerly\.net|mapfeatures\.net|metacarta\.com) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Industry|Internet|IUFW|Lincoln|Missouri|Program).?(Program|Explore|Web|State|College|Shareware) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Mac|Ram|Educate|WEP).?(Finder|Search) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Moz+illa|MSIE).?[0-9]?.?[0-9]?[0-9]?$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/[0-9]\.[0-9][0-9]?.\(compatible[\)\ ] [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NaverRobot [NC]
RewriteRule .* - [F]

Is it as simple as adding
RewriteCond %{REMOTE_ADDR} !^123\.123\.123\.123$
at the top of the list? (without any [OR] )

Wizcrafts
 2:20 pm on Nov 5, 2003 (gmt 0)

Is it as simple as adding
RewriteCond %{REMOTE_ADDR} !^123\.123\.123\.123$
at the top of the list? (without any [OR] )

AmznVibe;
Yes, but I think you should put that condition at the end of the conditions list, not at the top. Without the [OR] switch it becomes an AND NOT condition, so putting it at the top, before any AND or OR conditions, makes no logical sense. Make it the last condition before the RewriteRule .* - [F] line.

Wiz

jdMorgan
 3:54 pm on Nov 5, 2003 (gmt 0)

It shouldn't make any difference, since the scope of [OR] is "local" to the line it is found on and the line that follows that one. Therefore the implied parentheses controlling operator precedence always surround the [OR]ed lines, if that makes any more sense.

In other-other words,

NOT(someIP) AND (ua1 OR ua2 OR ua3...) is equivalent to
(ua1 OR ua2 OR ua3...) AND NOT(someIP)

I usually like to put RewriteCond exclusions in the order most likely to stop Rule processing soonest, so that unnecessary RewriteCond testing does not take place. I put the most "selective" RewriteConds first as a speed-up, in other words.
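
In .htaccess terms (placeholder user-agents; these are two alternative ways to write the same ruleset, not one file), the two orderings are equivalent:

# NOT(someIP) AND (ua1 OR ua2)
RewriteCond %{REMOTE_ADDR} !^123\.123\.123\.123$
RewriteCond %{HTTP_USER_AGENT} ^ua1 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ua2 [NC]
RewriteRule .* - [F]

# (ua1 OR ua2) AND NOT(someIP)
RewriteCond %{HTTP_USER_AGENT} ^ua1 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ua2 [NC]
RewriteCond %{REMOTE_ADDR} !^123\.123\.123\.123$
RewriteRule .* - [F]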

Jim

Wizcrafts
 11:19 pm on Nov 5, 2003 (gmt 0)

NOT(someIP) AND (ua1 OR ua2 OR ua3...) is equivalent to
(ua1 OR ua2 OR ua3...) AND NOT(someIP)

Why couldn't I see that? Duh!

jdMorgan
 12:42 am on Nov 6, 2003 (gmt 0)

Oh, it's not at all obvious... It depends entirely on where the implicit parentheses go, and the only clue I've ever found is the phrase, "to combine rule conditions with a local OR instead of the implicit [per-line] AND" in the description of the RewriteCond [OR] flag. (emphasis and clarification "per-line" added).

It made my brain hurt until I experimented with it and figured out the operator and per-line precedence... and I've worked using complex boolean logic on a daily basis for almost 30 years. :)

Jim

Wizcrafts
 1:00 am on Nov 6, 2003 (gmt 0)

What I mean is that I failed to grok the equivalence of the NOT AND and AND NOT orderings. This is something I studied a long time ago, in a faraway place, in binaryland. I should have retained that knowledge, and that is what got my goat, which led to my misstatement about the order of the rules.

W

balam
 1:00 am on Nov 8, 2003 (gmt 0)

Sometimes I'm quite behind the times... :)

amznVibe quoted my .htaccess file verbatim, so I thought I'd point out a couple of things as well...

> /sumthin

As earlier noted, "sumthin" isn't an example but an exploit check. I was hit three times by folks looking for, well, something, and decided to add it to my "directories that don't exist" line in .htaccess. Subsequent WebmasterWorld research and discussion led to the opinion that someone was checking the server response headers for something to exploit...

> sensepost.exe

I remember that I had two different visitors drop by looking for this mystery file in the space of a couple of days, and since I didn't like the look of it (and couldn't find anything about it), I added it to my .htaccess. This was some time ago and I don't think anyone has come looking for it since... There is no real reason why it should be left in, or added to, your (the reader's) .htaccess file.

DrJOnes
 10:33 am on Nov 20, 2003 (gmt 0)

I have done a lot of reading here, and below is my block list (mainly a copy/paste of another list found here, with minor personal modifications).

I checked my stats and it blocked pretty much everything that needed blocking.

Except, I still get one UNKNOWN BROWSER in my server stats. I fear that this unknown browser could be a grabber.

Is there a safe way to block unknown browsers without blocking legit browsers?

Thanks,

DrJOnes666

MY BLOCK LIST IN HTACCESS:
--------------------------
(PS: if you copy/paste this block list for your .htaccess, double-check that every vertical bar ( | ) pastes correctly from your keyboard!)
--------------------------

RewriteEngine On

# Forbid requests for exploits & annoyances
# Bad requests
RewriteCond %{REQUEST_METHOD} !^(GET|HEAD|POST) [NC,OR]
# CodeRed
RewriteCond %{REQUEST_URI} ^/default\.(ida|idq) [NC,OR]
RewriteCond %{REQUEST_URI} ^/.*\.printer$ [NC,OR]
# Email
RewriteCond %{REQUEST_URI} (mail.?form|form|form.?mail|mail|mailto)\.(cgi|exe|pl)$ [NC,OR]
# MSOffice
RewriteCond %{REQUEST_URI} ^/(MSOffice|_vti) [NC,OR]
# Nimda
RewriteCond %{REQUEST_URI} /(admin|cmd|httpodbc|nsiislog|root|shell)\.(dll|exe) [NC,OR]
# Various
RewriteCond %{REQUEST_URI} ^/(bin/|cgi/|cgi\-local/|sumthin) [NC,OR]
RewriteCond %{THE_REQUEST} ^GET\ http [NC,OR]
RewriteCond %{REQUEST_URI} /sensepost\.exe [NC]
RewriteRule .* - [F]

# Forbid if blank (or "-") Referer *and* UA
RewriteCond %{HTTP_REFERER} ^-?$
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule .* - [F]

# Banning BOTS below
# Address harvesters
RewriteCond %{HTTP_USER_AGENT} ^(autoemailspider|ExtractorPro) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^E?Mail.?(Collect|Harvest|Magnet|Reaper|Siphon|Sweeper|Wolf) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (DTS.?Agent|Email.?Extrac) [NC,OR]
RewriteCond %{HTTP_REFERER} iaea\.org [NC,OR]
# Download managers
RewriteCond %{HTTP_USER_AGENT} ^(Alligator|DA.?[0-9]|DC\-Sakura|Download.?(Demon|Express|Master|Wonder)|FileHound) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Flash|Leech)Get [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Fresh|Lightning|Mass|Real|Smart|Speed|Star).?Download(er)? [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Gamespy|Go!Zilla|iGetter|JetCar|Net(Ants|Pumper)|SiteSnagger|Teleport.?Pro|WebReaper) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(My)?GetRight [NC,OR]
# Image-grabbers
RewriteCond %{HTTP_USER_AGENT} ^(AcoiRobot|FlickBot|Webcollage) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Express|Mister|Web).?(Web|Pix|Image).?(Pictures|Collector)? [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Image.?(fetch|stripper|sucker) [NC,OR]
# "Gray-hats"
RewriteCond %{HTTP_USER_AGENT} ^(Atomz|BlackWidow|BlogBot|EasyDL|Marketwave|sqworm|SurveyBot|Webclipping\.com) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (girafa\.com|gossamer\-threads\.com|grub\-client|Netcraft|Nutch) [NC,OR]
# Site-grabbers
RewriteCond %{HTTP_USER_AGENT} ^(eCatch|(Get|Super)Bot|Kapere|HTTrack|JOC|Offline|UtilMind|Xaldon) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Web.?(Auto|Cop|dup|Fetch|Filter|Gather|Go|Leach|Mine|Mirror|Pix|QL|RACE|Sauger) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Web.?(site.?(eXtractor|Quester)|snake|ster|strip|Suck|vac|walk|whacker|ZIP) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebCapture [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo\ Pump [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [NC,OR]
# Tools
RewriteCond %{HTTP_USER_AGENT} ^(curl|Dart.?Communications|Enfish|htdig|Java|larbin) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (FrontPage|Indy.?Library|RPT\-HTTPClient) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(libwww|lwp|PHP|Python|www\.thatrobotsite\.com|webbandit|Wget|Zeus) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Microsoft|MFC).(Data|Internet|URL|WebDAV|Foundation).(Access|Explorer|Control|MiniRedir|Class) [NC,OR]
# Unknown
RewriteCond %{HTTP_USER_AGENT} ^(Crawl_Application|Lachesis|Nutscrape) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^[CDEFPRS](Browse|Eval|Surf) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Demo|Full.?Web|Lite|Production|Franklin|Missauga|Missigua).?(Bot|Locat) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (efp@gmx\.net|hhjhj@yahoo\.com|lerly\.net|mapfeatures\.net|metacarta\.com) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Industry|Internet|IUFW|Lincoln|Missouri|Program).?(Program|Explore|Web|State|College|Shareware) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Mac|Ram|Educate|WEP).?(Finder|Search) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(Moz+illa|MSIE).?[0-9]?.?[0-9]?[0-9]?$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/[0-9]\.[0-9][0-9]?.\(compatible[\)\ ] [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NaverRobot [NC]
RewriteRule .* - [F]

balam
 3:21 pm on Nov 20, 2003 (gmt 0)

It's the stats software calling it "UNKNOWN BROWSER", correct? Do you know what the User Agent/UA/browser string is?

DrJOnes
 6:17 pm on Nov 20, 2003 (gmt 0)

Yeah, it's the stats software calling it "Unknown"...
I use Advanced Web Statistics 5.9 (AWStats).

DrJOnes666
