Forum Moderators: phranque

Banning User Agent

Syntax to use

         

LoneGunman

2:37 pm on Jan 12, 2004 (gmt 0)

10+ Year Member



This subject has probably been covered somewhere in this huge list but after checking through the first 16 pages I was unable to find it. (Search feature?)

I want to use htaccess to ban, by user agent, certain rogues who use fairly distinctive browsers. I'm not sure what the exact syntax is for different agents, or which characters need to be "escaped". I have a couple of examples from my log below. If someone can show the syntax needed, it would be helpful. Also, is it necessary to list the whole agent string? Could, for instance, anyone using an Opera browser of any type be banned?

"Mozilla/5.0 (Windows; U; Win 9x 4.90; en-US; rv:1.4) Gecko/20030624 Netscape/7.1 (ax)"

[could this agent be banned just by Gecko/20030624 Netscape/7.1 (ax)]?

"Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"

What would be the syntax to use on this for instance?

RewriteEngine On
RewriteCond %(HTTP_USER_AGENT) ^Mozilla/5.0 Windows; U; Win 9x 4.90; en-US; rv:1.4 Gecko/200030624 Netscape/7.1 (ax)
RewriteRule ^.*$ - [F]

Thanks for the great resource.

jdMorgan

7:14 pm on Jan 12, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



loner,

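Many of the characters in your user-agent string have special meanings to the regular-expressions parser used by mod_rewrite. Therefore, they must be escaped by preceding them with a backslash. Spaces must be escaped as well, because an unescaped space ends the pattern. The pattern below also restores the parentheses and the Gecko date (20030624) as they appear in your log entry. Note too that the variable is written %{HTTP_USER_AGENT}, with curly braces:


^Mozilla/5\.0\ \(Windows;\ U;\ Win\ 9x\ 4\.90;\ en-US;\ rv:1\.4\)\ Gecko/20030624\ Netscape/7\.1\ \(ax\)$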

However, this is one of the more popular non-Microsoft browsers, and I recommend that you do not block it.
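
For what it's worth, here is how that pattern might sit in a complete rule. This is a sketch rather than tested configuration. It uses a quoted pattern, which is an alternative to backslash-escaping every space (the regex metacharacters still need their backslashes). The [F] rule is the one from the original question.

```apache
RewriteEngine On
# Quoting the pattern lets the spaces stay unescaped; dots and
# parentheses are regex metacharacters and are still escaped.
RewriteCond %{HTTP_USER_AGENT} "^Mozilla/5\.0 \(Windows; U; Win 9x 4\.90; en-US; rv:1\.4\) Gecko/20030624 Netscape/7\.1 \(ax\)$"
# "-" means no substitution; [F] returns 403 Forbidden.
RewriteRule .* - [F]
```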

Jim

LoneGunman

1:35 pm on Jan 13, 2004 (gmt 0)

10+ Year Member



Would the incoming agent have to match the entire string exactly to be redirected or only some part of it?

Thanks

jdMorgan

4:10 am on Jan 14, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



LoneGunman,

As posted above, using both a start and end anchor ("^" and "$") and omitting the [NC] (no case) flag, the match would have to be exact, letter for letter, and the case of each letter would have to match.
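
To illustrate the difference: a pattern without anchors matches anywhere in the string, so a shorter distinctive fragment is enough. The sketch below is untested, and the fragments chosen are only examples taken from the first post.

```apache
# No "^" or "$": matches any user-agent that contains this fragment.
# [NC] makes the comparison case-insensitive.
RewriteCond %{HTTP_USER_AGENT} Netscape/7\.1\ \(ax\) [NC]
RewriteRule .* - [F]

# Broader still: any user-agent containing "Opera" anywhere.
# (A separate rule, shown only to answer the earlier Opera question.)
RewriteCond %{HTTP_USER_AGENT} Opera [NC]
RewriteRule .* - [F]
```

The shorter the fragment, the more visitors it catches, so the caution about collateral damage applies even more strongly here.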

I'd suggest you block by user-agent *and* IP address range, in order to minimize collateral damage, since Netscape 7 is one of the top second-tier browsers. In other words, please don't block me from your site, just because I'm browsing using Netscape tomorrow!

You can add RewriteConds to limit the damage to certain class A, B, or C IP address ranges; make them as wide a range as necessary, but no wider. Also, even with an IP restriction to limit collateral damage, be aware that you *are* likely to whack a few innocent bystanders, so I'd recommend "being polite" on whatever page they end up at.


# limit rule to 256 IP addresses starting at 192.168.0.0
RewriteCond %{REMOTE_ADDR} ^192\.168\.0\. [OR]
# limit rule to 65536 IP addresses beginning at 90.0.0.0
RewriteCond %{REMOTE_ADDR} ^90\.0\. [OR]
# limit rule to 16,777,216 IP addresses beginning at 10.0.0.0
RewriteCond %{REMOTE_ADDR} ^10\.
# Note No [OR] on previous RewriteCond, so this is an "AND"
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/5\.0\ \(Windows;\ U;\ Win\ 9x\ 4\.90;\ en-US;\ rv:1\.4\)\ Gecko/20030624\ Netscape/7\.1\ \(ax\)$
RewriteRule .* - [F]

So this code blocks that browser only if it comes from one of those three address ranges, each of a different size.
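
As for "being polite", one way to do it is a custom 403 page. The sketch below assumes a page named /403.html in the site root (an assumed filename) and excludes it from the block so that the blocked visitor can actually receive it. It is untested and uses a single address range and a shortened pattern only to keep the example small.

```apache
# Serve a friendly explanation with the 403 response.
# /403.html is an assumed filename; use whatever page you create.
ErrorDocument 403 /403.html

RewriteCond %{REMOTE_ADDR} ^192\.168\.0\.
RewriteCond %{HTTP_USER_AGENT} Netscape/7\.1\ \(ax\)
# Do not forbid the error page itself, or the visitor gets a bare 403.
RewriteCond %{REQUEST_URI} !^/403\.html$
RewriteRule .* - [F]
```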

Jim

LoneGunman

1:34 pm on Jan 14, 2004 (gmt 0)

10+ Year Member



Thanks,
I was using the above only as an example to see how to parse it out with the "\". The user agents containing Indy Library are the biggest problem for the moment. Now I find that mod_rewrite is not even enabled, so I'm screwed anyway. I don't have access to the Apache configuration file to turn it on. Is there any other way to do a redirect for agents using Indy Library?

jdMorgan

6:41 pm on Jan 14, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes, see mod_setenvif and mod_access. You may be able to use them instead:

SetEnvIf User-Agent "Indy.Library" getout
SetEnvIf Request_URI "^(/403.*\.html|/robots\.txt)$" public
<Files *>
Order Deny,Allow
Deny from env=getout
Allow from env=public
</Files>

The first line sets a variable called "getout" if the user-agent contains "Indy Library".
The second line sets a variable called "public" if the request is for a custom error page called "403.html" or for robots.txt, both of which should be universally accessible.
The remaining code section blocks Indy unless it is requesting either of those two files.

The names of the variables are arbitrary, as is the name of the custom error page -- use any names you like, as long as they're consistent.

Be aware that "User-Agent" is hyphenated and "Request_URI" uses an underscore. This is how they are shown in the Apache SetEnvIf documentation, and I've never had the time or inclination to experiment to see if it mattered.

If this method isn't allowed on your server, it may be time for a better host...
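
The SetEnvIf approach above should extend to more than one agent by repeating the first line. A sketch, assuming SetEnvIfNoCase (the case-insensitive variant in mod_setenvif) is available on your server; the extra agent names are only illustrations, and none of this has been tested on your host.

```apache
# Each matching line sets the same variable, so any one match blocks.
SetEnvIfNoCase User-Agent "Indy.Library" getout
SetEnvIfNoCase User-Agent "^EmailCollector" getout
SetEnvIfNoCase User-Agent "^WebBandit" getout
# Requests for the error page and robots.txt stay open to everyone.
SetEnvIf Request_URI "^(/403.*\.html|/robots\.txt)$" public
<Files *>
Order Deny,Allow
Deny from env=getout
Allow from env=public
</Files>
```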

Actually, you didn't say how you "knew" that mod_rewrite wasn't enabled, so I took your word for it. I should point out that you may need to precede your mod_rewrite code with:


Options +FollowSymLinks
RewriteEngine on

This should only appear once in your file, before any other mod_rewrite code.
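
One way to find out whether mod_rewrite is really running, without access to the server configuration, is a throwaway rule on a URL that doesn't exist. This is only a sketch, and "rewrite-test" is a made-up name.

```apache
Options +FollowSymLinks
RewriteEngine on
# Hypothetical test path; pick any name that is not a real file.
RewriteRule ^rewrite-test$ - [F]
```

Requesting /rewrite-test should then return 403 Forbidden if the rule is being processed. A 404 suggests the rule is not taking effect, and a 500 error on every page usually means the server rejected one of the directives; the error log, if you can get at it, will say which.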

Jim

LoneGunman

7:15 pm on Jan 14, 2004 (gmt 0)

10+ Year Member



Thanks I will give this a try.

I can only assume that mod_rewrite is not enabled because the script I wrote is not working. No one will respond to my e-mails asking about it, so I don't really know for sure. In the meantime I will try your latest suggestion and at the same time shop around for another host that better meets my needs. Below is part of the script I'm using. Maybe it's screwed up, which is why it's not working. Hard to tell, since the dopes that run the server won't talk to me.
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} snykeBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ZyBorg [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Scooter [NC,OR]
RewriteCond %{HTTP_USER_AGENT} slurp [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^MSFrontPage [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Indy.Library [NC] [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
RewriteRule ^.*$ [****.com...] [L]

jdMorgan

7:18 pm on Jan 15, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The rest is OK, but two parts are problematic: the flags on the Indy.Library line are split across two sets of brackets, and the RewriteRule substitution is a full URL:

RewriteCond %{HTTP_USER_AGENT} Indy.Library [NC] [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
RewriteRule ^.*$ http://****.com/_.htm [L]

That should be:

RewriteCond %{HTTP_USER_AGENT} Indy.Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
RewriteRule .* /_.htm [L]

Instead of encouraging these automated user-agents, I suggest you change your rule to:
RewriteRule .* - [F]

and just give them a 403-Forbidden response. Otherwise, they will get a 200-OK and keep coming back for more.
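
Putting that together, the whole list might look like the sketch below. It is untested, and folding the names into alternations is just one way to write it; separate RewriteCond lines with [NC,OR] work equally well.

```apache
RewriteEngine on
# Agents that may appear anywhere in the string.
RewriteCond %{HTTP_USER_AGENT} (snykeBot|ZyBorg|Scooter|slurp|Indy.Library) [NC,OR]
# Agents matched only at the start of the string.
# [NC] here also makes EmailCollector case-insensitive, unlike the original list.
RewriteCond %{HTTP_USER_AGENT} ^(Mozilla.*NEWT|MSFrontPage|WebBandit|EmailCollector) [NC]
# No [OR] on the last condition; forbid the request if either one matched.
RewriteRule .* - [F]
```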

Jim