RewriteCond help

Mozilla Agents - Good versus Bad?

ICPWeb

6:54 am on Jan 10, 2004 (gmt 0)

10+ Year Member



Hello all,

Any insights on this would be greatly appreciated.

How do I determine good Mozilla Agents from bad ones?
I'm seeing a lot of entries in my site logs for:

Mozilla/3.01 (compatible;)
Mozilla/4.0

And there's no other info or identification associated with them.

If these are bad agents, what is the proper syntax to add a RewriteCond for them in my .htaccess file?

Is this correct?
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/3.01 \ (compatible) [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/4.0 [OR]
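
On the syntax question alone, here is a minimal sketch of how those two conditions might be written. It assumes mod_rewrite is enabled and that the intent is to match exactly the two logged strings; it is an illustration of escaping and flags, not a recommendation to block them (see the reply below). Spaces, dots, and parentheses in the pattern are escaped, and the last RewriteCond before the RewriteRule carries no [OR]. The end anchor on the second pattern matters: an unanchored ^Mozilla/4.0 would also match nearly every MSIE user-agent, since those begin with "Mozilla/4.0 (compatible; MSIE".

```apache
# Syntax illustration only; assumes mod_rewrite is available.
RewriteEngine On

# Exactly "Mozilla/3.01 (compatible;)": literal dot, space, and parentheses escaped.
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/3\.01\ \(compatible;\)$ [OR]

# Exactly "Mozilla/4.0" with nothing after it; no [OR] on the final condition.
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/4\.0$

# Return 403 Forbidden for any request that met one of the conditions above.
RewriteRule .* - [F]
```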

Thank you in advance.

[edited by: jdMorgan at 7:12 am (utc) on Jan. 10, 2004]
[edit reason] Disabled smiley-faces for this post [/edit]

jdMorgan

7:18 am on Jan 10, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



ICPWeb,

Welcome to WebmasterWorld [webmasterworld.com]!

Mozilla/3.01 is one of the most common user-agents advertised by caching proxies -- block that, and you block half the world.

Here are some Mozilla 'filters' I use -- based on badly-faked user-agent strings:


# BLOCK faked Mozilla UAs.
RewriteCond %{HTTP_USER_AGENT} ^Mozilla$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/[1-9]\.[0-9]+$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/4\.0\+?\(compatible; [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/3\.0\ \(compatible\)$ [OR]
RewriteCond %{HTTP_USER_AGENT} Mozilla/3.Mozilla/2\.
RewriteRule !^403error\.html$ - [F]

It's been long enough that I can't remember the raw user-agent string(s) that match each one, but you can derive them from the patterns.
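
As a rough aid to that derivation, the same rules are repeated here with a comment above each pattern giving an example string it would match. The example strings are worked out from the regular expressions themselves, not taken from real logs, so treat them as hypothetical.

```apache
# Example: "Mozilla" (the bare word, nothing else)
RewriteCond %{HTTP_USER_AGENT} ^Mozilla$ [OR]

# Example: "Mozilla/4.0" or "Mozilla/3.01" (a version number and nothing after it)
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/[1-9]\.[0-9]+$ [OR]

# Example: "Mozilla/4.0+(compatible; ..." or "Mozilla/4.0(compatible; ..."
# (a plus sign or nothing where a space normally appears)
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/4\.0\+?\(compatible; [OR]

# Example: "Mozilla/3.0 (compatible)" (exactly this, with nothing inside the parentheses but "compatible")
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/3\.0\ \(compatible\)$ [OR]

# Example: "Mozilla/3.Mozilla/2.01" (two Mozilla tokens run together, anywhere in the string)
RewriteCond %{HTTP_USER_AGENT} Mozilla/3.Mozilla/2\.

# Forbid every URL except the custom error page, so the 403 page itself can still be served.
RewriteRule !^403error\.html$ - [F]
```

Of the two strings in the original question, a bare "Mozilla/4.0" matches the second pattern, while "Mozilla/3.01 (compatible;)" matches none of them.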

To the best of my knowledge, this code does not block any legitimate Mozilla user-agents, but you use it entirely at your own risk.

Jim
<edit> typo </edit>

[edited by: jdMorgan at 9:42 am (utc) on Jan. 10, 2004]

ICPWeb

8:49 am on Jan 10, 2004 (gmt 0)

10+ Year Member



Thanks Jim.

I'm trying my hardest to get a handle on some of these robots that are hammering my sites.
Although I've been around for a while, UAs are new territory for me :(

I've been building new .htaccess and robots.txt files (with much help from WebmasterWorld) to control as many of them as I can.

But I'm very perplexed over what to do about agents like WebCopier Pro, where a user can use its advanced tools feature to change or mask the agent's identification to Anonymous, MS IE, Netscape Comm., Opera, and/or a user-defined ID.

Having this:
RewriteCond %{HTTP_USER_AGENT} ^WebCopier.* [OR]
in my .htaccess file doesn't prevent a WebCopier Pro user from sucking down whatever he wants if he elects to change the agent's identification.
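
For reference, a minimal self-contained form of that condition might look like the sketch below. It assumes mod_rewrite is enabled and that a plain 403 response is the desired outcome. It only matches on the User-Agent header the client chooses to send, so it has exactly the limitation described here: a masked agent passes straight through.

```apache
# Sketch only; assumes mod_rewrite is available.
RewriteEngine On

# Match a User-Agent that begins with "WebCopier".
# [NC] makes the comparison case-insensitive; the trailing ".*" is not needed
# because the pattern is not anchored at the end.
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [NC]

# Return 403 Forbidden for any matching request.
RewriteRule .* - [F]
```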

I simply have no idea what to do about this, and can't seem to find any understandable insight from the forums on it.

Simply dumbfounded!