Any insights on this would be greatly appreciated.
How do I tell good Mozilla agents from bad ones?
I'm seeing a lot of entries in my site logs for:
Mozilla/3.01 (compatible;)
Mozilla/4.0
And there's no other info or identification associated with them.
If these are bad agents, what is the proper syntax for adding a RewriteCond for them to my .htaccess file?
Is this correct?
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/3.01 \ (compatible) [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/4.0 [OR]
Thank you in advance.
[edited by: jdMorgan at 7:12 am (utc) on Jan. 10, 2004]
Welcome to WebmasterWorld [webmasterworld.com]!
Mozilla/3.01 is one of the most common user-agents advertised by caching proxies -- block that, and you block half the world.
Here are some Mozilla 'filters' I use -- based on badly-faked user-agent strings:
# BLOCK faked Mozilla UAs.
RewriteCond %{HTTP_USER_AGENT} ^Mozilla$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/[1-9]\.[0-9]+$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/4\.0\+?\(compatible; [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/3\.0\ \(compatible\)$ [OR]
RewriteCond %{HTTP_USER_AGENT} Mozilla/3.Mozilla/2\.
RewriteRule !^403error\.html$ - [F]
To the best of my knowledge, this code does not block any legitimate Mozilla user-agents, but you use it entirely at your own risk.
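As a rough way to see what these conditions catch before trusting them on a live site, the patterns can be tried against sample strings outside Apache. The sketch below is only an approximation: it uses Python's `re` module as a stand-in for Apache's regex engine (an assumption, though these patterns use only basic constructs), and the helper name `is_blocked` and the two MSIE sample strings are made up for illustration.

```python
import re

# The five RewriteCond patterns from the rule set above.
FAKED_MOZILLA_PATTERNS = [
    r"^Mozilla$",
    r"^Mozilla/[1-9]\.[0-9]+$",
    r"^Mozilla/4\.0\+?\(compatible;",
    r"^Mozilla/3\.0\ \(compatible\)$",
    r"Mozilla/3.Mozilla/2\.",
]


def is_blocked(user_agent):
    """Return True if any pattern matches, mimicking the [OR]-chained conditions.

    RewriteCond patterns are unanchored unless they use ^ or $, so
    re.search (not re.match) is the closer equivalent.
    """
    return any(re.search(p, user_agent) for p in FAKED_MOZILLA_PATTERNS)


# The first two strings come from the question; the last two are
# hypothetical examples chosen for this sketch.
samples = [
    "Mozilla/3.01 (compatible;)",
    "Mozilla/4.0",
    "Mozilla/4.0(compatible; MSIE 6.0)",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
]

for ua in samples:
    print("BLOCK" if is_blocked(ua) else "allow", ua)
```

Under these assumptions, the bare `Mozilla/4.0` from the question is caught by the second pattern, while `Mozilla/3.01 (compatible;)` passes through, which is consistent with the advice not to block it.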
Jim
[edited by: jdMorgan at 9:42 am (utc) on Jan. 10, 2004]
I'm trying my hardest to get a handle on some of these robots that are hammering my sites.
Although I've been around for a while, user-agents are new territory for me :(
I've been building new .htaccess and robots.txt files (with much help from WebmasterWorld) to control as many of them as I can.
But I'm very perplexed over what to do about agents like WebCopier Pro, where a user can use its advanced tools feature to change or mask the agent's identification as Anonymous, MS IE, Netscape Comm., Opera, or a user-defined ID.
Having this:
RewriteCond %{HTTP_USER_AGENT} ^WebCopier.* [OR]
in my .htaccess file doesn't prevent a user of WebCopier Pro from sucking down what he wants if he elects to change the agent's identification.
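To make the problem concrete, here is a minimal Python sketch of why a name-based condition fails once the identification is changed. Python's `re` module stands in for Apache's regex engine, and both sample strings are hypothetical: the exact header WebCopier sends, and the one it sends when masked as MS IE, are assumptions made for illustration.

```python
import re

# Pattern taken from the RewriteCond line above.
WEBCOPIER = re.compile(r"^WebCopier.*")


def blocked_by_name(user_agent):
    """Return True if the User-Agent header matches the WebCopier pattern."""
    return WEBCOPIER.search(user_agent) is not None


# Hypothetical headers: one honest, one masked to look like MS IE.
honest = "WebCopier v4.0"
masked = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

print(blocked_by_name(honest))  # the honest header matches the pattern
print(blocked_by_name(masked))  # the masked header does not
```

The condition only ever sees the header text the client chooses to send, so a masked request looks the same as a browser request to this rule.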
I simply have no idea what to do about this, and can't seem to find any understandable insight from the forums on it.
Simply dumbfounded!