Blocking Useragents of combined words?

Because spiders shouldn't be allowed to have split personalities.

         

JAB Creations

8:07 am on Apr 4, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The useragent that got on my nerves...

NuSearch Spider (compatible; MSIE 6.0)

It's either NuSearch or MSIE - not both!

So I want to deny useragents that include both words, as a penalty. I'm 100% sure I want to do this, but I only have a rough idea of how to do it...

This is just an uneducated guess...

RewriteCond %{HTTP_USER_AGENT} NuSearch & MSIE
RewriteRule .* - [F,L]

Is the ampersand the AND operator?

John

jdMorgan

2:13 pm on Apr 4, 2006 (gmt 0)


NuSearch Spider (compatible; MSIE 6.0)

It's either NuSearch or MSIE - not both!

It doesn't claim to be both. It states (in an approximation of the formal language of User-agent strings, as originally defined by Netscape [mozilla.org]) that its name is NuSearch, and that it is compatible with MSIE 6.0, meaning it can handle any markup that MSIE 6.0 can handle.

I think you'll find that the new Googlebot is Mozilla-compatible, so consider this before banning these 'split personality' User-agents based on "compatible."

This particular format is questionable, since the User-agent name is supposed to follow the name of the thing it's claiming to be compatible with, but there are plenty of legitimate spiders that mix up this syntax (their authors should click on the link above).

I'm not saying you shouldn't block this UA, but be very careful of generalizing to block anything using this format.

To answer your coding question, "&" is not interpreted as an AND function in mod_rewrite, since it's often used as a delimiter between query string name/value pairs. To perform the AND function, use two RewriteConds:


RewriteCond %{HTTP_USER_AGENT} ^NuSearch\ Spider
RewriteCond %{HTTP_USER_AGENT} MSIE
RewriteRule .* - [F]

In the absence of the [OR] (local-OR) flag, RewriteConds are ANDed by default, so all must match (evaluate as TRUE) to invoke the rule.
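
For contrast, here is a rough sketch of the [OR] case, where the rule fires if either condition matches. "ExampleBot" is a made-up name used only for illustration, and the RewriteEngine line is assumed to be needed only if it isn't already set elsewhere in your config:

```apache
# Sketch only: "ExampleBot" is a hypothetical user-agent, not a real spider.
RewriteEngine On

# [OR] joins this condition to the next one: either match is enough.
RewriteCond %{HTTP_USER_AGENT} ^NuSearch [OR]
# The last condition in the group carries no [OR] flag.
RewriteCond %{HTTP_USER_AGENT} ^ExampleBot
# Return 403 Forbidden for any requested URL.
RewriteRule .* - [F]
```

Remove the [OR] and the same two lines go back to meaning "both must match."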

If the sub-strings are always in the same order, then you can just code that into the pattern itself:


RewriteCond %{HTTP_USER_AGENT} ^NuSearch\ Spider.+MSIE
RewriteRule .* - [F]

But if this UA were to annoy me, I'd just use:

RewriteCond %{HTTP_USER_AGENT} ^NuSearch
RewriteRule .* - [F]

Jim

JAB Creations

4:13 pm on Apr 4, 2006 (gmt 0)


Thanks Jim!

The spider does not annoy me; it's the useragent that annoys me!

I don't care what is compatible with what when I'm looking at my statistics and have to wonder which ...misguided useragents are tainting my browser and spider numbers. Sure, I can create filters, but I'll have to keep doing that by hand until some sort of standard is established for useragents.

John