Forum Moderators: phranque


SetEnvIfNoCase User-Agent question

SetEnvIfNoCase User-Agent "[A-Z][a-z]{3,} [a-z]{4,} [a-z]{4,}" bad_user_agent


mikesz

10:30 am on Mar 16, 2008 (gmt 0)

10+ Year Member



Hello all,

I saw a reference to blocking bad bots with junk characters in the User-Agent and this code was referenced, "[A-Z][a-z]{3,} [a-z]{4,} [a-z]{4,}" with no example of what the author intended to use to do the actual blocking.

Logically, I think it translates like the following:

SetEnvIfNoCase User-Agent "[A-Z][a-z]{3,} [a-z]{4,} [a-z]{4,}" bad_user_agent

Can someone give me a clue whether this is a viable way to trap user-agents that have junk characters, and what the pattern actually does? I would be very grateful.

In the original text the author used this as an example of junk text:

Zobv zkjgws pzjngq

Many thanks for your attention,

regards, mikesz

jdMorgan

6:26 pm on Mar 16, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> Can someone give me a clue whether this is a viable way to trap user-agents that have junk characters, and what the pattern actually does?

Only you can decide if this detects "junk-character user-agents" -- One Webmaster's junk may be another's treasure...

Standing alone it does not block anything. Rather, it sets the server variable "bad_user_agent" to TRUE (non-blank). That variable can then be tested by Apache mod_access, Apache mod_rewrite (in some cases), Server-Side Includes, or PHP or Perl scripts to take some action -- usually, to return a 403 Forbidden server response.
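A minimal sketch of that usual approach, assuming an .htaccess context with mod_setenvif and mod_access (mod_authz_host on Apache 2.x) enabled:

SetEnvIfNoCase User-Agent "[A-Z][a-z]{3,} [a-z]{4,} [a-z]{4,}" bad_user_agent
Order Allow,Deny
Allow from all
Deny from env=bad_user_agent

With "Order Allow,Deny", any request for which the bad_user_agent variable is set is refused with a 403 Forbidden response; everything else is allowed through.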

The "reference" is a regular-expressions pattern (see tutorial link in our forum charter, and threads in our library) and reads, "Match one uppercase letter, followed by three or more lowercase letters, followed by a space, followed by four or more lowercase letters, followed by a space, followed by four or more lowercase letters." If the user-agent matches this description, then the bad_user_agent variable is set for later use.

Jim

Samizdata

9:27 pm on Mar 16, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



For example, with the case-sensitive SetEnvIf directive it would match the user-agent "Use webmasterworld search" but not the user-agent "use WebmasterWorld search" (and with SetEnvIfNoCase the case distinctions in the pattern are ignored entirely, so both would match) - so I would say it isn't really viable, even if the correct blocking directive were added.

mikesz

1:51 am on Mar 17, 2008 (gmt 0)

10+ Year Member



Thanks for the replies. I thought it was missing something. I agree that it is a pretty subjective call. If it used the following:

SetEnvIfNoCase User-Agent "^[A-Z][a-z]{3,} [a-z]{4,} [a-z]{4,}" bad_user_agent

Here is what I think it is saying: to match, the User-Agent needs to start with an uppercase alpha character, followed by lowercase alpha characters, at least 3 of them but possibly more (though with the NoCase variant, wouldn't uppercase characters match there too?). The second block needs to be all lowercase and 4 or more characters, and the third block also needs to be lowercase and 4 or more characters.

Seems like it is a marginally useful blocking technique, and I fear that it would arbitrarily block possibly useful requests.

Thanks again for the replies.

I am actually looking for a way to reduce the size of my bad-bots list, and am starting to look at generating a "white" list of bots and banning any robots that I don't explicitly allow. Anyone have any experience with that technique?
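For what it's worth, a whitelist sketch along those lines with mod_setenvif might look like the following. The user-agent patterns here are only placeholders for whatever you decide to allow, and this is untested, so try it somewhere safe first:

# Mark the user-agents we are willing to serve (placeholder patterns)
SetEnvIfNoCase User-Agent "^Mozilla" allowed_ua
SetEnvIfNoCase User-Agent "Googlebot" allowed_ua
SetEnvIfNoCase User-Agent "Slurp" allowed_ua
# Deny everything that was not marked above
Order Deny,Allow
Deny from all
Allow from env=allowed_ua

Note that this is exactly where the foot-shooting risk comes in: any request whose User-Agent fails to match one of the allowed patterns, including legitimate browsers with unusual User-Agent strings, gets a 403.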

I realized while testing yesterday that I could shoot myself in the foot with that premise, i.e. banning everyone from my site LOL...

thanks again, mikesz