Forum Moderators: phranque
I saw a reference to blocking bad bots that have junk characters in the User-Agent, and this code was cited: "[A-Z][a-z]{3,} [a-z]{4,} [a-z]{4,}" -- with no example of what the author intended to use to do the actual blocking.
Logically, I think it translates to the following:
SetEnvIfNoCase User-Agent "[A-Z][a-z]{3,} [a-z]{4,} [a-z]{4,}" bad_user_agent
Can someone give me a clue whether this is a viable way to trap user-agents that contain junk characters, and what the pattern actually does? I would be very grateful.
In the original text the author used this as an example of junk text:
Zobv zkjgws pzjngq
Many thanks for your attention,
regards, mikesz
Only you can decide if this detects "junk-character user-agents" -- One Webmaster's junk may be another's treasure...
Standing alone, it does not block anything. Rather, it sets the server variable "bad_user_agent" to TRUE (non-blank). That variable can then be tested by Apache mod_access, Apache mod_rewrite (in some cases), Server-Side Includes, or PHP or Perl scripts to take some action -- usually, to return a 403 Forbidden server response.
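For example, a minimal sketch using mod_access syntax (assuming it sits in httpd.conf or .htaccess; adapt to your own configuration):

# Flag the request if the User-Agent matches the junk pattern
SetEnvIfNoCase User-Agent "[A-Z][a-z]{3,} [a-z]{4,} [a-z]{4,}" bad_user_agent
# Allow everyone except flagged requests, which get a 403 Forbidden
Order Allow,Deny
Allow from all
Deny from env=bad_user_agent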
The "reference" is a regular-expressions pattern (see tutorial link in our forum charter, and threads in our library) and reads, "Match one uppercase letter, followed by three or more lowercase letters, followed by a space, followed by four or more lowercase letters, followed by a space, followed by four or more lowercase letters." If the user-agent matches this description, then the bad_user_agent variable is set for later use.
Jim
SetEnvIfNoCase User-Agent "^[A-Z][a-z]{3,} [a-z]{4,} [a-z]{4,}" bad_user_agent
Here is what I think it is saying: to match, the User-Agent needs to start with an uppercase alpha character, followed by lowercase alpha characters -- at least three of them, but possibly more (seems like three uppercase characters would be a match too?). The second block needs to be all lowercase and can be four or more characters, and the third block also needs to be lowercase, four or more characters.
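Actually, since the directive is the "NoCase" variant, I gather the whole pattern is matched case-insensitively anyway, so the [A-Z] versus [a-z] distinction is lost. If case is supposed to matter, I'd guess one would need plain SetEnvIf instead -- something like this sketch:

# Case-sensitive variant: plain SetEnvIf, anchored to the start of the UA
SetEnvIf User-Agent "^[A-Z][a-z]{3,} [a-z]{4,} [a-z]{4,}" bad_user_agent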
It seems like a marginally useful blocking technique, and I fear it could arbitrarily block possibly legitimate requests.
Thanks again for the replies.
I am actually looking for a way to reduce the size of my bad-bots list, and am starting to look at generating a whitelist of bots and banning any robot that I don't explicitly allow. Does anyone have experience with that technique?
I realized while testing yesterday that I could shoot myself in the foot with that premise, i.e., banning everyone from my site, LOL...
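For what it's worth, here is the sort of thing I'm picturing -- just a sketch with placeholder agent names, not a vetted list, and it shows exactly where the foot-shooting comes in, since any visitor whose User-Agent matches none of these patterns gets banned too:

# Whitelist sketch: forbid any request whose User-Agent matches
# none of the allowed patterns (agent names are placeholders only)
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} !(Mozilla|Opera) [NC]
RewriteCond %{HTTP_USER_AGENT} !(Googlebot|msnbot|Slurp) [NC]
RewriteRule .* - [F]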
thanks again, mikesz