
Search Engine Spider and User Agent Identification Forum

    
How to Block Bogus User Agent ") )"
wilderness




msg:4664323
 3:29 am on Apr 19, 2014 (gmt 0)

There must be some absurd reason why both SetEnvIf and mod_rewrite fail to grasp this?

I have some other similar rules for broken UAs and spaces that work just fine.

I've tried these, and they failed:
SetEnvIf User-Agent "\) \)" keep_out
SetEnvIf User-Agent \)\ \)$ keep_out

RewriteCond %{HTTP_USER_AGENT} \)\ \)$

Quoted containers fail in mod_rewrite, so this one would fail as well:

RewriteCond %{HTTP_USER_AGENT}"\) \)"$

Any ideas?

 

not2easy




msg:4664340
 5:24 am on Apr 19, 2014 (gmt 0)

Have you tried
\b\)\ \)/
as in
SetEnvIf User-Agent \b\)\ \)/ keep_out

This is for two ) with a space between. I think. It is purely a guess, because I can't see the UA you want to hit.

wilderness




msg:4664346
 6:07 am on Apr 19, 2014 (gmt 0)

Thanks.

"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) )"

wilderness




msg:4664347
 6:11 am on Apr 19, 2014 (gmt 0)

That fails and gives a loop.

wilderness




msg:4664350
 6:42 am on Apr 19, 2014 (gmt 0)

After reviewing my logs, it actually caused a 500.

keyplyr




msg:4664358
 7:46 am on Apr 19, 2014 (gmt 0)

For:
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) )"

Try:
RewriteCond %{HTTP_USER_AGENT} \) \)$

lucy24




msg:4664361
 8:17 am on Apr 19, 2014 (gmt 0)

\b\)

\b in front of a non-word character such as ) adds nothing useful here: it only requires that the preceding character be a word character, which the "1" in "SV1)" already is.

Why don't you just slam the door on MSIE 6? Doesn't address the question, but allows you to sidestep it ;)

RewriteCond %{HTTP_USER_AGENT} \)\ \)$

What happens when you do this? And what's the closing anchor for? Surely the ) ) sequence never occurs legitimately? In fact, did you try it without the anchor?

I experimented on my test site and confirmed that (a) the sequence is recognized and (b) it doesn't lead to anything in the 500 class. Er, I guess that's (b) and (a) in that order.

The form in mod_rewrite is
\)\ \)
(everything escaped); the form in mod_setenvif can be any one of
"\) \)"
\)\ \)
"\)\ \)"
where I guess the third form is belt-and-suspenders.

I first tested by asking for the bogus page "blahblah) )" and writing the rules accordingly:
SetEnvIf Request_URI etcetera
RewriteCond %{THE_REQUEST} etcetera
but then I remembered that Safari lets you put in a fake user-agent so I tried it that way as well.

Everything worked fine-- that is to say, I successfully locked out or redirected myself. Do you happen to know what Apache version you're on?
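
For anyone who wants to check the pattern away from the server first, here is a minimal Python sketch. It assumes Python's re module handles these simple patterns the same way Apache's regex engine does, which holds for escaped parentheses and spaces but is not guaranteed for every regex feature. The function name is_blocked is hypothetical, made up for illustration; the sample strings are UAs quoted in this thread.

```python
import re

# The mod_rewrite form from this thread: everything escaped,
# and no closing anchor.
BOGUS_UA = re.compile(r"\)\ \)")

def is_blocked(user_agent: str) -> bool:
    """Return True if the UA contains ') )' anywhere (unanchored search)."""
    return BOGUS_UA.search(user_agent) is not None

# The UA string quoted earlier in the thread.
bad_ua = ("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; "
          "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) )")
# An ordinary UA without the doubled parenthesis, for comparison.
ok_ua = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"

if __name__ == "__main__":
    print(is_blocked(bad_ua))
    print(is_blocked(ok_ua))
```

This only checks the regex itself. It says nothing about how a particular Apache version parses quoting or escaping in .htaccess, which is the part that differs between mod_rewrite and mod_setenvif.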

MickeyRoush




msg:4664382
 12:35 pm on Apr 19, 2014 (gmt 0)

What about using the character class?

RewriteCond %{HTTP_USER_AGENT} \)\s\)$

brotherhood of LAN




msg:4664399
 2:30 pm on Apr 19, 2014 (gmt 0)

Not sure if Apache sanitizes anything to this point, but there may be null/unprintable characters in there?

wilderness




msg:4664409
 3:32 pm on Apr 19, 2014 (gmt 0)

FWIW, I've had the following lines active for some years (in multiple files/directories):

SetEnvIf User-Agent " ; " keep_out
SetEnvIf User-Agent " \( " keep_out
SetEnvIf User-Agent "; " keep_out
SetEnvIf User-Agent "\) ; " keep_out

Unfortunately, the second parenthesis with the leading blank space will not work in the same manner.

wilderness




msg:4664414
 4:05 pm on Apr 19, 2014 (gmt 0)

lucy,
This seems to work (tested by changing UA and got my custom 403)

RewriteCond %{HTTP_USER_AGENT} \)\ \)
RewriteRule .* - [F]

Many thanks

Don

keyplyr




msg:4664487
 1:30 am on Apr 20, 2014 (gmt 0)

For:
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) )"

Try:
RewriteCond %{HTTP_USER_AGENT} \) \)$

And what's the closing anchor for? Surely the ) ) sequence never occurs legitimately?

Nothing as far as I'm concerned. I just figured since wilderness kept putting it in his examples, then he needed it for some (yet to be disclosed) reason. Regardless, it works both ways.

wilderness




msg:4664507
 3:48 am on Apr 20, 2014 (gmt 0)

Many thanks to all for the help.

With some help from lucy, as well as some previous syntax provided by Jim, I was able to convert the previous SetEnvIf lines to mod_rewrite:

RewriteCond %{HTTP_USER_AGENT} \)\ \) [OR]
RewriteCond %{HTTP_USER_AGENT} \ ;[\ ] [OR]
RewriteCond %{HTTP_USER_AGENT} \ \([\ ] [OR]
RewriteCond %{HTTP_USER_AGENT} ;\ [\ ] [OR]
RewriteCond %{HTTP_USER_AGENT} \)\ ;[\ ]
RewriteRule .* - [F]

lucy24




msg:4664513
 4:30 am on Apr 20, 2014 (gmt 0)

Doesn't it seem as if those five conditions should be collapsible into one, with the aid of a few more pipes and brackets? When you look at it with human eyeballs and human brain, they all reduce to "punctuation in the wrong place".

:: detour to multi-file search in TextWrangler ::

You wouldn't think " )" could ever occur in a legitimate human UA, but there's a fair number of them. I tried
200 .+? \)..+"
to filter out known bad bots and to constrain the search to non-final " )" (they're fairly common at the end of a UA string).

On the other hand, I can't find any legitimate "( " (space after opening parenthesis). Some 200s, but on closer inspection they're all robots or hotlinkers.

You can definitely collapse this pair:
\)\ \)
and
\)\ ;[\ ]
become
\)\ (;\ |\))
and, as a bonus, it gets rid of that vexing Trailing Space Issue :)

This one
;\ [\ ]
could also be expressed as
;\ {2,}
though you don't actually save any bytes, and may be putting the server to more work, so scratch that unless the form is easier for you to internalize.

As an alternative,
\ ;[\ ]
and
;\ [\ ]
are
(;\ |\ ;)[\ ]
though again this isn't much of a gain unless you're really trying to conserve line breaks.

Dang. Where's the RewriteCond that says
%{ANY-ELEMENT-OF-REQUEST} fishy-punctuation
?
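
A rough way to confirm that the collapsed patterns above still catch the same strings as the five separate conditions is a small Python sketch. It takes the patterns literally as posted (the forum may have eaten some whitespace) and assumes Python's re agrees with Apache on these simple expressions. The helper matches_any and the sample strings are hypothetical, made up for illustration.

```python
import re

# The five separate conditions, taken literally as posted.
ORIGINAL = [r"\)\ \)", r"\ ;[\ ]", r"\ \([\ ]", r";\ [\ ]", r"\)\ ;[\ ]"]

# The collapsed forms suggested above, plus the untouched "( " condition.
COLLAPSED = [r"\)\ (;\ |\))", r"(;\ |\ ;)[\ ]", r"\ \([\ ]"]

def matches_any(patterns, text):
    """Return True if any pattern is found anywhere in text."""
    return any(re.search(p, text) for p in patterns)

# Made-up sample strings, one per kind of misplaced punctuation.
SAMPLES = [
    "MSIE 6.0; SV1) )",                 # ") )"
    "foo ; bar",                        # space, semicolon, space
    "foo ( bar",                        # space, open parenthesis, space
    "foo;  bar",                        # semicolon plus two spaces
    "foo) ; bar",                       # ") ; "
    "Mozilla/5.0 (X11; Linux x86_64)",  # ordinary punctuation
]

if __name__ == "__main__":
    for sample in SAMPLES:
        print(repr(sample),
              matches_any(ORIGINAL, sample),
              matches_any(COLLAPSED, sample))
```

If the two columns ever disagree for a string from your own logs, the collapsed version has changed the meaning and is not worth the saved lines.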

incrediBILL




msg:4673630
 6:14 pm on May 22, 2014 (gmt 0)

After reading this, it occurs to me that anyone wanting to drive a novice bot-blocking webmaster nuts could simply make the user agent contain all of the special characters that require escaping in Apache.

What a load of fun that would be!

blend27




msg:4674224
 3:44 pm on May 24, 2014 (gmt 0)

On top of that, I just checked the logs on two e-com sites, and there are legitimate customers, with purchases, with the same ") )" pattern in the UA, going back to 2007. All the UAs are IE and include "(compatible; MSIE 6.0; Windows NT 5.1; SV1)".

example:

Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) )

Not sure what "Bogus" in the title of the thread refers to, though.


So the question stands: SV1 or NOT SV1

lucy24




msg:4674265
 7:18 pm on May 24, 2014 (gmt 0)

Does the ) ) come at the end of the UA string? If so, you can write a rule to exclude it, using the pattern

\)\ \)..

(extra . to ensure against trailing spaces).
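
To see what the two extra dots do, here is a minimal Python sketch, again assuming Python's re behaves like Apache's engine for this simple pattern. The name hits_rule and the short test string are hypothetical; the long UA is the example posted just above.

```python
import re

# ') )' followed by at least two more characters, so a UA that merely
# ends in ') )' (or ') )' plus one trailing space) is left alone.
NON_FINAL = re.compile(r"\)\ \)..")

def hits_rule(user_agent: str) -> bool:
    """Return True only when ') )' occurs before the end of the UA."""
    return NON_FINAL.search(user_agent) is not None

# The customer UA from the previous post: ') )' is at the very end.
final_ua = ("Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; "
            "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) )")

if __name__ == "__main__":
    print(hits_rule(final_ua))          # ') )' at the end
    print(hits_rule(final_ua + " "))    # one trailing space
    print(hits_rule("x) ) more text"))  # ') )' in the middle
```

Note that two or more trailing spaces would still trigger the rule, so the guard covers only a single stray trailing character.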

The elements "MSIE 6.0" and "legitimate customers, with purchases" do not often come in the same sentence. Are you selling antique furniture or vintage cars or something? (Buyers who have money but haven't modified their computer since the grandchildren forced them to buy one in 2003.)

"SV1" makes me think of a UA that's currently blocked by my host's (optional) mod_security. I think it goes, in full,

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)

While looking this up, I found some blocked requests from
Mozilla/1.22 (compatible; MSIE 2.0; Windows 95)
(referer spam for a Russian site with "prostitutki" in the name). There comes a point where you can only laugh :)

