=mozilla/5 (windows, u, windows nt 6.1, en-us)

     
12:53 pm on Mar 30, 2017 (gmt 0)

Junior Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 137
votes: 12


The UA is "=mozilla/5 (windows, u, windows nt 6.1, en-us) applewebkit/534.16 (khtml, like gecko) chrome/10.6480.204 safari/534.16". It shows up in my log as "Err:509" and really confuses, then freezes, my spreadsheet software. It seems the preceding "=" really does something odd.

I have had this guy pester me for quite a long time. My UA ban of
SetEnvIf User-Agent "\=mozilla" keep_out

does not work. Someone must have seen this before. Can someone tell me what they are doing and how to ban it? Thanks.

Today's IP: 216.19.209.232
216.19.192.0 - 216.19.223.255
Organization: GetNet Inc.
4:58 pm on Mar 30, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13923
votes: 496


Huh. Mine's a more comprehensive
BrowserMatch ^\W bad_agent
(initial non-word character). "BrowserMatch" is a useful setenvif shortcut, identical to "SetEnvIf User-Agent".

The quotation marks and the escape are both unnecessary. In mod_setenvif, the only time you really need quotation marks is to "protect" a literal space in the string you're evaluating. The = sign isn't a reserved character in RegEx-in-general; it's only got syntactic meaning in a few specific Apache contexts, and this isn't one of them. (Not sure, but the quotation marks here may even mean that \= is interpreted as literal backslash.)
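One way to check the claim that = needs no escape is to run both forms of the pattern through a regex engine. This is only a sketch: it uses Python's re module as a stand-in, on the assumption that it treats these simple patterns the same way Apache's regex engine does, and the variable names are made up for illustration.

```python
import re

# UA string quoted in the first post of this thread.
ua = ("=mozilla/5 (windows, u, windows nt 6.1, en-us) applewebkit/534.16 "
      "(khtml, like gecko) chrome/10.6480.204 safari/534.16")

# "=" has no special meaning in a regex, so escaping it changes nothing.
escaped = re.search(r"\=mozilla", ua)
unescaped = re.search(r"=mozilla", ua)

# Both patterns find a match, and at the same position in the string.
print(escaped is not None, unescaped is not None)
print(escaped.span() == unescaped.span())
```

This says nothing about how Apache handles the quotation marks around the pattern; it only covers the regex side.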

Is it really lower-case "mozilla"? Yuk.

:: detour to check recent logs ::

Huh. I know I've seen this configuration--with leading = sign--but they don't seem to have been around lately. What I do see a scattered handful of is
--user-agent=Mozilla/5.0 etcetera
or
User-Agent=Mozilla etcetera
where the latter is another pattern I formerly blocked ("formerly" because it's obviously a hallmark of an exceptionally stupid robot, meaning they'll never get through the gate anyway so why bother making the server check at all).
7:24 pm on Mar 30, 2017 (gmt 0)

Junior Member from CA 


Let me try your suggestion, without the quotation marks and the escape:
SetEnvIf User-Agent =mozilla/5 keep_out

and see what happens.

Does ^\W mean start of the string, then one character that is not a word character? That's brilliant regex! So small yet so powerful. I added it as well.
SetEnvIf User-Agent ^/W keep_out
9:29 pm on Mar 30, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month


Yes, ^ means start of string. I think it's the only symbol that has a completely different meaning inside grouping brackets, where [^abc] means "anything except a, b or c".

Here it's obviously essential, because legitimate UA strings do contain non-word characters; they just don't belong at the beginning. (Before implementing this rule, I verified that no law-abiding robot or human browser has a parenthesis right at the beginning.) But it's also useful when the name of a robot--whether welcome or otherwise--happens to come at the very beginning, because then the server doesn't have to go through the entire UA string looking for it. "Doesn't start with EvilSpamBot? I'll move along to the next rule, then."

In RegEx-in-general, capitals are a standard form of negation for the shorthand classes. So if \s means whitespace, \S means non-whitespace; where \d means digit (aka [0-9]), \D means non-digit; and since \w means word character, \W means non-word character.
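To see the lower-case/capital pairing in action, here is a small sketch. It uses Python's re module rather than Apache, on the assumption that the shorthand classes behave the same for plain ASCII text; the sample string is invented for the demonstration.

```python
import re

# Invented sample: two letters, one space, two digits.
sample = "ab 12"

# Each lower-case shorthand and its capital split the characters between them.
spaces, non_spaces = re.findall(r"\s", sample), re.findall(r"\S", sample)
digits, non_digits = re.findall(r"\d", sample), re.findall(r"\D", sample)
words, non_words = re.findall(r"\w", sample), re.findall(r"\W", sample)

print(spaces, non_spaces)
print(digits, non_digits)
print(words, non_words)
```

Every character lands in exactly one list of each pair, which is what "negation" means here.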

It occurred to me after-the-fact that if a UA string starts with = it's most likely because the botrunner had a formula beginning in "User-Agent=" and then they deleted "User-Agent" but forgot to delete the = sign. Hah. Even better is when they send a misspelled header, like "Useragent" or "Referrer" (for, ahem, a given definition of "misspelled") or "X-Fowarded-For".

SetEnvIf User-Agent ^/W keep_out

I sure hope that was a / typo :) It wouldn't be actively harmful; it would just prevent the rule from having its intended effect.
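For anyone who wants to see the difference between the two spellings, a rough check follows. It runs the patterns through Python's re module as a stand-in for Apache, which is an assumption. Of the three sample UA strings, the first is the start of the one from this thread, and the other two are hypothetical examples made up for the test.

```python
import re

user_agents = [
    "=mozilla/5 (windows, u, windows nt 6.1, en-us)",                     # from this thread
    "Mozilla/5.0 (Windows NT 6.1; rv:52.0) Gecko/20100101 Firefox/52.0",  # hypothetical browser
    "(compatible; ExampleBot)",                                           # hypothetical robot
]

intended = re.compile(r"^\W")  # start of string, then one non-word character
typo = re.compile(r"^/W")      # start of string, then a literal slash and capital W

for ua in user_agents:
    # First column: would the intended rule flag it? Second: would the typo?
    print(bool(intended.search(ua)), bool(typo.search(ua)), ua)
```

The intended pattern flags the strings that begin with = or a parenthesis and leaves the ordinary browser string alone, while the / version matches none of them, so that rule would simply never fire on these.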
2:37 am on Mar 31, 2017 (gmt 0)

Junior Member from CA 


That was a big ol' error, and thanks for the correction.
SetEnvIf User-Agent ^\W keep_out
6:56 am on Mar 31, 2017 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:9282
votes: 449


You do know it will continue to show in your logs, just with a 403 now.

Note: there's also a bookmark checker with the UA: =mozilla
7:56 pm on Mar 31, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month


there's also a bookmark checker with the UA: =mozilla

You mean just "=mozilla" and that's it?

:: endless rhetorical question of why robots persist in doing things that increase their chances of being blocked ::
8:13 pm on Mar 31, 2017 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month


Yes (and we've discussed it, albeit years ago.)

Well, that's just it. Without a working knowledge of regex, most can't block it.

However the bookmark checker/updater isn't a bot & personally I allow it.