Short after-school REGEX tutorial needed

blocking UA fragments

     
12:04 am on Dec 16, 2018 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:Oct 4, 2018
posts: 45
votes: 2


I'm embarrassed to ask this, because I thought my REGEX was already up to it, but I'm having the devil of a time blocking the Mozilla fragment user agent

Mozilla/5.0 (compatible)


without also blocking myself and presumably all others using a full Mozilla UA string.

I have no problem blocking

^Mozilla$


or

^Mozilla/5\.0$


as indicated above, but when it comes to the longer partial (without the standard full UA terminal semi-colon)

Mozilla/5.0 (compatible)


neither

^Mozilla/5.0\ (compatible)$


nor

^Mozilla/5.0\ \(compatible\)$


is working; the latter throws a 500 rather than a 403 like all the preceding.

There's obviously some gap in my understanding of REGEX here; what am I doing wrong?
1:18 am on Dec 16, 2018 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Aug 29, 2006
posts:1378
votes: 18


You have jumped from ^Mozilla/5\.0$ to ^Mozilla/5.0$ in your code.

Escape.

...
1:34 am on Dec 16, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15944
votes: 890


The problem is in the " " (literal space). Since spaces have syntactic meaning in Apache, you need to either escape the space (usual approach in a RewriteCond) or put the whole thing inside quotation marks (usual approach in mod_setenvif). I'm guessing you are in mod_rewrite, because the equivalent error in mod_setenvif probably wouldn't cause a 500, it would just lead to unintended consequences in a mild way.
2:33 am on Dec 16, 2018 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Aug 29, 2006
posts:1378
votes: 18


The problem is in the " " (literal space).

The space is properly escaped in both of the failed examples given.

The brackets are properly escaped in only one of them.

The Mozilla version is properly escaped in neither.

...
2:39 am on Dec 16, 2018 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:Oct 4, 2018
posts: 45
votes: 2


Samizdata: Sorry, I believe the place I neglected to escape the . was here (typo).

Lucy24: I'm using BrowserMatchNoCase (mod_setenvif).
^Mozilla/5\.0 (compatible)$
throws a 403 for UAs beginning with Mozilla/5.0 followed by a space. Escaping the space,
^Mozilla/5\.0\ (compatible)$
throws a 500.

Are you suggesting I ditch the ^ and $ and use
"Mozilla/5\.0 (compatible)"
instead? That doesn't throw a 403, but I believe I had something very similar to that before which did nothing.

OTOH, all the UAs I've seen with (compatible in them follow compatible with a semi-colon rather than a closing parenthesis, so maybe that would be sufficient to do it, i.e., block for the construction surrounding compatible rather than the truncated UA string itself.

I still don't understand, though, why
^Mozilla/5\.0\ (compatible)$
creates problems, or
^Mozilla/5\.0\ \(compatible\)$
if supposedly I need to escape ( and ) as well.
5:49 am on Dec 16, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15944
votes: 890


I actually overlooked one of your \ backslashes when replying. Oops. But as it turns out, I was mistaken for a different reason.

In mod_setenvif, if the pattern contains a space, you need quotation marks in addition to--not instead of--any opening and/or closing anchors. Anchors go inside the quotation marks. This may be strictly a mod_setenvif rule; in most situations the "" would convert the whole thing into a literal string, but here it remains a RegEx that just happens to contain a space.

In any RegEx, regardless of context, you have to escape parentheses, because otherwise it is interpreted as:
Match the exact string "Mozilla/5.0 compatible" and capture the "compatible"
--meaning that the pattern will simply never match. Not a 500, just no results.

:: detour to test site ::

After some experimenting, I got it to throw 500s on a regular basis. I think I'd never actually tried escaping spaces in mod_setenvif, though I've done so in mod_rewrite. Turns out you simply can't do it, maybe because mod_setenvif--unlike mod_rewrite--doesn't have an exact number of permitted syntactical units per line.

In the non-500 versions, what's really happening is that if you say, for example,
BrowserMatch ^Mozilla/5\.0 (compatible) keep_out
you're setting two environment variables: one called (compatible) and then another called keep_out. And if you change it to \(compatible\) it's still the same thing, only now the literal backslashes are part of your variable name. Once you've departed from the RegEx--which happens as soon as you pass the first non-quoted space--everything reverts to being a literal character.

:: more experimenting ::

If you use quotation marks anywhere other than the very first thing in the pattern (where anchors count as part of the pattern), they will simply be interpreted as literal characters. And if you do have a quotation mark at the beginning of your pattern, you'll need a second one, or that's a 500 again.

Aaaaand . . . That's why I have a test site. Nobody gets hurt ;)
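
For anyone who wants the quoting rule in one place, here is a minimal sketch of the mod_setenvif forms described above. The variable name keep_out is the one used in the example in this post; the Require lines are an assumption (Apache 2.4 authorization syntax, not shown anywhere in the thread) and would need to be adapted on older versions.

```apache
# Unquoted: the pattern ends at the first space, so the regex is only
# ^Mozilla/5\.0 and "(compatible)" is read as a variable name.
# BrowserMatch ^Mozilla/5\.0 (compatible) keep_out

# Quoted: the anchors go inside the quotation marks, and the dot and
# the parentheses are escaped so that they match literally.
BrowserMatchNoCase "^Mozilla/5\.0 \(compatible\)$" keep_out

# Assumption: Apache 2.4 syntax for turning the variable into a 403.
<RequireAll>
    Require all granted
    Require not env keep_out
</RequireAll>
```

The first directive is left commented out because it is the form that misbehaves; only the quoted line is meant to be used.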
5:19 pm on Dec 17, 2018 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:Oct 4, 2018
posts: 45
votes: 2


So, in mod_setenvif, I'll need anchor, escape, escape, escape, anchor

^Mozilla/5\.0 \(compatible\)$


wrapped in quotes as well

"^Mozilla/5\.0 \(compatible\)$"


Correct?

And if in mod_rewrite, no quotes, but then I would have to escape the space as well

^Mozilla/5\.0\ \(compatible\)$


Please confirm or correct.

[edited by: JamesSC at 6:04 pm (utc) on Dec 17, 2018]

5:49 pm on Dec 17, 2018 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Aug 29, 2006
posts:1378
votes: 18


^Mozilla/5.\0\ \(compatible\)$

The Mozilla version now has an escaped zero rather than an escaped point/period.

...
6:05 pm on Dec 17, 2018 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:Oct 4, 2018
posts: 45
votes: 2


The Mozilla version now has an escaped zero rather than an escaped point/period.


Arrg. Fixed. Thanks.
10:10 pm on Dec 17, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15944
votes: 890


Please confirm or correct.
Yup.

Incidentally--not that you'd ever try this--in mod_rewrite, you cannot have a line-final escaped space, unless you particularly enjoy 500 errors. (I only know this because I've done it; can't remember if I was experimenting or if it was intended to work.)

Finally, there's an alternative in all situations: the special character \s meaning "whitespace of any kind". Quick detour to test site confirms that it works in both modules, creating an easy way to bypass the syntactical-space issue. The locution \s also matches tab, line break and other ASCII whitespace--but none of those would be likely to occur in a user-agent string. (And if they did, you could assume the visitor is up to no good.)
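
A minimal sketch of the mod_rewrite side, assuming an .htaccess context with mod_rewrite enabled. The RewriteRule line and the [F] flag are assumptions added to make the example complete; the thread itself only discusses the pattern.

```apache
RewriteEngine On

# Literal space escaped with a backslash; no quotation marks.
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/5\.0\ \(compatible\)$
# [F] answers the request with 403 Forbidden.
RewriteRule ^ - [F]

# Alternative using \s, which avoids the literal space altogether:
# RewriteCond %{HTTP_USER_AGENT} ^Mozilla/5\.0\s\(compatible\)$
# RewriteRule ^ - [F]
```

Either condition should block only the bare fragment, since the $ anchor rules out the longer UA strings that continue with a semi-colon after "compatible".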
1:04 am on Dec 18, 2018 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:Oct 4, 2018
posts: 45
votes: 2


Finally, there's an alternative in all situations: the special character \s meaning "whitespace of any kind".


Okay, so

^Mozilla/5\.0\s\(compatible\)$


would work in either mod_rewrite or mod_setenvif, although requiring additional fore and aft quotes in the latter?
1:15 am on Dec 18, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15944
votes: 890


In mod_setenvif you would no longer require the quotation marks, since the string no longer contains a literal space. That's the advantage of doing it this way.

If the test string contains a single space, you have achieved a net savings of one byte; if it contains two spaces, you break even :) though I seriously doubt this would be a real-life factor in deciding which form to use. As so often, we are now in personal-coding-style territory.
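
Side by side, the two mod_setenvif spellings look roughly like this. It is a sketch only; keep_out is the illustrative variable name from earlier in the thread, and only one of the two lines would be needed in practice.

```apache
# No literal space in the pattern, so no quotation marks are needed.
BrowserMatchNoCase ^Mozilla/5\.0\s\(compatible\)$ keep_out

# Quoted form with a literal space, for comparison:
# BrowserMatchNoCase "^Mozilla/5\.0 \(compatible\)$" keep_out
```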
4:43 am on Dec 18, 2018 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:Oct 4, 2018
posts: 45
votes: 2


In mod_setenvif you would no longer require the quotation marks, since the string no longer contains a literal space. That's the advantage of doing it this way.


Yes...of course. Brilliant.

Dang if this wasn't as informative as it was useful. Thanks, Lucy24!