Forum Moderators: coopster & phranque

Why is this regex matching everything?

         

csdude55

4:11 pm on Dec 11, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



There has to be a typo here, I just can't see it.

This works as expected:

$ENV{'HTTP_USER_AGENT'} eq 'Mozilla/5.0 (Linux; Android 6.0.1; SAMSUNG SM-S903VL Build/MMB29M) AppleWebKit/537.36 (KHTML, like Gecko) SamsungBrowser/6.2 Chrome/56.0.2924.87 Mobile Safari/537.36' ||


while this is matching everything:

$ENV{'HTTP_USER_AGENT'} =~ #Mozilla/5.0 (Linux; Android 6.0.1; SAMSUNG SM-S903VL Build/MMB29M) AppleWebKit/537.36 (KHTML, like Gecko) SamsungBrowser/[^A-Za-z]+ Chrome/[^A-Za-z]+ Mobile Safari/537.36# ||


They should be the same, I just changed the 6.2 and 56.0.2924.87 to [^A-Za-z]+. Of course, I realize that the unescaped . in the second one matches any character, but I'm not too concerned about that right now; I'll go back and escape it once this issue is figured out.

I left the || at the end just in case it's relevant; this is a snippet of a larger if-else statement, but this is the only part of it giving me attitude.

TIA!

not2easy

6:22 pm on Dec 11, 2017 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Maybe try replacing the [^A-Za-z]+ with [0-9.]+, because the negated a-z class accepts anything that isn't a letter, while the target is only digits and dots. The one for 6.2 would be [0-9]\.[0-9], I believe.
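
To compare the digit-based classes suggested here, the following is a minimal sketch that tests three candidate patterns against the two version strings from the user agent. The pattern labels and the %result hash are illustrative names, not part of the original code, and the anchors (\A and \z) are added only so each pattern has to cover the whole version string.

```perl
use strict;
use warnings;

# Version strings taken from the user agent in the question.
my @versions = ('6.2', '56.0.2924.87');

# Candidate patterns, anchored so the whole version has to match.
# The labels are illustrative only.
my %patterns = (
    'digits only'     => qr/\A[0-9]+\z/,
    'digit.digit'     => qr/\A[0-9]\.[0-9]\z/,
    'digits and dots' => qr/\A[0-9.]+\z/,
);

my %result;
for my $name (sort keys %patterns) {
    for my $version (@versions) {
        # Store 1 for a match and 0 otherwise.
        $result{$name}{$version} = $version =~ $patterns{$name} ? 1 : 0;
        print "$name vs $version: $result{$name}{$version}\n";
    }
}
```

Only the class that allows both digits and dots covers both version strings; [0-9]\.[0-9] fits 6.2 but not the longer Chrome version.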

csdude55

7:38 pm on Dec 11, 2017 (gmt 0)



Well, it's the darnedest thing... I changed the regex delimiter to // instead of ##, then escaped the / in the pattern, and it worked fine.

Am I wrong that I can interchange the // for ## like that? I don't code in Perl as much as I used to so I'm a little rusty, but I was almost certain about that one!

csdude55

8:56 pm on Dec 11, 2017 (gmt 0)



Grrr... when changing the delimiter, I have to put the m in front of it... =~ m##;

I swear I think someone on here told me that a few years ago and it slipped my mind again :-( I need more sleep at night or something.

phranque

11:50 pm on Dec 11, 2017 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



https://docs.perl6.org/language/syntax#Single-line_comments
...comments in Perl 6 starts with a single hash character # and goes until the end of the line.


The same holds in Perl 5. The leading m changes how the # is parsed in this case: after m it is a pattern delimiter rather than the start of a comment.
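
The difference can be seen in a short sketch. The user agent string is shortened and the 'Mozilla' string on the continuation line is a made-up stand-in for whatever expression followed in the original if-else chain; both are assumptions for illustration only.

```perl
use strict;
use warnings;

# Shortened, hypothetical user agent for the demonstration.
my $ua = 'Mozilla/5.0 Chrome/56.0.2924.87';

# No leading m: everything from the first # to the end of the line is a
# comment, so Perl uses the expression on the NEXT line as the pattern.
my $without_m = ($ua =~ #Firefox/[0-9.]+# ||
    'Mozilla') ? 1 : 0;

# Leading m: the # characters are delimiters and the pattern is used as written.
my $with_m = ($ua =~ m#Firefox/[0-9.]+#) ? 1 : 0;

# The default / delimiters need no m, but every / in the pattern must be escaped.
my $slashes = ($ua =~ /Chrome\/[0-9.]+/) ? 1 : 0;
my $hashes  = ($ua =~ m#Chrome/[0-9.]+#) ? 1 : 0;

print "without m: $without_m, with m: $with_m, slashes: $slashes, hashes: $hashes\n";
```

On this input the first check reports a match even though the text between the # characters names Firefox, because that text never reaches the regex engine. The last two checks are equivalent ways of writing the same pattern.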

csdude55

8:54 pm on Dec 16, 2017 (gmt 0)



Well, as an unexpected followup, why didn't THESE match?

$filter = 0;

$user_agent = 'Mozilla/5.0 (Linux; Android 6.0.1; SAMSUNG SM-S903VL Build/MMB29M) AppleWebKit/537.36 (KHTML, like Gecko) SamsungBrowser/6.2 Chrome/56.0.2924.87 Mobile Safari/537.36';

if ($user_agent =~ m#Mozilla/5.0 (Linux; Android 6.0.1; SAMSUNG SM-S903VL Build/MMB29M) AppleWebKit/537.36 (KHTML, like Gecko) SamsungBrowser/[^A-Za-z]+ Chrome/[^A-Za-z]+ Mobile Safari/537.36#) {
    $filter = 1;
}

print $filter; # prints 0


IIRC, using [^A_Za-z] should catch anything that's not a letter, right? So it should match any number and the .?

lucy24

11:27 pm on Dec 16, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



[^A_Za-z]
I hope that was a typo, since the quoted code above has the expected A-Z.

The thing that jumps out at me is: “not a letter” includes “is a space”. So at a minimum you'd want to say [^A-Za-z ] but isn't it more likely to work better as [\d.] assuming the string is made up only of numerals and dots? For that matter, unless you're explicitly excluding letters, a simple \S would do.

It also makes me uneasy to see literal parentheses written as ( and ) rather than escaped as \( and \), but then I don't speak Perl.
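
The point about parentheses can be checked directly. In a Perl pattern an unescaped ( and ) form a capture group, so they never match the literal parentheses in the string. The sketch below compares the pattern as posted with an escaped version and with a variant that lets \Q...\E do the escaping. The variable names ($head, $mid, $tail, %matched) and the \A/\z anchors are illustrative additions, not part of the original code.

```perl
use strict;
use warnings;

my $user_agent = 'Mozilla/5.0 (Linux; Android 6.0.1; SAMSUNG SM-S903VL Build/MMB29M) AppleWebKit/537.36 (KHTML, like Gecko) SamsungBrowser/6.2 Chrome/56.0.2924.87 Mobile Safari/537.36';

# As posted: ( and ) open and close capture groups, so the literal
# parentheses in the string are never matched.
my $unescaped = qr#Mozilla/5.0 (Linux; Android 6.0.1; SAMSUNG SM-S903VL Build/MMB29M) AppleWebKit/537.36 (KHTML, like Gecko) SamsungBrowser/[^A-Za-z]+ Chrome/[^A-Za-z]+ Mobile Safari/537.36#;

# Parentheses and dots escaped; the original negated class is kept.
my $escaped = qr#Mozilla/5\.0 \(Linux; Android 6\.0\.1; SAMSUNG SM-S903VL Build/MMB29M\) AppleWebKit/537\.36 \(KHTML, like Gecko\) SamsungBrowser/[^A-Za-z]+ Chrome/[^A-Za-z]+ Mobile Safari/537\.36#;

# Alternative: keep the fixed parts in plain strings and let \Q...\E
# escape them, with [\d.]+ for the version numbers.
my $head = 'Mozilla/5.0 (Linux; Android 6.0.1; SAMSUNG SM-S903VL Build/MMB29M) AppleWebKit/537.36 (KHTML, like Gecko) SamsungBrowser/';
my $mid  = ' Chrome/';
my $tail = ' Mobile Safari/537.36';
my $quoted = qr{\A\Q$head\E[\d.]+\Q$mid\E[\d.]+\Q$tail\E\z};

my %matched = (
    unescaped => ($user_agent =~ $unescaped) ? 1 : 0,
    escaped   => ($user_agent =~ $escaped)   ? 1 : 0,
    quoted    => ($user_agent =~ $quoted)    ? 1 : 0,
);

print "$_: $matched{$_}\n" for sort keys %matched;
```

With the parentheses escaped, the original [^A-Za-z]+ still matches here, because the regex engine backtracks off the trailing space. The tighter [\d.]+ simply states the intent more precisely.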