Regex not working like I expected

Am I making a newbie error here?

         

csdude55

9:13 am on Dec 19, 2015 (gmt 0)




Hi all,

This is really a general regex question, and while I'm doing it in Perl, I guess it's not necessarily language specific.

I have a regex that is catching more than I expect. Here is my code:


if ($comment =~ /902|921/) { print "Found"; }


Someone submitted the following, though, and it matched the regex:

$500 for a 2bedroom


Now, I expected it to match only if $comment contained either "902" or "921". I could even understand it if the alternation bound more tightly than I thought, so that it matched 9, followed by 0, followed by (2 or 9), followed by 2, followed by 1, but that's not happening, either. As best I can tell, it matched only because of the single 2.

I thought that the above was a fluke until someone made a much longer post today that matched the regex. The only numbers it contained were:

128533


Am I misunderstanding how the regex works? Or is there something different about matching integers than a regular text string?

TIA!

lucy24

9:37 pm on Dec 19, 2015 (gmt 0)




I expected it to only catch if $comment contained either "902" or "921"

So would I. So, sadly, this is not a RegEx question but a Perl syntax question :( It looks as if it's treating the pattern as [902]|[921] i.e. [0129]: match any one of the specified numerals.
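One way to check that guess is to run both forms side by side. The following is only a minimal sketch: compare_patterns is a hypothetical helper, and 'area code 902' is a made-up sample added for contrast. The other two inputs are the ones reported above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (alternation result, character-class result) for $text.
sub compare_patterns {
    my ($text) = @_;
    my $alternation = $text =~ /902|921/     ? 1 : 0;   # literal "902" or literal "921"
    my $char_class  = $text =~ /[902]|[921]/ ? 1 : 0;   # any single 0, 1, 2 or 9
    return ($alternation, $char_class);
}

# Single quotes keep Perl from interpolating "$500" as a variable.
for my $text ('$500 for a 2bedroom', '128533', 'area code 902') {
    my ($alt, $class) = compare_patterns($text);
    printf "%-22s alternation=%d class=%d\n", $text, $alt, $class;
}
```

In a stock Perl the plain alternation should reject the first two inputs, while the character-class form accepts all three. So if /902|921/ really is matching them, something other than the pattern itself is probably involved.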

Incidentally, the pattern could be written as "9(02|21)", though I don't suppose you'd gain any efficiency in this specific context, unless the pattern is really supposed to be \b9(02|21)\b or possibly \D9(02|21)\D to match a set of exactly three numerals.
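For comparison, here is a rough sketch of how those anchored variants behave. The sample strings and the third, lookaround-based pattern are assumptions added for illustration. Note that \D has to consume a real character on each side, so it cannot match when the number sits at the very start or end of the string.

```perl
use strict;
use warnings;

# Label/pattern pairs; the lookaround form is an added alternative, not from the thread.
my @patterns = (
    [ 'word boundary' => qr/\b9(?:02|21)\b/ ],
    [ 'non-digit'     => qr/\D9(?:02|21)\D/ ],
    [ 'lookaround'    => qr/(?<!\d)9(?:02|21)(?!\d)/ ],
);

# Made-up sample inputs.
for my $text ('902', 'call 902 now', '19021', 'abc902def') {
    for my $p (@patterns) {
        my ($label, $re) = @$p;
        printf "%-14s %-14s %s\n", $text, $label, ($text =~ $re ? 'match' : 'no match');
    }
}
```

The lookaround version only asserts that no digit is adjacent, so it also works at the string edges and when letters touch the number.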

csdude55

9:57 pm on Dec 19, 2015 (gmt 0)




Do you think that it is just because the regex only matches numbers?

I have several hundred regexes in a few different filters, and most (if not all) of the others search for text. I haven't noticed a problem like this before with those, but just because I haven't noticed a problem doesn't mean that it's not happening.

robzilla

10:36 pm on Dec 19, 2015 (gmt 0)




Is that an exact copy of your code? Because I'm unable to reproduce.

# Single quotes: in double quotes Perl would interpolate $500 as a variable
if ('$500 for a 2bedroom' =~ /902|921/) {
    print "Found";
} else {
    print "Not found";
}

Returns "Not found", as it should. What happens if you just put that in a script and run it?

If you still get "Found", I don't get it. If you get "Not found", you can stop focusing on the regex.

I would probably avoid regular expressions in this scenario and use index instead for a string-in-string search. It's faster and you won't have this problem, but you'll probably need to write a simple function to pass an array to if you need to search for multiple strings.
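A minimal sketch of that idea, assuming a hypothetical helper named contains_any that takes the text followed by the list of literal strings to look for:

```perl
use strict;
use warnings;

# Hypothetical helper: true if $text contains any of the literal strings in @needles.
sub contains_any {
    my ($text, @needles) = @_;
    for my $needle (@needles) {
        # index returns -1 when the substring is absent
        return 1 if index($text, $needle) >= 0;
    }
    return 0;
}

print contains_any('$500 for a 2bedroom', '902', '921') ? "Found\n" : "Not found\n";
```

Because index does a plain substring search, regex metacharacters in the search strings need no escaping.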

csdude55

11:29 am on Dec 20, 2015 (gmt 0)




Hmm, I'm not sure what to think, then. The exact code is more like:


# Earlier in the script
$comment = param('comment');

if (
    # $comment =~ /whatever ||

    (
        (
            $host eq '123.45.67.89' ||
            $host eq '321.54.76.98'
        ) &&
        $comment =~ /902|921/
    )
) { $filter = 1; }


In production there are dozens of matches that could make $filter = 1, but this is the only one related to this particular IP so it's the only one that should have been able to match.

$comment is submitted by a site user. The text "$500 for a 2bedroom" is exactly what was submitted by a user with an IP of 123.45.67.89, so the only way it should have matched is if $comment matched /902|921/. They made an identical post from a different IP and it did not match.

If you're not duplicating it in a simple script, though, then it could possibly be related to one of the modules or homemade libraries that I'm using. I'll try to dig deeper, then, and if I can't figure it out then I'll post back. Thanks for checking!
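One way to dig deeper is to record which condition actually fired instead of setting a single flag. This is only a sketch under assumptions: matched_rules, the rule labels, and the second rule are hypothetical stand-ins, since the real filter list isn't shown here.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the label of every rule that matches.
sub matched_rules {
    my ($host, $comment) = @_;
    my @rules = (
        [ 'flagged-host-902-921' => sub {
            ($host eq '123.45.67.89' || $host eq '321.54.76.98')
                && $comment =~ /902|921/;
        } ],
        # Placeholder standing in for the other production rules (assumption).
        [ 'placeholder-rule' => sub { $comment =~ /free money/i } ],
    );
    return map { $_->[0] } grep { $_->[1]->() } @rules;
}

my @hits = matched_rules('123.45.67.89', '$500 for a 2bedroom');
print @hits ? "Matched: @hits\n" : "No rule matched\n";
```

Logging the returned labels next to the raw $host and $comment values would show whether the 902|921 rule is really the one setting $filter.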

eurohttp

8:53 pm on Jan 17, 2016 (gmt 0)




Instead of this:
if ($comment =~ /902|921/) { print "Found"; }
I would write this:
if ($comment =~ m/902|921/) { print "Found"; }
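For what it's worth, Perl treats the leading m as optional when the delimiters are slashes, so the two forms should behave identically. A quick sketch to confirm, where same_result is a hypothetical helper and 'unit 921' is a made-up sample:

```perl
use strict;
use warnings;

# Hypothetical helper: true if /.../ and m/.../ agree on $text.
sub same_result {
    my ($text) = @_;
    my $bare   = $text =~ /902|921/  ? 1 : 0;
    my $with_m = $text =~ m/902|921/ ? 1 : 0;
    return $bare == $with_m;
}

for my $text ('$500 for a 2bedroom', '128533', 'unit 921') {
    printf "%-22s %s\n", $text, (same_result($text) ? 'same' : 'different');
}
```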