

Regex, matching whatever is here that's NOT defined

         

csdude55

6:00 am on Jan 27, 2023 (gmt 0)




Let's say that I have a regex like this:

$str = 'this';
if ($str !~ /^(foo|bar)/i) { ... }


Is there an easy way to find what IS in this area, since it's neither foo nor bar? $1 is just empty.

This works, but it feels hacky and I was hoping to not have to run 2 regexes (regii?):

if ($str =~ /^([A-Za-z]+)/ && $1 !~ /^(?:foo|bar)$/i) { ... }


In the one I'm actually working on, there are 26 pre-defined strings that I don't want to match. So I really don't want to have 26 sections of $1 ne 'foo' && $1 ne 'bar' && blahblahblah, either.

Any better suggestions?

lucy24

6:19 am on Jan 27, 2023 (gmt 0)




Since I don't normally touch Expressions with a barge pole, I don't know if you're allowed to use lookaheads, on the order of
^(?!foo|bar|zippity|doo|dah|day)[A-Za-z]+$

But good lord, excluding 26 specific strings? As I said in a parallel thread, it may help to step back and explain in English what the ultimate goal is.
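One detail worth checking with the lookahead above: as written, (?!foo|bar|...) only looks at the start of the string, so it would also reject a word such as "food" that merely begins with an excluded name. The Perl sketch below is one way to exclude whole words only while still capturing whatever is there in $1. The list @excluded and the helper name leading_word are made up for illustration, and it assumes the word consists of ASCII letters only, as in the thread.

```perl
use strict;
use warnings;

# Hypothetical exclusion list; the thread only names foo and bar.
my @excluded = qw(foo bar);
my $alt = join '|', map { quotemeta($_) } @excluded;

# Returns the leading run of letters unless it is exactly an excluded name.
sub leading_word {
    my ($str) = @_;
    # The inner (?![A-Za-z]) makes the exclusion apply to whole words only,
    # so "food" is still captured while "foo" is rejected.
    return $str =~ /^(?!(?:$alt)(?![A-Za-z]))([A-Za-z]+)/i ? $1 : undef;
}

for my $s (qw(this foo BAR food)) {
    my $w = leading_word($s);
    printf "%-5s => %s\n", $s, defined $w ? $w : '(excluded)';
}
```

With this list, "this" and "food" come back as captures, while "foo" and "BAR" are rejected, all in a single regex.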

csdude55

4:58 pm on Jan 27, 2023 (gmt 0)




"But good lord, excluding 26 specific strings? As I said in a parallel thread, it may help to step back and explain in English what the ultimate goal is."

I'm sorry, I missed when you asked that in another thread! I didn't intentionally ignore it, I apologize for that.

In this particular case, I'm using Apache and matching the first section in REQUEST_URI, but want to exclude any real or virtual directory in the account. There are 21 real and 5 virtual directories.

For example, I have a directory named "foo", and have "bar" rewritten to it. I know that I could use -d to test for "foo", but I'd still have to exclude "bar" manually. And I thought that -d might be slower to process than manually adding "foo" to the regex that already has to exist for "bar", but I'm not sure.

So if the user goes to www.example.com/blerg/, I need to check whether "blerg" exists. If not, I take "blerg" and send it somewhere else. (Note, I know that I would need to have the ^/ in the match, too; I left it out for the sake of making the post easier to read)

But I've run across this issue in the past, too, so I was trying to be a little more generic for the sake of my own education.

"I don't know if you're allowed to use lookaheads"

It's my understanding that Apache uses PCRE, so I tested it in Perl. I had to wrap [A-Za-z]+ in ( ) to capture it, too, and then it worked perfectly! :-D Thanks @lucy24, I still struggle with understanding lookaheads and lookbehinds.
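For the REQUEST_URI case described above, a Perl test along the following lines might look like this. It is only a sketch: @dirs is a two-entry stand-in for the 21 real and 5 virtual directories, unknown_segment is a hypothetical helper name, and the pattern assumes the first segment is made of ASCII letters and ends at a slash or at the end of the string.

```perl
use strict;
use warnings;

# Hypothetical directory names standing in for the 21 real and 5 virtual ones.
my @dirs = qw(foo bar);
my $alt  = join '|', map { quotemeta($_) } @dirs;

# Compiled once: a leading slash, then a first segment that is not a known
# directory. (?:/|\z) ends the excluded name at a slash or end of string,
# so "/foobar/" is not mistaken for "/foo/".
my $unknown_first = qr{^/(?!(?:$alt)(?:/|\z))([A-Za-z]+)}i;

# Returns the first path segment if it is not a known directory, else undef.
sub unknown_segment {
    my ($uri) = @_;
    return $uri =~ $unknown_first ? $1 : undef;
}

for my $uri ('/blerg/', '/foo/', '/bar', '/foobar/') {
    my $seg = unknown_segment($uri);
    printf "%-9s => %s\n", $uri, defined $seg ? $seg : '(known directory)';
}
```

Building the alternation from a list keeps the 26 names in one place, and quotemeta guards against any name that happens to contain a regex metacharacter.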

phranque

12:00 am on Jan 28, 2023 (gmt 0)




This will require two rulesets: one to capture the string and store it in an environment variable, and another to act on the excluded cases.
You may also need an intervening or subsequent ruleset to unset the environment variable for the excluded cases.