Negative match in htaccess RewriteRule

         

csdude55

4:33 am on Apr 4, 2020 (gmt 0)

How do I rewrite if a section doesn't match a defined string?

For example, I want to rewrite unless the captured string is "foo" or "bar". This works:

RewriteCond %{REQUEST_URI} !^/example/(foo|bar) [NC]
RewriteRule ^example/([-\w]+?)/? target.php?id=$1 [NC,QSA,NE,L]


But trying to make it a one-liner:

# doesn't match
RewriteRule ^example/(!foo|bar)/?$ target.php?id=$1 [NC,QSA,NE,L]
RewriteRule ^example/!(foo|bar)/?$ target.php?id=$1 [NC,QSA,NE,L]
RewriteRule ^example/(!foo|!bar)/?$ target.php?id=$1 [NC,QSA,NE,L]

# this matches, but it would catch strings that don't begin with example, too
RewriteRule !^example/(foo|bar)/?$ target.php?id=$1 [NC,QSA,NE,L]

# I tried negative lookahead, but these didn't match, either
RewriteRule ^example/(?!foo|bar)/?$ target.php?id=$1 [NC,QSA,NE,L]
RewriteRule ^example/(?!(foo|bar))/?$ target.php?id=$1 [NC,QSA,NE,L]

What's the magic trick? Or am I just asking too much of .htaccess?

phranque

4:57 am on Apr 4, 2020 (gmt 0)

the RewriteRule Pattern "is a perl compatible regular expression"

the RewriteCond CondPattern "is usually a perl compatible regular expression, but there is additional syntax available to perform other useful tests against the Teststring"

source: Apache Module mod_rewrite documentation [httpd.apache.org]

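in other words, the only negation operator in a RewriteRule is a leading ! that negates the whole pattern, so if you need to negate just one part of the path, the simplest route is a RewriteCond.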
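
That said, PCRE lookaheads are available in a RewriteRule pattern, so a one-liner is possible in principle. The lookahead attempts in the first post most likely fail because a lookahead consumes no characters: after ^example/(?!foo|bar) the pattern goes straight to /?$, so nothing is left to match the segment itself. A minimal sketch, assuming the .htaccess sits in the document root and the segment consists of word characters and hyphens (the end anchor and the leading slash on /target.php are additions, not part of the original rule):

```apache
# Negative lookahead followed by a group that actually consumes the segment.
# (?!foo|bar) rejects any segment that starts with foo or bar, mirroring the
# RewriteCond version; the capture group after it becomes $1.
RewriteRule ^example/(?!foo|bar)([\w-]+)/?$ /target.php?id=$1 [NC,QSA,L]

# Alternative (use instead of the rule above, not together with it):
# reject only the exact segments foo and bar, so /example/foobar is rewritten.
# RewriteRule ^example/(?!(?:foo|bar)/?$)([\w-]+)/?$ /target.php?id=$1 [NC,QSA,L]
```

Whether this is easier to maintain than the two-line RewriteCond version is a judgment call; the behavior should be the same for the first rule.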

lucy24

6:09 am on Apr 4, 2020 (gmt 0)

But trying to make it a one-liner:
Don’t try. Putting a negative into the body of a RewriteRule is, at best, asking for trouble.

Target should always start with / in an internal rewrite.

What’s the [NE] flag for? There's nothing in the [-\w] group--which incidentally is more safely expressed as [\w-] with the hyphen last--that could possibly be affected by escaping.

And why the [NC] in the condition? Do you have a legacy of RaNdOmLy CaSeD links to /example/foo.html that you also need to exclude?

csdude55

6:57 am on Apr 4, 2020 (gmt 0)

I gotcha. I'm gonna have to revise how I do things in the near future, anyway, which I think will mean moving a lot of this to httpd.conf instead of the htaccess. But for now I'm just seeing what I can minimize.

The fact that cPanel plugs this in before every rewrite made my 5 KB .htaccess file turn into 20 KB!

RewriteCond %{REQUEST_URI} ^/[0-9]+\..+\.cpaneldcv$
RewriteCond %{REQUEST_URI} ^/[A-F0-9]{32}\.txt(?:\ Comodo\ DCV)?$
RewriteCond %{REQUEST_URI} ^/\.well-known/pki-validation/[A-F0-9]{32}\.txt(?:\ Comodo\ DCV)?$
RewriteCond %{REQUEST_URI} ^/\.well-known/acme-challenge/[0-9a-zA-Z_-]+$
RewriteCond %{REQUEST_URI} ^/\.well-known/cpanel-dcv/[0-9a-zA-Z_-]+$
RewriteCond %{REQUEST_URI} ^/\.well-known/pki-validation/(?:\ Ballot169)?


Target should always start with / in an internal rewrite.

Oh, I didn't realize that... why? It seems to work either way; is there a condition where it would break without it?

What’s the [NE] flag for? There's nothing in the [-\w] group--which incidentally is more safely expressed as [\w-] with the hyphen last--that could possibly be affected by escaping.

My live script actually has 2 variables that are plugged in; I was just trying to keep it short for the example. So it looks more like:

RewriteRule ^example/([-\w]+?)?/?[^/]*/?([0-9]+)/? target.php?var=$1&id=$2 [NC,QSA,NE,L]

I kept [NE] to prevent it from sending var=$1%26id=$2

And why the [NC] in the condition? Do you have a legacy of RaNdOmLy CaSeD links to /example/foo.html that you also need to exclude?

I actually started using [NC] at every opportunity after watching my 72-year-old dad try to use my site! Turns out that he keeps his caps lock on all the time and doesn't quite understand how to click on things, so he types direct addresses any time he can... which is quite painful to watch! A significant number of my users are older, and I figured that if one does it then others probably do, too. I don't think it hurts anything, so it's plugged in just in case.

lucy24

6:21 pm on Apr 4, 2020 (gmt 0)

Starting a rewrite target with / is more about insurance. The default RewriteBase is / anyway, but if you say / explicitly it eliminates the chance of bad actors sneaking in with All Your RewriteBase Are Belong To Us motivations. Nothing will break if you don't do it; it's just a good habit, like starting all external redirect targets with the full protocol-plus-hostname.
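
Both habits can be sketched as follows. The target.php name comes from the first post; /old-page, /new-page and https://www.example.com are placeholders chosen for illustration, and the rules assume an .htaccess file in the document root:

```apache
# Internal rewrite: root-relative target, so the result does not depend on
# whatever RewriteBase happens to be in effect.
RewriteCond %{REQUEST_URI} !^/example/(foo|bar) [NC]
RewriteRule ^example/([\w-]+)/?$ /target.php?id=$1 [NC,QSA,L]

# External redirect: spell out protocol and hostname in the target.
RewriteRule ^old-page/?$ https://www.example.com/new-page [R=301,L]
```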

Turns out that he keeps his caps lock on all the time
Criminy. But how does this work without mod_speling (or 2.4 equivalent, if there is such a thing)? If there is a request for /FILENAME.HTML, does the server know to look for /filename.html? Is it done as a redirect or as a rewrite?

I kept [NE] to prevent it from sending var=$1%26id=$2
Oh, yikes, I hadn’t considered the possibility of an ampersand. So that makes two situations where [NE] is essential. (The other, which is cited in the docs and is the only one I’ve personally used, is when you’re redirecting to a # fragment.)
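
The fragment case can be sketched like this, adapted from the example in the mod_rewrite flags documentation. The /anchor/ and /bigpage.html paths are placeholders, and the pattern has no leading slash because it is written for .htaccess rather than the server config:

```apache
# Redirects /anchor/xyz to /bigpage.html#xyz.
# Without NE, the "#" in the target would be escaped to %23, and the browser
# would request a literal "%23" in the path instead of jumping to a fragment.
RewriteRule ^anchor/(.+)$ /bigpage.html#$1 [NE,R=302,L]
```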

csdude55

8:47 pm on Apr 5, 2020 (gmt 0)

Criminy. But how does this work without mod_speling (or 2.4 equivalent, if there is such a thing)? If there is a request for /FILENAME.HTML, does the server know to look for /filename.html? Is it done as a redirect or as a rewrite?

Nah, I didn't go that deep with it. The majority of the features on my site are named like:

example.com/foo
example.com/bar
example.com/blah

and they all rewrite to:

example.com/default/index.php?topic=(foo|bar|blah)

There are 32 of those potential topics, but there might be more as I build. There are more features on the site, of course, but that's a pretty popular feature.

I saw that he was using all-caps one day when he told me he never uses my site because he can't get it to work. I had him show me, and saw that he was going to:

EXAMPLE.COM/FOO

He couldn't understand what I meant about him doing it in upper case, so adding [NC] was just a quick and easy fix :-)
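
A rule of the kind described here might look like the following sketch. The three topic names stand in for the full list, and the exact pattern is an assumption, since the live rule is not shown:

```apache
# /foo, /FOO and /Foo all rewrite internally to the same script.
# [NC] only affects matching: $1 keeps whatever casing the visitor typed,
# so the script receiving topic= has to normalize it.
RewriteRule ^(foo|bar|blah)/?$ /default/index.php?topic=$1 [NC,QSA,L]
```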

lucy24

10:23 pm on Apr 5, 2020 (gmt 0)

and they all rewrite to:
And from there it's presumably trivial to tell the PHP to flatten the casing. All good, so long as no search engine goes into entrapment mode and starts requesting the differently cased URLs (they typically don't in this situation*), because then you'd be into Duplicate Content.


* Some spiders, notably bingbot, have been known to request "filename" when it's supposed to be "FileName", but I've never known one to go FROM lower-case TO upper-case.

w3dk

12:05 pm on Apr 12, 2020 (gmt 0)

I kept [NE] to prevent it from sending var=$1%26id=$2


You don't need the NE flag to prevent the "&" being URL-encoded when it is used as its meta character in the query string part of the URL. (I think you are seeing another bug in the MWL htaccess online tester?!)

Rather confusing... Whilst the docs for the NE flag state that "by default, special characters, such as & and ?, ... will be converted to their hexcode equivalent", I'm not aware of any situation where "&" is automatically encoded by Apache. It does convert "?" and "#", but it is still context aware, in that it doesn't encode the first "?" (that delimits the query string), just subsequent "?" that appear in the query string itself (and do arguably need to be encoded).

lucy24

8:34 pm on Apr 12, 2020 (gmt 0)

Oh, thanks w3dk, that’s reassuring.

Also a bit humiliating, because I have personally used & (ampersand) in Rewrite targets, as in
RewriteRule ^fun/panda\.html /fun/panda.php?animal=robot&page=panda [L]

So ... Yeah. Duh, it does not need escaping. Or would things, in any case, only be percent-encoded if the target is sent back out into the world as an external redirect?