Storing a pattern as an environment variable [E]

csdude55

10:18 pm on Apr 12, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I broke this off from the other thread, since we were really discussing performance of environment variables rather than the actual code.

Revisiting this bit of code:

RewriteRule ^ - [E=PATTERN:foo|bar]

RewriteCond %{REQUEST_URI} ^/blah/?(?:%{ENV:PATTERN})?/?$ [NC]
RewriteRule ^blah/?(.*)/?$ /board/?topic=$1 [NC,QSA,L]

I expected example.com/blah/foo to be rewritten by this line, but it doesn't match.

Worse, this next line DOES match:

RewriteCond %{REQUEST_URI} ^/blah/(?:%{ENV:PATTERN})?/?[a-z-]+/?$ [NC]
RewriteRule ^blah/(.*)/?([a-z-]+)/?$ /board/?topic=$1&p=$2 [NC,QSA,NE,L]

but it sends board/?topic=fo&p=o ! I tried changing the pattern to a variety of strings, but the second rule always matches with the last letter being sent for p=.

If I change it from %{ENV:PATTERN} to simply coding (foo|bar), though, then the first rule does match:

RewriteCond %{REQUEST_URI} ^/blah/?(?:foo|bar)?/?$ [NC]
RewriteRule ^blah/?(.*)/?$ /board/?topic=$1 [NC,QSA,L]

What do you guys and gals think? Is there a flaw with using a pattern in [E], or is there something wrong in my coding logic?

lucy24

11:08 pm on Apr 12, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



<tangent>
^blah/(.*)/?([a-z-]+)/?$ /board/?topic=$1&p=$2

I’m inclined to think that what you really mean is
^blah/(?:([^/]+)/)?([a-z-]+)/?$ /board/?topic=$1&p=$2
--and not only because non-final .* or .+ should be avoided at almost any cost.
</tangent>
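
A quick way to check the tightened pattern is to run it through PHP's preg_match(), which uses PCRE like Apache does. This is only a sketch: the helper name splitBlahPath(), the "#" delimiters and the "i" modifier (standing in for [NC]) are assumptions, and the path is given without the leading slash, the way a RewriteRule in .htaccess would see it.

```php
<?php
// Hypothetical helper: apply the tightened pattern and return [topic, p],
// or null when the path does not match at all.
function splitBlahPath($path)
{
    if (!preg_match('#^blah/(?:([^/]+)/)?([a-z-]+)/?$#i', $path, $m)) {
        return null;
    }
    // When the optional topic group does not take part in the match,
    // PHP reports it as an empty string.
    return array($m[1], $m[2]);
}

print_r(splitBlahPath('blah/foo'));                      // topic empty, p "foo"
print_r(splitBlahPath('blah/foo/baseball-game'));        // topic "foo", p "baseball-game"
var_dump(splitBlahPath('blah/foo/baseball-game/12345')); // NULL: digits are not allowed here
```

Because ([^/]+) cannot cross a slash, there is nothing for the regex engine to give back one character at a time, which is the behavior the non-final .* invites.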

The rest of the question begins to get into Personally I’m Not Touching That With A Barge Pole territory
:: looking vaguely around for w3 or someone like him ::
but we shall see. Is the idea that PATTERN is set to the string "foo|bar" and then mod_rewrite is supposed to interpret the | as a RegEx pipe? I am by no means certain you can do that.

w3dk

11:30 pm on Apr 12, 2020 (gmt 0)

10+ Year Member Top Contributors Of The Month



As mentioned in your "other thread", you can't use variable expansion of the form %{VARIABLE} in any argument that expects a regex. Likewise, you can't use $N or %N backreferences either. Apache uses the PCRE flavor of regex. Allowing %{VARIABLE} etc. would conflict with this (it would need to be parsed twice...?! But there could be unresolvable conflicts...?! ...Just thinking aloud.).

You can only use %{VARIABLE} syntax in the RewriteCond TestString (first argument) and RewriteRule substitution (second) arguments - which are "ordinary" strings, not regex.

If you attempt to use %{ENV:PATTERN} in the regex then you are trying to match the literal string "%{ENV:PATTERN}" - which is unlikely to match a real URL!

Worse, this next line DOES match:


You've made that part of the regex optional, so it's just being skipped and "/foo" is being matched by the last part of the regex "/?[a-z-]+" - so the condition is ultimately "successful".

but it sends board/?topic=fo&p=o !


Yes, that regex will do that! Regex is greedy by default and you have too many "optional" elements (which are effectively ignored). The only mandatory part is "([a-z-]+)" - which matches the trailing last character "o" and the greedy "(.*)" that precedes it matches everything else, ie. "fo".
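
The behavior described above can be reproduced outside Apache. The sketch below assumes PHP's preg_match() is a fair stand-in for mod_rewrite's matcher (both are PCRE), models the unexpanded variable as literal text with preg_quote(), and uses "#" delimiters plus the "i" modifier in place of [NC]. None of this is Apache's own code, just a way to watch the regex work.

```php
<?php
// Model the unexpanded variable as literal text, as described above.
$literal = preg_quote('%{ENV:PATTERN}', '#');

// First condition: after the optional literal only a slash may follow,
// so "/blah/foo" cannot match.
$cond1 = preg_match('#^/blah/?(?:' . $literal . ')?/?$#i', '/blah/foo');

// Second condition: the optional group is skipped and [a-z-]+ takes "foo".
$cond2 = preg_match('#^/blah/(?:' . $literal . ')?/?[a-z-]+/?$#i', '/blah/foo');

// The rule pattern sees the path without the leading slash in .htaccess.
// Greedy (.*) grabs "foo", then gives back one character so that the
// mandatory ([a-z-]+) has something to match.
preg_match('#^blah/(.*)/?([a-z-]+)/?$#i', 'blah/foo', $m);
$target = '/board/?topic=' . $m[1] . '&p=' . $m[2];

var_dump($cond1, $cond2); // int(0), int(1)
echo $target, "\n";       // /board/?topic=fo&p=o
```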

w3dk

12:37 am on Apr 13, 2020 (gmt 0)

10+ Year Member Top Contributors Of The Month



Is there a flaw with using a pattern in [E] ...


Yes, I believe so, if the intention is to use that "pattern" (the value of an env var) as a regex in another Apache directive.

What is the reason behind separating out this subpattern?

csdude55

3:33 am on Apr 13, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Is the idea that PATTERN is set to the string "foo|bar" and then mod_rewrite is supposed to interpret the | as a RegEx pipe?

Yup, that was the idea :-( I was hoping to do something similar to how you could do this in PHP:

$pattern = '#foo|bar#';
$result = preg_replace($pattern, 'blah', $string);


As mentioned in your "other thread", you can't use variable expansion of the form %{VARIABLE} in any argument that expects a regex.

I gotcha. I misunderstood, I guess... I thought that you meant I couldn't use %{VARIABLE} in the RewriteRule, but I could within RewriteCond.

What is the reason behind separating out this subpattern?

There are really 2 reasons:

1. In the actual code I have 5 rules that all match the same pattern. I was hoping that I could store the pattern as a single variable and save some space.

2. I have 3 PHP scripts that all load the same array with these strings as the index (eg, $array = ['foo', 'bar']). After I go live there's the possibility that I'll add or modify this list of potential matches, so it would have been a lot easier to have one permanent list here that I could change one time, instead of having to change it in the .htaccess and each of the scripts.

So in the end it would have been smaller files and easier to maintain long term.

csdude55

6:00 am on Apr 13, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



FWIW, this code works fine (as far as I can tell), but I was trying to simplify it:

# example.com/blah/foo
#-> board/?topic=foo
# example.com/blah
#-> board/?topic=
RewriteRule ^blah/?(foo|bar)?/?$ /board/?topic=$1 [NC,QSA,L]

# example.com/blah/foo/baseball-game/12345
#-> board/view/?topic=foo&id=12345
# example.com/blah/baseball-game/12345
#-> board/view/?topic=&id=12345
RewriteRule ^blah/(foo|bar)?/?[a-z-]+/([0-9]+)/?$ /board/view/?topic=$1&id=$2 [NC,QSA,NE,L]

# example.com/blah/foo/favorites
#-> board/?topic=foo&favorites=1
# example.com/blah/favorites
#-> board/?topic=&favorites=1
RewriteRule ^blah/(foo|bar)?/?favorites/?$ /board/?topic=$1&favorites=1 [NC,QSA,NE,L]

## this rule is for a legacy link
# example.com/blah/foo/search.php?search=baseball-game
#-> board/?topic=foo&p=baseball-game
# example.com/blah/search.php?search=baseball-game
#-> board/?topic=&p=baseball-game
RewriteRule ^blah/(foo|bar)?/?search\.php /board/?topic=$1 [NC,QSA,L]

# example.com/blah/foo/baseball-game
#-> board/?topic=foo&p=baseball-game
# example.com/blah/baseball-game
#-> board/?topic=&p=baseball-game
RewriteRule ^blah/(foo|bar)?/?([a-z-]+)/?$ /board/?topic=$1&p=$2 [NC,QSA,NE,L]


I'm pretty sure that this works (not well tested), but I would still have to modify all of the PHP scripts:

## go ahead and set ?topic= but don't use [L], so the new variable is passed to the other rules
# example.com/blah/foo
# example.com/blah
RewriteRule ^blah/?(foo|bar)?/?(.*)$ /board/$2?topic=$1 [NC,QSA]

# example.com/blah/foo/baseball-game/12345
# example.com/blah/baseball-game/12345
#-> board/view/?topic=foo&id=12345
RewriteRule ^board/[a-z-]+/([0-9]+)/? /board/view/?id=$1 [NC,QSA,NE,L]

# example.com/blah/foo/favorites
# example.com/blah/favorites
#-> board/?topic=foo&favorites=1
RewriteRule ^board/favorites/?$ /board/?favorites=1 [NC,QSA,NE,L]

## this rule is for a legacy link
# example.com/blah/foo/search.php?search=baseball-game
# example.com/blah/search.php?search=baseball-game
#-> board/?topic=foo&p=baseball-game
RewriteRule ^board/search\.php /board/ [NC,QSA,L]

# example.com/blah/foo/baseball-game
# example.com/blah/baseball-game
#-> board/?topic=foo&p=baseball-game
RewriteRule ^board/([a-z-]+)/?$ /board/?p=$1 [NC,QSA,NE,L]

lucy24

3:38 pm on Apr 13, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I would still have to modify all of the PHP scripts
If something would have to be changed in lots of places, now is the time to figure out how to shift it all into a single php include so it only has to be changed once, no matter how many scripts in how many different places use it.
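
A minimal sketch of what such a single include might look like, assuming the list really is just a flat set of topic names. The function names topicList(), topicPattern() and isTopic() are hypothetical, and the file would be pulled in with require_once from each script.

```php
<?php
// Hypothetical shared include: the one place on the PHP side where the list lives.
function topicList()
{
    return array('foo', 'bar');
}

// Build an anchored, case-insensitive alternation from the list,
// quoting each entry so odd characters cannot change the regex.
function topicPattern()
{
    $quoted = array();
    foreach (topicList() as $topic) {
        $quoted[] = preg_quote($topic, '#');
    }
    return '#^(?:' . implode('|', $quoted) . ')$#i';
}

// True when the candidate string is one of the known topics.
function isTopic($candidate)
{
    return preg_match(topicPattern(), $candidate) === 1;
}

var_dump(isTopic('foo'), isTopic('baz')); // bool(true), bool(false)
```

This only covers the PHP scripts. The (foo|bar) alternation in the rewrite rules is a separate copy and would still have to be edited by hand.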

csdude55

7:02 pm on Apr 13, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



That would be easy enough; I already have a variables script that I include in every PHP script anyway. But I'd still have to change it in .htaccess or httpd.conf, too, so the "ideal" solution would be to set it as an environment variable.

I think I'll still set the pattern in [E], and just duplicate the value for the RewriteRule. I'll have to update the .htaccess when/if I add more in the future, anyway, and this way I'd just have to update the one file. And who knows, maybe a future version of Apache will let me use the value in the regex :-)

csdude55

7:05 pm on Apr 13, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



On second thought... I discovered this tidbit on SetEnv:

The internal environment variables set by this directive are set after most early request processing directives are run, such as access control and URI-to-filename mapping. If the environment variable you're setting is meant as input into this early phase of processing such as the RewriteRule directive, you should instead set the environment variable with SetEnvIf.


[httpd.apache.org...]

Does [E] work the same way, only setting the value at the end of the script? If so, would using SetEnvIf to set the value mean that I could use the value within the regex?

lucy24

8:34 pm on Apr 13, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



[E] is a flag in mod_rewrite. That means the variable is set by mod_rewrite, whenever mod_rewrite runs. A rough-and-ready guideline for the order in which modules run is “reverse alphabetical order”, meaning something like
mod_setenvif
mod_rewrite
mod_env
mod_authwhatsit
in that order.

csdude55

4:03 am on Apr 15, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



On a similar note... is there a way to store a backreference as the value of an [E]?

This is showing a literal "$1" instead of the value of it:

RewriteRule ^([a-z]+)(?:/(.*)) /$2 [NC,E=VAR:$1]

phranque

6:45 am on Apr 15, 2020 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



This is showing a literal "$1" instead of the value of it

strange!
it says here:
The full syntax for this flag is:

[E=VAR:VAL]
[E=!VAR]


VAL may contain backreferences ($N or %N) which are expanded.

source: https://httpd.apache.org/docs/2.4/rewrite/flags.html#flag_e

lucy24

4:11 pm on Apr 15, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This is showing a literal "$1" instead of the value of it:
Where and how is it “showing” the environmental variable? Please say it isn’t on that same htaccess-testing site that I distrust so strongly.

:: memo to self: experiment on test site, using LogHeaders function (which also tracks environmental variables that I set in mod_setenvif) ::

lucy24

5:32 pm on Apr 15, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



An hour later ...

Wow, I'm glad this question came up, because it turns out that ever since my server upgraded its php at the beginning of the month, getenv() is no longer willing to accept an argument, although it’s supposed to. This in turn led to some blood, toil and sweat before I arrived at
if (preg_match ('/(REDIRECT_)?[a-z]/',$name) && $value)
replacing the earlier
if ($value)
which I do to keep my logged headers from turning into a bloated unreadable information dump. The REDIRECT_ element is needed for when the headers are logged on an error document; the [a-z] part is because my custom environmental variables are all lower-case while the Apache ones are in ALL CAPS. (One exception, which I can live with.)

It was worth it, however, because it led to a discovery. The rule under discussion is
RewriteRule ^([a-z]+)(?:/(.*)) /$2 [NC,E=VAR:$1]
which on my test site turned into
RewriteRule ^([a-z]+) - [E=test1:$1]
(initially I absent-mindedly left the $2, which created a server error).

!important
The string that was written into an environmental variable as $1 was not the name of the overall page, whether that was an existing URL or the name of an error document. Instead it was the name of the navigation footer, included via SSI, which in its turn includes the logheaders function. To prevent this from happening, add the [NS] flag so the variable retains the value it had at the time of the page request.

You also need to keep the environmental variable from being set on non-page requests, so make sure you constrain the rule to pages.

csdude55

12:16 am on Apr 16, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm still running PHP 5.x until I get everything rebuilt. But I think I figured it out... [VAR] is wrong, but [REDIRECT_VAR] shows the value correctly. I'm not sure why it has the prefix, though.

REDIRECT_ environment variables are created from the environment variables which existed prior to the redirect.


[httpd.apache.org...]

That's about as clear as mud :-(

My guess is that it's prepending with REDIRECT_ because I have a destination in the rewrite, instead of just - ?

## first RewriteCond makes sure that VAR doesn't match an existing directory, so
## (foo|bar) is actually a list of every physical directory
## maybe I should do this instead?
# RewriteCond %{REQUEST_URI} !-f
# RewriteCond %{REQUEST_URI}/index\.php !-f
# RewriteCond %{REQUEST_URI} !-d
RewriteCond %{SCRIPT_NAME} !/(?:foo|bar) [NC]
RewriteCond %{QUERY_STRING} !(?:^|&)var=[a-z]+ [NC]
RewriteRule ^([a-z]+)(?:/(.+))?$ /$2?var=$1 [NC,E=VAR:$1,L]


Then I just look for it in PHP with:

<?php print_r($GLOBALS); ?>

The ones I set like this show up without the prefix:

RewriteRule ^ - [E=BLAH:boop]

phranque

1:39 am on Apr 16, 2020 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



That's about as clear as mud :-(

especially considering a custom error document is normally more like an internal rewrite than a redirect...

lucy24

2:00 am on Apr 16, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



That part makes sense, because if it were an external redirect, it would be an entirely new request and the server wouldn’t know anything about the earlier one. Apache docs do tend to use “redirect” in places where we prefer “rewrite”. Personally I like double markedness: “external redirect” vs. “internal rewrite” to make it all crystal clear.

Incidentally, I did eventually figure out that
(a) there is no point to saying
/(REDIRECT_)?blahblah/
when there was no opening anchor, because it will match anyway, and
(b) since all the original variables are retained, I don’t need the REDIRECT_ version at all, and can proceed directly to /^[a-z]/ to get the same information. (In my case I decided to say [abce-z] to omit “dsid”, whatever the heck that is, which is present on every request. None of my own env vars happen to start with “d”, so that was easy.)
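
To make points (a) and (b) concrete, here is a small sketch of the two filters side by side. The sample array and the helper name keepVars() are made up for illustration; a real logging function would presumably walk the actual server environment instead.

```php
<?php
// Hypothetical sample of variable names and values seen on a request.
$env = array(
    'REDIRECT_STATUS' => '200',
    'REDIRECT_test1'  => 'page',
    'test1'           => 'page',
    'dsid'            => 'abc123',
    'HTTP_HOST'       => 'example.com',
    'empty_var'       => '',
);

// Keep the names of non-empty variables whose name matches the pattern.
function keepVars(array $env, $pattern)
{
    $kept = array();
    foreach ($env as $name => $value) {
        if (preg_match($pattern, $name) && $value) {
            $kept[] = $name;
        }
    }
    return $kept;
}

// Unanchored: the optional prefix changes nothing, so this is really
// "name contains a lower-case letter anywhere".
$loose = keepVars($env, '/(REDIRECT_)?[a-z]/');

// Anchored: only names that start with a lower-case letter other than "d".
$strict = keepVars($env, '/^[abce-z]/');

print_r($loose);  // REDIRECT_test1, test1, dsid
print_r($strict); // test1
```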

There are times when it would be awfully useful to know that there has been an internal redirect (i.e. a rewrite, not necessarily through the agency of mod_rewrite), but in my logged headers it is just clutter.

phranque

2:17 am on Apr 16, 2020 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



That part makes sense, because if it were an external redirect


clearly stated (with "double markedness" even):
None of these will be set if the ErrorDocument target is an external redirect (anything starting with a scheme name like http:, even if it refers to the same host as the server).

source: http://httpd.apache.org/docs/current/custom-error.html#variables

csdude55

3:34 am on Apr 16, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I found this tidbit:

For per-directory and htaccess rewrites, where the final substitution is processed as an internal redirect, environment variables from the previous round of rewriting are prefixed with "REDIRECT_".


[httpd.apache.org...]

Am I to understand, then, that:

(a) the REDIRECT_ prefix is being added because the rule has a target other than - ; and

(b) when I move it to httpd configuration instead of .htaccess then the prefix will go away?

phranque

5:33 am on Apr 16, 2020 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



when I move it to httpd configuration instead of .htaccess then the prefix will go away?

it depends.
"per-directory and htaccess rewrites" would include the contents of <Directory> containers in server config files.
https://httpd.apache.org/docs/current/mod/core.html#directory

csdude55

6:09 am on Apr 16, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



We're getting a tad beyond my expertise, but I don't think I'll need the <Directory> container... I'll just be moving everything to <VirtualHost> where ServerName matches my domain. It'll be a while before I get there, though, so I guess I'll just code my PHP with:

if ($_SERVER['VAR'] || $_SERVER['REDIRECT_VAR']) { ... }

lucy24

3:32 pm on Apr 16, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I don't think I'll need the <Directory> container...
I was going to say that you may want to keep it for a handful of access-control rules that will be the same for all hostnames everywhere, without exception:
--universally block access to ^\.ht (reminder: this just means that people can’t request and view .htaccess and .htpasswd; it doesn’t prevent the server from reading and acting on them)
--universally allow access to error documents with a consistent name such as “forbidden.html”
--universally allow access to robots.txt

But then again, the <Files> and <FilesMatch> envelopes can also lie loose in config, with no containing <Directory>. Your call.

if ($_SERVER['VAR'] || $_SERVER['REDIRECT_VAR']) { ... }
It may not even be necessary, based on what I’m seeing in logged headers: the REDIRECT_ version comes in addition to, not instead of the original version.
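
Whichever copy turns out to be present, a small helper can check both without raising undefined-index notices. This is a sketch only: the name rewriteEnv() is hypothetical, it sticks to PHP 5 syntax, and it takes the array as a parameter so it can be tried with sample data (in a real script you would pass $_SERVER).

```php
<?php
// Hypothetical helper: read a variable set with [E=NAME:value], preferring
// the plain name and falling back to the REDIRECT_ copy.
function rewriteEnv(array $server, $name, $default = null)
{
    foreach (array($name, 'REDIRECT_' . $name) as $key) {
        if (isset($server[$key]) && $server[$key] !== '') {
            return $server[$key];
        }
    }
    return $default;
}

// Sample arrays stand in for $_SERVER here.
var_dump(rewriteEnv(array('REDIRECT_VAR' => 'foo'), 'VAR'));                 // "foo"
var_dump(rewriteEnv(array('VAR' => 'bar', 'REDIRECT_VAR' => 'foo'), 'VAR')); // "bar"
var_dump(rewriteEnv(array(), 'VAR', 'none'));                                // "none"
```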

w3dk

11:10 pm on Apr 16, 2020 (gmt 0)

10+ Year Member Top Contributors Of The Month



Taking a step back for a moment...

I would still have to modify all of the PHP scripts


If something would have to be changed in lots of places, now is the time to figure out how to shift it all into a single php include so it only has to be changed once, no matter how many scripts in how many different places use it.


This.

That would be easy enough, I already have a variables scripts that I include on every PHP script, anyway. But I'd still have to change it in .htaccess or httpd.conf, too, so the "ideal" solution would be to set it as an environment variable.


To be honest, I don't really see why you need to add this additional layer of "validation"(?) in Apache config to begin with? This would seem to be just adding an additional layer of complexity/maintenance/potential bugs?

csdude55

6:32 am on Apr 17, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



To be honest, I don't really see why you need to add this additional layer of "validation"(?) in Apache config to begin with? This would seem to be just adding an additional layer of complexity/maintenance/potential bugs?

Right now I'm doing this, where /blah/ is a real directory with /blah/index.php:

RewriteRule ^ - [E=TOPICS:foo|bar]

# if this link is an exact match, make this [L]
RewriteRule ^blah/(foo|bar)/?$ /blah/index.php?topic=$1 [NC,QSA,L]

# else, set ?topic= and continue on to other rules
RewriteRule ^blah/(foo|bar)(?:/(.+))?$ /blah/$2?topic=$1 [NC,QSA]

RewriteRule ^blah/[a-z-]+/(\d+)$ /blah/view/index.php?id=$1 [NC,QSA,L]
RewriteRule ^blah/favorites$ /blah/index.php?favorites=1 [NC,QSA,L]
RewriteRule ^blah/([a-z-]+)$ /blah/index.php?p=$1 [NC,QSA,L]

The way I'm setting up my message board, the user can go to example.com/blah and see every thread under every topic. Or they can select from a list to narrow it down by predefined topics (in this example, "foo" or "bar"). If they do then they're redirected to example.com/blah/foo or example.com/blah/bar.

If they're on the full list (without foo or bar) and click to read a thread, the link is example.com/blah/urified-title-thread/12345, which rewrites to /blah/view/index.php?id=12345

If they're using a filter and click to read a thread, the link would be example.com/blah/foo/urified-title-thread/12345, which rewrites to /blah/view/index.php?topic=foo&id=12345

They can also select to view threads they've saved, which would be at example.com/blah/favorites

And finally, if the user leaves off the ID and just goes to example.com/blah/urified-title-thread (which happens more often than you would think), I want them to rewrite to /blah/index.php?p=urified-title-thread. Then PHP uses the "p" param to search all of the subjects.

If I don't predefine the list of acceptable topics then the script wouldn't know whether that 2nd field is a topic, urified subject, or a search query.
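
The ambiguity is easy to see if the same decision is written out in PHP. The sketch below is not the code from this thread; classifyBlahPath() is a hypothetical function that mirrors the URL scheme described above and takes the topic list as a parameter.

```php
<?php
// Hypothetical classifier: works out what the segments after /blah/ mean.
// Returns the query parameters as an array, or null for an unknown shape.
function classifyBlahPath($path, array $topics)
{
    $segments = explode('/', trim($path, '/'));
    array_shift($segments); // drop the leading "blah"
    $params = array();

    // Only a predefined topic may occupy the first slot.
    if ($segments && in_array(strtolower($segments[0]), $topics, true)) {
        $params['topic'] = strtolower(array_shift($segments));
    }

    $isSlug = function ($s) {
        return preg_match('/^[a-z-]+$/i', $s) === 1;
    };

    if (count($segments) === 0) {
        return $params;
    }
    if (count($segments) === 1 && strtolower($segments[0]) === 'favorites') {
        $params['favorites'] = '1';
    } elseif (count($segments) === 1 && $isSlug($segments[0])) {
        $params['p'] = $segments[0];
    } elseif (count($segments) === 2 && $isSlug($segments[0]) && ctype_digit($segments[1])) {
        $params['id'] = $segments[1];
    } else {
        return null;
    }
    return $params;
}

$topics = array('foo', 'bar');
print_r(classifyBlahPath('/blah/foo/urified-title-thread/12345', $topics)); // topic foo, id 12345
print_r(classifyBlahPath('/blah/urified-title-thread', $topics));           // p urified-title-thread
print_r(classifyBlahPath('/blah/foo', array()));                            // p foo: no list, so "foo" looks like a search
```

The last call shows the problem: with an empty topic list, "foo" is indistinguishable from a urified subject, so it ends up in "p" instead of "topic".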