Forum Moderators: phranque

SetEnv not behaving as expected

Some HTTP headers work, others do not

dstiles

2:38 pm on Jan 5, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Either I have misunderstood the method somehow or I'm doing something wrong.

For some time I have been gradually implementing SetEnv as a means of blocking the more obvious nasties on a handful of minor, low-traffic sites that I have moved from IIS to Linux/Apache. These are sites I can afford to make mistakes on. I have discussed my approach in the past and have received encouraging responses, many of which I've implemented or adapted.

The blocking file contains a number of SetEnv/SetEnvIf (etc.) directives and some Require statements. I also use a few <If> blocks. What I have at the moment seems to have worked well for several months, with a few updates and additions. I added a PHP logging mechanism for pages that return a 200 response, and have now set the server to use a PHP ErrorDocument, which lets me log the non-200 responses as well.

The logging is a multi-line record per hit. The format of the records is:

IP Date ResponseCode
Host Page Pre-redirectedPage(if relevant)
EnvVars (Name: Value:regex) (one per line)
HTTP headers (one per line)

A typical EnvVars section, obtained using PHP's apache_getenv(), is:

accept: none
accept_lang: none
badua: old_browser:Chrome/66.
BlockCountry: 1
browser: chrome:Chrome/66
ips: amazon:18.237.
proto: too_low:HTTP/1.0
useragent: scrape:iodc

All of the above are trapped by various SetEnvIf or BrowserMatch directives and are reported correctly, as are a few more such as bot. The ones that are not reported are things like:

SetEnvIf Remote_Host 10.0.1.21 host=us:$0 (our IP, obfuscated)

<if " %{Remote_Host} =~ m#\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}# ">
SetEnv host=numbers:$0
</if>
SetEnvIf Request_URI "\/\.?wp-" badrequri=wp:$0
...or its alternative...
<if " %{Request_URI} =~ m#wp-#i ">
SetEnv badrequri=wp:$0
</if>

In fact, none of the variables set from REQUEST_URI, REMOTE_HOST, QUERY_STRING, HTTP_REFERER or HTTP_COOKIE are reported, although those rules do work in trapping bad requests (as logged in the Apache error log and the site logs). As far as I can tell, my methods are as documented by Apache.

So, what am I doing wrong or misunderstanding, please?

lucy24

1:19 am on Jan 22, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



How are you checking the environment var?
By name. I first tried a foreach approach working straight from getenv, essentially identical to the function that gets any and all headers. But it dumped out every conceivable environment variable, including the ones that simply repeat information I've already got, clogging up the file with information I didn't need and wasn't interested in. So instead I went to a function that starts with

$counter = array("keep_out","bad_protocol",
(et cetera, listing all the environment variables I personally use), and then the foreach is applied to the array.

And then to strip the log file down still further I put in a bit that goes
foreach (getenvif() as $name => $value)
{
    if ($value)
    {
        fwrite($fh, "$name: $value\n");
    }
}

So my logheaders file only lists the environment variables that actually have a value. That's why the empty ones, like $2 when there haven't been two captures, don't get logged. (For headers it isn't an issue, since it's rare to send an empty header, and potentially interesting to know when it does happen.)
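A minimal PHP sketch of the named-list approach described above, for anyone who wants to try it. The function name collect_env() and the sample variable names are hypothetical (the actual getenvif() function isn't shown in the thread), and it assumes the variables set in Apache are visible to PHP's getenv(), which is typically the case under mod_php:

```php
<?php
// Hypothetical sketch of the "named list" approach: only the variables
// named in $watched are read, and only those with a truthy value are kept.
// collect_env() is an illustrative name, not the function from the post.
function collect_env(array $names): array
{
    $found = [];
    foreach ($names as $name) {
        // getenv() returns false if the variable was never set.
        $value = getenv($name);
        // Same test as "if ($value)": this also drops "" and "0".
        if ($value) {
            $found[$name] = $value;
        }
    }
    return $found;
}

// Example variable names, in the style of the blocking file.
$watched = ["keep_out", "bad_protocol", "badua"];
foreach (collect_env($watched) as $name => $value) {
    echo "$name: $value\n";
}
```

Because it keeps the same if ($value) test, it also skips variables whose value is "" or "0".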

If server logs can be trusted, I'm on Apache/2.4.29

w3dk

1:59 am on Jan 23, 2020 (gmt 0)

10+ Year Member Top Contributors Of The Month



if ($value)


This checks whether $value evaluates to (bool)True, not just whether it has a value. A string like "0" (which evaluates to (bool)False) would also be excluded. So this naturally does not differentiate between (bool)False (the return value from getenv() when the var is not set) and an empty string value.
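To make the distinction concrete, here is a small PHP sketch (describe_env_value() is a hypothetical name) that separates the cases a bare truthiness test lumps together:

```php
<?php
// Hypothetical helper: classifies a value returned by getenv().
// getenv() returns false for an unset variable; a set variable can still
// hold "" or "0", which "if ($value)" would treat the same as unset.
function describe_env_value($value): string
{
    if ($value === false) {
        return "not set";
    }
    if ($value === "") {
        return "set, empty";
    }
    // Any other string is set; "0" is the one PHP still treats as false.
    return $value ? "set, truthy" : "set, falsy";
}

foreach ([false, "", "0", "2"] as $v) {
    echo var_export($v, true), " => ", describe_env_value($v), "\n";
}
```

If set-but-empty values should be kept, testing $value !== false instead of if ($value) does that.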

Still no idea where the "2" was coming from? Unless you're doing a straight getenv()/var_dump(), how can you rule out some other bug?

Or, what do you see in the following "redirect"?


# Request "/foo?question"
SetEnvIfExpr "%{QUERY_STRING} =~ /.+/" thisworks=$0
RewriteRule ^foo$ /bar?thisworks=@%{ENV:thisworks}@ [R,L]

lucy24

5:25 am on Jan 23, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Huh.

request /foo?something
>>
https://www.example.com/bar?thisworks=@2@

Remove the paired @ from the target, and I get
https://www.example.com/bar?thisworks=2
so I hope they were meant to be literal text.

And if I change the SetEnvIfExpr rule to /(.+)/ then it gives me whatever the query string was.

A string like "0" (that evaluates to (bool)False) would also be excluded.
Yes, that's essentially what I want. I can't currently think of any situation where it would be useful to know that suchandsuch variable has been set, but has a value of zero (or of the string "" in the case of a nonexistent capture). In this respect it's analogous to headers: for access-control purposes it makes no difference whether a given field is empty, or if it wasn’t sent at all.

Edit: I should point out that I only speak about three words of PHP, so there’s a limit to what I can do. For example, my getheaders code is based on something incrediBill once posted, with just a few minor tweaks. (Notably, his version used a direct write-to-file, which has unfortunate results when two requests come in right on top of each other. I think dstiles deals with it by building a string and then writing the whole string at once; I use an output buffer.)
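Both approaches come down to producing the whole record first and writing it in one go. A minimal PHP sketch, assuming a plain append-only log file (the function names, the temporary file and the sample record lines are hypothetical): ob_start()/ob_get_clean() capture echoed output as a string, and file_put_contents() with FILE_APPEND | LOCK_EX appends it while holding an exclusive lock. The lock is advisory, so it only helps if every writer to that file uses it.

```php
<?php
// Hypothetical sketch: capture everything a logging routine echoes into a
// string (the output-buffer approach), then append it with a single write.
function capture_output(callable $printer): string
{
    ob_start();
    $printer();
    // ob_get_clean() returns the buffered text and discards the buffer.
    return (string) ob_get_clean();
}

// Append one complete record. FILE_APPEND adds to the end of the file and
// LOCK_EX takes an exclusive (advisory) lock for the duration of the write,
// so two simultaneous requests using this function don't interleave lines.
function append_record(string $logFile, string $record): bool
{
    return file_put_contents($logFile, $record, FILE_APPEND | LOCK_EX) !== false;
}

// Sample record in the multi-line layout; the values are made up.
$record = capture_output(function () {
    echo "192.0.2.1 2020-01-23 403\n";
    echo "www.example.com /foo\n";
    echo "badua: old_browser:Chrome/66.\n\n";
});

$logFile = tempnam(sys_get_temp_dir(), "log");
append_record($logFile, $record);
echo file_get_contents($logFile);
unlink($logFile);
```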

w3dk

12:13 pm on Jan 23, 2020 (gmt 0)

10+ Year Member Top Contributors Of The Month



The "@" was just a delimiter I stuck in at the last minute in case there was something else lurking there (it seems not).

Hhhmm... still a "wtf" then. Maybe we're seeing "undefined behavior"?! I did just wonder whether using the alternative regex syntax "m#.+#" would make any difference. I wouldn't think it would, and it doesn't make any difference at my end. Otherwise I'm out of ideas as to where that "2" might be coming from.

(Aside: I did just try this on a LiteSpeed server, until I realised that LiteSpeed doesn't appear to support SetEnvIfExpr at all. But, typical LiteSpeed (it seems), it just fails silently on anything it doesn't understand - which makes debugging jaw-droppingly frustrating at times!)

dstiles

2:52 pm on Jan 23, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Lucy:
So, in your findings, SetEnvIf can use the $0 etc. captures from the <If> envelope?

No, the SetEnvIf gets its value from WITHIN the <if>...
SetEnvIf Remote_Addr .* amazon ips=amazon:$0

Trying out your suggestions (for which many thanks!), from a few cookie traps...
SetEnvIfExpr "%{HTTP_COOKIE} =~ /.+/" cookie=notours:$0

...got an incorrect value for the cookie compared with the expected value. In fact in one case it returned ll(dot)org(dot)uk which were the final characters of the domain name; it should have been "humans-" something.

What did work was:
SetEnvIfExpr "%{HTTP_COOKIE} =~ /(.+)/" cookie=notours:$0

Then I forced a query-string hit against a similar (.+) and got the expected...
query: inject:xx=md5

w3dk:
I reluctantly accept that, but whether or not they are server variables, if they are of a similar type I would expect a similar algorithm to be applied from a consistent set of triggers - SetEnvIf QUERY_STRING should behave the same as SetEnvIf REQUEST_URI. There are probably reasons for NOT doing that but I'm sure they can be worked around to present a consistent interface. Unfortunately the internet, especially the web, is riddled with inconsistent code, from html and css to servers and languages. I am aware that what I EXPECT will often not be the case, but I don't have to like it. :(

lucy24

6:53 pm on Jan 23, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



SetEnvIfExpr "%{HTTP_COOKIE} =~ /.+/" cookie=notours:$0

...got an incorrect value for the cookie compared with the expected value. In fact in one case it returned ll(dot)org(dot)uk which were the final characters of the domain name; it should have been "humans-" something.
Trala, more data points. The $0 without a capture group is the same thing that gave a value of 2 in the version I tried. Fortunately the lesson learned is straightforward: when in doubt, use parentheses!

(The other lesson learned is: A test site will always pay for itself :) I wonder if it’s possible to figure out what the putative $0 value actually refers back to? I’ll put further experimentation on the agenda.)

No, the SetEnvIf gets its value from WITHIN the <if>...
Oops, right, gotcha. But at least SetEnvIf, unlike SetEnvIfExpr, doesn't demand parentheses in order to return an accurate value for $0.

w3dk

11:18 pm on Jan 23, 2020 (gmt 0)

10+ Year Member Top Contributors Of The Month



SetEnvIf QUERY_STRING should behave the same as SetEnvIf REQUEST_URI. There are probably reasons for NOT doing that ...


Really not sure why the query string is not made directly accessible to SetEnvIf. The docs do mention this and suggest using mod_rewrite instead. However, it does seem like a glaring omission. (But this is the way it has always been, since... forever... [webmasterworld.com...]) Apache does separate out the query string in a few places, making it inaccessible to certain modules (mod_alias is another).

Interesting to note... Helicon Tech (who make Apache-compatible modules for IIS as part of "Helicon Ape") do include "Query_String" as a directly accessible attribute in their version of mod_setenvif. They also allow direct access to the Server Variables, which Apache does not (as we've already discussed).

Helicon Ape - mod_setenvif: [helicontech.com...]

lucy24

12:12 am on Jan 24, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



After further experimenting:

No matter what variable I insert in the expression--QUERY_STRING, REQUEST_URI, HTTP_ACCEPT, TIME_YEAR, SERVER_PROTOCOL blahblah etcetera picking at random--the value of $0 (when no parentheses are used) is always 2. That is, if it’s set at all: my test site doesn’t have cookies, so if I say HTTP_COOKIE .* it is set to 2, but if I say .+ it is the empty string.

Code I used:
SetEnvIfExpr "%{TIME_YEAR} =~ /.+/" test1=$0
SetEnvIfExpr "%{TIME_YEAR} =~ /(.+)/" test2=$0

RewriteRule ^foo$ /bar?test1=%{ENV:test1}&test2=%{ENV:test2} [R,L]
The Redirect trick is a real timesaver, since I can see it right there in my browser's address bar--and then I pulled up the logged headers and confirmed that I was getting the same test1 and test2 values each time.

I remain utterly stumped about what "2" is.
(a) the first digit of the date in ymd order (also in dmy order, but I doubt results will differ on the 30th)
(b) the first digit of my IPv6 address (the test site uses IPv6)
(c) a numeral that occurs once in my Accept-Language header and twice in the UA string
(d) the number of gremlins living on the test site? ? ?

dstiles

10:56 am on Jan 24, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I would opt for (d) where "gremlins" equates to "poor programming" (not you!).

Could it be something like my comment above... "returned ll(dot)org(dot)uk which were the final characters of the domain name"?