IP Date ResponseCode
Host Page Pre-redirectedPage(if relevant)
EnvVars (Name: Value:regex) (one per line)
HTTP headers (one per line)
accept: none
accept_lang: none
badua: old_browser:Chrome/66.
BlockCountry: 1
browser: chrome:Chrome/66
ips: amazon:18.237.
proto: too_low:HTTP/1.0
useragent: scrape:iodc
SetEnvIf Remote_Host 10.0.1.21 host=us:$0 (our IP, obfuscated)
<if " %{Remote_Host} =~ m#\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}# ">
SetEnv host=numbers:$0
</if>
SetEnvIf Request_URI "\/\.?wp-" badrequri=wp:$0
...or its alternative...
<if " %{Request_URI} =~ m#wp-#i ">
SetEnv badrequri=wp:$0
</if>
EnvVars (Name: Value:regex) (one per line)
Oh, that's a good idea. I should add that to my own header logging so I can see exactly why a given request got blocked.
If the environment variable you're setting is meant as input into this early phase of processing such as the RewriteRule directive, you should instead set the environment variable with SetEnvIf.
This may or may not matter, depending on the site. But going by the ordinary reverse-alphabetical-order rule, mod_env, unlike mod_setenvif, would execute after mod_rewrite (though still before mod_authzwhatever).
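To make the timing point concrete, here is a minimal sketch, not taken from anyone's live config. The variable names `blockme` and `late_var` are made up for illustration, and it assumes mod_rewrite and mod_setenvif are both loaded.

```apache
# Hypothetical names (blockme, late_var); adjust to taste.
# mod_setenvif sets its variables early, so mod_rewrite can test them.
SetEnvIf User-Agent "libwww" blockme=scrape

RewriteEngine On
RewriteCond %{ENV:blockme} =scrape
RewriteRule ^ - [F]

# mod_env applies SetEnv later in the request, so per the documentation
# quoted above it should not be relied on as input to the rules above.
SetEnv late_var some_value
```

If the variable is only read by the script that handles the request, or by logging, the late timing of SetEnv should not matter.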
First: Have you ever, in any circumstances, done something involving an <If> envelope that worked as intended?
<if "-R '13.64.0.0/11' || -R '13.96.0.0/13' || -R '13.104.0.0/14' || -R '40.77.167.0/24' || -R '52.145.0.0/16' || -R '64.4.0.0/18' || -R '65.52.0.0/16' || -R '65.54.0.0/15' || -R '131.253.24.0/21' || -R '131.253.32.0/20' || -R '157.55.0.0/15' || -R '191.232.0.0/16' || -R '199.30.16.0/20' || -R '207.46.0.0/16' ">
BrowserMatch bingbot bing bot=bing
Require env bing
</if>
<if " ! %{HTTP_USER_AGENT} =~ m#((Apple|bing|Clara|Cliqz|Exa|Google|istella)bot|(Mojeek|Seznam|Yandex)Bot|BingPreview|DuckDuck|facebook|Qwantify|Vagabondo|Yeti)# && ! %{REQUEST_URI} =~ m#/robots\.txt#">
BrowserMatch [Bb]ot|crawler|spider bot_is=evil_robot:$0
</if>
Are you really using extra spaces inside the quotation marks in those If expressions?
Are all these variables case-insensitive?
that's a good idea.
... to use a php ErrorDocument which can then log the non-200 responses.
SetEnv host=numbers:$0
I'm using the casing that the Apache docs show, e.g. Request_URI (which "fails" with SetEnvIf), although from the above example %{REQUEST_URI} "works" in an <If> statement.
request are renamed with a "REDIRECT_" prefix
I don't think you can use backreferences with the SetEnv directive (only with SetEnvIf).
But I'm not sure that mod_setenvif will use backreferences from an enclosed Expression (or maybe it does in later versions of Apache than what I have here)?
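A short sketch of the difference, assuming Apache 2.4 with mod_env and mod_setenvif loaded. The name `host_kind` is invented for the example; the others reuse names from earlier in the thread.

```apache
# SetEnv takes a name and an optional literal value: no regex, no $0.
# (host_kind is a hypothetical name.)
SetEnv host_kind numbers

# SetEnvIf substitutes $0 (whole match) and $1..$9 (groups),
# but only from the regex in that same directive.
SetEnvIf Remote_Host "\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}" host=numbers:$0
BrowserMatch "(curl|libwww|perl)" useragent=scrape:$1
```

Whether a match made by an enclosing <If> expression is visible to these directives is a separate question; this sketch does not rely on it.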
And for the request header Referer - is that ONLY a shorthand or is HTTP_REFERER a complete no-no?
Depends on the module.
SetEnvIf RemoteAddr .* varname=value:$0
$value=getenv($name,true);
BrowserMatch curl|libwww|perl useragent=scrape:$0
BrowserMatch bitcoin|miner useragent=miner:$0
useragent=scrape:libwww
useragent=miner:bitcoin
$value=apache_getenv($name,true);
$value=getenv($name,true);
The walk_to_top argument is only available in apache_getenv()
Oh, ###, I forgot to change that. getenv uses $local_only instead. I'm surprised it didn't break.
The getenv() function has to be supplied with a valid env var name - you can't use a wild card.
In my case, I supplied names anyway, because otherwise (this was in early testing with apache_getenv) it dumped a bunch of stuff that I guess the server counts as environment variables but which are of no use to me, often because I'm already getting the same information in more concise form. And then I added a further
request are renamed with a "REDIRECT_" prefix
Thanks, but I think that only applies to header variables, not env vars
But I'm not sure that mod_setenvif will use backreferences from an enclosed Expression (or maybe it does in later versions of Apache than what I have here)?
Do you mean "enclosed in quotes"? I have both enclosed and not enclosed and for badrequri they ALL fail.
Anyone know how to use something else as an empty SetEnvIf header/value pair?
SetEnvIf ^ ^ MY_VAR=value
RewriteRule ^ - [E=MY_VAR:%{HTTP:Some-Header}]
$value=apache_getenv($name,true); [EDITED]
The "true" refers to the optional parameter "walk_to_top", which reports the env var for the top-most setting. Without that, for Request_URI, $value is the error document URL. Eg: with no walk_to_top the returned request is always /errdoc.php; with it the result is (eg) /wp-login/.
It would be useful to allow several env var values to be set for a single name but this seems to be something apache does not cater for. A trivial example might be:
BrowserMatch curl|libwww|perl useragent=scrape:$0
BrowserMatch bitcoin|miner useragent=miner:$0
...resulting in
useragent=scrape:libwww
useragent=miner:bitcoin
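One possible workaround, offered as a sketch rather than anything Apache provides for this purpose: since a later matching directive replaces the value stored under the same name, give each category its own variable. The names `ua_scrape` and `ua_miner` are hypothetical.

```apache
# One variable per category, so two matches cannot overwrite each other.
# (ua_scrape and ua_miner are made-up names.)
BrowserMatch curl|libwww|perl ua_scrape=$0
BrowserMatch bitcoin|miner ua_miner=$0
```

The logging script would then need both names in its list of variables to report.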
Update: I got apache_getenv to work locally (Apache 2.2, php 5.something), but it broke on the live site (2.4, 7.something). Instead I tried plain getenv, which worked as intended.
Is there a way to constrain it to environmental variables you've set yourself, or do you have to read specific names out of an array (my stopgap solution to avoid a major information dump)?
keep uploading a separate file for every site
<VirtualHost nnn.nn.nnn.nnn:443>
ServerAdmin alert@mydomain.co.uk
ServerName www.example.com
DocumentRoot /srv/example
Header edit Set-Cookie ^(.*)$ __Host-$1;HttpOnly;Secure;SameSite=Strict
<Directory "/">
AllowOverride None
Require all denied
</Directory>
<Directory "/srv/example">
DirectoryIndex index.php
AllowOverride All
# this one here!
Include /etc/apache2/use-setenv.conf
Include /etc/apache2/rewrite.conf
</Directory>
(etc)
</VirtualHost>
$0 backreference only refers back to the regex within the SetEnvIf directive itself
SetEnvIf ^ ^ MY_VAR=value
How would you envisage reading the values back from the receiving script? Like an "array"?
prefix all your own env vars with your own unique prefix
function logHeaders() {
$envmde=array("TrapIP","ips","BlockCountry","host","accept","accept_lang","badua","badrequri","bot","bot_is","browser","query","referer","useragent"); # TrapIP set by db Bad IP Found
$ip=$_SERVER['REMOTE_ADDR'];
$hst=$_SERVER['HTTP_HOST'];
$fh; $str; $stat = http_response_code(); # name log for error code - easier to view
$dt = date('Ymd'); $tm = date('G:i:s'); $fn="header";
if (isset($_SERVER['REDIRECT_URL']) ) $url=$_SERVER['REDIRECT_URL']; else $url="";
if ($ip=="nn.nn.nnn.nn") { $fn="our-$fn"; } # my own ip to log away from general hits
else {
$value=apache_getenv('bot'); # log bots into separate files
if(!empty($value)) { $fn="bot-$fn"; }
else {
$value=apache_getenv('bot_is'); # bot_is resolves bad things named bot, crawl etc
if(!empty($value)) { $fn="bot-$fn"; }
}
}
$str="IP: $ip\t$dt $tm\t$stat\n";
$str .= "Host: $hst\tPage: ".$_SERVER["PHP_SELF"]; # page read if 200, else /errdoc.php
if (!empty($url)) { $str .= "\tURL: ".$url; } # empty if 200, else set to original page
$str .= "\n";
# report setenv values
foreach ($envmde as $name) { $value=apache_getenv($name,true); if(!empty($value)) { $str .= "$name: $value\n"; } }
# report headers (except Host, dealt with above)
foreach (getallheaders() as $name => $value) { if ($name!="Host") { $str .= "$name: $value\n"; } }
$str .= "----\n\n"; # end of record separator
# write to log file
$fh = fopen("/srv/0logs/0headers/$fn-$dt-$stat.log","a");
fwrite($fh, $str);
fclose($fh);
}
<if "-R '77.75.72.0/21' ">
BrowserMatch SeznamBot seznam bot=seznam
Require env seznam
</if>
<if "-R '3.0.0.0/8' || -R '34.192.0.0/10' ">
SetEnvIf Remote_Addr .* amazon ips=amazon:$0
</if>
(which logs ips: amazon:34.197.76.213)
If blahblah is plain text (with or without quotation marks), like "Firefox" or "Opera 8", the value will be set as the literal string "$0".
# Results in bad_agent=Firefox
# Can use $0 or $1 since the entire pattern is a capturing group
BrowserMatch (Firefox) bad_agent=$1
I suspect replacing .* with ^ does not - not tested, though
The ^ (start-of-string anchor) doesn't actually match anything, so cannot capture anything. So if you used ^ (instead of .*) then any backreferences will be empty.
Which, in turn, means that the variable's value would be nothing, blank or empty if you used a $0 construction, probably the exact opposite of the intended result if you're using it in access controls involving "Require env". But if you simply said name-of-variable without the =$0 bit, it would be set to the default 1. (An interesting case of Less Is More.)
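The three cases side by side, as a sketch. The names `ips_anchor` and `ips_flag` are invented for the comparison, and the address in the first comment is the one from the log line quoted earlier.

```apache
# .* consumes the whole address, so $0 carries it:
#   ips=amazon:34.197.76.213
SetEnvIf Remote_Addr .* ips=amazon:$0

# ^ succeeds without consuming any text, so $0 is empty:
#   ips_anchor=amazon:
SetEnvIf Remote_Addr ^ ips_anchor=amazon:$0

# No =value part at all: the variable gets the default value 1
SetEnvIf Remote_Addr ^ ips_flag
```

For "Require env" only the last form is needed; the $0 forms are mainly useful when the value is going to be logged.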
...can anyone remember HOW to post a comment in Apache docs?
<if " %{QUERY_STRING} =~ m#[a-z]+#i ">
SetEnvIf QUERY_STRING .* query=any:$0
</if>
query: any:
<if "-R '3.0.0.0/8' || -R '34.192.0.0/10' ">
SetEnvIf Remote_Addr .* amazon ips=amazon:$0
</if>
(which logs ips: amazon:34.197.76.213)
SetEnvIfExpr "%{QUERY_STRING} =~ /.+/" thisworks=$0
result "thisworks: 2"
SetEnvIfExpr "%{QUERY_STRING} =~ /(.+)/" thisworks=$0
SetEnvIfExpr "%{QUERY_STRING} =~ /(.+)/" thisworks=$1
result for both: "thisworks: question"
SetEnvIfExpr "%{QUERY_STRING} =~ /(.+)/" thisworks=$2
result: variable "thisworks" NOT SET (technically I guess it is set to a value of "", which counts as not set)
SetEnvIfExpr "%{QUERY_STRING} == 'question'" thisworks
result: "thisworks: 1" (i.e. variable is set, using default value)
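Where the backreference isn't needed, SetEnvIfExpr can also stand in for an <If> wrapper around SetEnvIf. A sketch, assuming Apache 2.4 and reusing the Amazon ranges from the earlier block; the value is a fixed string here, so the matched address is not recorded.

```apache
# Same ranges as the <If> example; the value is a literal, no $0 involved.
SetEnvIfExpr "-R '3.0.0.0/8' || -R '34.192.0.0/10'" ips=amazon
```

This avoids the question of which backreferences are visible, at the cost of not logging the actual IP under this variable.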
SetEnvIf QUERY_STRING .* query=any:$0
1. SetEnvIfExpr "%{QUERY_STRING} =~ /.+/" thisworks=$0
result "thisworks: 2"
(Feel free to join me in “wtf?” in two-part harmony.)
2.
3. SetEnvIfExpr "%{QUERY_STRING} =~ /(.+)/" thisworks=$2
result: variable "thisworks" NOT SET (technically I guess it is set to a value of "", which counts as not set)
4.