Forum Moderators: phranque


Upgrading Apache to HTTP/2

         

dstiles

1:16 pm on May 24, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



What am I doing wrong? I have a recently-installed apache web server running HTTPS on all sites. I thought, since it was a new installation, that it would automatically handle HTTP/2 but in my eagerness to transfer the sites from an old server I forgot to check.

I have now tried twice to upgrade to HTTP/2 and both times succeeded only in downing the web server for a few hours during the process. I have now returned it to HTTP/1.1 whilst I work on other things. The second time I began by setting up a new server on a different VPS as HTTP/2 from scratch. It only has a single page (total content, "Site Unavailable") but I took that as a good sign and ploughed ahead with a similar setup.

Problems encountered on live web server:
Loss of apache_getenv()
Status 505 (does not support the HTTP protocol)

Upgrade procedure (Debian 10):
sudo apt-get install php7.3-fpm
sudo a2dismod php7.3
sudo a2enconf php7.3-fpm
sudo a2enmod proxy_fcgi
sudo a2dismod mpm_prefork
sudo a2enmod mpm_event
sudo a2enmod ssl
sudo a2enmod http2
sudo systemctl restart apache2

Protocols h2 http/1.1 is already set in http2.conf but I've also tried adding it to an individual site conf.
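(A quick way to check whether h2 is actually being negotiated, assuming a curl build with HTTP/2 support - the hostname is only an example:)
curl -sI --http2 https://www.example.co.uk/ | head -n 1
# prints "HTTP/2 200" when h2 is negotiated, "HTTP/1.1 200 OK" otherwise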

What else should I be looking for, please?

lucy24

3:47 pm on May 24, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Status 505 (does not support the HTTP protocol)
On all requests, or only HTTP/2 requests?

When you say “individual site conf” do you mean VirtualHost or something else?

Do mod_auth-thingy and mod_alias still work as intended? (Loopy question, I know, but alphabetical order jumped out at me.)

dstiles

5:33 pm on May 24, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



505 - don't know. I didn't check from an HTTP/1.1 device.

VirtualHost.

Aliases - I have none. Not sure what auth-(thingy) is supposed to do. They are all enabled, though.

lammert

6:07 pm on May 24, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Can you create a small PHP file with the content below and run it? It might give you a hint about what you are missing.
<?php
phpinfo();
?>
If this PHP file doesn't run either, you should increase the LogLevel setting in your Apache config file to info or debug, and search in the log-files for any clues.
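For example, something like this - per-module levels are available in Apache 2.4, and the module names here are only my guess at the relevant ones for this setup:
LogLevel info
# or, more narrowly targeted:
LogLevel warn proxy_fcgi:debug http2:debug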

dstiles

9:39 am on May 25, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thanks. I already have that set up but I always forget to look at it. :( I'll change the setup in a day or so, as opportunity permits, and see what it says.

dstiles

2:47 pm on Jul 27, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Finally got around to revisiting this problem.

Setup used for debian apache was:
sudo a2dismod php7.3
sudo a2enconf php7.3-fpm
sudo a2enmod proxy_fcgi
sudo a2dismod mpm_prefork
sudo a2enmod mpm_event
sudo a2enmod ssl
sudo a2enmod http2

First hurdle: apache_getenv does not work, which throws a fatal error. I use this about three times on each page to manage bots/users/rubbish and direct accordingly to logs. Haven't really searched into this yet but it could be a stumbling block?

Second hurdle: finally tracked down to an included file that handles all setenvs on all sites. It's included in the primary vhost for each site (the default URL - other URLs such as http:// are redirected to this one). This has worked well since it was set up some time ago. This, in part, feeds the apache_getenv above. When included as usual it throws a 505 error for any file content (apart from pure comments). Remove the content and it works, but of course there is no bot control etc.
<VirtualHost 185.35.148.127:443>
ServerAdmin alert@ssph.org.uk
ServerName www.example.co.uk
DocumentRoot /srv/mysite
Header edit Set-Cookie ^(.*)$ __Host-$1;HttpOnly;Secure;SameSite=Strict
<Directory "/">
AllowOverride None
Require all denied
</Directory>
<Directory "/srv/mysite">
DirectoryIndex index.php
AllowOverride All
Include /etc/apache2/use-setenv.conf
</Directory>
CustomLog ${APACHE_LOG_DIR}/mysite/access.log combined env=!dontlog
SSLEngine on
<FilesMatch "\.(php)$">
SSLOptions +StdEnvVars
</FilesMatch>
SetEnv nokeepalive ssl-unclean-shutdown
Include /etc/letsencrypt/options-ssl-apache.conf
SSLCertificateFile /etc/letsencrypt/live/www.example.co.uk/fullchain.pem
SSLCertificateKeyFile /etc/letsencrypt/live/www.example.co.uk/privkey.pem
</VirtualHost>
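For reference, the handler that a2enconf php7.3-fpm is supposed to wire up on Debian 10 is roughly this (quoted from memory and trimmed, so check the actual /etc/apache2/conf-available/php7.3-fpm.conf):
# php7.3-fpm.conf (approximate core of it)
<FilesMatch ".+\.ph(ar|p|tml)$">
SetHandler "proxy:unix:/run/php/php7.3-fpm.sock|fcgi://localhost"
</FilesMatch>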

Third problem is: this is a live server and it's been down for a couple of hours whilst I got this far. Gods know how many baddies have accessed it since! If only I'd realised apache wasn't by default http/2 when I set up the server last year! :(

Any help much appreciated, folks!

lucy24

3:38 pm on Jul 27, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



First hurdle: apache_getenv does not work, which throws a fatal error. I use this about three times on each page to manage bots/users/rubbish and direct accordingly to logs. Haven't really searched into this yet but it could be a stumbling block?
This jumped out at me because I remember having to tweak parts of my logheaders code when the server was upgraded. What's the exact syntax of the lines that use this function? (Mine uses a combination of getenv() and $_SERVER[blahblah], but I’ve got a vague notion apache_getenv was one of the things I could never get to work.)

dstiles

3:52 pm on Jul 27, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I may have fixed that one. A comment on the PHP site for apache_getenv (and for getenv) says...
apache_getenv(key) does not work on an php cgi installation, in this case rather use $_SERVER["REDIRECT_key"]

so I created a function for it, plus one for setenv. No idea if they work yet because of the other problem...
function apache_getenv($key) {
return(isset($_SERVER['REDIRECT_$key'])?$_SERVER['REDIRECT_URL']:"");
}
function apache_setenv($key,$val) {
$_SERVER['REDIRECT_$key']=$val; return(apache_getenv($key) );
}

dstiles

9:30 pm on Jul 27, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Those functions should, of course, be
function apache_getenv($key) { return(isset($_SERVER[$key])?$_SERVER[$key]:""); }
function apache_setenv($key,$val) { $_SERVER[$key]=$val; return(apache_getenv($key) ); }

but I'm still not sure the setenv one works; PHP definitions and comments can be a bit confusing.
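If the sites ever end up back under mod_php it would be safer to wrap the fallbacks in function_exists() so they only kick in when the real Apache functions are missing - a sketch based on the versions above, untested:
// Only define the fallbacks under FPM/CGI, where the native functions don't exist.
if (!function_exists('apache_getenv')) {
function apache_getenv($key) { return(isset($_SERVER[$key])?$_SERVER[$key]:""); }
}
if (!function_exists('apache_setenv')) {
function apache_setenv($key,$val) { $_SERVER[$key]=$val; return(apache_getenv($key) ); }
}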

I think I've got a handle on the other problem. I was wrong about ANY content being fatal. I gradually rebuilt the file and found certain constructs failed, although they worked with the pre-cgi installation. The construct for letsencrypt is difficult to enclose within <if>...</if>, as is possible with other bots, so...
# letsencrypt
SetEnvIf Referer .well-known/acme-challenge/ letsencrypt bot=letsencryptr
SetEnvIf REQUEST_URI ^/\.well-known/acme-challenge/ letsencrypt bot=letsencryptu
BrowserMatch letsencrypt.org letsencrypt bot=letsencryptb
Require env letsencrypt

...fails where...
# mojeek
<if "-R '5.102.173.64/28' ">
SetEnvIfExpr "%{REMOTE_ADDR} =~ /(.+)/" ips=mojeek:$0
BrowserMatch MojeekBot mojeek bot=mojeek
Require env mojeek
</if>

...works. I will need to work on that one.

More critical is my method of defining env vars and then Requiring them...
BrowserMatchNoCase analy[sz]|CensysInspect|legs|search|seo bot_is=seo:$0
...
Require expr %{REQUEST_URI} =~ m#/robots\.txt#
<RequireAll>
Require method GET POST HEAD
<RequireNone>
....
Require env bot_is
....
</RequireNone>
</RequireAll>

This fails and needs a bit more work but I'm beginning to see the light. Still don't know why it all worked with http/1.1 and fails with http/2. :(

By the way, the server now runs http/2. :)

lucy24

12:53 am on Jul 28, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hee. I can see where we're in personal-coding-style territory again, because I'd be doing it the other way around: a <Directory> or perhaps <Location> envelope for the let's-encrypt stuff, containing Require directives amounting to deny from all except the relevant robot.
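Very roughly - untested, and the pattern is only a sketch:
# Everything under the challenge path bypasses the bot rules;
# tighten "all granted" to the validation bot if that feels too generous.
<LocationMatch "^/\.well-known/acme-challenge/">
Require all granted
</LocationMatch>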

dstiles

9:20 am on Jul 28, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The file containing letsencrypt is called from every vhost's directory clause; putting it into each vhost is too much to maintain.

I'm currently going through the blocking file changing all browsermatch statements (so convenient!) to (whatever)
<if " (%{HTTP_USER_AGENT} =~ m#(<|>|%0A|%0D|%27|%3C|%3E|%00)#i)">
SetEnvIfExpr "%{HTTP_USER_AGENT} =~ /(.+)/" useragent=inject:$0
Require all denied
</if>

For some reason the 403's are not triggering. More digging to do. <sigh>

lucy24

4:47 pm on Jul 28, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Are you currently logging environmental variables? If so, any clues?

Tangentially, is there anything inside those /acme-challenge/ directories? My host uses them too, but if there's any content, it is invisible, and the size computes to zero. (Not a leading-dot issue, since the containing /.well-known/ is in plain view.) Mystifying.

dstiles

2:18 pm on Jul 29, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm (trying!) to log env values, yes. Mostly succeeding, despite having to rewrite the file they are all defined in. And having redefined much of it I find the old mechanism starting to come back to life in places. Weird.

What I can't get working at the moment is non-200 logging. It used to work: a different (dated) log file for each status code, plus separate groups of files for bots and non-bots. Now all I have is 200-bots and 200-nonbots. Looking at apache's error.log shows a rejection of errordoc.php (sometimes!), which is where all the non-200 trickery happens, but no idea why; still working on that. Downside: most baddies are currently getting 200's, good bots are sometimes rejected (don't know why) and my back-stop database of server farms can't be used.

Letsencrypt - took a bit of working out. To begin with, as far as I can tell, the internally triggered certbot does not get logged; what comes in as a bot seems to be a rather arbitrary external bot at random times unconnected with certbot and it does not seem worried if it gets a 403. The hidden folders are populated during a cert update and then emptied again (I guess a working folder for certbot). Why the external bot hits them I can only guess, maybe to see if anything's left behind? The folders are not always present anyway.

lucy24

4:35 pm on Jul 29, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Looking at apache's error.log shows a rejection of errordoc.php (sometimes!), which is where all the non-200 trickery happens, but no idea why
Are you getting the ludicrously unhelpful “client denied by server configuration”? That seems to be the one thing that’s unchanged between 2.2 and 2.4. Yes, thank you Apache, I realize the request must have run into some kind of rule, but WHAT rule?

In the specific case of errordoc.php, I assume you've remembered to poke holes for this file in each separate place that could potentially issue a 403--and then also for the logheaders function itself if it lives in a different file.

The hidden folders are populated during a cert update and then emptied again (I guess a working folder for certbot).
Aha, that explains it. Yes, my host used to use a different verification system for letsencrypt but I think they have now moved everything to the /.well-known/ method.

:: detour to read up on <Location> ::

It does seem as if this could be used generically for any /.well-known/ anywhere. But the same article sheds light on a long-standing head-scratcher, involving URLs with an erroneous double slash at some point. (Thanks to an error on my part, the Googlebot was requesting a fair number of these and I had to figure out how to redirect them. Turns out there's a config-level MergeSlashes directive that is on by default.)

dstiles

10:30 am on Jul 30, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



They do seem to follow MS's method of error reporting, yes. Very unhelpful.

The relevant error handling files are visible and working to some degree (now that I've fixed a glaring error - see below). In any case, they are unaltered (apart from get/setenv) from the previous incarnation.

Wasn't aware of MergeSlashes but I've seen some doubles, mainly from bots looking for wordpress etc.

Yet another correction to the getenv/setenv problem. The two sets of code I gave above are both in error. In the second case there is no allowance for an optional extra parameter walk_to_top - I really MUST pay attention to documentation. :( And, of course, my usage was randomly optional.

The corrected (for now!) functions are:
function apache_getenv($key,$walk_to_top=true) { return(isset($_SERVER[$key])?$_SERVER[$key]:""); }
function apache_setenv($key,$val,$walk_to_top=true) { $_SERVER[$key]=$val; return(apache_getenv($key,false) ); }

Have to say that at this time I'm still not convinced that apache_getenv is fully functional; I get the impression some defined envs are not being passed on. So that is this afternoon's task. :(

I'm also trying to use a defined env within the "setenv" file. I have, for example, a number of tests for the standard good bots which all set an env named bot - bot=duckduck, bot=bing etc. Some ignorant bots (we all know which!) still include things like "compatible;" in the UA and there are a couple of others that rely on bot to control undesirables, typically randomly named bots and crawlers. I am having trouble using bot to exclude genuine users of the terms. Currently I'm trying:
<if " ! (%{ENV:bot} =~ m#^$# ) && ! (%{REQUEST_URI} =~ m#/robots\.txt#) && (%{HTTP_USER_AGENT} =~ m#.{0,10}([Bb]ot|[Cc]rawl|rank|review|spider).{0,10}#) ">
SetEnv bot_is=bad_robot:$0
Require all denied
</if>


I've also tried ! (%{ENV:bot} == '' ) with no success.

lucy24

5:09 pm on Jul 30, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



:: detour to Apache docs ::

Ah, I see. <If> is evaluated after <Files>, so you can't just make a little <Files> envelope for robots.txt and let everyone in that way*. I suppose you could set an environmental variable at some earlier stage when the request is for robots.txt, rather than do the REQUEST_URI business, but it wouldn't necessarily save the server any work.

Is ENV:bot the part where you poke holes for specific named robots?


* For a given definition of “let in”, at least. I currently have a long version of robots.txt, which names User-Agents and lists directories and does the usual robots.txt stuff, and a short version that simply Disallows everything, because if you're lying about who you are, you’re not entitled to any further information.

dstiles

9:52 am on Jul 31, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Robots.txt isn't the problem, although I seem to see bing trying to find it with an incorrect protocol (or something) and then using that as an excuse to ignore it.

The problem: I have, at the start of the file, set up a number of bot tests (duckduck, apple, bing etc) and, for example, set...
SetEnvIfExpr "%{REMOTE_ADDR} =~ /(.+)/" bot=duck ips=duck:$0

where ips logs the IP used by the bot. SetEnvIfExpr (and the other SetEnvIf but NOT SetEnv) is supposed to be available for evaluation by <if> but it seems not. Later I try to use env bot to refrain from testing something - eg
<if " ! (%{ENV:bot} == '') && (%{HTTP_USER_AGENT} =~ m#compatible;#i)">
SetEnvIfExpr "%{HTTP_USER_AGENT} =~ /(.+)/" badua=compatible
Require all denied
</if>

This fails, as does a test for m#^$#. Using ! (%{ENV:bot} =~ m#bing|google# ) also fails - I managed to block all B and G bots with that one last night. :(

This is my fourth day working on this mess. It should have been a simple switch from HTTP/1.1 to HTTP/2 but somehow apache has managed to make the two systems incompatible in several places - and not always told anyone. :(

As to robots.txt - the few "good" bots I accept are given an entry therein and a couple of prolific seo-type bots are forbidden but as to the rest - wouldn't be so bad if the bots could be forced to obey it - a kind of htaccess - but as it is I just block anything I don't like, starting with...
<if " ! (%{ENV:bot} =~ m#^$# ) && ! (%{REQUEST_URI} =~ m#/robots\.txt#) && (%{HTTP_USER_AGENT} =~ m#.{0,10}([Bb]ot|[Cc]rawl|rank|review|spider).{0,10}#) ">
SetEnv bot_is=bad_robot:$0
Require all denied
</if>

(which again fails due to the ENV:bot problem).

lucy24

4:00 pm on Jul 31, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



An alternative approach to robots.txt is to let everyone in on the config level: that is, let everyone request example.com/robots.txt and be served content with a 200 response. But it doesn't have to be the same content, because all those environmental variables you've been setting are also available to, let's say, /physical/directory/path/robots.php which can then merrily pick & choose what to show. That includes environmental variables that aren't even used for access control, like my lying_bot above. So if you've already identified the visitor as (the real) Googlebot, you can serve up a “robots.txt” that makes no mention of any other rules except those that pertain to Googlebot. And there's no need to let unwanted robots know that “I would let you in if you were the GoodBot”, because it just puts ideas into their head.
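In outline it might look something like this (invented names throughout; robots.txt would be mapped onto the script with an Alias or RewriteRule, and the env-var name is whatever your own SetEnvIf rules actually set):
<?php
// robots.php - choose which robots.txt flavour to serve, based on env vars set earlier.
header('Content-Type: text/plain');
$bot = isset($_SERVER['bot']) ? $_SERVER['bot'] : ''; // or however the variable reaches PHP
if ($bot === 'google' || $bot === 'duck') {
// Verified bots get the real rules.
echo "User-agent: *\nDisallow: /private/\n";
} else {
// Everyone else gets the short version.
echo "User-agent: *\nDisallow: /\n";
}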

The reason I brought this up in the first place is that your <If> envelopes include a bit that says "if the request isn't for robots.txt", and it seems exasperating to have to specify this every time. But there may really be no alternative.

dstiles

6:05 pm on Jul 31, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I've considered tailored robots.txt but so far not enough incentive. Apart from letting in and controlling the major few I see no reason to play games with it.

I've been looking closer at the error log, aided by LogLevel debug rewrite:trace5 (which does not improve things all that much). One noticeable thing is that there is a gap between the reject line and rejecting the errordoc, and that gap seems to suggest that errordoc is processed as a new page and THAT is what's stopping the error/logging process. Cutting all the rubbish from the error log, for a single page request:
Setting browser, referer: https://www.example.co.uk/about.php
Setting browser, referer: https://www.example.co.uk/about.php
client denied by server configuration: /srv/example/, referer: https://www.example.co.uk/about.php
Setting browser, referer: https://www.example.co.uk/about.php
Setting browser, referer: https://www.example.co.uk/about.php
client denied by server configuration: /srv/example/errdoc.php, referer: https://www.example.co.uk/about.php

... where the "setting" lines correspond to expected env settings.

The rejected errdoc.php is duplicated in every site's root and serves to include a common error processing file errordoc.php held in a "library". The only other thing apart from the include is a call to the error processing function in errordoc.php. The errdoc.php file is referenced as the error processing file in errordocs.conf.

I'm going to try working out a way to short-circuit the errdoc path and hopefully that will stop it being rejected.

I've also realised that, depending on the page content, there can be anything up to and beyond a dozen log lines for a single page hit. I guess that's fcgi working, which again suggests I may be correct in trying to bypass checking for errdoc.

I'm sure I must be doing something wrong, though, and that other people do not have this problem. :(

lucy24

12:56 am on Aug 1, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



that gap seems to suggest that errordoc is processed as a new page
In fact that is exactly what happens. First comes the external request for realpage.html, and then if the request is denied, there is a subsequent internal request for 403.html or whatever it may be. And this internal request is subject to exactly the same access-control rules as the preceding external request. That's why you need to make sure you have poked holes for errordoc.php in each and every place that is capable of issuing a 403, in the same way that you poke holes for robots.txt.

The same thing, mutatis mutandis, happens if a custom 404 page can't be found in the expected location: first the requested page isn't found, and then the 404 page also can't be found.

In each case, depending on server settings, there should be a whole series of "request denied", "request for errordoc denied", "request for errordoc because of the previous denied request also denied" ... and so on until the server puts its foot down and stops the loop.
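The hole itself can be tiny - something like this, repeated wherever a 403 can originate (a sketch; the filename is the one from your posts):
# Exempt the error document so the internal subrequest isn't denied
# by the same rules that denied the original request.
<Files "errdoc.php">
Require all granted
</Files>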

dstiles

8:50 am on Aug 1, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Last night I tried, without success...
RewriteRule /errdoc.php /errdoc.php [L]
...with and without escaped dots. Not well up on rewriterule, though.

I think it IS running errdoc as a separate process but surely apache KNOWS what it is doing? Sorry, I've recently come to realise that is an inane question. :( You would at least think there would be postings all over the forums about this, but I could find none (except this thread, right at the top!).

Anyway, I'll pursue that line today. Thanks for confirming the possibility.

lucy24

4:17 pm on Aug 1, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



RewriteRule /errdoc.php /errdoc.php [L]
...with and without escaped dots.

Strictly speaking the dot in the pattern should be escaped, but it's a non-lethal error, because what else could come in that location? The target, otoh, should be simply - (“stop here and take no further action”) or you risk going into an infinite loop.
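In other words something like this (a sketch - whether the leading slash is optional depends on config versus htaccess context):
RewriteRule ^/?errdoc\.php$ - [L]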

But this will not really help here, because it only pertains to mod_rewrite. Access controls using mod_auththingummy are entirely separate, and need a separate hole poked. That's why I stress “each mod that is capable of issuing a 403”.

I think it IS running errdoc as a separate process but surely apache KNOWS what it is doing?
I'm sure you remember:
  I really hate this damned machine
  I wish that they would sell it
  It never does quite what I want
  But only what I tell it.

Apache doesn't know, unless you tell it so, that suchandsuch specific files are to be exempted from ordinary access-control rules. In htaccess this is easy enough: there's a <Files> envelope in my shared htaccess for mod_authwhatsit requests, and a preliminary [L] RewriteRule in each individual site's htaccess. (Since it is shared hosting, there is also a <Files> at the config level, which is one reason they tell us to use the exact name “forbidden.html”. But it seems safest to replicate the rule locally.)

I wonder if it would be easier to set an environmental variable right at the outset, where if the request is for certain files such as robots.txt or error documents, you set something like
SetEnvIf Request_URI (this|that|other) exempt=$1
And then in every <If> rule, there's a bit that says !exempt (“if this environmental variable does not have a non-empty value”). Then you wouldn’t have to keep track of everything that you need to make an exception for.
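So each blocking rule would gain an extra clause, roughly like this - untested, "badthing" is just a placeholder, and given the SetEnvIf-versus-<If> timing trouble you've been seeing it may need the same care:
SetEnvIf Request_URI (robots\.txt|errdoc\.php) exempt=$1
<If "-z reqenv('exempt') && %{HTTP_USER_AGENT} =~ /badthing/">
Require all denied
</If>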

dstiles

8:17 am on Aug 2, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Lucy, I will do as you suggest and poke holes this morning.

However, I cannot really believe apache would be so dumb as to negate their own advice re: error reporting and screw up the transition from http/1 to http/2. If there were an important condition in the change to fcgi then surely they could have included it in the relevant module. I must have missed something, I'm sure. The error document is so fundamental I would have expected an automatic exemption from re-checking (with a possible switch to enable it if desperate). And the filename IS known to apache through errordocs.conf.

Late yesterday I discovered a variety of code snippets that were recommended on various forums, most of them for early versions (php5) and with seemingly incorrect module names. When I've poked the holes I will look further into that.

lucy24

2:36 pm on Aug 2, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The error document is so fundamental
It isn't. Required by the canons of ordinary human decency, yes. Built-in, no. The default is
:: shuffling papers ::
“output a simple hardcoded error message”
which can also be achieved, for a given status code, by saying
ErrorDocument 403 default
in case you want to override an inherited setting. This might be useful for testing purposes, but is liable to frighten humans.

Tangentially, I was intrigued to see that this oldie but goodie is still present in the documentation (note sarcastic quotation marks):
Microsoft Internet Explorer (MSIE) will by default ignore server-generated error messages when they are "too small" and substitute its own "friendly" error messages. The size threshold varies depending on the type of error, but in general, if you make your error document greater than 512 bytes, then MSIE will show the server-generated error rather than masking it.
Does MSIE really still do this? What a good thing it is on its way out; I can only hope Edge isn't similarly too smart for its own good.

dstiles

3:19 pm on Aug 2, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Ok, Lucy, I'll bow to your extensive knowledge. :) I still think it should be a fundamental.

The "drill a hole" tactic has hit a wall. Despite (my understanding of) apache documentation, <if> does not seem to read SetEnvIf. I THINK I've got it right...
SetEnvIf Request_URI /errdoc\.php exempt=errdoc
<if " %{ENV:exempt} == 'errdoc' ">
SetEnv test
</if>

... but although I get entries in the log for exempt I get none for test, which suggests either I've made a mistake or it doesn't work. I've also tried "SetEnvIf Remote_Addr .* test" with no result. I don't see much point in progressing that if the basic test fails. :(

Nor am I convinced that's the way to go. It seems too messy. I'm sure there must be a handler or something that should work; that's where I'm looking now. One suggestion I've seen is to add the following to vhost but it's a fairly old posting so...
<LocationMatch "^(.*\.php)$">
ProxyPass fcgi://127.0.0.1:9000/your/site/webroot
</LocationMatch>

When you switched to http/2 did you have to modify your domains' vhosts?

lucy24

4:28 pm on Aug 2, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



<LocationMatch "^(.*\.php)$">
Urk. Pretty sure that could be expressed simply as
<LocationMatch "\.php$">
except are you even allowed to do that with Location? Seems like a sneaky attempt to use what is functionally a FilesMatch, only tricking it into executing later, after all <Directory> and <Files> sections. Does it work? Make up something non-essential and test it. (And are you looking at the content of the envelope, or only its <Location> syntax? The content seems like something that belongs in mod_alias or possibly an AddHandler statement.)

When you switched to http/2 did you have to modify your domains' vhosts?
I'm on shared hosting, so the Server Fairies did it all for me ;) And, needless to say, didn't bother to tell us, so my first clue was when all my log processing scripts broke due to repeated occurrences of things like "HTTP/1\.[01]". My only contact with config files is in MAMP--most recently yesterday when I laboriously figured out that the reason one local site wasn't working as intended was that its directory name (on my HD, I'd never do it on a server) contains an apostrophe ' that I'd erroneously entered as ’ (curly apostrophe) in two different places. I use htaccess even on local sites, precisely because I want to replicate whatever I'm doing on the real sites.

dstiles

8:04 am on Aug 3, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



> are you even allowed to do that with Location?

I have no idea! It's getting too deep for me. I'm going back to the beginning-ish to reconsider WHY errdoc is being blocked - or even being rescanned. There must be something simple I've missed.

dstiles

2:19 pm on Aug 6, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I always said it was something stupid!

It's working now, subject to fine tuning and a reappraisal of the setenv script. I was just about to turn off h2push when I made a brief reappraisal of the errordoc script in a five-minute period before I was due for a walk.

Reason #1: what were originally simple env vars reported in $_SERVER are now prefixed with REDIRECT_, which is obvious when you dump $_SERVER AND actually read closely what it says. :( This mistake resulted in not selecting the appropriate log - bot and non-bot.

I was, early on, forced, by the lack of the function apache_getenv, to write my own. I have now added another function which checks for REDIRECT_ keys as well. Needs to be used cautiously as some non-env keys are in both raw and REDIRECT_ forms.
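Along these lines - a sketch rather than the final version, and the function name is made up here:
// Look for the raw key first, then the REDIRECT_ form added on internal redirects.
function apache_env_any($key) {
if (isset($_SERVER[$key])) return $_SERVER[$key];
if (isset($_SERVER['REDIRECT_'.$key])) return $_SERVER['REDIRECT_'.$key];
return "";
}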

Reason #2: buried in the errordoc function was a test for valid protocols, looked at many times and always missed until my pre-walk scrutiny:
if (!in_array($_SERVER["SERVER_PROTOCOL"], array('HTTP/0.9','HTTP/1.0','HTTP/1.1'))) $sc = 505;

Now corrected and with 0.9 removed; I see no point in it.
if (!in_array($_SERVER["SERVER_PROTOCOL"], array('HTTP/2.0','HTTP/1.0','HTTP/1.1'))) $sc = 505;


My apologies to apache for SOME of the things I said; a few things, though, were justified (I think).

Lucy, many, many thanks for your assistance! Much appreciated and I learned a few new tricks from the conversation.

My initial impetus for converting to HTTP/2 was bing, who sometimes sent their robots with an uninterpretable protocol. This conversion hasn't helped with that: they really do sometimes NOT supply the protocol (HTTP/whatever).

lucy24

5:54 pm on Aug 6, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Now corrected and with 0.9 removed; I see no point in it.
If ever there were an unequivocal 403 that would be it! Robots themselves must have abandoned it years ago; I don't find a single HTTP/0 in logs. I still see a few 1.0, but they're getting rarer--rare enough that I set an environmental variable bad_protocol and unset as needed.
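(Roughly like this, adjusted to taste:)
# Flag anything still arriving as HTTP/0.9 or HTTP/1.0.
SetEnvIf Request_Protocol "^HTTP/(0\.9|1\.0)$" bad_protocol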

Memo to self: Take a closer look and see which authorized robots are still using HTTP/1.0. Quick eyeballing reveals that less than 2% of 1.0 requests are authorized--and even that number is misleading, because most of them are robots.txt.

dstiles

7:43 am on Aug 7, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I did consider removing 1.0 as well but haven't checked on its usage - later, when I've got this thing completely sorted. Pushing errdoc through the setenv file again is still troublesome.