Upgrading Apache to HTTP/2

dstiles

1:16 pm on May 24, 2021 (gmt 0)

What am I doing wrong? I have a recently-installed apache web server running HTTPS on all sites. I thought, since it was a new installation, that it would automatically handle HTTP/2 but in my eagerness to transfer the sites from an old server I forgot to check.

I have now tried twice to upgrade to HTTP/2, and both times succeeded only in downing the web server for a few hours in the process. For now I have returned it to HTTP/1.1 whilst I work on other things. The second time, I began by setting up a new server on a different VPS with HTTP/2 from scratch. It only has a single page (total content: "Site Unavailable"), but it worked, so I took that as a good sign and ploughed ahead with a similar setup on the live server.

Problems encountered on the live web server:
-- Loss of apache_getenv() - the function belongs to the apache2handler SAPI, so it disappears once PHP runs under FPM; env vars then have to be read via getenv() or $_SERVER instead
-- Status 550 (does not support the HTTP protocol)

Upgrade procedure (Debian 10):
sudo apt-get install php7.3-fpm
sudo a2dismod php7.3
sudo a2enconf php7.3-fpm
sudo a2enmod proxy_fcgi
sudo a2dismod mpm_prefork
sudo a2enmod mpm_event
sudo a2enmod ssl
sudo a2enmod http2
sudo systemctl restart apache2
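
A sanity check after the restart might look something like this (example.com is a placeholder):
# confirm the config parses and the right modules are loaded
sudo apache2ctl configtest
sudo apache2ctl -M | grep -E 'http2|mpm|proxy_fcgi'
# prints "2" if the server negotiated HTTP/2
curl -sI --http2 -o /dev/null -w '%{http_version}\n' https://example.com/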

Protocols h2 http/1.1 is already set in http2.conf but I've also tried adding it to an individual site conf.
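
For reference, the relevant part of the vhost should end up looking roughly like this (cert paths and domain are placeholders; the FilesMatch handler is what the php7.3-fpm conf enables, repeated here for completeness):
<VirtualHost *:443>
    ServerName example.com
    Protocols h2 http/1.1
    SSLEngine on
    SSLCertificateFile /etc/ssl/certs/example.com.pem
    SSLCertificateKeyFile /etc/ssl/private/example.com.key
    # hand .php requests to the FPM pool instead of mod_php
    <FilesMatch "\.php$">
        SetHandler "proxy:unix:/run/php/php7.3-fpm.sock|fcgi://localhost"
    </FilesMatch>
</VirtualHost>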

What else should I be looking for, please?

lucy24

4:39 pm on Aug 7, 2021 (gmt 0)

> I did consider removing 1.0 as well, but haven't checked on its usage
On my sites, 1.0 is used by a handful of YMMV operators:

-- Mail.RU only when asking for robots.txt (which is exempt anyway)
-- two Internet Archive robots, though they may finally be moving up to 1.1
-- SafeDNSBot

Also, as noted earlier, lots and lots of malign robots. On closer inspection, I find that for more than half of those malign robots, bad_protocol is the only reason they were blocked. So that’s something to consider.
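
For context, bad_protocol is set and enforced with something along these lines (a sketch for Apache 2.4; the robots.txt exemption is omitted):
# flag anything still speaking HTTP/1.0 (or 0.9)
SetEnvIf Request_Protocol "^HTTP/(0\.9|1\.0)$" bad_protocol
<RequireAll>
    Require all granted
    Require not env bad_protocol
</RequireAll>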

dstiles

9:39 am on Aug 13, 2021 (gmt 0)

A week on and I'm still making minor adjustments. Most problematic are bots with outdated UAs - e.g. a bare "compatible;" - and trying to stop them being classed as nasty "I look like a browser" bots.

Bing is also a trial. It sometimes comes in, as far as I can tell, straight into errordoc mode, without first registering as a legitimate bot hit. I can't see anything in the logs to indicate that it ever, in this mode, comes in as a legit visitor. As a result it somehow bypasses the bot-testing mechanism that sets the env flag "bot", and so gets mis-filed in the wrong log. I need to review that. I suppose it's possible that the errdoc is being presented to bing and other bots; if so, there is no content for them.

Another anomaly is a test for unwanted languages...
 SetEnvIfExpr "%{HTTP:Accept-Language} =~ m#^en-us$|^ru$|zh-c#i " accept_lang=foreign:$0

which persists in writing "foreign:HTTP/1.1" (or whatever the protocol happens to be) to the log instead of the matched string. I've tried changing the word "foreign" to various others, but the result is the same.
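
One way to pin down what the variable really contains might be to log the env flags directly (a sketch; %{NAME}e fields per mod_log_config, variable names from this thread):
# write the flags to a separate debug log to see what was actually set
LogFormat "%h %t \"%r\" %>s bot=%{bot}e accept_lang=%{accept_lang}e" envdebug
CustomLog ${APACHE_LOG_DIR}/envdebug.log envdebug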

The setenv-based file begins...
# stop processing of errdocs
<if " (%{REQUEST_URI} =~ m#errdoc#) ">
SetEnvIf Request_URI /errdoc\.php exempt=errdoc
SetEnvIfExpr "%{REQUEST_URI} =~ m#robots\.txt# " bot=robot
Require all granted
</if>

where the robots clause is supposed to deal with the bing problem - ha! I suspect the request never hits the setenv file and goes straight to errdoc.php. That's one possibility to look at, though why it would, if it does, is another problem.

One of the biggest problems in writing this script is that there seems to be no way to test previously-set env vars. That is a major disability, leading to the necessity of repeatedly re-testing server conditions such as the UA.
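
Edit: the mod_setenvif docs do say the attribute can be an env var defined by an earlier SetEnvIf, so something like this might work - untested here:
SetEnvIf User-Agent "bingbot" is_bing
# the attribute below is the env var set above, not a request header;
# SetEnvIf sets a var to "1" when no explicit value is given
SetEnvIf is_bing "^1$" bot=bingbot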

lucy24

6:06 pm on Aug 13, 2021 (gmt 0)

> Bing is also a trial. It sometimes comes in, as far as I can tell, straight into the errordoc mode, without first being a legitimate bot hit.
Some years ago, I used to find direct bingbot requests for one of my more specialized error documents. I never did figure out the how or why. But I do know from painful experience that if I have a typo in a link, and dash back an hour later to correct it, bingbot will have come by during that hour, and will continue requesting that incorrect URL for years to come.

Crystal ball says "foreign:HTTP/1.1" will turn out to have one of those “D’oh!” forehead-slapping explanations.

<if " (%{REQUEST_URI} =~ m#errdoc#) ">
SetEnvIf Request_URI /errdoc\.php exempt=errdoc
SetEnvIfExpr "%{REQUEST_URI} =~ m#robots\.txt# " bot=robot
Require all granted
</if>
I'm missing something. How can REQUEST_URI be concurrently errdoc.php and robots.txt? And, secondarily, why does it require a SetEnvIfExpr instead of a simple SetEnvIf like the previous line?

dstiles

9:11 am on Aug 14, 2021 (gmt 0)

> Crystal ball says "foreign:HTTP/1.1" will turn out to have one of those “D’oh!” forehead-slapping explanations.

I'm almost sure of it, but so far still can't find it. Just changed $0 to $1 to see what happens - probably not what I want. :(

You are correct, of course. It can't be both errdoc.php AND robots.txt. I was getting desperate when I put that in and didn't reason it out. As for the expr versions - I've been using those a lot, since the non-expr version sometimes fails validation (configtest), and I haven't had time to figure out the differences and the whys. I know it's sometimes to do with the keyword not being one of the basic ones. It's another "when I get around to it" job.

lucy24

6:18 pm on Aug 14, 2021 (gmt 0)

> Just changed $0 to $1
But is there a $1? Did you also add (parentheses) to the pattern?

dstiles

8:31 am on Aug 15, 2021 (gmt 0)

It was just an experiment. The result was the numeric part of HTTP/whatever. I've given up on it for now and started converting some of the Expr tests back to the original BrowserMatch and SetEnvIf.

Still getting stupid bingbot log entries, still trying to discover why. :( It looks to be hitting errdoc specifically, but since it's happening on all sites I suspect it's not.

dstiles

9:39 am on Aug 16, 2021 (gmt 0)

Why only bingbot and not other bots?
I found a couple of online postings from people who had the same problem. There were no solutions and the response was basically, "It's doing no harm, forget it". So, apart from reassigning it to the correct log, I'm forgetting it. Sample log entries (to the same site):
157.55.39.43 - - [15/Aug/2021:09:40:07 +0100] "GET /robots.txt HTTP/2.0" 403 212 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.39.43 - - [15/Aug/2021:09:40:08 +0100] "GET /robots.txt HTTP/2.0" 200 1063 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

Trapping seems to be almost correct now, although I made an alteration last night that resulted in losing all non-bot traffic overnight...
BrowserMatch ^$ useragent=none

... USED to work but now is ALWAYS set. I replaced it for now with...
SetEnvIfExpr "%{HTTP_USER_AGENT} =~ m#^$# " useragent=none
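
An alternative that might be worth trying is ap_expr's -z (string is empty) operator:
# -z is true when the string is empty; untested alternative
SetEnvIfExpr "-z %{HTTP_USER_AGENT}" useragent=none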

lucy24

5:02 pm on Aug 16, 2021 (gmt 0)

> Still getting stupid bingbot log entries, still trying to discover why.
Yeah, stupidity and success don’t usually go together, but bing seems to have mastered the formula.

Currently, they are on a roll of attaching some other site’s URLpaths to my domain name, leading to 404s that I can’t do a thing about because the error is 100% theirs.

:: shrug ::

> BrowserMatch ^$ useragent=none
Huh. But then you've got an env var called "useragent" which has content regardless of whether the UA exists or not. Or is this the only circumstance in which you set it? Might be a good idea to track environment variables and make sure "useragent" isn't coming out with unexpected values, as with "foreign" a few posts back.

:: detour to pore over docs ::

Oh. THAT'S what m#blahblah# means. Got it.

My equivalent of the no-user-agent rule says
BrowserMatch ^-?$ noagent
where the -? part is a leftover from misunderstanding logs. (If a header field is empty, logs say "" while if it is absent logs say "-".) File under: Not needed, but does no harm. And then noagent becomes one of the listed RequireNone conditions. For a while I had to unset it for facebook, but mercifully they seem to have dropped this unwise idea.
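
The enforcement end looks something like this (a sketch; flag names as used in this thread):
<RequireAll>
    Require all granted
    # refuse anything that tripped one of the flags
    <RequireNone>
        Require env noagent
        Require env bad_protocol
    </RequireNone>
</RequireAll>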

dstiles

12:37 pm on Aug 17, 2021 (gmt 0)

> BrowserMatch ^$ useragent=none

I've replaced it with the previous one...
SetEnvIfExpr "%{HTTP_USER_AGENT} =~ m#^$# " useragent=none

I hadn't considered useragent as being pre-existent - i.e. as used in the BrowserMatch keyword - but it works fine for most things. A few of them didn't populate $0 until I enclosed the m#...# in parentheses, as in m#(...)#.
> THAT'S what m#blahblah# means

:) And don't forget the optional suffix i for nocase... m#...#i

lucy24

5:19 pm on Aug 17, 2021 (gmt 0)

> A few of them didn't populate $0 until I enclosed the m#...# in parentheses as in m#(...)#.
Oh! I had a thought. I learned a while back, by direct painful experience, that in vanilla SetEnvIf, $0 only works as intended if the pattern includes something that could hypothetically be interpreted as a Regular Expression, like for example a . period. If the pattern consists only of characters that have no special RegEx meaning--i.e. no anchors, no parentheses and so on--then the intended $0 comes through as the literal string "$0". Perhaps SetEnvIfExpr works analogously.
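
Something like this, if I've got it right (illustrative patterns and variable name):
# the \d makes this a real RegEx, so $0 becomes the matched text
SetEnvIf User-Agent "bingbot/\d" which_bot=$0
# a purely literal pattern is treated as a plain substring match,
# so the log would show the literal string "$0"
SetEnvIf User-Agent "bingbot" which_bot=$0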

"m" is a funny choice of letters, because I associate it with the /m “multiline” option in javascript. Obviously not applicable in a request, since there are no line breaks.

dstiles

8:43 am on Aug 18, 2021 (gmt 0)

I had a vague memory of someone mentioning the $0 problem some time back. SetEnvIfExpr, I suspect, is closely modelled on SetEnvIf.

I also wonder why m, but life is too short. :)

I have to say, re-building the SetEnv file is far more difficult for fcgi, especially now that it also parses errordoc requests, which have to be exempted from most tests - and there is no facility to nest <if> clauses. It's occurred to me that it might be more satisfactory to parse just the basics that control image usage and pass the rest to a PHP script. I have two scripts anyway - one of which is invoked on every hit, depending on the result of the SetEnvs - plus a logging controller, so loading the script would not take much longer. It might even run quicker than apache's parser, and certainly more compactly, since it would be easier to structure efficiently. Maybe someday.

dstiles

5:52 pm on Aug 18, 2021 (gmt 0)

This works. The additional parentheses do the trick.
SetEnvIfExpr "%{HTTP:Accept-Language} =~ m#(^ru$|zh-c)#i " accept_lang=foreign:$0
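(With the capture group in place, the log should now show the matched string - e.g. accept_lang=foreign:ru - instead of the protocol.)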

You were right about non-obvious regexes failing.