Should I block requests that don't support compression?

Zippy1970

11:40 am on Aug 7, 2024 (gmt 0)

10+ Year Member Top Contributors Of The Month



My Apache server is configured to compress output before it's sent back to the client. This works perfectly and saves me tons of bandwidth. However, if I look at my log files, I see a lot of requests from clients that apparently don't support compression. Looking at what they are accessing, and doing a reverse IP lookup, all of these requests come from what I suspect to be misbehaving bots.

These requests make up more than 80% of the total traffic of my server.

I can block any request from a client that does not support compression (gzip, deflate). But my question is, should I? Am I also blocking legit traffic if I do? All modern browsers support compression. Also, all legit bots (Google, Bing, et al) do too.

lucy24

4:45 pm on Aug 7, 2024 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



:: detour to htaccess ::
SetEnvIf Accept-Encoding ^[-*]?$ noencoding
SetEnvIf Accept-Encoding ^gzip$ gzip_only
SetEnvIf Accept-Encoding ^identity$ identity_only
with, however, a great many holes punched (notably Applebot, but also most mobiles, because many of my rules date back to 2015).
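
For readers wondering how variables like these turn into an actual block with exceptions, here is a minimal sketch, assuming Apache 2.4 with mod_authz_core. The Applebot exemption and the variable name allow_anyway are hypothetical stand-ins for the "holes"; this is not lucy24's actual configuration.

```apache
# Hypothetical exception: flag a user agent that should get through
# even when it sends no usable Accept-Encoding header.
SetEnvIf User-Agent Applebot allow_anyway

<RequireAny>
    # Flagged exceptions are always allowed.
    Require env allow_anyway
    <RequireAll>
        Require all granted
        # Refuse requests that the SetEnvIf rule above marked as noencoding.
        Require not env noencoding
    </RequireAll>
</RequireAny>
```

The negated Require has to sit inside a RequireAll alongside a positive one, which is why "Require all granted" is there.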

In the requests you’re seeing, is the Accept-Encoding header empty, absent or “none”? (“none” is not very common, and the ones I find were all blocked on other grounds already). The pattern with ^[-*]?$ covers requests in which the header isn’t sent at all, which in my case is around

:: business with calculator ::

around 22% of all requests.

Yesterday I was reminded that I really need to check links from old pages. Today I am reminded that I need to check if my hole-poking rules are up-to-date. In between, the roomba ran over a cherry* and is deaf to all attempts to sort it out.

Sigh.

* To a cat, a bowl of cherries is just a bowl of cat toys.

Zippy1970

6:18 pm on Aug 7, 2024 (gmt 0)

10+ Year Member Top Contributors Of The Month



I actually did it as follows (in my .htaccess):

RewriteCond %{HTTP:Accept-Encoding} !gzip [NC]
RewriteCond %{HTTP:Accept-Encoding} !deflate [NC]
RewriteRule .* - [F,L]

lucy24

8:29 pm on Aug 7, 2024 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



That should work. Conditions default to [AND], so "not A" along with "not B" means neither A nor B, which I assume is what you intended.

Incidentally, an [F] doesn't require an [L]. It does no harm, but it’s an extra two bytes for your server to read on every request. And I would call [NC] counterproductive, because if someone wrongly says "Gzip" or "GZIP" you can be pretty confident they’re a bad actor. Besides, wrongly cased forms are vanishingly rare--I checked--so again it’s extra work for the server, flattening the case on every request.
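
With both suggestions applied, the rule from the previous post might look like the sketch below. It assumes the same .htaccess context with RewriteEngine already on; whether to drop [NC] is a judgment call, since without it a client sending "GZIP" gets blocked too, which is the intent described above.

```apache
# Block when the header mentions neither gzip nor deflate.
# Consecutive conditions are ANDed by default.
RewriteCond %{HTTP:Accept-Encoding} !gzip
RewriteCond %{HTTP:Accept-Encoding} !deflate
# [F] sends a 403 and already ends rule processing, so no [L] is needed.
RewriteRule .* - [F]
```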

Zippy1970

9:42 pm on Aug 7, 2024 (gmt 0)

10+ Year Member Top Contributors Of The Month



Thank you for your comments. I changed it to

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{REQUEST_URI} \.(html?|php|cgi|js|css|xml|json|txt|svg)$ [NC]
RewriteCond %{HTTP:Accept-Encoding} ^$ [OR]
RewriteCond %{HTTP:Accept-Encoding} !gzip [OR]
RewriteCond %{HTTP:Accept-Encoding} !deflate
RewriteRule .* - [F]
</IfModule>


The reason I included the file types is that I noticed some things were no longer working. For instance, I'm using Amplitude.js as an audio player on some pages. It retrieves the audio files locally, but its requests don't include an Accept-Encoding header. I could just accept requests that don't have an Accept-Encoding header, but a lot of misbehaving bots don't include it either.

lucy24

10:12 pm on Aug 7, 2024 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Here's a good general rule in mod_rewrite: Never put anything in a Condition that could go in the body of a Rule. Why? Because mod_rewrite works on a “two steps forward, one back” pattern, where conditions are only evaluated if the rule initially matches. I've got a good slew of rules in the form
RewriteRule (^|\.html|/)$ - [F]
i.e. “only evaluate the Conditions if the request is for a page”. Although Conditions are physically placed before the Rule, they are evaluated after--if at all.
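
Applied to the file-type list from the previous post, that advice could be sketched as follows. The single ^$ condition is only a placeholder to show the structure; the point is that the extension test moves from a RewriteCond into the rule pattern.

```apache
RewriteEngine On
# This condition is only tested once the rule pattern below has matched,
# so requests for images, audio files and the like never reach it.
RewriteCond %{HTTP:Accept-Encoding} ^$
# The extension list now lives in the pattern; [NC] on the rule keeps
# the case-insensitive match the original condition had.
RewriteRule \.(html?|php|cgi|js|css|xml|json|txt|svg)$ - [F,NC]
```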

But now I think you’ve goofed: by using [OR] instead of the default (implied) [AND], you are saying “Block this request if the Accept-Encoding header is empty, OR it doesn't include gzip, OR it doesn't include deflate.” That means requests must have both gzip and deflate. Surely that isn't what you intended? The ^$ condition by itself (it can also be expressed as !.) should do what you want, if the object is to deny requests that don't send the header at all.
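
If the object is instead the broader one from the start of the thread (deny clients that offer neither gzip nor deflate), one way to write it without [OR] is a single negated alternation. This is a sketch, not a tested rule; it assumes the same file-type list as above and assumes that a missing header expands to an empty string in mod_rewrite.

```apache
# Block when the header contains neither "gzip" nor "deflate".
# An empty value contains neither, so it is blocked as well; no separate
# ^$ condition and no [OR] flags are needed.
RewriteCond %{HTTP:Accept-Encoding} !(gzip|deflate)
RewriteRule \.(html?|php|cgi|js|css|xml|json|txt|svg)$ - [F,NC]
```

A client offering only gzip, or only deflate, passes this check, which the three-way [OR] version would have refused.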

:: wandering off to check something on test site ::

Edit: I double-checked whether the conditions ^$ or !. also work if the header in question is absent rather than empty. They do in mod_setenvif, but I’ve never had occasion to try it in mod_rewrite.