Any reason not to block AmazonAWS to prevent them from chewing up server resources?
Nope. Block away!
SetEnvIf Remote_Addr ^5\.253\.19\b bad_range=$0
...
BrowserMatch goodrobot !bad_range
as opposed to
Require ip 5.253.19
(inside a RequireNone envelope, of course) if I don't need to poke holes.
Any chance you could send me your complete set of rules/directives I would need to add to my htaccess file?
Sorry, no, it's all just too site-specific. In fact, user-space-specific, since I have SetEnvIf and Require directives in a shared htaccess, and then each site has some further RewriteRules for things that would only apply to one site, or specific filenames. (If your hosting setup uses the primary/addon structure instead of userspace, you would put the shared rules in the "primary" site's htaccess, which is seen by all sites.)
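For illustration only, the two approaches mentioned above (an environment variable that an exception can clear, versus a plain Require ip) might be sketched roughly like this. The name goodrobot is just the stand-in from the snippet above, and 203.0.113 is a documentation-only range used as a placeholder, not a real target. Treat it as a starting shape under those assumptions, not a ready-made rule set.

```apache
# Approach A: flag the range in an environment variable.
# A later line can clear the flag again, which is how you "poke holes".
SetEnvIf Remote_Addr ^5\.253\.19\b bad_range
# "goodrobot" is a placeholder user-agent token; "!" unsets the variable.
BrowserMatch goodrobot !bad_range

<RequireAll>
    Require all granted
    <RequireNone>
        # Blocks anything still carrying the flag.
        Require env bad_range
        # Approach B: no exceptions needed, so name the range directly.
        # 203.0.113 is a placeholder range for this sketch.
        Require ip 203.0.113
    </RequireNone>
</RequireAll>
```

The SetEnvIf and BrowserMatch lines are evaluated in the order they appear, so the exception has to come after the line that sets the flag.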
-A INPUT -s 52.64.0.0/12 -p tcp -m multiport --dports 443 -j DROP
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/76.0.3803.0 Safari/537.36
I know I could block by user agent using a RewriteCond
You could. Or you could say
BrowserMatch HeadlessChrome bad_agent
...
Require env bad_agent
Matter of fact, I just checked my own htaccess--the UA sounded familiar but I couldn't remember if it was common enough to block by name--and I actually have this line, except that it winds up with
BrowserMatch HeadlessChrome bad_agent
Order Allow,Deny
Allow from ALL
Deny from env=bad_agent
Require not env bad_agent
BrowserMatch HeadlessChrome bad_agent
<RequireAll>
Require all granted
Require not env bad_agent
</RequireAll>
# Block Bad Agents
BrowserMatch HeadlessChrome bad_agent
# Block Bad IP Ranges
SetEnvIf Remote_Addr ^5\.253\.19\b bad_range
# Block Bad IPs
SetEnvIf Remote_Addr 5.253.19 bad_ip
<RequireAll>
Require all granted
Require not env bad_agent
Require not env bad_range
Require not env bad_ip
</RequireAll>
So "HeadlessChrome" is not used by any actual browsers then and it is safe enough to block on that?
When in doubt about some user-agent, I do a search like this in logs (text editor with RegEx):
\.css .+?HeadlessChrome
Other than search engines, the vast majority of robots only request pages. So if a given UA has never requested a stylesheet, it's safe to assume it is not used by humans.
SetEnvIf Remote_Addr 5.253.19 bad_ip
Careful! SetEnvIf uses regular expressions, so you need anchors and escapes. But you can put IPs in standard form into “Require (not) ip” directives as well.
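To make the anchors-and-escapes point concrete, here is a small sketch comparing the unanchored line with an anchored one, plus the regex-free alternative. The address 45.253.190.7 in the comment is made up purely to show an accidental match.

```apache
# Unanchored and unescaped: each "." matches any character and the pattern
# may match anywhere in the address, so this would also catch a
# made-up address such as 45.253.190.7.
# SetEnvIf Remote_Addr 5.253.19 bad_ip

# Anchored and escaped: only addresses that begin with 5.253.19. match.
SetEnvIf Remote_Addr ^5\.253\.19\. bad_ip

# No regex at all: a partial address in standard form covers the same range.
<RequireAll>
    Require all granted
    Require not ip 5.253.19
</RequireAll>
```

You would normally pick one of the two forms, not both; they are shown together here only for comparison.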
should have had a "not" in it like so right?
This begins to get into personal-coding-style territory. A line inside a RequireAll envelope would need “Require not” if you’re blocking the request. (A negated line is not permitted by itself or inside a RequireAny envelope, since a “not” can never grant access on its own.) A line inside a RequireNone envelope would have “Require” alone. (Yes, it’s confusing until you are used to it.)
Order Allow,Deny
Allow from all
Deny from blahblah
meaning “Let everyone in by default, unless they meet one of this long list of conditions”. The equivalent in 2.4 could be EITHER
<RequireAll>
Require all granted
Require not blahblah
</RequireAll>
OR
<RequireAll>
Require all granted
<RequireNone>
Require blahblah
</RequireNone>
</RequireAll>
Counting on fingers reveals that if you have 8 or more negative conditions, you start saving bytes by using the <RequireNone> version (29 or 31 bytes for the envelope, vs. 4 each for the “not”.)
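As a rough illustration, the bad_agent, bad_range and bad_ip variables from the earlier example could be dropped into the <RequireNone> form like this. It assumes those variables have already been set by SetEnvIf/BrowserMatch lines earlier in the same htaccess.

```apache
<RequireAll>
    Require all granted
    <RequireNone>
        # No "not" here: the RequireNone envelope does the negating.
        Require env bad_agent
        Require env bad_range
        Require env bad_ip
    </RequireNone>
</RequireAll>
```

With only three conditions this comes out longer than the “Require not” version, which fits the arithmetic above; the saving only shows up once the list gets long.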