Getting started: header-based access controls

     
8:24 pm on Jun 28, 2018 (gmt 0)

Preferred Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 444
votes: 35


How do I get started on using header-based access controls?

Currently I am on an Apache server on a shared host, using an .htaccess file, robots.txt, and an error file, and this works pretty well. I run multiple web sites, each in its own directory, with the .htaccess in public_html, so my SetEnvIf rules are inherited by all subdirectories. I regularly read my raw access log, find the bad guys, and ban them in .htaccess by UA (SetEnvIf) or by IP (deny from). I do have an error file, which I review, but it reveals very little.
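
For readers following along, the setup described above might look roughly like the sketch below. It is only an illustration: the user-agent string, the variable name bad_bot, and the IP address (taken from a documentation-only range) are all made up, and the access syntax is the Apache 2.2 style that "deny from" implies.

```apache
# Flag requests whose User-Agent contains a known-bad string (case-insensitive).
# "ExampleBadBot" and bad_bot are hypothetical names.
SetEnvIfNoCase User-Agent "ExampleBadBot" bad_bot

# Refuse flagged requests, plus one address banned outright.
Deny from env=bad_bot
Deny from 192.0.2.10
```

Placed in the .htaccess in public_html, rules like these apply to every site directory beneath it unless a lower .htaccess overrides them.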

How do I set up header-based access controls? Can someone point me to a link or two to get me started?

Thanks All!
5:58 pm on July 31, 2018 (gmt 0)

Preferred Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 444
votes: 35


I am at the hole-poking stage, looking at my log to see what is inadvertently banned, then making exceptions. This is going well. I found an "interesting" use case, where I inadvertently banned myself:

Same computer, same IP: with Firefox there was no issue, but with Opera (non-VPN) I gave myself 403s. I had to poke a hole for my IP, but I need to think of a way to let Opera in. It looks like my Opera dropped the Accept request header. I can even vouch that I am a human, I am pretty sure.

2018-07-31:13:35:08
URL: /example/roseytoes/
IP: 76.66.*.*
Accept-Encoding: gzip, deflate
Accept-Language: en-GB,en-US;q=0.9,en;q=0.8
Connection: keep-alive
Host: example.com
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36 OPR/54.0.2952.64
----

2018-07-31:13:35:08
URL: /favicon.ico
IP: 76.66.*.*
Accept-Encoding: gzip, deflate
Accept-Language: en-GB,en-US;q=0.9,en;q=0.8
Connection: keep-alive
Host: example.com
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36 OPR/54.0.2952.64
7:28 pm on July 31, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15019
votes: 665


It is very, very unusual for a human request to be missing the Accept: header--so unusual, in fact, that it may be a transmission glitch. Did it happen to you consistently?
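
A minimal sketch of the kind of rule that produces this 403, assuming Apache 2.2-style access directives as used elsewhere in this thread. The variable name no_accept is hypothetical, and the rule relies on mod_setenvif matching ^$ when the header is absent, which is the same behavior the common blank-User-Agent rule depends on.

```apache
# Flag any request whose Accept header is missing or empty.
# no_accept is a made-up variable name.
SetEnvIf Accept ^$ no_accept

# Let everyone in by default, then refuse whatever is flagged.
Order Allow,Deny
Allow from all
Deny from env=no_accept
```

A browser that drops the header even once, as in the log above, gets refused by a rule like this, which is why a hole for a known IP can be a reasonable stopgap.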
8:09 pm on July 31, 2018 (gmt 0)

Preferred Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 444
votes: 35


This only happened once, today. I will try again tomorrow. I find it very disconcerting that the Opera browser would do this. Firefox and Chrome behave very well.
9:28 pm on July 31, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15019
votes: 665


It definitely isn't a universal Opera behavior; I checked some header logs before posting.
9:33 pm on July 31, 2018 (gmt 0)

Preferred Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 444
votes: 35


I will try Opera on Windows as well as check my Ubuntu 16.04 Opera. Thanks for checking.

Loaded Opera on an old Win machine. There, both Firefox and Opera send the Accept request header. On my Ubuntu Opera it is fickle: sometimes it sends the Accept header and sometimes not. If I toggle the VPN setting on, it sends the Accept header. Then I turn the VPN off, and half the time it does not send the Accept header and I get a 403. I shut down the Opera browser and restarted it, and it now works. This is definitely an Opera issue. I will research it.

It is good to know that Opera might be the problem and not the check on the Accept request header.
3:35 am on Aug 3, 2018 (gmt 0)

Preferred Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 444
votes: 35


I am noticing that many spam comment posters come with no language header. To ban these comment POSTers would this work?
SetEnvIf Request_Method ^POST bad_post_nolang
SetEnvIf Accept-Language !^$ !bad_post_nolang
deny from env=bad_post_nolang
4:49 pm on Aug 3, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15019
votes: 665


You can also say
.
and
!.
for “has content” and “doesn’t have content” respectively.

But why would you set a nolang flag in response to a request method? Is it important to distinguish between this category and other requests that are missing the Accept-Language header?

To answer the question: Yes, the stated rules would do what you want them to do.
5:49 pm on Aug 3, 2018 (gmt 0)

Preferred Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 444
votes: 35


I find many GET requests, especially from search engines, also have nolang, and this is pretty normal behaviour for a search engine. If I ban nolang this would be an issue.
2018-08-02:19:49:00
URL: /example/2016/11/21/first-snowfall-2016-2017-winter-toronto/
IP: 66.249.69.*
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: gzip,deflate,br
Connection: keep-alive
From: googlebot(at)googlebot.com
Host: example.com
If-Modified-Since: Sun, 29 Jul 2018 03:57:26 GMT
User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
X-Https: 1

I am constantly getting comment spammed by non-human entities that have nolang. This might be a hurdle for them.

Why do Russian bots have lang as English, but spam me in Russian? I cannot detect the language within the spam comment, but wish I could.

Again, thanks.
6:13 pm on Aug 3, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15019
votes: 665


“If I ban nolang this would be an issue.”
You can also poke holes by name.
BrowserMatch Googlebot !bad-condition !other-bad-condition
and so on. In fact, this type of hole-poking is now the biggest part of my htaccess (at rough count, over 1/3 of all directives).
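
As a sketch of what hole-poking by name can look like in full: BrowserMatch is documented as shorthand for SetEnvIf on the User-Agent header, so the two hole-poking lines below are equivalent and only one is needed. The variable name no_lang is hypothetical.

```apache
# Build the rule first: flag requests with no Accept-Language header.
SetEnvIf Accept-Language ^$ no_lang

# Then poke the hole: clear the flag for anything calling itself Googlebot.
BrowserMatch Googlebot !no_lang
# Equivalent long form of the line above:
# SetEnvIf User-Agent Googlebot !no_lang

Deny from env=no_lang
```

Keep in mind that a User-Agent string can be forged, so a hole keyed on the name alone lets through anything that claims to be Googlebot.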

“Why do Russian bots have lang as English, but spam me in Russian?”
They probably think setting their language to English is more likely to get them past filters--and then they have to spam in Russian because they don’t know any other language. Or because their spam is targeted at other Russian-speaking members of the forum.

“I cannot detect the language within the spam comment, but wish I could.”
Not even when it uses a non-Roman script? I should think it would be pretty trivial to screen posts for selected character ranges, alongside whatever screening is already in place (checking for malicious code injection attempts or similar). And then, if you have a lot of commenters who like throwing the odd Russian, Hebrew or Thai word into an otherwise legitimate English-language post, cross-check for [a-z] characters and put those few posts on hold for manual checking.
6:25 pm on Aug 3, 2018 (gmt 0)

Preferred Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 444
votes: 35


BrowserMatch Googlebot !bad-condition !other-bad-condition
Wow, I did not know that. Hole poking by name is very convenient! My hole poking is growing.

There is no request header for the actual language used within the message, so I'll need to figure something else out. Chinese spam posters also use lang = EN and post in Russian, Japanese, or English, but rarely in Chinese.
8:02 pm on Aug 3, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15019
votes: 665


Here I wasn't talking about the request, but the post content itself. Doesn't it go through any kind of filter before it goes live?
8:07 pm on Aug 3, 2018 (gmt 0)

Preferred Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 444
votes: 35


No, not really. It is WordPress, so spam goes straight into the comment spam area, caught by the Akismet filter.
8:46 pm on Aug 4, 2018 (gmt 0)

Preferred Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 444
votes: 35


BrowserMatch Googlebot !bad-condition !other-bad-condition

Do environment variables such as bad-condition and other-bad-condition need to be set before this statement? Can you poke the hole before you build the dam, or is the order: first build the dam, then poke the hole?

BrowserMatch Googlebot !bad-condition !other-bad-condition
SetEnvIf header-field-name header-field-value bad-condition
SetEnvIf header-field-name header-field-value other-bad-condition

Is this order ok?
8:58 pm on Aug 4, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15019
votes: 665


“Can you poke the hole before you build the dam?”
No, because it's all the same module, so commands execute in sequential order. You can only unset things that have already been set; otherwise they'll just get set all over again later on.
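
In other words, a working order would look something like this sketch. The variable names are the placeholders already used in this thread; the two header tests are assumptions chosen only to make the example concrete.

```apache
# Build the dam first: set the flags.
SetEnvIf Accept ^$ bad-condition
SetEnvIf Accept-Language ^$ other-bad-condition

# Then poke the hole: unset both flags for the named robot.
BrowserMatch Googlebot !bad-condition !other-bad-condition

# Finally act on whatever is still flagged.
Deny from env=bad-condition
Deny from env=other-bad-condition
```

With the BrowserMatch line moved to the top instead, the unset would run before anything was set, and the two SetEnvIf lines would then flag Googlebot after all.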
9:01 pm on Aug 4, 2018 (gmt 0)

Preferred Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 444
votes: 35


Thank you!
1:36 am on Aug 5, 2018 (gmt 0)

Preferred Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 444
votes: 35


SetEnvIf Request_Method ^POST bad_post_nolang
SetEnvIf Accept-Language !^$ !bad_post_nolang

did not work and blocked all POSTs, but your
SetEnvIf Accept-Language . !bad_post_nolang

worked. I'm unsure why.
2:54 am on Aug 5, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15019
votes: 665


It's possible the server--that is, the whatever-it-is that parses Regular Expressions within mod_setenvif--doesn’t like the conjunction of ! with ^$. I don’t personally use !^$ anywhere (I just did a massive search of my local files in case I'd overlooked something), but I don't know whether that's because I once tried it and it didn't work, or just because it never occurred to me in the first place ;)
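
A likely explanation, offered as a sketch rather than a definitive diagnosis: as far as the mod_setenvif documentation describes it, the second argument of SetEnvIf is a plain regular expression, and a leading ! negates only the variable names at the end of the line, not the pattern. (mod_rewrite's RewriteCond does accept !pattern, which may be where the habit comes from.) Under that reading, !^$ asks for a literal ! followed by the start of the string, which can never match, so the flag set by the POST line is never cleared and every POST is refused. The variable name below is the one from this thread; the $ anchor after POST is an added assumption.

```apache
# Step 1: provisionally flag every POST request.
SetEnvIf Request_Method ^POST$ bad_post_nolang

# Step 2: clear the flag when Accept-Language contains at least one character.
# "." cannot match a missing or empty header, so in that case the flag stays set.
SetEnvIf Accept-Language . !bad_post_nolang

# Step 3: refuse whatever is still flagged (Apache 2.2-style syntax).
Deny from env=bad_post_nolang
```

GET requests are never flagged by these lines, so search engines that send no Accept-Language header are unaffected.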
2:49 pm on Aug 6, 2018 (gmt 0)

Preferred Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 444
votes: 35


I am slowly seeing the benefits of these request headers. It looks like I can cut down on my IP bans if I can detect bots by their badly written headers. The headers carry information that is not available in the raw access log. You still need to read the raw access log, but the header logs are really useful.

I will monitor for a while longer. It may be possible to considerably reduce my IP ban list and rely more on SetEnvIf instead.
8:39 pm on Aug 6, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15019
votes: 665


If you are regularly studying access logs, you are already ahead of 90% (95%? 99%?) of webmasters. If, to that, you add headers ... you are so far ahead of the game, you can take a day off and go fishing.

I don't know where that came from, but I had to end the sentence in some way.
11:43 pm on Aug 6, 2018 (gmt 0)

Preferred Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 444
votes: 35


Thanks Lucy. I am hoping to learn more about bot patterns in an effort to block their new methods!