Forum Moderators: phranque

Getting started: header-based access controls

         

TorontoBoy

8:24 pm on Jun 28, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



How do I get started on using header-based access controls?

Currently I am on an Apache server on a shared host, using .htaccess, robots.txt and an error file, and this works pretty well. I run multiple websites, each in its own directory, with the .htaccess in public_html, so my SetEnvIf rules are inherited by all subdirectories. I regularly read my raw access log, find the bad guys and ban them in .htaccess by UA (SetEnvIf) or by IP (deny from). I do have an error file, which I review, but it reveals very little.

How do I set up header-based access controls? Can someone point me to a link or two to get me started?

Thanks All!

TorontoBoy

5:58 pm on Jul 31, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



I am at the hole-poking stage, looking at my log to see what is inadvertently banned, then making exceptions. This is going well. I found an "interesting" use case, where I inadvertently banned myself:

Same computer, same IP: with Firefox there was no issue, but with Opera (non-VPN) I gave myself 403s. I had to poke a hole for my IP, but I need to think of a way to let Opera in. Did my Opera drop the Accept request header? I can even vouch that I am a human, I am pretty sure.

2018-07-31:13:35:08
URL: /example/roseytoes/
IP: 76.66.*.*
Accept-Encoding: gzip, deflate
Accept-Language: en-GB,en-US;q=0.9,en;q=0.8
Connection: keep-alive
Host: example.com
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36 OPR/54.0.2952.64
----

2018-07-31:13:35:08
URL: /favicon.ico
IP: 76.66.*.*
Accept-Encoding: gzip, deflate
Accept-Language: en-GB,en-US;q=0.9,en;q=0.8
Connection: keep-alive
Host: example.com
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36 OPR/54.0.2952.64
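
A rule along the following lines would produce 403s for requests like the two logged above, where the Accept header is absent. This is only a sketch in Apache 2.2-style .htaccess syntax, matching the "deny from" usage earlier in the thread. The variable name no_accept and the address 203.0.113.7 (from a documentation range) are placeholders, not values from the poster's setup.

```apache
# Flag any request whose Accept header is missing or empty.
# mod_setenvif treats a missing header as an empty string, so ^$ matches it.
SetEnvIf Accept ^$ no_accept

# Poke a hole for one trusted address (placeholder IP).
SetEnvIf Remote_Addr ^203\.0\.113\.7$ !no_accept

Order Allow,Deny
Allow from all
Deny from env=no_accept
```

With Order Allow,Deny, a request that matches both an Allow and a Deny line is refused, so any request still carrying the flag ends in a 403.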

lucy24

7:28 pm on Jul 31, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It is very, very unusual for a human request to be missing the Accept: header--so unusual, in fact, that it may be a transmission glitch. Did it happen to you consistently?

TorontoBoy

8:09 pm on Jul 31, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



This only happened today, once. I will try again tomorrow. I find it very disconcerting that the Opera browser would do this. Firefox and Chrome behave very well.

lucy24

9:28 pm on Jul 31, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It definitely isn't a universal Opera behavior; I checked some header logs before posting.

TorontoBoy

9:33 pm on Jul 31, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



I will try Opera on Windows as well as check my Ubuntu 16.04 Opera. Thanks for checking.

I loaded Opera on an old Windows machine. There, both Firefox and Opera send the Accept request header. On my Ubuntu Opera it is fickle: sometimes it sends the Accept header and sometimes not. If I toggle the VPN setting on, it sends the Accept header. Then I turn the VPN off, and half the time it does not send the Accept header and I get a 403. I shut down the Opera browser and restart, and it now works. This is definitely an Opera issue. I will research it.

It is good to know that Opera might be the problem and not the request Accept header.

TorontoBoy

3:35 am on Aug 3, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



I am noticing that many spam comment posters come with no language header. To ban these comment POSTers would this work?
SetEnvIf Request_Method ^POST bad_post_nolang
SetEnvIf Accept-Language !^$ !bad_post_nolang
deny from env=bad_post_nolang

lucy24

4:49 pm on Aug 3, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You can also say
.
and
!.
for “has content” and “doesn’t have content” respectively.

But why would you set a nolang flag in response to a request method? Is it important to distinguish between this category and other requests that are missing the Accept-Language header?

To answer the question: Yes, the stated rules would do what you want them to do.

TorontoBoy

5:49 pm on Aug 3, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



I find many GET requests, especially from search engines, also have nolang, and this is pretty normal behaviour for a search engine. If I ban nolang this would be an issue.
2018-08-02:19:49:00
URL: /example/2016/11/21/first-snowfall-2016-2017-winter-toronto/
IP: 66.249.69.*
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: gzip,deflate,br
Connection: keep-alive
From: googlebot(at)googlebot.com
Host: example.com
If-Modified-Since: Sun, 29 Jul 2018 03:57:26 GMT
User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
X-Https: 1

I am constantly getting comment spammed by non-human entities that have nolang. This might be a hurdle for them.

Why do Russian bots have lang as English, but spam me in Russian? I cannot detect the language within the spam comment, but wish I could.

Again, thanks.

lucy24

6:13 pm on Aug 3, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If I ban nolang this would be an issue.
You can also poke holes by name.
BrowserMatch Googlebot !bad-condition !other-bad-condition
and so on. In fact, this type of hole-poking is now the biggest part of my htaccess (at rough count, over 1/3 of all directives).
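
A minimal sketch of hole-poking by name, assuming two header-based flags have already been defined. The flag names no_lang and no_accept are hypothetical, and bingbot is added purely as a second illustration.

```apache
# Build the dam: flag requests missing headers that browsers normally send.
SetEnvIf Accept-Language ^$ no_lang
SetEnvIf Accept ^$ no_accept

# Poke holes by name: clear both flags for known crawlers.
# A User-Agent string can be forged, so this is a convenience, not proof of identity.
BrowserMatch Googlebot !no_lang !no_accept
BrowserMatch bingbot !no_lang !no_accept

Order Allow,Deny
Allow from all
Deny from env=no_lang
Deny from env=no_accept
```

BrowserMatch is shorthand for SetEnvIf User-Agent, and one line can set or clear several variables at once.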

Why do Russian bots have lang as English, but spam me in Russian?
They probably think setting their language to English is more likely to get them past filters--and then they have to spam in Russian because they don’t know any other language. Or because their spam is targeted at other Russian-speaking members of the forum.

I cannot detect the language within the spam comment, but wish I could.
Not even when it uses a non-Roman script? I should think it would be pretty trivial to screen posts for selected character ranges, alongside whatever screening is already in place (checking for malicious code injection attempts or similar). And then, if you have a lot of commentors who like throwing the odd Russian, Hebrew or Thai word into an otherwise legitimate English-language post, cross-check for [a-z] characters and put those few posts on hold for manual checking.

TorontoBoy

6:25 pm on Aug 3, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



BrowserMatch Googlebot !bad-condition !other-bad-condition
Wow, I did not know that. Hole poking by name is very convenient! My hole poking is growing.

There is no request header for the actual language used within the message, so I'll need to figure something else out. Chinese spam posters also use lang = EN but can post in Russian, Japanese or English, but rarely Chinese.

lucy24

8:02 pm on Aug 3, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Here I wasn't talking about the request, but the post content itself. Doesn't it go through any kind of filter before it goes live?

TorontoBoy

8:07 pm on Aug 3, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



No, not really. It is WordPress, so spam goes straight into the comments spam area, caught by the Akismet filter.

TorontoBoy

8:46 pm on Aug 4, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



BrowserMatch Googlebot !bad-condition !other-bad-condition

Do the environment variables such as bad-condition and other-bad-condition need to be set before this statement? Can you poke the hole before you build the dam, or must you build the dam first and then poke the hole?

BrowserMatch Googlebot !bad-condition !other-bad-condition
SetEnvIf header-field-name header-field-value bad-condition
SetEnvIf header-field-name header-field-value other-bad-condition

Is this order ok?

lucy24

8:58 pm on Aug 4, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Can you poke the hole before you build the dam?
No, because it's all the same module, so commands execute in sequential order. You can only unset things that have already been set; otherwise they'll just get set all over again later on.
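
The ordering point can be sketched as follows. The flag name no_lang is a placeholder, and the first pair of lines is commented out because it shows the order that does not work.

```apache
# Wrong order: the unset runs first and has nothing to clear,
# then the flag is set afterwards and Googlebot stays blocked.
# BrowserMatch Googlebot !no_lang
# SetEnvIf Accept-Language ^$ no_lang

# Right order: set the flag, then clear it for the exception.
SetEnvIf Accept-Language ^$ no_lang
BrowserMatch Googlebot !no_lang
```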

TorontoBoy

9:01 pm on Aug 4, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



Thank you!

TorontoBoy

1:36 am on Aug 5, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



SetEnvIf Request_Method ^POST bad_post_nolang
SetEnvIf Accept-Language !^$ !bad_post_nolang

did not work and blocked all POSTS, but your
SetEnvIf Accept-Language . !bad_post_nolang

worked. I'm unsure why.

lucy24

2:54 am on Aug 5, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It's possible the server--that is, the whatever-it-is that parses Regular Expressions within mod_setenvif--doesn’t like the conjunction of ! with ^$. I don’t personally use !^$ anywhere (I just did a massive search of my local files in case I'd overlooked something), but I don't know whether that's because I once tried it and it didn't work, or just because it never occurred to me in the first place ;)
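
One likely explanation, offered as a reading of the mod_setenvif documentation rather than a test on this server: SetEnvIf takes a plain regular expression and, unlike RewriteCond in mod_rewrite, has no leading ! to negate the pattern. In SetEnvIf the ! only has meaning in front of a variable name. The pattern !^$ would then be read as a literal exclamation mark followed by two anchors, which should never match a header value, so the flag set by the POST rule is never cleared and every POST is denied. A sketch of the contrast, reusing the variable name from the posts above:

```apache
# Not a negation in SetEnvIf: the "!" is part of the pattern, so this never matches.
# SetEnvIf Accept-Language !^$ !bad_post_nolang

# Working form: flag every POST, then clear the flag when
# Accept-Language contains at least one character.
SetEnvIf Request_Method ^POST bad_post_nolang
SetEnvIf Accept-Language . !bad_post_nolang

Order Allow,Deny
Allow from all
Deny from env=bad_post_nolang
```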

TorontoBoy

2:49 pm on Aug 6, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



I am slowly seeing the benefits of these request headers. It looks like I can cut down on my IP bans if I can detect bots by their badly written headers. The headers contain information that is unavailable in the raw access log. You still need to read the raw access log, but the header logs are really useful.

I will monitor for a while longer. It may be possible to considerably reduce my IP ban list and rely more on SetEnvIf instead.
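
The thread uses the older "deny from env=" form throughout. On hosts running Apache 2.4 without the compatibility module, the same idea would be written with Require. The sketch below assumes the host permits these directives in .htaccess, and the variable name bad_headers is hypothetical.

```apache
# Flag requests that are missing either header.
SetEnvIf Accept ^$ bad_headers
SetEnvIf Accept-Language ^$ bad_headers

# Exception for a named crawler, placed after the flags are set.
BrowserMatch Googlebot !bad_headers

# Apache 2.4 authorization: allow everyone except flagged requests.
<RequireAll>
    Require all granted
    Require not env bad_headers
</RequireAll>
```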

lucy24

8:39 pm on Aug 6, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If you are regularly studying access logs, you are already ahead of 90% (95%? 99%?) of webmasters. If, to that, you add headers ... you are so far ahead of the game, you can take a day off and go fishing.

I don't know where that came from, but I had to end the sentence in some way.

TorontoBoy

11:43 pm on Aug 6, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



Thanks Lucy. I am hoping to learn more about bot patterns in an effort to block their new methods!

JamesSC

11:47 am on Oct 12, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



TorontoBoy,

Thanks for all of this. Because of the responses to my thread here [webmasterworld.com ] I'm trying to follow along in your footsteps here as well.

In my case I'm only trying to run one simple WordPress site on a shared server.

I'm not clear yet on where to place, first, the logging code itself (mine = logheaders.php) and, second, the includes code referencing it:

<?php include ($_SERVER['DOCUMENT_ROOT'] . "/includes/logheaders.php"); ?>


Do the logging code (logheaders.php) and the directory I place it in (/logheaders/) - if I can even do that - need to be in the same /home/user/ directory my WordPress installation is in? Within the WordPress installation directory itself? Deeper?

I initially assumed from your narrative that there was no reason a /logheaders/ directory couldn't contain both files, both the operational logheaders.php and the logheaders.log it consequently creates, but the includes code above clearly seems to indicate the logheaders.php code file should reside in one of the WordPress includes directories.

But which one? The wp-admin/includes/? The wp-includes? Somewhere else, such as in my child theme functions file (you had also mentioned "deep within my theme")? So far the different placements of each I've tried from trying to follow your narrative have done no harm, but neither have they generated any apparent response at all.

From this and from Lucy24's comment on my search_engine_spiders post above, I understand why I want to do what you are doing, but because I'm still behind you on the learning curve, how to place the code is still eluding me.

Any additional clarifications of your narrative above about these two basic setup placements would be appreciated.

TorontoBoy

1:52 pm on Oct 12, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



The code include goes into header.php of your WP theme, right after the <body> tag. You should use a child theme, otherwise this will be wiped out when you upgrade your theme.

I have my WP in a subdir, under public_html. I have /logheaders/ also as a subdir under public_html. For WP I use logheaders.php that is inside /logheaders/. For my error 404.php, which is under public_html I could not get it to work with the subdir, so had to put an additional copy of logheaders.php directly into public_html.

Just to be clear, logheaders.php and the logs all go into /logheaders/, a subdir in public_html. I've tried to keep my WP install completely separate from logging, as I have multiple WP installs.

This request headers logging has helped me out a lot. It is worthwhile figuring out.

JamesSC

2:48 pm on Oct 12, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



Hey, thanks so much.

Now to get it to work.

JamesSC

9:06 pm on Oct 12, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



Success, then self-created failure, and now maybe a curious question: is this setup somehow limited to once per day/clock period?

Here's why I ask. You know the kid who would take his father's expensive fishing reel apart and then not be able to put it back together again? Yeah, that was me.

Anyway, my directory structure is /user/example.com/wordpress files

I inserted the includes code into the body of my child theme header.php file.

I put a copy of the logging script logheaders.php with the code corrected to "/headerlogs/headers-" into /user/headerlogs/. Nothing.

I then added a second copy /user/logheaders.php. Nothing.

I then added a third copy /user/example.com/logheaders.php. Bingo! There was now a log file alongside logheaders.php in /user/headerlogs/.

But three seemed like too many copies to me, because I also had no idea which one had actually triggered the log creation, whether it was the third one, or whether it was one of the first two on some sort of delayed basis.

So I deleted the newly created log file and tried to start over, assuming it would simply be regenerated in due course, in order to determine which copy of logheaders.php, and in which location, was actually responsible for generating the log file I had successfully generated previously.

So far, trying all combinations, nothing. The magic is gone.

Unless, of course, there is some controlling clock genie somewhere which has already ticked off today's log file creation and sees no reason for another one until the day/clock rolls over.

Or something else.

I don't generally believe in petulant software, and from accounts hereabouts this should not be this finicky.

What obviousness am I missing?

Thanks,

James

TorontoBoy

10:03 pm on Oct 12, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



is this setup somehow limited to once per day/clock period?

No. If you made an error in setup, such as a PHP include pointing to a logheaders.php that cannot be found, it will be logged in your recent errors file, available in cPanel. Logging is immediate.

Even if you did everything properly, you still need to wait for someone to visit your site. After you inserted the code, and put the logheaders.php in the proper place, did you visit your site? If no one visits your site right after you put your setup together, there will be nothing to log.

If you delete the log for the day, or rename it, logheaders.php checks for the file and, if it does not find it, recreates it. No worries.

JamesSC

11:18 pm on Oct 12, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



Okay, so it should be working.

Let me first check that I have my includes code right.

I had noticed above that you and Lucy24 differed on whether there was a space between include and the opening parenthesis and whether there was a semicolon before the closing ?>. A search online for PHP include/require coding seems to show both, so that my includes code currently is

<?php include ($_SERVER['DOCUMENT_ROOT'] . '/headerlogs/logheaders.php') ;?>


Is this in fact correct, or should one or more elements (space, semicolon) be removed or changed? When I successfully generated the log file initially I may have toggled the space following include (and of course I can't say which was operational when success struck), but I can definitely say the closing semicolon wasn't included.

My error logs are also currently warning me that no file is found at /user/example.com/headerlogs/logheaders.php, although when I had successfully generated the log file initially I had not included logheaders.php within a /headerlogs/ directory under /example.com/; logheaders.php was instead simply copied naked into /example.com/.

Following what you say, I suppose my situation now is making sure I have the includes code right - including whether another directory like /user/ or /user/example.com/ should precede /headerlogs/logheaders.php. Once that variable is solved, it may just become a matter of where /headerlogs/logheaders.php is located, using my error log as a guide.

justpassing

11:39 pm on Oct 12, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



Sorry to butt in.
whether there was a space between include and the opening parenthesis

It doesn't matter at all. Since "include" is a language construct, it doesn't require parentheses (same as "echo").

Also, you should always use "require" instead of "include", this is better coding practice.

"include" should only be used when there is a possibility the file is missing, and that it wouldn't impact the script.

lucy24

12:59 am on Oct 13, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Also, you should always use "require" instead of "include", this is better coding practice.
Huh. I would have thought the opposite: that you only use “require” when there is other stuff in the script that won’t work properly if the part-to-be-included has gone missing. I see it like a clause in a contract: Require means if this one clause is invalid, the whole contract is void, while Include means the rest of the contract still applies, so good luck trying to weasel out of it ;)

If you're doing a php include, you need either the full physical filepath or $_SERVER['DOCUMENT_ROOT'] -- whichever is appropriate for your situation. If it's an SSI, use “include virtual” with a path starting in / which is identical to the document-root business. If you're using a CMS and you're worried that something will get rewritten, you could always throw in a line like
RewriteRule ^includes/logheaders\.php$ - [L]
(using, ahem, the actual physical URL of the logheaders file) before the CMS stuff. It shouldn't matter, though, since your average CMS ignores requests for files that physically exist on the server.
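
As a sketch of where such a line would sit, assuming the standard WordPress rewrite block and keeping the placeholder path from the line above. A PHP include that uses a filesystem path is read straight from disk and never passes through mod_rewrite, so this only matters if the script is requested by URL.

```apache
RewriteEngine On

# Stop rewriting for the logging script before the CMS rules run.
RewriteRule ^includes/logheaders\.php$ - [L]

# BEGIN WordPress
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
# END WordPress
```

The two RewriteCond lines are what make the CMS skip files and directories that physically exist, which is why the extra rule is usually redundant.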

My personal coding style, whether html or css or php, is to use spaces anywhere that a space is optional. In most cases, it makes no difference; it's just for the look of the thing.

TorontoBoy

3:32 am on Oct 13, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



@justpassing include vs require is interesting. require would be useful when debugging, since you would get first-hand confirmation of whether your code worked, but on a live site human visitors would also see a broken page.

Include would generate a warning error in your error log, but then still render the page. This seems better to me.

Are there other benefits to require? Thanks for the suggestion.

JamesSC

4:25 am on Oct 13, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



I think I've got it now.

@justpassing, in this instance Lucy24 and TorontoBoy are right; this was a thing I'm just trying out which wasn't yet worth breaking anything over if I couldn't get it right.

The key turned out to be listening to the warnings in my error log: the system was looking for the file-within-the-directory one level deeper than I had placed it. When I put it there, everything worked, and I was able to test it by tweaking it back and forth.

Thanks additionally to Lucy24 for the alternate formulation to $_SERVER['DOCUMENT_ROOT']. My next move is to try that alternative in my includes to see if I can position the directory where I wanted it in the first place.

Again, thanks to the help from everyone who contributed.