
Forum Moderators: Ocean10000 & phranque

Getting started: header-based access controls

     
8:24 pm on Jun 28, 2018 (gmt 0)

Preferred Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 483
votes: 43


How do I get started on using header-based access controls?

Currently I am on an Apache server, shared host, using an .htaccess, robots.txt and an error file, and this works pretty well. I run multiple web sites, each in its own directory, with the .htaccess in public_html, where my SetEnvIf directives cascade down to all subdirectories (the subdirectories inherit them). I regularly read my raw access log, find the bad guys and ban by UA (SetEnvIf) or IP (Deny from) in .htaccess. I do have an error file, which I review, but it reveals very little.
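In .htaccess terms, the setup described here looks roughly like the following sketch (the "BadBot" User-Agent string and the IP are placeholders, not real bans; this uses the 2.2-style Allow/Deny directives common on shared hosts of the era):

```apache
# Flag any request whose User-Agent contains "BadBot" (placeholder name)
SetEnvIf User-Agent "BadBot" bad_bot

Order Allow,Deny
Allow from all
# Ban by environment flag ...
Deny from env=bad_bot
# ... or ban by IP prefix (192.0.2. is a reserved documentation range)
Deny from 192.0.2.
```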

How do I set up header-based access controls? Can someone point me to a link or two to get me started?

Thanks All!
5:58 pm on July 31, 2018 (gmt 0)

Preferred Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 483
votes: 43


I am at the poking holes stage, looking at my log to see what is inadvertently banned, then making exceptions. This is going well. I found an "interesting" use case, where I inadvertently banned myself:

Same computer, same IP: with Firefox there was no issue, but with Opera (non-VPN) I gave myself 403s. I had to poke a hole for my IP, but I need to think of a solution to let Opera in. It seems my Opera dropped the request Accept header. And of course I can vouch that I am a human, I am pretty sure.

2018-07-31:13:35:08
URL: /example/roseytoes/
IP: 76.66.*.*
Accept-Encoding: gzip, deflate
Accept-Language: en-GB,en-US;q=0.9,en;q=0.8
Connection: keep-alive
Host: example.com
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36 OPR/54.0.2952.64
----

2018-07-31:13:35:08
URL: /favicon.ico
IP: 76.66.*.*
Accept-Encoding: gzip, deflate
Accept-Language: en-GB,en-US;q=0.9,en;q=0.8
Connection: keep-alive
Host: example.com
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36 OPR/54.0.2952.64
7:28 pm on July 31, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15181
votes: 681


It is very, very unusual for a human request to be missing the Accept: header--so unusual, in fact, that it may be a transmission glitch. Did it happen to you consistently?
8:09 pm on July 31, 2018 (gmt 0)

Preferred Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 483
votes: 43


This only happened today, once. I will try again tomorrow. I find this very disconcerting that the Opera browser would do this. Firefox and Chrome behave very well.
9:28 pm on July 31, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15181
votes: 681


It definitely isn't a universal Opera behavior; I checked some header logs before posting.
9:33 pm on July 31, 2018 (gmt 0)

Preferred Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 483
votes: 43


I will try Opera on Windows as well as check my Ubuntu 16.04 Opera. Thanks for checking.

Loaded Opera on an old Win machine. It sends request Accept headers from both Firefox and Opera. On my Ubuntu Opera it is fickle: sometimes it sends the request Accept header and sometimes not. If I toggle the VPN setting on, it sends the Accept header. Then I turn VPN off and half the time it does not send the header and I get a 403. If I shut down the Opera browser and restart, it works again. This is definitely an Opera issue. I will research it.

It is good to know that Opera might be the problem and not the request Accept header.
3:35 am on Aug 3, 2018 (gmt 0)

Preferred Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 483
votes: 43


I am noticing that many spam comment posters come with no language header. To ban these comment POSTers, would this work?
SetEnvIf Request_Method ^POST bad_post_nolang
SetEnvIf Accept-Language !^$ !bad_post_nolang
deny from env=bad_post_nolang
4:49 pm on Aug 3, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15181
votes: 681


You can also say
.
and
!.
for “has content” and “doesn’t have content” respectively.

But why would you set a nolang flag in response to a request method? Is it important to distinguish between this category and other requests that are missing the Accept-Language header?

To answer the question: Yes, the stated rules would do what you want them to do.
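Combined with the "." shorthand above, the whole block would look like this (a later post in this thread reports that the "." form is the one that worked in practice, where "!^$" did not):

```apache
# Flag every POST ...
SetEnvIf Request_Method ^POST bad_post_nolang
# ... then clear the flag whenever an Accept-Language header with content is present
SetEnvIf Accept-Language . !bad_post_nolang

Deny from env=bad_post_nolang
```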
5:49 pm on Aug 3, 2018 (gmt 0)

Preferred Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 483
votes: 43


I find many GET requests, especially from search engines, also have nolang, and this is pretty normal behaviour for a search engine. If I ban nolang this would be an issue.
2018-08-02:19:49:00
URL: /example/2016/11/21/first-snowfall-2016-2017-winter-toronto/
IP: 66.249.69.*
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: gzip,deflate,br
Connection: keep-alive
From: googlebot(at)googlebot.com
Host: example.com
If-Modified-Since: Sun, 29 Jul 2018 03:57:26 GMT
User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
X-Https: 1

I am constantly getting comment spammed by non-human entities that have nolang. This might be a hurdle for them.

Why do Russian bots have lang as English, but spam me in Russian? I cannot detect the language within the spam comment, but wish I could.

Again, thanks.
6:13 pm on Aug 3, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15181
votes: 681


If I ban nolang this would be an issue.
You can also poke holes by name.
BrowserMatch Googlebot !bad-condition !other-bad-condition
and so on. In fact, this type of hole-poking is now the biggest part of my htaccess (at rough count, over 1/3 of all directives).
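A concrete sketch of hole-poking by name, reusing the bad_post_nolang flag from earlier in the thread (the bingbot line is an added illustration, not from the posts):

```apache
# BrowserMatch is mod_setenvif shorthand for matching the User-Agent header:
# named crawlers get the blocking flag cleared
BrowserMatch Googlebot !bad_post_nolang
BrowserMatch bingbot   !bad_post_nolang
```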

Why do Russian bots have lang as English, but spam me in Russian?
They probably think setting their language to English is more likely to get them past filters--and then they have to spam in Russian because they don’t know any other language. Or because their spam is targeted at other Russian-speaking members of the forum.

I cannot detect the language within the spam comment, but wish I could.
Not even when it uses a non-Roman script? I should think it would be pretty trivial to screen posts for selected character ranges, alongside whatever screening is already in place (checking for malicious code injection attempts or similar). And then, if you have a lot of commenters who like throwing the odd Russian, Hebrew or Thai word into an otherwise legitimate English-language post, cross-check for [a-z] characters and put those few posts on hold for manual checking.
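The character-range screening suggested here can be sketched in a few lines of PHP; contains_cyrillic() is a hypothetical helper, not WordPress or Akismet code:

```php
<?php
// Sketch: screen a comment for Cyrillic characters before it goes live.
function contains_cyrillic($text) {
    // \p{Cyrillic} matches any code point in the Cyrillic script;
    // the /u modifier tells PCRE to treat the subject as UTF-8.
    return preg_match('/\p{Cyrillic}/u', $text) === 1;
}
```

A comment containing "привет" would be flagged while a plain English comment would not; the same pattern works for \p{Hebrew}, \p{Thai} and so on.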
6:25 pm on Aug 3, 2018 (gmt 0)

Preferred Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 483
votes: 43


BrowserMatch Googlebot !bad-condition !other-bad-condition
Wow, I did not know that. Hole poking by name is very convenient! My hole poking is growing.

There is no request header for the actual language used within the message, so I'll need to figure something else out. Chinese spam posters also use lang = EN but can post in Russian, Japanese or English, but rarely Chinese.
8:02 pm on Aug 3, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15181
votes: 681


Here I wasn't talking about the request, but the post content itself. Doesn't it go through any kind of filter before it goes live?
8:07 pm on Aug 3, 2018 (gmt 0)

Preferred Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 483
votes: 43


No, not really. It is WordPress, so straight into the comments spam area, caught by the Akismet filter.
8:46 pm on Aug 4, 2018 (gmt 0)

Preferred Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 483
votes: 43


BrowserMatch Googlebot !bad-condition !other-bad-condition

Do the header-field values, such as bad-condition and other-bad-condition, need to be specified before this statement? Can you poke the hole before you build the dam? Or is the order first build the dam, then poke a hole?

BrowserMatch Googlebot !bad-condition !other-bad-condition
SetEnvIf header-field-name header-field-value bad-condition
SetEnvIf header-field-name header-field-value other-bad-condition

Is this order ok?
8:58 pm on Aug 4, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15181
votes: 681


Can you poke the hole before you build the dam?
No, because it's all the same module, so commands execute in sequential order. You can only unset things that have already been set; otherwise they'll just get set all over again later on.
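With the thread's placeholder names, the working order is therefore flags set first, holes poked after:

```apache
# 1. Build the dam: set the flags
SetEnvIf header-field-name header-field-value bad-condition
SetEnvIf header-field-name header-field-value other-bad-condition

# 2. Then poke the hole: unset the flags for trusted names
BrowserMatch Googlebot !bad-condition !other-bad-condition

Deny from env=bad-condition
Deny from env=other-bad-condition
```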
9:01 pm on Aug 4, 2018 (gmt 0)

Preferred Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 483
votes: 43


Thank you!
1:36 am on Aug 5, 2018 (gmt 0)

Preferred Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 483
votes: 43


SetEnvIf Request_Method ^POST bad_post_nolang
SetEnvIf Accept-Language !^$ !bad_post_nolang

did not work and blocked all POSTs, but your
SetEnvIf Accept-Language . !bad_post_nolang

worked. I'm unsure why.
2:54 am on Aug 5, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15181
votes: 681


It's possible the server--that is, the whatever-it-is that parses Regular Expressions within mod_setenvif--doesn’t like the conjunction of ! with ^$. I don’t personally use !^$ anywhere (I just did a massive search of my local files in case I'd overlooked something), but I don't know whether that's because I once tried it and it didn't work, or just because it never occurred to me in the first place ;)
2:49 pm on Aug 6, 2018 (gmt 0)

Preferred Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 483
votes: 43


I am slowly seeing the benefits of these request headers. It looks like I can cut down on my IP bans if I can detect bots by their badly written headers. The headers contain information that is unavailable in the raw access log. You still need to read the raw access log, but the header logs are really useful.

I will monitor for a while longer. It may be possible to considerably reduce my IP ban list and rely more on SetEnvIf instead.
8:39 pm on Aug 6, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15181
votes: 681


If you are regularly studying access logs, you are already ahead of 90% (95%? 99%?) of webmasters. If, to that, you add headers ... you are so far ahead of the game, you can take a day off and go fishing.

I don't know where that came from, but I had to end the sentence in some way.
11:43 pm on Aug 6, 2018 (gmt 0)

Preferred Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 483
votes: 43


Thanks Lucy. I am hoping to learn more about bot patterns in an effort to block their new methods!
11:47 am on Oct 12, 2018 (gmt 0)

New User

joined:Oct 4, 2018
posts: 14
votes: 0


TorontoBoy,

Thanks for all of this. Because of the responses to my thread here [webmasterworld.com ] I'm trying to follow along in your footsteps here as well.

In my case I'm only trying to run one simple WordPress site on a shared server.

I'm not clear yet on where to place, first, the logging code itself (mine = logheaders.php) and, second, the includes code referencing it:

<?php include ($_SERVER['DOCUMENT_ROOT'] . "/includes/logheaders.php"); ?>


Does the logging code (logheaders.php) and the directory I place it in (/logheaders/) - if I can even do that - need to be in the same /home/user/ directory my WordPress installation is in? Within the WordPress installation directory itself? Deeper?

I initially assumed from your narrative that there was no reason a /logheaders/ directory couldn't contain both files, the operational logheaders.php and the logheaders.log it consequently creates, but the includes code above clearly seems to indicate the logheaders.php code file should reside in one of the WordPress includes directories.

But which one? The wp-admin/includes/? The wp-includes? Somewhere else, such as in my child theme functions file (you had also mentioned "deep within my theme")? So far the different placements of each I've tried from trying to follow your narrative have done no harm, but neither have they generated any apparent response at all.

I understand from this and from Lucy24's comment on my search_engine_spiders post above why I want to do what I want to do as you are doing, but because I'm still behind you in the learning curve the how of the placements of the codes is still eluding me.

Any additional clarifications of your narrative above about these two basic setup placements would be appreciated.
1:52 pm on Oct 12, 2018 (gmt 0)

Preferred Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts:483
votes: 43


The code include goes into header.php of your WP theme, right after the <body> tag. You should use a child theme, otherwise this will be wiped out when you upgrade your theme.

I have my WP in a subdir under public_html. I have /logheaders/ also as a subdir under public_html. For WP I use the logheaders.php that is inside /logheaders/. For my error 404.php, which is under public_html, I could not get it to work with the subdir, so I had to put an additional copy of logheaders.php directly into public_html.

Just to be clear, logheaders.php and the logs all go into /logheaders/, a subdir in public_html. I've tried to keep my WP install completely separate from logging, as I have multiple WP installs.

This request headers logging has helped me out a lot. It is worthwhile figuring out.
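For readers wondering what such a logger looks like inside, here is a minimal sketch. This is NOT the actual logheaders.php from this thread; the file naming, field order and formatting are assumptions modeled on the log excerpts posted above:

```php
<?php
// Minimal request-header logger: gathers the HTTP_* entries from $_SERVER
// and appends them to a per-day log file, one block per request.
$logFile = __DIR__ . '/headers-' . date('Y-m-d') . '.log';

$lines   = array(date('Y-m-d:H:i:s'));
$lines[] = 'URL: ' . (isset($_SERVER['REQUEST_URI']) ? $_SERVER['REQUEST_URI'] : '(none)');
$lines[] = 'IP: '  . (isset($_SERVER['REMOTE_ADDR']) ? $_SERVER['REMOTE_ADDR'] : '(none)');

foreach ($_SERVER as $key => $value) {
    if (strpos($key, 'HTTP_') === 0) {
        // Convert HTTP_ACCEPT_LANGUAGE back to Accept-Language
        $name    = str_replace(' ', '-', ucwords(strtolower(str_replace('_', ' ', substr($key, 5)))));
        $lines[] = $name . ': ' . $value;
    }
}

// FILE_APPEND creates the file if it is missing, which is why a deleted
// daily log simply reappears on the next visit.
file_put_contents($logFile, implode("\n", $lines) . "\n----\n", FILE_APPEND | LOCK_EX);
```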
2:48 pm on Oct 12, 2018 (gmt 0)

New User

joined:Oct 4, 2018
posts: 14
votes: 0


Hey, thanks so much.

Now to get it to work.
9:06 pm on Oct 12, 2018 (gmt 0)

New User

joined:Oct 4, 2018
posts: 14
votes: 0


Success, then self-created failure, and now maybe a curious question: is this setup somehow limited to once per day/clock period?

Here's why I ask. You know the kid who would take his father's expensive fishing reel apart and then not be able to put it back together again? Yeah, that was me.

Anyway, my directory structure is /user/example.com/wordpress files

I inserted the includes code into the body of my child theme header.php file.

I put a copy of the logging script logheaders.php with the code corrected to "/headerlogs/headers-" into /user/headerlogs/. Nothing.

I then added a second copy /user/logheaders.php. Nothing.

I then added a third copy /user/example.com/logheaders.php. Bingo! There was now a log file alongside logheaders.php in /user/headerlogs/.

But three seemed like too many copies to me, because I also had no idea which one had actually triggered the log creation, whether it was the third one, or whether it was one of the first two on some sort of delayed basis.

So I deleted the newly created log file and tried to start over, assuming it would simply be regenerated in proper course, in order to determine which copy of logheaders.php where was actually responsible for generating the log file I had successfully generated previously.

So far, trying all combinations, nothing. The magic is gone.

Unless, of course, there is some controlling clock genie somewhere which has already ticked off today's log file creation and sees no reason for another one until the day/clock rolls over.

Or something else.

I don't generally believe in petulant software, and from accounts hereabouts this should not be this finicky.

What obviousness am I missing?

Thanks,

James
10:03 pm on Oct 12, 2018 (gmt 0)

Preferred Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts:483
votes: 43


is this setup somehow limited to once per day/clock period?

No. If you made an error in setup, like the PHP include pointing at a logheaders.php that is not found, it will be logged in your recent-errors file, available in cPanel. Logging is immediate.

Even if you did everything properly, you still need to wait for someone to visit your site. After you inserted the code, and put the logheaders.php in the proper place, did you visit your site? If no one visits your site right after you put your setup together, there will be nothing to log.

If you delete the log for the day, or rename it, logheaders.php checks for the file and, if it does not find it, recreates it. No worries.
11:18 pm on Oct 12, 2018 (gmt 0)

New User

joined:Oct 4, 2018
posts: 14
votes: 0


Okay, so it should be working.

Let me first check that I have my includes code right.

I had noticed above that you and Lucy24 differed on whether there was a space between include and the opening parenthesis and whether there was a semicolon before the closing ?>. A search online for PHP include/require coding seems to show both, so that my includes code currently is

<?php include ($_SERVER['DOCUMENT_ROOT'] . '/headerlogs/logheaders.php') ;?>


Is this in fact correct, or should one or more elements (space, semicolon) be removed or changed? When I successfully generated the log file initially I may have toggled the space following include (and of course I can't say which was operational when success struck), but I can definitely say the closing semicolon wasn't included.

My error logs are also currently warning me that no file is found at /user/example.com/headerlogs/logheaders.php, although when I had successfully generated the log file initially I had not included logheaders.php within a /headerlogs/ directory under /example.com/; logheaders.php was instead simply copied naked into /example.com/.

Following what you say, I suppose my situation now is making sure I have the includes code right - including whether another directory like /user/ or /user/example.com/ should precede /headerlogs/logheaders.php. Once that variable is solved, it may just become a matter of where /headerlogs/logheaders.php is located, using my error log as a guide.
11:39 pm on Oct 12, 2018 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:Sept 13, 2018
posts:147
votes: 28


Sorry to butt in.
whether there was a space between include and the opening parenthesis

It doesn't matter at all. Since "include" is a language construct, it doesn't require parentheses (same as "echo").

Also, you should always use "require" instead of "include"; this is better coding practice.

"include" should only be used when there is a possibility the file is missing and that wouldn't impact the script.
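The behavioral difference can be seen in a tiny sketch (the file name is a deliberately non-existent placeholder):

```php
<?php
// include on a missing file: a warning (suppressed here with @), the
// construct returns false, and the script keeps running.
$ok = @include 'no_such_file_demo.php';
echo ($ok === false) ? "include failed, still running\n" : "included fine\n";

// require on a missing file would instead raise a fatal error and halt:
// require 'no_such_file_demo.php';
```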
12:59 am on Oct 13, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15181
votes: 681


Also, you should always use "require" instead of "include", this is better coding practice.
Huh. I would have thought the opposite: that you only use “require” when there is other stuff in the script that won’t work properly if the part-to-be-included has gone missing. I see it like a clause in a contract: Require means if this one clause is invalid, the whole contract is void, while Include means the rest of the contract still applies, so good luck trying to weasel out of it ;)

If you're doing a php include, you need either the full physical filepath or $_SERVER['DOCUMENT_ROOT'] -- whichever is appropriate for your situation. If it's an SSI, use “include virtual” with a path starting in / which is identical to the document-root business. If you're using a CMS and you're worried that something will get rewritten, you could always throw in a line like
RewriteRule ^includes/logheaders.php - [L]
(using, ahem, the actual physical URL of the logheaders file) before the CMS stuff. It shouldn't matter, though, since your average CMS ignores requests for files that physically exist on the server.

My personal coding style, whether html or css or php, is to use spaces anywhere that a space is optional. In most cases, it makes no difference; it's just for the look of the thing.
3:32 am on Oct 13, 2018 (gmt 0)

Preferred Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts:483
votes: 43


@justpassing: the include vs require distinction is interesting. It would be useful to use require when debugging, but on a live site human visitors would then see a broken page. You would, though, get first-hand confirmation that your code worked.

include generates a warning in your error log but still renders the page. This seems better to me.

Are there other benefits to require? Thanks for the suggestion.
4:25 am on Oct 13, 2018 (gmt 0)

New User

joined:Oct 4, 2018
posts: 14
votes: 0


I think I've got it now.

@justpassing, in this instance Lucy24 and TorontoBoy are right; this was a thing I'm just trying out which wasn't yet worth breaking anything over if I couldn't get it right.

The key turned out to be listening to the warnings in my error log: the system was looking for the file-within-the-directory one level deeper than I had placed it. When I put it there, everything worked, and I was able to test it by tweaking it back and forth.

Thanks additionally to Lucy24 for the alternate formulation to $_SERVER['DOCUMENT_ROOT']. My next move is to try that alternative in my includes to see if I can position the directory where I wanted it in the first place.

Again, thanks to the help from everyone who contributed.