Forum Moderators: phranque

Message Too Old, No Replies

logging issue behind proxy

X-Forwarded-For not recognized?

         

LifeinAsia

6:43 pm on Apr 7, 2016 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I have a pound running on 192.168.1.12 as a proxy server. Most requests will hit the web server through the proxy, but sometimes I will be bypassing the proxy. I want the actual client IP to be logged.

Here's the relevant part of apache2.conf (version 2.2):
LogFormat "c %{X-Forwarded-For}i %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "p %{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" proxy_log
SetEnvIf X-Forwarded-For 192.168.1.12 proxfor
CustomLog "access" combined env=!proxfor
CustomLog "access" proxy_log env=proxfor
(Note: the initial c/p is just for testing so I can tell at a glance which format is being used to log the request).

Bypassing the proxy gives:"c - 192.168.1.100 - -..."
Going through the proxy gives:"c 192.168.1.100 192.168.1.12 - - ..."

What am I doing wrong?

lucy24

7:46 pm on Apr 7, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Isn't that backward? If 192.168.etcetera is your proxy, then that's what would show up in the IP slot, while the "real" IP (the X-Forwarded-For) is something else. So when would the environmental variable ever get set?

Tangentially: what's "access" in quotes?

LifeinAsia

8:04 pm on Apr 7, 2016 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Originally, it was
SetEnvIf X-Forwarded-For "^.*\..*\..*\..*" proxfor
I guess I got confused when I was trying to customize it.

I also tried the following (separately):
SetEnvIf Remote_Addr 192.168.1.12 proxfor
SetEnvIf Remote_Host 192.168.1.12 proxfor
Still getting the same logging.

"access" is the name of the log file.

whitespace

8:21 pm on Apr 7, 2016 (gmt 0)

10+ Year Member Top Contributors Of The Month



I agree with lucy24, the proxy server IP you are seeing in the logs would seem to be the result of the %h format string. (I would expect the X-Forwarded-For header to be a comma+space separated list, with the last (proxy server) IP omitted.)

So, I would have perhaps expected the following to work:


SetEnvIf Remote_Addr 192\.168\.1\.12 proxfor


Can you confirm that the "proxfor" environment variable is being set (or not). Maybe it's the CustomLog directives that are failing to check this correctly?

LifeinAsia

8:39 pm on Apr 7, 2016 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I've tried
SetEnvIf Remote_Addr 192\.168\.1\.12 proxfor
SetEnvIf Remote_Addr "192\.168\.1\.12" proxfor
SetEnvIf Remote_Host 192.168.1.12 proxfor
Nada.

Can you confirm that the "proxfor" environment variable is being set (or not).
How else can I check this?

LifeinAsia

8:43 pm on Apr 7, 2016 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Just for fun, I reverse things and tried
CustomLog access proxy_log env=!proxfor
CustomLog access combined env=proxfor

And I'm getting the exact same results.

lucy24

8:53 pm on Apr 7, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Incidentally...
SetEnvIf X-Forwarded-For . proxfor
is all you need if you're saying "this header field exists and has content". But I, in turn, agree with whitespace that you're more likely to get the intended results if you begin with the IP ("Remote_Addr" in setenvif-speak). Otherwise you'll be getting white noise from visitors sending the X-Forwarded-For header for other reasons, like coming in from someone else's proxy or sending a completely bogus value (uncommon but attested among robots).

In the original post, there were at least three places where things could go astray. The first step is verifying that the environmental variable is getting set at all. The <!--#printenv --> command is probably easiest, since it doesn't involve any config-file work; just stick it in some made-up page created for the purpose. If you don't currently use SSIs, give your made-up page the shtml extension.

Edit:
And I'm getting the exact same results.

If the lines aren't present at all-- delete or comment them out-- where do access logs go? Is the custom logging intended to be instead of or in addition to whatever the default is?

Further edit:
Are you really allowed to specify more than one logging format for the same file? I thought the point was to maintain separate log files for different situations, and then each file has its own format.

:: detour to pore over docs ::

whitespace

9:42 pm on Apr 7, 2016 (gmt 0)

10+ Year Member Top Contributors Of The Month



How else can I check this?


Or, if using PHP:


<?php
var_dump(getenv('proxfor'));
?>


bool(false) if it's not set, or string(1) "1" if it is.

whitespace

9:46 pm on Apr 7, 2016 (gmt 0)

10+ Year Member Top Contributors Of The Month



Also, if you are including %{X-Forwarded-For}i in the LogFormat I think you'll need to enclose it in quotes - otherwise it's going to be impossible to parse.

LifeinAsia

10:04 pm on Apr 7, 2016 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Is the custom logging intended to be instead of or in addition to whatever the default is?
Instead of.

Checking the variables- proxfor IS set when going through the proxy, and NOT set when bypassing- so it is being set correctly.

you're more likely to get the intended results if you begin with the IP ("Remote_Addr" in setenvif-speak)
Agree- I've switched to using that instead for further testing:
SetEnvIf Remote_Addr "192\.168\.1\.12" proxfor

LifeinAsia

11:06 pm on Apr 7, 2016 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



So this is what I have now:
LogFormat "p %{X-Forwarded-For}i %a %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" c_proxy
LogFormat "c %{X-Forwarded-For}i %a %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
SetEnvIf Remote_Addr "192\.168\.1\.12" proxfor
CustomLog /var/log/apache2/access.log c_proxy env=!proxfor
CustomLog /var/log/apache2/access.log c_proxy env=proxfor

No matter what, it uses the combined named log format, even though the selections don't allow for it.

And if I don't specify a LogFormat called combined, I get lines of "combined" (without quotes) in the log file.

lucy24

12:03 am on Apr 8, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



if I don't specify a LogFormat called combined, I get lines of "combined" (without quotes) in the log file.

Well, that part at least seems like Apache doing what it's supposed to. If the nickname has been defined, use whatever it's the nickname for; otherwise use the nickname itself. So it recognizes that "combined" is a defined nickname-- but for some reason it's unable to parse the definition and instead has to use the one it does understand. The leading "c" is useful here, because it makes it unambiguous that the server is using something you've defined, rather than throwing up its hands and using the default.

:: detour to paste into text editor, because sometimes a monospaced font will bring new things to light ::

Nope, that looks perfectly clean.

What happens if you specify two different logfiles, like blahblah/proxy.log and blahblah/noproxy.log? Also, did you at some point try reversing the two LogFormat lines to verify that nothing changes?

%{FOOBAR}e
Oh, well that's handy. It offers yet another way to verify that your environmental variable has not only been set, but is recognized by mod_log_config. Why not throw it into both LogFormat definitions to double-check. (Err... This IS all happening on something other than a live site, right?)

whitespace

1:02 am on Apr 8, 2016 (gmt 0)

10+ Year Member Top Contributors Of The Month



...did you at some point try reversing the two LogFormat lines


Well, the LogFormat lines just above are the reverse of the lines specified in the OP.

No matter what, it uses the combined named log format, even though the selections don't allow for it.


Ok, that's getting weird... it's using the "combined" LogFormat even if you don't actually specify "combined" in any of the CustomLog directives?!

That would seem to suggest there is perhaps a CustomLog directive somewhere else in the server config? Or the server hasn't been restarted, or something....?

LifeinAsia

1:42 am on Apr 8, 2016 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



This IS all happening on something other than a live site, right?
Most definitely yes!
there is perhaps a CustomLog directive somewhere else in the server config?
Nope- searched it several times, as well as included files.
Or the server hasn't been restarted
It has been. Each time I made a change:
- upload changed apache2.conf
- restart Apache
- reload test page
- download log file
- edit log file (and get disappointed again)
Just to make sure I was doing everything on the right server, I purposely put an error in apache2.conf and did all the regular steps to make sure Apache choked when trying to restart and that the test page didn't render.

Barring some major flash of insight from someone, at this point, I'm leaning towards just routing all the connections through pound and taking out the %a variable from the logging format. But I hate having yet another "something's not working quite right, but no one knows why" issue.

And mucho thanks to both of you for helping me work through this!

lucy24

3:46 am on Apr 8, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Wait, wait, don't throw in the towel just yet. I'd really like to exclude gremlins,* if nothing else.

What happens if you remove (comment-out) all CustomLog lines? And all LogFormat lines? What happens if you include the bit in LogFormat that names the value of your environmental variable? (Just to make double-sure that prox-whatever is getting correctly logged, and that your most recently edited log format is being used.) Can you physically unplug the server? (That is, the next step beyond restarting.)

the LogFormat lines just above are the reverse of the lines specified in the OP.

Mm, so they are. Does the server simply dislike the specific names being given to the logging formats?

Now, personally I'm leaning toward the explanation that there is some completely unrelated issue--something that could not possibly ever lead to this particular set of symptoms, so nobody would even think of looking at it. It may or may not lead to a "D'oh!" moment followed by vigorously kicking yourself; it depends on what the problem turns out to be.


* The ones that introduce fresh errors into ebooks that have been proofed, re-proofed, and tested six ways from Sunday. I can generally, but not always, trick them by letting new titles marinade for a few weeks, so they get bored and introduce their last-minute errors before I go live.

LifeinAsia

4:40 pm on Apr 8, 2016 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



What happens if you remove (comment-out) all CustomLog lines?
No change
And all LogFormat lines?
All the lines in the log are nothing but the word "combined."
What happens if you include the bit in LogFormat that names the value of your environmental variable?
Yes- it's being set.
Can you physically unplug the server?
In all the testing, I was just restarting Apache. For fun, today I also tried shutting down the server and restarting. No change. And the server is actually a VM, so unplugging would mean taking some live sites offline.
Does the server simply dislike the specific names being given to the logging formats?
I tried several different names. No matter what, it seems to only want to use the one called combined.

lucy24

8:44 pm on Apr 8, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



it seems to only want to use the one called combined

What does it use if there is nothing called "combined"? Or did you mean that it only likes the form with output beginning in "c blahblah", no matter what you call it?

:: wondering if "combined" is a Reserved Word, but if so, why is it letting you redefine it? ::

Oh, wait, one last question (for now). If you set it up to write to two separate log files -- like "withproxy.log" and "withoutproxy.log" -- does it do as you ask? And do both files still use the same format? The object here is to figure out whether it's taking issue with the CustomLog or with LogFormat.

:: memo to self: play with MAMP tonight, when I'll have lots of time, and verify that things happen as they ought ::

Edit: A cursory look at MAMP's built-in config file (which is presumably very similar to the generic config file for a real Apache installation before you start editing it) confirms that yes indeedy, they do use the word "combined". So if you don't define something with the name "combined", it may still exist somewhere.

LifeinAsia

10:05 pm on Apr 8, 2016 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



What does it use if there is nothing called "combined"?
That's the case where it just prints "combined" in the log file (no actual data).
If you set it up to write to two separate log files
It ignored and continued to write everything to the default log (although it did create empty files with those names).

lucy24

1:41 am on Apr 9, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



That's the case where it just prints "combined" in the log file (no actual data).

Well, that's something at least. (i.e. it's doing what you would expect it to do, and there's no lurking line of code defining "combined" elsewhere.) I guess what I should have asked is: what happens if you change your "combined" nickname to something else, like "superlogcombo", in both CustomLog and LogFormat lines? Will it then use this new style to the exclusion of anything else, or will it revert to the default logging format?

Another question that arises after poring over MAMP's config file: is all this logging business lying loose in the config file, or is it inside a Virtual Host section that's specific to this site?

It ignored and continued to write everything to the default log (although it did create empty files with those names).

For some reason this leads to a vivid mental picture of whitespace leaping up and shouting Eureka!

We Shall See.

LifeinAsia

5:51 pm on Apr 9, 2016 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



what happens if you change your "combined" nickname to something else
That's the case where it just prints "combined" in the log file (no actual data).
is all this logging business lying loose in the config file, or is it inside a Virtual Host section that's specific to this site?
Lying loose. They are the last lines before
Include conf.d/
Include sites-enabled/

We may be getting close to an AHA moment...
I looked in sites-enabled and the default has the following line in it inside the default VirtualHost section:
CustomLog ${APACHE_LOG_DIR}/access.log combined

Commented that out and moved the new stuff there instead.
For some reason, the proxfor variable is being set even when not going through the proxy. If I change that variable name to proxfor1, but still use the variable checking to determine which log format to use (env=/=!proxfor), that part works (using the non-proxy format).

So now we're down to why it's always setting the proxfor variable. I have tried the following:
SetEnvIf Remote_Addr 192.168.1.10 proxfor
SetEnvIf Remote_Addr "192.168.1.10" proxfor
SetEnvIf Remote_Addr "192\.168\.1\.10" proxfor
(And yes, changing the .12 from the original post to .10 was intentional- I went with a different proxy and changed IPs around.)
Just for fun, I tried
SetEnvIf Remote_Addr "192.168.1.11" proxfor
proxfor was not set in this case. So good, it's working and not just setting the variable always. But the question is WHY is it setting it when the remote address is NOT .10?

Just for fun, I setup my firewall to point to the proxy server (case A) then point to the DEV server (case B) and hit it from outside the firewall. In case A, proxfor was set, but not in case B.

So it seems that the 3rd element ("192.168.1.10" in this case) is treated as a regex expression. Internally, I was hitting the DEV site from .100, and it looks like Apache was saying, "Hey, .10 looks like .100, so let's set the variable!"

I was unable to figure out a regex expression that would work properly, so I ended up changing the IP of the proxy to .26. The logging seems to work as expected now!

I would prefer to change the IP back to .10 if anyone can suggest a working regex that will pass for .10 but fail for .100. But I can live with the proxy on .26 if I have to.

lucy24

8:24 pm on Apr 9, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



That's the case where it just prints "combined" in the log file (no actual data).

But, but, splutter-- where's it getting the word "combined" from, if the CustomLog line uses some other word? I thought you meant
CustomLog blahblah combined
and then if the nickname "combined" hasn't been defined, it just prints the word, as if that itself were the logging format.

:: quick detour to MAMP to verify that this is what would happen ::

I looked in sites-enabled and the default has the following line in it inside the default VirtualHost section:
CustomLog ${APACHE_LOG_DIR}/access.log combined
I would expect things inside envelopes-- including, in this case, the VirtualHosts envelope-- to override things lying loose in config.

SetEnvIf Remote_Addr 192.168.1.10 proxfor
SetEnvIf Remote_Addr "192.168.1.10" proxfor
SetEnvIf Remote_Addr "192\.168\.1\.10" proxfor

The argument is treated as a Regular Expression, so yup, you'd need anchors fore and aft if you needed to exclude things ending in 10x. In fact opening anchors are always good with a Remote_Addr condition, so the server can get out of there faster if it doesn't match right away. ^ and $ as with any RegEx; it means "the beginning or end of the string being evaluated".

The escaping of periods is, here, a non-lethal error: it's impossible for anything other than a . to come after "192" and "168", and in the full sequence "1.10" the second character cannot be anything other than a dot. (Something beginning in, say, 121 would match up to "1.1", but would fail at "0".)

Quotation marks don't generally matter in mod_setenvif. They're most useful for protecting literal spaces (can't be escaped for some reason, though some mods do allow it).

The second element (Remote_Addr) is not a regular expression. So, for example, "Accept" means only that, and not "Accept-Language" or "Accept-Encoding" or "Accept-Charset".

LifeinAsia

10:17 pm on Apr 9, 2016 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



So you're saying
SetEnvIf Remote_Addr ^192.168.1.10$ proxfor
should pass for .10 and fail for .100?

And in testing, yes, it does seem to work! :)

Changed the proxy IP back to .10 and updated everything, and it seems like it's FINALLY good to go!

Again, thanks for all your help!