Obscuring data in Apache Logs

Piped logging, logformats, conditional logging

timster

3:44 pm on Feb 26, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I am looking for the best way to obscure sensitive data written to our Apache access logs. Some personal data can be passed in query strings.

One option is to use piped logging to run the access log messages through a regular expression that blots out the sensitive data. The upside is a lot of control over what does and does not get logged. The downsides are that it adds complexity and puts some extra load on the webserver boxes.

Here is Apache's info on piped logs:

[httpd.apache.org...]

The regex might look something like this:

s/(sensitive_param=)[^&;\s"]+/$1#*\$!/g;
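
For anyone who wants to try the piped-logging route, here is a rough sketch of such a filter in Python. It is only an illustration under assumptions: sensitive_param is the placeholder name from the regex above, the REDACTED marker and the log path argument are arbitrary choices of mine, and the match is stopped at &, ;, whitespace or a double quote so the rest of the log line is left intact.

```python
import re
import sys

# "sensitive_param" is the placeholder name from the post; substitute the real one.
# The value is assumed to end at '&', ';', whitespace, or the closing quote of
# the request line in the log entry.
SENSITIVE = re.compile(r'(sensitive_param=)[^&;\s"]+')


def redact(line: str) -> str:
    """Return the log line with the sensitive value replaced by a fixed marker."""
    return SENSITIVE.sub(r"\1REDACTED", line)


def main(log_path: str) -> None:
    # Apache writes one log entry per line to the filter's stdin.
    # Line buffering keeps the file current if the process is restarted.
    with open(log_path, "a", buffering=1) as out:
        for line in sys.stdin:
            out.write(redact(line))


# Only run as a filter when a log file path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Apache would start a script like this through a CustomLog directive whose target begins with a pipe character, followed by the command to run; the script name and the log file location are up to you.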

Another option is to stop logging the querystrings entirely, by changing the Apache LogFormat. This loses some useful information, but avoids adding load to the server. Another potential upside is that it also keeps out of the logs other data that users might prefer not to have recorded.

A good discussion of this method is here:

[webmasterworld.com...]

bird suggested:
\"%{REQUEST_METHOD}e %{SCRIPT_NAME}e %{SERVER_PROTOCOL}e\"

A third option I considered was conditional logging. Although it would be possible to accomplish the goal this way, it became clear that it is not the appropriate tool for the problem: it is meant for skipping entire log entries, not for editing parts of them.

[httpd.apache.org...]

Any feedback is welcome.

jdMorgan

4:08 pm on Feb 26, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'd suggest backing up a step or two, and considering the bigger picture: "Sensitive data" does not belong in query strings. While you are focused on preventing people who already have access to the server from seeing this data, you're apparently neglecting other concerns such as a "man in the middle" capturing this data as it passes through various ISPs and network routers, etc. And what about un-secured wireless internet connections?

My recommendation would be to POST this data to an SSL-secured page, and then generate a single-session pseudo-random lookup key to put in the query string so that you can associate subsequent user transactions with that previously-posted "sensitive data" and can thereby track the user's session.
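
As a rough illustration of the lookup-key idea, here is a minimal Python sketch. The function names and the in-memory dictionary are hypothetical stand-ins of mine; a real site would keep this in its session backend, expire the entries, and serve everything over SSL.

```python
import secrets
from typing import Dict, Optional

# Illustration only: an in-memory store. A real site would use its session
# backend, with expiry, instead of a module-level dictionary.
_sessions: Dict[str, dict] = {}


def store_sensitive(data: dict) -> str:
    """Keep the POSTed data server-side and return a random lookup key."""
    key = secrets.token_urlsafe(32)  # unguessable, safe to place in a URL
    _sessions[key] = data
    return key


def lookup(key: str) -> Optional[dict]:
    """Return the data stored under the key, or None if the key is unknown."""
    return _sessions.get(key)
```

Only the key would then appear in the query string and in the access log; the sensitive values themselves stay on the server.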

Alternatively, if for some reason you can't implement that, you should at least consider munging (re-ordering characters and then encrypting or encoding) the sensitive data, so that it cannot be read as clear text anywhere in the transmission path.

Webmasters are responsible for their visitors' security. This is an ethical responsibility in all cases, but a legal responsibility in many cases -- and especially in the EU. I suggest that you balance the cost and complexity of improving your site's security against the very real possibility of future legal costs...

Jim

timster

7:21 pm on Feb 26, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Jim, thanks for the concern. I didn't mention the data is being sent via HTTPS, and we require strong encryption in the browser, so the query strings are pretty well protected on the wire. Plus lawyers and security pros are making sure we're in compliance, so I'm pretty confident I'm attacking the right problem.