One option is to use piped logging to run each access log entry through a regular expression that blots out the sensitive data. The upside is fine-grained control over what does and does not get logged. The downsides are added complexity and some extra load on the webserver boxes.
Here is Apache's info on piped logs:
[httpd.apache.org...]
The regex might look something like this:
s/(sensitive_param=)[^&;\s]+/$1#*\$!/g;
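For anyone who would rather not run Perl, here is a minimal sketch of the same filter in Python. It is only an illustration: the parameter name `sensitive_param` and the mask come from the regex above, while the script name and log path in the usage note are hypothetical. It reads log lines on stdin (which is how Apache feeds a piped logger) and appends the masked lines to the file named on the command line.

```python
#!/usr/bin/env python3
"""Piped-log filter sketch: mask one query-string parameter before writing."""
import re
import sys

# Parameter name taken from the regex above; change it to suit your site.
SENSITIVE_PARAM = "sensitive_param"
MASK = "#*$!"

# Match "name=value", where the value runs up to the next "&", ";",
# double quote, or whitespace. The lookbehind avoids matching a longer
# parameter name that merely ends in SENSITIVE_PARAM.
_PATTERN = re.compile(
    r"(?<![\w-])(" + re.escape(SENSITIVE_PARAM) + r"=)[^&;\s\"]+"
)


def redact(line: str) -> str:
    """Return the log line with the sensitive value replaced by MASK."""
    return _PATTERN.sub(lambda m: m.group(1) + MASK, line)


def main(log_path: str) -> None:
    # Replace undecodable bytes rather than letting the logger crash.
    sys.stdin.reconfigure(errors="replace")
    # Line-buffered append so each entry reaches the file promptly.
    with open(log_path, "a", buffering=1, encoding="utf-8") as out:
        for line in sys.stdin:
            out.write(redact(line))


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Assuming the script were saved as /usr/local/bin/redact_log.py (a made-up path), it could be hooked up with something like CustomLog "|/usr/local/bin/redact_log.py /var/log/apache2/access_log" combined. Test it on a copy of a real log first, since the pattern only masks the one named parameter.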
Another option is to stop logging query strings entirely by changing the Apache LogFormat. This might hide some useful information, but it avoids adding load to the server. Another potential upside is that it also hides other data that users might prefer not to have recorded in the logs.
A good discussion of this method is here:
[webmasterworld.com...]
bird suggested:
\"%{REQUEST_METHOD}e %{SCRIPT_NAME}e %{SERVER_PROTOCOL}e\"
A third option considered was conditional logging. Although it would be possible to accomplish the goal this way, it became clear that it is not the appropriate tool for the problem. It is meant for skipping entire log entries, not for masking part of one.
[httpd.apache.org...]
Any feedback is welcome.
My recommendation would be to POST this data to an SSL-secured page, and then generate a single-session pseudo-random lookup key to put in the query string so that you can associate subsequent user transactions with that previously-posted "sensitive data" and can thereby track the user's session.
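A rough sketch of the lookup-key idea, in Python for illustration only. The function names and the in-memory dictionary are hypothetical. A real site would keep this in whatever server-side session storage it already uses, and would expire entries when the session ends.

```python
import secrets
from typing import Dict, Optional

# Illustration only: an in-memory store standing in for real
# server-side session storage.
_sessions: Dict[str, dict] = {}


def store_sensitive(posted: dict) -> str:
    """Keep the POSTed data server-side and return an opaque lookup key.

    Only the key goes into the query string, so the access log
    records the key and never the sensitive values themselves.
    """
    key = secrets.token_urlsafe(32)  # cryptographically strong random token
    _sessions[key] = dict(posted)
    return key


def lookup(key: str) -> Optional[dict]:
    """Return the data stored under a key, or None if the key is unknown."""
    return _sessions.get(key)


def end_session(key: str) -> None:
    """Discard the stored data so the key is useless after the session."""
    _sessions.pop(key, None)
```

The key still shows up in the logs, but it means nothing outside the one session it was issued for, and nothing at all once end_session has been called.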
Alternatively, if for some reason you can't implement that, you should at least consider munging (re-ordering characters and then encrypting or encoding) the sensitive data, so that it cannot be read as clear text anywhere in the transmission path.
Webmasters are responsible for their visitors' security. This is an ethical responsibility in all cases, but a legal responsibility in many cases -- and especially in the EU. I suggest that you balance the cost and complexity of improving your site's security against the very real possibility of future legal costs...
Jim