This isn't a technical question, as I'm already experienced with GET and POST programming. It's more a question about possible policy issues.
EPIC is interested because 1) European legislators are talking about mandatory data retention for sysadmins, and 2) I pointed out that the QUERY_STRING in httpd logging is vulnerable to new language in the USA Patriot Act. This affects virtually all logging by search engines, and logging by dynamic sites with local search options.
I recommended that they approach Apache and ask them to add a compilation switch to their logging modules that would truncate the log line at the question mark, thereby leaving off the QUERY_STRING in the log line. This QUERY_STRING, when used by search engines, contains search terms and therefore has privacy implications.
By allowing a compilation switch in the logging module, sysadmins would be able to keep this information out of httpd logging. Since we're talking about dynamic page generation in most cases, programmers still have the option of saving whatever interests them in their own CGI logs. And the switch itself, of course, is an option during installation.
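To make the idea concrete, here is a rough Python sketch of what "truncate at the question mark" would do to a request line. This is only an illustration of the behavior, not Apache code; the function name strip_query is my own, and the sample request is the generic search.cgi one.

```python
def strip_query(request_line: str) -> str:
    """Drop the query string from a request line such as
    'GET /path?args HTTP/1.1', keeping method, path and protocol."""
    parts = request_line.split(" ")
    if len(parts) != 3:
        # Unusual or malformed request line: just cut at the first '?'.
        return request_line.split("?", 1)[0]
    method, target, protocol = parts
    path = target.split("?", 1)[0]  # everything before the first '?'
    return f"{method} {path} {protocol}"


print(strip_query("GET /cgi-bin/search.cgi?argument=value HTTP/1.1"))
# prints: GET /cgi-bin/search.cgi HTTP/1.1
```

The search terms never reach the log line, while the method, script path and protocol are still recorded.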
Does anyone know who should be approached at Apache about this situation? It's a no-brainer and very simple to do from a programming perspective, but this is more of a policy question. I have no experience with the good folks at Apache.
Another area of interest is software used by librarians. The feds are scooping up data about borrowing habits of library users, and it's illegal for the librarian to even mention that the FBI requested the information. Therefore, the American Library Association is talking about the need to destroy library records as soon as practicable.
Library terminal software and book checkout software ought to have options to customize record keeping and data retention. Does anyone have any familiarity with software used in libraries?
Your question doesn't require me to understand, but I am curious. The Apache option could easily be circumvented even if they provided it; it's open source, after all. So is the idea to create a method to disable logging of post data, to make it reasonable to comply with such a requirement, and simply to reduce the number of places where private information might be stored?
The Apache option would be for sysadmins who prefer not to have the QUERY_STRING in the logs. If you don't have the information in the first place, no legal authority can compel you to produce it. The option is not designed to make it easy to comply with a potential legal requirement, but rather to make it easy to argue that any given legal order is irrelevant with respect to this information, because you haven't been retaining this information in the first place. You tell the judge that such information does not exist, or is not under your control if it does exist elsewhere.
Obviously, if a sysadmin wants to log this information, then that's their right. They merely keep their logs the same way they do now. This is an extra option for sysadmins who prefer smaller logs, and less vulnerability to demands for logging information of a sensitive nature.
QUERY_STRING logging is superfluous because anyone running CGI scripts can easily write their own customized logs in the process of executing that script. Many sysadmins might prefer less bulky logs simply for their own sake.
With the script writer responsible for his own logging of QUERY_STRING data, and an ISP that does not log this information with standard httpd logging, the legal responsibility for complying with a court order for this information would fall only on the script writer, and only if the script writer chooses to retain the information in the form of his own custom logging.
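As a sketch of what that custom logging might look like, a CGI script can read QUERY_STRING from its environment and append whatever it cares about to its own file. The function name log_search, the log path, and the parameter name "argument" are all assumptions for illustration only.

```python
import time
from urllib.parse import parse_qs


def log_search(environ, log_path):
    """Append this request's search terms to the script's own log file.

    `environ` is the CGI environment (normally os.environ); `log_path`
    is a hypothetical file that the script writer controls.
    """
    query = environ.get("QUERY_STRING", "")
    params = parse_qs(query)
    # "argument" is an assumed parameter name, used here for illustration.
    terms = " ".join(params.get("argument", []))
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"{stamp}\t{terms}\n")
    return terms
```

A script would call something like log_search(os.environ, "search.log") before generating its page. Whether to keep that file, and for how long, is then entirely the script writer's decision.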
Without the option to install logging modules that pre-truncate the QUERY_STRING, the sysadmin who wants to rid his logs of this data must take positive steps to filter his logs. In other words, delete the data. This would have to be done as part of a regular program of maintenance (such as a log rotation program). If done after receiving a court order, it would be destruction of evidence. Therefore, it's always preferable that the evidence not exist in the first place. That's the purpose of having an installation switch for compiling log modules that truncate this data.
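For comparison, the filtering workaround might look roughly like the Python sketch below, which rewrites log lines so that the quoted request field loses its query string. It assumes common or combined format lines in which the request is the first quoted field. The names scrub_line and scrub_stream are hypothetical, and the Referer field is deliberately left alone even though it can also carry search terms.

```python
import re

# Matches from the start of the line through the first quoted field,
# e.g. ... "GET /path?args HTTP/1.1", capturing the parts to keep.
REQUEST_FIELD = re.compile(
    r'^([^"]*")([A-Z]+ [^ "?]*)\?[^ "]*((?: [^"]*)?")'
)


def scrub_line(line):
    """Return the log line with the query string removed from the request."""
    return REQUEST_FIELD.sub(r"\1\2\3", line, count=1)


def scrub_stream(src, dst):
    """Copy log lines from src to dst, scrubbing each one on the way."""
    for line in src:
        dst.write(scrub_line(line))
```

Run as part of log rotation, something like scrub_stream(old_log, new_log) would do the job on files already written. If the same filter were fed through Apache's piped-log mechanism instead, the query strings would never be written to disk at all, which is closer to the goal described above.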
With the LogFormat directive, you can define as many formats as you like, and with the CustomLog directive you can apply them to different situations (domains, protocols, etc.). Out of the box, the logs will usually be set to the "common" format (of NCSA heritage), or to the "combined" format, which also includes the referer and user agent strings:
"%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\""
What you seem to be interested in is the \"%r\" part. There is no single directive that gives you the same thing without the CGI parameters, but it is easy to build the shortened string from other pieces.
The \"%r\" is equivalent to:
\"%{REQUEST_METHOD}e %{REQUEST_URI}e %{SERVER_PROTOCOL}e\"
and results in this log entry:
"GET /cgi-bin/search.cgi?argument=value HTTP/1.1"
If you want to remove the CGI parameters, then there are two cases.
The normal situation would look like this:
\"%{REQUEST_METHOD}e %{SCRIPT_NAME}e %{SERVER_PROTOCOL}e\"
which results in this log entry (for the same request as above):
"GET /cgi-bin/search.cgi HTTP/1.1"
If you redirected from the "normal" URL (e.g. "http://example.com/search.html?argument=value") to your CGI script, but would like to see the requested URL in the logs (instead of the redirected one), then you'll need this:
\"%{REQUEST_METHOD}e %{REDIRECT_URL}e %{SERVER_PROTOCOL}e\"
which will result in this log entry:
"GET /search.html HTTP/1.1"
I think what I will advise EPIC to do is to ask Apache to change their "out of the box" logging to truncate the CGI parameters, and then those sysadmins who need this stuff can go in and mess with the config files.
As we all know, having learned it from Microsoft, the key to hegemony is power over the "default" setting.