Forum Moderators: phranque

README.html not logged?

         

Dan99

2:29 am on Mar 12, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



So when I offer a link to a directory, I sometimes include a README.html file in that directory, the contents of which Apache very conveniently appends to the directory list. When I GET that directory, the listing of the directory contents pops up (along with all the boilerplate icon gifs) with the README.html file displayed below. OK, fine. But what has me puzzled is that while that README.html file is always properly served and displayed, it never shows up in the Apache access_log as having been served. Huh? I see a GET for the directory in the logs, and GETs for all the icons there as well. But not for the README.html file. It's as if the Apache log is unaware that file was ever served.

Strikes me as a bit weird. Nothing wrong. Just weird. Why would that file not be served in response to a loggable GET?

phranque

3:58 am on Mar 12, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



are you sure it isn't a browser cache issue?

have you tried the Live HTTP Headers FF plugin to see if that reveals anything interesting?

lucy24

4:00 am on Mar 12, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Why would that file not be served in response to a loggable GET?

Is it not getting served or not getting logged? If you're asking purely about times when the file content is displayed alongside a directory index, that would never show up as a GET request anyway. Internal requests aren't included in access logs. Similarly if you have SSIs, logs will never list them by name. The directory index itself is generated on the fly by the server (mod_autoindex, in a stock Apache setup) and isn't logged under its own name either. (This is one of many, many things I discovered purely by accident.)

In addition, you can tell Apache to exclude certain files from logging. But on rereading, I don't think that was your issue.

:: wandering off in search of further enlightenment ::

Oh, oops, that was easier than I thought.

[httpd.apache.org...]

The whole thing really is closely analogous to indexing-in-general. When you request a directory that does have an index.html file, and the request is properly formed, your logs will never show a request for /directory/index.html. Only for /directory/. mod_dir and mod_autoindex take care of the rest behind the scenes.
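To make the "behind the scenes" part concrete, here is a toy model in Python. It is not Apache's code, and the directory name, file contents and function names are all made up for illustration. The only point it tries to show is that the log entry is written where the client's request arrives, while the README is fetched by the server itself and so never passes through that spot.

```python
# Toy model only: NOT Apache's implementation, just an illustration of the idea.
access_log = []  # stands in for the access_log file

# Hypothetical directory contents, invented for this sketch.
FILES = {
    "/stuff/README.html": "<p>Notes about this directory.</p>",
    "/stuff/a.txt": "hello",
}


def read_internal(path):
    """Internal fetch: the server reads the file itself, so nothing is logged."""
    return FILES.get(path, "")


def build_listing(directory):
    """Build a listing and append README.html, roughly as an autoindex would."""
    names = sorted(p[len(directory):] for p in FILES if p.startswith(directory))
    rows = "".join(f'<li><a href="{n}">{n}</a></li>' for n in names)
    readme = read_internal(directory + "README.html")
    return f"<ul>{rows}</ul>{readme}"


def handle_client_request(path):
    """Entry point for requests that arrive from a client: these are logged."""
    access_log.append(f"GET {path}")
    if path.endswith("/"):
        return build_listing(path)
    return FILES.get(path, "404")


page = handle_client_request("/stuff/")
print(access_log)
```

In this sketch the README text ends up in `page`, but the log holds a single entry for `/stuff/`. Calling `handle_client_request("/stuff/README.html")` directly would add a second entry, which matches what happens when the file is requested by name.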

Dan99

1:15 pm on Mar 12, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



The README.html file is getting served, but that service is not getting logged. I've certainly never told my Apache not to log them. But you say that "internal requests aren't included in access logs". Really? So the answer is basically, "because that's the way it is". Yes, I think it's also true that when one of my index.html files is served, that service isn't logged.

I have to wonder then, what defines an "internal request"? Is that a request that wasn't made explicitly, but is just tacked on to another request?

This certainly isn't a big deal. Just a curiosity.

lucy24

7:13 pm on Mar 12, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Is that a request that wasn't made explicitly, but is just tacked on to another request?

Yes, exactly. It's any request that was not sent in by the browser (or Googlebot or Ukrainian scraper or...). In mod_rewrite terms, it's anything that doesn't show up in %{THE_REQUEST}, which holds only the request line the client originally sent.

Just make sure you distinguish between browser requests and human-user requests. When you type in the name of a web page, all the stylesheets and pictures come along for the ride-- along with assorted scripts that you don't even know about. You didn't have to type in all their names separately. But even though the human didn't have to take any further action, the browser is sending in all those requests. On the other hand if you've got something like
/includes/navfooter.html
as an SSI or php include, that file will not be logged, because the browser doesn't know it's a separate file.

Now, if you manually type into your browser's address bar the exact path to your README.html file, then that will show up in logs, because you're explicitly requesting it.

Dan99

7:24 pm on Mar 12, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



That's fascinating. Thank you.

Of course, as I said, when someone requests a folder, and is presented with the folder listing index, there is no human-request for the gif icons that go with that listing -- back.gif, blank.gif, layout.gif, etc., but those are sent anyway by the server. Serves of those gifs ARE logged. So that's an example of serves that are logged that the human client didn't explicitly request.

lucy24

8:36 pm on Mar 12, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



So that's an example of serves that are logged that the human client didn't explicitly request.

It isn't whether the human requested them, it's whether the browser requested them. Anything represented in the html with <img src blahblah> or <link rel="stylesheet" blahblah> will be separately requested by the browser, even though the human user doesn't know it's happening. If you go to one of your auto-index pages in your browser and View Source, you'll see the image links.
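If you'd rather not eyeball View Source, a rough Python sketch along these lines pulls out the URLs a browser would go on to request by itself. The sample markup is a hypothetical stand-in for an auto-index page (the icon paths and README text are assumptions), not a copy of real Apache output.

```python
from html.parser import HTMLParser


class SubresourceFinder(HTMLParser):
    """Collect URLs a browser would fetch automatically: images and stylesheets."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            self.urls.append(attrs["src"])
        elif tag == "link" and attrs.get("href"):
            # Only stylesheet links are counted here; other rel values are ignored.
            if "stylesheet" in (attrs.get("rel") or "").lower():
                self.urls.append(attrs["href"])


# Hypothetical auto-index style markup, made up for this sketch.
SAMPLE = """
<html><body>
<img src="/icons/blank.gif" alt="[ICO]">
<img src="/icons/back.gif" alt="[DIR]"> <a href="/">Parent Directory</a>
<p>Text from README.html, already inlined by the server.</p>
</body></html>
"""

finder = SubresourceFinder()
finder.feed(SAMPLE)
print(finder.urls)
```

The two icon URLs come out because they sit in `<img src>` attributes. The README text produces nothing, since by the time the page reaches the browser it is just part of the HTML and there is no separate URL to ask for.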

Dan99

9:32 pm on Mar 12, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



OK, now I understand what you meant about distinguishing human requests from browser requests. Not that one will be logged and the other not. They both are. But it's the self-generated requests from the server (as in, sending the README.html) that are not logged. That is, what are logged are just requests from the client, and those will include icons that are represented in the html.

Very interesting.