Logfile Tools and Partial Downloads (200 vs 206 status code)
Any help on logfile analysis tools that accurately report pdf downloads?
I apologize if this question has been asked before; I have searched but have not found an answer yet.
I work for an organization that has many pdf files on its website and we need to track how often these pdfs are downloaded. We recently realized that when large pdf files are downloaded, the client software will issue a download request and then a series of partial download requests. In the webserver logs this shows as a single entry with status of 200 (successful download) and then a series of 206 (partial download) codes.
The problem is that our current logfile analysis tool (Summary.net) treats them all as requests for the file, thus greatly inflating the number of actual downloads. For example, on a recent day a specific document was listed as being "downloaded" 135 times. However, when viewing the logfile it was clear it was only downloaded three times; the other 132 were "partial download" entries.
So my question to the forum is: does anyone know of logfile analysis tools that correctly report downloads (e.g. by ignoring the 206 status codes when they are linked to a successful 200 code)? I have been to MANY analysis tool websites (e.g. AWStats, others) but have not found any that specifically claim to deal with, or even acknowledge, this problem.
We have considered retooling our website to write to a database when a file is requested, and other such workarounds, but this seems like a problem logfile analysis tools should be able to handle.
Wow, I see that in my logs for PDF files.
I just thought those were unsuccessful attempts.
My PDFs tend to be large, about 5 MB, and I scan the logs after I've separated out bots (a program does this automatically).
I tend to assume a PDF downloaded OK if the logged bytes reach the full size of the file.
I write logs by IP plus time to a file.
I may modify this based on your comment:
Probably IP plus date plus hour plus file name accessed.
(so I'd update the record if it exists instead of writing a new SQL record, by first trying to retrieve the record)
That sounds like it would give accurate totals.
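To make the "IP plus date plus hour plus file name" idea concrete, here is a rough Python sketch. It is only an illustration under assumptions of mine: the log is in Apache-style Common or Combined Log Format, an in-memory set stands in for the SQL table, and the function name and sample lines are made up for the example.

```python
import re
from collections import Counter

# Assumed layout (Common/Combined Log Format):
# host ident user [time] "METHOD path protocol" status bytes ...
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)

def unique_downloads(lines, suffix=".pdf"):
    """Count one download per (IP, date+hour, file), whether logged as 200 or 206."""
    seen = set()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or m.group("status") not in ("200", "206"):
            continue
        path = m.group("path").split("?", 1)[0]  # drop any query string
        if not path.lower().endswith(suffix):
            continue
        # "10/Oct/2005:13:55:36 -0700"[:14] == "10/Oct/2005:13" (date plus hour)
        hour_bucket = m.group("time")[:14]
        seen.add((m.group("host"), hour_bucket, path))
    # Each distinct key counts once, like "update the record if it exists".
    return Counter(path for _host, _hour, path in seen)

# Made-up sample lines, not taken from any real log.
sample = [
    '192.0.2.1 - - [10/Oct/2005:13:55:36 -0700] "GET /docs/report.pdf HTTP/1.1" 200 524288',
    '192.0.2.1 - - [10/Oct/2005:13:55:37 -0700] "GET /docs/report.pdf HTTP/1.1" 206 32768',
    '192.0.2.1 - - [10/Oct/2005:13:56:02 -0700] "GET /docs/report.pdf HTTP/1.1" 206 32768',
    '192.0.2.7 - - [10/Oct/2005:14:01:10 -0700] "GET /docs/report.pdf HTTP/1.1" 206 65536',
    '192.0.2.1 - - [10/Oct/2005:15:10:00 -0700] "GET /docs/report.pdf HTTP/1.1" 200 524288',
]

totals = unique_downloads(sample)
print(totals)
```

Two caveats with this approach: a download that straddles an hour boundary is counted twice, and several people behind the same proxy or NAT address who fetch the same file in the same hour are counted once.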
I also seem to have a problem where some users never let the download complete. At least the log entries don't add up to the file's byte count, which might make sense: what they want is in my PDF but on just one page, and they don't want to download all the other pages to get to it; perhaps they were expecting just a small PDF.
The first piece of the pdf is a 200 status code in your logs and the rest are 206's, so if you can filter out 206's in your stats program, what is left is the number of pdf download starts. Or, if your stats program provides "visits" numbers, then use that number instead of hits or files. If your stats program won't do either of these, then going through the logs manually is your main choice.
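If it does come down to going through the logs yourself, the 200-only filter is easy to script. Here is a minimal Python sketch, assuming Apache-style Common or Combined Log Format; the function name and the sample lines are invented for illustration and are not from any particular stats package.

```python
import re
from collections import Counter

# Assumed layout (Common/Combined Log Format):
# host ident user [time] "METHOD path protocol" status bytes ...
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)

def count_download_starts(lines, suffix=".pdf"):
    """Count GET requests answered with 200 per file; 206 entries are ignored."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that do not look like log entries
        path = m.group("path").split("?", 1)[0]  # drop any query string
        if (m.group("method") == "GET"
                and m.group("status") == "200"
                and path.lower().endswith(suffix)):
            counts[path] += 1
    return counts

# Made-up sample lines: two download starts plus three partial requests.
sample = [
    '192.0.2.1 - - [10/Oct/2005:13:55:36 -0700] "GET /docs/report.pdf HTTP/1.1" 200 524288',
    '192.0.2.1 - - [10/Oct/2005:13:55:37 -0700] "GET /docs/report.pdf HTTP/1.1" 206 32768',
    '192.0.2.1 - - [10/Oct/2005:13:55:38 -0700] "GET /docs/report.pdf HTTP/1.1" 206 32768',
    '192.0.2.9 - - [10/Oct/2005:14:02:11 -0700] "GET /docs/report.pdf HTTP/1.1" 200 524288',
    '192.0.2.9 - - [10/Oct/2005:14:02:12 -0700] "GET /docs/report.pdf HTTP/1.1" 206 65536',
    '192.0.2.9 - - [10/Oct/2005:14:03:00 -0700] "GET /index.html HTTP/1.1" 200 2048',
]

starts = count_download_starts(sample)
print(starts)
```

To run it on a real file, pass the open file object instead of the list, e.g. `count_download_starts(open("access.log"))`, where `access.log` is a placeholder name. Note that this counts download starts only; it says nothing about whether the download finished.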
Mega, there's a lot of compression and stuff going on with those pdf download manager programs and the total file sizes will have little to do with the actual file size. Furthermore, the number of pieces a pdf is broken into will vary, for the same pdf.
cgrantski: yes I concur with your analysis.
So the question is does anyone know of logfile analysis programs that HAVE the functionality to filter out the 206 codes?