I apologize if this question has been asked before; I have searched but have not found an answer yet.
I work for an organization that has many pdf files on its website and we need to track how often these pdfs are downloaded. We recently realized that when large pdf files are downloaded, the client software will issue a download request and then a series of partial download requests. In the webserver logs this shows as a single entry with status of 200 (successful download) and then a series of 206 (partial download) codes.
The problem is that our current logfile analysis tool (Summary.net) treats them all as requests for the file, greatly inflating the number of downloads it reports. For example, on a recent day a specific document was listed as being "downloaded" 135 times. However, when viewing the logfile it was clear it was only downloaded three times; the other 132 were "partial download" entries.
So my question to the forum is does anyone know of logfile analysis tools that correctly report downloads (e.g. ignore the 206 status codes when they are linked to a successful 200 code)? I have been to MANY analysis tool websites (e.g. AWStats, others) but have not found any that specifically claim to deal with, or even acknowledge this problem.
We have considered retooling our website to write to a database when a file is requested, and other such workarounds, but this seems like a problem logfile analysis tools should be able to handle.
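For anyone who wants to check their own logs in the meantime, here is a minimal sketch of the counting rule described above: one download per 200 response for a PDF, with the 206 entries tallied separately instead of being counted as downloads. It assumes Common/Combined Log Format; the regular expression and the function name count_pdf_downloads are my own illustration, not part of any existing tool. Note that it would miss a client whose very first request is already answered with a 206.

```python
import re
from collections import Counter

# Assumed layout (Common/Combined Log Format):
# ip ident user [time] "METHOD path protocol" status bytes ...
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def count_pdf_downloads(lines):
    """Return (downloads, partials) Counters keyed by PDF path.

    A 200 response counts as one download; 206 responses are only
    tallied so they can be inspected, never counted as downloads.
    """
    downloads = Counter()
    partials = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or m["method"] != "GET":
            continue
        # Drop any query string before checking the extension.
        path = m["path"].split("?", 1)[0]
        if not path.lower().endswith(".pdf"):
            continue
        if m["status"] == "200":
            downloads[path] += 1
        elif m["status"] == "206":
            partials[path] += 1
    return downloads, partials
```

To run it against a real log, pass an open file, for example `count_pdf_downloads(open("access.log"))`, where access.log is a placeholder for your own logfile.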
I just thought they were unsuccessful attempts.
My PDFs tend to be large, about 5 megs. When scanning the logs after I've separated out bots (a prog does this automatically), I tend to assume a PDF downloaded OK if the bytes logged reach the full size of the file.
I write logs by IP plus time to a file.
I may modify this based on your comment:
Probably IP plus date plus hour plus the name of the file accessed.
(So I'd try to retrieve the record first and update it if it exists, instead of writing a new SQL record.)
That sounds like it would give accurate totals.
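Roughly, the keying idea above (IP plus date plus hour plus file name, updating the record if it already exists) might look like the sketch below. It uses SQLite purely for illustration; the table name downloads and the functions open_db, record_request and downloads_per_file are made up for this example. One known weakness of this key is that a download that straddles the top of the hour gets counted twice.

```python
import sqlite3
from datetime import datetime


def open_db(path=":memory:"):
    """Open the database and create the (hypothetical) downloads table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS downloads (
               ip       TEXT    NOT NULL,
               day      TEXT    NOT NULL,
               hour     INTEGER NOT NULL,
               filename TEXT    NOT NULL,
               requests INTEGER NOT NULL DEFAULT 0,
               PRIMARY KEY (ip, day, hour, filename)
           )"""
    )
    return conn


def record_request(conn, ip, when, filename):
    """Create the row for this key if it is missing, then bump its counter."""
    key = (ip, when.strftime("%Y-%m-%d"), when.hour, filename)
    conn.execute(
        "INSERT OR IGNORE INTO downloads (ip, day, hour, filename) "
        "VALUES (?, ?, ?, ?)",
        key,
    )
    conn.execute(
        "UPDATE downloads SET requests = requests + 1 "
        "WHERE ip = ? AND day = ? AND hour = ? AND filename = ?",
        key,
    )
    conn.commit()


def downloads_per_file(conn):
    """One row per key is treated as one download, however many requests it saw."""
    rows = conn.execute(
        "SELECT filename, COUNT(*) FROM downloads GROUP BY filename"
    )
    return dict(rows.fetchall())
```

With this layout, a 200 followed by dozens of 206 requests from the same IP in the same hour collapses into a single row, so the per-file total is just the number of rows.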
I also seem to have a problem where some users never let the download complete. At least, the bytes in the log entries don't add up to the file size, which might make sense: what they want is on just one page of my PDF, and they don't want to download all the other pages to get to it. Perhaps they were expecting just a small PDF.
Mega, there's a lot of compression and stuff going on with those PDF download manager programs, so the total bytes logged will have little to do with the actual file size. Furthermore, the number of pieces a PDF is broken into will vary, even for the same PDF.