Tracking downloads

Huge discrepancies in reported PDF downloads

         

fom2001uk

4:13 pm on Dec 18, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



On a couple of WebTrends reports, I've noticed some major discrepancies between the number of PDF downloads reported, and the total visits. For example, one report has around 6000 downloads of a PDF file listed. But this seems crazy when there were only 300 visits.

Something is amiss I fear.

Any ideas would be appreciated.

rcjordan

4:20 pm on Dec 18, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



While I suspect that this is really a tracking problem and typically dislike a pat answer, I think pdf files are particularly at risk for being referenced from other sites. I generate thousands of pdf downloads for a few sites by deep-linking but never refer any traffic (government schedules and maps).

Hannu

9:06 am on Dec 28, 2001 (gmt 0)

10+ Year Member



If a user reads the PDF document in the browser window (the inline version of Acrobat Reader) instead of downloading it, the Reader will only download as much as it needs to display the first page.

When the user scrolls to the next page this will generate an additional request on the webserver, and so on. In this way a 10 page PDF read inline will count for 10 downloads.

See the WebTrends support note on this issue [webtrends.com].

Another explanation: when, e.g., two users who come from the same company/firewall (= same IP number) download a PDF document within the same 30-minute time slot, they count as 1 visit but 2 downloads.
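The same-IP effect can be sketched in a few lines of Python. This is only an illustration: it assumes a visit is defined as hits from one IP with no gap longer than 30 minutes, which is how the behaviour is described above, not a confirmed description of WebTrends internals. The function name and the sample hits are made up.

```python
from datetime import datetime, timedelta

def count_visits(hits, timeout=timedelta(minutes=30)):
    """Count visits from (ip, timestamp) pairs.

    Assumption: a new visit starts when an IP has been silent
    for longer than `timeout`.
    """
    last_seen = {}
    visits = 0
    for ip, ts in sorted(hits, key=lambda hit: hit[1]):
        previous = last_seen.get(ip)
        if previous is None or ts - previous > timeout:
            visits += 1
        last_seen[ip] = ts
    return visits

# Made-up PDF hits: two colleagues behind one firewall, then a later hit.
SAMPLE_HITS = [
    ("10.0.0.1", datetime(2001, 12, 18, 9, 0)),
    ("10.0.0.1", datetime(2001, 12, 18, 9, 10)),
    ("10.0.0.1", datetime(2001, 12, 18, 11, 0)),
]

visits = count_visits(SAMPLE_HITS)
downloads = len(SAMPLE_HITS)
print("visits:", visits, "downloads:", downloads)
```

The first two hits fall inside one 30-minute window, so they share a visit even though they are two separate downloads.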

Brett_Tabke

2:02 pm on Jan 4, 2002 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Don't trust Webtrends on the issue. Open up the log file in a text editor and start looking at the raw lines for those 6000 dl's. Look for referrals, look at the ips, and look for abuse from a rogue spider.
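If the log is too big to eyeball, a short script can do the first pass. The following is a rough sketch, assuming the log is in Apache combined format; the function name and the sample lines are invented for illustration. It tallies PDF requests per IP, referrer and user agent so that a rogue spider or a deep-linking site stands out.

```python
import re
from collections import Counter

# Assumed layout (Apache combined):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"\S+ (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def tally_pdf_hits(lines):
    """Count PDF requests per client IP, referrer, and user agent."""
    by_ip, by_referer, by_agent = Counter(), Counter(), Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # not a combined-format line
        path = m.group("path").split("?")[0].lower()
        if not path.endswith(".pdf"):
            continue
        by_ip[m.group("ip")] += 1
        by_referer[m.group("referer")] += 1
        by_agent[m.group("agent")] += 1
    return by_ip, by_referer, by_agent

# Made-up sample lines, for illustration only.
SAMPLE_LOG = [
    '10.0.0.1 - - [18/Dec/2001:16:13:00 +0000] "GET /docs/report.pdf HTTP/1.0" 200 51200 "http://example.org/links.html" "Mozilla/4.0"',
    '10.0.0.2 - - [18/Dec/2001:16:14:00 +0000] "GET /docs/report.pdf HTTP/1.0" 200 51200 "-" "ExampleBot/1.0"',
    '10.0.0.2 - - [18/Dec/2001:16:14:05 +0000] "GET /docs/report.pdf HTTP/1.0" 200 51200 "-" "ExampleBot/1.0"',
    '10.0.0.3 - - [18/Dec/2001:16:15:00 +0000] "GET /index.html HTTP/1.0" 200 2048 "-" "Mozilla/4.0"',
]

by_ip, by_referer, by_agent = tally_pdf_hits(SAMPLE_LOG)
for ip, hits in by_ip.most_common(10):
    print(ip, hits)
```

To run it on a real file, pass an open file object instead of `SAMPLE_LOG`, for example `tally_pdf_hits(open("access_log"))`.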

fom2001uk

3:02 pm on Jan 9, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Great info, guys. But this raises the question: how do you know how many visitors are actually saving the PDFs to their hard drive?

I know whenever I come across a PDF file, I always right click and save it for reading offline. That's a more useful statistic to me.

Is there a way of telling from the logs how many times the PDF was actually saved, rather than just viewed?

Hannu

2:43 pm on Feb 21, 2002 (gmt 0)

10+ Year Member



After a bit of research I have come up with this:

As far as I know you can't see how many actually saved the pdf-doc. However, you can get a more precise number on how many saved the doc OR viewed the doc online.

Every time a user (successfully) downloads the doc, the webserver returns the return code 200 (=OK).

When a user reads the doc online the webserver first returns the return code 200. When the user goes from page one to page two in the doc the webserver returns a code 206 (=Partial Content).

So what you do is filter the requests with return code 206 out of the report:

In WebTrends, make a new profile only for this purpose. Add two filters:

1) Include: file = *.pdf (or a specific pdf-doc)
2) Exclude: return code = 206

Now the "No of downloads" will show a more precise number. The following are stats from our own website:

Downloads without filtering: 619
Downloads with filtering: 100
(user sessions: 80).

/Hannu
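For anyone without WebTrends, the same comparison can be approximated straight from the raw log. This is a minimal sketch, assuming Apache combined (or common) log format; the function name and sample lines are made up. It counts PDF requests per return code so the totals with and without 206 can be compared.

```python
import re
from collections import Counter

# Matches the quoted request and the status code that follows it.
REQUEST = re.compile(r'"\S+ (?P<path>\S+)[^"]*" (?P<status>\d{3}) ')

def pdf_status_counts(lines):
    """Count requests for .pdf files, grouped by HTTP return code."""
    counts = Counter()
    for line in lines:
        m = REQUEST.search(line)
        if m and m.group("path").split("?")[0].lower().endswith(".pdf"):
            counts[m.group("status")] += 1
    return counts

# Made-up sample: one full download followed by three page turns.
SAMPLE_LOG = [
    '10.0.0.1 - - [21/Feb/2002:14:43:00 +0000] "GET /docs/report.pdf HTTP/1.1" 200 51200 "-" "Mozilla/4.0"',
    '10.0.0.1 - - [21/Feb/2002:14:43:20 +0000] "GET /docs/report.pdf HTTP/1.1" 206 8192 "-" "Mozilla/4.0"',
    '10.0.0.1 - - [21/Feb/2002:14:43:40 +0000] "GET /docs/report.pdf HTTP/1.1" 206 8192 "-" "Mozilla/4.0"',
    '10.0.0.1 - - [21/Feb/2002:14:44:00 +0000] "GET /docs/report.pdf HTTP/1.1" 206 8192 "-" "Mozilla/4.0"',
    '10.0.0.1 - - [21/Feb/2002:14:45:00 +0000] "GET /index.html HTTP/1.1" 200 2048 "-" "Mozilla/4.0"',
]

counts = pdf_status_counts(SAMPLE_LOG)
unfiltered = sum(counts.values())      # every PDF request
filtered = unfiltered - counts["206"]  # same, with 206 excluded
print("without filtering:", unfiltered, "with filtering:", filtered)
```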

ktmatu

10:49 am on Feb 22, 2002 (gmt 0)

10+ Year Member



I have used the following command line to remove 206 requests from a log file in Apache combined format:

grep -v '\.pdf .* 206 ' access_log > new_access_log

Hannu

2:03 pm on Feb 22, 2002 (gmt 0)

10+ Year Member



>> to remove 206 requests from a log file

Basically I don't think it is a good idea to remove anything from the logfile. It is better to make the proper filters.

And, in this specific case the 206 requests can show how many pages in the PDF file the users read.

By making an include filter in WebTrends with 206 requests you can see how many pages have been "turned" and thereby get an average on how many pages the users read.

/Hannu
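The "pages turned" average can also be estimated from the raw log. The sketch below assumes Apache combined format and invents its own function name and sample data. Treat the result as a rough indicator only: a reader may issue more than one partial request per page, so 206 requests do not map exactly onto pages.

```python
import re
from collections import Counter

REQUEST = re.compile(r'"\S+ (?P<path>\S+)[^"]*" (?P<status>\d{3}) ')

def partial_requests_per_download(lines):
    """For each PDF, return (number of 206 requests) / (number of 200 requests).

    Files that have 206 requests but no 200 request are left out.
    """
    full, partial = Counter(), Counter()
    for line in lines:
        m = REQUEST.search(line)
        if m is None:
            continue
        path = m.group("path").split("?")[0]
        if not path.lower().endswith(".pdf"):
            continue
        if m.group("status") == "200":
            full[path] += 1
        elif m.group("status") == "206":
            partial[path] += 1
    return {path: partial[path] / full[path] for path in full}

# Made-up sample: report.pdf opened twice with six page turns in total,
# map.pdf downloaded once with no page turns.
TEMPLATE = '10.0.0.1 - - [22/Feb/2002:14:03:00 +0000] "GET {} HTTP/1.1" {} 8192 "-" "Mozilla/4.0"'
SAMPLE_LOG = (
    [TEMPLATE.format("/docs/report.pdf", 200)] * 2
    + [TEMPLATE.format("/docs/report.pdf", 206)] * 6
    + [TEMPLATE.format("/docs/map.pdf", 200)]
)

averages = partial_requests_per_download(SAMPLE_LOG)
for path, avg in sorted(averages.items()):
    print(path, avg)
```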

fom2001uk

3:05 pm on Feb 22, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hannu, that is brilliant, thanks :-)

You seem to know your WebTrends filters, so do you mind if I pick your brains on something ?

One of my clients wants to know how many externally referred visitors to the homepage (search engine visitors mostly) exited the site at the homepage, after 10 seconds or less.

Top Exit Pages report doesn't give you time online. Any ideas ?

bruhaha

4:00 pm on Feb 22, 2002 (gmt 0)

10+ Year Member



>> what you do is filtering the report for requests with return code 206

There is a quicker way to get a count of the number of times a PDF file was "really" downloaded (discounting page views).

WebTrends lists not only the total "number of downloads" for each file, but also the number of "download sessions". The first of these includes all those extra page views in IE, but the latter should give you the net number you are after.

Hannu

8:25 pm on Feb 25, 2002 (gmt 0)

10+ Year Member



fom2001uk, you're welcome :-)

I think I know the answer for your next question - just have to check something out - get back to you.

bruhaha, you are right, but I think there is a problem with these numbers. The number of download sessions is based on IP addresses, and (relevant for B2B sites) if 5 users from within the same domain (firewall/proxy = same IP address) download a PDF within a timeframe of 30 minutes, it will count as only 1 user session.

/Hannu

mivox

8:40 pm on Feb 25, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It's the 206 issue... I have the same thing happen ALL the time. Someone will rack up 100+ log entries for ONE pdf file. I'll have to try some of the solutions given here.

You could also compress the PDF files into ZIP format, which would force your visitors to download the files... then they could read them offline at their leisure.

Hannu

1:28 pm on Feb 26, 2002 (gmt 0)

10+ Year Member



>>...PDF files into ZIP format, which would force your visitors to download the files...

From a tracking point of view it's a good idea.

But from a usability point of view I don't think it is advisable to force users to download and then unzip before they can read the doc.

Maybe it would be a good idea to provide an alternative link for a zip file??

/Hannu

Hannu

4:05 pm on Feb 28, 2002 (gmt 0)

10+ Year Member



>> One of my clients wants to know how many externally referred visitors to the homepage (search engine visitors mostly) exited the site at the homepage, after 10 seconds or less.

A bit tricky...

As far as I can figure out, you can only get numbers for how many exited after less than 1 minute.

Create one include filter:

Referring URL: http:* www*

And one exclude filter:

Referring URL: *clientsdomain.com*

These two filters will exclude all internal referrers and "No Referrer".

Run a report with the tables "Activity Level by Length of Visit" and "Number of Views per Visit".

I think this is the closest you can get...

If you want to see the numbers for specific referrers, just replace the include filter with e.g. http://www.google*.
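Outside WebTrends, the include/exclude pair can be mimicked with a small predicate applied to the referrer field of each log line. This is only a sketch of the filter logic, not of the WebTrends reports themselves; the function name is made up and `clientsdomain.com` is the placeholder domain used above.

```python
def is_external_referrer(referrer, own_domain="clientsdomain.com"):
    """Return True for referrers that are present and not from our own site.

    Include: referrer is a URL (starts with http:// or https://).
    Exclude: referrer mentions own_domain anywhere, like *clientsdomain.com*.
    """
    ref = referrer.strip().lower()
    if not ref.startswith(("http://", "https://")):
        return False  # "-" or empty means no referrer
    return own_domain.lower() not in ref

# Made-up referrer values for illustration.
SAMPLE_REFERRERS = [
    "http://www.google.com/search?q=widgets",
    "http://www.clientsdomain.com/products.html",
    "-",
]

external = [r for r in SAMPLE_REFERRERS if is_external_referrer(r)]
print(external)
```

Like the wildcard filter, the substring test also drops external URLs that merely contain the domain name, for instance in a query string.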