Forum Moderators: DixonJones

Redirect page not logging properly

Urgent help needed!

MHes

12:51 am on Aug 15, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi

We sell traffic and set up a one-day trial for a big new client. They asked us to send the traffic to a simple redirect page with 'no cache'. They then look at their log file, which is processed and shows page views.

We log the IP addresses of outgoing clicks and can also see the database click-throughs for their link on our directory. Both show 718 clicks out; a few people clicked twice, but 90% are unique IPs.

Their redirect log shows 270 page views, so they are saying that is all we sent. I asked if they could see different people coming from the same ISP and they said yes: the program they use resolves IP addresses, so they can see 20 from AOL etc.

Is a redirect page view count a reliable source? I am positive we have sent this traffic and if we can agree figures we have a good contract....HELP!

ritualcoffee

1:49 pm on Aug 15, 2002 (gmt 0)

10+ Year Member



I would take the number of page views to the redirect page and compare that. Also - check the session length on the redirect page to make sure it is resolving quickly.

Can you get a breakdown from them and compare it against yours - that will help locate the issue.

MHes

2:17 pm on Aug 15, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



ritualcoffee, thanks for the suggestion. I have sent the following to the client and I think it may be the answer....

Dear......

Your assumption that page requests for the redirect page will show an
accurate number of visitors from us is correct. However, I suspect the way
you are processing the raw logs to arrive at that figure is causing the problem.

I think the software you are using is first resolving IP addresses and
then giving you a page view total. This process makes sense when
determining multiple page views from one unique visitor in the analysis of
overall 'site' statistics. In a low traffic scenario the figures can be
reasonably accurate, but once higher volumes are applied the figures
become very inaccurate.

For example, the software assumes that a visitor from AOL, using their
browser and coming in via a proxy cache server, will appear under a
different server IP address for each page they request (this happens!).
By resolving the IPs and making all those IP numbers count as 1 visitor,
it hopes to solve this problem. If the software is sophisticated enough, it
will only do this over a limited time span, say 10 minutes. However, the
more traffic a site/page has, the more AOL users could be on at the same
time, causing all of them to be counted as 1 in an extreme situation, or
the count to be limited to the number of proxy servers in use.

Similar things will be happening with traffic from other big ISPs like
BT, Freeserve, NTL, POL, Demon etc. Resolving will merge users when
it is applied to just a single page's log.

The other possible problem is that the 'no cache' may be ignored on
certain servers.

I am sure our stats are accurate. We have double checked with arrangements
we have with 3rd party traffic monitors such as Tradedoubler and other
affiliate trackers. The figures over the last few months agree exactly.
The data we are using is taken from raw logs, as sent to you. We also have
a back up count from our database requests, which also matches. I believe
if you check your own raw logs for any time period you will be able to
match all our IP addresses as coming in (note our quoted times are
'European' and thus vary between +100 and +30 mins gmt). To arrive at a
final visitor count, we extract any IP addresses which show twice together
and remove any showing no referer as a possible spider (unlikely, but we
are prepared to accept this). As, in this case, we are only dealing with
one page, the problem of a visitor requesting multiple pages is not an
issue, so the application of resolving IPs is causing a massive undercount.

On 13/08/02 we show: (their domain)-020813.dat : 698 : 688, where the second
figure is the number of unique IPs minus duplicate IPs next to each
other.

A quick look at your raw logs should show all our ip's coming in and
demonstrate that the processed 'page view' figure you have is
undercounting.

I am sure you will find our logs to be accurate if you do not rely on the
'page view' count processed by your stats application. The real figure
will be in your raw logs.

I noticed, to support our theory further, the following:

The ip's
62.253.64.7
62.253.128.6
62.253.64.7
62.253.176.60
62.253.9.195
62.253.128.6

These are coming in throughout the day, but I suspect you only count the first
62.253 address in your 'page view' count, as the software will have already
'resolved' them into one. It is also interesting to note that the discrepancy appears to increase during the day, which suggests that once an ISP has been logged, all further referrals from it are ignored. This could be on an hourly basis.

Good theory?
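
For anyone who wants to try the counting on their own logs, here is a rough sketch of it. It reads "duplicate IPs next to each other" as "collapse runs of the same IP", which is an assumption, and the (ip, referer) tuple format, the function names and the directory.example referer are all made up for illustration. The second function shows what would happen if a stats package merged addresses on their first two octets, which is only a guess at how such software might behave.

```python
from itertools import groupby


def count_visitors(entries):
    """Count entries after dropping no-referer hits and collapsing
    runs of the same IP that appear next to each other.

    entries: list of (ip, referer) tuples in log order (hypothetical format).
    """
    ips = [ip for ip, referer in entries if referer]
    # groupby with no key groups consecutive equal values into one run
    return sum(1 for _ in groupby(ips))


def count_by_prefix(ips, octets=2):
    """Count distinct networks when only the first `octets` octets are compared."""
    return len({".".join(ip.split(".")[:octets]) for ip in ips})


# The six addresses quoted in the letter above
sample = ["62.253.64.7", "62.253.128.6", "62.253.64.7",
          "62.253.176.60", "62.253.9.195", "62.253.128.6"]
entries = [(ip, "http://directory.example/") for ip in sample]

print(count_visitors(entries))   # 6: no two identical IPs are adjacent
print(len(set(sample)))          # 4 distinct addresses
print(count_by_prefix(sample))   # 1: everything merges into 62.253
```

On the six addresses from the letter, the adjacent-duplicate count keeps all 6 entries, while merging on the 62.253 prefix leaves just 1.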

ritualcoffee

2:49 pm on Aug 15, 2002 (gmt 0)

10+ Year Member



well - it makes sense to me but that is also because I'm responsible for online metrics at my company. But will it make sense to the person you are sending this to? Are the discrepancies being brought to your attention by marketing folk or IT?

Either way - I think your argument is good.

MHes

11:08 pm on Aug 15, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



ritualcoffee

Thanks for your help, but the plot thickens!

They have checked their raw logs and the IPs we say we are sending are not there.... The only reason I can now think of is that the '<meta HTTP-EQUIV="Pragma" CONTENT="no-cache">' tag which they have put on the redirect page is being ignored. I have read that this can happen, and that servers will still cache the page. I am a bit out of my depth, but I think if the redirect page has no content, then the 'pragma' has nothing not to cache! Double negative, but you see the logic. Therefore it ignores the command as there is nothing there 'not to cache', and thus caches the page. Either way, searches on Google bring up loads of debate on how unreliable this tag is. But how can I prove it? Is there a tag or a way to ensure no cache?

They sent me a list of the 'resolved ISPs' they did receive. 99% occur once and examples are:
in-addr.btopenworld.com
host213-1-77-117.in-addr.btopenworld.com
host213-123-133-76.in-addr.btopenworld.com
inktomi1-cdf.server.ntl.com
webcache-04.staffs.ifl.net

Could this be an indication that the pages are being cached?

ritualcoffee

1:35 pm on Aug 16, 2002 (gmt 0)

10+ Year Member



eeeewwwwwww - now that is out of my range of knowledge. I guess what you can do is supply them with the two sides of the debate and come to an agreement before any other campaigns or what have you and just move forward. Sorry - I am of no help!

MHes

8:01 pm on Aug 18, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Ritual coffee

Thought I would let you know the outcome...

The redirect page had a no-cache tag, BUT proxy servers do not parse HTML, so they ignore it. Hence they only saw one referral from each ISP server!

There are two good ways to get around this. One is to change the HTTP headers for a site by setting an expiry date of minus one day; the other, which is what we are doing, is to put a CGI date on the end of our link URL to them. This does not prevent the link working, but makes it look like a different URL each time it is clicked.
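
A minimal sketch of both workarounds, for illustration only. The function names, the `t` parameter name and the example.com URL are assumptions, not what we or the client actually run, and the extra Cache-Control and Pragma headers are a common addition rather than something from this thread.

```python
import time
from email.utils import formatdate
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def cache_busted(url, now=None):
    """Append a timestamp parameter so each click looks like a new URL."""
    stamp = int(time.time() * 1000) if now is None else now
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("t", str(stamp)))  # "t" is a hypothetical parameter name
    return urlunsplit(parts._replace(query=urlencode(query)))


def expired_headers(now=None):
    """Response headers with an expiry date one day in the past."""
    now = time.time() if now is None else now
    return {
        "Expires": formatdate(now - 86400, usegmt=True),
        "Cache-Control": "no-cache, no-store, must-revalidate",
        "Pragma": "no-cache",
    }


print(cache_busted("http://example.com/redirect.html", now=123))
# http://example.com/redirect.html?t=123
print(expired_headers()["Expires"])
```

The point of the URL approach is that it needs no cooperation from the destination server: a proxy keys its cache on the full URL, so a changing query string means every click is passed through and logged.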

The episode shows that most sites log dramatically fewer hits than the visitors they actually get, if their pages are cached. We reckon 500 referrals may produce 250 log entries. However, if a site is dynamic, the pages probably will not get cached....

Now here is the interesting bit: a site with a static index page may show half the visitors it actually gets, but if the rest of the site is dynamic, it will show all the pageviews. Thus the pageviews-per-visitor figure is completely wrong!
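
To put rough numbers on that, here is a toy calculation. It assumes (purely for illustration) that every visitor views one static index page plus three dynamic pages, that only half of the index requests get past the proxy caches, and that the stats package counts visitors from index page hits. None of these figures come from the client's logs.

```python
def logged_pages_per_visitor(visitors, pages_each, index_reach_rate):
    """Pageviews per visitor as the logs would show it.

    visitors: real number of visitors
    pages_each: pages each visitor views (1 static index + the rest dynamic)
    index_reach_rate: fraction of index requests that reach the server
    """
    index_hits = visitors * index_reach_rate      # cached hits never get logged
    dynamic_hits = visitors * (pages_each - 1)    # dynamic pages always get logged
    return (index_hits + dynamic_hits) / index_hits


# True figure is 4 pages per visitor; the logs would report 7
print(logged_pages_per_visitor(500, 4, 0.5))  # 7.0
```

So under those assumptions the logs show 250 visitors and 1750 pageviews, nearly double the real pages-per-visitor figure.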

The company I was dealing with are a serious UK internet operation... and they are using stats that are completely wrong.

Scary stuff.

ritualcoffee

11:51 am on Aug 19, 2002 (gmt 0)

10+ Year Member



wow - I printed this sucker out for my files!

What was their response to this - did they have a heart attack? ;)