Forum Moderators: DixonJones


Hitslink going wrong?

That's The WPG one...

         

Receptional

5:23 pm on Jan 20, 2003 (gmt 0)



Those of you that use the WPG tracking system are using Hitslink. We are doing some worrying here...

In late November, Hitslink updated its software, but it now seems to be poorly identifying a daily unique - something it did quite admirably before.

This was really expensive for us for a while - we were getting visitors via PPC at one rate and LOSING MONEY because not all visitors were getting charged on to the customer. We are still trying to figure it out, but here's what we THINK the new system may be getting wrong:

If two people arrive on the same IP within 24 hours, they are identified as the same user. This means a University or company may well only be identified as one user. Previously it seemed dependent on the cookie.

We are comparing with other log systems and have not yet decided whether the cause is Hitslink or the new IE6 default cookie settings, but either way, it's a problem.

Receptional Andy

1:56 pm on Jan 28, 2003 (gmt 0)



Any update on this, Melissa? Bear in mind that it is an ongoing problem.

melissa99

5:14 pm on Jan 28, 2003 (gmt 0)

10+ Year Member



I am documenting the exact algorithm - I should have it done sometime today.

melissa99

10:11 pm on Jan 28, 2003 (gmt 0)

10+ Year Member



Hello,

I stepped through the code and the prior algorithm I posted was accurate.

So... (here is the controversial part)... Although neither algorithm is completely accurate, I think the prior algorithm was worse than the new one. Let me explain:

Prior to Nov., we did not properly identify visitors that would not accept cookies. This was not a problem in the past, since the vast majority of browsers accepted cookies (99%+). For the browsers that refused them, I think we counted every page view as a new visitor, which is wrong.

Since IE6, there has been a proliferation of cookie-blocking corporate policies, firewalls and users. Therefore, using the cookie approach became more and more inaccurate, causing a higher visitor total than reality.

Now that we first check the cookie, then the IP address, I believe we are closer to reality. However, we now ~underreport~ visitors coming from behind proxies, which I understand is a big, big problem for SEO consultants.

The only way we can uniquely identify users is cookies and IP addresses. Large organizations that block cookies will all show up as the same visitor. The variable we can control is the length of time to consider an IP address as a unique visitor, which is now 24 hours.
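As a rough sketch of the cookie-first, IP-fallback counting described above (this is not Hitslink's actual code; the function name `count_daily_uniques`, the hit tuple layout, and measuring the 24-hour window from the first counted hit of an IP are all assumptions made for illustration):

```python
from datetime import datetime, timedelta

IP_WINDOW = timedelta(hours=24)  # how long one IP counts as a single visitor


def count_daily_uniques(hits):
    """Count visitors from (timestamp, ip, cookie_id) hits sorted by time.

    cookie_id is None when the browser refuses cookies.
    """
    seen_cookies = set()
    ip_counted_at = {}  # ip -> time the IP was last counted as a new visitor
    uniques = 0
    for when, ip, cookie_id in hits:
        if cookie_id is not None:
            # Cookie check comes first: one cookie, one visitor.
            if cookie_id not in seen_cookies:
                seen_cookies.add(cookie_id)
                uniques += 1
        else:
            # Fallback: one visitor per IP per 24-hour window.
            counted_at = ip_counted_at.get(ip)
            if counted_at is None or when - counted_at >= IP_WINDOW:
                ip_counted_at[ip] = when
                uniques += 1
    return uniques
```

Under these assumptions, any number of cookie-less people behind one proxy IP collapse into a single visitor for the day, which is the underreporting discussed in this thread.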

An improvement we can make is to consider the client data for each visitor. For example, different screen resolutions on different hits should count as different visitors. This will help a little, but not completely solve the problem. We are going to implement this in the future (no idea when yet).
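A minimal sketch of that idea, assuming the only client data available is a screen-resolution string and that all hits passed in fall inside one 24-hour window (the function name and input layout are hypothetical, not part of the product):

```python
def count_cookieless_visitors(hits):
    """Count distinct (ip, screen_resolution) pairs among cookie-less hits.

    hits is an iterable of (ip, screen_resolution) tuples, all assumed to
    fall within the same 24-hour window.
    """
    return len({(ip, resolution) for ip, resolution in hits})
```

Two people behind the same proxy with the same resolution would still be merged, so this narrows the gap rather than closing it.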

I am open to bringing suggestions to the company.

PS: We use P3P-compliant compact policies (CPs), so IE6 by default will not block our cookies.

Receptional

12:28 pm on Jan 29, 2003 (gmt 0)



Melissa, hi again, and thanks for clarifying.

My thought would be to say two things in the algorithm:

1) If an IP number does not download/refresh a page on the site within 15 minutes of its previous activity, it is probably a new user. I believe Livestats uses this, although I accept that this changes your definition of daily uniques.

2) (This is a Boolean OR, not AND) If the IP number is on the site for over, say, an hour, the chances are that it is not the initial user.

I don't know if this is better or not. In the meantime, I have done some comparisons on 3 sites with 3 alternative counters - you might wish to see if you feel any of them have cracked it. The results are:
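The two rules above could be sketched roughly as follows. This is only an illustration of the suggestion, not how Livestats or Hitslink work; the function name, the hit layout, and measuring rule 2 from the first hit of the current visitor are assumptions.

```python
from datetime import datetime, timedelta

IDLE_LIMIT = timedelta(minutes=15)  # rule 1: gap since the IP's previous hit
SESSION_LIMIT = timedelta(hours=1)  # rule 2: time since this visitor started


def count_visitors_by_ip(hits):
    """Count visitors from (timestamp, ip) hits sorted by time."""
    state = {}  # ip -> (start of current visitor, time of last hit)
    visitors = 0
    for when, ip in hits:
        if ip in state:
            start, last = state[ip]
            idle_too_long = when - last > IDLE_LIMIT
            on_site_too_long = when - start > SESSION_LIMIT
            if not (idle_too_long or on_site_too_long):
                # Same visitor is still active: just record the hit time.
                state[ip] = (start, when)
                continue
        # First hit from this IP, or either rule fired: count a new visitor.
        state[ip] = (when, when)
        visitors += 1
    return visitors
```

Because the rules are joined with OR, a busy proxy that never goes quiet for 15 minutes still gets a new visitor counted once the hour is exceeded.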

All stats are for 28th January:
Site 1: Hitslink recorded 479 daily uniques, counted.com recorded 630
Site 2: Hitslink recorded 108 daily uniques, webstat.com recorded 118
Site 3: Hitslink recorded 686 daily uniques, web-stat.com recorded 670!

The last one is odd, given that the site is busy and therefore should have the error compounded, but the other two counters might be worth looking at to see what they do differently.
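To put those gaps on one scale, here is a small calculation of Hitslink's count relative to the other counter on each site. The figures are the ones quoted above; the helper name `relative_difference` is just for illustration.

```python
# Daily uniques for 28 January, as quoted above: (Hitslink, other counter).
counts = {
    "Site 1": (479, 630),
    "Site 2": (108, 118),
    "Site 3": (686, 670),
}


def relative_difference(hitslink, other):
    """Hitslink's count relative to the other counter, as a percentage."""
    return (hitslink - other) / other * 100


for site, (hitslink, other) in counts.items():
    print(f"{site}: {relative_difference(hitslink, other):+.1f}%")
```

That works out to roughly -24.0% for Site 1, -8.5% for Site 2 and +2.4% for Site 3.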

I guess that short term, the algorithm isn't going to be changed though, which means those of us using it to get paid based on daily uniques and SEO should work out an alternative.

Dixon.

Receptional

4:13 pm on Feb 11, 2003 (gmt 0)



Melissa

I think I have worked out a much better way of measuring daily uniques (and god, have I been thinking...).

Where a person DOES allow cookies, you have the following good data:
a) You know the number of daily uniques and
b) You know the average number of pages viewed.
c) You know that they accepted cookies.

Where a person does NOT allow cookies you only know
a) That they don't accept cookies and
b) The total number of pages viewed, but not per user.

So - a VERY ACCURATE approximation of daily uniques would be:

Number of daily uniques from cookie users +
(Number of page views from non-cookie users / average pages viewed by COOKIE USERS).

Provided cookie users remain a substantial percentage (like half) of all visitors, then your daily unique figure should be fairly accurate. The only distortion would be abnormal behaviour from non-humans, like spiders caught in traps, but if you identify spiders that is minimized anyway.
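As a sketch of the arithmetic, with hypothetical parameter names and made-up example numbers (it assumes spider traffic has already been filtered out of both page-view totals):

```python
def estimate_daily_uniques(cookie_uniques, cookie_page_views, non_cookie_page_views):
    """Estimate total daily uniques from the formula above.

    cookie_uniques:        daily uniques among visitors who accept cookies
    cookie_page_views:     total page views from those cookie users
    non_cookie_page_views: total page views from visitors who refuse cookies
    """
    if cookie_uniques <= 0 or cookie_page_views <= 0:
        raise ValueError("need at least one cookie user with page views")
    # Average pages viewed per visitor, measured on cookie users only.
    avg_pages = cookie_page_views / cookie_uniques
    # Assume non-cookie visitors browse about as much as cookie users do.
    return cookie_uniques + non_cookie_page_views / avg_pages
```

For example, 400 cookie uniques viewing 1200 pages (3 pages each on average) plus 300 page views from non-cookie users gives 400 + 300 / 3 = 500 estimated daily uniques.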

Does this make sense? If so, how soon could you fix the rapidly degenerating algo?

Dixon.
