Forum Moderators: DixonJones
In late November, Hitslink updated its software, but it now seems to be poorly identifying a daily unique - something it did quite admirably before.
This was really expensive for us for a while - we were getting visitors via PPC at one rate and LOSING MONEY because not all visitors were being charged on to the customer. We are still trying to figure it out, but here's what we THINK the new system may be getting wrong:
If two people arrive on the same IP within 24 hours, they are identified as the same user. This means a University or company may well only be identified as one user. Previously it seemed dependent on the cookie.
Previously this did not seem to be the case. We are comparing with other log systems and have not yet decided whether it is Hitslink or the new IE6 default cookie settings, but either way, it's a problem.
Exactly. Only Hitslink seems to think it is one daily unique. Since the client in question pays on the number of daily uniques, based on hitslink, that is an expensive error.
<P3P> Can't see anything about this on the hitslink FAQs and have never had a technically proficient reply to support yet. Time to take my white label elsewhere methinks...
Paying the cash on a LOT of sites as well. The page views data seems accurate, but returning visitors is becoming an unrealistic percentage on most sites.
So are you saying the freebie one doesn't record daily uniques? I am sure that we do have two sites that aren't in our white label which have the hitslink logo (so presumably free) that have daily uniques recorded. I don't want to pay for bad data.
Dixon.
Quite rare these days, as many companies, once they have your money, could not care less - "customer service" usually being just a buzzword.
The product is (has been) excellent in many respects - I dumped Webtrends AND LiveStats to run with it. But if IE6 is changing their boundaries, then they and we need to react quickly.
Of course, the problem here is pinpointing exactly what the problem is - whether it is P3P, proxy servers or a bug in the new database.
Dixon.
To be very fair, support replied instantly and effectively. Guess what the issue was? Hyphens in the account names!
So - just goes to show that trying to second guess on the web can often lead to bad assumptions.
Dixon.
Here's my latest email to their support... For anyone who cares to listen.
Hitslink support:
Over the last 24 hours we have been removing hyphens from our account ids. However, this has not resolved the issue.
The issue must have arrived during the 27th of November. (Was this the day you updated your programme?)
Before that date, "New Visitors" on the repeat visitor chart was running at about 88%. By the 28th this was down to about 66%, which it continues to run at. If the hyphen had resolved the issue, then the new visitor % would have returned.
This is across several account IDs, the most expensive for us being XXXXX, and is clearly not correct when we start seeing that the PPC traffic we are being charged for is in excess of ALL traffic on the Hitslink logs.
What next to fix this very expensive problem?
As before... Hitslink is very receptive and, further (from my own experience), knows what customer loyalty means to them.
I have on various issues received full refunds on monthly fees (3 times) and a 10% cost reduction for life on my account(s) due to errors caused by their systems.
If you work with them - and the root cause is a problem at their end - I'm sure they will want to retain your business.
Anyway, I hope it is resolved soon. I am installing my own ssi tracking script - so at least if it goes wrong I've no-one to blame but myself!
I agree with your assessment, Receptional. The fact that the change in your data was sudden and that it has been consistent since the time of the Hitslink update points to a change in the definition of a daily unique - if it were a change in cookie days or something, the data change would have been more gradual. Can you provide any more data that would help us deduce what might have changed in the definition? Are you seeing any other dramatic data changes between the 27th and 28th, e.g. in total visits?
I would generally say that the data pattern does not match an IE 6 setting, since that would manifest itself in more gradual data shifting. One possibility is the P3P: if, in their code update, they called a new file and that file didn't have a P3P policy (whereas the old one did), it could suddenly have been rejected by a segment of the population. Check the HTTP headers of the file called in their webbug to see if a compact policy is in place.
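For anyone wanting to run that header check, here is a minimal sketch in Python. It assumes you already have the response headers of the tracking file as name/value pairs; the function name and the sample header values are made up for illustration and are not taken from Hitslink.

```python
import re


def find_compact_policy(headers):
    """Return the compact policy string from a P3P response header, or None.

    `headers` is any mapping of header name -> header value.
    """
    for name, value in headers.items():
        # Header names are case-insensitive.
        if name.lower() == "p3p":
            # A compact policy is carried as CP="TOKEN TOKEN ..." in the header value.
            match = re.search(r'CP\s*=\s*"([^"]*)"', value, re.IGNORECASE)
            return match.group(1) if match else None
    return None


# Hypothetical headers, for illustration only.
with_policy = {"Content-Type": "image/gif", "P3P": 'CP="NOI DSP COR"'}
without_policy = {"Content-Type": "image/gif"}

print(find_compact_policy(with_policy))
print(find_compact_policy(without_policy))
```

If the file called by the webbug comes back with no compact policy at all, that would be consistent with the P3P theory, though it would not prove it.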
I suggest selecting "expand menus" on the left before trying to follow our Observations:
Observation 1 (As described in previous posts): Select "repeat visitors" in the visitors column and see that (for example) on 23rd January 2001 the "New visitors" were 61% of all traffic. Now select 24th November 2002 in the date range. Back then new visitors were 92% of traffic! The change seemed to happen to many or all sites on or around 27th November.
Observation 2:
Select "Latest Visitors" from the visitors menu and maybe increase the list to the last 100. You will see that most visitors have one or two visits. A few have 5. But certain ones have 19 / 20 or even more. Certain colleges, and the NHS if you see it in the list, are up in the hundreds. pol.co.uk is an AOL proxy by the way, which is one of our culprits. Hence my assumption that AOL users and others are no longer being identified as uniques - if an IP number is coming to the site many times per day, it seems to be identified wrongly as a repeat visitor. We dumped Webtrends for the same issue years back.
The issue only becomes more apparent on busier sites, but this one has enough traffic to make the point.
Over to the Statisticians and hitslink developers...
Dixon.
Er, it's Freeserve actually :)
However, your points are correct.
Apparently I'm getting someone from NTL who has now found our site from a search engine referral for the 137th time!
In fact, checking into this a little further, it appears that a staggering 20% of all my visitors are now visiting for the 26th or more time, and this percentage is growing daily, whereas it was previously always around the 1 or 2 per day figure. So this appears to be getting worse, day by day!
Which is probably about the percentage of UK users on proxy-based connections through the major internet providers, would you guess?
POL.co.uk ... I'll check when we've figured out how to fix the main issue. Should be talking to Jon at Hitslink by phone when he wakes up.
Dixon.
Hopefully he is scratching his head to work it out, but who knows.
Dixon.
I appreciate reading an open discussion on HitsLink - it gives me more insight into what our users are doing and what issues they face.
I would like to clear up one important thing - the price is based on page views only, not visitors. If there is an issue with higher visitor counts, it will not affect your price.
Also, you are correct about the algorithm change in November. We have a multi-step process to determine uniqueness, which I will explain here:
- Every page view drops a cookie with a unique visitor id
- On subsequent page views, we use the cookie value, if it exists
- If the cookie does not exist (due to a variety of reasons), we use the IP address for 24 hours
While not perfect (it is impossible to be perfect in this regard), it is a method we adopted due to:
- The proliferation of cookie-blocking firewalls
- The increase in cookie-blocking browsers
The toughest situation to detect unique visitors, which is unfortunately becoming more prevalent, is when an organization has both a cookie-blocking firewall AND uses a proxy.
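To make the fallback concrete, here is a rough Python sketch of the three steps as described above. It is not HitsLink's actual code: the function name, the hit format and the simplification that a cookie-accepting browser already carries its visitor id on every hit are all assumptions for illustration.

```python
from datetime import datetime, timedelta

IP_WINDOW = timedelta(hours=24)  # the 24-hour IP fallback described above


def count_uniques(hits):
    """Count unique visitors from hits sorted by time.

    Each hit is (timestamp, ip, cookie_id); cookie_id is None when the
    cookie was blocked somewhere between the browser and the tracker.
    """
    seen_cookies = set()
    ip_window_start = {}  # ip -> time the current 24-hour window opened
    uniques = 0
    for ts, ip, cookie_id in hits:
        if cookie_id is not None:
            # Cookie present: the cookie value decides uniqueness.
            if cookie_id not in seen_cookies:
                seen_cookies.add(cookie_id)
                uniques += 1
        else:
            # No cookie: fall back to the IP address for 24 hours.
            start = ip_window_start.get(ip)
            if start is None or ts - start >= IP_WINDOW:
                ip_window_start[ip] = ts
                uniques += 1
    return uniques


t0 = datetime(2003, 1, 6, 9, 0)
proxy_ip = "192.0.2.1"  # documentation address standing in for a shared proxy

# Three different people behind one proxy, cookies accepted.
with_cookies = [(t0 + timedelta(minutes=i), proxy_ip, cid)
                for i, cid in enumerate(["a", "b", "c"])]
# The same three people, but a firewall strips the cookies.
cookies_blocked = [(ts, ip, None) for ts, ip, _ in with_cookies]

print(count_uniques(with_cookies))     # 3
print(count_uniques(cookies_blocked))  # 1
```

In this toy model the three people behind the proxy collapse into a single unique as soon as their cookies are blocked, which is the firewall-plus-proxy case just described.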
The last time I checked, cookies were being blocked on around 6-7% of visitors, which is a significant increase.
I would like to take a look at some of your accounts to view the behavior - could you reply with your account ids?
I have emailed the main account IDs that are hurting through the internal email system.
You said: Yes, the algorithm changed, and you measure daily uniques by:
- Every page view drops a cookie with a unique visitor id
- On subsequent page views, we use the cookie value, if it exists
- If the cookie does not exist (due to a variety of reasons), we use the IP address for 24 hours.
If that is how you measure it now, how did you measure it before the algorithm change? For quieter sites this method will work, but when the traffic hits 4,000 page views per day, all targeted at UK users, then treating an IP number as a unique user starts to become absurd, as many MAJOR ISPs re-use the IP number time after time as a dynamic IP. Moreover (and I have to guess here), when someone clicks on a PPC result from a popular search, some ISPs seem to think "hmm - cached on this IP number - I'll deliver the cached version". The result: we record an extra page view but not an extra visitor. Whether my interpretation is right or wrong, it is clear that on busy sites the stats are significantly out, whilst on quieter sites the error is not noticeable.
You also pointed out that this discrepancy was not costing us money. I can assure you that I know of five companies, including ourselves, where it is costing thousands of dollars, since we made deals with clients based on the old algorithm. I accept that you charge on page views, not uniques. However, WE are obliged to charge on daily uniques. This means we are now paying for Overture and similar traffic that we cannot charge on. Since Overture traffic almost always comes from the very ISPs that are using dynamic IPs, you can see the issue.
I would have thought that treating an IP number as unique for a shorter period would at least be more accurate. However, on really heavy sites I bet this would fall apart totally. I can't see the stats for MSN, but my guess is that some IP numbers are continually hitting the site even though every other hit is from a different user logging on with the same default home page. Makemetop's 137 visits from an NTL user VIA the same search engine referral seem incredibly unlike a cookie tracking issue.
Perhaps another question in your logic is this - if about 6-7% of users are not accepting cookies, why has the daily "new visitors" figure as a percentage of "returning visitors" dropped from the high eighties to the mid fifties percent?
Anyway - can you confirm how the new algorithm has changed from the previous algorithm please? At least then we can have meaningful discussions with our clients about how we re-set the value of a daily unique.
Dixon.
My point was that it was 136 visits from an NTL user using different search engine referrals - hardly likely, I think! More likely to be 136 NTL different users coming in using the same IP over a 24 hour period.
A couple of points on this. We were able to identify the time of the problem as at the end of November, as this is when the stats started to become inaccurate. We were right in this supposition.
So, what changed in November? And whatever it was that changed, were the improvements enough to compensate for the fact that a number of people are now seeing inaccurate numbers?
When I first tested Hitslink, its ability to track proxies and unique visits accurately (as opposed to server log files) was what impressed me the most. This is now the very thing that's going wrong. My conclusion would be - use the old algorithm again, because it seemed to be better at achieving the objectives mentioned above than the new one...
Number of Visits--------Daily Unique Visitors
New Visitors------------11167
From 2-5 Visits---------2606
From 6-10 Visits--------569
From 11-25 Visits-------633
From 26-100 Visits------2931
Over 100 Visits---------0
So how can it be that I have 633 daily uniques this month from people coming up to 25 times but 2931 from people coming 26 times plus? It can only be repeating IP numbers from different users. Statistically implausible, and I am betting that 100 days after 17th November 2002 that final "over 100 visits" will start jumping up day by day unless Hitslink go back to the old calculation, whatever it was.
Of course, that 2931 is almost certainly many more than 2931 daily uniques who have only visited once or twice, but as the "repeat count" is set to 24 hours, we have no way to extrapolate back.
The comparable table for November was MUCH more believable:
Number of Visits------Daily Unique Visitors
New Visitors----------18417
From 2-5 Visits-------2435
From 6-10 Visits------138
From 11-25 Visits-----196
From 26-100 Visits----83
Over 100 Visits-------0
Statistically speaking a lovely looking normal distribution curve I'm willing to bet.
Ergo - null hypothesis: The algorithm was correct before the update and is incorrect now.
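To put numbers on the comparison, here is a quick Python sketch that turns the two tables above into percentage shares. The figures are copied from the tables; the function and variable names are just for illustration.

```python
def bucket_shares(counts):
    """Return each bucket's share of all daily uniques, in percent (1 decimal)."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Table for the current month (after the update), as posted above.
after_update = {
    "New Visitors": 11167,
    "From 2-5 Visits": 2606,
    "From 6-10 Visits": 569,
    "From 11-25 Visits": 633,
    "From 26-100 Visits": 2931,
    "Over 100 Visits": 0,
}

# Comparable table for November, as posted above.
november = {
    "New Visitors": 18417,
    "From 2-5 Visits": 2435,
    "From 6-10 Visits": 138,
    "From 11-25 Visits": 196,
    "From 26-100 Visits": 83,
    "Over 100 Visits": 0,
}

for name, table in (("November", november), ("After update", after_update)):
    shares = bucket_shares(table)
    print(name, shares["New Visitors"], shares["From 26-100 Visits"])
```

On these figures, new visitors fall from 86.6% of daily uniques in November to 62.4% now, while the 26-100 visits bucket jumps from 0.4% to 16.4%.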
Dixon.
January:
New Visitors 8726
From 2-5 Visits 1795
From 6-10 Visits 416
From 11-25 Visits 851
From 26-100 Visits 2094
Over 100 Visits 37
November:
New Visitors 17903
From 2-5 Visits 2112
From 6-10 Visits 116
From 11-25 Visits 148
From 26-100 Visits 397
Over 100 Visits 0
Something is certainly wrong here!