Forum Moderators: DixonJones
We used ClickTrack before and it was a similar situation: it tracked 50-80%. Is that normal? How much can your web analytics tool track?
For example, differences that large would be consistent with tracking via log files and the referrer field and using the visits statistic, while the venue is tracking clicks.
Thanks for your quick response. We use tracking codes in all of our campaign URLs, so whenever a visitor clicks one and comes to our site, the visit should be tracked. We look at visits, click-throughs, conversions (orders) and revenue.
Take Google AdWords for example: WT tracks 109% of the clicks (I guess this is because we are not billed for multiple clicks by the same visitor within a short period of time). Yet WT only tracked 73% of the conversions that AdWords showed us for the month of May. It is similar for our Overture campaign. What do you think could cause this?
You mean you place your URL like http://example.com/product.php?gtse=goog&gtkw=key+word
where gtse indicates the Search Engine where you have advertised your website. This is different for each search engine. For example for Google USA it is goog and for Overture/Yahoo USA it is ovus.
gtkw indicates the related keyphrase for the specific ad campaign. This needs to be attached to the rest of the string.
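As an aside, a tagged URL like that can be pulled apart trivially on the server side. A minimal Python sketch, assuming the gtse/gtkw scheme described above (`parse_campaign` is a hypothetical helper, not part of any tracking product):

```python
from urllib.parse import urlparse, parse_qs

def parse_campaign(url):
    """Extract the search-engine code (gtse) and keyphrase (gtkw)
    from a tagged landing-page URL; (None, None) when untagged."""
    params = parse_qs(urlparse(url).query)
    return (params.get("gtse", [None])[0],
            params.get("gtkw", [None])[0])

# A Google USA ad for the phrase "key word":
engine, phrase = parse_campaign(
    "http://example.com/product.php?gtse=goog&gtkw=key+word")
# engine is "goog", phrase is "key word" (parse_qs decodes the "+")
```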
And then what are you doing?
If the fraud clicker has disabled JS and the referrer, then you need a better tracking system and a better engineer :). I know how difficult it was when we worked on it.
AjiNIMC
[edited by: engine at 2:03 pm (utc) on July 4, 2006]
[edit reason] examplified [/edit]
I use Monster Commerce for my ecommerce platform and was wondering if anyone uses a plug-in program to better analyze on-site traffic.
The numbers I get from the Urchin that Monster Commerce provides and from Google Analytics are very different.
Just trying to find out if there is something out there people have used and really like. Thanks.
Thank you for your help.
We put the tracking link like this:
http://example.com/product.php?WT.mc_id=campaignID
For example, the link for our iPod campaign in Google AdWords would look like this: http://example.com/product.php?WT.mc_id=googleadwordsIPOD
And we put a JavaScript tag on our web pages for tracking. WebTrends gives us several session-tracking options to choose from, such as "Track Sessions for Logs", "Track Sessions Using First Party Cookies", and "Track User Sessions using IP/User Agent".
We choose to track by using first party cookies.
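For context, "first party cookies" tracking boils down to the site issuing its own visitor-id cookie and reusing it on later requests. A rough sketch with an invented cookie name (this is not WebTrends' actual cookie or logic):

```python
import secrets

COOKIE_NAME = "WTVID"  # illustrative name only, not WebTrends' real cookie

def identify_visitor(request_cookies):
    """Return (visitor_id, set_cookie_needed): reuse the id from our
    own first-party cookie if present, otherwise mint a fresh one and
    signal that the response should set the cookie."""
    vid = request_cookies.get(COOKIE_NAME)
    if vid:
        return vid, False
    return secrets.token_hex(8), True
```

Because the cookie is set on the site's own domain, it survives most third-party cookie blocking, which is presumably why that option undercounts less than IP/user-agent matching.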
About the fraud clickers: do you mean that if they disable JS, we can't see those visits/clicks in the WebTrends report, but AdWords will still count those clicks, so there will be a gap? If so, I don't think there is anything we can do about it for now.
[edited by: engine at 2:04 pm (utc) on July 4, 2006]
[edit reason] examplified & de-linked [/edit]
If somebody clicks on an ad to get to your site then goes back to the search page after their visit, and clicks again, Google counts it as two while WT counts it as one, i.e. the one that happened at the very beginning of the visit. I've seen the difference be as low as a few percent and as high as 20 percent. It's worth a look.
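That click-versus-visit difference is easy to see in code. A toy sketch of how a page-tag tool collapses repeat clicks into visits (the 30-minute session window is an assumption; WebTrends' actual rules may differ):

```python
SESSION_TIMEOUT = 30 * 60  # seconds; assumed visit window

def visits_from_clicks(click_times, timeout=SESSION_TIMEOUT):
    """Collapse a visitor's click timestamps into visits: a click
    within `timeout` seconds of the previous one joins the same
    visit, while the ad network bills every click separately."""
    visits, last = 0, None
    for t in sorted(click_times):
        if last is None or t - last > timeout:
            visits += 1
        last = t
    return visits

# Two ad clicks 5 minutes apart: the network bills 2 clicks,
# but this counts 1 visit.
```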
http://example.com/product.php?WT.mc_id=campaignID
About the fraud clickers: do you mean that if they disable JS, we can't see those visits/clicks in the WebTrends report, but AdWords will still count those clicks, so there will be a gap?
If so, I don't think there is anything we can do about it for now.
[edited by: engine at 2:06 pm (utc) on July 4, 2006]
[edit reason] examplified & de-linked [/edit]
You'd have to set up WebTrends to capture the same kind of latent effects, and they'd both have to have the same latency period i.e. number of days before the ad no longer will get credit for a purchase. Also I think WebTrends will only give credit to a visitor's most recent campaign. Google on the other hand doesn't care or know about other campaigns that the visitor may have responded to since the Google Adwords visit.
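The last-campaign-wins behaviour described above can be sketched as a tiny last-touch attribution function (the campaign IDs and dates are made up for illustration):

```python
def last_touch(campaign_history):
    """Credit a conversion to the most recent campaign touch, the way
    the analytics tool is described as behaving; each ad network, by
    contrast, claims any touch inside its own cookie window."""
    if not campaign_history:
        return None
    # history entries are (day, campaign_id) pairs
    return max(campaign_history, key=lambda touch: touch[0])[1]

# Clicked AdWords on day 1 and an Overture ad on day 5, converted day 6:
history = [(1, "googleadwordsIPOD"), (5, "overtureIPOD")]
# last_touch(history) credits only "overtureIPOD", while both networks
# would report the conversion inside their 30-day windows.
```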
Yes, we are aware that our WebTrends setting tracks the last source (the last tracking code the customer clicks before entering our site), while Google AdWords and the other campaigns track whoever clicks the campaign link (and usually the cookie duration is 30 days). But do you think this could account for a 20-50% difference in conversions?
Ah. Can you explain that more? I need education.
They visit the page using
Also, these fraud clickers are sometimes hired by your competitor, who is aware of most of your campaigns. Your loss is his/her gain. Say you run a campaign with Overture and another with Google: this clicker army will click only a few times every day, with various combinations.
To explain everything I would have to write a few more paragraphs, but there is a continuous fight between your tracking company and the fraud clickers. We studied clock skew, various component identification and many other things to track them properly.
If you have a traffic tool that is only trying to analyze actual visitors, not robots, then Google Analytics and other traffic tools that track from the web page are probably closer to the truth.
There are a lot of stealth spiders and web scrapers out there that use real browser user agents. On a report from a natively hosted log analyzer they would show up as visitors, with LOTS of page views, when in fact they are spiders and scrapers of ZERO VALUE.
Those stealth spiders and scrapers also don't run javascript, which may make you think the quantity of visitors with disabled javascript is higher than normal.
That is where we need AI. When we designed our model we separated out one module as the "brain". The brain consumed all the data and gained intelligence every single day; only an intelligent system can distinguish such visits. In fact we defined various layers of intelligence for the brain.
AjiNIMC
AI systems can't distinguish between a person and a program that has been designed to create clickstreams that people would create. This is (among other reasons) why the click fraud problem is so difficult.
Hate to burst your bubble, but it's as simple as interjecting a CAPTCHA of sorts after X number of page views. The bots can't respond to the CAPTCHA, they are busted, problem solved.
I've been doing this for 6 months now and it works like a charm.
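A minimal sketch of that idea, assuming an in-memory counter keyed by client IP (the threshold, the storage and the CAPTCHA itself are all stand-ins; a real deployment would expire counters and serve an actual challenge page):

```python
from collections import defaultdict

class ChallengeGate:
    """Count page views per client and flag the client for a CAPTCHA
    once it crosses the limit; verified clients are never re-challenged."""
    def __init__(self, limit=50):  # illustrative threshold, tune per site
        self.limit = limit
        self.views = defaultdict(int)
        self.verified = set()

    def should_challenge(self, client_ip):
        if client_ip in self.verified:
            return False
        self.views[client_ip] += 1
        return self.views[client_ip] > self.limit

    def mark_verified(self, client_ip):
        """Call when the client solves the CAPTCHA."""
        self.verified.add(client_ip)
```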
For PPC campaigns there is a setting that does something similar, based on a different algorithm.
Not at my customers' risk, and when it comes to a general tracking tool I have not seen anyone doing this. I would rather let 100 bots go unnoticed than trouble one customer/visitor.
I would suggest redefining the concept of "customer risk" as allowing a 3rd party to use your own information against you in the search engines allows customers to find THEM before they find YOU. This happens every day, it's what AdSense has caused by incentivizing scraper sites to spam the search engines.
Besides, out of 15K visitors a day, less than 100 typically get challenged for behaving like a robot and most of them turn out to be robots.
Trust me, I don't want to annoy my customers either, and try to minimize it, but when you've had your big cash keywords hijacked a couple of times by AdSense scrapers or have been redirect hijacked by a proxy server, you'll be singing a different tune.
That's just one example; there are hundreds of flaws with web analytics, even the ones using AI.
Hate to burst your bubble, but it's as simple as interjecting a CAPTCHA of sorts after X number of page views. The bots can't respond to the CAPTCHA, they are busted, problem solved.
I am referring specifically here to systems that analyze clickstreams, not systems that have different user experiences.
When I say AI, I mean a system designed to track trends and derive the needed conclusions. It can never be 100% accurate, but it can certainly filter out a good percentage.
Yes, that's what I thought you meant. There are certainly patterns that an AI system can be taught to recognize, such as repeated IPs (or accesses concentrated within IP blocks), dictionary attacks, and the like. The problems begin when the usage pattern is not so predictable, such as that which some casual user does. The fraudsters are aware of this, and design their bots to operate as if a human being was generating the traffic.
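The easier of those patterns, repeated IPs and clicks concentrated within an IP block, amount to a plain frequency check. A sketch with invented thresholds (real systems weigh many more signals than this):

```python
from collections import Counter

def suspicious_sources(click_ips, ip_limit=10, block_limit=50):
    """Return (hot_ips, hot_blocks): single IPs with more clicks than
    ip_limit, and /24 blocks whose combined clicks exceed block_limit."""
    per_ip = Counter(click_ips)
    per_block = Counter(ip.rsplit(".", 1)[0] for ip in click_ips)
    hot_ips = {ip for ip, n in per_ip.items() if n > ip_limit}
    hot_blocks = {b for b, n in per_block.items() if n > block_limit}
    return hot_ips, hot_blocks
```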
Yup, and I've been stopping a lot of the fraudsters, they aren't as clever as they think.
Allowing open PROXY servers and CGI PROXY sites access to your server is a major point of vulnerability as those playing games don't want to be caught so you need to block 'em all.
And proxy servers explain:
# Different IPs
# Different browsers
# JS disabled, Cookies disabled
A single person or automated process using proxies can pull this off, which is why they should be blocked.
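One cheap, partial signal for the proxy case is the extra headers many open proxies add to requests. A sketch (the header list is illustrative and incomplete, and anonymous proxies strip these headers, so their absence proves nothing):

```python
# Headers commonly added or forwarded by HTTP proxies.
PROXY_HEADERS = {"via", "x-forwarded-for", "forwarded", "proxy-connection"}

def looks_proxied(request_headers):
    """True if any header name suggests the request passed through a
    proxy; blocking on this alone would also hit legitimate corporate
    proxies, so treat it as one signal among several."""
    return any(name.lower() in PROXY_HEADERS for name in request_headers)
```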
Yup, and I've been stopping a lot of the fraudsters, they aren't as clever as they think.
I'm sure you have caught some fraudsters. What I'm trying to say is that there is fraud that no one can catch merely by looking at traffic logs, click logs, etc., because it looks like ordinary traffic. Perhaps you can catch them if they're engaged in some other type of fraudulent activity and the authorities are tipped off about it.
Allowing open PROXY servers and CGI PROXY sites access to your server is a major point of vulnerability as those playing games don't want to be caught so you need to block 'em all.
There are fraudsters who are aware of this, and set up their own proxy servers.
And proxy servers explain:
# Different IPs
# Different browsers
# JS disabled, Cookies disabled
Are you going to block a user agent you never heard of?
I already do that, I whitelist all access to my server based on user agents and MSIE, FireFox and Opera get thru as well as 5 major search engines and EVERYTHING else is bounced.
No, I don't sit around blocking things; they simply never get to my content in the first place unless I let them. I get a report of anything new trying to access my site and review who/what they are to see if I should let them in next time.
Only several major search engines and well behaved "browsers" are allowed.
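That whitelist policy might look roughly like this; the token list is illustrative, not the poster's actual configuration, and (as pointed out further down the thread) user agents can be faked, so this is only a first filter:

```python
# Substrings of user agents allowed through; everything else is bounced.
ALLOWED_UA_TOKENS = ("MSIE", "Firefox", "Opera",
                     "Googlebot", "Slurp", "msnbot")

def ua_allowed(user_agent):
    """True if the user agent matches a whitelisted browser or crawler."""
    return any(token in user_agent for token in ALLOWED_UA_TOKENS)
```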
There are fraudsters who are aware of this, and set up their own proxy servers.
Which is also why my web site checks all inbound IPs for open proxies, as well as blocking the known lists, and I block accesses from many server hosting farms too, for the very reason that people use cheap hosting to put automated tasks and proxies online.
Basically, if it's coming from an IP address that humans don't use, like hosting farms, or where it's vulnerable like proxies, I block it.
I whitelist all access to my server based on user agents and MSIE, FireFox and Opera get thru as well as 5 major search engines and EVERYTHING else is bounced.
I imagine this policy would reduce some click fraud but it doesn't work in the general case, and you could very well be turning away legit, even potentially converting traffic.
No, I don't site around blocking things, they simply never get to my content in the first place unless I let them. I get a report of anything new trying to access my site and review who/what they are to see if I should let them in next time.
This doesn't scale for a heavily trafficked site.
Which is also why my web site checks all inbound IP's for an open proxy as well as blocking the known list, and I block accesses that come from many server hosting farms as well for the very fact people use cheap hosting to put various automated tasks and proxies online.
IP addresses change hands. Companies are bought and sold; infrastructures are merged. What might be registered to a server farm one day might be registered to a block of broadband users the next day.
Basically, if it's coming from an IP address that humans don't use, like hosting farms, or where it's vulnerable like proxies, I block it.
How does one determine what is an IP address that humans don't use? (I'm restricting this to the set of addresses that are on publicly routable networks, not the private or link-local address spaces.)
User agents can be easily faked.
That's why the search engines are whitelisted by IP address.
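IP whitelisting of a crawler is usually done with the double DNS check the engines themselves recommend: reverse-resolve the IP, check the hostname's domain, then forward-resolve the hostname and confirm it maps back to the same IP. A sketch with the resolvers passed in so the logic runs without network access (the hostnames below are made up):

```python
def verify_crawler_ip(ip, expected_suffix, rdns, fdns):
    """Double reverse-DNS verification: the IP's PTR hostname must end
    with the engine's domain (e.g. ".googlebot.com"), and that hostname
    must forward-resolve back to the same IP. In production `rdns` and
    `fdns` would wrap socket.gethostbyaddr / socket.gethostbyname_ex."""
    host = rdns(ip)
    if not host or not host.endswith(expected_suffix):
        return False
    return ip in fdns(host)
```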
This doesn't scale for a heavily trafficked site.
I get 15K visitors a day, and increasing. Would it work for 1M visitors? No clue.
How does one determine what is an IP address that humans don't use?
It's not too hard to determine these things, even with IPs changing hands, but it's too complex to go into and will be off topic for this thread.
Trust me, been doing this almost a year now, and the false positives are VERY low, definitely less than 0.1%