
Website Analytics - Tracking and Logging Forum

This 41-message thread spans 2 pages; this is page 2.
How accurate/reliable is your web analytic tool?
kellyC




msg:889933
 8:46 pm on Jun 26, 2006 (gmt 0)

Hi All,
We use WebTrends 8.0 and have found that the numbers it tracks are consistently off from the numbers we can verify through Google AdWords, Overture, Shopping.com, NexTag, PriceGrabber, and so on. Sometimes, when the numbers are small, it tracks up to 100%. Other times, WebTrends tracks only about 50%-80%.

We used ClickTrack before, and it was a similar situation: it tracked 50-80%. Is that normal? How much does your web analytics tool track?
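
For what it's worth, a quick way to quantify the gap is to export the click counts each network reports and compare them against the visits your analytics package attributes to that source. A minimal sketch in Python, assuming you have both sets of numbers exported; the source names and counts below are made up:

# Hypothetical click counts reported by each network (made-up numbers).
reported_clicks = {"AdWords": 1200, "Overture": 450, "Shopping.com": 90}
# Visits your analytics tool attributes to the same sources (also made up).
analytics_visits = {"AdWords": 870, "Overture": 310, "Shopping.com": 88}
for source, clicks in reported_clicks.items():
    tracked = analytics_visits.get(source, 0)
    rate = 100.0 * tracked / clicks if clicks else 0.0
    print(f"{source}: {tracked}/{clicks} tracked ({rate:.0f}%)")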

 

gregbo




msg:889963
 4:18 am on Jul 3, 2006 (gmt 0)

I get 15K visitors a day and increasing; would it work for 1M visitors? No clue.

This is very low traffic compared to the types of sites I'm thinking of.

How does one determine what is an IP address that humans don't use?

It's not too hard to determine these things, even with IPs changing hands, but it's too complex to go into and will be off topic for this thread.

Actually, I think this is very much on topic for this thread. Anything that sheds light on what is or isn't fraudulent should be of interest to anyone who's concerned about the accuracy of tracking software. I don't agree with your claim, as it flies in the face of knowledge I've gained over many years of working with Internet architecture, and of specific cases where I've had to deal with the implications of IP address exchanges on access filtering.

incrediBILL




msg:889964
 8:25 am on Jul 3, 2006 (gmt 0)

I don't agree with your claim, as it flies in the face of knowledge I've gained over many years of working with Internet architecture, and of specific cases where I've had to deal with the implications of IP address exchanges on access filtering.

If it were that simple to explain, it wouldn't even be an issue and everyone would be doing it. Instead, it's the scope of a whitepaper: way too much to explain in a thread.

I know all about Internet architecture; I've been dealing with its failings since about '90, and I know everything can be faked to a degree.

However, you take all of that into account, come up with the best protection scenario possible, and roll the dice to see what happens. Of course a few false positives happen, and of course a few bad guys slip through the cracks, but it's the high percentage of direct hits that makes it worthwhile. Nothing you can ever do on the internet is 100%, but there are a lot of things you can do to make sure it's not handed to the malicious netizens on a silver platter.

It's like virus protection software: you do the best you can, and that's good enough most of the time.

gregbo




msg:889965
 9:50 pm on Jul 3, 2006 (gmt 0)

If it were that simple to explain, it wouldn't even be an issue and everyone would be doing it. Instead, it's the scope of a whitepaper: way too much to explain in a thread.

If you ever do write a whitepaper, I'd be interested in reading it.

I know all about Internet architecture; I've been dealing with its failings since about '90, and I know everything can be faked to a degree.

However, you take all of that into account, come up with the best protection scenario possible, and roll the dice to see what happens.

Hmmm ... I guess I am somewhat wary of approaches such as yours for a few reasons, some of which I gave earlier (lack of scalability, too much manual overhead, etc.). I wouldn't feel comfortable blocking access to the next popular browser, for example. Rather, I might suggest to my clients that we establish thresholds we consider reasonable, and if some activity exceeds those thresholds, we block the entities engaged in that activity. Thus, the person who's surfing the web from inside the datacenter isn't penalized, nor is the developer of the exciting new browser, etc.
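
To make the threshold idea concrete, here is a minimal sketch, not a production filter: count requests per IP in an access log and flag only the IPs that exceed an agreed threshold. The log file name, the log format, and the threshold value are all assumptions.

import re
from collections import Counter

THRESHOLD = 500  # requests per IP in the sampled log; an assumed value
LINE_RE = re.compile(r"^(\S+) ")  # leading IP of a common/combined log line

def flag_heavy_hitters(log_path, threshold=THRESHOLD):
    hits = Counter()
    with open(log_path) as log:
        for line in log:
            match = LINE_RE.match(line)
            if match:
                hits[match.group(1)] += 1
    # Only IPs over the threshold get flagged; everyone else is untouched.
    return {ip: count for ip, count in hits.items() if count > threshold}

for ip, count in sorted(flag_heavy_hitters("access.log").items(), key=lambda x: -x[1]):
    print(f"{ip}\t{count} requests - review or block")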

incrediBILL




msg:889966
 10:15 pm on Jul 3, 2006 (gmt 0)

Thus, the person who's surfing the web from inside the datacenter isn't penalized, nor is the developer of the exciting new browser, etc.

Yeah, well, the problem with that approach is that you also let all the proxy servers hosted by a datacenter hit your site, which allows them to hijack your SERPs in Google via cloaked directory crawl-throughs, allows click fraud and competitors to access you without being tracked, and worse. By the time you've figured out the bad behavior, all the damage is already done, which is why I adopted a proactive approach instead of a reactive one, and things have been improving dramatically.

Blocking datacenters is much less manual maintenance than chasing all the creepy crawlers, scrapers, proxies, and other scum they host; that much I'm positive about. The techniques I'm using are also way more scalable than a bulky .htaccess file or a firewall list, because I'm using a database: the lists in the web server and firewall are scanned linearly, and too much data in those lists (3,000 entries plus) chokes the server and causes serious performance degradation.
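
To give a rough idea of why a keyed lookup beats a linear list, here is a sketch of the general idea, not the actual setup described above: visitor IPs are checked against datacenter CIDR blocks indexed by prefix length, so a lookup costs a handful of hash probes instead of a scan over thousands of entries. The ranges shown are placeholder test networks.

import ipaddress
from collections import defaultdict

# Placeholder datacenter ranges (reserved test networks); a real list comes
# from your own research/database and is far longer.
DATACENTER_CIDRS = ["192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24"]

# Index network addresses by prefix length: one hash probe per distinct
# prefix length instead of a linear scan over every entry. IPv4 only here.
_index = defaultdict(set)
for cidr in DATACENTER_CIDRS:
    net = ipaddress.ip_network(cidr)
    _index[net.prefixlen].add(int(net.network_address))

def is_datacenter(ip_str):
    ip = int(ipaddress.ip_address(ip_str))
    for prefixlen, networks in _index.items():
        mask = ((1 << prefixlen) - 1) << (32 - prefixlen)
        if (ip & mask) in networks:
            return True
    return False

print(is_datacenter("198.51.100.77"))  # True - inside a listed range
print(is_datacenter("192.0.0.1"))      # False - not listed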

gregbo




msg:889967
 10:53 pm on Jul 3, 2006 (gmt 0)

Yeah, well, the problem with that approach is that you also let all the proxy servers hosted by a datacenter hit your site, which allows them to hijack your SERPs in Google via cloaked directory crawl-throughs, allows click fraud and competitors to access you without being tracked, and worse.

Actually, this is the point of having policy discussions. Filtering takes place only when thresholds are reached. There may be click fraud, but it falls within a range that the people who are paying for the service have considered acceptable.

I've actually proposed methods such as yours to filter out undesirable traffic. However, I've always warned that as a result of such measures, I could not guarantee that we would lose no business.

incrediBILL




msg:889968
 12:24 am on Jul 4, 2006 (gmt 0)

I could not guarantee that we would lose no business.

I can guarantee that leaving a site wide open does cost business, so it comes down to either being proactive and controlling your information, or being reactive and waiting days/weeks/months for issues to be corrected in the search engines.

When it's done properly, false positives on legitimate visitors are an insignificant factor, and there are ways of letting them resolve the false positive so nobody is left behind.
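
The "resolve the false positive" part can be as simple as serving blocked visitors a challenge page instead of a flat denial. A minimal sketch of that flow, where the challenge mechanism (CAPTCHA, email link, whatever) is a stand-in rather than the actual method described here:

whitelist = set()       # IPs that have already passed the challenge
blocked_hits = set()    # IPs caught by the datacenter/behaviour filters

def handle_request(ip, challenge_passed=False):
    if ip in whitelist:
        return "200 serve page"
    if ip in blocked_hits:
        if challenge_passed:
            whitelist.add(ip)          # false positive resolved, let them in
            return "200 serve page"
        return "403 show challenge page instead of a dead end"
    return "200 serve page"

blocked_hits.add("203.0.113.9")
print(handle_request("203.0.113.9"))                         # gets the challenge
print(handle_request("203.0.113.9", challenge_passed=True))  # whitelisted
print(handle_request("203.0.113.9"))                         # served normally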

Demaestro




msg:889969
 4:03 pm on Jul 4, 2006 (gmt 0)

I like to use Webalizer.

The info is still in a somewhat raw state and requires a little more patience to discern, but its upside is that all it does is parse the raw server logs and present totals.

It doesn't discount bots or anything like that, and for that reason I like it. Couple that with Urchin or what have you, and between the two you should have everything you need. I feel like I do, and for any extra tracking I want to do, I set up server-side code, which I find to be the most reliable.
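
In that same spirit of letting the raw logs speak, here is a minimal sketch of "parse the logs and present totals" with no attempt to discount bots; the Apache common/combined log format and the file name are assumptions:

import re
from collections import Counter

# Leading fields of an Apache common/combined log line (assumed format).
LOG_RE = re.compile(r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) \S+')

def totals(log_path):
    hits, ips, statuses = 0, set(), Counter()
    with open(log_path) as log:
        for line in log:
            m = LOG_RE.match(line)
            if not m:
                continue
            hits += 1                      # every request counts, bots included
            ips.add(m.group("ip"))
            statuses[m.group("status")] += 1
    return hits, len(ips), statuses

hits, unique_ips, statuses = totals("access.log")
print(f"Total hits: {hits}\nUnique IPs: {unique_ips}")
for status, count in statuses.most_common():
    print(f"  HTTP {status}: {count}")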

Draconian




msg:889970
 8:58 pm on Jul 4, 2006 (gmt 0)

Nothing you can ever do on the internet is 100%

Mind if I correct that for ya? "Nothing is 100%"

Gregbo, incrediBILL, we all appreciate your insight, and you both have very valid points; it's a matter of philosophy. However, you're both taking a "reactive approach" to one another's posts, so either continue in a more professional manner, take it to sticky mail, or get a room. ;)

incrediBILL




msg:889971
 4:05 am on Jul 5, 2006 (gmt 0)

you're both taking a "reactive approach" to one another's posts

That's called a debate and Gregbo did raise a couple of valid points that made me re-examine how I'm dealing with a couple of issues.

It's nice to hear from someone with differing viewpoints, as I often work alone (in a vacuum, really), and bouncing ideas off others is very helpful.

Thanks Gregbo ;)

timster




msg:889972
 8:27 pm on Jul 5, 2006 (gmt 0)

so either continue in a more professional manner, take it to sticky mail, or get a room.

And that's why they call him "Draconian."

gregbo and incrediBILL, your posts here made a great point-counterpoint that helped me get a better grasp on this subject, which I (and a lot of webmasters) should not ignore any longer. Thanks.



How does one determine what is an IP address that humans don't use?

It's not too hard to determine these things, even with IPs changing hands, but it's too complex to go into and will be off topic for this thread.

But that's a thread I'd love to read. I'm pretty sure scrapers and scumbots are coming to my sites in force, but I don't have any real plan on how to stop them. I googled the subject but haven't found a workable Step #1.

incrediBILL




msg:889973
 7:07 am on Jul 6, 2006 (gmt 0)

I don't have any real plan on how to stop them

You can only stop about half of the nonsense with the tools the webservers provide.

The rest is roll-your-own scripts at this point.

AlexK has one thread about stopping some of this in WebmasterWorld:
[webmasterworld.com...]

There's a link to AlexK's current script on his server somewhere around here....
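
For a rough idea of what a roll-your-own throttling script does at request time, here is a minimal sketch (not AlexK's actual script, which is in the thread linked above): keep a sliding window of timestamps per IP and refuse service once a rate ceiling is crossed. The window length and ceiling are assumed values.

import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60    # sliding window length (assumed value)
MAX_REQUESTS = 120     # ceiling per IP per window (assumed value)
_recent = defaultdict(deque)   # ip -> timestamps of its recent requests

def allow_request(ip, now=None):
    """Return True to serve the request, False to throttle (e.g. send a 503)."""
    now = time.time() if now is None else now
    queue = _recent[ip]
    while queue and now - queue[0] > WINDOW_SECONDS:
        queue.popleft()                # drop timestamps outside the window
    if len(queue) >= MAX_REQUESTS:
        return False
    queue.append(now)
    return True

# Simulate a scraper firing 150 requests in one second: only 120 get served.
served = sum(allow_request("203.0.113.5", now=1000.0 + i / 150) for i in range(150))
print(f"Served {served} of 150 requests")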
