Comparing Web Stats Programs

Forum Moderators: DixonJones

Same site / different stats programs / much different results!

RTMac

1:08 am on Oct 8, 2004 (gmt 0)

I recently switched the Web stats program one of my clients was using from DeepMetrix LiveStats to AWStats.

I switched over because I found LiveStats to be EXTREMELY slow, and the stats it provided weren't very useful or informative.

Right now, both programs are analyzing the same Web site and LiveStats is reporting a huge difference in unique visits compared to awstats. Where awstats reported around 6700 visits, LiveStats reported 14,000. In other words, the awstats figure is only about 48% of the LiveStats figure, so LiveStats is reporting more than double. Interestingly, the number of total Web hits is about the same, around 250,000.
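The gap between the two figures can be stated two ways, and they are easy to mix up when reporting to a client. A quick sketch of the arithmetic, using only the two visit counts quoted above (the variable names are just for illustration):

```python
awstats_visits = 6700
livestats_visits = 14000

# AWStats figure as a share of the LiveStats figure.
share = awstats_visits / livestats_visits

# How much higher the LiveStats figure is, relative to AWStats.
increase = (livestats_visits - awstats_visits) / awstats_visits

print(f"AWStats is {share:.0%} of LiveStats")
print(f"LiveStats is {increase:.0%} higher than AWStats")
```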

Of course my client prefers the old numbers! I believe it's simply the fact that awstats is more accurate, but I'm just guessing.

Does anyone have any explanation for such a wide discrepancy?

ogletree

2:06 am on Oct 8, 2004 (gmt 0)

You can run several programs and get all kinds of numbers from the same logs. I'm sure the lower number is more accurate. It is impossible to get an accurate number of visitors without using cookies or some sort of tags. Logs were never designed to be used as they are today. The only number I even look at is how many visitors I get from search engines. Who cares about those other numbers anyhow? They are to be taken with a grain of salt.

There are so many factors involved in determining how many visitors you have during a day. What is the definition of a visitor? What about users whose ISPs change their IP almost every time their computer makes an internet request, or every minute (AOL)? There is no way to tell for sure what your real stats are. There are some security applications out there that strip off all referer info, so those visits just show up as no referer. Anybody who has looked at a lot of stat reports knows that there is no way that many people are coming to your site with no referer.

Then there are the bots. You would be shocked how many of your visitors are really bots. If you look closely at the browser section you will see a bunch of different browsers. If you look at the host section you will see one IP that has hit your site that day for 10K hits.

dcheney

2:49 am on Oct 8, 2004 (gmt 0)

AWStats generally excludes search engine bots from the top line stats which can make a huge difference.
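As a rough illustration of why bot filtering moves the top-line numbers, here is a minimal sketch that splits hits by User-Agent. The marker list and sample hits are made up for the example; this is not how AWStats itself is implemented, and real analyzers use much longer, maintained robot lists.

```python
# Hypothetical substrings that suggest a robot; real lists are far longer.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")

def is_probable_bot(user_agent):
    """Return True if the User-Agent contains one of the robot markers."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)

def split_hits(hits):
    """Split (ip, user_agent) hits into (human_hits, bot_hits)."""
    humans = [h for h in hits if not is_probable_bot(h[1])]
    bots = [h for h in hits if is_probable_bot(h[1])]
    return humans, bots

# Made-up sample hits.
sample_hits = [
    ("10.0.0.1", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"),
    ("10.0.0.2", "Googlebot/2.1 (+http://www.google.com/bot.html)"),
    ("10.0.0.3", "Mozilla/5.0 (compatible; Yahoo! Slurp)"),
]

humans, bots = split_hits(sample_hits)
print(len(humans), "human hits,", len(bots), "bot hits")
```

A program that reports all three hits as visitors and one that reports only the first will disagree even though both read the same log.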

cgrantski

12:24 pm on Oct 8, 2004 (gmt 0)

There are a lot of other reasons for this kind of thing. There was a good thread or three on this but I guess they've been archived ... neither program is "wrong" but they're just dividing hits into visits differently. And, remember, when you look at the numbers with twice the visits, those visits show as being lower quality, i.e. half as long.

Matt Probert

4:55 pm on Oct 9, 2004 (gmt 0)

Right now, both programs are analyzing the same Web site and LiveStats is reporting a huge difference in unique visits compared to awstats. Where awstats reported around 6700 visits, LiveStats reported 14,000.

"You can't tell how many visitors you've had. You can guess by looking at the number of distinct hosts that have requested things from you. Indeed this is what many programs mean when they report "visitors". But this is not always a good estimate for three reasons. First, if users get your pages from a local cache server, you will never know about it. Secondly, sometimes many users appear to connect from the same host: either users from the same company or ISP, or users using the same cache server. Finally, sometimes one user appears to connect from many different hosts. AOL now allocates users a different hostname for every request. So if your home page has 10 graphics on, and an AOL user visits it, most programs will count that as 11 different visitors!"

From "How The Web Works" by Stephen Turner
[analog.cx...]

A document that is *very* highly recommended for an understanding of web site statistics based upon log file analysis.
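To make the quoted point concrete, here is a small sketch of the naive "visitors = distinct hosts" estimate. The hostnames are invented for the example; it only shows how the estimate can be too high and too low at the same time.

```python
def count_distinct_hosts(hosts):
    """Naive visitor estimate: the number of distinct requesting hosts."""
    return len(set(hosts))

# One user whose ISP assigns a different proxy host to every request:
# a home page plus 10 graphics is 11 requests from 11 hosts.
rotating_proxy_requests = [f"cache-{i}.proxy.example.net" for i in range(11)]

# Three different people in one office, all behind the same gateway.
office_requests = ["gateway.example.com"] * 3

print(count_distinct_hosts(rotating_proxy_requests))  # one person, counted as 11
print(count_distinct_hosts(office_requests))          # three people, counted as 1
```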

Matt

cgrantski

4:31 pm on Oct 10, 2004 (gmt 0)

True, more or less, though I disagree about the "most programs" part. A lot of current software isn't as dumb as that paragraph suggests, because the mfgrs now understand those shortcomings and try to compensate to the extent possible. Note that I said "to the extent possible." Let's assume that RTMac would like to get an estimate of this statistic that's as good as possible, and that a non-perfect version of this statistic is, in fact, better than having no version of the statistic.

Given the shortcomings of what gets into logs, there are still many reasons why one perfectly good software package will produce different results from another perfectly good software package. We've talked about two different statistics, hits (possibly RTMac really means page views) and visits. RTMac says hits are about the same for the two stats packages, but visits are wildly different, so let's stick with that one measure: visits. And let's assume the other measure really is hits and not page views; hits includes images, pdfs, .js files, .css files, and so forth.

Suppose program A filters out images, etc. before clumping hits into visits while program B does not. Then program A will show fewer visits in those cases where somebody else's site is calling one of your images or scripts without actually sending visitors to your site. Program B will be counting any of these references to a non-page file as a visit and program A will not. I've seen a few cases where this kind of image makes an enormous difference.
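A minimal sketch of that difference, assuming a visit is approximated by a distinct IP and that the extension list below is what "program A" filters (both are assumptions for illustration, not the behavior of any named product):

```python
import os

# Hypothetical list of extensions treated as non-page resources.
NON_PAGE_EXTENSIONS = {".gif", ".jpg", ".png", ".css", ".js", ".pdf"}

def is_page(path):
    """Treat anything without a non-page extension as a page (ignores query strings)."""
    ext = os.path.splitext(path)[1].lower()
    return ext not in NON_PAGE_EXTENSIONS

def visiting_hosts(hits, pages_only):
    """Return the set of IPs counted as visitors under the chosen rule."""
    return {ip for ip, path in hits if is_page(path) or not pages_only}

hits = [
    ("192.0.2.1", "/index.html"),
    ("192.0.2.1", "/logo.gif"),
    ("192.0.2.2", "/logo.gif"),  # image hotlinked from another site
    ("192.0.2.3", "/logo.gif"),  # image hotlinked from another site
]

visits_a = len(visiting_hosts(hits, pages_only=True))   # program A
visits_b = len(visiting_hosts(hits, pages_only=False))  # program B
print(visits_a, visits_b)
```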

Suppose program A doesn't count a visit if it consists of a 404 and nothing else, while program B still counts it. Program A will show fewer visits, especially if there's a 404 on a page referenced by a big-traffic link on another site.
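The 404 rule can be sketched the same way. Here each visit is represented simply as the list of HTTP status codes it produced; that representation and the sample data are assumptions made for the example.

```python
def count_visits(visits, drop_404_only):
    """Count visits, optionally skipping those made up entirely of 404 responses."""
    count = 0
    for statuses in visits:
        if drop_404_only and all(status == 404 for status in statuses):
            continue
        count += 1
    return count

# Made-up visits: two of them hit a broken link and left.
visits = [[200, 200, 304], [404], [404], [200, 404]]

print(count_visits(visits, drop_404_only=True))   # program A
print(count_visits(visits, drop_404_only=False))  # program B
```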

Suppose program A considers a visit "closed" after 30 minutes of inactivity while program B uses 15 minutes. Then program A will show fewer visits because it will not be as likely to count a distracted person's single visit as two visits.
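The inactivity timeout is easy to demonstrate. The sketch below counts visits for a single visitor's request timestamps under both timeouts; the timestamps are invented and the function is only an illustration of the rule, not code from either product.

```python
from datetime import datetime, timedelta

def count_visits(timestamps, timeout_minutes):
    """Count visits for one visitor: a gap longer than the timeout starts a new visit."""
    timeout = timedelta(minutes=timeout_minutes)
    visits = 0
    previous = None
    for ts in sorted(timestamps):
        if previous is None or ts - previous > timeout:
            visits += 1
        previous = ts
    return visits

start = datetime(2004, 10, 8, 9, 0)
# Requests at 0, 5, 21 and 60 minutes: gaps of 5, 16 and 39 minutes.
requests = [start + timedelta(minutes=m) for m in (0, 5, 21, 60)]

print(count_visits(requests, timeout_minutes=30))  # program A
print(count_visits(requests, timeout_minutes=15))  # program B
```

The 16-minute pause is what separates the two counts: it ends a visit under the 15-minute rule but not under the 30-minute rule.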

Suppose program A is smart enough to know about AOL's many proxy IPs and program B is not. Then program A will show fewer visits because an AOL visit won't be broken up into lots of little visits. Of course, program A could also over-consolidate, which is why session cookies are nicer than IPs for determining visits. This is a big issue, affecting far more IPs than just AOL's.
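One way a program might consolidate proxy traffic is to map every address in a known proxy range to a single key. The sketch below assumes a hypothetical pool table; the range used is a reserved documentation block, not a real AOL range.

```python
import ipaddress

# Hypothetical proxy pool; 198.51.100.0/24 is a documentation range.
PROXY_POOLS = {
    "example-proxy-pool": ipaddress.ip_network("198.51.100.0/24"),
}

def visitor_key(ip):
    """Return the pool name for addresses in a known proxy pool, else the IP itself."""
    addr = ipaddress.ip_address(ip)
    for name, network in PROXY_POOLS.items():
        if addr in network:
            return name
    return ip

requests = ["198.51.100.7", "198.51.100.42", "198.51.100.200", "203.0.113.5"]

naive_count = len(set(requests))                           # program B
pool_aware_count = len({visitor_key(ip) for ip in requests})  # program A
print(naive_count, pool_aware_count)
```

Note that this version treats the whole pool as one visitor, which is exactly the over-consolidation risk mentioned above when several real users share the pool.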

Suppose program A combines the IP with the User Agent field in the logs, while program B just uses IP address. Then program A will show more visits because it will have less of the over-consolidation. In other words, it will be more able to detect different users coming from the same IP.
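A short sketch of the two keys, using made-up hits from a single shared IP address:

```python
# (ip, user_agent) pairs; two different browsers behind one IP.
hits = [
    ("192.0.2.10", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"),
    ("192.0.2.10", "Mozilla/5.0 (Macintosh; U; PPC Mac OS X) Safari/125"),
    ("192.0.2.10", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"),
]

visitors_by_ip = len({ip for ip, _ua in hits})  # program B: IP only
visitors_by_ip_and_ua = len(set(hits))          # program A: IP plus User-Agent
print(visitors_by_ip, visitors_by_ip_and_ua)
```

Two people on the same IP with identical browsers would still be merged, so this reduces over-consolidation rather than eliminating it.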

These are just a few of the more common reasons, and I can think of maybe 5 or 6 more. Both programs are doing their math correctly, but they are working with different assumptions and processes. The "correctness" of each number depends on what you want to count: do you want somebody who stops looking at your site for 16 minutes to be counted as 2 visits? Do you want to know about those other sites that are calling one of your graphics? Do you want to know about attempted visits that hit a 404 and quit?

All of this is one reason why, perhaps, you get what you pay for in a statistics program. I'm not saying that free or cheap ones are never as smart as not-so-cheap ones, but you have to admit that if you have revenue to pay for the extra programming required to deal with all this and the extra writing to document what's going on, you have a somewhat better chance of producing a somewhat better program.

mincklerstraat

5:15 pm on Oct 10, 2004 (gmt 0)

To add to these complexities, there's also caching to be considered - if you have an html site or a site that produces smart cache headers, chances are many of your pages seen by visitors won't register a thing with your server since they're cached by that visitor's isp.

I like using a non-cacheable 1x1 pixel image that's actually a PHP script for keeping stats: it records the number of pageviews, which page was viewed, and the referrer, and sticks them into a db. It's easy enough to implement by just adding a couple of lines to your page; I do it with JS.
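For readers who want to see the shape of such a tracker, here is a minimal sketch written as a Python WSGI app rather than the PHP script described above. The query parameter names (`page`, `ref`) and the in-memory list standing in for the database are assumptions made for the example.

```python
import base64
from urllib.parse import parse_qs

# A commonly used 1x1 transparent GIF, base64-encoded.
PIXEL = base64.b64decode(
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)

page_views = []  # stand-in for a database table

def tracking_pixel(environ, start_response):
    """Record the page and referrer from the query string, then serve the pixel."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    page_views.append({
        "page": query.get("page", [""])[0],
        "referrer": query.get("ref", [""])[0],
    })
    headers = [
        ("Content-Type", "image/gif"),
        ("Content-Length", str(len(PIXEL))),
        # Ask browsers and proxies not to cache, so every view reaches the server.
        ("Cache-Control", "no-store, no-cache, must-revalidate"),
        ("Pragma", "no-cache"),
        ("Expires", "0"),
    ]
    start_response("200 OK", headers)
    return [PIXEL]

# Simulate one request without starting a server.
captured = {}
def fake_start_response(status, headers):
    captured["status"] = status
    captured["headers"] = dict(headers)

body = tracking_pixel(
    {"QUERY_STRING": "page=%2Findex.html&ref=http%3A%2F%2Fexample.com%2F"},
    fake_start_response,
)
print(page_views[-1])
```

The page would request this URL from an image tag written out by a script, passing the current page and `document.referrer` as the two parameters.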

This also keeps track of the 'real referrers', which is nice: the actual URLs of the pages users come from. Most stats packages I've come across don't re-assemble page request strings into actual URLs, so you have no idea which page of Google's results your visitor came from, even though you might know the search terms used.
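With the full referrer URL stored, the search terms and the results-page offset can be pulled out afterwards. The sketch below assumes the search engine puts the terms in a `q` parameter and the result offset in `start`; those parameter names and the sample URL are assumptions for illustration and vary by engine.

```python
from urllib.parse import urlsplit, parse_qs

def describe_referrer(referrer, terms_param="q", offset_param="start"):
    """Return (host, search_terms, result_offset) from a full referrer URL."""
    parts = urlsplit(referrer)
    query = parse_qs(parts.query)
    terms = query.get(terms_param, [None])[0]
    offset = query.get(offset_param, ["0"])[0]
    return parts.netloc, terms, int(offset)

referrer = "http://www.google.com/search?q=web+stats+comparison&start=10"
host, terms, offset = describe_referrer(referrer)
print(host, terms, offset)
```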

claus

6:21 pm on Oct 10, 2004 (gmt 0)

I just finished four hours of writing on this issue for an internet industry body i do consulting for on these issues. Actually i've been working with these issues for seven years or so. I must say, that there's been some development during that time, and i agree that some stats manufacturers have become a lot better. There's still a lot of nonsense beliefs/statements/methods and generally useless cr*p around though (some even from very high priced and otherwise respected suppliers).

The best you can do is not to think of these figures as anything close to the number of human beings using your site. Ultimately, that's the figure we're all interested in, but logs as well as cookies simply fail to deliver, for a multitude of reasons.

I would recommend, though, to trust cookies over log files anytime. They will generally be closer to the truth, and (depending on specific methods) they might be the closest bet there is.

>> There was a good thread or three on this

In December, Receptional made a very nice post with several good points here:

How to track visitors [webmasterworld.com]

Do follow the links to earlier threads i posted in msg#2 - if you manage to read through that collection of about 30 threads or so (minor threads are omitted, some of those selected are very long and very informative), you'll know quite a bit about the pitfalls, as well as alternative products/services.

RTMac

2:22 am on Oct 11, 2004 (gmt 0)

Who would have guessed a simple request about stats programs would result in such an informative discussion?

I feel as if I've just left a seminar on Web stats! Thank you to all of you that contributed.

Although I realize I've barely scratched the surface of all the variations and interpretations of Web site statistics, I can now give my client a much more educated reply to her questions about different results.

Thanks again!