Forum Moderators: DixonJones


Page based stats vs log stats

pros and cons


SmallTime

2:24 am on Aug 23, 2002 (gmt 0)

10+ Year Member



I am starting an SEO project for a reasonably large ASP site, and they are discussing adding a page-call web stats service (HitBox). I have a prejudice against adding the code, the call to a third-party server, etc., and have preferred log-based programs. Am I wrong?

What are the pros and cons of stats packages called from the page vs running log analysis?

bill

4:37 am on Aug 23, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Well, to begin with, you won't be seeing any of the robots/spider traffic with a HitBox-type solution... nor will you see anything of the visitors with JavaScript off... basically all you'll see is the browser traffic, and then only from the people who stay on the page long enough for the 3rd-party script to run.

Mark_A

4:52 am on Aug 23, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



But bill, with just basic log-style stats you will not see the use of cached pages (in Google, for example) unless you rotate graphics.
A counter (though I agree, not on a commercial site) would give you that.

SmallTime

4:54 am on Aug 23, 2002 (gmt 0)

10+ Year Member



Yes, and one would also need the logs to get referer info, so the question is really "both" vs. just log-based stats. I can imagine that tracking visitor paths through the site could be easier with page-based stats. If the only benefit is that it is easier to sell to folks who want a slick report, I'll recommend against it. So, can page-based stats do anything better?

Woz

5:05 am on Aug 23, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>But ... with just basic log-style stats you will not see the use of cached pages (in Google, for example)

You do if you know to keep an eye out for the relevant IP numbers as the referrers. E.g., go to Google, do a search on something, and then click on the "Cached" link. Note the IP number in the address bar.

It will all be in your logs. Getting good answers is about knowing what questions to ask. Looking for cached requests is about knowing which IP to look out for.

Onya
Woz

fathom

7:12 am on Aug 23, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The plus for HitBox: you have nothing to do and someone else does the work of measuring performance, whereas with log-based stats you do all the work (not a lot of wasted time, but it still takes time).

I think the greatest negative is the amount of kludge code added to the pages, which can weigh them down, particularly in Google. For argument's sake, that could easily cost 1 or 2 positions and possibly be the difference between being on page 1 or page 2.

I've used something similar in the past, but cleaned up their kludge and reduced the added file size by almost half.

Log-based is best for individual site stats IMO, but if I remember correctly doesn't HitBox also offer users its trend reports (aggregated statistics from 3 or 4 million sites)? That is a big plus on the marketing side.

I don't think I helped, eh! Sitting on the fence! :)

c3oc3o

11:42 am on Aug 23, 2002 (gmt 0)

10+ Year Member



> "Yes, and one would also need the logs to get referer info"
Not true, you can easily get the referrer through JavaScript or any server-side language.

> "nor will you see anything of the visitors with JavaScript off."
Also untrue - a good service will include a <noscript> part with an image, so that the visitor is counted anyway, even though some stats for him (like referrer) may be missing.

> "I think the greatest negative is the amount of gludge code added to the page that can weigh them down particualrly in google."
Depends on the service, I know one where you only need to insert two lines :)

True, you'll miss the robots, but you can gain info about screen resolution, window size, timezone, etc. that you wouldn't be able to get from a log file.
The slowdown of loading remote scripts is the major disadvantage. You should place the code at the end of the page, outside any <table>s (which are only rendered in the browser once their entire content has loaded, so the rest would have to "wait" for the counter code), or in a positioned <div> placed at the end of the source code. Of course, you might then miss a few hits if people click a link too quickly...
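For illustration, here is a minimal sketch of the kind of page tag being described: a short script that reports the referrer and screen details via a 1x1 image request, with a <noscript> image as the fallback, placed just before </body>. The host name stats.example.com, the /track.gif path and the parameter names are made up for the example; a real service supplies its own snippet.

<script type="text/javascript">
// Collect what a server log can't see: the page's referrer, screen size
// and colour depth, then append them to a 1x1 image request so the
// stats server can log them.
var p = "r=" + escape(document.referrer) +
        "&w=" + screen.width + "&h=" + screen.height +
        "&c=" + screen.colorDepth +
        "&rand=" + Math.random();   // cache-buster so proxies don't swallow the hit
document.write('<img src="http://stats.example.com/track.gif?' + p +
               '" width="1" height="1" alt="">');
</script>
<noscript>
<!-- JavaScript off: the visitor is still counted, just without the extras -->
<img src="http://stats.example.com/track.gif?js=0" width="1" height="1" alt="">
</noscript>

The <noscript> image fires even with scripting disabled, which is why such a visitor is still counted but arrives without the referrer and screen data.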

jm_uk

12:08 pm on Aug 26, 2002 (gmt 0)

10+ Year Member



Good post c3oc3o!

Client-side code that is executed each time the page loads has the capability of providing you with more accurate stats. The reason for this is caching. The more static a site is, the more of its content can be cached by proxies, etc. With client-side code you can ensure that every page request reaches the tracking server.

This is also true when site users click the back and forward buttons. The pages are often retrieved from the local browser cache - but with client-side code, the code is executed each time the page is loaded, ensuring you can track the whole journey around the site accurately.

One major issue to consider is whether the tracking service sets a third-party cookie. This is very common with this type of ASP tracking service. The potential downside for your site's customers is that they may not want a cookie set for a site that they do not know or trust. They are probably more likely to accept a cookie from your (first-party) site rather than from a remote site.

One option is to do it yourself. You will need to create your own client-side code (not too hard). You can then collect the behavioural data yourself (perhaps in a separate directory on your web server to separate it from the regular traffic logs). All you then need is to modify the raw log to turn it into something a log-based analyser (like Analog/WebTrends etc.) will understand (use something like perl) and off you go.
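As a rough, hypothetical sketch of that do-it-yourself approach (the /t/hit.gif path and the parameter names are invented for the example), the client-side half might be no more than this, with the perl log-rewriting step left out:

<script type="text/javascript">
// DIY page tag: the 1x1 image lives on YOUR domain (here under /t/), so
// every request for it lands in your own access log - point that directory
// at a separate log file if you want the tracking hits kept apart.
var q = "p=" + escape(location.pathname) +   // page being viewed
        "&r=" + escape(document.referrer) +  // where the visitor came from
        "&n=" + Math.random();               // random value defeats browser/proxy caches
document.write('<img src="/t/hit.gif?' + q + '" width="1" height="1" alt="">');
</script>

The perl step would then just rewrite each /t/hit.gif log line, substituting the p= and r= values in for the requested URL and referrer, so that Analog or WebTrends reads the hits as ordinary page views.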

That being said, the benefits of someone else dealing with all of the data and just providing you with reports should not be underestimated. It can take a lot of time and trouble to process raw logs into meaningful business information, and the task becomes exponentially harder when you have very large data volumes.

In summary, whether you or a third party uses client side code you are likely to get a more accurate picture of site activity and you have the capability to capture additional information that a regular web log will not provide.

statomatic

3:33 am on Aug 27, 2002 (gmt 0)

10+ Year Member



Do you really want others to know about your web site activity? Who knows how 3rd parties use the information they collect about your website - just something to think about for the paranoid :) Also, do you really want to rely on a remote server being up and running to collect your stats and view reports? If you're running a log analysis tool, it's guaranteed to report traffic whenever your site is up and running. One more thing: some log-based programs keep a history of your web site's performance, which you can back up, restore, and look back on; I don't think that's true with services like HitBox.

bill

6:26 am on Aug 27, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



One of the big reasons I dropped 3rd-party tracking from a lot of sites was the third-party cookie issue you mentioned, jm_uk (...and welcome to WebmasterWorld). IE6's default settings, which we know few users ever change, were showing the little Privacy Report red icons in the status bar regardless of my fiddling with the P3P policies on the sites... the 3rd-party tracking had to go.

Do any of the commercial stats packages out there have tracking codes that you can put on your pages and run yourself? Writing a whole system like that could be a bit time consuming.

statomatic

7:07 am on Aug 27, 2002 (gmt 0)

10+ Year Member



Bill, how exactly do you want your pages tracked?
There's really no need for tracking codes if you have server log files. But if you don't, you can place image code, like HitBox uses, or SSI code to track the visitors. I wrote a program <snip>; there are 3 versions of it.

The most accurate is the version that works with log files, then the one that works through SSI, and the least accurate, for reasons described above, is the version that tracks visitors using an img tag.

[edited by: Marcia at 9:02 am (utc) on Aug. 27, 2002]
[edit reason] TOS provisions [/edit]

bill

7:27 am on Aug 27, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



jm_uk
Client-side code that is executed each time the page loads has the capability of providing you with more accurate stats. The reason for this is caching. The more static a site is, the more of its content can be cached by proxies, etc. With client-side code you can ensure that every page request reaches the tracking server.
Logically, this sounds like a good way to get more accurate reports than a log analyzer would. However, I really haven't seen any systems working this way, so I don't have any real world experience to judge from.

statomatic
I wrote a program ... there are 3 versions of it. The most accurate is the version that works with log files, then the one that works through SSI, and the least accurate ... is the version that tracks visitors using an img tag
I am curious why the SSI is considered less accurate than the logs. Could you clarify this a bit?

statomatic

7:44 am on Aug 27, 2002 (gmt 0)

10+ Year Member



If you have logs, they contain information about access to the whole site: images, HTML files, text files, etc. With SSI you can only track HTML pages, so if someone requests a non-HTML file from the server, the SSI version will not see that request. You can also extract more info from the logs, like data transfer, status codes, file-type requests, etc.

jm_uk

9:52 pm on Aug 27, 2002 (gmt 0)

10+ Year Member



I'm responding to Bill's question:

"Do any of the commercial stats packages out there have tracking codes that you can put on your pages and run yourself? Writing a whole system like that could be a bit time consuming."

Commercial stats software vendors are increasingly starting to understand the value of client-side data collection and are starting to provide this type of code to use with their analysis software. As far as the software vendors are concerned, this is just another way of capturing data to feed into their program.

There is no reason why a site cannot insert client-side code that makes requests to a server on the same domain as the web server, thus eliminating the little third-party cookie security alerts in IE6.
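To illustrate the point, here is a minimal, hypothetical sketch of how such same-domain code could also keep a visitor ID in a first-party cookie (the cookie name vid and the /t/hit.gif path are invented for the example); because the cookie comes from your own domain, IE6 treats it as first-party and raises no Privacy Report warning:

<script type="text/javascript">
// Return the visitor ID from the "vid" cookie, creating it first if needed.
function getVid() {
  var m = document.cookie.match(/(?:^|; )vid=([^;]+)/);
  if (m) return m[1];
  var vid = new Date().getTime() + "." + Math.floor(Math.random() * 100000);
  var exp = new Date();
  exp.setTime(exp.getTime() + 365 * 24 * 60 * 60 * 1000);   // expire in one year
  document.cookie = "vid=" + vid + "; expires=" + exp.toGMTString() + "; path=/";
  return vid;
}
// Report the hit to your own server, tagged with the visitor ID.
document.write('<img src="/t/hit.gif?vid=' + getVid() +
               '&n=' + Math.random() + '" width="1" height="1" alt="">');
</script>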

Stats vendors should give this code to customers along with either (a) a pre-processing script to turn the resultant log output into a standard web log, or (b) a new version of their software that supports the client-side data that has been captured.

I think it is only a matter of time before WebTrends ships its page dot code to all customers of the log analyser program so they can collect their data in this way.

Some of the higher end analysis packages already offer this (like NetGenesis).

bill

3:23 am on Aug 28, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thanks for your input jm_uk. It sounds like a new way to approach website statistics to me. I guess now I'm wondering why this sort of combination server log/client-side code solution hasn't been used more. Is it that difficult to implement even on a consumer level package? It seems that a system like this could reduce or eliminate a lot of proxy cache issues that exist with server logs. Wouldn't this result in more accurate statistics?

Birdman

11:43 am on Aug 28, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Just for the record, I was using one of those client-side, third-party scripts. I only had them on the index and subcat pages. I just removed them during a remodel and have to say that I miss the stats I got from them. My log stats are OK, but I can't get the full referer URL (e.g. everything after the ?), so now I can't track what keywords are being used at Google. I believe our host offers access to raw logs (upon request), so maybe that is what I need to do to get more accurate stats. As stated earlier, I just didn't like the code bloat. I also liked the idea of running your own script.