Mach 5 is cheaper but Summary has more bells and whistles.
I'm a fan of NetTracker Pro, although it's not cheap. It allows drilling down through the data from top-level stats to individual visits. When you are analyzing visitor behavior, this is a great thing to have. WebTrends is not bad for simple stats - it has nice graphs, etc., and answers the basic questions. If I need something quick to show a client, WebTrends works rather nicely. It doesn't let you drill down easily, though. If you see that you got 50 referrals from Lycos yesterday, you can't click to see what they are. You'll have to rerun the analysis with a filter, which can get rather tedious. WebTrends isn't all that cheap, though, and for the money I'd lean towards NetTracker.
Wusage is fairly low cost and lets you do quite a bit of customization. The user interface isn't the best, and even though there is a configuration panel it still seems necessary to edit the configuration file. One good thing about Wusage: it makes it easy to keep daily, weekly, and monthly stats, which are accessed by clicking on a calendar display.
It's not for everybody, and can be pretty labor-intensive at first, but, if you run a db-based site, you have the skills to build your own web tracking stats reports. Just get your log files into a database, either as a table, or by linking (since they're so huge), and it's pretty simple to start writing your own queries to find out exactly what you want to know.
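To give an idea of what "your own queries" means in practice, here's a minimal sketch. I'm assuming an IIS-style W3C log linked into Access as a table called WebLog, with field names like [cs-uri-stem] and [c-ip] assigned in the Link Text Wizard (all of those names are placeholders for whatever you set up). It's Access-flavored SQL, so * is the LIKE wildcard, and the -- lines are just annotation (Access's SQL view doesn't accept comments, so strip them before pasting):

-- Top 20 most-requested pages, ignoring image hits
-- (table and field names are whatever you assigned when linking the log)
SELECT TOP 20 [cs-uri-stem] AS Page, Count(*) AS Hits
FROM WebLog
WHERE [cs-uri-stem] NOT LIKE '*.gif'
  AND [cs-uri-stem] NOT LIKE '*.jpg'
GROUP BY [cs-uri-stem]
ORDER BY Count(*) DESC;

Once the log is sitting there as a table, every question you can phrase in SQL is a report you can run.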
The problem I had with the packages I tried out was that, while they gave me some interesting data and pretty graphs about overall usage of my site, the information was far less rich (or not available) if I drilled down to analyze specific pages, sections, or features of our site.
For example, I wanted to see all the referrers for a particular new PDF on our site, with a count of how many unique downloads we got from each referrer. We put up new publications a couple of times a month, so this is an important performance stat to track. But none of the packages I looked at had such a report.
As has been discussed elsewhere on this board, in most of these stats packages, a download is not always a download; a single user paging through a 50-page PDF in her web browser might be counted as 25 downloads instead of 1. If you're writing your own queries, you can "group by" the downloader's IP, the date, and the URL so those multiple 'downloads' only count as one.
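Roughly, that report comes out like this (sketch only; [LogDate] and [Referer] stand for whatever you called the log's date and referrer columns, and the PDF path is made up):

-- Unique downloads per referrer for one PDF: collapse each IP to one
-- row per day first, then count the collapsed rows per referrer
SELECT [Referer], Count(*) AS UniqueDownloads
FROM (
    SELECT DISTINCT [c-ip], [LogDate], [Referer]
    FROM WebLog
    WHERE [cs-uri-stem] = '/pubs/new_report.pdf'
) AS OnePerVisitor
GROUP BY [Referer]
ORDER BY Count(*) DESC;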
The analysis tools were also stymied by the query strings in many of our URLs, and for reporting purposes I wanted to be able to track and identify pages based on the names of our site's sections and subsections. Also, depending on where the user is coming from, there might be several different cs-uri-stems and query strings that represent the same page seen through different templates, different user settings, and so on. For example, the following URLs could all be the same page:
cs-uri-stem           cs-uri-query
/section5/s5_1.asp  ? sub_id=4&user_sys=1
/section5/s5_1.asp  ? &sub_id=4&user_sys=2
/section9/s9_1.asp  ? sub_id=4&user_sys=1

cs-uri-stem           cs-uri-query
/section4/s4_1.asp  ? sub_id=3
/section4/s4_1.asp  ? sub_id=9
Furthermore, I found the commercially available web stats packages could not really adapt to the unique architecture of our web site. As a result, I couldn't answer the specific questions the people in my organization were asking.
Finally, the big problem was that the stats packages I tested produced reports that identified pages only by their URL -- which doesn't tell you in plain English what the page is.
I ended up creating a database that linked to our log files as an external data source, interpreting them as space-delimited text files so it could read them in as tables. I also linked to the MS SQL database that runs our web site and content management system. This enabled me to generate web site stat reports that were adapted to the particular architecture of our site and written in plain English my colleagues could understand.
With SQL statements that parsed the cs-uri-stem and cs-uri-query, I could more accurately identify what content each hit represented. The SQL queries can pick out the variable values in the query string, then look up the section or subsection names of the content in our web site database.
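For what it's worth, here's roughly the shape of one of those queries (a sketch only: Subsections, SubID and SubName stand in for your own CMS tables, and the join condition sits in the WHERE clause because Access is fussy about expressions in ON clauses):

-- Pull the sub_id value out of the query string, then look up the
-- subsection's plain-English name in the CMS database
SELECT s.SubName, Count(*) AS Hits
FROM WebLog AS w, Subsections AS s
WHERE w.[cs-uri-query] LIKE '*sub_id=*'
  AND s.SubID = Val(Mid(w.[cs-uri-query], InStr(w.[cs-uri-query], 'sub_id=') + 7))
GROUP BY s.SubName
ORDER BY Count(*) DESC;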
This lookup makes our publication download report much more understandable (not to mention accurate) than a simple WebTrends download report; instead of a gobbledygook URL and PDF filename, they see each publication listed by title, author, and publication date (all looked up by filename in a publication table), ranked by unique downloads.
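The download report itself looks something like this (again a sketch; Publications, Filename, Title, Author and PubDate are whatever your own publication table calls them, and [LogDate] is the log's date column):

-- Publication download report: one row per visitor per day per PDF,
-- then look the filename up in the publications table for the title etc.
SELECT p.Title, p.Author, p.PubDate, Count(*) AS UniqueDownloads
FROM (
    SELECT DISTINCT [c-ip], [LogDate], [cs-uri-stem]
    FROM WebLog
    WHERE [cs-uri-stem] LIKE '*.pdf'
) AS d, Publications AS p
WHERE p.Filename = Mid(d.[cs-uri-stem], InStrRev(d.[cs-uri-stem], '/') + 1)
GROUP BY p.Title, p.Author, p.PubDate
ORDER BY Count(*) DESC;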
Also, by linking my reporting database to our content management system, I was able to create reports for various editors and contributors showing how their content is performing, and so on.
Basically, there's a world of cool stuff that you can do. I've probably spent about 80 hours creating queries and reports with this database over the last three months, but it is so much better than settling for what most commercial and shareware tracking software can do. And the people I work with have loved the data I've been able to show them.
I used the same method for quite a while but recently had to abandon it for one site because the log files are 95 MB each month and my import cannot crunch that.
The talk of spider databases earlier is amusing to me, because with your own database solution you can easily query for any visitor IP that asks for the robots.txt file, identify it as an SE spider, and then either analyze what it does separately or exclude it from your analysis of human traffic patterns.
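Something along these lines, run as two separate queries (table and field names are placeholders for your own setup):

-- Any IP that asked for robots.txt gets treated as a spider...
SELECT DISTINCT [c-ip]
FROM WebLog
WHERE [cs-uri-stem] = '/robots.txt';

-- ...and the human-traffic reports simply exclude those IPs
SELECT *
FROM WebLog
WHERE [c-ip] NOT IN
  (SELECT [c-ip] FROM WebLog WHERE [cs-uri-stem] = '/robots.txt');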
There is no doubt that you get more flexibility this way; I just have to make it work with big, big files :-) Any tips for me?
BTW, with PDF files, watch out that your multiple loads are true multiple loads and not failed loads.
Do you compare download kb with what it should be to verify that the PDF loaded correctly before you count it as a successful read?
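With the linked-log approach you could sketch that check roughly like this (PdfSizes, Path and FileSizeBytes are a made-up lookup table holding each PDF's real size; sc-bytes is the bytes-sent field in a W3C log):

-- Total the bytes each visitor actually received for each PDF in a day
-- (Acrobat's byte-range requests can spread one read over many log lines),
-- then only count it as a completed read if the total reaches the file size
SELECT t.[cs-uri-stem] AS PDF, Count(*) AS CompletedReads
FROM (
    SELECT [c-ip], [LogDate], [cs-uri-stem], Sum([sc-bytes]) AS BytesSent
    FROM WebLog
    WHERE [cs-uri-stem] LIKE '*.pdf'
    GROUP BY [c-ip], [LogDate], [cs-uri-stem]
) AS t, PdfSizes AS f
WHERE f.[Path] = t.[cs-uri-stem]
  AND t.BytesSent >= f.FileSizeBytes
GROUP BY t.[cs-uri-stem];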
I hate PDFs, sorry.
Only Google likes them; I don't feel users do!
Yeah, I didn't think to check desv3.0's IP, but when I did, it didn't really tell me much (see my response in that thread). It WASN'T checking my robots.txt file...
As for the big-file problem, my logs are about 100 MB a month too -- and growing. But instead of importing them, in Access you can select Get External Data > Link Tables, then select your log file, and it will bring up a "Link Text Wizard" that lets you control how the file is read. Then you can write a make-table query that filters for the kind of hits you want to analyze (humans, people outside your organization, no images, etc.) so you end up with a smaller, more workable 'log file'. In other words, you only import the hits (or rows) that you are really interested in. Once you do that first filter, the queries run much faster.
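The make-table filter is just a SELECT ... INTO query; roughly like this (placeholder names again, and adjust the filters to your own file types and network):

-- Copy only the hits worth analyzing into a local working table so later
-- queries don't grind through the whole linked text file every time
SELECT *
INTO WorkingLog
FROM WebLog
WHERE [cs-uri-stem] NOT LIKE '*.gif'
  AND [cs-uri-stem] NOT LIKE '*.jpg'
  AND [c-ip] NOT LIKE '192.168.*';
-- the last condition drops our own internal addresses (example range only)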
Good point about checking for failed downloads. I hadn't thought of it, and it's yet another reason I can't stand PDFs either. But it's what 'the people' want.
Amazing how they zip up innit :-)
> instead of importing them, in Access you can select Get External Data > Link Tables, then select your log file,
Indeed, I use this for other things but had been importing logs. Thanks for that idea; I have always got and loaded, never linked.
I think I know what my problem is now. I tried linking a 95 MB log and failed, but that was at the time of the Windows virus attacks, which produce way, way long URL strings in an attempt to overrun the buffer (I am not techie enough to understand it :-) but it sure affects the log files). Anyhow, I think the first few lines of the log are corrupted by these attacks, and the Access link wizard cannot sort them out and get to the rest of the data, because those first lines are simply too long.
(PS: It's not serious, because we have Urchin stats for this period.)
Plus, I have evidence in support of the theory: smaller logs from other times load or link with no problem, but they contain these lines of Windows attacks only within the file, not at the start.
(BTW Unix server so no harm done)
> Good point about checking for failed downloads. I hadn't thought of it, and it's yet another reason I can't stand PDFs either. But it's what 'the people' want.
I checked a log file for a client who was raving about their PDFs and found lots of examples of the same user trying nine or more times to download one datasheet, yet when I looked at the KB delivered they never got the full document.
It's a particular problem on that site because they link to PDF files as if they were HTML pages. People almost certainly expect to see the download bar and to get at least something quickly. With a PDF they see a big chunk of time when nothing appears to happen at all while the reader is being spawned and the file loaded, with no download indication in the browser at all.
Mark A wrote:
> I tried linking with a 95mb log and failed but this
> was the time of windows virus attack which caused way way
> long url strings in an attempt to overcome the buffer
Yeah, that damn admin virus created some headaches. I cleaned my logs in TextPad by doing a search/replace on the long URL string in those hits, substituting shorter 'slug' text. Then I had no problem linking to the log in Access. I only had to do this for the REALLY long strings.