Mach 5 is cheaper but Summary has more bells and whistles.
I'm a fan of NetTracker Pro, although it's not cheap. It allows drilling down through the data from top-level stats to individual visits. When you are analyzing visitor behavior, this is a great thing to have. WebTrends is not bad for simple stats - it has nice graphs, etc., and answers the basic questions. If I need something quick to show a client, WebTrends works rather nicely. It doesn't let you drill down easily, though. If you see that you got 50 referrals from Lycos yesterday, you can't click to see what they are. You'll have to rerun the analysis with a filter, which can get rather tedious. WebTrends isn't all that cheap, though, and for the money I'd lean towards NetTracker.
Wusage is fairly low cost and lets you do quite a bit of customization. The user interface isn't the best, and even though there is a configuration panel it still seems necessary to edit the configuration file. One good thing about Wusage: it makes it easy to keep daily, weekly, and monthly stats, which are accessed by clicking on a calendar display.
It's not for everybody, and can be pretty labor-intensive at first, but, if you run a db-based site, you have the skills to build your own web tracking stats reports. Just get your log files into a database, either as a table, or by linking (since they're so huge), and it's pretty simple to start writing your own queries to find out exactly what you want to know.
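To give an idea of what "your own queries" means in practice, here's a minimal sketch. I'm assuming an IIS-style W3C log linked into Access as a table called WebLog, with field names like [cs-uri-stem] and [c-ip] assigned in the Link Text Wizard (all of those names are placeholders for whatever you set up). It's Access-flavored SQL, so * is the LIKE wildcard, and the -- lines are just annotation (Access's SQL view doesn't accept comments, so strip them before pasting):

-- Top 20 most-requested pages, ignoring image hits
-- (table and field names are whatever you assigned when linking the log)
SELECT TOP 20 [cs-uri-stem] AS Page, Count(*) AS Hits
FROM WebLog
WHERE [cs-uri-stem] NOT LIKE '*.gif'
  AND [cs-uri-stem] NOT LIKE '*.jpg'
GROUP BY [cs-uri-stem]
ORDER BY Count(*) DESC;

Once the log is sitting there as a table, every question you can phrase in SQL is a report you can run.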
The problem I had with the packages I tried out was that, while they gave me some interesting data and pretty graphs about overall usage of my site, the information was far less rich (or not available) if I drilled down to analyze specific pages, sections, or features of our site.
For example, I wanted to see all the referrers for a particular new PDF on our site, with a count of how many unique downloads we got from each referrer. We put up new publications a couple of times a month, so this is an important performance stat to track. But none of the packages I looked at had such a report.
As has been discussed elsewhere on this board, in most of these stats packages, a download is not always a download; a single user paging through a 50-page PDF in her web browser might be counted as 25 downloads instead of 1. If you're writing your own queries, you can "group by" the downloader's IP, the date, and the URL so those multiple 'downloads' only count as one.
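Roughly, that report comes out like this (sketch only; [LogDate] and [Referer] stand for whatever you called the log's date and referrer columns, and the PDF path is made up):

-- Unique downloads per referrer for one PDF: collapse each IP to one
-- row per day first, then count the collapsed rows per referrer
SELECT [Referer], Count(*) AS UniqueDownloads
FROM (
    SELECT DISTINCT [c-ip], [LogDate], [Referer]
    FROM WebLog
    WHERE [cs-uri-stem] = '/pubs/new_report.pdf'
) AS OnePerVisitor
GROUP BY [Referer]
ORDER BY Count(*) DESC;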
The analysis tools were also stymied by the query strings in many of our URLs, and for reporting purposes I wanted to be able to track and identify pages based on the names of our site's sections and subsections. Also, depending on where the user is coming from, there might be several different cs-uri-stems and query strings that represent the same page seen through different templates, different user settings, and so on. For example, the following URLs could all be the same page:
cs-uri-stem           cs-uri-query
/section5/s5_1.asp  ? sub_id=4&user_sys=1
/section5/s5_1.asp  ? &sub_id=4&user_sys=2
/section9/s9_1.asp  ? sub_id=4&user_sys=1

cs-uri-stem           cs-uri-query
/section4/s4_1.asp  ? sub_id=3
/section4/s4_1.asp  ? sub_id=9
Furthermore, I found the commercially available web stats packages could not really adapt to the unique architecture of our web site. As a result, I couldn't answer the specific questions the people in my organization were asking.
Finally, the big problem was that the stats packages I tested produced reports that identified pages only by their URL -- which doesn't tell you in plain English what the page is.
I ended up creating a database that linked to our log files as an external data source, interpreting them as space-delimited text files so it could read them in as tables. I also linked to the MS SQL database that runs our web site and content management system. This enabled me to generate web site stat reports that were adapted to the particular architecture of our site and written in plain English my colleagues could understand.
With SQL statements that parsed the cs-uri-stem and cs-uri-query, I could more accurately identify what content each hit represented. The SQL queries can pick out the variable values in the query string, then look up the section or subsection names of the content in our web site database.
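For what it's worth, here's roughly the shape of one of those queries (a sketch only: Subsections, SubID and SubName stand in for your own CMS tables, and the join condition sits in the WHERE clause because Access is fussy about expressions in ON clauses):

-- Pull the sub_id value out of the query string, then look up the
-- subsection's plain-English name in the CMS database
SELECT s.SubName, Count(*) AS Hits
FROM WebLog AS w, Subsections AS s
WHERE w.[cs-uri-query] LIKE '*sub_id=*'
  AND s.SubID = Val(Mid(w.[cs-uri-query], InStr(w.[cs-uri-query], 'sub_id=') + 7))
GROUP BY s.SubName
ORDER BY Count(*) DESC;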
This lookup makes our publication download report much more understandable (not to mention accurate) than a simple WebTrends download report; instead of a gobbledygook URL and PDF filename, they see each publication listed by title, author, and publication date (all looked up by filename in a publication table), ranked by unique downloads.
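The download report itself looks something like this (again a sketch; Publications, Filename, Title, Author and PubDate are whatever your own publication table calls them, and [LogDate] is the log's date column):

-- Publication download report: one row per visitor per day per PDF,
-- then look the filename up in the publications table for the title etc.
SELECT p.Title, p.Author, p.PubDate, Count(*) AS UniqueDownloads
FROM (
    SELECT DISTINCT [c-ip], [LogDate], [cs-uri-stem]
    FROM WebLog
    WHERE [cs-uri-stem] LIKE '*.pdf'
) AS d, Publications AS p
WHERE p.Filename = Mid(d.[cs-uri-stem], InStrRev(d.[cs-uri-stem], '/') + 1)
GROUP BY p.Title, p.Author, p.PubDate
ORDER BY Count(*) DESC;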
Also, by linking my reporting database to our content management system, I was able to create reports for various editors and contributors showing how their content is performing, and so on.
Basically, there's a world of cool stuff that you can do. I've probably spent about 80 hours creating queries and reports with this database over the last three months, but it is so much better than settling for what most commercial and shareware tracking software can do. And the people I work with have loved the data I've been able to show them.
I used the same method for quite a while but recently had to abandon it for one site because the log files are 95 MB each month and my import cannot crunch that.
The talk of spider databases earlier is amusing to me, because with your own database solution you can easily query for any visitor IP that asks for the robots.txt file, identify it as an SE spider, and then either analyze what it does separately or exclude it from your analysis of human traffic patterns.
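Something along these lines, run as two separate queries (table and field names are placeholders for your own setup):

-- Any IP that asked for robots.txt gets treated as a spider...
SELECT DISTINCT [c-ip]
FROM WebLog
WHERE [cs-uri-stem] = '/robots.txt';

-- ...and the human-traffic reports simply exclude those IPs
SELECT *
FROM WebLog
WHERE [c-ip] NOT IN
  (SELECT [c-ip] FROM WebLog WHERE [cs-uri-stem] = '/robots.txt');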
There is no doubt that you get more flexibility this way; I just have to make it work with big, big files :-) Any tips for me?
BTW, with PDF files, watch out that your multiple loads are true multiple loads and not failed loads.
Do you compare download kb with what it should be to verify that the PDF loaded correctly before you count it as a successful read?
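With the linked-log approach you could sketch that check roughly like this (PdfSizes, Path and FileSizeBytes are a made-up lookup table holding each PDF's real size; sc-bytes is the bytes-sent field in a W3C log):

-- Total the bytes each visitor actually received for each PDF in a day
-- (Acrobat's byte-range requests can spread one read over many log lines),
-- then only count it as a completed read if the total reaches the file size
SELECT t.[cs-uri-stem] AS PDF, Count(*) AS CompletedReads
FROM (
    SELECT [c-ip], [LogDate], [cs-uri-stem], Sum([sc-bytes]) AS BytesSent
    FROM WebLog
    WHERE [cs-uri-stem] LIKE '*.pdf'
    GROUP BY [c-ip], [LogDate], [cs-uri-stem]
) AS t, PdfSizes AS f
WHERE f.[Path] = t.[cs-uri-stem]
  AND t.BytesSent >= f.FileSizeBytes
GROUP BY t.[cs-uri-stem];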
I hate PDFs, sorry.
Only Google likes them; I don't feel users do!
Yeah, I didn't think to check desv3.0's IP, but when I did, it didn't really tell me much (see my response in that thread). It WASN'T checking my robots.txt file...
As for the big-file problem, my logs are about 100 MB a month too -- and growing. But instead of importing them, in Access you can select Get External Data > Link Tables, then select your log file, and it will bring up a "Link Text Wizard" that lets you control how the file is read. Then you can write a make-table query that filters for the kind of hits you want to analyze (humans, people outside your organization, no images, etc.) so you end up with a smaller, more workable 'log file'. In other words, you only import the hits (or rows) that you are really interested in. Once you do that first filter, the queries run much faster.
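The make-table filter is just a SELECT ... INTO query; roughly like this (placeholder names again, and adjust the filters to your own file types and network):

-- Copy only the hits worth analyzing into a local working table so later
-- queries don't grind through the whole linked text file every time
SELECT *
INTO WorkingLog
FROM WebLog
WHERE [cs-uri-stem] NOT LIKE '*.gif'
  AND [cs-uri-stem] NOT LIKE '*.jpg'
  AND [c-ip] NOT LIKE '192.168.*';
-- the last condition drops our own internal addresses (example range only)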
Good point about checking for failed downloads. I hadn't thought of it, and it's yet another reason I can't stand PDFs either. But it's what 'the people' want.
Amazing how they zip up innit :-)
> instead of importing them, in Access you can select Get External Data > Link Tables, then select your log file,
Indeed, I use this for other things but had been importing logs. Thanks for that idea; I have always got and loaded, never linked.
I think I know what my problem is now. I tried linking a 95 MB log and failed, but that was at the time of the Windows virus attacks, which produce way, way long URL strings in an attempt to overrun the buffer (I am not techie enough to understand it :-) but it sure affects the log files). Anyhow, I think the first few lines of the log are corrupted by these attacks, and the Access link wizard cannot sort them out and get to the rest of the data, because those first lines are simply too long.
(PS: It's not serious, because we have Urchin stats for this period.)
Plus, I have evidence in support of the theory: smaller logs from other times load or link with no problem, but they contain these lines of Windows attacks only within the file, not at the start.
(BTW Unix server so no harm done)
> Good point about checking for failed downloads. I hadn't thought of it, and it's yet another reason I can't stand PDFs either. But it's what 'the people' want.
I checked a log file for a client who was raving about their PDFs and found lots of examples of the same user trying nine or more times to download one datasheet, yet when I looked at the KB delivered they never got the full document.
It's a particular problem on that site because they link to PDF files as if they were HTML pages. People almost certainly expect to see the download bar and to get at least something quickly. With a PDF they see a big chunk of time when nothing appears to happen at all while the reader is being spawned and the file loaded, with no download indication in the browser at all.
Mark A wrote:
> I tried linking with a 95mb log and failed but this
> was the time of windows virus attack which caused way way
> long url strings in an attempt to overcome the buffer
Yeah, that damn admin virus created some headaches. I cleaned my logs in TextPad by doing a search/replace on the long URL string in those hits, substituting shorter 'slug' text. Then I had no problem linking to the log in Access. I only had to do this for the REALLY long strings.