Forum Moderators: DixonJones


Combining Log Files

Are your log files combined, or one per day?


Fence

9:54 pm on Apr 18, 2003 (gmt 0)

10+ Year Member



Hello all,

I have a rather newbie question, but it definitely needs answering before I go crazy. My site host is WebSite Source, and their control panel has a site statistics package called "http-analyze 2.4".

My problem is I want to download the raw log files to analyze them locally with "WebLog Expert", and no one at their support desk knows if they offer this. Now I have managed to find a folder that has files ending in .gz, but there are separate ones for each day. Do I have to combine these somehow to get a statistical look at the whole month? Furthermore, they only go back to the beginning of March. Is this normal?

Sorry for all the newbie questions on log files, but I don't know what a "raw log file" entails, and my host is clueless.

Thanks,
Josh

[edited by: Fence at 9:56 pm (utc) on April 18, 2003]

Fence

9:55 pm on Apr 18, 2003 (gmt 0)

10+ Year Member



oops sorry, double post

tedster

12:45 am on Apr 19, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Now I have managed to find a folder that has files ending in .gz, but there are separate ones for each day. Do I have to combine these somehow to get a statistical look at the whole month?

Those sound like the files you need. The .gz extension indicates gzip compression. Servers can be set to create a single log file that just grows and grows until you interrupt the process -- or they can output one file per day, which I find to be much more manageable.

Your analysis needs to include the individual file for every day you want in your overall "look", but you don't need to append all the files into one monster. That would be unwieldy anyway. Analysis software will usually aggregate stats from all the files you point it at.

And if you really want to keep it simple: because log files are plain text, you can use a free tool like grep to examine piles of logs in one pass.
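A minimal sketch of that idea -- the filenames, dates, and log lines below are made up, and this assumes Apache-style combined-format logs with one file per day:

```shell
# Create two tiny sample day-logs in Apache combined format (made-up entries)
printf '%s\n' '1.2.3.4 - - [18/Apr/2003:10:00:00 +0000] "GET /index.html HTTP/1.0" 200 512 "http://www.google.com/search?q=widgets" "Mozilla/4.0"' > access-0418.log
printf '%s\n' '5.6.7.8 - - [19/Apr/2003:11:00:00 +0000] "GET /robots.txt HTTP/1.0" 200 68 "-" "Googlebot/2.1"' > access-0419.log

# One grep call scans every day's file at once -- no need to merge them first
grep -l 'Googlebot' access-*.log           # which days had bot visits
grep -h 'google.com/search' access-*.log   # search-referred hits across all days
```

The wildcard does the "combining" for you at read time, which is usually all you need for a quick question.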

If you've only got the native Notepad for working with text files, you'll probably want something a bit more heavy duty, because log files get BIG. I use EditPad and find it excellent - and there is a free "Lite" version. A Google search for 'text editor' will turn up others, each of which has its fans.

Furthermore they only have them going back to the beginning of March. Is this normal?

Given the disk space that log files take up, most hosts will not keep them around for long -- maybe one month, maybe three. If you want a historical record, you need to archive them locally. But to answer your question directly: yes, this is normal.

Fence

3:27 am on Apr 19, 2003 (gmt 0)

10+ Year Member



Well, the free trial of WebLog Expert, which I was going to buy for $75, only has a place for one file, I believe. Is there another program that is as cost-effective but would allow me to reference, say, 30 days' (files') worth of stats? Also, what about javascript trackers such as IndexTools, which I have seen mentioned here? I suppose you would have to place the javascript on every page, correct?

The main features I need are:

1) What keywords used and from what search engine they came.

2) When the bots visit me.

3) Exit Pages

4) Tracking my CPC programs such as AdWords.

I suppose these are all pretty standard features of many programs, correct?

Thanks,
Josh

ggrot

5:53 am on Apr 19, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If these files aren't very big, just decompress them:
gzip -d filename.gz

then concatenate them
cat day1.txt day2.txt day3.txt > month.txt
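If the month adds up to something big, the intermediate decompress step can be skipped: gzip -dc (a.k.a. zcat) streams each archive straight to stdout, so the days concatenate in one pass. A quick sketch with made-up filenames:

```shell
# Make two small gzipped day-logs to stand in for the host's daily archives
printf 'day1 hit\n' | gzip > day1.txt.gz
printf 'day2 hit\n' | gzip > day2.txt.gz

# Decompress-and-concatenate in one step, oldest day first
gzip -dc day1.txt.gz day2.txt.gz > month.txt
```

The originals stay compressed on disk; only the merged month.txt takes up full space.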

tedster

5:54 am on Apr 19, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I've been happily using FastStats for many clients. Some have more elaborate needs and they usually purchase a more enterprise solution for the big bucks.

Yes, javascript on every page is a common approach (and a serious limitation, IMO).

Certainly your first two requirements are as basic as you can get. Pulling keywords out of the search engine referrers is essential, as is identifying which engine sent them.
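As a rough illustration of what the software does under the hood -- pulling the q= parameter out of a Google referrer. The log line is invented, and other engines use different parameter names, so a real package maintains a table of them:

```shell
# One made-up log line with a Google search referrer
printf '%s\n' '1.2.3.4 - - [19/Apr/2003:10:00:00 +0000] "GET /page.html HTTP/1.0" 200 512 "http://www.google.com/search?q=blue+widgets" "Mozilla/4.0"' > access.log

# Extract just the search term from each Google referrer
grep -o 'q=[^&" ]*' access.log
```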

And tracking PPC is usually as easy as putting a query string on the end of your URL when you place the ad (example.com/product.html?src=ov, for instance).
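Counting those tagged hits in the raw log is then a one-liner. The ?src=ov tag here just follows the example above and is purely hypothetical, as are the sample entries:

```shell
# Two sample requests: one arrived via the tagged ad URL, one didn't
printf '%s\n' \
  '1.2.3.4 - - [19/Apr/2003:09:00:00 +0000] "GET /product.html?src=ov HTTP/1.0" 200 512 "-" "Mozilla/4.0"' \
  '5.6.7.8 - - [19/Apr/2003:09:05:00 +0000] "GET /product.html HTTP/1.0" 200 512 "-" "Mozilla/4.0"' > access.log

# Paid clicks = lines whose requested URL carries the tag
grep -c 'src=ov' access.log
```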

Exit pages are also a common feature - but you do need to pay attention to how an exit page is defined. No further click after how long?

Of course almost all the stats require attention to definitions, and it's rare that any two packages would come up with the exact same numbers because of this. So the greatest value comes from regular analysis and the comparison of units of time to spot trends.

It's amazing how challenging it is to create a technical definition for a stat that lines up with our common-sense ideas of what we want to measure. Server logs were not created with merchants' needs in mind; they were created with technicians' needs in mind.

aspdaddy

11:26 am on Apr 19, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



All of what you require can be done without commercial analytics software, just using the approaches tedster suggested (grep) and/or other tools such as MS Log Parser, which lets you use SQL queries and functions to replace and manipulate query strings to help with tracking.

The downside is you need to be able to define and write your queries in some form. You need to be pretty technically minded to do this for many e-metrics, as opposed to just hit or page counting.

The advantage is you can be sure of what you are measuring, and do a lot of ad-hoc querying, without relying on a vendor's definition of a unique user, session, or repeat visit, which are sometimes very questionable.

Fence

1:41 pm on Apr 19, 2003 (gmt 0)

10+ Year Member



tedster said:

"Yes, javascript on every page is a common approach (and a serious limitation, IMO)."

Could you elaborate on the "serious limitation" part?

Thanks

tedster

4:47 pm on Apr 19, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



First, the user needs to have javascript turned on. Depending on your audience, you may miss a hefty percentage of your traffic in the statistics. I've heard of some sites where 10% of visitors or more have js turned off.

Second, if the script calls out to a third party server, then your page load times are dependent on the responses of two servers. Third party servers are one of the common causes of page loads hanging, from what I've seen.

Finally, I don't see how a javascript-based tracking system can show you stats about spider visits - they definitely don't parse javascript.

Fence

2:21 am on Apr 21, 2003 (gmt 0)

10+ Year Member



Thanks tedster,

I actually downloaded all 18 zipped files for March and combined them into a 30 MB text file. WebLog Expert was then able to read this and give me my stats for the month. If I stay on top of it and add each day's stats, it should work fine.

Now, is there a better program than "WebLog Expert" for $75?

Thanks all.

markusf

7:51 pm on Apr 21, 2003 (gmt 0)

10+ Year Member



If you're using IIS, just enable logging to file and then look in those files...?