Forum Moderators: DixonJones

Multiple server log file analysis

Can logs be rewritten to appear as if from one server?

         

fom2001uk

8:43 am on Jul 20, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I have what is probably quite a common problem. I'm trying to run a stats report covering a period of several months with a popular log analysis package.

Unfortunately the version we have won't run reports across logs from different servers (we recently moved a whole bunch of websites onto a new server during the period being studied).

I thought we could get around this by just doing a search and replace, swapping one server IP field for another, so it looks like they're all from the same server.

But it's not that simple - some of the fields in the log are different between the servers, and the order of the fields is completely different.

Has anyone encountered a similar problem and been able to get around it in this way? If so, how? Did you develop your own script or use a particular tool (off-the-shelf)?

scintex

10:17 am on Jul 20, 2005 (gmt 0)

10+ Year Member



Hi,

On one of my smaller sites I use Mergelog:
[sourceforge.net...]

To be honest I don't fully understand it, as it is written in C. However, one of our programmers modified it (to quote, "a bit"), and it merrily merges GBs of log files every day for us.

However you may also fancy approaching it this way:

Although we have many different servers, we make use of Apache's (it's actually IBM HTTP Server... anyhow...) comonvhost log format, which records the virtual host name on each log line. There is an explanation of virtual hosts and how to go about logging here:
[httpd.apache.org...]

Is this the kind of thing you are after?

S

fom2001uk

11:32 am on Jul 20, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Our servers are IIS, but I'll have a look at Mergelog, thanks.

scintex

7:56 pm on Jul 20, 2005 (gmt 0)

10+ Year Member



Cool, I hope it is helpful.
It might be worth looking at virtual hosts (or whatever the Microsoft-ism is for them) and addressing all of your logging in one go.
Putting my Apache hat on (I assume there are similar things in IIS), it's great to have one access log for the sites that make up our 'application', i.e. 3 sites on the same architecture that share a web server (or servers).
However, we still have to split the log file up into three chunks before our stats program will read it all properly, and for that I use split-logfile:
[httpd.apache.org...]

It's part of Apache, but it might work with logs from IIS if the logs are in a similar format.

fom2001uk

1:54 pm on Jul 21, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Mergelog is for Apache only, sadly.

This is proving to be a real nightmare. I've now got a programmer working on a search & replace tool that will do this.

The first 2 attempts have failed - the log file is being corrupted during the S & R process, and our stats package is just ignoring huge chunks of data.

You'd think there was an off-the-shelf application to handle this, wouldn't you?

cgrantski

2:57 pm on Jul 21, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



How big are your logs, and do they currently include image and other non-analyzed hits? (Logging images etc. usually makes the logs about 10 times the size that they really need to be for analysis.)

larryn

10:04 pm on Jul 21, 2005 (gmt 0)

10+ Year Member



fom,

What tool are you using for the analysis? Even if the IP changed, did the name of the server also change? Maybe that will help with a solution...

Larry

gregbo

4:39 am on Jul 22, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I thought we could get around this by just doing a search and replace, swapping one server IP field for another, so it looks like they're all from the same server.

But it's not that simple - some of the fields in the log are different between the servers, and the order of the fields is completely different.

Hmmm ... you say these are IIS logs ... does each log from each server have the header line (#Fields:) indicating the type of data that's in a given field? If so, you should be able to rewrite each log to have a common field ordering. OTOH, if things are really jumbled up, you could try statistically determining which of the fields is most likely to occur in a given place by doing a cut on each field and counting the number of times a particular type of data occurs in that field. (This may not be completely automatable, but you can probably pick out some typical data types such as the date and IP address.) Once you've done that, you can do the reordering, and drop anything that just won't reorder as corrupted entries.
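
For what it's worth, the "count the data types per column" idea could be sketched roughly like this in Python. This is only an illustration under assumptions: the function names (classify, guess_columns) and the handful of patterns are made up for the example, and it assumes space-separated fields as in IIS W3C logs. It will not tell a server IP column from a client IP column, and a three-digit byte count can pass for a status code, so the result still needs checking by eye.

```python
import re
from collections import Counter

# Hypothetical patterns for a few easily recognised field types.
# Order matters: the first pattern that matches wins.
PATTERNS = [
    ("date", re.compile(r"^\d{4}-\d{2}-\d{2}$")),
    ("time", re.compile(r"^\d{2}:\d{2}:\d{2}$")),
    ("ip", re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")),
    ("status", re.compile(r"^[1-5]\d{2}$")),
    ("method", re.compile(r"^(GET|POST|HEAD|PUT|DELETE|OPTIONS)$")),
    ("uri", re.compile(r"^/")),
]


def classify(value):
    """Return a rough type label for a single field value."""
    for name, pattern in PATTERNS:
        if pattern.match(value):
            return name
    return "other"


def guess_columns(lines):
    """Return, for each column position, the type seen most often there."""
    tallies = []  # one Counter per column position
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header directives and blank lines
        for i, value in enumerate(line.split()):
            if i == len(tallies):
                tallies.append(Counter())
            tallies[i][classify(value)] += 1
    return [tally.most_common(1)[0][0] for tally in tallies]
```

Running guess_columns over a sample of lines from each server gives a list of labels per column position; columns that come out as "other", or two columns with the same label, are the ones to inspect manually.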

Unfortunately, I don't have a program that does this. In general, I have advised that people who generate logs provide some way to distinguish individual log entries, at the very least, and possibly the origin server of each log, by including this information in header files. This makes it easier to reprocess old logs and to identify corruption.

Whatever you do, make sure you have a reliable backup of those logs!

fom2001uk

8:54 am on Jul 22, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks for the replies, guys. Just to give you some more detail: yes, our logs do include image hits, but I can't change that (we've done it that way for 4 years, and we need to keep doing it the same way for consistency, otherwise clients will get upset).

The package we're running is good old WebTrends (yes, I know they have a version which can handle multiple server data, but it's waaaay out of our budget).

Our programmer is still trying to hack the log files, but so far is coming up empty. Turns out the fields are exactly the same in both server logs, it's just they're in a different order. Other than that the only difference is the server IP and the IIS version number in the header.

gregbo

8:42 pm on Jul 22, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Our programmer is still trying to hack the log files, but so far is coming up empty. Turns out the fields are exactly the same in both server logs, it's just they're in a different order. Other than that the only difference is the server IP and the IIS version number in the header.

This doesn't seem too difficult. As long as you know what the order is in each log, your programmer should be able to split each log record into individual fields and reorder them in whichever way you want.
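
As a rough illustration of the split-and-reorder step, here is a minimal Python sketch. It assumes the logs are in the IIS W3C extended format, with a "#Fields:" header line and single-space-separated values; the function name reorder_w3c_log and the target_fields parameter are invented for the example. Lines whose field count doesn't match the header are dropped as corrupted, so run it on a copy of the logs and compare line counts afterwards.

```python
def reorder_w3c_log(lines, target_fields):
    """Yield log lines with their fields rearranged into target_fields order.

    The current column order is read from the most recent "#Fields:" header,
    so a file whose header changes part-way through is still handled.
    """
    current = None  # field names from the latest #Fields: header
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        if line.startswith("#Fields:"):
            current = line[len("#Fields:"):].split()
            # Rewrite the header so it describes the new column order.
            yield "#Fields: " + " ".join(target_fields)
            continue
        if line.startswith("#"):
            yield line  # other header directives pass through unchanged
            continue
        values = line.split(" ")
        if current is None or len(values) != len(current):
            continue  # cannot be mapped reliably: drop as a corrupted entry
        row = dict(zip(current, values))
        # Fields missing from this log are written as "-".
        yield " ".join(row.get(name, "-") for name in target_fields)
```

To use it, pick one field order (for example the order used by the new server), pass each old log through the function with that order as target_fields, and write the yielded lines to a new file. Swapping the server IP value could be done in the same pass by overwriting the relevant entry in row before joining.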

cgrantski

8:58 pm on Jul 24, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you're running Windows you can create a copy of the log with re-ordered fields with simple one-line DOS commands, by tokenizing the fields. It's even easier in unix. Your programmer is probably making it too difficult for him/herself.

Adam_T

2:30 pm on Aug 9, 2005 (gmt 0)

10+ Year Member



Did the 'mergelog' program help? We may have to look at something like this in the future, because our shopping and purchase systems are located on different servers. I really hope it is quite easy to do, as I have found that tracking customers from entrance to purchase can be difficult.