Log analysis/summary

         

billuk

9:51 pm on Sep 30, 2010 (gmt 0)

10+ Year Member



Hello everyone,

I'm seeing strange traffic spikes where downloads increase dramatically over a period of minutes or even an hour, and I'm trying to track down what is causing them. Can someone recommend a command-line script that will summarise an Apache log file? Not a normal graphical log analysis tool, but one that might provide, e.g., the top 50 IP addresses and the amount of data each has downloaded. It would be good if it could update in real time, e.g. by reading from tail -f log_file.log.

Does such a thing exist? Or similar?

Thanks in advance.

sublime1

12:02 am on Oct 1, 2010 (gmt 0)

10+ Year Member



Hi Bill -

No doubt your traffic spikes are from bots or other automated sources. I'll bet for every actual HTML page we serve to a human, we serve 10 to a bot. Maybe 50. Of them, I care about perhaps 5 :-)

I have worked with a lot of log analyzers, from the simple (Webalizer) to fancy (NetInsight, and that other one whose name I forget), and then of course Google Analytics.

But when looking for specific immediate patterns, I tend to fall back on grep, or my personal favorite, awk. Awk is a simple programming language for processing line-oriented output ... in other words things like log files.

This would also fit your goal of having real-time updates, since you can pipe the output of tail -f on the log file to grep or awk.

Getting the top 50 IPs from a running log file is a little tricky. But to process the last 2000 lines, you could grab the IP (the first field of each line in Apache's standard log formats) and just print it out, then sort, like this:

tail -n 2000 log_file.log | awk '{print $1}' | sort


This would group all the same IPs together, and just looking at the output, you could see easily if there were one that was more prevalent.
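If you want counts and bytes rather than a sorted list to eyeball, here is a rough sketch along the same lines. It assumes Apache's common or combined log format, where the first field is the client IP and the tenth is the response size in bytes (or "-" when no body was sent); if your LogFormat differs, the field numbers will need adjusting. The function name top_ips is made up for illustration.

```shell
# top_ips N [file...]
# Print the N clients that downloaded the most, as "bytes hits ip".
# Assumption: common/combined log format ($1 = client IP, $10 = bytes sent).
# Reads the named files, or standard input when none are given.
top_ips() {
    count=$1
    shift
    awk '
        {
            hits[$1]++
            # $10 is "-" for responses with no body, so only add real numbers
            if ($10 ~ /^[0-9]+$/) bytes[$1] += $10
        }
        END {
            for (ip in hits)
                printf "%.0f %d %s\n", bytes[ip], hits[ip], ip
        }
    ' "$@" | sort -rn | head -n "$count"
}
```

Something like tail -n 2000 log_file.log | top_ips 50 would then list the 50 heaviest downloaders in the last 2000 requests, biggest first.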

Awk has arrays, conditional operators, regular expressions and everything in a pretty straightforward language. Once you get reasonably facile with it, you can whip up something simple in a few minutes. Google "awk manual" to get started if that seems like the right approach.
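For the running-log case, awk's arrays make a periodic summary possible. The following is only a sketch under the same assumption about the log format (client IP in field 1, bytes in field 10); live_summary and its two parameters are names invented here. It keeps running totals and reprints the current leaders every so many requests.

```shell
# live_summary EVERY TOP
# Read access-log lines on standard input and, after every EVERY lines,
# print the TOP clients so far as "bytes hits ip", largest first.
# Assumption: common/combined log format ($1 = client IP, $10 = bytes sent).
live_summary() {
    awk -v every="${1:-500}" -v top="${2:-50}" '
        BEGIN { cmd = "sort -rn | head -n " top }
        {
            hits[$1]++
            if ($10 ~ /^[0-9]+$/) bytes[$1] += $10
        }
        NR % every == 0 {
            print "--- after " NR " requests ---"
            fflush()            # emit the header before sort writes its output
            for (ip in hits)
                printf "%.0f %d %s\n", bytes[ip], hits[ip], ip | cmd
            close(cmd)          # run the sort now and wait for it to finish
        }
    '
}
```

You would run it as tail -f log_file.log | live_summary 500 50. The totals are cumulative from the moment it starts, and some awk implementations buffer piped input, so the summaries may lag a little behind the log.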

(And of course if you're a Perl, Python, Ruby, PHP or even bash wizard, those can all do the same kind of thing.)

Probably not the answer you were looking for...

Tom

sublime1

12:37 am on Oct 1, 2010 (gmt 0)

10+ Year Member



Billuk --

You might also want to check out GetClicky. It's similar to Google Analytics in that it has a bit of JavaScript that you add to your page, but it's cool because it updates in (near) real-time.

Tom