Welcome to WebmasterWorld Guest from

Forum Moderators: bakedjake

Making easy graph XML output from log files

Works in Linux and Windows!

10:29 pm on Jun 23, 2009 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

Remember, you can download Windows versions all the Linux command line tools so anyone can use these simple but powerful commands to process mass quantities of data without buying all sorts of fancy expensive software.

As a matter of fact, my target output was for the web, so I used the free version of FusionCharts which is Flash, avoiding tools like Excel as much as possible.

Many PHP programmers might have a first reaction to read in the comma delimited file and process it in PHP code, or LAMPers might convert it so MySQL, but both processes would take quite some time to code when a single command line written in a few seconds can easily produce quality results in a fraction of the time.

What I had was a log file specifically tracking local access to some pages and I finally needed to collate and sort this log file into a format I could import directly into FusionCharts and display a simple graph.

My starting log comma delimited log file looked like this:

date time,country,state,city,IP,etc.,etc.,etc.

or as AWK would see it:


My goal was to extract just the US data and sort the hits on individual States and output the results in FusionChart compatible XML.

Here's the one liner that processes the file into XML:

grep "United States" somelogfile.txt  awk -F "," '{print ","$3}'  sort  uniq -c  sort -nr  awk -F "," '{printf "<set name=\"%s\" value=\"%d\"/>\n",$2,$1}' > chartdata.xml

NOTE: WebmasterWorld breaks the solid PIPE bar and replaces it with which you'll have to fix for this to work.

Let's dissect this to make some sense of it.

grep "United States" somelogfile.txt 

This grep extracts just the log file entries for the US, leaving out the rest of the world and pipes the results into the AWK command

awk -F "," '{print ","$3}' 

In my logfile the fields were separated by commas (-F ",") to make parsing with AWK easier because I knew countries and states would have spaces in them so spaces as a delimiter wouldn't fly.

The $3 is the third field, which in my file was the state.

This field ($3) would obviously change for your files or purposes but what comes next should work as-is for most applications.

Note that I added a comma "," in front of every state which will be used later to save some trouble.


As simple as it says, the whole output of states, including the comma prefix, is sorted alphabetically.

uniq -c 

Count out all the unique instances of each state and preface the state with the counter, and it will include the comma for easy AWK'ing shortly such as " 102,California"

sort -nr 

Now sort the " count, state" organized data numerically and in reverse order so you have the file ordered from the most popular state down to the least.

awk -F "," '{printf "<set name=\"%s\" value=\"%d\"/>\n",$2,$1}' > chartdata.xml

This is the final XML output command using AWK to chop up the " count, state" data into $1 and $2 and then rewrite it in XML format.

Notice I used the "printf" in AWK to do a formatted output which removes all the spaces from the number field via the decimal format output of "%d". Had the spaces remained it would cause FusionGraphs to not like the data and it's a nice clean step.

The final output looks like this..

<set name="California" value="2608"/>
<set name="Florida" value="2601"/>
<set name="Texas" value="2205"/>
<set name="New York" value="1604"/>
<set name="Ohio" value="1408"/>


Now you just need to preface the XML data with the <graph...> element and append the file with the closing </graph> element and you're done.

Place those in files graphstart.txt and graphend.txt and cat them together

cat graphstart.txt chartdata.xml graphend.txt > mygraphdata.xml

Voila! There you have a nice XML file "mygraphdata.xml" ready to be graphed.

Of course you might need to manually specify the bar or column color codes to make it look pretty, but I can't answer all your problems in a single command line now can I?


[edited by: encyclo at 5:53 pm (utc) on June 24, 2009]
[edit reason] fixed typo [/edit]

1:08 pm on Jun 24, 2009 (gmt 0)

WebmasterWorld Administrator httpwebwitch is a WebmasterWorld Top Contributor of All Time 10+ Year Member

How did you get the nickname "incrediBill"? By beautiful performances like this one. Fantastic!
3:53 pm on Jun 24, 2009 (gmt 0)

WebmasterWorld Senior Member vincevincevince is a WebmasterWorld Top Contributor of All Time 10+ Year Member

Brilliant; awk is something I need to get more used to using instead of clumsy list loops.

If anyone (on windows) wants to get most of the important tools and a linux shell - install CYGWIN - it gives you a real linux prompt with all the fittings and fixtures you expect, within Windows.


Featured Threads

My Threads

Hot Threads This Week

Hot Threads This Month