
Website Analytics - Tracking and Logging Forum

Need help in interpreting raw log files
I find Webalizer's reported figures differ notably from my raw log files
federico2005
4:44 pm on Jun 4, 2009 (gmt 0)
My Webalizer server stats report far fewer page views than what I find by looking at my raw log files. Since I'm not sure I'm interpreting the raw logs correctly, I'd appreciate advice on the method I'm using. Here's an example of what a line of my raw log files looks like:
 
62.77.164.550  - -  [20/May/2009:09:28:04 -0700] "GET /index.php/ HTTP/1.1" 200 45626 "http://www.google.ie/search?hl=en&q=..." "Mozilla/4.0 (compatible; ....)"

To make it more readable, here is a list of the values in the example line (each numbered value corresponds to a column in the line):
1. 62.77.164.550
2. - -
3. [20/May/2009:09:28:04 -0700]
4. "GET /index.php/ HTTP/1.1"
5. 200 45626
6. "http://www.google.ie/search?hl=en&q=..."
7. "Mozilla/4.0 (compatible; ....)"

To count page views, I keep only the requests in column 4 that refer to files with a .php extension.
The number of pages I find through this method is hugely bigger than what Webalizer reports.
Also, to get rid of log entries caused by robots, I discard lines where column 7 (the user-agent info) contains the string "bot". Even with that filter I still find far more page views than Webalizer reports.
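
As a concrete version of that rule, here is a small sketch continuing from the parse_line() example above; the function name is invented, and the 200-status check is my own addition (Webalizer applies its own counting rules, so the two totals won't necessarily match):

def is_page_view(fields):
    # fields is the dict returned by parse_line() in the earlier sketch.
    method, _, rest = fields["request"].partition(" ")
    path = rest.split(" ", 1)[0].split("?", 1)[0]    # drop protocol and query string
    return (method == "GET"
            and ".php" in path                       # the ".php in column 4" rule
            and fields["status"] == "200"            # my addition: skip errors/redirects
            and "bot" not in fields["agent"].lower())

One possible source of the gap, besides bots without "bot" in the user-agent: counting raw lines includes every status code (redirects, 404s on .php URLs), which a log analyzer may not count as pages.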

Am I missing anything? Also, since most of my page views have dynamic URLs, I wonder: does Webalizer count page views that have dynamic URLs?

Thanks

 

Megaclinium
4:15 am on Jun 10, 2009 (gmt 0)

Not all bots have "bot" in the user-agent string.

I take the raw logs and write them into a db table with a script that runs hourly. Because the logs reset daily, I key each row by IP + time + other unique info so the same entry is never inserted twice.

On the way in I look up each entry's IP in a robots lookup table. If it's a bot, the entry goes to a second db table instead.

(As I find new bots, I add their full 4-segment IP to that table.)

I look up the full 4-segment IP in the robots table first. If bots come from a range I've banned and those jerks just change the last segment, I simply write the first 3 segments to the bot table, and the lookup also checks that 3-segment prefix whenever the full 4-segment IP isn't found.
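
In case it helps to see the shape of this, here is a rough Python/SQLite sketch of that scheme: insert with a uniqueness key so reruns don't double-count, and check the bot table by full IP first, then by 3-segment prefix. All table and column names are invented for illustration, and the described setup may differ:

import sqlite3

con = sqlite3.connect("logs.db")
con.executescript("""
CREATE TABLE IF NOT EXISTS hits (
    ip TEXT, ts TEXT, request TEXT, agent TEXT,
    PRIMARY KEY (ip, ts, request)   -- ip + time + request: no double inserts
);
CREATE TABLE IF NOT EXISTS bot_hits (
    ip TEXT, ts TEXT, request TEXT, agent TEXT,
    PRIMARY KEY (ip, ts, request)
);
CREATE TABLE IF NOT EXISTS bot_ips (
    prefix TEXT PRIMARY KEY         -- "a.b.c.d" for one host, "a.b.c" for a range
);
""")

def is_bot_ip(ip):
    # Exact 4-segment match first; the IN also catches the 3-segment
    # prefix for ranges where bots keep shifting the last segment.
    first3 = ".".join(ip.split(".")[:3])
    return con.execute("SELECT 1 FROM bot_ips WHERE prefix IN (?, ?)",
                       (ip, first3)).fetchone() is not None

def store(fields):
    table = "bot_hits" if is_bot_ip(fields["ip"]) else "hits"
    con.execute("INSERT OR IGNORE INTO " + table +   # table is one of two literals
                " (ip, ts, request, agent) VALUES (?, ?, ?, ?)",
                (fields["ip"], fields["time"], fields["request"], fields["agent"]))
    con.commit()

If the same IP can legitimately repeat the same request within one second, the key would need more columns, which is presumably the "other unique info" above.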

With the traffic separated into users (or unknown bots not yet in my bot table) and known bots, it's much easier to get counts.

The script displays a continuous hourly count, but I can run queries against the users table on demand or daily, and if I've since added IPs that were really bots, the query skips them.
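
A query along those lines, reusing the sketched tables above, might look like this; the retroactive skip is just an anti-join against the bot prefix table:

import sqlite3

con = sqlite3.connect("logs.db")
# Daily page-view counts from the users table, retroactively skipping any
# IP that has since been added to bot_ips (exact or 3-segment prefix match).
# ts looks like "20/May/2009:09:28:04 -0700", so substr(ts, 1, 11) is the date.
daily = con.execute("""
    SELECT substr(ts, 1, 11) AS day, COUNT(*) AS views
    FROM hits h
    WHERE NOT EXISTS (
        SELECT 1 FROM bot_ips b
        WHERE b.prefix = h.ip OR h.ip LIKE b.prefix || '.%'
    )
    GROUP BY day
""").fetchall()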

(Sometimes you need to look at behaviour to tell if it's a bot, e.g. it sends HEAD requests, which real users don't, or it scrapes a bunch of pages quickly.)
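
Those behavioural tells can also be turned into a query over the same sketched table; the thresholds here are arbitrary placeholders, not tested values:

import sqlite3

con = sqlite3.connect("logs.db")
# Surface candidate bots by behaviour: any HEAD requests, or an implausible
# number of hits from one IP.
suspects = con.execute("""
    SELECT ip, COUNT(*) AS n,
           SUM(CASE WHEN request LIKE 'HEAD %' THEN 1 ELSE 0 END) AS heads
    FROM hits
    GROUP BY ip
    HAVING heads > 0 OR n > 500   -- placeholder threshold
""").fetchall()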

Then you can run whatever queries you want against the table, which tends to grow large.

I can separate hits out by IP to build a 'visit' trail for any single IP, or for a chain of IPs in the case of AOL visits and the like.
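
The per-IP trail is then a simple ordered select against the same sketched table; the LIKE form covers a whole proxy range (the prefixes below are only illustrative, borrowed from the example line earlier in the thread):

import sqlite3

con = sqlite3.connect("logs.db")
# Visit trail for a single IP. rowid preserves insertion order, which
# approximates arrival order here; sorting the text timestamp would not.
trail = con.execute(
    "SELECT ts, request FROM hits WHERE ip = ? ORDER BY rowid",
    ("62.77.164.550",)).fetchall()

# Same idea for a chain of addresses behind a rotating proxy.
chain = con.execute(
    "SELECT ip, ts, request FROM hits WHERE ip LIKE ? ORDER BY rowid",
    ("62.77.164.%",)).fetchall()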

Other interesting analyses can be done.
I wouldn't put too much faith in webstats figures; they are not as smart as you can be if you analyze the logs yourself in detail.

I keep one log file per day so I can reload the db if something gets corrupted.
