For some odd reason this month the stats appeared to be wildly fluctuating by up to 50% for several days which immediately put me into a panic mode that maybe my SE rankings were fluctuating as well.
OK, so I wiped the sweat off my brow and checked Google Analytics, which showed no change. Everything was status quo, though GA always reads slightly lower since Google only counts browsers with JavaScript enabled, so web spiders and visitors with JavaScript disabled don't show up in GA.
Now concerned that maybe my analytics script was crashing or something during processing, I decided to manually analyze my daily log files and get a raw number of actual IPs in the file.
Hopped in my stats directory on the Linux box and started checking each daily log file with the following command just to get a ballpark of total visitors looking at webpages:
grep ".html" -i access_log | grep ".png" -iv | grep ".txt" -iv | grep ".jpg" -iv | grep ".gif" -iv | awk '{ print $1 }' | sort -n | uniq | wc -l
The reason for the various exclusions like 'grep ".jpg" -iv' is to eliminate the images and other files optionally being served off other servers, such as banner ads linked to my site, etc.
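For anyone wanting to repeat this check, here is a rough sketch of a stricter variant. It assumes the log is in Apache common/combined format, where the first whitespace-separated field is the client IP and the seventh is the requested path. The function name count_page_visitors is made up for illustration. Unlike the grep chain above, it only looks at the request path, so an ".html" that appears in the referrer or user agent does not affect the result.

```shell
# Count distinct client IPs that requested a .html page.
# Assumes Apache common/combined log format: $1 = client IP, $7 = request path.
# Reads the files given as arguments, or stdin if none are given.
count_page_visitors() {
    awk 'tolower($7) ~ /\.html([?].*)?$/ { seen[$1] = 1 }   # path ends in .html, optional query string
         END { n = 0; for (ip in seen) n++; print n }' "$@"
}
```

Running something like `count_page_visitors access_log` on each daily file should give a number comparable to the one from the grep pipeline, though the two can differ on lines where ".html" only shows up in the referrer.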
Sure enough, the script cranked out a number that almost matched some of the visitor counts but the log analysis script was wildly off on other days. Then I reversed the grep to get a count of files not being served from my box and the total numbers combined still didn't make sense with the discrepancy.
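One thing worth checking when combining the two counts: the same IP can appear in both the page set and the non-page set, so the two unique-IP counts will not necessarily add up to the total number of unique IPs. This may or may not explain the discrepancy described here. A small sketch that prints all three numbers side by side, again assuming Apache common/combined format ($1 = IP, $7 = path) and using a made-up function name:

```shell
# Print distinct-IP counts for .html requests, for everything else, and overall.
# Assumes Apache common/combined log format: $1 = client IP, $7 = request path.
visitor_breakdown() {
    awk '{ all[$1] = 1 }                                              # every IP seen
         tolower($7) ~ /\.html([?].*)?$/ { page[$1] = 1; next }       # page requests
         { other[$1] = 1 }                                            # all other requests
         END {
             for (ip in page) p++
             for (ip in other) o++
             for (ip in all) a++
             printf "pages=%d other=%d total=%d\n", p, o, a
         }' "$@"
}
```

If pages plus other comes out larger than total, the difference is just the IPs that requested both kinds of file.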
I'm not sure what to do at this point because it's obvious my raw log file analytics are a total lie and I'm not sure if it's just a problem with my site/server or if this software is simply buggy.
Not to panic everyone, but it's the default web analytics that's included with Plesk's control panel, so I've had quite a few years of history with this thing that now all appears to be somehow tainted.
Sigh...
Anyone else get similar discrepancies?
[edited by: incrediBILL at 11:14 pm (utc) on Mar. 21, 2008]
it's obvious my raw log file analytics are a total lie and I'm not sure if it's just a problem with my site/server or if this software is simply buggy
Webalizer builds off the CLF so it seems unlikely that your raw log is wrong or corrupt, but never say never. As far as the software, the last update was April 16, 2002 and there was a note in the fix regarding "mismatched KByte totals", so if you are running anything less than Version 2.01-10, you don't have the "latest" copy.
My local log files are just fine as I can run other tools on them that appear to crank out the correct information.
I don't think there's much more I can do considering it's integrated into Plesk without risking breaking things so I'll just find some other raw log analysis tool that appears to be more accurate and run them in tandem and see what happens.