Forum Moderators: DixonJones
Ours is a global B2C dot-com company, averaging a 1.7 GB log file a day. We have over 200 thousand pages on our site and average half a million page views a day. We are using a commercial log-analysing tool, but it's having problems with the size of the log files, and crunching that much data is destroying the speed of my server.
Any recommendations on the best analytical software to track referrals of all kinds, one that doesn't use a huge amount of server resources when running?
Thanking you in advance.
Vimes.
1) Transfer the log files off the server before analysing so that the analysis does not clog up the server.
2) Filter any superfluous lines out of the log files. If you are not interested in analysing image transfers, you can filter those out, leaving a greatly reduced log file. Or favicon.ico requests, just as examples.
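A minimal sketch of that kind of post-filtering, assuming combined-format logs; the filenames (access.log, access.filtered.log) are just examples:

```shell
# Two sample combined-format lines as a stand-in for the real access.log.
printf '%s\n' \
  '192.0.2.1 - - [01/Jan/2004:00:00:00 +0000] "GET /page.html HTTP/1.1" 200 4523' \
  '192.0.2.1 - - [01/Jan/2004:00:00:01 +0000] "GET /img/logo.gif HTTP/1.1" 200 512' \
  > access.log

# Drop image and favicon hits; the filtered copy is what the analyser reads.
grep -v -E '\.(gif|jpe?g|png|ico)(\?[^ ]*)? HTTP' access.log > access.filtered.log
```

Run it on a copy, never on the live log, so the original stays complete.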
Matt
Thanks for the feedback; filtering sounds interesting. When I put this to the admin boys they will ask how, so could you give me an example of how I can filter out the images? Do I have to wait for the day's log file to stop recording, or is there a way to stop .jpg and .gif requests ever being logged in the first place?
Thanks again.
Vimes.
I always like to leave everything in the original logs, in case sometime in the future I decide that something I thought was not important is important after all. But, if you're dying in a flood of log info, you gotta do what you gotta do.
You might also want to filter out HTTP requests that came from within your own company.
Although I leave everything in the original log, before doing log analysis I filter out:
So, mostly, that's local requests, plus all embedded HTML references, plus that damnable icon from hell.
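For the local-request part, a one-liner sketch, assuming the company sits in a 10.x private range (swap in whatever addresses your office actually uses):

```shell
# Sample log: one internal hit, one external; the addresses are stand-ins.
printf '%s\n' \
  '10.0.3.7 - - [01/Jan/2004:00:00:00 +0000] "GET /index.html HTTP/1.1" 200 100' \
  '198.51.100.9 - - [01/Jan/2004:00:00:01 +0000] "GET /index.html HTTP/1.1" 200 100' \
  > access.log

# Keep only hits that did not originate inside the company's own range.
grep -v '^10\.' access.log > access.external.log
```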
Unrelated Musing: Hmmm, I wonder how my MySql schema for logfile analysis would perform if I was pumping .5 million entries in per day? Probably not so good after several months. Oh well, the poor little Pentium II it's running on is handling several million entries OK so far, so I'll just hope any great increase in traffic brings enough money for a hardware upgrade.
Thanks Ron,
Sorry, I should have made it clearer, I suppose.
Crunching the stats: the package I use actually does all that you have mentioned, and it's already implemented. The "admin boys" have been complaining that the log files are getting too large, taking valuable server resources to compress and too much disc space on the server; they curse me every time I want the stats zipped and FTP'd down to my desktop. So it's the log file size they are complaining about. Once I've got it on my desktop that's not much of a problem, as I can destroy it however I want. We aren't ready for a dedicated stats server; it just isn't cost-effective yet, but they want me to investigate ways of making the files smaller.
They want me to reduce what I collect, but I need everything that's collected; it's an ongoing battle. ;) I've asked them to create maximum-size logs of maybe half a gig and auto-zip them, but they feel this is still a size problem, as the product that shows the live data needs the logs to be uncompressed.
So there's no way of stopping the log file getting to the size it gets to, then. Everything is optimized in the logging settings so it only records the bare minimum/essentials.
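One compromise sketch, assuming the admins are on a Unix-ish box with gzip: keep rotated logs compressed on disk and regenerate plain text only for the run that needs it. The filenames here are made up:

```shell
printf 'hit one\nhit two\n' > access.log.1   # stand-in for a rotated log
gzip -9 access.log.1                         # disk now holds only access.log.1.gz
zcat access.log.1.gz > access.log.1.plain    # recreate plain text on demand
```

Text logs typically compress very well, so the disk-space complaint shrinks to whichever single day is currently uncompressed for the live-data product.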
Any ideas?
Vimes.
an example of how I can filter out the images
Your Admin boys should be able to configure that for you on the server side (you didn't give any clue as to what server or OS is being used, so I assume you're not looking for server config info). You'll make their life easier if you arrange to store all your graphics (and only graphics) beneath a single directory tree.
If you're like most web sites, just filtering out the graphics will chop the log size in two, at least. If that isn't enough, maybe you've got some .css and .js files referenced a lot that you can stop logging as well.
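If the box does turn out to be Apache httpd, the usual pattern is a SetEnvIf/CustomLog pair, so matching requests are never written to the log at all; the log path and the exact extension list here are just examples:

```apache
# Tag requests for static assets, then log only the untagged ones.
SetEnvIf Request_URI "\.(gif|jpe?g|png|css|js)$" dontlog
SetEnvIf Request_URI "^/favicon\.ico$" dontlog
CustomLog logs/access_log combined env=!dontlog
```

The trade-off is the one mentioned above: anything excluded this way is gone for good, unlike filtering a copy after the fact.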