Forum Moderators: DixonJones


Which route to take when reporting large log files?


musicales

8:39 am on May 6, 2005 (gmt 0)

10+ Year Member



My largest site gets around 10 million page impressions a month. I'm currently trying to process logs on the server, but I'm beginning to give up. Webtrends works sporadically but tends to get overwhelmed, as do several other packages I've looked at. The daily log file comes in at 300 MB, and it seems that's just too big for many of them to handle - not to mention that processing clogs up the server for 20 minutes. Downloading isn't much better - how am I supposed to easily download that amount each day? (I'm on Windows, by the way, before someone posts some clever Linux solution.)
I've looked briefly into hosted solutions, but for that number of pages they all seem to charge a fortune. Even Google's newly acquired Urchin charges an extra $99 per million after the first 100,000 pages.

What other options do I have?

Romeo

9:15 am on May 6, 2005 (gmt 0)

10+ Year Member



You need the log statistics, but processing "clogs up the server for 20 mins".
Two solutions come to mind:

(1) If processing a daily 300 MB log is too heavy for your server and your statistics package, you could rotate the log more often - every 6 hours, or every hour - and process smaller chunks more frequently, spreading the load over the day.
The Webalizer log statistics package, for one, has a convenient incremental mode to support frequent runs.
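The rotation step itself is just a timestamped rename followed by telling the server to reopen its log. Here is a minimal sketch, run against a fake log in /tmp so the steps are visible end-to-end; the commented-out server and Webalizer commands, and all paths, are assumptions to adapt to your own setup:

```shell
# Demo of one rotation cycle using a fake log file in /tmp.
LOG=/tmp/demo_access.log
echo '127.0.0.1 - - [06/May/2005:08:00:00 +0000] "GET / HTTP/1.0" 200 512' > "$LOG"

STAMP=$(date +%Y%m%d%H)
mv "$LOG" "$LOG.$STAMP"        # rotate: the live log moves aside
# apachectl graceful           # real server: make it reopen the log file
# webalizer "$LOG.$STAMP"      # real server: incremental stats run on the chunk
```

Run hourly from a scheduler, each stats run then only sees one hour's worth of data instead of the whole 300 MB.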

(2) If you can't or won't do it on your primary HTTP server due to disk space and CPU-load constraints, then you need a second server nearby - not to 'download' to, but to transfer the daily logs to (300 MB over a fast local network should not be a problem) - and then process the logs however you like. That second server could even be a Linux box then ...

Regards,
R.

7_Driver

12:29 am on May 28, 2005 (gmt 0)

10+ Year Member



You can run the logs through a zip program on your web server, then download the zip files, unzip them, and do your stats locally.
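Access logs are highly repetitive text, so compression wins big. A small demo of the effect, using gzip on a fake repetitive log (the /tmp path and line counts are just for illustration; a Windows zip tool behaves similarly):

```shell
# Build a fake log of 1000 near-identical lines, then compress it.
LOG=/tmp/demo_big.log
rm -f "$LOG" "$LOG.gz"
i=0
while [ $i -lt 1000 ]; do
  echo '127.0.0.1 - - [28/May/2005:00:00:00 +0000] "GET /index.html HTTP/1.0" 200 1024' >> "$LOG"
  i=$((i+1))
done
BEFORE=$(wc -c < "$LOG")
gzip -9 "$LOG"                 # replaces demo_big.log with demo_big.log.gz
AFTER=$(wc -c < "$LOG.gz")
echo "before=$BEFORE bytes, after=$AFTER bytes"
```

On real logs the ratio varies, but a 90%+ reduction is common, which is what turns a 300 MB daily download into something manageable.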

The free stats program "analog" is fairly simple, but it gives decent basic stats, and it won't bat an eyelid at 300 MB per day.

McElvoy

1:03 am on May 28, 2005 (gmt 0)

10+ Year Member



You don't have a lot of choice. Even a fraction of that amount shouldn't be processed on the server itself if the site is busy or uses a database. Move the logs to another machine on the same network and do it in the middle of the night with a scheduled task. 300 MB won't take that long (I'm FTPing 300 GB right at this moment; now THAT takes a while).

If you're logging images, CSS, JS, etc., turn off logging for those folders and your logs will shrink by 90%.

Most stats programs, even really old versions, should handle 300 MB, but it should not be done on a machine that's also serving a site. A mediocre older desktop would do if you put enough RAM in it. If a log of that size ties up your stats program for 20 minutes, you're doing something wrong; ask the vendor for advice, check the manual, etc. Analog really could handle this without a hiccup, but then Analog doesn't sessionize the hits.
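How you turn off logging for static files depends on the server. As a sketch, on Apache it can be done with a conditional log - the paths and extension list here are assumptions, and IIS has its own per-folder logging switches:

```
# Mark requests for static files, then exclude them from the log:
SetEnvIf Request_URI "\.(gif|jpe?g|png|css|js|ico)$" dontlog
CustomLog /var/log/httpd/access.log combined env=!dontlog
```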

DariusYoung

2:00 am on May 29, 2005 (gmt 0)

10+ Year Member



I'm pretty sure there's a hardware issue somewhere - either on the processing server or on whatever machine your vendor uses. 300 MB files aren't even an issue for most standard laptops (like the 300 MB files I process on my one-year-old one).

In fact, a complete ClickTracks run on 300 MB worth of logs on my laptop takes less than an hour.

musicales

6:53 am on May 29, 2005 (gmt 0)

10+ Year Member



Thanks, all, for the suggestions. Short of buying a new server machine as suggested, 7_Driver, your suggestion of zipping the files was the answer for now. I was sure I'd tried this before and the file size had barely reduced, but I tried again and sure enough it was over 90% smaller. 30 MB per day is of course no problem, so I'll probably just do that and process the logs on my local computer.

freeflight2

7:04 am on May 29, 2005 (gmt 0)

10+ Year Member



one line (e.g. in a crontab) can do the same job:

rsync -e ssh -avz RemoteHost:/path/to/access.log /local/logfile/path/ && analog

=> transfers the log file(s), compressing on the fly, then calls analog (or whatever) to process them locally.
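Scheduled nightly, that one-liner might sit in a crontab entry like this (the 3:30 am run time is just an assumption):

```
# m  h  dom mon dow  command
30 3 * * * rsync -e ssh -avz RemoteHost:/path/to/access.log /local/logfile/path/ && analog
```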