Need advice

Logs are almost 1 GB

Katie_Venra

5:11 pm on Jun 12, 2005 (gmt 0)

10+ Year Member



Ok, firstly...

Server OS: Windows

Control Panel: Plesk 7.5.0

The log files I have are huge because the site gets a lot of visits every day. The log files are now over 1 GB in size, and searching even the daily ones is impossible (I'm using the logs that come with Plesk).

Is there anything that would make it easier for me to look through these logs and see how many times search bots are hitting the site?

Katie_Venra

6:20 pm on Jun 12, 2005 (gmt 0)

10+ Year Member



PS: Sorry for the spelling errors, I was typing with one hand and arguing with my boss at work at the same time :(

GaryK

7:20 pm on Jun 12, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



What are you recording in your logs? You might be logging things you don't really need that take up a lot of space, like cookie data or absurdly long referrers, which I had on my site until I started using URL rewriting.

Katie_Venra

8:04 pm on Jun 12, 2005 (gmt 0)

10+ Year Member



It looks like it's recording basically everything. The logs for each day are almost 40 MB each, and I'm seeing records for .jpeg files being served, etc.

From the Plesk panel itself I can't see any way to change what the logs record.

cgrantski

9:03 pm on Jun 12, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If Plesk can't do it at all, and if you can't get at the real control panel for the Windows server software, then I think your only choice is to preprocess the logs themselves to remove extraneous lines before feeding them to an analysis program or to Excel/Access. However, before giving up on Plesk, note that in Windows server software you turn off logging selectively in the properties of individual folders, not in the logging control area. Each folder on your web site gets logged, or not, as you specify in the properties for that folder. (This means, of course, that to turn off logging for images you have to have all your images in folders that don't also contain things that you want to log.)

For preprocessing your logs you can use a Windows command such as: find /V ".jpg" filename.log > filename.nojpg. Or you can use a good editor that can handle files of any size, such as TextPad. On the sites I do stats for (and there are many), I can almost always shrink the logs by 90% by removing the extra lines.
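If you'd rather filter out several file types in one pass, the same idea can be sketched in a few lines of Python. This is only a rough illustration, not something from Plesk or IIS: the function names, the extension list and the file names are all made up for the example. Like the find command, it does a plain substring match, so it will also drop a line where ".jpg" only shows up in the referrer.

```python
# Extensions treated as "extra" requests. This list is an assumption;
# adjust it to whatever you don't care about in your own logs.
SKIP_EXTENSIONS = (".jpg", ".jpeg", ".gif", ".png", ".ico")


def keep_line(line, skip=SKIP_EXTENSIONS):
    """Return True if the line mentions none of the skipped extensions."""
    lowered = line.lower()
    return not any(ext in lowered for ext in skip)


def filter_log(src_path, dst_path, skip=SKIP_EXTENSIONS):
    """Copy src_path to dst_path one line at a time, dropping skipped lines.

    Reading line by line means the file never has to fit in memory,
    so the size of the log does not matter. Returns the number of lines kept.
    """
    kept = 0
    # latin-1 accepts any byte value, so odd characters in a log won't crash it.
    with open(src_path, "r", encoding="latin-1") as src, \
            open(dst_path, "w", encoding="latin-1") as dst:
        for line in src:
            if keep_line(line, skip):
                dst.write(line)
                kept += 1
    return kept
```

Calling filter_log("filename.log", "filename.noimg") would write the slimmed-down copy next to the original and leave the original untouched.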

At the same time, logs of 40 MB a day are manageable without being shrunk - I assume you are using a stats program of some kind. If the program is choking on logs of this size then something else is wrong.

Katie_Venra

2:32 am on Jun 13, 2005 (gmt 0)

10+ Year Member



Nope, I'm not using any kind of stats program. All I'm doing is pulling out a log and searching for anything with "google" in it so I can get a rough idea of how the spider is doing.
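For that kind of quick check, a short script can count the bot lines without opening the file in an editor at all. Here is a minimal sketch in Python. The function name is made up, and the three user-agent fragments ("googlebot", "msnbot", "slurp") are my assumption about which bots you'd want; it just looks for them anywhere in the line, so it assumes the user agent is actually being logged.

```python
from collections import Counter

# User-agent fragments to look for (lowercase). Assumed list; edit as needed.
DEFAULT_BOTS = ("googlebot", "msnbot", "slurp")


def count_bot_hits(lines, bots=DEFAULT_BOTS):
    """Count log lines per bot, matching case-insensitively.

    `lines` can be any iterable of strings, including an open file,
    so a large log is processed one line at a time.
    """
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            # Skip header/comment lines such as "#Fields: ..."
            continue
        lowered = line.lower()
        for bot in bots:
            if bot in lowered:
                counts[bot] += 1
                break  # count each line once, for the first bot that matches
    return counts
```

To run it on a real file, open the log (for example with open("filename.log", encoding="latin-1")) and pass the file object straight to count_bot_hits.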

Jack_Hughes

8:34 am on Jun 13, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well, firstly, you need to get hold of a log analyser. There are plenty of free ones around. I use AWStats and find it to be excellent, and it doesn't cost a penny either. There's gold in them there logs and you are missing out on it.

Secondly, as for your original question: try archiving your logs on a periodic basis, either automatically or manually. If you don't know how to do it, ask your host to set it up for you.
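If you end up archiving by hand, compressing each old log is usually the biggest win, since text logs shrink a lot. A small sketch in Python using only the standard library follows; the function name and the ".gz" naming are just choices made for this example, and a host's own rotation setup may work quite differently.

```python
import gzip
import os
import shutil


def archive_log(path, remove_original=False):
    """Compress `path` to `path + ".gz"` and return the new file name.

    The copy is streamed in chunks, so large logs are fine.
    The original is only deleted if remove_original is True.
    """
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    if remove_original:
        os.remove(path)
    return gz_path
```

Leaving remove_original off at first lets you check the compressed copy before anything is deleted.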

cgrantski

1:02 pm on Jun 13, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Or, once you've removed the junk from your logs, they'll be small enough that you can open them with Excel (delimited, space) - then sort on the Referrer field. If you colorize all the Google referrer lines then re-sort on IP or cookie, you can also examine many (not all) of the Google visits to see outcomes. This isn't something you want to do with all your logs but it's very educational to do with an occasional sample day. By following the visitors' tracks through a single visit, you can get inside their head a little bit.
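The same sample-day exercise can be roughed out in Python if Excel is awkward. The sketch below assumes the logs are in W3C extended format with a "#Fields:" header line and the usual IIS field names c-ip, cs-uri-stem and cs(Referer); if the logs use a different format or those fields aren't being logged, it won't find anything. The function names are made up for the example, and it groups by IP only, which is a cruder stand-in for a real visit.

```python
from collections import defaultdict


def parse_w3c(lines):
    """Yield one dict per request line, keyed by the names in '#Fields:'."""
    fields = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue
        values = line.split(" ")
        if len(values) != len(fields):
            continue  # skip malformed lines rather than guess
        yield dict(zip(fields, values))


def google_visits(lines):
    """Map each IP that arrived with a Google referrer to the pages it requested.

    Pages are listed in log order. Everything for the day is held in memory,
    so this is meant for an occasional sample day, not for all your logs.
    """
    pages_by_ip = defaultdict(list)
    google_ips = set()
    for rec in parse_w3c(lines):
        ip = rec.get("c-ip", "-")
        pages_by_ip[ip].append(rec.get("cs-uri-stem", "-"))
        if "google" in rec.get("cs(Referer)", "").lower():
            google_ips.add(ip)
    return {ip: pages_by_ip[ip] for ip in google_ips}
```

Reading down the page list for one IP gives much the same view as the colorize-and-re-sort trick in the spreadsheet.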

aeomac

2:19 pm on Jun 15, 2005 (gmt 0)

10+ Year Member



You might want to consider setting up a scheduled job (cron, or whatever your server's equivalent is) to move your log files to another server... these files can take up a lot of space.