I use M$ Access. It is the only software I have found that can cope with the size of a log file. Then I run different queries to pull out the relevant entries (like user-agent = Googlebot).
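For anyone without Access, the same approach works from a script; a minimal sketch in Python with SQLite (field positions assume the combined log format, file names are placeholders, and well-formed lines are assumed):

import shlex
import sqlite3

con = sqlite3.connect("log.db")
con.execute("CREATE TABLE IF NOT EXISTS hits (ip TEXT, request TEXT, agent TEXT)")

with open("access.log") as f:
    for line in f:
        parts = shlex.split(line)   # keeps the quoted request/agent fields together
        if len(parts) >= 10:        # combined format: agent is the 10th field
            con.execute("INSERT INTO hits VALUES (?, ?, ?)",
                        (parts[0], parts[5], parts[9]))
con.commit()

# The user-agent = Googlebot query from above:
for ip, request in con.execute(
        "SELECT ip, request FROM hits WHERE agent LIKE '%Googlebot%'"):
    print(ip, request)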
As for your problem with the google cache, try using this tag:
<META NAME="GOOGLEBOT" CONTENT="NOARCHIVE">
I use WebTrends...
Yep.. I use a simple spreadsheet at times to get a feel for traffic, Excel for example. Sorting and filtering can get you the info that the dedicated log analysers don't handle so well. I never thought of using a database like Access, but that's a good idea. After deleting all the image hits etc., Excel can still handle thousands of records at a time, so it's great for small sites like ours with, say, less than 10,000 "real" page views a day.
After all, log analysers are just glorified spreadsheets.
chiyo, how do you delete image hits and the like?
(original poster) Don't you think there's a market out there for a product that gives you lots of control but can only handle a few thousand hits? Some people don't like messing with Access or Excel. Then again, a product that puts it all into Access or Excel the right way might be nice. I would think people have trouble getting the W3C Extended Log File format into Access. Mine does that but I also added a nice little report designer.
(Disclosure: I made a product out of mine but I know I'm not supposed to sell here. So I won't say what it is. I promise to be good. I think I'm going to like it here.)
Just run a filter for lines including the string ".jpg", ".gif", ".ico", ".js" or whatever other extensions are superfluous to your analysis, and then delete or exclude them. That will reduce your file size by about 70% to 95%, depending on how many images/scripts you have per page.
No, what I meant was: what method/software do you use to run that filter? The reason I use Access and not Excel (which would be more practical) is that the file is too big prior to filtering.
Ah sorry, if the file is too large to import into a spreadsheet, I preprocess it first using a text editor with a macro.
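A few lines of script would do the same job; a rough sketch in Python (the extension list and file names are just examples):

# Drop image/script hits from a raw access log before importing it.
SKIP = (".jpg", ".gif", ".ico", ".js", ".css", ".png")

with open("access.log") as src, open("pages_only.log", "w") as dst:
    for line in src:
        if not any(ext in line.lower() for ext in SKIP):
            dst.write(line)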
I take the raw log and:
- Use the freeware program LookUpIp to change the addresses (e.g. 999.999.999.999 -> www.somesite.com)
- Use GREP to remove .jpg .gif .js and robots
- Use SED to "repair" Danish and German letters (e.g. %F8 -> ø; see the sketch below)
- Look at the log with the freeware application Loggling (remember to delete the lines added by GREP)
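If you would rather do the repair step in one script than with SED, a minimal sketch in Python (it decodes every %XX escape, and assumes they are Latin-1, as %F8 -> ø suggests; file names are placeholders):

from urllib.parse import unquote

with open("access.log", encoding="latin-1") as src, \
     open("repaired.log", "w", encoding="utf-8") as dst:
    for line in src:
        dst.write(unquote(line, encoding="latin-1"))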
To get an overview of the search words and search terms, I use the freeware program Analog.
I'm really surprised there are not more apps out there. I've been looking for programs for a while and not found anything that great. Homemade scripts work well for me with Access or SQL, as they tend to just log the visits/uniques, which is all I really want, and I can run SQL queries to find out more info. The problem is the size of the data, and the referrals take some reading as the search phrases are just stuck in with all the other stuff.
Personally I don't find WebTrends very useful at all; it counts every single server request and can therefore be quite misleading about how busy the site really is.
To handle bigger files, I have started to write a program in C++ to load the referer text files into. One of the hardest parts is to work out the rules for parsing each of the main search engines' querystrings. Do you have some info on this?
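For what it's worth, here is as far as I have got, sketched in Python rather than C++ (Google definitely uses q=; the other parameter names are my best guesses, so treat the table as a starting point):

from urllib.parse import urlparse, parse_qs

# Search-term parameter per engine; unverified entries need checking.
TERM_PARAM = {
    "google": "q",
    "yahoo": "p",
    "altavista": "q",
    "lycos": "query",
}

def search_terms(referer):
    url = urlparse(referer)
    for engine, param in TERM_PARAM.items():
        if engine in url.netloc:
            values = parse_qs(url.query).get(param)
            return values[0] if values else None
    return None

print(search_terms("http://www.google.com/search?q=log+analysis"))
# -> log analysis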
The 'origq' is something to add to the list now :) thanks
I have a ton of data going back 3 yrs, and hopefully when it's done it will give me a concordance of referrals that I can process (see the sketch after this list) to show me really neat stuff like:
- Frequency of individual words, phrases
- Difference between search engines
- maybe even help me to guess which words to target :)
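The counting half is straightforward once the terms are parsed out; a minimal sketch in Python (the file name and parameter names are placeholders):

from collections import Counter
from urllib.parse import urlparse, parse_qs

words, phrases = Counter(), Counter()

with open("referers.txt") as f:   # one referer URL per line
    for line in f:
        qs = parse_qs(urlparse(line.strip()).query)
        terms = (qs.get("q") or qs.get("p") or qs.get("query") or [None])[0]
        if terms:
            phrases[terms.lower()] += 1
            words.update(terms.lower().split())

print("Top phrases:", phrases.most_common(10))
print("Top words:", words.most_common(10))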
I found a compilation of log analyzers [uu.se] by category. I'm most interested in the referral log analyzers, as there are a lot of ways to see the data that I had not considered before. This category I like a lot -- External referring URLs to this site, and their local targets [ktmatu.com].
You can get an idea of which keywords, intended or not, are effective for a specific page.
I log all my stuff into a database. Besides setting a cookie for sessions (30 minutes), I also set another cookie for a visitor number, for 1 year.
So I get something like:
visitor | session | ip | request | agent | referer | time
After that, queries, views and stored procedures give me more info than all the analysis packages in the world.
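To give a flavour, a minimal sketch in Python of the sort of queries I mean, with SQLite standing in for the real database (column names match the layout above; time is assumed stored as an ISO date/time string):

import sqlite3

con = sqlite3.connect("traffic.db")
con.execute("""CREATE TABLE IF NOT EXISTS hits
               (visitor INTEGER, session INTEGER, ip TEXT,
                request TEXT, agent TEXT, referer TEXT, time TEXT)""")

# Unique visitors per day.
for day, uniques in con.execute(
        """SELECT date(time) AS day, COUNT(DISTINCT visitor)
           FROM hits GROUP BY day ORDER BY day"""):
    print(day, uniques)

# Returning visitors: more than one session on the same 1-year cookie.
for visitor, sessions in con.execute(
        """SELECT visitor, COUNT(DISTINCT session) AS sessions
           FROM hits GROUP BY visitor HAVING sessions > 1"""):
    print(visitor, sessions)

A Googlebot check like the one at the top of the thread is then just one more query on the agent column.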