Forum Moderators: DixonJones
I must admit I sort of thought log files and Google Analytics data were the same thing.
Now I realize they're obviously not. But here's what I don't get: why would one need Google Analytics (or another web analytics tool) if one has the server log files?
Does Google Analytics go into more depth (segmentation of user behavior) than server log files do?
Do normal shared hosts usually provide "log files" for me to see?
thanks!
208.***.97.219 - - [28/Oct/2007:14:20:35 -0400] "GET / HTTP/1.1" 200 32443 "http://www.example.com/page.html" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"
208.***.97.219 - - [28/Oct/2007:14:20:35 -0400] "GET /style.css HTTP/1.1" 200 2070 "http://www.example.com/" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"
208.***.97.219 - - [28/Oct/2007:14:20:35 -0400] "GET /images/icon.gif HTTP/1.1" 200 181 "http://www.example.com/" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"
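The three lines above are in Apache's "combined" log format: client IP, identd, user, timestamp, request line, status code, bytes sent, referrer, and user-agent. A minimal Python sketch for pulling those fields out of one line (the field names are my own labels, and the sample IP is altered, since the original is masked):

```python
import re

# Apache "combined" log format: ip, identd, user, [timestamp], "request",
# status, bytes, "referrer", "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of named fields from one combined-format line, or None."""
    m = LOG_PATTERN.match(line)
    return m.groupdict() if m else None

sample = ('208.0.97.219 - - [28/Oct/2007:14:20:35 -0400] "GET / HTTP/1.1" '
          '200 32443 "http://www.example.com/page.html" '
          '"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"')
entry = parse_line(sample)
print(entry["ip"], entry["status"], entry["request"])
```

A log analyzer is essentially this loop run over millions of such lines, with counting and reporting on top.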
Reports such as Google Analytics are (usually) the result of client-side code (such as JavaScript) embedded in the pages of your site, and therefore reflect only page-fetch data; no information is available about requests for images, stylesheets, etc. If the client does not support or allow JavaScript, no information about that client's visit can be collected.
These different collections of information are useful for different things, and all of them are useful to many Webmasters. The Analytics data is useful for visitor tracking; the stats are useful for visitor and search-engine-robot tracking and server utilization information; and the raw access and error logs are very useful when tracking down problems such as missing files or abusive accesses to your server, to give just a few examples.
Jim
- Log analysis will report bot traffic; JavaScript won't.
- Log data can be skewed by caching; JavaScript data isn't.
and so on.
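To make the bot-traffic point concrete, here's a small sketch of how a log analyzer can flag crawler hits by user-agent string. The signature list is illustrative only, not a complete catalog of crawlers:

```python
# Illustrative only: a few crawler user-agent substrings of the era.
BOT_SIGNATURES = ("Googlebot", "Slurp", "msnbot", "Yandex")

def is_bot(user_agent):
    """Crude check: does the user-agent mention a known crawler?"""
    return any(sig.lower() in user_agent.lower() for sig in BOT_SIGNATURES)

hits = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)",
]
# Bot requests appear in server logs, but bots don't execute JavaScript,
# so a JS-tagging tool like Google Analytics never sees them.
bot_hits = [ua for ua in hits if is_bot(ua)]
print(len(bot_hits))  # → 1
```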
The industry has moved towards JavaScript tracking in recent years as it's easier to implement and the companies can bill you for traffic each month rather than just once for a log analysis package. It's a better business model for them.
Ideally, you'd combine both methods, as log packages like NetTracker and pre-Google Urchin do. However, Google Analytics offers a lot for free, and you'd have to have fairly deep pockets to match its features with a log analysis tool.
So ideally I'd have both: Google Analytics and log files?
But on the other hand, I'd have to pay lots of money for log-analysis tools capable of doing what Google Analytics does?
Is it still advisable to look for a host that mentions they provide log files/server logs, even if I'll only be using them to augment the insights I get from Google Analytics?
The hosting providers I have in mind list ...alyzer (not allowed to mention the brand, I guess) and subdomain stats among their features. Are those "log files"?
thanks!
If you need to monitor things that Analytics won't show, such as the thing jdMorgan mentions, the bundled log analysis app from your host will be fine. I'm presuming this is Webalizer?
You'll only need access to the raw log files if you want to analyse them for data that neither Google Analytics nor Webalizer will give you. The chances of this are slim.
Subdomain stats will just be a separate Webalizer report for a subdomain such as:
yoursubdomain.yoursite.com
and aren't anything you need to worry about unless you have a pressing need to use subdomains.
A word of warning: If you do end up checking your traffic stats in Webalizer, the values will differ wildly from those reported by Google Analytics. Believe the Analytics data. It'll be closer to reality than an unconfigured install of Webalizer.
I've read a bit about it in the meantime and have realized that I can use Google Analytics for my main purposes and use my log file data to check for the search engine robots. Oh, and yes, it was Webalizer (Webalyzer?).
What is going on with the "raw log files", though? I assume every server has them (and thus every hosting provider), but whether I have access to them or not is another story?
I assume most affordable shared hosting providers don't provide raw log files?
I also assume that as a beginner I don't have to worry overly much about a DoS attack or the other issues you mentioned, Jim (unless I'm very unlucky)?
thanks
There are plenty of affordable web hosts that give you access to the server access, user-agent, and error log files -- just make it a requirement when shopping. There are also plenty of hosts that don't provide access to this info.
When a site on such a host has a technical problem or is abused, sometimes the only answer is "change hosts."
Also, once you and your site become more sophisticated and you need to implement some of the more technical aspects of running a site (such as search-engine-friendly URLs), you'll find that cheap hosting is the most expensive mistake you can make: it limits your options and makes everything difficult. Prices have dropped a lot over the years, but I used to say that any site that isn't worth $14.95 a month to host probably isn't worth spending your time on either. It takes hundreds of hours to develop a site, and you'd have to count your time as nearly worthless for $14.95 a month to sound expensive by comparison.
But then, I'm a bit opinionated... :)
Jim
However you do it, you will never get the "correct" number of visits. It just cannot be got.
However if you stick to one method, Google if you will, then what you will look at on a daily basis is variations in visits counted by that method. For most people that is sufficient to give trends, source of traffic, etc.
Personally, I analyse raw log data, but use an instant method if I want a figure "right now". In the end you get what you pay for; whether you need to pay more depends on you and how you use the info. But whatever you do, never think you will get an absolute figure for visits.
To enable log archiving, go to:
cPanel > Raw Log Manager
Check "Archive Logs..."
Uncheck "Remove the previous month's archived logs..."
Click Save
I always recommend doing this. You can ignore or delete accumulated logs you don't want, but you can't reconstruct deleted ones.
Only the raw logs can help you study or track down exploit attempts, successful or unsuccessful.
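Once archiving is enabled, hosts typically deliver the old logs as gzip-compressed text files. A minimal sketch for reading one in Python (the filename in the usage comment is hypothetical; actual naming varies by host):

```python
import gzip

def read_archived_log(path):
    """Yield one decoded line at a time from a gzipped access log."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        for line in f:
            yield line.rstrip("\n")

# Usage (hypothetical filename):
# for line in read_archived_log("example.com-Oct-2007.gz"):
#     print(line)
```

Streaming line by line like this keeps memory use flat even when a month's log runs to hundreds of megabytes.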
[edited by: SteveWh at 2:32 pm (utc) on Oct. 29, 2007]
"that give you access to the server access, user-agent, and error log files -- just make it a requirement when shopping."
I'm not trying to be nit-picky, just trying to understand the features/expressions I have to look out for: did you mean to say "access to the server, ..." or did you really mean to say "access to the server access, ..."?
If I'm trying to find a good hosting deal that offers these things, will hosts usually list them as features? Or do many hosts not mention them, so one would have to dig deeper by sending an e-mail and asking?
If it's usually listed as a feature, will they use these expressions, or could they use different expressions that mean the same things?
I assume the best idea would be to shoot every hosting provider I'm interested in an e-mail and ask whether they offer the things you mentioned (and also ask where this is mentioned on their website, to make sure they're serious about it)?
thanks again!
EDIT: As for "raw log files": will this usually be listed as a feature? If it is not listed as a feature but Webalizer is, can I expect them not to give me access to the raw logs? Send them an e-mail and ask?
[edited by: Makaveli2007 at 4:58 pm (utc) on Oct. 30, 2007]
The data in the raw logs show you which IP addresses accessed your site, which pages they got, date and time of the access, HTTP result code, the site they came from (referrer) if any, their operating system, and browser (user-agent). All those fields are part of the log data.
The access logs themselves are just text data files that you download.
Google Analytics is terrific for summarizing data, but it won't provide details about an individual IP address or a particular access of your site. To look at detailed data, you need the raw logs.
You can try to run the raw logs through a database or a program like Webalizer to generate the same kinds of summaries that G.A. provides, but it isn't easy.
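A miniature version of that summarizing step, to show the idea: count requests per HTTP status while skipping asset fetches (CSS, images), so the totals are page-oriented the way G.A.'s are. The asset-suffix list and sample lines are illustrative:

```python
from collections import Counter

# Illustrative asset suffixes; a real analyzer's list would be longer.
ASSET_SUFFIXES = (".css", ".gif", ".jpg", ".png", ".js")

def summarize(lines):
    """Count combined-format log lines per status code, pages only."""
    counts = Counter()
    for line in lines:
        parts = line.split('"')
        if len(parts) < 3:
            continue                     # not a combined-format line
        request = parts[1]               # e.g. 'GET /style.css HTTP/1.1'
        fields = request.split()
        path = fields[1] if len(fields) > 1 else ""
        status = parts[2].split()[0]     # e.g. '200'
        if path.lower().endswith(ASSET_SUFFIXES):
            continue                     # asset fetch, not a page view
        counts[status] += 1
    return counts

LINES = [
    '208.0.97.219 - - [28/Oct/2007:14:20:35 -0400] "GET / HTTP/1.1" '
    '200 32443 "-" "Mozilla/4.0"',
    '208.0.97.219 - - [28/Oct/2007:14:20:35 -0400] "GET /style.css HTTP/1.1" '
    '200 2070 "-" "Mozilla/4.0"',
]
summary = summarize(LINES)
print(summary)  # only the page request counts; the CSS fetch is skipped
```

Getting from here to G.A.-style sessions, referrer breakdowns, and visitor paths is where the real work (and the commercial packages) come in.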
G.A. uses JavaScript tagging to obtain its data. If a user has JS turned off, G.A. won't record any data for that visit. G.A. also records only pages retrieved; the raw logs show all files retrieved, including all the pieces of pages such as CSS files, GIFs, etc. So the results from your raw logs and G.A. won't agree. That's why it's helpful to have both.
Importantly, the raw logs also show hack attempts on your server, which GA won't record.
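A simple sketch of how you might scan raw log lines for probe attempts. The patterns below are illustrative examples of strings that showed up in exploit scans of the era (e.g. Code Red/Nimda-style `cmd.exe` probes), not a complete or current signature set:

```python
# Illustrative probe patterns; a real scan would use a maintained list.
SUSPICIOUS = ("../", "/etc/passwd", "cmd.exe", "xmlrpc")

def flag_suspicious(lines):
    """Return the raw log lines whose request looks like an exploit probe."""
    return [line for line in lines
            if any(pattern in line for pattern in SUSPICIOUS)]

LOG = [
    '1.2.3.4 - - [x] "GET /scripts/..%255c../winnt/system32/cmd.exe HTTP/1.0" 404 -',
    '5.6.7.8 - - [x] "GET /index.html HTTP/1.1" 200 1234',
]
print(flag_suspicious(LOG))  # only the cmd.exe probe line is flagged
```

Note that most such probes draw a 404 or 403, so they never execute any page JavaScript and are completely invisible to G.A.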
[edited by: SteveWh at 6:48 pm (utc) on Oct. 30, 2007]