What exactly are server logs/log files?
I know it's a complete beginner's question, but I always thought I didn't have to watch out for anything when signing up with a hosting provider, as I could simply take the Google Analytics code, embed it into my HTML, and voilà, I'd have my traffic stats.
I must admit I sort of thought log files and Google Analytics data were the same thing.
Now, I realize that they're obviously not. But what I don't get: Why would one need Google Analytics (or other better web analytics tools) if one has the server log files?
Does Google Analytics go into more depth (segmentation of user behavior) than server log files do?
Does every normal shared host usually provide "log files" for me to see?
Server "Stats" and logs are saved by and on the server itself, with "stats" being the result of various processing methods to sort, organize, and graph the raw log data. This raw log data is collected on a per-HTTP request level, so each request for a page, image, CSS stylesheet etc. is logged separately like this:
208.***.97.219 - - [28/Oct/2007:14:20:35 -0400] "GET / HTTP/1.1" 200 32443 "http://www.example.com/page.html" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"
208.***.97.219 - - [28/Oct/2007:14:20:35 -0400] "GET /style.css HTTP/1.1" 200 2070 "http://www.example.com/" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"
208.***.97.219 - - [28/Oct/2007:14:20:35 -0400] "GET /images/icon.gif HTTP/1.1" 200 181 "http://www.example.com/" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"
and so on.
These different collections of information are useful for different things, and all of them are useful to many Webmasters. The Analytics data is useful for visitor tracking, the Stats are useful for visitor and search engine robot tracking and server utilization information, and the raw access and error logs are very useful when tracking down problems such as missing files or abusive accesses to your server -- just for some examples.
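The lines above are in Apache's "combined" log format. As a minimal sketch, here's how one such line could be split into its fields with Python's `re` module (the regex and the documentation-range IP in the sample line are illustrative, not taken from any real server):

```python
import re

# Apache "combined" log format: client IP, identd, user, [timestamp],
# "request", status code, response bytes, "referrer", "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

line = ('203.0.113.5 - - [28/Oct/2007:14:20:35 -0400] '
        '"GET /style.css HTTP/1.1" 200 2070 '
        '"http://www.example.com/" '
        '"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"')

match = LOG_PATTERN.match(line)
if match:
    fields = match.groupdict()
    print(fields["ip"], fields["status"], fields["request"])
```

Each named group corresponds to one of the fields you can see in the raw lines above, which is what stats packages like Webalizer are doing under the hood before they aggregate anything.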
Ideally, you'd combine both methods, as log packages like NetTracker and pre-Google Urchin do. However, Google Analytics offers a lot for free, and you'd have to have fairly deep pockets to match its features with a log analysis tool.
Thanks for the help.
So ideally I'd have both: Google Analytics and log files?
But on the other hand, I'd have to pay a lot of money for log tools that are capable of doing what Google Analytics does?
Is it still advisable to look for a host that mentions that they provide log files/server logs, even if I'll only be using them to augment the insights I get from Google Analytics?
The hosting provider I have in mind lists "...alyzer" (not allowed to mention the brand, I guess) and subdomain stats among its features. Are those "log files"?
No, I probably wasn't clear enough: A single app that uses both methods. Personally, I'd stick with Google Analytics for your traffic reporting if you're happy with Google having your data and it does everything you need it to do.
If you need to monitor things that Analytics won't show, such as the thing jdMorgan mentions, the bundled log analysis app from your host will be fine. I'm presuming this is Webalizer?
You'll only need access to the raw log files if you want to analyse them for data that neither Google Analytics nor Webalizer will give you. The chances of this are slim.
Subdomain stats will just be a separate Webalizer report if you have a subdomain, and isn't anything you need to worry about unless you have a pressing need to use subdomains.
A word of warning: If you do end up checking your traffic stats in Webalizer, the values will differ wildly from those reported by Google Analytics. Believe the Analytics data. It'll be closer to reality than an unconfigured install of Webalizer.
"You'll only need access to the raw log files if you want to analyse them for data that neither Google Analytics nor Webalizer will give you. The chances of this are slim."
until your site is scraped, hijacked, or subjected to a DoS attack...
Thanks again for the replies, jetboy and jdMorgan.
I've read a bit about it in the meantime and have realized that I can use Google Analytics for my main purposes and use my log file data to check for the search engine robots. Oh, and yes, it was Webalizer.
What is going on with the "raw log files", though? I assume every server has them too (and thus every hosting provider), but whether I have access to them or not is another story?
I assume most affordable shared hosting providers don't provide raw log files?
I also assume that as a beginner, I don't have to be overly worried about a DoS attack or the other issues you mentioned, Jim (unless I'm very unlucky)?
If you're very lucky, you'll only get your site scraped once per day, instead of constantly. Your very first 'visitors' are likely to be site scrapers, e-mail address harvesters, or copyright 'bots -- Check your raw server logs and see... ;)
There are plenty of affordable web hosts that give you access to the server access, user-agent, and error log files -- just make it a requirement when shopping. There are also plenty of hosts that don't provide access to this info.
When a site on such a host has a technical problem or is abused, sometimes the only answer is "change hosts."
Also, once you and your site become more sophisticated and you need to implement some of the more technical aspects of running a site (such as using search-engine-friendly URLs), you'll find that cheap hosting is the most expensive mistake you can make, as it limits your options and makes everything difficult. Prices have dropped a lot over the years, but I used to say that any site that isn't worth $14.95 a month to host probably isn't worth spending any of your time on either: it takes hundreds of hours to develop a site, and you'd have to value your time as nearly worthless for $14.95 a month to seem expensive by comparison.
But then, I'm a bit opinionated... :)
As has been said, there are both plus and minus points for however you analyse your web visitors.
However you do it, you will never get the "correct" number of visits. It just cannot be got.
However if you stick to one method, Google if you will, then what you will look at on a daily basis is variations in visits counted by that method. For most people that is sufficient to give trends, source of traffic, etc.
Personally, I analyse raw log data, but use an instant method if I want a figure "right now". In the end you get what you pay for; whether you need to pay more depends on you and how you use the info. But whatever you do, never think you will get an absolute figure for visits.
If your server is Apache with cPanel, you might have to "enable" log archiving in order for them to accumulate. Otherwise the raw logs only accumulate for about 1 day until the stats program is run on them; then the raw logs are deleted.
To enable log archiving, go to:
cPanel > Raw Log Manager
Check "Archive Logs..."
Uncheck "Remove the previous month's archived logs..."
I always recommend doing this. You can ignore or delete accumulated logs you don't want, but you can't reconstruct deleted ones.
Only the raw logs can help you study or track down exploit attempts, successful or unsuccessful.
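As a rough illustration of that point, here's a sketch of scanning raw access log lines for requests that look like common exploit probes. The pattern list and sample lines are purely illustrative assumptions (real probe signatures vary and change constantly), not a complete or authoritative filter:

```python
# Substrings that often appear in probe/exploit requests.
# Illustrative only -- a real watchlist would be longer and maintained.
SUSPICIOUS = ("/etc/passwd", "wp-login.php", "phpmyadmin", "cmd.exe", "../")

def flag_suspicious(log_lines):
    """Return the log lines containing any known probe substring."""
    hits = []
    for line in log_lines:
        lower = line.lower()
        if any(pattern in lower for pattern in SUSPICIOUS):
            hits.append(line)
    return hits

sample = [
    '203.0.113.5 - - [29/Oct/2007:02:10:00 -0400] '
    '"GET /index.html HTTP/1.1" 200 512 "-" "Mozilla/4.0"',
    '198.51.100.7 - - [29/Oct/2007:02:11:30 -0400] '
    '"GET /scripts/..%2f../cmd.exe HTTP/1.0" 404 0 "-" "-"',
]
flagged = flag_suspicious(sample)
print(flagged)
```

Stats packages aggregate this detail away; only the raw lines let you see exactly which IP asked for exactly which URL.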
[edited by: SteveWh at 2:32 pm (utc) on Oct. 29, 2007]
I'm sorry for waiting so long to continue my questions (I spent most of the last two days at college and on public transportation getting there and back).
"that give you access to the server access, user-agent, and error log files -- just make it a requirement when shopping."
I'm not trying to be nit-picky, but just trying to understand the features/expressions I have to look out for: did you mean to say "access to the server, ..." or did you really mean to say "access to the server access, ..."?
If I'm trying to find a good hosting deal that offers these things, will they usually be listed as features? Or do many hosts not mention them, so one would have to dig deeper by sending an e-mail and asking?
If it's usually listed as a feature, will they use these expressions, or could they also be using different expressions that mean the same things?
I assume the best idea would be to shoot every hosting provider I'm interested in an e-mail and ask them if they offer the things you mentioned (and also ask where this is mentioned on their website, to make sure they're serious about it)?
EDIT: As for "raw log files" -- will this usually be listed as a feature? If it is not listed as a feature but Webalizer is, can I expect them not to give me access to the raw logs? Send them an e-mail and ask?
[edited by: Makaveli2007 at 4:58 pm (utc) on Oct. 30, 2007]
Hosts will usually list the feature as some variation of "raw access logs", "raw log access", etc. They might also provide Webalizer, but that doesn't imply whether or not you also can get the raw logs.
The data in the raw logs show you which IP addresses accessed your site, which pages they got, date and time of the access, HTTP result code, the site they came from (referrer) if any, their operating system, and browser (user-agent). All those fields are part of the log data.
The access logs themselves are just text data files that you download.
Google Analytics is terrific for summarizing data, but it won't provide details about an individual IP address or a particular access of your site. To look at detailed data, you need the raw logs.
You can try to run the raw logs through a database or a program like Webalizer to generate the same kinds of summaries that G.A. provides, but it isn't easy.
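For simple questions, though, a small script can get you partway there. As a minimal sketch (sample lines are invented, using documentation-range IPs), here's a per-IP request count over raw combined-format lines, which is the kind of summary a stats package builds up from:

```python
from collections import Counter

def hits_per_ip(log_lines):
    """Count requests per client IP. In the combined log format,
    the IP is the first whitespace-separated field of each line."""
    return Counter(line.split()[0] for line in log_lines if line.strip())

sample = [
    '203.0.113.5 - - [30/Oct/2007:10:00:00 -0400] '
    '"GET / HTTP/1.1" 200 1024 "-" "Mozilla/4.0"',
    '203.0.113.5 - - [30/Oct/2007:10:00:01 -0400] '
    '"GET /style.css HTTP/1.1" 200 2070 "-" "Mozilla/4.0"',
    '198.51.100.7 - - [30/Oct/2007:10:05:00 -0400] '
    '"GET / HTTP/1.1" 200 1024 "-" "Googlebot/2.1"',
]
counts = hits_per_ip(sample)
print(counts.most_common())
```

Swapping the key from the IP to the requested URL or the user-agent gives you top-pages or top-robots reports the same way.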
Importantly, the raw logs also show hack attempts on your server, which GA won't record.
[edited by: SteveWh at 6:48 pm (utc) on Oct. 30, 2007]