Hits: Number of requests for files on your server made by a browser, including images, Flash and so on.
Page views: Number of actual web pages requested by the browser, i.e. html (or asp, php,...) files.
In our case we leave out files such as js, css, cgi, pl etc. They are not really a full page and can give you a false impression of what people are actually reading.
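To make the difference between hits and page views concrete, here is a rough Python sketch. It only assumes the definitions given above; the extension list and the function names are made up for illustration and would need adjusting for your own site.

```python
import posixpath
from urllib.parse import urlsplit

# Assumed list of extensions that count as a "page"; adjust for your own site.
PAGE_EXTENSIONS = {".html", ".htm", ".asp", ".php"}


def is_page_view(request_path):
    """Return True if the requested path looks like a full web page."""
    path = urlsplit(request_path).path  # drop any ?query=string
    if path == "" or path.endswith("/"):
        return True  # directory request, normally answered by an index page
    extension = posixpath.splitext(path)[1].lower()
    return extension in PAGE_EXTENSIONS


def count_hits_and_page_views(request_paths):
    """Every request is a hit; only page-like requests are page views."""
    hits = 0
    page_views = 0
    for request_path in request_paths:
        hits += 1
        if is_page_view(request_path):
            page_views += 1
    return hits, page_views


sample = ["/index.html", "/logo.gif", "/style.css", "/menu.js", "/news.php?id=3", "/"]
print(count_hits_and_page_views(sample))  # prints (6, 3)
```

Six requests give six hits but only three page views, which is why hits always look more impressive than they are.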
Unique visitors are different; sometimes called unique servers, they will be lower, perhaps much lower, than your "page hits". So I hope we have helped to answer your question, teeceo.
Unique server visits are your best estimate of the number of visitors, though still largely imperfect. Some log analysers get smart and count visits from the same server that are a particular time apart as 2 users. To my mind the data is so imperfect anyway that introducing a further avenue for error is fairly useless.
1) Identify all users by combining the client IP address or domain name with the user agent, e.g.
123.123.123.123 msie5 => 1 user
123.123.123.123 msie4 => another user
2) Deduct all likely non-human traffic, including search engine spiders, email harvesters, Code Red-like viruses etc. This can be done by looking at user agent strings, known IP addresses and the pattern of activity on the site. Such traffic can account for up to 25% of a site's traffic in our experience.
3) Identify individual sessions as clusters of activity on the site, separated by a gap of at least 30 min between transactions.
Not perfect, and quite difficult unless you have the appropriate software or service, but it is the only way to get the closest numbers from the log files.
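For anyone who wants to try the three steps by hand, they can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not the system we use: the robot markers are a made-up sample list, the records are assumed to be already parsed into (ip, user agent, timestamp) tuples, and robots are detected from the user agent string alone, ignoring known IP addresses and activity patterns.

```python
from datetime import datetime, timedelta

# Assumed sample markers only; a real list of robots is far longer.
BOT_MARKERS = ("bot", "spider", "crawler", "slurp")
SESSION_GAP = timedelta(minutes=30)


def is_probable_robot(user_agent):
    """Step 2 (simplified): flag non-human traffic by user agent string."""
    lowered = user_agent.lower()
    return any(marker in lowered for marker in BOT_MARKERS)


def estimate_users_and_sessions(records):
    """records: iterable of (ip, user_agent, timestamp) tuples."""
    last_seen = {}  # (ip, user_agent) -> time of that user's latest request
    sessions = 0
    for ip, user_agent, timestamp in sorted(records, key=lambda r: r[2]):
        if is_probable_robot(user_agent):
            continue
        user = (ip, user_agent)  # step 1: IP plus user agent is one user
        previous = last_seen.get(user)
        # Step 3: a gap of at least 30 minutes starts a new session.
        if previous is None or timestamp - previous >= SESSION_GAP:
            sessions += 1
        last_seen[user] = timestamp
    return len(last_seen), sessions


records = [
    ("123.123.123.123", "msie5", datetime(2002, 10, 10, 10, 0)),
    ("123.123.123.123", "msie4", datetime(2002, 10, 10, 10, 5)),
    ("123.123.123.123", "msie5", datetime(2002, 10, 10, 10, 10)),
    ("123.123.123.123", "msie5", datetime(2002, 10, 10, 11, 0)),
    ("10.0.0.1", "Googlebot/2.1", datetime(2002, 10, 10, 10, 1)),
]
print(estimate_users_and_sessions(records))  # prints (2, 3)
```

The sample gives 2 users and 3 sessions: the msie5 user comes back after a 50 minute gap, and the spider is dropped before counting.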
Basically the raw data is the problem (the raw logs) and no amount of sophisticated analysis of that flawed raw data can solve that problem. It just makes the data look more credible than it really is. Proxies and caching are among the major problems in determining unique visitors.
Our company has done extensive studies on the use of log file data to determine site performance against marketing goals. We have found that log file data is the most accessible data source to support monitoring changes in site activities and their relationships to other factors such as site promotion and usability analysis. We are however very careful to ensure that we present the data as estimates only, from which only relative comparisons can be drawn.
In this way our methodology is similar to surveys conducted by professional surveying companies, where one company's result, such as "40% of people think this", conflicts with another company's, which says 60% think this. The methodologies differ, so apples cannot be compared with apples. However, when tracking results with the same methodology over time, and accepting that these are only estimates and not absolutes, a large amount of value can be drawn from the log files. Not least of which is which sites are sending the traffic and what keywords are being used in which search engines.
The reason that I mention cost is that the features required to filter the data set generally do not exist in the free systems or the lower end of the market and are only in tools with excessive $$$ price tags, no other reason. We ended up developing our own internal system to get around some of the problems with the tools on the market.
I know some "free" analysis software does not allow much flexibility or many options. However, some of the free stuff (such as analog) provides far more flexibility than Webtrends etc.
I find that playing around with a log file by sorting, filtering and searching in a spreadsheet is a good way to get a feel for the nature of the raw data.
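If your spreadsheet will not open the raw log directly, a small script can turn it into CSV first. Below is a rough Python sketch, assuming your server writes the common or combined log format; the column names and the function name are my own choices, and lines that do not match are simply skipped.

```python
import csv
import io
import re

# Assumes common/combined log format; the referrer and agent fields are optional.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)
COLUMNS = ["host", "time", "request", "status", "size", "referrer", "agent"]


def log_lines_to_csv(lines, out):
    """Write one CSV row per parsable log line; return the number of rows."""
    writer = csv.writer(out)
    writer.writerow(COLUMNS)
    written = 0
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip anything that is not a recognisable log line
        writer.writerow([match.group(name) or "" for name in COLUMNS])
        written += 1
    return written


sample_lines = [
    '123.123.123.123 - - [10/Oct/2002:13:55:36 -0700] '
    '"GET /index.html HTTP/1.0" 200 2326 '
    '"http://www.example.com/" "Mozilla/4.0 (compatible; MSIE 5.0)"',
    "not a log line",
]
buffer = io.StringIO()
print(log_lines_to_csv(sample_lines, buffer))  # prints 1
print(buffer.getvalue())
```

For a real log, pass an open log file as `lines` and a file opened with `newline=""` as `out`, then open the result in the spreadsheet and sort by host or agent.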