Let's say you want to develop your own web statistics tool that runs on one web server and tracks all your other sites through a snippet of tracking code. What would be your preferred approach to correctly defining and tracking Unique Visitors and Page Impressions?
Unique Visitors
You want to deal with proxy servers and browser caches. A possible approach: count a new Unique Visitor only if at least one of the following values
REMOTE_ADDR (IP address of the remote client or of the proxy server)
HTTP_X_FORWARDED_FOR (IP address of the client behind the proxy server, when the proxy passes it on)
HTTP_USER_AGENT (OS and browser)
differs from those of every visitor already counted.
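As a rough sketch of that rule, assuming the tracking endpoint receives a CGI/WSGI-style environ dictionary: the three values are combined into one key, and a visitor counts as new only when the key has not been seen before. The names visitor_key and UniqueCounter are made up for illustration, and the in-memory set stands in for whatever storage you would really use.

```python
import hashlib


def visitor_key(environ):
    """Combine the three request values into one key.

    A change in any of the three values produces a different key.
    """
    parts = (
        environ.get("REMOTE_ADDR", ""),
        environ.get("HTTP_X_FORWARDED_FOR", ""),
        environ.get("HTTP_USER_AGENT", ""),
    )
    # Join with a NUL byte so that neighbouring fields cannot run together.
    return hashlib.sha256("\x00".join(parts).encode("utf-8")).hexdigest()


class UniqueCounter:
    """Hypothetical in-memory store of visitor keys already counted."""

    def __init__(self):
        self.seen = set()

    def record(self, environ):
        """Return True if this request counts as a new Unique Visitor."""
        key = visitor_key(environ)
        if key in self.seen:
            return False
        self.seen.add(key)
        return True
```

Two users behind the same proxy with the same browser still share REMOTE_ADDR and HTTP_USER_AGENT, so they are only told apart when the proxy sends HTTP_X_FORWARDED_FOR.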
You might also want the tracking code to change on every request, for example by appending a timestamp or a random string to the tracking URL. This keeps proxy servers and browser caches from answering the request from their cache, so tracking becomes more precise when proxy servers are involved.
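A minimal sketch of such a per-request variation, assuming the tracking code requests a URL on the statistics server. The function name tracking_url, the parameter names site, t and r, and the example host are all invented here.

```python
import secrets
import time
from urllib.parse import urlencode


def tracking_url(base_url, site_id):
    """Return a tracking URL that is different for every request."""
    params = {
        "site": site_id,
        "t": int(time.time() * 1000),  # millisecond timestamp
        "r": secrets.token_hex(8),     # random part, in case two requests share a timestamp
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint on the statistics server
print(tracking_url("https://stats.example.com/track.gif", "site-a"))
```

Because no two URLs are identical, a cache has nothing stored under that URL and has to forward the request.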
These measures should help keep the solution from counting fewer UVs than there actually are.
Now, how do you avoid counting more UVs than there really are?
First, there is the spider issue. One solution is to keep a list of blocked user agents and IPs, so that a request coming from any of them is not counted.
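Such a block list could look roughly like this. The entries are placeholders (192.0.2.0/24 is a documentation range, not a real crawler network), and is_blocked is a hypothetical helper name; a real list would have to be maintained over time.

```python
import ipaddress

# Placeholder entries, assumed for illustration only
BLOCKED_AGENT_SUBSTRINGS = ("googlebot", "bingbot", "crawler", "spider")
BLOCKED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def is_blocked(remote_addr, user_agent):
    """Return True if the request should not be counted at all."""
    agent = (user_agent or "").lower()
    if any(fragment in agent for fragment in BLOCKED_AGENT_SUBSTRINGS):
        return True
    try:
        ip = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Unparseable address: do not block on IP grounds
        return False
    return any(ip in network for network in BLOCKED_NETWORKS)
```

The check would run before any Unique Visitor or Page Impression logic, so blocked requests never reach the counters.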
Then there is the problem of interrupted dial-in connections and IP changes during a session. I was told that some providers, like AOL(?), tend to change IPs even within a single session.
For both problems, I could only think of cookies as a solution. However, my concern is that these cookies might be treated as third-party cookies and therefore be blocked by the browser.
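To make the cookie idea concrete, here is a small sketch using Python's standard http.cookies module. The cookie name visitor_id, the one-year lifetime and the function name are assumptions. It reuses an existing ID if the browser sent one, otherwise it creates a new ID and returns the value for a Set-Cookie header.

```python
import uuid
from http.cookies import SimpleCookie

COOKIE_NAME = "visitor_id"  # hypothetical cookie name


def get_or_assign_visitor_id(cookie_header):
    """Return (visitor_id, set_cookie_value).

    set_cookie_value is None when the browser already sent the cookie.
    """
    incoming = SimpleCookie()
    incoming.load(cookie_header or "")
    if COOKIE_NAME in incoming:
        return incoming[COOKIE_NAME].value, None

    visitor_id = uuid.uuid4().hex
    outgoing = SimpleCookie()
    outgoing[COOKIE_NAME] = visitor_id
    outgoing[COOKIE_NAME]["path"] = "/"
    outgoing[COOKIE_NAME]["max-age"] = 60 * 60 * 24 * 365  # one year, assumed
    return visitor_id, outgoing[COOKIE_NAME].OutputString()
```

This does nothing about the third-party concern: if the browser refuses the cookie, every request arrives without it and looks like a new visitor, so the IP and user agent comparison would still be needed as a fallback.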
Page Impressions
Here the issue I can think of is that a new Page Impression should not be counted if the user simply refreshes the page within a certain time period. Something like 10 seconds might be appropriate.
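The refresh rule might be sketched as follows, with the 10 seconds from above as the default window. ImpressionCounter is an invented name, the visitor argument can be any identifier (a cookie ID or a key built from IP and user agent), and the dictionary is again only a stand-in for real storage.

```python
import time

REFRESH_WINDOW_SECONDS = 10


class ImpressionCounter:
    """Counts Page Impressions, ignoring quick repeats of the same page."""

    def __init__(self, window=REFRESH_WINDOW_SECONDS, clock=time.monotonic):
        self.window = window
        self.clock = clock      # injectable so the logic can be tested
        self.last_seen = {}     # (visitor, page) -> time of the latest request
        self.count = 0

    def record(self, visitor, page):
        """Return True if the request counts as a new Page Impression."""
        now = self.clock()
        key = (visitor, page)
        previous = self.last_seen.get(key)
        # Remember every request, counted or not
        self.last_seen[key] = now
        if previous is not None and now - previous < self.window:
            return False
        self.count += 1
        return True
```

Since the timestamp is updated on every request, someone who keeps reloading every few seconds is counted only once. Updating it only on counted requests would instead allow one impression per window.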
Now do you think this would roughly do the job? I'd appreciate your comments.