Thanks for everyone's input. Lots to think about here.
99.9% is not vital... BUT... if non-cookie traffic is ignored altogether when measuring trends, then this is probably better than anything else, and is at least complete in its own right. However, my concern is that the percentage of people rejecting cookies is on the increase - from privacy software shipped with Norton Antivirus, from PDAs, or (potentially) from future IE products requiring opt-in to cookies rather than opt-out. In November, 96% of visitors had cookie support; so far in December it is only 95.1%. So the trend is not looking favourable for cookies on their own long term.
For this reason, I think it is important to look at trying to measure those that do NOT have cookies, and that is where the 4.9% can start to cause havoc with the stats... are they really 4.9% of my traffic? Or are they less? I do not know, because I am currently using IP numbers to measure them, which - as those above have shown - has its own errors.
Session variables seem the next logical step - they at least track a visitor for the duration of the session, if not across return visits. But session variables bring a new set of problems with search engines, and when (if) people link to interesting content they may well copy the whole link, session variable included - another possible source of error.
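To make the shared-link problem concrete, here is a toy sketch (all names and the `sid` parameter are my own assumptions, not anything from a particular framework) of a session ID carried in the URL:

```python
# Toy sketch of a URL-carried session variable, and why it mis-counts.
import secrets
from urllib.parse import urlencode

def start_session_url(base_url):
    """Mint a fresh session ID for a first-time visitor and embed it in the URL."""
    sid = secrets.token_hex(8)  # 16 hex chars, new per "first visit"
    return f"{base_url}?{urlencode({'sid': sid})}"

link = start_session_url("http://example.com/article")
# If this exact link gets posted on a forum or indexed by a search engine,
# every subsequent click carries the SAME sid, so many distinct readers
# are counted as one visitor - the opposite of the IP-sharing error.
```

The point being that the error cuts the other way from IP counting: IPs lump real visitors together by accident of network, while shared session URLs lump them together by accident of copy-and-paste.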
Is there any way to use a computer's MAC address? That would do the job just peachy.
Assuming there isn't, my logic for the way we want to move forward is beginning to look like this:
1) If a user accepts cookies, then fine - record them as such.
2) If they don't, then create an ID based on all the variables that they DO pass to us (though I am thinking of using an IP range, such as the first few blocks of the number, rather than the complete IP number), combined with the difference between their system clock and the server's system clock.
3) Use that ID to identify a returning visitor (as long as they haven't changed their screen resolution etc.).
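The fallback-ID step above could be sketched roughly as follows - a minimal illustration, assuming the server already has the IP, user agent, screen resolution and a client-reported clock reading to hand; the function and field names are hypothetical:

```python
# Minimal sketch of a cookieless fallback visitor ID (assumed inputs).
import hashlib

def fallback_visitor_id(ip, user_agent, screen_res, client_clock, server_clock):
    """Build a fuzzy ID for a visitor who rejects cookies.

    Uses only the first two blocks of the IP (a whole range changes less
    often than an individual dial-up address), plus the client/server
    clock offset rounded to the nearest minute to absorb network delay.
    """
    ip_prefix = ".".join(ip.split(".")[:2])                  # e.g. "192.168"
    clock_skew = round((client_clock - server_clock) / 60)   # offset in minutes
    raw = "|".join([ip_prefix, user_agent, screen_res, str(clock_skew)])
    return hashlib.md5(raw.encode()).hexdigest()

# Two requests from the same machine produce the same ID...
a = fallback_visitor_id("192.168.34.7", "Mozilla/4.0 (MSIE 6.0)", "1024x768",
                        1_070_000_123, 1_070_000_000)
b = fallback_visitor_id("192.168.90.1", "Mozilla/4.0 (MSIE 6.0)", "1024x768",
                        1_070_000_523, 1_070_000_400)
# ...even though the last IP blocks and the absolute timestamps differ,
# because the prefix, browser details and ~2-minute skew all match.
```

Rounding the clock offset is the key trade-off: round too finely and the same machine splits into several IDs; round too coarsely and different machines at the same ISP collide, which is exactly the failure mode described below.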
Of course, this will break down when a site becomes so busy that many people are using it at the same time with similar clock times, from the same ISP, and with identical browser settings - which is why I am trying to work out more ways to make individuals "unique". But by this stage we are starting - I hope - to really squeeze the error margin.
Interesting that someone did their thesis on this. Care to share?