For a rough estimate, I multiply a site's reach (per million) by 500, guessing that there are 500M internet users. But sometimes this leads to both under- and over-estimates of a site's actual traffic.
I understand that this method will never produce exact numbers due to sample bias and redirects, but does anyone have any advice about the underlying idea and the choice of 500?
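A minimal sketch of this back-of-the-envelope method, assuming the 500M-user guess above (the reach value in the example is made up for illustration, not real Alexa data):

```python
# Rough visitor estimate from Alexa "reach per million users".
# Assumption: roughly 500 million internet users, per the post above.
INTERNET_USERS_MILLIONS = 500

def estimate_visitors(reach_per_million):
    """Reach is 'visitors per million internet users', so total
    visitors ~= reach * (number of internet users in millions)."""
    return reach_per_million * INTERNET_USERS_MILLIONS

# Example: a hypothetical site with a reach of 120 per million
print(estimate_visitors(120))  # -> 60000
```

The whole estimate scales linearly with the 500 figure, which is why the choice of multiplier matters so much.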
While an interesting idea, I don't think the theory holds water.
While that's particularly true for low-traffic sites, even high-traffic sites can be affected. I'd posit that there's a feedback loop at WebmasterWorld - as discussions about the Alexa toolbar are posted, more visitors install it, and the site rises in popularity compared to sites whose toolbar usage percentage has remained constant. (For the heck of it, I put some "free toolbar" banners in rotation on another site to see if people would click on them and boost the site's Alexa popularity. It's not really clear what value that might create, although at some point a very good ranking might be genuinely valuable. Brett can claim WebmasterWorld is a top 400 site based on recent data - pretty cool bragging rights, and perhaps some real market value.)
I like the idea, though, Namezzz - I'd guess there's probably some multiplier that could provide an extremely crude estimate of traffic, emphasis on crude.
So Alexa's traffic estimate for websites catering to this type of user (e.g. WebmasterWorld, Slashdot) will always be inflated...
But for more consumer-oriented, mainstream sites, the estimate can come closer to the real numbers.
It would be nice to see someone come up with a ballpark equation for Alexa, but it would probably require a lot of us to throw some traffic numbers, Alexa numbers, and site types out there.
I also have a feeling that reach and rankings do not follow a linear relationship with Alexa, but this could only be figured out by someone with a stats background and a couple hundred data points from a wide variety of sites.
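One way that non-linearity could be checked, once members contribute data: fit rank against visitors in log-log space and look at the slope. This is only a sketch - the (rank, daily visitors) pairs below are invented placeholders, with the 200,000-rank point loosely echoing a figure mentioned later in the thread:

```python
# Does Alexa rank vs. daily visitors follow a power law?
# A power law is a straight line in log-log space; the slope tells
# you how fast traffic falls off with rank. Data below is hypothetical.
import math

data = [(200_000, 1_000), (20_000, 10_000), (2_000, 100_000)]

# Simple least-squares fit of log(visitors) = a + b * log(rank)
xs = [math.log(rank) for rank, _ in data]
ys = [math.log(visitors) for _, visitors in data]
n = len(data)
xbar, ybar = sum(xs) / n, sum(ys) / n
b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
    sum((x - xbar) ** 2 for x in xs)
a = ybar - b * xbar

def predicted_visitors(rank):
    return math.exp(a + b * math.log(rank))

print(round(b, 2))  # slope: -1.0 here; -1 would mean visitors ~ 1/rank
```

With a couple hundred real data points, a slope meaningfully different from -1 (or a poor fit overall) would confirm the hunch that the relationship isn't a simple linear or reciprocal one.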
Alexa Guy could always just tell us. Brett, when is he joining the site?
As I recall an Alexa number of 200,000 meant about 1,000 daily visitors.
This was discussed on a Jim World site about two months ago. Anyone know the site that had that graph?
But isn't there also a "Page Views per user" available?
So if the "Reach per million users" is 0.8 (stop laughing) and the "Page views per user" is 5.5, then... this is where I have a problem: what exactly does reach mean?
Is 0.8 a percentage - i.e. 0.8% of a million users, which is 8,000, so the page views are 8,000 x 5.5 = 44,000 per three months per million users on the net? Or is it 0.8 users out of the whole freakin' million, which would mean that my site gets 4.4 page views per million users?!
ARGH! I have edited this message like a dozen times...
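The two readings are easy to lay out side by side. (My understanding is that "per million" means 0.8 users out of each million, not 0.8%, but both interpretations are computed here so the difference is explicit; the 0.8 and 5.5 figures are just the numbers from the post above.)

```python
# Two readings of: reach "per million users" = 0.8, page views per user = 5.5
REACH = 0.8
PAGE_VIEWS_PER_USER = 5.5
USERS_PER_MILLION = 1_000_000

# Reading 1: 0.8 is a percentage of a million users
users_reading_1 = USERS_PER_MILLION * REACH / 100      # 8,000 users
views_reading_1 = users_reading_1 * PAGE_VIEWS_PER_USER  # 44,000 page views

# Reading 2: 0.8 literal users out of the whole million
users_reading_2 = REACH                                  # 0.8 users
views_reading_2 = users_reading_2 * PAGE_VIEWS_PER_USER  # 4.4 page views

print(views_reading_1, views_reading_2)
```

The two readings differ by a factor of 10,000, which is exactly why pinning down the definition matters before anyone multiplies by a user count.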
I don't know much about Alexa (so correct me if I've got the wrong idea):
The No. 1 site on Alexa is Yahoo, and Google is No. 5. I don't think webmaster types and more tech-savvy users would deliver Yahoo as No. 1 - would they? Yahoo sure isn't my No. 1 site.
The key to your approach (which I like) is finding out how many million internet users there are - then you can multiply the reach per million by the page views per user and by the number of internet users (in millions) to get an estimate of actual page views from Alexa data.
Anyone else have other thoughts on what this might be...
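Put together, the formula described above might look like this - a sketch only, since the internet-user count is the big unknown (500M is the guess used earlier in the thread, and the example inputs are hypothetical):

```python
# Page-view estimate from Alexa data:
#   reach per million * internet users (millions) = estimated visitors
#   visitors * page views per user = estimated page views
def estimate_page_views(reach_per_million, page_views_per_user,
                        internet_users_millions=500):
    visitors = reach_per_million * internet_users_millions
    return visitors * page_views_per_user

# Example: reach of 40 per million, 5.5 page views per user
print(estimate_page_views(40, 5.5))  # -> 110000.0
```

Any error in the user count passes straight through to the result, so the estimate is only as good as that one number.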
By the way, isn't that a clever use of the web?
Yes, the sillyjokes domain gives pause. But I think it's legit.
"I found the site displaying the graph of Alexa Rank versus Average Daily Visitors. It's at
It doesn't look like it's been updated since it first appeared. Maybe by bringing it back to attention, others will contribute their data too. (I hope so... this is neat.)"
It's certainly not perfect, but from the bit I've read at WebmasterWorld, the Alexa toolbar is possibly the best indicator of site popularity on the web, because they have a couple of million toolbar users (does anyone know the exact number?). That's better than any of the other official sources of web use.
Site Global Rank
1. sapo.pt 293
2. clix.pt 808
3. iol.pt 818
4. terravista.pt 1473
5. abola.pt 3016
6. publico.pt 3141
7. netcabo.pt 3192
8. cidadebcp.pt 3380
9. record.pt 3759
10. aeiou.pt 4011
11. mail.pt 4112
12. portugalmail.pt 4143
13. mytmn.pt 5008
14. ojogo.pt 5765
15. expresso.pt 6635
16. telepac.pt 8565
17. bes.pt 8883
18. vizzavi.pt 9212
19. oninet.pt 9611
Heck, you could even take a smaller country, with a bigger language barrier, like Hungary, and find a bunch of sites in the Top 10K:
1. freemail.hu 1423
2. origo.hu 2027
3. freeweb.hu 4226
4. index.hu 4468
5. lap.hu 5151
6. vnet.hu 8013
It is perfectly understandable that a web mail account could have higher page views per day than a portal page - because users may be checking their mail many times per day.
But it is also conceivable that it would be the other way around. Consider the Yahoo home page, which is likely to be loaded many times per day - users who have it set as their home page load it every time they close and reopen their web browser - this could occur 50 to 100 times per day per person!
A couple more thoughts...
Epistemology -- the branch of philosophy that investigates the limits of knowing. IOW, it asks the question: how do you know that you know what you know? This concept is at the root of all web tracking. All web stat, traffic, and tracking questions ultimately boil down to questions of epistemology.
Example of epistemology: you are sitting in a chair in a room as you read this. When you get up from the chair and turn your back on it, how do you know that the chair is still there? Perhaps it disappears when you are not looking at it and reappears when you are looking at it.
We take it on faith that the chair is there all the time, because our experience tells us that this is likely to be so -- in other words we assume it to be true, but we don't in fact actually know it to be true. In fact, there is no way to prove that the chair is there when no one is looking at it or touching it.
So it is with web traffic counts. You only know about it because you can "see" it in a report or log or some other manifestation.
We make many assumptions in doing so - we assume that the server is counting every hit, we assume that we are capturing the right information in the logs to distinguish between hits, page views and unique visitors. We assume that Analog, WebTrends, HitBox, or other tool is reading the logs correctly and not introducing errors of its own when analyzing the logs.
We assume that the reporting engine or graphing tool or OLAP tool is processing and displaying the data accurately and not introducing errors of its own - but all of these are basically proprietary black boxes and we will never have any way of knowing.
And perhaps the biggest assumption of all - that it is necessary to measure every page request coming into the server, and count the behavior of every visitor, as opposed to a representative or random sample of requests and visitors.
And all that is just considering our own site. When we want to compare one site to another, we are really comparing our assumptions to their assumptions.
From a logical analysis of assumptions, we arrive at skepticism - a questioning or disbelief that something is true, especially of those things that we assume are true but have not proved or cannot prove.
And we have seen posts from many here who are skeptical, for instance, of the idea that Alexa is a valid measurement of absolute web visitor counts, and therefore skeptical of the accuracy of its rankings.
At the heart of the skepticism is a question - how does Alexa know what it knows? Since we do not know the answer to this question, we are skeptical of Alexa's output.
Brett expressed skepticism about Nielsen ratings, questioning the accuracy of their sample sizes and their ability to accurately represent all web users.
But the fact is, as skeptical as we may be, when you add up all of these assumptions, there is no way you will ever have the time or resources to systematically track down and verify each one of them, not to mention resolving all of the issues that you find.
On the other hand, by learning to use the tracking tools and reports available to us, we begin, through experience, to feel comfortable with the assumptions built into them, even without knowing what all of those assumptions are. We accept the truth of the report even though that truth is a distortion of reality. Eventually, just as we trust that the chair is always there, we can trust the reports. Like riding a bike with crooked handlebars - we can learn to steer it.
But we don't necessarily trust the reports in an absolute sense. Rather we trust them in a relative sense by observing patterns - patterns of change over time, and relative rankings.
The entire subject of search engines and rankings of any kind is fraught with these issues. The reality is we will probably never know certain important methodologies, for example, search results ranking methodologies.
We should at some point, perhaps a few years down the road, be able to know what the popularity ranking methodologies are, and be able to compare, for instance, Nielsen's methods to Alexa's, but it will probably take some official independent research group or academic study to do it.
This will only be fair, because search engine methodologies were dreamed up by PhDs. Now we need some new PhDs to study what those PhDs have done and tell us how they did it.
The Alexa chart at the sillyjokes site looks pretty accurate for the stats of some small and large sites that I am familiar with. It correlates ranking with traffic to within 10-15% of the "actual" traffic counts that I have received.
As for Alexa's rankings: given the large number of assumptions each of us makes in trusting our own web reports, and the huge opportunity for compounded error (error x error) when comparing Web Site A's reports to Web Site B's, there are tremendous benefits to having a universal third-party measurement system for comparing visitor counts and page views across different web sites. By measuring all sites against the same criteria, many of the errors cancel out - a major advantage.
Even if their methodology is flawed, if they are applying it universally to all sites they are counting, we would be able to rely on it as an important relative measuring tool.
Hope Brett doesn't shoot me for putting up the URL - there was a thread somewhere where we were collecting data, but can't find it now :)