Forum Moderators: DixonJones
I have a feeling that weblogs are a good segment since they are topic/subject neutral. I would imagine that, for instance, computer-related industry websites would probably get more traffic than, say, food industry websites, because the Internet audience tends to be more computer savvy than, say, a TV audience. Also, the average number of visitors a day for banking industry websites would naturally be larger than for food industry websites since, to open a bank, you need substantial capital up front. In comparison, there are many food-related websites that are run by individuals. I think weblogs would represent a good mixture of all the segments. I would be very curious to hear different opinions on this matter.
The reason I created this analysis was that I was fascinated by the fact that no one seems to have a clear picture of how web traffic is distributed along the percentiles of all websites. For instance, if you get 50 visitors a day, what percentile are you in? Is your site above, below, or around the average?
When you hear that some sites like Instapundit.com are getting over 80,000 visitors a day, you think that your site, which is getting, say, 100 visitors a day, must be very low in the ranks. Well, it is not. According to my study, 100 visitors a day would place you around the top 35th percentile. So, you wonder, when does it jump from 100 to 80,000? You can see it on my graph. It happens around the top 1 percentile. Compared to what happens once you reach that top 1 percentile, any increase in visitors before that is minuscule. I conclude that this is how fame works. The vast majority of us are nobody. The difference between the top 2 percentile and the very bottom percentile is negligible compared to the popularity of the top 1 percentile.
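The "what percentile are you in?" question above can be answered mechanically once you have a list of daily visitor counts. Here is a minimal Python sketch. The function name `top_percent` and the ten sample figures are made up for illustration; they are not the data behind the graph.

```python
from bisect import bisect_left

def top_percent(daily_visitors, sample):
    """Percentage of sampled sites that get at least `daily_visitors` a day."""
    ordered = sorted(sample)
    # bisect_left finds the first position holding a value >= daily_visitors,
    # so every site from that position onward is at or above the given figure.
    at_or_above = len(ordered) - bisect_left(ordered, daily_visitors)
    return 100.0 * at_or_above / len(ordered)

# Hypothetical sample of ten sites (visitors a day), not real measurements.
sample = [3, 8, 15, 20, 40, 60, 100, 250, 900, 80000]

print(top_percent(100, sample))  # 40.0 -> a 100-visitor site is in the top 40% here
print(top_percent(50, sample))   # 50.0
```

With a real sample of a thousand or so sites, the same function would tell you where 50 or 100 visitors a day puts you.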
[edited by: heini at 5:01 pm (utc) on Oct. 9, 2003]
[edit reason] delinked, thanks. [/edit]
>> an online source for industry average figures for site statistics?
Apart from the Scandinavian countries, I have not seen publicly available listings measured consistently using a reliable method. Even there, these listings do not cover all sites, only the larger ones.
>> I did my own analysis of this by using Weblog stats
These graphs are interesting and confirm what I've been contemplating myself, although your graphs seem somewhat too extreme. It's not even an 80/20 rule, more like a 99/1 one according to your graphs. My guess would be more along the lines of 80/20, as that is more "normal" (as in the normal distribution in statistics, the Gauss bell).
This could in fact be a thing that's specific to the weblog segment, so it's possibly neutral but not necessarily representative.
One serious caveat: A sample of 1299 sites using sitemeter is not a reliable one for general trends. It's (a) far too small, and (b) skewed, as sitemeter is not used equally by large and small sites - you don't find many yahoos that use these public stats counters.
It would be interesting to run the same type of research using a query for "webalizer" type public logfile summaries.
/claus
Thank you for your welcome.
I actually think that the extremeness of the graph makes intuitive sense. Naturally the number of popular sites that you can recall in your head (such as Yahoo, Google, NYTimes, CNN, etc.) would be much greater than the number of unpopular sites you can recall. The fact that you can recall them is the very basis of them being "popular". Because of this, we psychologically feel that there are a lot of popular sites, but in comparison to unpopular sites, they are a tiny minority.
If the world contained only 10 people, everyone would be famous, because everyone in the world would know who you are. Increase that to 100; the same would probably still hold. But, once you go beyond 1,000, there would probably be people you have never heard of. I think there is a maximum number of people that can be famous, which is fixed, due to the way our brain and memory work. The number of famous people would probably not increase beyond 10,000. So, whether there are 100 thousand people in the world or 100 billion makes no difference to how many people can be famous, which means that the larger the population, the more extreme the graph of fame would be.
Even though it is difficult to imagine a world where there are only 100 people, we could imagine when the Web first started, when there were only 100 websites. All websites then probably got more even numbers of hits. The curve would have been quite flat. For instance, the top site might get 10 visitors a day, and the bottom site 5. As the number of websites increases, the curve becomes more extreme.
This makes intuitive sense. How often do we run into someone who is a household name? If you took a random sample of 10,000 people, the chances of one of them being a household name would still be quite slim, which means that, to be a household name, you have to be above the top 0.01 percent (1 in 10,000). In the same way, take a random sample of 10,000 websites. The chance of one of them being a site like Yahoo or Google is slim to none, because there are so many unpopular sites out there.
Since the number of websites with household recognition is limited, the larger the audience becomes, the larger the gap becomes between top sites and ordinary sites in terms of traffic.
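One way to test the "more sites, more extreme curve" idea is a toy model. The sketch below assumes that the site ranked r gets traffic proportional to 1/r (a Zipf-like assumption of mine, not something measured in this thread) and asks what share of all traffic the top 1% of sites would hold as the number of sites grows. The function name `top_share` is made up for illustration.

```python
def top_share(num_sites, top_percent=1):
    """Traffic share of the top `top_percent` of sites, ASSUMING the site
    ranked r gets traffic proportional to 1/r (a toy model, not real data)."""
    traffic = [1.0 / rank for rank in range(1, num_sites + 1)]
    # Integer arithmetic avoids float rounding; always keep at least one site.
    top_n = max(1, num_sites * top_percent // 100)
    return sum(traffic[:top_n]) / sum(traffic)

for n in (100, 10_000, 1_000_000):
    print(n, round(top_share(n), 2))
```

Under this assumption the top 1% share keeps rising as sites are added, which is the direction argued above. A different assumed model could give a different answer, so treat it as an illustration only.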
In order for this type of analysis to be accurate, the most critical thing is to keep the sample as random as possible. This is difficult to do. If you try to do this with any business-related websites, the more popular the site is, the less likely it would be that they would share their statistics. This is another beauty of Weblogs. Weblog owners usually do not care about making statistics public. It's not that critical for them.
I do not disagree that the graph has the right shape. I only think that this one - as based on blogs - is too steep.
The dramatic increase happens at top 1% and not at around top 20%. I think this is due to a skewed sample.
>> If you try to do this with any business-related websites, the more popular the site is,
>> the less likely it would be that they would share their statistics
As I mentioned in post #3, this is not the case for Scandinavia. I have just made a similar graph showing the 77 largest Danish commercial sites. You can see the raw figures here; they are published weekly by FDIM, the Association of Danish Internet Media [fdim.dk]. The first column containing numbers is unique visitors - similar figures are published in Sweden and Norway.
I cannot post the URL to the graph I made, as it is against the TOS of this site ("self promotion" - I reckon the one you posted will be deleted as well when the mods see it), but if anyone is interested they can sticky me for info. This graph shows that the increase starts already at top 30% and at top 20% it gets steeper.
/claus
[useit.com...]
and
[useit.com...]
and
[kottke.org...]
HTH,
Midwestguy
The curves presented on these pages seem to conform to mine. kottke.org's graph, for instance, looks milder but that is only because the Y-axis is shorter. Mine tops out at 90,000. His tops out at 7,000. If your top site happens to be NYTimes.com, which receives millions of visitors a day, the graph would look even more extreme, but the actual curvature would remain virtually the same.
Re: self-promotion. Just so that you know: I was concerned about it. I would have posted the entire text here if it were not for the graphs. Is there a way to inline images in here?
You are right. The sample I took from TruthLaidBear.com is more extreme than what kottke.org has. According to the former, the top 20% controls 92% of the traffic, as opposed to 80% on kottke.org.
Also, I should not use the term "curvature". I guess a more appropriate term would be "distribution." What I meant to say was that distribution should not change much over time or from one segment to another, with which Kottke seems to agree.
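For anyone who wants to check a "top 20% controls X% of the traffic" figure against their own list of sites, here is a minimal Python sketch. The function name `share_of_top` and the ten visitor counts are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def share_of_top(visits, top_percent=20):
    """Fraction of total traffic that goes to the top `top_percent` of sites."""
    ordered = sorted(visits, reverse=True)
    # Number of sites in the top group; always keep at least one.
    top_n = max(1, len(ordered) * top_percent // 100)
    return sum(ordered[:top_n]) / sum(ordered)

# Hypothetical daily visitor counts for ten sites (total 10,000).
sample = [5000, 3000, 600, 400, 300, 250, 200, 150, 60, 40]

print(share_of_top(sample))                      # 0.8 -> an 80/20 split
print(share_of_top([v * 13 for v in sample]))    # 0.8 -> scaling every site changes nothing
```

The second call shows why the height of the Y-axis alone says nothing about the distribution: multiplying every site's traffic by the same factor leaves the share unchanged. Two samples only differ in distribution when this share differs, as with 92% versus 80%.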
-Dyske