|DMOZ Frontpage: "Over 4,000,000 site" + "74,719 editors"|
In 9 years that's an average of 54 total websites per volunteer editor
| 2:18 pm on Dec 11, 2006 (gmt 0)|
4,000,000 divided by 74,719 = 53.53.
DMOZ has been around since when? 1998? Nine years?
I want to be supportive of the project. I think it's great to have an alternative search source. I think it's great that such a project might be operated on a volunteer basis.
That said, something is broken, badly.
74,719 people counted as editors? That "fact", being so material and significant, is posted on the homepage. There's a reason for featuring it. However, when I paused to "do the math" - to understand why there are recurring issues about the vitality of DMOZ - it occurred to me that something MUST be broken somewhere.
Tens of thousands of volunteers. Almost 10 years. And 55 websites per volunteer?
Now, I know the volunteer editors didn't all sign up today. However, I also know that the notice about tens of thousands of volunteers has been in place, on the homepage, for years.
So, we're talking a directory that offers approximately 6 total new website entries per editor, per year, for 9 years. I'll cut the editor count in half, to say that for the total of 9 years there were an average of 37,000 editors. That works out to 12 new websites per editor per year added to the directory. Alright, I'll cut that down to an average of 17,000 editors/year - averaged out of the 9 years - producing an average of 24 websites added per editor, per year.
That's 2 a month.
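For anyone who wants to re-run the arithmetic above, here is a rough sketch. It assumes the homepage's 4,000,000 listings and the poster's 9-year span; the function name is made up for illustration. Note that the 17,000-editor case comes out nearer 26 per year than 24, which is still roughly 2 a month.

```python
# Rough check of the per-editor arithmetic in the post above.
LISTED_SITES = 4_000_000   # "over 4 million sites" from the DMOZ homepage
YEARS = 9                  # the poster's estimate of the project's age


def sites_per_editor_per_year(avg_editors, sites=LISTED_SITES, years=YEARS):
    """Average listings per editor per year for an assumed average editor count."""
    return sites / avg_editors / years


# Editor counts tried in the post: the homepage figure, then two lower guesses.
for editors in (74_719, 37_000, 17_000):
    rate = sites_per_editor_per_year(editors)
    print(f"{editors:>6} editors: {rate:.1f} per year, {rate / 12:.1f} per month")
```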
What's my point?
- The figure for the number of editors is grossly inflated; or,
- The expectations of productivity - for maintaining "editor status" are extremely low; or
- There is a systemic problem whereby websites are added, by the editorial pool, at an absurdly slow and unproductive rate; or
- There is a gaping void between the number of websites that exist and the number of websites that are worth a damn;
75,000 volunteer editors, searching the web for valuable content, useful websites, reviewing voluntary submissions.
With those kinds of volunteer numbers why, on earth, isn't the vast majority of all search being done via DMOZ? I mean, there are 75,000 active, engaged minds of volunteers committed to scouring the WWW in search of useful websites, doing all this to help their fellow human beings find what they are looking for.
DMOZ ought to rule the world of search yet, in terms of inbound traffic that webmasters routinely study and report, DMOZ isn't even a whisper when - with 75,000 volunteers scrutinizing and aggregating the world's website information - it ought to be a roar.
I regret to report that when I pause to reflect on what is reported as reality - 75,000 volunteers, 9 years of work - something is either badly broken or poorly managed.
There's simply no good explanation for that many volunteer editors not being able to produce a tool that should wow the pants off of anyone who ever needed to find something of value on the WWW.
By comparison, how many people work for About.com? And yet, with DMOZ "staff" numbers that monstrously DWARF the About.com paid staff, which one pulls more traffic?
Per Alexa December 2006: About.com #40 ODP #87
| 4:10 pm on Dec 11, 2006 (gmt 0)|
So many misconceptions justify a reply :).
You keep referring to ODP as being used for search but it isn't a search engine; it's a directory. One is organised by keyword often in order of SEO effort/money expended rather than value to the end user. The other is organised as a hierarchy of categories by topic. If somebody wants to search using keywords, they should use a search engine. If somebody wants to find a lot of sites about say melanomas, use a directory. It's all about reaching for the right tool for the task in hand and I use whichever seems most appropriate.
The approx 75,000 editors is the total number who have ever joined. At any given time, there are about 7000 of them. Some editors never log on at all. Some list a few sites, maybe including their own and then we never see them again. Others are happy to nurture the niche category that they're interested in in perpetuity. Yet others get hooked and gain permissions in ever larger sections of the directory. These latter do thousands of edits in their spare time and a few do tens of thousands.
An editor doesn't 'occupy' a category blocking others from doing work there; anyone with higher level permissions can also help. Providing that the incumbent editor does an edit every 4 months, the account stays live. After all, just a few edits is better than none isn't it?
Editors are hobbyists and mainly enjoy being productive in areas that interest them. Some topics, such as travel, real estate, shopping and commercial medical websites are so beset by spam, doorways and mirrors that few have the stomach to work there. It's no coincidence that these sectors are amongst the most competitive on the net. It's those sectors and the SEO people who mainly complain that we aren't volunteering hard enough.
I understand that the Alexa figures are derived from data from surfers who have the Alexa toolbar installed. Most of the people that have are SEO people AFAICT - hardly typical users :).
| 4:36 pm on Dec 11, 2006 (gmt 0)|
Right back at ya Mr. Noble. ;0) I said I wanted a more interesting and informative DMOZ dialogue.
To respond to your first point, I never said it's a search engine, so your first comment is a bit of a non sequitur. On the other hand, what is one expected to employ a directory for if not "to search" - for something?
Whatever DMOZ "is" or exists "for the purpose of", it appears to be underutilized as a channel for "finding websites", at least as evidenced by the reports of people confirming how their website was found. There are not many reports of clickthroughs from DMOZ. IF DMOZ exists in aid of the public "finding websites of value" - which certainly appears to be its raison d'être - why aren't more webmasters confirming that fact? Like I said: So much time and effort invested - premised upon the idea of producing quality and qualified "search leads" - and ... where's the usage data to justify the effort?
That statement is not, in any sense, made to denigrate the mission or purpose of the DMOZ or its possible utility. It does offer value, in my observation, but appears not to do a very good job of either a) getting the word out; or, b) ultimately offering value in a form or format that draws in users, which may invite a new thread along the lines of "For all the effort what is missing in the DMOZ output that accounts for the low usage stats as evidenced by reports of inbound traffic from DMOZ".
Might even be grounds for a separate thread: "What IS the target market of users and beneficiary websites of the DMOZ"? Maybe this crowd - the WebmasterWorld crowd - is not the intended beneficiary market? IF SO then maybe DMOZ might do a better job of articulating to the public that DMOZ 'is da bomb' at helping you find . . . what? Non-profit educational resources?
From the bottom of the DMOZ homepage:
|over 4 million sites - 74,719 editors - over 590,000 categories |
So far as 75,000 versus 7,000 goes, there is no qualification to that number on the homepage. Why mislead the public in any way? The statement certainly suggests there "are" 75,000 editors at work, not that there's only 7,000 at work on the project.
Putting aside whether 75,000 editors is ambiguous and therefore misleading, it's still a fair inference, given the history of 75,000 volunteers and 4 million listings, that the numbers work out to 55 listings per volunteer - my point being that the apparent merits of the project appear a bit inconsistent with the apparent output of the numbered - 75,000 - volunteers. That number - 55 listings per "approved" editor - seems off to me. Either there's a problem in gauging the earnestness of the applicants OR there may be a problem in how things work once editors are approved.
OBTW, given the apparent merits of the project and the apparent low demands, why a) so many drop-outs; and, b) why so few active supporters? Maybe that's a subject for another thread, for one might assume that a project of such merit might either a) attract more volunteers amongst the 100s of millions of netizens; or, b) be a bit more proactive, i.e., do a somewhat better job of bringing on new volunteers. Big assumption. ;)
Maybe I'll add c) With 7,000 "active" editors what is the cumulative total of new websites added to the directory in the past 12 months?
So far as Alexa, I've got nothing invested in their "facts". However, I previously raised the question of whether DMOZ has a body of user data and whether DMOZ has ever made the data available. Certainly, since it's in the nature of a "public works project" it would appear that exposing aggregate (not individual) user data would be reasonable and justified.
What current user data is available to the public?
[edited by: Webwork at 4:50 pm (utc) on Dec. 11, 2006]
| 4:49 pm on Dec 11, 2006 (gmt 0)|
|55 listings per volunteer |
You forgot the "per year" you mentioned earlier. Which is one of the reasons that the calculation is invalid. The number of editors includes all editors ever joined to the ODP. No matter if they were active for just one day or joined only a few months ago. You would need to have some kind of counter telling you how long each editor is active with the ODP to make the numbers fit the reality.
If you want to compare with the total number of editors, you would need the total amount of sites ever listed. Which is a number unknown to me, but I am pretty sure it is a lot larger than the given number. If I remember my stats correctly, I removed about 10k sites from the ODP during my editing career (dead, content changed, ...).
|Why mislead the public in any way? |
It's not meant to mislead anybody. It's meant as a tribute to all the people who spent time contributing to the ODP.
|What current user data is available to the public? |
You can find the current editor count in the status reports the ODP started publishing about a year ago. Once the server hosting them is back online, that is :-)
For each and every user you can see which categories he/she is listed in from the profile. We chose not to give editing logs and dates of last activity of a specific editor and similar stuff to the public, because we consider that private data.
[edited by: windharp at 4:59 pm (utc) on Dec. 11, 2006]
| 4:55 pm on Dec 11, 2006 (gmt 0)|
4,000,000 divided by 75,000 is 53. Not per year. That's 53 total listings per historical editor for all time.
75,000 "as a tribute"? I didn't see that mentioned and all the while I was thinking "Wow, there's 75,000 editors". Not sure why I thought that. Maybe a little more context to the homepage statement would help. Small matter to clarify.
Maybe if you stated 7,000 "active editors" more people might apply? The statement - as is - suggests that there is already an army of editors at work.
Perhaps someone might answer the question: How many sites were added by the work of those 7,000 editors in the past 12 months?
Again, my interest is not to denigrate in any way the efforts of those volunteers, but rather to have a better public understanding of exactly what the DMOZ IS and is not, what works and what doesn't work, why it works a certain way and why it may never work - as now configured.
At the end of it all it would be nice to see it work better, as evidenced by greater public utilization, which might also translate into greater public support and better support by "the funders".
[edited by: Webwork at 5:02 pm (utc) on Dec. 11, 2006]
| 5:01 pm on Dec 11, 2006 (gmt 0)|
The increase of listed sites is another figure which can be seen in the status report. (once it is online again).
The total sites listed in a period of time is not a figure we currently monitor. As you can imagine this is different from the growth :-)
(Archive.org didn't catch the reports, sorry)
| 5:37 pm on Dec 11, 2006 (gmt 0)|
In any organization, you run into the 80/20 rule (if it doesn't turn out to be 90/10). At any rate, it's a logarithmic scale on number of editors versus sites added by editor. Figure one editor with 250,000 adds, 60,000 editors with one add, and interpolate.
You could use almost exactly the same graph over at Project Gutenberg. Just label the x-axis "pages proofread."
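A rough sketch of that "interpolate" step, for the curious. It assumes the interpolation is a straight line on a log-log plot (a power law) between the two endpoints given above; that reading is an assumption, not something stated in the post, and the function names are made up for illustration.

```python
import math

TOP_ADDS = 250_000   # adds by the single most active editor (figure from the post)
N_EDITORS = 60_000   # rank of the editor with exactly one add (figure from the post)

# Assumed shape: a straight line on a log-log plot through
# (rank 1, TOP_ADDS) and (rank N_EDITORS, 1).
EXPONENT = math.log(TOP_ADDS) / math.log(N_EDITORS)


def adds_at_rank(rank):
    """Estimated adds for the editor at a given activity rank (1 = most active)."""
    return TOP_ADDS * rank ** -EXPONENT


def share_of_top(fraction):
    """Share of all adds made by the most active `fraction` of editors."""
    adds = [adds_at_rank(r) for r in range(1, N_EDITORS + 1)]
    cutoff = int(N_EDITORS * fraction)
    return sum(adds[:cutoff]) / sum(adds)


print(f"top 20% of editors: {share_of_top(0.20):.0%} of adds")
print(f"top 10% of editors: {share_of_top(0.10):.0%} of adds")
```

Under that assumed curve the most active fifth of editors accounts for well over 80% of the adds, which is the point of the 80/20 remark.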
| 7:08 pm on Dec 12, 2006 (gmt 0)|
ODP's monthly reports are available at [freenet-homepage.de...]
(usually they live at [research.dmoz.org...] )
The volunteer community has published these reports since January 2006; the content of the reports covers the time period January 2005 - September 2006. (More recent reports will follow as soon as the system is up again and we can get at the data and tools.)
I'd strongly recommend reading these reports thoroughly before trying to do further calculations and estimates, webworks. Especially as you'll learn not only a lot about ODP, but also about the challenges of maintaining any really large directory, or other types of large human-compiled link collections, over a long period of time under current conditions on the web ;-)
Specifically, if I may call your attention to the February report: [freenet-homepage.de...]
It explains why your attempt to calculate editor productivity based on net (!) growth:
|"Over 4,000,000 site" + "74,719 editors" - In 9 years that's an average of 54 total websites per volunteer editor |
can not produce meaningful results.
In brief, the biggest mistake is that you use data for a time period of ~ 8 years, without considering that there might have been any linkrot, and that someone (or something - ODP has various automated quality control processes) might have cared about getting the rotten links out.
The second-biggest mistake is that you reduce editor activity to the act of "publishing" a listing, while forgetting a rather long list of other tasks that are important for directory-building and maintenance.
| 8:05 pm on Dec 12, 2006 (gmt 0)|
I'd say that the Internet being what it is, the number of websites that were added at one point and then had to be deleted one or three or nine years later must be at least twice as many as the number of websites that are still in the directory today. It's a pretty rare website that lasts ten years, when you think about it.
So it's probably more like an average of 150 websites per editor. Which isn't really unreasonable as an average. The "average" editor is probably one who signs up for one category that they happen to be very interested in. Most categories have less than 200 sites that could be listed there at all, so, there you go. Then on one end of the bell curve you have highly active editors who manage a huge category or add sites to hundreds of different categories, and on the other end, you have all the people who sign up but only add a few sites before getting bored and wandering off (from my experience with non-profits, ALL volunteer projects have a high drop-off rate.)
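To make the net-versus-gross point concrete, here is a minimal sketch. The ratio of 2 removed sites per surviving listing is the guess made in this post, not a measured figure, and the function name is hypothetical. With the homepage numbers it lands at roughly 160 per editor, the same ballpark as the 150 above.

```python
def gross_listings_per_editor(current_sites, removed_per_current, editors):
    """Average sites ever added per editor, counting listings later removed."""
    # Sites ever listed = those still listed plus those since removed.
    ever_listed = current_sites * (1 + removed_per_current)
    return ever_listed / editors


# Homepage figures: 4,000,000 listings and 74,719 editors who ever joined.
net = gross_listings_per_editor(4_000_000, 0, 74_719)
# Assumed: two removed sites for every site still listed (the post's guess).
gross = gross_listings_per_editor(4_000_000, 2, 74_719)
print(f"net: {net:.0f} per editor, gross: {gross:.0f} per editor")
```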
| 9:24 pm on Dec 12, 2006 (gmt 0)|
|It's a pretty rare website that lasts ten years, when you think about it. |
Do you look at whois creation dates as an indicator of a site's suitability or is it based on what you see at the point of inspection? In that an older creation date may provide less food for robozilla?
I would have thought an average of 54 listings per volunteer editor would be very good and an average of 150 would demonstrate a healthy core of editors with a positive long term commitment :)
Are all the 4.8 Million listings added by editors or do some of them come from other sources?
| 9:34 pm on Dec 12, 2006 (gmt 0)|
>Do you look at whois creation dates as indicator of a sites suitability or is it based on what you see at the point of inspection?
Generally, the content on date of review is all that matters. For an "aggregate content" site, there's nothing but good sense to keep an editor from checking the date and letting a very new site wait unreviewed for a few months, to see if it takes off. But I wouldn't expect that to be common at all, and in "competitive niches", I would expect it to basically never happen?
>In that an older creation date may provide less food for robozilla?
On average, perhaps. But if we were to go by averages, we'd just delete all sites, since the minority of sites that are listable today will die sometime. There's no substitute for checking them one by one.