DMOZ Editor Statistics

How many are really active? (revisited)

         

IITian

1:30 pm on Aug 25, 2003 (gmt 0)

10+ Year Member



Even though the current tally is around 60,000, it has been mentioned that the number of "active" editors is around 10,000. The term "active" seems to be roughly equivalent to someone who has made 5 or more edits in the last year.

The term editing seems to be broad and even changing a single character in the title/description seems to be considered an act of editing. It is conceivable that many editors, who want to remain as editors for various reasons, would log in once every 4 months and make a single word change here or there just to fulfill the requirement for continuing their editor status.

I wonder if it is possible for the general public, or even the DMOZ editors, to get a chart or bar diagram showing the breakdown of editors by number of sites added during the past year that were not affiliated with them. A representative output could be

#sites added #editors
1 - 5 ---------3,167
6 - 10 ----------456
11 - 25 ---------111
26 - 100 ---------56
101 - 250 --------12
251 - 1000 --------7
1001 - ------------1

This will give us more understanding of how DMOZ works.

rfgdxm1

1:42 pm on Aug 25, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I doubt if such is possible, or if anyone with the ODP would reveal that. As for adds, no way of knowing if an editor is affiliated. Also note it is possible an editor could do no adds for years and be highly useful. Imagine an editall who liked blowing away spam from unreviewed queues. This makes it easier for other editors to find useful sites to add.

takagi

1:42 pm on Aug 25, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Just for my understanding, these numbers are fake? The table is just a way to explain how you would like the break-up of active editors?

IITian

2:04 pm on Aug 25, 2003 (gmt 0)

10+ Year Member



rfgdxm1
You are right about other useful activities by editors such as deleting spam or even changing title/descriptions. We could have a table for that too. :) I would like some sort of rough statistics to confirm my suspicion that out of about 60,000 editors maybe about 1000 current ones are "really" dedicated.

takagi
Yes, it is a made up table. I wrote "A representative output could be ..." ;)

However, I think that the actual number will be similar to ones I presented.

choster

2:26 pm on Aug 25, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You'd need a lot more detail to make the table meaningful. For one, most editors only have editing privileges in a portion of the directory, and those with wider scopes often have different or broader responsibilities (such as leading and executing reorganizations, or mentoring new editors, or processing new applications themselves).

Moreover, categories are as different as editors are. Someone who edits in the Sanskrit lexicons category may spend six hours a day hunched over Copernic searching for new listings to add and find none. Someone in international dating classifieds could spend six hours deleting spam and not have time to add a single site. But an editor in, say, Tennessee Baptist churches might handle five submissions in five minutes-- leaving a hundred more in unreviewed.

John_Caius

3:07 pm on Aug 25, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Remember that the 60,000ish figure is the total tally of ODP editors ever, not the number of editors currently listed in categories.

hutcheson

4:40 pm on Aug 25, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You could estimate this as closely as would matter by the usual formula:

n(edits of (i)th-most-active editor) is proportional to
c(log(i)).

For initial values:

Take 200,000 edits as the "most active editor's" stat.
Take 1 edit as the "50,000th most active editor's" stat.

Take 20% of edits as unique site adds, and another 5% as duplicate site adds (dual listings in Regional/Topical, Spanish/English, etc.)
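The post above does not spell out the exact functional form. One possible reading is a Zipf-like power law, where log(edits) falls linearly with log(rank). The sketch below assumes that form, pins the curve to the two anchor points given above, and applies the 20% and 5% shares. The function names are made up for illustration.

```python
import math

# Anchor points taken from the post above.
TOP_EDITS = 200_000      # edits by the most active editor (rank 1)
LAST_RANK = 50_000       # rank of the editor assumed to have 1 edit
LAST_EDITS = 1

# Assumption: edits(i) = TOP_EDITS * i ** (-S), a Zipf-like power law.
# Solving edits(LAST_RANK) == LAST_EDITS gives the exponent S.
S = math.log(TOP_EDITS / LAST_EDITS) / math.log(LAST_RANK)

def estimated_edits(rank):
    """Estimated edits for the editor at the given activity rank."""
    return TOP_EDITS * rank ** (-S)

def estimated_adds(rank):
    """(unique site adds, duplicate site adds) using the 20% and 5% shares."""
    edits = estimated_edits(rank)
    return 0.20 * edits, 0.05 * edits

if __name__ == "__main__":
    for rank in (1, 100, 1000, 5000, 50_000):
        unique, dup = estimated_adds(rank)
        print(f"rank {rank:>6}: ~{estimated_edits(rank):,.0f} edits, "
              f"~{unique:,.0f} unique adds, ~{dup:,.0f} duplicate adds")
```

A different curve through the same two points (for example, edits falling linearly in log(rank)) would give quite different mid-range numbers, so the output is only as good as the assumed shape.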

IITian

5:28 pm on Aug 25, 2003 (gmt 0)

10+ Year Member



choster
I agree that such tables cannot do full justice to the special situations you mentioned. However, I think, if not this table, then some other table could work, such as one giving each editor a %ile rank on various dimensions like #edits, #inclusions, average waiting time in each of the categories, unreviewed queue length, et cetera. The reference group could be the entire directory, parent categories, other editors with similar experience ...

It could be for internal use only so that when editors log on, they find this info flashing in front of their eyes. It's non-accusatory and might encourage some editors to do better.

John_Caius
I think I mentioned that there are about 10000 "active" editors.

hutcheson
Thanks for the suggestion. I will, however, want to eliminate outliers and take into account the unique distribution of #edits - perhaps taking the numbers for the 100th most active editor and 5000th most active editor will be better for interpolation/extrapolation.

rfgdxm1

9:07 pm on Aug 25, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>I like some sort of rough statistics to confirm my suspicion that out of about 60,000 editors maybe about 1000 current ones are "really" dedicated.

60,000 is the total number of editors ever. IIRC active editors is something like less than a quarter of that. And "active" can mean as little as one edit in the last several months. Biggest problem with how you want to look at this is what choster brought up. If an editor only edits one or several very obscure ODP cats, he may be able to maintain them very well with just a few edits a year. Some topics don't have worthwhile new sites pop up frequently.

synergy

9:20 pm on Aug 25, 2003 (gmt 0)

10+ Year Member



I'm an ODP editor, and I'm active. :) I've added 25 unique sites to my category since I joined just a little over a week ago.

In fact, ODP is very active right now. Lots of new exciting things going on that will go public in the next few months.

You all have to remember that ODP is a volunteer effort. Sometimes, real life takes over and you don't have time to determine if the 500 sites in your queue are relevant or not.

In addition, the directory and its editors aim to make it the most comprehensive listing of the web. It's based on quality, not quantity. Good things come to those who wait.

ettore

11:05 pm on Aug 25, 2003 (gmt 0)

10+ Year Member



>> the number of "active" editors

Let's check my ODP "activity" today. I stopped RL work around 6 PM and since then
- went to the public forum to read submission status threads, spent some 30 minutes there
- went to the internal forums to read relevant threads, spent some 30 minutes there
- replied to 7 emails from submitters, 4 asking where their site could be listed, 2 asking why they haven't been listed yet, and one complaining that the editor in the "Google category where they are listed" didn't want to change the description of their site in an oh so wonderful keyword-stuffed one. Spent some 20 minutes there
- did some investigation on a case of alleged abuse, spent some 30 minutes there
- reviewed 3 new editor applications, spent 1 hour there
- while surfing the categories the applications were intended for, corrected 3 inappropriate descriptions
- prepared a brief report for an ongoing reorg in a small subtree, spent some 15 minutes

Total number of hours dedicated to ODP today: more than 3 hours
Total number of edits: 3
Number of new sites added: 0

Please (re)define "active".

synergy

11:20 pm on Aug 25, 2003 (gmt 0)

10+ Year Member



amen :)

IITian

11:34 pm on Aug 25, 2003 (gmt 0)

10+ Year Member



ettore,

I have earlier stated in msg #4
>You are right about other useful activities by editors such as deleting spam or even changing title/descriptions. We could have a table for that too. :)

Obviously you did useful work for the ODP. Actually the editors who are active are the ones most likely to be visiting such forums and therefore, there is a biased sample.

On the other hand, it is quite possible that there are editors who are fast asleep now with the alarm programmed to wake them up every 4 months so that they can log in ODP, change a word in one description, and then go back to sleep.

While computer generated reports cannot do justice to many editors, at least the most obvious forms of negligence can be easily spotted. In your case and some other editors who are dealing with cats with very few sites in existence, they will ignore the report which states that they are in the 10th percentile of sites added ranking compared to the entire pool of editors. However, if the report also shows that editors editing similar categories are scoring in the 25th percentile, it might lead to thinking of one's strategy. It's just a feedback and does not mean that one is doing something wrong or right.
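As a rough illustration of the percentile idea above, the same editor can rank quite differently depending on the reference group. The sketch below uses made-up numbers, and the function name `percentile_rank` is hypothetical rather than anything the ODP provides.

```python
from bisect import bisect_left

def percentile_rank(value, group):
    """Percentage of values in `group` that are strictly below `value`."""
    ordered = sorted(group)
    if not ordered:
        raise ValueError("reference group is empty")
    return 100.0 * bisect_left(ordered, value) / len(ordered)

# Made-up numbers of sites added in a year, for illustration only.
all_editors = [0, 1, 1, 2, 3, 5, 8, 20, 40, 150]
similar_cats = [0, 1, 1, 2]   # editors of comparably small categories

my_adds = 1
print(percentile_rank(my_adds, all_editors))   # rank against the whole pool
print(percentile_rank(my_adds, similar_cats))  # rank against similar categories
```

Whether "strictly below" is the right definition is a design choice. Counting ties as half would shift editors with common values upward.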

John_Caius

11:37 pm on Aug 25, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sorry, missed that bit on your first post.

Looking at your approximate guesstimate breakdown:

edits ---------- editors
6 - 10 ----------456
11 - 25 ---------111

The 100th most active editor is certainly an editall, if not a meta, given that there are several hundred editors with higher permissions, who have typically amassed between 10 and 30 thousand edits each on top of all the non-site-reviewing things they do, so eloquently highlighted by ettore. Given that most senior editors have been around for two or three years, that's around 10000 edits a year for somewhere between the 100 and 500 most active editors, not the more conservative 6-25 edits per year in your model.

3 million sites listed in three years is a million sites a year. Given an average of say 5000 active editors throughout the history of the ODP, that gives an average of 200 site adds per editor per year. In the areas I edit, I probably add one in five sites that I review and that's probably fairly average. So your mean value is going to be around 1000 edits per editor per year.
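The back-of-envelope estimate above can be written out step by step. Every input is a rough guess quoted from the post, not measured data.

```python
# Figures quoted in the post; all are rough guesses by its author.
sites_listed = 3_000_000      # sites listed in the directory
years = 3                     # period over which they were listed
avg_active_editors = 5_000    # assumed average number of active editors
reviews_per_add = 5           # roughly one site added per five reviewed

adds_per_year = sites_listed / years
adds_per_editor_per_year = adds_per_year / avg_active_editors
edits_per_editor_per_year = adds_per_editor_per_year * reviews_per_add

print(adds_per_year, adds_per_editor_per_year, edits_per_editor_per_year)
```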

Just my guesstimates. :)

steveb

1:27 am on Aug 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"...at least the most obvious forms of negligence can be easily spotted."

Negligence? What are you talking about? Nothing in this topic suggests any negligence at all. An editor making one edit in four months either makes a microscopically positive contribution via that one edit, a microscopically negative one by screwing up a sentence, or has no effect (changing "is not" to "isn't").

You seem to not understand what is involved here. That editor isn't being negligent in the least. That editor isn't preventing another person from volunteering, or editing a specific category.

You seem to think there are only so many seats on a bus and some people are hogging them. That is not how it works. It's not even remotely close to how it works.

victor

7:47 am on Aug 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Not sure, IITian, if you want rough statistics or exact data for each editor -- you've mentioned both.

ROUGH STATISTICS
Rough statistics for sites added are trivially simple. Once a week, analyze the RDF (or spider the DMOZ site) and collect data on new sites added. You can in many cases, to a statistically significant level, assign edits to named editors:

  • If a cat has a named editor, allocate the additions to that editor
  • If it has more than one named editor, divide the additions between them
  • If it has no named editor, find the nearest name(s) up the tree and divide the additions between them.

    This will undercount additions by metas, and overcount for some dormant editors who share cats, but it will give the "sort of rough statistics" that you can use to confirm or deny your "suspicion that out of about 60,000 editors maybe about 1000 current ones are 'really' dedicated."
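The three rules above can be sketched roughly as follows. This is only an illustration: the two input dictionaries, the category paths, and the editor names are all made up, and parsing the RDF dump into that shape is left out.

```python
from collections import defaultdict

def allocate_additions(new_sites, editors_by_cat):
    """Credit new listings to named editors using the three rules above.

    new_sites      -- dict: category path -> number of newly listed sites
    editors_by_cat -- dict: category path -> list of named editors
    Both inputs are hypothetical structures you would build from the RDF dump.
    """
    credit = defaultdict(float)
    for cat, count in new_sites.items():
        path = cat
        # Walk up the tree until a category with named editors is found.
        while path and not editors_by_cat.get(path):
            path = path.rpartition("/")[0]
        editors = editors_by_cat.get(path, [])
        if not editors:
            credit["(unattributed)"] += count
            continue
        share = count / len(editors)   # split evenly between co-editors
        for name in editors:
            credit[name] += share
    return dict(credit)

# Made-up categories and editor names, for illustration only.
editors_by_cat = {
    "Top/Arts": ["alice"],
    "Top/Arts/Music": ["bob", "carol"],
}
new_sites = {
    "Top/Arts/Music": 4,          # two named editors: 2 each
    "Top/Arts/Music/Jazz": 2,     # no editor: falls back to Top/Arts/Music
    "Top/Arts/Film": 3,           # no editor: falls back to Top/Arts
    "Top/Science": 5,             # nobody up the tree
}
result = allocate_additions(new_sites, editors_by_cat)
print(result)
```

The even split is the simplest choice. As noted above, it will overcount dormant co-editors and undercount anyone editing outside the categories where they are named.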

    The facts are out there, and we all know where they are. It just needs someone to do the work. And the best person to do that is the one who wants the results.

    I suggest you undertake to publish the numbers here once a month for the next year. If ODPers complain that the numbers are oversimplified, or biased, or whatever, they will be able to suggest refinements, and you can reanalyze until all objections are met.

    IITian

    5:33 pm on Aug 26, 2003 (gmt 0)

    10+ Year Member



    victor,

    Good suggestion. I think I will look at the data and see what useful info can be extracted out of it.

    John_Caius

    John, if we take averages, you are absolutely right. My figures are wrong. For example, even if one assigns the largest number to each editor in each grouping, it leaves hundreds of thousands of sites to just one poor editor. ;) However, the point I was trying to make was that it is likely that while a few editors are really adding tens of thousands of sites, many might be adding just a few, and seeing a chart showing where they stand might motivate them to improve their standings.

    steveb
    By negligence I didn't mean deliberate negligence. There are many small categories with very sincere editors and they sincerely believe that there are only say 5 sites in the world concerning that topic and they have found them all. To continue the editorship, they have to change a few things once in 4 months, which is a pain but tolerable. If those same editors are shown that compared to their peers they are lagging - not as criticism of their volunteer activity but just as a feedback - some of them might talk to other editors in similar categories to discover better ways to find new sites. For example, let's say there is a category based on novelist Robert Lastname. Later when the editor finds out that it is fruitful to look under "Bob Lastname" too, maybe more sites could be found. (This example is trivial.)

    John_Caius

    5:42 pm on Aug 26, 2003 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    I know what you're trying to say and in some circumstances it might be useful to look at in more detail. Any suggestion made in external forums is inevitably read by senior ODP editors and sometimes new procedures are implemented on the basis of those suggestions.

    The reason why this kind of information for motivation probably wouldn't be employed in the ODP goes back to the volunteer principle - who would volunteer if they got named and shamed, or told that they had to do a minimum amount of editing? Yes, once every four months is a nominal amount but doing only this many edits would essentially rule an editor out of applying for a second category, restricting them to only an extremely small cat space, perhaps fifty sites in typically a not particularly commercial area. You need to clean up your cat before applying for another one, requiring perhaps 100-150 edits. The ODP staff and metas prefer to have a lot of people doing a little editing, plus a decent number doing a lot of editing, than having all the editing duties left just to the hard-core types.

    If editors were being paid to work, as is the case at LS or Y!, then I think it's quite reasonable to kick people up the backside if they're not pulling their weight. However, objectivity perhaps decreases when there is a financial reward for listing sites. Why should I delete all the useless online pharmaceutical affiliate sales sites when I get paid 5c for each one I list?

    There are specific procedures in place within the internal structure of the ODP, including category checks and suggestion for improvement by senior editors, new editor mentoring, a dedicated forum for New Editor questions, plenty of documentation on editing skills, editor-produced tools etc. that are easily available for new editors to learn new skills. One example is regular threads on how to deal with sites that are flagged up as not responding by the dmoz robot, Robozilla - advice on where to look to see whether the site has moved, does it have a Google cache, is there useful information on archive.org, is it part of a larger site that has just completed a re-org etc.

    The principle is that the information is there if editors want to make use of it, however there's no obligation to do so, other than that your edits, however sparse, should be unbiased and guidelines-compliant.

    IITian

    6:36 pm on Aug 26, 2003 (gmt 0)

    10+ Year Member



    >who would volunteer if they got named and shamed

    Correct. I am in no way suggesting anything like that. It should be used only for self evaluation and could be easily ignored if found not applicable to one's situation. It could be a good tool for senior editors to evaluate where DMOZ is going and what can be done to improve that.

    Anyway , I think I agree with victor that I should be able to present something concrete by using the data to show what I mean. Otherwise it is just empty air.

    After all is said and done, my biggest concern is the long delays in acceptance reported by many. On internet, projects have short half lives. I have a site that got accepted by DMOZ quickly and thanks to it and acceptance by one more directory, I am doing well on Google serps. I tried to get a few more links but was unsuccessful except for 2-3 minor ones. I would guess that DMOZ listing and the other directory listing accounts for most of my ranking on the search engines.

    However, if I had had to wait a year for a listing, I would have been nowhere unless I paid big bucks for "review" in some commercial directories. I think inordinate delays in listing can kill many projects because DMOZ is so important. (I applied to GoGuides too, more than 4 months ago, and am still awaiting review. Do I care? Very little.)

    motsa

    8:09 pm on Aug 26, 2003 (gmt 0)

    10+ Year Member



    I don't know how you can think that compiling these kind of stats will in any way speed up the review process for sites. Any editor who is going to be worried about not pulling their weight is likely already a self-starter who will go out and find additional work to do on their own, not one who will be inspired by stats showing what a slacker they are.

    Go ahead and compile the stats, but don't expect them to be anything but a curiosity.

    steveb

    9:04 pm on Aug 26, 2003 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    "If those same editors are shown that compared to their peers they are lagging"

    As explained previously this is wholly irrelevant and inappropriate thinking. Editors neither compete, nor is there any reason at all to compare them.

    You seem to be wasting a lot of time on a non-useful line of thinking rather than doing something yourself. The only reason things remain undone in some areas is because people don't choose to volunteer to do them, whether they be existing editors or those who don't even bother to join up.

    kctipton

    10:18 pm on Aug 26, 2003 (gmt 0)

    10+ Year Member



    Looks like we've seriously drifted into a "how ODP runs itself" mode...

    Back on topic please?

    mwaf_JK

    7:54 am on Aug 27, 2003 (gmt 0)

    10+ Year Member



    Not perhaps exactly what you (IITian) were thinking, but I think it would be easier (and perhaps more fun) to have general stats on the activity of dmoz as a whole. Such as 'How many sites were added', 'How many applications reviewed' (including denied and accepted), 'How many logins' etc, on a daily or weekly basis. An activity rating could be calculated as a weighted average (i.e. reviewing an application would be considered more activity than adding a single site). On the downside, these kinds of stats could only be compiled by an editor (I'm not even sure if an editor could do it).
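A minimal sketch of such a weighted activity rating might look like this. The action names, weights, and counts are all hypothetical, chosen only so that reviewing an application counts for more than adding a site.

```python
# Hypothetical weights: how much each kind of action counts toward activity.
WEIGHTS = {"site_added": 2.0, "application_reviewed": 5.0, "login": 1.0}

def activity_rating(counts, weights=WEIGHTS):
    """Weighted average of action counts; unknown actions raise KeyError."""
    total_weight = sum(weights.values())
    weighted_sum = sum(weights[action] * n for action, n in counts.items())
    return weighted_sum / total_weight

# Made-up weekly totals for the whole directory, for illustration only.
this_week = {"site_added": 500, "application_reviewed": 40, "login": 400}
print(activity_rating(this_week))
```

Comparing the rating week over week would matter more than its absolute value, since the weights are arbitrary.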

    However, I'm not convinced that having stats showing editor activity would benefit the directory in any way. For instance, compiling stats on an IRC channel has the risk of people starting to post lots of nonsense to gain higher ranking in the stats. Naturally this will only lower the quality of the discussion and dmoz is all about quality, not quantity. The only slight benefit I see is that it would prove that work is in fact being done at dmoz.

    Nonetheless, being a statfreak I would really like to see such stats. Just don't expect me to actually implement anything although I probably could manage to do something along the lines of my suggestion.

    LizardGroupie

    1:14 pm on Aug 27, 2003 (gmt 0)

    10+ Year Member



    The interest in this topic and everything related amazes me. I'm just glad that WebmasterWorld deleted that thread "Please apply to be a DMOZ editor" and the other thread about tips for becoming an editor.

    Really, is it anybody's business what happens within DMOZ except for the editors? I suggest you all go about your daily lives and leave DMOZ to those chosen to care for it.

    hutcheson

    1:26 pm on Aug 27, 2003 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    For self-evaluation purposes, most of the stats you mention are available internally, and it's fairly easy to see them for a particular editor. Of course, for self-evaluation purposes, most of the stats you mention are pretty nearly worthless. As has been mentioned, you can whip through Test/Misplaced/Manual racking up "unreviewed edits" at the rate of 100 per hour or more; or you can muck out Shopping/Health/Patent_Nostrums, spending 15 minutes per site trying to track down which affiliate program it's using, and trying to see if this site really has any evidence of anything unique at all on it. Which is more useful? _Both_ need to be done; both take the time they take. I'm better at one of them, someone else is better at the other. Who's the better editor? -- that's a nonsense question.

    IITian

    3:34 pm on Aug 27, 2003 (gmt 0)

    10+ Year Member



    hutcheson

    I agree that much of the statistics could be meaningless because of difficulties in comparing different things. However, more transparency in info can only help DMOZ.

    For example, lots of rumors, mostly unfounded and perhaps spread by webmasters who think that their sites have been unfairly treated, circulate about the years it takes to get one's site included, how some editors are just there to serve their own interests and block their competitors, etc. There are some redress mechanisms, but people have inertia.

    To give an example: If a chart was made available about time it took for sites to get included, it will be difficult for false rumors to spread because DMOZ can clearly show the facts. For example it could say, "look, there was one site that took 5 years to get included , but median time is 7 weeks, and average time is 9 weeks, and the site that took 5 years to get included was probably because the cat editor was an ex of the person submitting the site."
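To see why a chart like that would report both a median and an average, here is a small sketch with made-up waiting times. A single very long wait drags the average up while barely moving the median.

```python
from statistics import mean, median

# Made-up waiting times in weeks for ten listed sites, for illustration only.
# The last value stands in for the single submission that waited five years.
waits_weeks = [2, 3, 4, 5, 6, 8, 9, 10, 12, 260]

print(f"median wait: {median(waits_weeks)} weeks")
print(f"mean wait:   {mean(waits_weeks)} weeks")
print(f"longest:     {max(waits_weeks)} weeks")
```

With a sample this small the outlier dominates the average. Over thousands of real submissions the gap would be smaller, which is consistent with the 7 versus 9 weeks in the hypothetical quote above.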

    rfgdxm1

    3:47 pm on Aug 27, 2003 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    >and the site that took 5 years to get included was probably because the cat editor was an ex of the person submitting the site.

    Or the site was submitted to a neglected corner of the ODP, or the site was submitted to a very heavily spammed cat and got buried in all the spam, or the site got moved into a series of the wrong unreviewed queues due to editor error about guessing its topic, or...

    There is just no way of knowing from these kinds of stats if there was likely abuse. This would take a case by case evaluation.

    kctipton

    11:24 pm on Aug 27, 2003 (gmt 0)

    10+ Year Member



    I am willing to do a survey of a smallish (less than 250 listed sites) category of interest to the person who started this thread and then report back here. I would look at every _listed_ site and see how long each one waited as well as how many sites apparently were added directly by an editor and not submitted from the outside. There will be some caveats in whatever is learned, of course.

    pleeker

    11:48 pm on Aug 27, 2003 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    >I agree that much of the statistics could be meaningless because of difficulties in comparing different things. However, more transparency in info can only help DMOZ.

    How so?

    It seems to me that the original idea in this thread is based on the notion that DMOZ should do things to keep webmasters and SEO folks happy. That's just not the way it is. I'd suggest that DMOZ's main customer group is all of the sites that use its data for their own directory purposes, and the secondary audience is Joe Websearcher who uses DMOZ to find web sites.

    And I don't think either of those customer groups would care about a chart showing "how DMOZ works" by providing editor activity data.

    Just my opinion....

    IITian

    6:01 pm on Aug 28, 2003 (gmt 0)

    10+ Year Member



    kctipton

    Thank you kctipton for the offer. You could select the category yourself since you might know which category some complaints might refer to. If possible, select a category that many visitors to this forum might be interested in professionally. Thanks.

    pleeker

    Professional webmasters are also users and while DMOZ should not be catering to them exclusively, they should not be ignored because they provide good feedback because of their strong interest.

    As with everything that is not based on rigorous analysis, any perception of DMOZ or even Google is shaped by selection bias in the sample. For example, webmasters who could not get their sites included in DMOZ within a reasonable time are bound to be much more vocal about what they "really" think about DMOZ than the webmasters who quietly are getting multiple sites accepted within weeks.

    So is the matter with the criticism of some of the editors. Rumors are spread by webmasters who think that their competitors, who also are the cat editors, are unfairly rejecting their sites or delaying the submissions.

    It is my belief that DMOZ is not as bad as the perception created by the upset webmasters. Some sort of statistics, while they will probably be attacked by almost everyone for their distortion of "reality", will nevertheless help in countering some of the serious allegations and at the same time serve as a monitoring tool for DMOZ metas and senior editors. Just my opinion. :)