| This 77 message thread spans 3 pages: < < 77 ( 1  3 ) > > || |
|State of the ODP: 3.5 million in, 1.1 million unreviewed.|
I just looked up the exact numbers, and the above are the current figures for today. If anyone is wondering exactly why it sometimes takes so long to get sites approved, the above statistics should reveal why. I do have access to the complete breakdown, but I don't think the Powers That Be at the ODP would approve of me revealing this publicly. However, the most backlogged categories are Business, Computers and Shopping. No surprise there about Business and Shopping, although the high number of unrevieweds in Computers surprises me. The percentage of unrevieweds in Business compared to the number of listed sites is staggeringly high. Of course, a lot of those are no doubt spam. However, the editors have to slog through the spam to get to the sites that should be listed.
|...however it's completely misleading to rely on the 1.1 million figure as an indication of the efficiency of the review process. |
Ok, but it does tell you something about the overwhelming information overload facing the ODP and its editors. To organize the web efficiently you will need a fully automated web directory engine where categorization, description, and submission are delegated to the web page owner. Admittedly, such a web directory will probably contain more spam and be less organized, but at least it will organize the entire web in a timely, updated, and comprehensive way.
It’s all a matter of what the end user prefers: a highly organized subset of the web, or the entire web a little less organized? In a comparison with search engines, part of the explanation for Google’s success is that Google indexes more pages than any other engine. Maybe the same holds true for a comprehensive web directory?
Human edited directories are probably better suited for smaller niche subjects where qualified expertise can handle the volume of web pages and really make a difference to the end user.
Laisha, to answer your question, I would say that the percentage of applicants that are accepted is higher now than that figure. Most of it has to do with the more comprehensive application form which deters people applying for the sake of it.
|Ok, but it does tell you something about the overwhelming information overload facing the ODP and its editors. |
I don't think so. The Dewey Decimal System, invented over a century ago, is still in use today; think of the millions of different books it accurately catalogues. The same is true of the ODP ontology: it is extremely flexible (too flexible in some cases). Besides, the vast majority of submitted sites already fit into the existing ontological scheme.
|where categorization, description, and submission is delegated to the web page owner. |
This is already in place - it's called meta tags. The usefulness of meta tags has been amply demonstrated.
|The Dewey decimal system, invented over a century ago, is still in use today and think of the millions of different books that it accurately catalogues. |
When speaking of information overload I was referring to the number of unreviewed submissions, not the classification system. (BTW, I don't think you can compare books to web sites: books don't change, and they don't cease to exist.)
Meta tags are a standard for specifying metadata for a document, i.e. data about the document itself rather than its content. Taking advantage of this underestimated but highly powerful mechanism makes possible a web directory engine that lets Internet users organize the web themselves in a robust and secure fashion. And there is a small but growing web directory that proves this can be done.
|How many of those 1.1 million are Server Not Found or 404? How many are duplicate submissions from aggressive submitters? |
I can see the point of the ODP denying editor applications. They may want quality editors instead of quantity. They have apparently repeatedly ignored my applications to edit categories. I now have a different way I can help them out.
What I don't understand is why the above problem isn't cured programmatically, since it can be. I hereby volunteer to help them eliminate submissions for sites that return errors, as well as duplicate submissions. Heck, I'll even make the script send an email to the person who submitted the site, telling them they have been rejected because their site returned an error. This type of simple improvement could dramatically help out both submitters and editors.
I believe that I can cut down on duplicate submissions by checking whether the site has already been submitted and, if so, either returning an error on the page at the time of submission or emailing the submitter. Simple customer service is not that expensive and can be undertaken without requiring incremental human effort for each new user. Not doing it is simply a dereliction of responsibility.
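The duplicate and dead-link checks proposed above could look roughly like the sketch below. It is a minimal illustration in Python, not anything the ODP actually runs: `normalize_url`, `check_reachable` and `SubmissionQueue` are hypothetical names, and treating `www.example.com` and `example.com` as the same site is an assumption that will not hold for every host.

```python
import urllib.error
import urllib.request
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Reduce trivial variations so duplicate submissions compare equal."""
    url = url.strip()
    if "://" not in url:
        url = "http://" + url  # assumption: bare host names mean http
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]  # assumption: www. and bare host are the same site
    if parts.port:
        host = f"{host}:{parts.port}"
    path = parts.path or "/"
    if path != "/" and path.endswith("/"):
        path = path.rstrip("/")
    # Fragments never change which page is served, so they are dropped.
    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))


def check_reachable(url: str, timeout: float = 10.0) -> tuple[bool, str]:
    """Return (ok, reason) for a URL; performs a real network request."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return True, f"HTTP {resp.status}"
    except urllib.error.HTTPError as err:  # 404, 500, ... (check before URLError)
        return False, f"HTTP {err.code}"
    except (urllib.error.URLError, OSError) as err:  # DNS failure, refused, timeout
        return False, f"unreachable: {err}"


class SubmissionQueue:
    """Hypothetical queue that rejects duplicates at submission time."""

    def __init__(self) -> None:
        self._seen = {}  # normalized URL -> submitter email

    def submit(self, url: str, email: str) -> str:
        key = normalize_url(url)
        if key in self._seen:
            return f"duplicate: {key} is already awaiting review"
        self._seen[key] = email
        return f"queued: {key}"
```

`check_reachable` is only defined here, not called, because it needs the network. Some servers answer `HEAD` with 405, so a real version would probably retry with `GET` before rejecting anything.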
I don't understand why this kind of thing is not already in place. I'm not smarter than the people currently running the directory. If there is some problem with my proposed solutions such as encouraging abuse of some kind then find another solution. The current policy is not a solution.
Actually, I would be surprised if Google and some of the other people who actively use the directory data for businesses aren't lending a hand in this area. When are we going to see some changes to the current policies?
I don't think anyone is running DMOZ.
It certainly is apparent that if there is someone running it, their pay isn't based on performance.
I have doubts that AOL/Time Warner...Netscape even care about DMOZ or its problems. If it were a profit-making entity, the problems would be cleared overnight. The RDF problem has existed since they published with errors on about Sept 17th (they published a notice then) and then republished on Sept 22nd. They aren't putting a rocket ship into orbit... just crawling a database... kick errors out to a log and publish! Then cure the problem creating the errors.
The concept of DMOZ having humans reviewing sites is great, but having humans correcting entry errors into the database is warped. To say it doesn't matter because Google crawls anyway, is also warped.
I would suspect that hundreds of small engines are now or shortly will be crawling DMOZ to create a database and drop their reliance on parsing the RDF Dump. Google is starting to look foolish with their RDF Dump that is three months old. They should probably have their own software to create a parseable database.
There you go! Someone should write some software to crawl DMOZ and create a parseable database and the software to parse it....make it open source. Then no one would care if DMOZ ever got it right.
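For the idea of software that parses the dump and logs errors instead of choking on them, a streaming well-formedness check is a reasonable starting point. The sketch below uses Python's standard `xml.etree.ElementTree.iterparse`; `validate_dump` is a hypothetical helper, and it checks XML syntax only, not the structure of the actual RDF dump.

```python
import io
import xml.etree.ElementTree as ET


def validate_dump(stream):
    """Stream-parse an XML dump without loading it all into memory.

    Returns (number_of_elements_parsed, error_log). Only well-formedness
    is checked; nothing here knows the real ODP RDF schema.
    """
    errors = []
    count = 0
    try:
        for _event, elem in ET.iterparse(stream, events=("end",)):
            count += 1
            elem.clear()  # drop children/text to keep memory flat on huge dumps
    except ET.ParseError as err:
        line, column = err.position  # location of the first syntax error
        errors.append(f"line {line}, column {column}: {err}")
    return count, errors


if __name__ == "__main__":
    sample = io.BytesIO(b"<RDF>\n<Topic id='Top'>\n</RDF>")
    print(validate_dump(sample))  # logs the mismatched tag with its line number
```

Because the parse stops at the first syntax error, a publishing job built on this would log the position, skip or repair the offending record, and rerun, rather than shipping a dump nobody can parse.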
>Pretty much "ditto" rfgdxm1. I think 95% of all whines against the odp are because some low quality site wasn't added. The problem is the 5% that are accurate. I think this is partly what laisha was referring to with those that get chainsawed - it's darn hard to figure out if that 5% is worth a darn.
Tell me about it. I applied to be editor of a new cat, and was granted it within 24 hours. I just had a look from the editor side at exactly what is what. Excluding a huge child cat that has an active editor doing an apparently very good job, this new cat space of mine has about 2,000 listed sites. It also has about 500(!) unrevieweds that have to be evaluated, some over half a year old. While I do see some blatant spam, the vast majority of these submissions look legit.
I have no intention of chainsawing anything, and will evaluate each submission on its merits. However, in a few days the public side should start displaying that I am the editor there. Most of these greens are in fact commercial sites. Are people now going to start whining that I am a corrupt editor because I have half-year-old unrevieweds still around in my cat space? A lot of these submissions are in misplaced cats and need to be moved, and of course many will need the site name and description edited to remove promotional hype. However, reviewing over 500 sites properly is going to take quite a bit of time. My personal goal is to get this done within a month, but this might be more complex than I imagined. I'm going to have to hope that so long as I make steady progress in evaluating these unrevieweds competently and reducing the number, none of the metas are going to lean on me for having cat space in as bad shape as this currently is. I am just hoping I don't see people complaining at resource-zone.com or such about me. :(
|I know that two years ago, only about 10% of applicants were accepted. I haven't read anywhere that that has changed. |
I've posted on this board, and elsewhere, that my personal proportion of acceptances has risen from about 4% to almost 30% since early 2000. I attribute this to a number of improvements: a more detailed application, which cuts down on joke/dare/experimental applications as well as cases of "not sure what he meant by XYZ, rejecting to be safe"; better server-side checking against accidental duplicate applications; restrictions on applying to very large or complex categories not customarily assigned to new editors; and more consistent quality of listings in various parts of the directory, meaning that applications which ape existing listings are more likely to pass quality muster.
I'll caveat that by saying, however, that I'm only one meta-editor, I rarely work in Adult, Business, Computers, or Shopping, and I no longer handle as many applications in the other categories as I used to.
steveb's observation applies to, er, applications just as it does for site submissions. We do not get terribly many applicants day-to-day for Philosophers/Kierkegaard, Museums/Aviation/Africa, or Crafts/Balloon_Sculpting. For real estate categories, on the other hand...
>Yes, you do. Many (overburdened) editors do it, and quite likely completely by accident.
Laisha, what was mentioned was editors chainsawing through unrevieweds. That implies just dropping a bunch of them in the bit bucket that should have been added. By definition chainsawing would mean what happened was by intent, not accident. As for an overburdened volunteer editor occasionally rejecting, by accident, a site that deserved to be added while slogging through a huge pile of greens, of course that happens. AFAIK, no infallible beings happen to be ODP editors at the moment. ;) Currently I am responsible for handling 500 or so of that 1.1 million unrevieweds at the ODP. As I am not infallible, it is quite possible that I'll end up calling a ball a strike through inefficiency. I'm not perfect.
Also, it may not be chainsawing, but editorial discretion. Every submitter thinks that their site deserves to be in the directory. However, the editor's standards may disagree with that. The very word editor implies that discretion is involved. While there are clear rules, such as that an editor may not favor their own sites or refuse sites on the basis of ideological bias, the guidelines are pretty much open-ended as to what level of quality an editor may decide is necessary for inclusion. A hypothetical: I am the editor of the spit bubble blowing cat. Someone submits a page that they believe is the ultimate authority on blowing spit bubbles. However, IMO it is just a trivial page about that topic that doesn't come close to the quality of the sites already listed, and I reject it. Let's say you look at that page, disagree, and think it is one of the best pages out there about blowing spit bubbles. Does this make me an abusive editor? Or is it that we just disagree about standards? I'd say the answer is the latter.
>I would really like this question answered, though it seems to have been lost in the postings
AFAIK Laisha, the percentage of applicants accepted isn't public data. The problem is that the percentage is irrelevant without knowing how many apps just proved the submitter was barely literate. Or, were dubious because of abuse possibilities. Because of abuse possibilities, rubber stamping all apps in some cats could be a Bad Thing.
Hmmm, for those applicants who get rejected for reasons other than inability to write a coherent and literate description (e.g., cat applied for is too big or doesn't need help, possible abuse concern due to site ownership, etc.), maybe the rejection could offer a trial in a small, neglected cat that really needs help. If an editor was willing to prove himself/herself for a period of time, it might serve both parties' needs. I'm sure it would be possible to generate a list of small cats that haven't had an editor in years... and if DMOZ selects the cat, the possibility of abuse is minimal.
You could start a "submitted but rejected" cat, a sort of B-list. It would keep things out in the er... open.
|By definition chainsawing would mean what happened was by intent, not accident. |
Actually, it's a tool.
rogerd, when a new applicant applies to a category that is too large, too complex, in the wrong language, etc., an effort is made to find a suitable starting place for the new editor. Sometimes, a new category may be created (often the case with an author or movie which might not yet be represented), or the meta-editor may make an educated guess that someone who submits three sites in Hungarian might prefer to edit in World/Magyar. But as often as not, the newly accepted editor has no interest in the category granted them, or may even feel offended ("I'm not some plebeian art dealer, I'm an art broker, I don't want to edit art dealers!") and resign.
Commonly, therefore, where I think the applicant has potential that was not expressed because the application was not taken seriously or s/he did not understand what the ODP actually was, I'll reject "with guidance," with a form letter with suggestions for re-applying. Of course, some don't bother, and others get offended by this as well ("I already *told* you that I am the founder of the true religion, I demand that you give me Society/Religion_and_Spirituality").
As with all volunteer situations, we make an effort to cultivate applications, but you wouldn't accept someone who's illiterate to be a volunteer literature tutor, or for that matter allow someone convicted of lewd conduct to volunteer at a preschool.
There *are* unreviewed submissions dating from mid-1999 in some parts of the directory, but most unreviewed sites over 12 months old are dead anyway. The oldest sites have usually been mis-submitted to the wrong cat and then passed around between seldom-edited cats... slowly.
There *is* spam, lots of spam, especially submitted by bots. Some submitters are extremely aggressive. But the quality of the submissions in general is terrible - a lot of editor time is spent... well... editing. Around 95% of submissions don't meet the guidelines, and a fair chunk of them are in the wrong category. Some sites are just rubbish.
If everyone actually wrote a decent description and found the right cat everything would be better, but without putting people through a Zealot-style test it ain't gonna happen soon.
However, the more care a submitter takes the better. If you want a really good description, write a really good description yourself. If you want to be properly listed, spend some time looking for the correct category. A DMOZ listing is worth probably about $300 a year by commercial rates, so it's worth spending time on the submission process rather than just lobbing a site into the maelstrom and hoping :)
>> To organize the web efficiently you will need a fully automated web directory engine where categorization, description, and submission is delegated to the web page owner. <<
These exist on the Internet. We don't hear about them because the results they return are mostly spam, and of little use to the searcher. The Open Directory Project is one alternative to that sort of thing.
It's not perfect, but I trust that the results are better than we would see if we webmasters were fully in charge.
>> maybe the rejection could offer a trial in a small, neglected cat that really needs help. <<
We offer that sort of advice frequently, and we have sometimes accepted editors to categories other than the one that they applied for. As someone mentions here or elsewhere in these forums, the latter action often doesn't work out; the new editor feels slighted or confused, and either never logs in at all or immediately applies to edit the larger category and quits when he is denied.
I may have missed someone in this thread making a comparison with Zeal/LookSmart, but that is a model which seems to work well.
The commercial element, LookSmart, works (or not) because you pay if you want to be in it, and normal market forces apply. The non-commercial side is taken care of by an army of voluntary enthusiasts who must demonstrate competence as they assume more responsibility.
I have submitted 5 sites to Zeal in the last month as a trainee Zealot and all have been reviewed within 48 hours.
DMOZ is dead in the water unless it changes its operational model in my opinion.
Can I just say that I've signed up for greenbusting duties on an Internet Marketing cat at ODP after reading this thread (well, the beginning of it anyway).
There is an unbelievable amount of crap to wade through.
Do people even read cat guidelines?
I've been deleting porn a lot of the time... not a bad job, but it's still irrelevant to the cat, and the office staff are giving me funny looks... ;)
To all the people that are moaning about the state of the ODP:
1. If you are an editor already, then try to spend a bit more time helping out (I know a lot of you do a lot already!) - if everyone lends a hand then the job will get done that bit quicker.
2. If you are not an editor, either sign up or shut up. You don't have to submit to the ODP - you choose to do so.
As do many, many other people - most of whom submit inappropriate crap into the cats.
The ODP is the one place where you can become an editor and actually make a difference to the state in which it operates. I realise it's time consuming, but let's face it, if you've got time to complain about it, then you've got time to help out.
Just my humble opinion though (and I haven't read much of this thread, so I'm not directing this at anyone in particular)! :)
Now I've got 330 unreviewed to deal with!
When it says 3.5 million in and 1.1 million unreviewed... what exactly is the real daily average of unreviewed sites over time?
What I want to know is if the number of "unreviewed sites" at ODP is getting larger over time.
What is the definition of an "unreviewed site"?
Is it a function of ODP editors being negligent in their duties in reviewing submissions to their categories?
What is it in the process of reviewing a submitted site that is so difficult for editors that it creates a 1.1 million backlog?
Is it the act of actually taking the time to view submissions, or providing suggestions to the submitter, or a problem with the ODP review software or process?
Can anyone shed some light on this issue?
What is an "unreviewed site" definition?
An unreviewed site is a site in an editor's queue waiting for the editor to do something with it. The editor has (I think) five major things they can do:
1/ Publish it in their category
2/ Delete it
^^ Those two things reduce the unreviewed by 1.
3/ Kick it over to another category -- it'll still be in "unreviewed" though technically it has been reviewed by one editor, found not to be suitable for the submitted cat, and passed on.
4/ Leave in Unreviewed while they think about it, ask questions of other editors about its suitability, or other reason to not act on it immediately.
^^ Those two things leave the unreviewed count unchanged.
5/ Put it in "personal bookmarks" -- that's for sites that are not ready yet -- maybe they have promise but are mainly an "under construction" banner.
^^ I don't know off-hand if that drops the count or not.
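The five actions and their effect on the unreviewed count can be summarized as a small lookup. This is an illustrative model of the post above, not the ODP's editing software; `Action` and `count_delta` are hypothetical names, and because the effect of personal bookmarks is unknown, it is left as a parameter.

```python
from enum import Enum


class Action(Enum):
    PUBLISH = "publish in this category"
    DELETE = "delete"
    MOVE = "send to another category"
    HOLD = "leave in unreviewed"
    BOOKMARK = "move to personal bookmarks"


def count_delta(action: Action, bookmark_leaves_queue: bool = False) -> int:
    """Change in the directory-wide unreviewed count for one editor action."""
    if action in (Action.PUBLISH, Action.DELETE):
        return -1  # the site leaves the queue for good
    if action in (Action.MOVE, Action.HOLD):
        return 0  # reviewed or not, it still counts as unreviewed
    # BOOKMARK: the post does not say how it is counted, so it is a parameter.
    return -1 if bookmark_leaves_queue else 0
```

Summing `count_delta` over a day's work shows why an editor can review dozens of sites and barely move the number: moves and holds, like the ten sites in the example that follows, contribute nothing.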
Take my example. I never use the personal bookmarks. I have ten sites that come up as "unreviewed". I have reviewed every one of them, and they probably belong in my cats. But:
-- Several of them are not much more than an "under construction" page (they might be a promising home page, but all the menu items 404). I'll revisit these once a month or so to see if they've made any progress.
-- Several of them are listed in World under other languages and they are claiming to now have enough english pages to get a listing in my cats. I disagree: when they've done a bit more work, I'll add them.
-- In two cases I'm awaiting clarification from the site owner. In both cases I've been waiting just about a year for them to reply to my email.
I suppose it'd help if there was an "under consideration" status, because that is where those ten sites are.
It'd also help if we could distinguish:
-- sites in unreviewed that are not listed. That's new sites trying to get in;
-- sites in unreviewed that are listed. That's sites trying for a second listing in another category.
|Is it a function of ODP editors being negligent in their duties in reviewing submissions to their categories? |
You need to remember that these guys are volunteers, they do not have to turn up for "work" every day :)
|What in the process of reviewing a submitted site that is difficult for editors to create a 1.1 million backlog? |
1. There are not enough editors. All sorts of reasons for that - not enough apply, too many that do apply are turned down, editors get fired, editors get fed up with the politics and leave, ...
2. The editors that are there do not process enough sites. Why? They do not have a quota to process, and nobody pays them, so it's up to each editor how much editing they do.
3. The quality-versus-quantity argument: does an editor try to keep the unrevieweds down by processing a lot of submissions in a given time, or process fewer and give each submission more consideration?
You could try applying to become an editor, you will probably get turned down, but at least you will have tried!
>What I want to know is if the number of "unreviewed sites" at ODP is getting larger over time.
From the last stats I saw, it doesn't seem to be getting larger. Thus the current editors are at least keeping up. One thing to note is that the backlog of unrevieweds depends a *lot* on the category in question. There are a lot of areas of the ODP where the category editor has just that cat, or maybe a few other small cats. Many of these editors are conscientious and log in weekly or more frequently. Thus, a site submitted there will likely get reviewed in a week or less. However, there are other areas of the ODP without editors (or, more accurately, the only editor is way higher up the tree and has so many sites to worry about in his cat space that he doesn't have time to deal with many of the unrevieweds). I was recently approved for a new, rather large cat space as an ODP editor that had over 500 unrevieweds, many of them 6 months old. Thus, if you submit a site to some small hobby cat with an active editor, you may get in lickety-split. However, in this new cat space of mine, for example, if someone ran a substance abuse center and submitted it 6 months ago, only now is an editor actually reviewing the submission. Such is the way things are at the ODP.
>I have submitted 5 sites to Zeal in the last month as a trainee Zealot and all have been reviewed within 48 hours.
>DMOZ is dead in the water unless it changes its operational model in my opinion.
The problem here is that the ODP model was designed so that submissions would be free. Also, the reason the Zeal model works so well is that Zealots *can't* add commercial sites. There isn't much incentive for abuse in categories with informational/hobby type sites, so few such editors ever abuse their position. The ODP, however, also lists commercial sites. If the ODP used the Zeal model, where anyone with enough inclination can become editor of almost any cat, I am sure *many* people would become ODP editors for corrupt reasons. The ODP by necessity has to be very careful about whom it lets edit commercial cats with lots of abuse potential, and the metas need to keep watch to make sure everything is being run honestly. Thus, I don't think that changing things at the ODP to get down the number of unrevieweds would be a good idea if it meant a lot of corrupt editing happening.
Make that 1,099,991 unreviewed.
An editor of a specific cat I've been interested in either recently checked his email and acted on my inquiry, or else Santa showed up early for me. Either way, I owe somebody a big ol' plate of cookies and some nog.
My list of submissions, each covering the specific information for a separate geographical area within the NFP I aid, is now listed after just sitting there for a while. I would never have thought to write an email if I hadn't read the idea here.
Happy holidays to all.
|You need to remember that these guys are volunteers, they do not have to turn up for "work" every day |
That is part of the problem. Some editors join solely to promote their "own" websites or are SEO pros.
I believe that editors should be fired if there is inactivity in their editing for say ...a one month period.
Why volunteer for ODP if you don't do your volunteer job? No one asked them specifically to join. The negligent editors read the TOS and said they would/could do the job.
Shame on those who take on the task and don't or fail to perform.
|I believe that editor's should be fired if there is inactivity in their editing for say ...a one month period. |
They are. An editor is removed if he or she doesn't make an edit in their first month as an editor, or one edit every four months thereafter.
As for the rest of your post - every contribution to the ODP is valued. Even if an editor makes one edit and then resigns, that's one edit more than would have been done otherwise. These people are editing for nothing except personal gratification. It's a hobby not a job.
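Read literally, the removal rule described above amounts to two time windows. The sketch below is one hypothetical interpretation, not the ODP's actual code: `should_be_removed` is an invented name, it assumes a "month" is 30 days and "four months" is 120 days, and it checks every gap between consecutive edits (and between the last edit and today) against the longer window.

```python
from datetime import date, timedelta

FIRST_WINDOW = timedelta(days=30)   # assumption: "first month" ~ 30 days
LATER_WINDOW = timedelta(days=120)  # assumption: "four months" ~ 120 days


def should_be_removed(joined: date, edit_dates: list, today: date) -> bool:
    """True if an editor has broken the (assumed) inactivity rule by `today`."""
    edits = sorted(d for d in edit_dates if joined <= d <= today)
    if not edits:
        # No edit yet: removable only once the first-month window has passed.
        return today - joined > FIRST_WINDOW
    if edits[0] - joined > FIRST_WINDOW:
        return True  # the first edit came too late
    # After the first edit, no gap (including up to today) may exceed four months.
    checkpoints = edits + [today]
    return any(later - earlier > LATER_WINDOW
               for earlier, later in zip(checkpoints, checkpoints[1:]))
```

Under this reading, an editor who adds a single site every other month is never removed, which matches the point made below that such an editor is still contributing.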
I believe that editor's should be fired if there is inactivity in their editing for say ...a one month period.
Um, and how exactly would that reduce the backlog of unreviewed sites?
If an editor adds a single site every other month, then he's contributing something useful to the project. What's the point in firing such a person?
rfgdxm1... thanks for the info. You are a wealth of information!
Excellent point, bird. Getting rid of editors won't reduce the backlog of greens. It should also be noted that higher level editors can also edit child cats beneath them, including those that have editors. Thus, if there is an editor in a nearby parent cat, that editor likely will review the site if the lower editor has not edited in a long enough time.
>What I want to know is if the number of "unreviewed sites" at ODP is getting larger over time.
Right now it seems to be fairly stable.
>What is an "unreviewed site" definition?
Could be many things, including:
-Site submitted to correct category, waiting to be reviewed
-Site submitted to absurdly incorrect category, waiting to be (or slowly being) moved toward the correct category
-Nonfunctional but interesting-looking site, waiting for webmaster response to editor query [relatively rare]
-Formerly functional URL, being held to see if it will come back at the same place (or waiting for an editor to try to hunt it down at its new location, if any)
-Incomplete but interesting site, being held to see if it will be completed.
-Possibly legitimate, possibly spam site waiting for more thorough investigation.
-Functional site, being moved to another category because of a general reorganization, or because it had been accepted to a wrong category at first. [hopefully rare]
-Site, already reviewed and rejected dozens of times, being submitted yet again.
>Is it a function of ODP editors being negligent in their duties in reviewing submissions to their categories?
No. ODP editors are volunteers. There are no assigned "duties". There are only "editorial privileges."
>What in the process of reviewing a submitted site that is difficult for editors to create a 1.1 million backlog?
The biggest problems are deceptive spam, really badly mischosen categories, and taxonomic difficulties.
Many sites submit the same content under multiple (sometimes thousands of) different names, with slightly different presentation, hidden behind various schemes that cannot be automatically detected. In some categories, editors may have to spend half an hour or so to find whether a site has any possibility of having unique content. (Submittals may pile up in such areas, if editors don't want to deal with the garbage. And the good sites have to wait with the other 99%.)
If you submit a site written in French (or any of the other dozens of barbarian languages the ODP contains) to one of my categories, I'm not going to be ABLE to review it. I have to try to guess the language (if it uses a non-Latin-alphabet, I won't even be able to do that) and send it someplace where an editor might be able to read it. Or if you submit a pen-and-pencil retail site to Arts/Literature/Authors (because authors write with pens, and you're engaging in marketroid-think target-audience-targeting, and you are too stupid to see that most of the Authors are actually DEAD) (I am NOT making this up!) then when I'm editing Literature sites, I'm not going to drop everything to explore a part of the Shopping category I've never visited before. I'll move that site to the "Badly-Misplaced" holding tank, waiting for some volunteer (which on some other day might be me) to feel inclined to do some rough site-sorting.
And finally, the ODP taxonomy is still growing. Sometimes, in some places, sites build up because we haven't yet figured out a good way to organize them.
>Is it (#1) the act of actually taking the time to view submissions, or (#2) providing suggestions to the submitter, or (#3) a problem with the ODP review software or process?
Generally we don't DO #2. #3 is not a major problem: the ODP editing interface is IMO one of the half-dozen slickest website interfaces I've seen. It could be improved; I hope it will be improved, but most of the time is spent (1) reviewing and (2) categorizing sites. (Note that if you categorize well, the review WILL be faster, and the categorization will be MUCH faster.)
We could always talk about the number of sites that the ODP HAS listed.