A snapshot of the dmoz unreviewed queue
quality versus quantity
Much debate takes place in this forum about the time taken to process the unreviewed pools of sites in dmoz. Editors frequently comment that it can take a long time to find listable sites in the unreviewed pool, let alone ones with guidelines-compliant titles and descriptions. Webmasters frequently comment that sites should be listed more quickly.
So, to add some real data to the pot, I'll highlight the progress made in one of the top-level category trees that receives a lot of webmaster submissions.
In the last two weeks and two days, the unreviewed pool has shrunk by just over four thousand sites, approximately 20% of the total pool in that category tree. In the same period, the number of listed sites within that category tree has increased by fewer than 350. So in this reasonably sized sample of reviewed sites, less than ten percent were listable in the branch where they were submitted. The rest were 404s, under construction, sites with very limited content, spam, duplicates, mirrors, affiliates or mis-submitted sites.
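As a quick check on the arithmetic, here is a minimal Python sketch that plugs in the post's approximate figures (about 4,000 sites cleared, fewer than 350 new listings, roughly 20% of the pool). The variable names are illustrative, and the inputs are the post's rounded numbers, so the results are only as precise as those.

```python
# Approximate figures quoted in the post; all three are rounded.
sites_cleared = 4000     # "just over four thousand" removed from unreviewed
new_listings_max = 350   # "less than 350" sites added to the listings
percent_of_pool = 20     # cleared sites were "approximately 20%" of the pool

# Upper bound on the share of reviewed submissions that were listable:
# the largest numerator over the smallest denominator keeps it a bound.
listable_share = new_listings_max / sites_cleared

# Implied size of the unreviewed pool before the cut (integer arithmetic).
implied_pool = sites_cleared * 100 // percent_of_pool

print(f"listable share: under {listable_share:.2%}")
print(f"implied pool before the cut: about {implied_pool:,} sites")
```

That works out to under 8.75%, consistent with "less than ten percent", and implies an unreviewed pool of roughly 20,000 sites in that tree before the cut.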
So when dmoz editors tell you to submit a guidelines-compliant title and description so that your submission stands out from all the rubbish in the unreviewed pool, you can trust that it's extremely good advice.
Good post, John.
I am amazed that there is such a high proportion of rubbish submissions editors have to wade through.
You can lead a horse to water......
> less than ten percent were listable in the branch where they were submitted
In other words, 90% of the time spent by editors in reviewing new submissions is wasted.
It's amazing how often posters at WebmasterWorld jump straight to the conclusion that their site has not been listed because of editor abuse.
As we can see from your statistic, 90% of the delay is caused by submitter abuse. With this level of abuse, perhaps it is time either to close the submission/suggestion page or to require a longer procedure that does more to assure quality as part of site suggestion.
Perhaps add a note to the submission page telling webmasters NOT to submit sites ahead of time simply because "by the time it gets listed, it'll be ready". I've seen this so often: because it can take 3 months to list a site, submitters assume it will not be looked at for 3 months. Instead, it might be looked at in the first week and then not be listed for another 3 months for various reasons.
Perhaps it's worth making it TRIPLE clear to folks that submitting unfinished sites "ahead of time" will lead to the site not being listed at all.
Excuse me for hijacking the thread, but I was wondering how long it takes from a site being found in a search on dmoz.org to the site actually appearing in the category?
>In other words, 90% of the time spent by editors in reviewing new submissions is wasted.
I'm guessing the area he is writing about is Shopping or Business. 90% of all submissions being bad seems high overall.
Killjoy - AFAIK the two are independent since the second one is entirely dependent on when the public pages get updated - see multiple posts on the subject in the forum. Apparently the public pages are starting to be updated now, cat tree by cat tree.
<back on topic>
rfgdxm1 - actually I'm talking about Health. This problem is not confined to Shopping and Business. Health is full of online pharma sales affiliates, and Regional travel and tourism cats plus Recreation/Travel are full of hotel booking affiliates; no doubt other category trees have their own particular bugbears.
The actual ratio of listable submissions varies by category type. At a guess, shopping, gambling, fan-site, and adult categories would tend to have a very low ratio, such as the 10% listability reported by John. John didn't mention the possibility that a listing could be forwarded to a different category type, but in these types of categories the number of listable but mis-submitted sites would tend to be very low. (Submissions can be both mis-submitted and unlistable, though.)
I edit a number of categories related to insurance, so my examples will reflect a limited range of ODP category types.
Non-commercial categories, such as those for government insurance regulators or non-profit insurance associations, have a very high ratio of listable submissions. (Well, in the regulators' case there have been very few submissions; most listings are direct adds by editors.) The ratio could be as high as 90%, with maybe 20% of those being listable submissions that needed to be forwarded to another category.
Consumer-info categories related to commercial products (e.g. Home/Personal_Finance/Insurance) have about a 67% listability factor. Half of those are mis-submitted commercial sites that need to be forwarded for review in an appropriate category. I think of this as 1/3 accept, 1/3 forward, and 1/3 delete.
Personally, the lowest listability I have experienced is in commercial categories related to the Cash Flow industry. (How many times must one editor read, "We have thousands of investors eager to buy your note!"? ;) Only about 40% of submissions have been listable there.
The above numbers are rough order-of-magnitude estimates; I haven't tracked actual statistics.
I think it is time for DMOZ to invest in some simple filtering tools to speed up the process and stop wasting editors' time.
404s and redirects are easy for bots and crawlers to detect. Mistyped or non-existent sites should be filtered out at the top and never let into the review pool.
The "pending" sites should be re-scanned at least once every two weeks to make sure that they are still alive.
If 10% of all submitted sites are 404s, redirects, or dead domains, filtering them out would make a very big difference to the amount of useful work editors actually get done.
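A minimal sketch of the kind of pre-filter suggested above, in Python using only the standard library. It sends a HEAD request without following redirects and sorts the result into rough buckets. The function names, the bucket labels, and the 14-day recheck interval (taken from "once every two weeks") are illustrative assumptions, not anything the ODP actually ran. Some servers answer HEAD badly, so a real filter would probably retry with GET before discarding a site.

```python
import http.client
import urllib.error
import urllib.request
from datetime import datetime, timedelta

RECHECK_INTERVAL = timedelta(days=14)  # assumed: "at least once every two weeks"


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Make urllib report 3xx responses instead of silently following them."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 3xx code


def probe(url, timeout=10.0):
    """Return the HTTP status for url, or None if nothing answered at all."""
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        req = urllib.request.Request(url, method="HEAD")
        with opener.open(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # redirects and 4xx/5xx land here
        return err.code
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Unknown host, refused connection, timeout, or an unparseable URL.
        return None


def classify(status):
    """Sort a probe result into a rough (hypothetical) triage bucket."""
    if status is None:
        return "dead"       # no server answered
    if 200 <= status < 300:
        return "ok"         # still needs a human review
    if 300 <= status < 400:
        return "redirect"
    if status in (404, 410):
        return "not_found"
    return "error"          # any other 4xx/5xx


def due_for_recheck(last_checked, now=None):
    """True if a pending site has not been re-probed within RECHECK_INTERVAL."""
    now = now or datetime.now()
    return now - last_checked >= RECHECK_INTERVAL
```

Usage would be along the lines of `classify(probe(url))` for each pending URL, with only the "ok" bucket passed on to an editor. Since a site can be down temporarily, one possible policy is to re-probe the other buckets on the recheck schedule rather than drop them on the first failure.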
Regarding sites that are found to be "under construction":
How about a standard email back ....
Submission dropped ref terms no xyz.
When the site is complete, submit it appropriately; we do not review incomplete sites.
...but the submission guidelines, which submitters agree to at submission time, clearly state that such sites aren't accepted. Surely you aren't suggesting that some submitters don't bother to read them :).
I can edit in one of the Business categories. Most of the submissions to that category ARE worthy of inclusion somewhere in the directory in my opinion, but not in this category. I spend most of my editing time finding the correct categories for these sites and sending the submissions to other categories.
If I had any ambitions about moving up within DMOZ (I don't) I might be unhappy about this because (as far as I know) you get editing brownie points for adding, deleting, and modifying entries, not directing them to other categories. Anyhow, redirecting sites takes a lot of my time. I log in to edit about 2 or 3 times a month and spend all my time trying to make a dent in the unreviewed queue. I haven't looked at the sites in the actual category for at least 10 months or so. There are probably many descriptions that are out of date. The higher-ups at DMOZ would probably prefer that I spend time on those sites than the unreviewed ones, but I really want to do something about worthy sites that haven't been reviewed. The rate of submission about equals what I can process, so although the queue is big it remains about the same size over time.
>>you get editing brownie points for adding, deleting, and modifying entries, not directing them to other categories<<
You misunderstand a thing or two about editing if that's what you really think.
Editing statistics include "activity in the unreviewed queue" now. (They didn't last year.) Of course, 10,000 edit/brownie points can be exchanged for, um, 600 hours of free time ... no, that's not right, it's the other way around, the hours are non-refundable.
>How about a standard email back ....
> Submission dropped ref terms no xyz.
At this point, I can't imagine any benefit to the ODP of communicating with spammers. And even the stupidest spammer can tell whether the site being submitted is actually there -- all it takes is pasting that same URL in the browser window of the same browser he's using to spam the ODP with.
Also, we certainly don't see this as a major part of the spam problem, so it's unlikely to call for the special attention of our limited programming allocations. There is a very rare problem with "site previously submitted temporarily down" (though it is dwarfed by invalid URLs, nonexistent sites and sites under construction). So if we ever get to the point of giving more automatic status information, I don't see any immediate HARM to the ODP in allowing a status of "site down or empty when reviewed" to show publicly.
If we want to start automatic e-mails (and only staff knows whether this is true) the place to start would be with HELPFUL submitters. "Thank you for suggesting the site xxx, it has been reviewed and listed in category yyy."
Another place to start would be better up-front validation of site suggestions. (This may actually be under consideration.) If the URL isn't well-formed, or the page can't be loaded, the form could immediately reject it (with a message). There's no reason to get e-mail involved for this purpose.
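A rough sketch of that up-front check in Python, standard library only. The function names, the messages, and the rule that a suggestion needs an http or https scheme plus a dotted host name are illustrative assumptions, not the real suggestion form's behaviour. The page loader is passed in as a parameter so the form logic can be exercised without network access.

```python
import http.client
import urllib.error
import urllib.request
from urllib.parse import urlsplit


def is_well_formed(url):
    """True if url is an absolute http(s) URL with a dotted host name."""
    url = url.strip()
    if not url or any(ch.isspace() for ch in url):
        return False
    try:
        parts = urlsplit(url)
        host = parts.hostname  # None when there is no network location
    except ValueError:  # e.g. malformed IPv6 brackets
        return False
    return parts.scheme in ("http", "https") and bool(host) and "." in host


def page_loads(url, timeout=10.0):
    """True if a GET (following redirects) ends in a 2xx response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        return False


def validate_suggestion(url, loader=page_loads):
    """Return an error message for the form, or None if the URL passes."""
    if not is_well_formed(url):
        return "Please enter a complete address beginning with http:// or https://."
    if not loader(url.strip()):
        return "That page could not be loaded. Please resubmit once the site is live."
    return None
```

Rejecting at the form keeps the bad URL out of the unreviewed pool entirely, and no e-mail is involved: the message goes straight back to the submitter on the same page.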
Hi hutcheson, re my comment "How about a standard email back .... Submission dropped ref terms no xyz."
What I meant mainly was for those who submitted sites that were clearly under construction when checked by an editor.
I think it is a mistake to post "under construction" on anything; if it is unfinished, don't upload it or link to it.
I just hope working patterns mean such sites do not delay the processing of those that comply with the guidelines.
I said some slightly harsh things about dmoz recently, probably unfairly. My feeling is that for businesses there is some merit in "pay for review". It means hours can be allocated and unbiased people paid for their time.
I guess that is my main gripe with the way odp operates where businesses are concerned.
Developers of business sites know a dmoz listing is important. Genuine developers can have no interest in submitting sites that do not comply with the guidelines or that are incomplete.
"pay for review" and "unbiased editors" are an unhappy couple, since editing quality would inevitably drop in an environment where the editor has a financial incentive to review many sites in a short space of time.
>I just hope working patterns mean such sites do not delay the processing of those that comply with the guidelines.
They do delay the processing of guidelines-compliant submittals. Spam is the main cause of that delay.
>My feeling is that for businesses there is some merit in "pay for review". It means hours can be allocated and unbiased people paid for their time.
>I guess that is my main gripe with the way odp operates where businesses are concerned.
ODP has both the strengths and weaknesses of its design. I believe it adds more sites daily than the top two pay-for-review sites put together. Every time I compare a category in Yahoo and the ODP, the ODP category is more current and larger. [This is probably NOT true for business categories, where I as a surfer don't spend much time.] Our forty-mule team pulls much more weight than your thoroughbreds.
One weakness is that its social organization is designed entirely around no-pay-for-review, so it has no way of harnessing thoroughbreds for additional speed. I believe the original ODP concept was for the ODP to deliver the borax to the nearest city, from where thoroughbreds belonging to the various general stores could deliver individual packets to people doing their laundry. This seems to be the way, for instance, Google's various directed advertising programs are moving, although they're not doing it in exactly the same way.
>Developers of business sites know a dmoz listing is important. Genuine developers can have no interest in submitting sites that do not comply with the guidelines or that are incomplete.
No argument here either. It is just that we see ALL of the faux-developers -- and their friends -- usually twice daily.
>>You misunderstand a thing or two about editing if that's what you really think.<<
Gee, Keith, I think you misunderstand a thing or two about Internet message boards.
This is the kind of remark that gives DMOZ a bad name with the general public.
Just when CrimsonGirl had me thinking there was hope for the ODP because it is mostly made up of good people who want to make the web a better place, you brought me back to reality. It's scary to think of all the kctipton wannabes out there ready to pounce.
> You misunderstand a thing or two about editing if that's what you really think
It's a happy misunderstanding, though, in this case. (-: You've gotten credit for it every time you moved a site to a more appropriate category, just as much as you would have for adding a new site or deleting a bad one.
>>This is the kind of remark that gives DMOZ a bad name with the general public.<<
This web board and the general public haven't got a lot in common.
As for unreviewed, I've posted in the past that the duplicate submission problem in a tree of categories and subcategories can run from 1 to over 10 percent of the total. The higher numbers are usually in the more competitive categories.