DMOZ Submission Review Delays and Automating "Community" Approvals

Going forward would it make sense to change the system?

Webwork

5:37 pm on Dec 10, 2006 (gmt 0)

One of the consistent gripes that has appeared in this forum takes the form of "I applied months ago . . . "

There is always an explanation or response.

Yet, I pause to wonder: Is there the possibility of a better way of processing submissions AND is that better way going to be programmed into the new (hopefully) release of the ODP?

Why not default submissions to approval and listing after 60-90 days?

Why not allow default listings to appear, after that time, with a marking that it "has not been reviewed"?

Why not allow "approved at-large reviewers" to vote on default listings, which will automatically throw them back into the queue if enough at-large reviewers confirm that either a) the website is NOT what it purports to be; or, b) it doesn't fit in the category?
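To make the idea concrete, here is a rough sketch of how default listing plus at-large voting might be wired together. Everything specific in it is my own assumption, not anything the ODP has: the 90-day cutoff (picked from the 60-90 range above), the three flags it takes to requeue, and the needs_editor marker that stops a requeued site from simply defaulting back in.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from enum import Enum


class Status(Enum):
    QUEUED = "queued"
    UNREVIEWED_LISTING = "listed, has not been reviewed"


# Hypothetical thresholds: the proposal only says "60-90 days" and "enough" reviewers.
DEFAULT_LISTING_DAYS = 90
FLAGS_TO_REQUEUE = 3


@dataclass
class Submission:
    url: str
    category: str
    submitted_on: date
    status: Status = Status.QUEUED
    needs_editor: bool = False          # assumption: set once the community requeues it
    flagged_by: set = field(default_factory=set)


def apply_default_listing(sub: Submission, today: date) -> Status:
    """List a still-queued submission, marked unreviewed, once the wait has elapsed."""
    waited = today - sub.submitted_on
    if (sub.status is Status.QUEUED
            and not sub.needs_editor
            and waited >= timedelta(days=DEFAULT_LISTING_DAYS)):
        sub.status = Status.UNREVIEWED_LISTING
    return sub.status


def flag_listing(sub: Submission, reviewer_id: str) -> Status:
    """Record one at-large reviewer's objection (wrong category, or not what it
    claims to be). Enough distinct reviewers send the listing back to the queue."""
    if sub.status is not Status.UNREVIEWED_LISTING:
        return sub.status
    sub.flagged_by.add(reviewer_id)     # a set, so repeat flags from one reviewer count once
    if len(sub.flagged_by) >= FLAGS_TO_REQUEUE:
        sub.status = Status.QUEUED
        sub.needs_editor = True         # blocks another automatic default listing
        sub.flagged_by.clear()
    return sub.status
```

The one design choice worth arguing over is the last step: without something like needs_editor, a site voted back into the queue would be old enough to default straight back into the listings on the next pass.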

Isn't it time to move towards a Web 2.0 model of listings?

Webwork

5:28 pm on Dec 12, 2006 (gmt 0)

I envision a model where submissions are tagged with the identity of the submitter and that, as the community votes to 'upstream' the submission, the count of upstreamed and approved submissions begins to add 'movement weight' to the initial submitter's ability to 'forward' (expedite) future submissions. The essential 'voting on the voter' model, the granting of authority by the community by their agreements with the initiator's votes. Such authority can be lost or put on hold, too, by evidence of either loss of judgment or corruption. It's simply the real world of election-appointment-authority-impeachment modeled by software and applied and monitored in real-time. Somewhat like daily elections. (Before we throw out Congress you might want to ponder some of the historical reasons cited by the founders for building in certain inefficiencies.)
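A back-of-the-envelope version of 'voting on the voter' might look something like the sketch below. The names and numbers are all hypothetical: I am assuming a rejected submission costs twice what an approved one earns, that a weight of 5 is enough to expedite, and that 'impeachment' is a simple suspended flag.

```python
from dataclasses import dataclass

# Hypothetical tuning knobs; none of these values come from any real system.
EXPEDITE_THRESHOLD = 5
REJECTION_PENALTY = 2


@dataclass
class Submitter:
    name: str
    approved: int = 0       # submissions the community upstreamed and approved
    rejected: int = 0       # submissions the community voted down
    suspended: bool = False # authority put on hold (the 'impeachment' case)

    @property
    def weight(self) -> int:
        """'Movement weight': approvals earn it, rejections cost more than they earn."""
        return max(0, self.approved - REJECTION_PENALTY * self.rejected)

    def can_expedite(self) -> bool:
        """Only a submitter in good standing with enough weight may fast-track."""
        return not self.suspended and self.weight >= EXPEDITE_THRESHOLD


def record_outcome(submitter: Submitter, approved: bool) -> None:
    """Feed the community's verdict on one submission back to its submitter."""
    if approved:
        submitter.approved += 1
    else:
        submitter.rejected += 1
```

Because weight is recomputed from the running counts, authority is never granted once and for all: every new verdict is another small election.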

A modified version of 'Digging', applied to the accretion of entries to a directory?

Perhaps the overseers need to remain in place, perpetually or at least until the algorithm / process is fine tuned. But at a certain point one would think that whatever mental or judgment process the overseers now apply could be modeled by an application/software, only to run better, in the sense that allowing the editorial process to scale to tens of thousands of realtime participant-editors might a) speed up the submission-review-acceptance-listing process; b) democratize the setting of standards of value (a website might get listed but its appearance might reflect a scaled community value judgment, one that is subject to change since websites change); c) other benefits?

As stated earlier: The objective of this thread is not to end the conversation with the statement "the system is broken". No, the system works - as is - to whatever degree the users judge it to work and, obviously, in a manner acceptable to those who have a say in maintaining or changing the status quo. However, the mere fact that a majority of control says "No, it's good enough" isn't the end, especially in the context of an open source or GNU data source.

I am certain that the existing model will and can survive - as is - so long as there are editors willing to participate in the 'as is' model. However, somewhere in the world, there may be those who are working right now on a democratized or open-to-the-public-open-source version of the ODP, where submissions and approvals will be handled in the manner roughly outlined so far.

Might it be better for the existing overseers to at least run an experiment of a new submission-approval model, as a parallel process to the existing version of the ODP process, or at least as an experiment to be applied to some section of the ODP? Not my call. I'm just opening a dialogue with the idea that there might be some merit to testing the community editing process for a directory.

Experimenting - trying on evolution - while it holds the possibility of extinction or threat to the procedural and control status quo, actually smells not of death but of new procedural-control "life" springing from old.

But who ever said risking death (of authority or control) or letting go was easy? ;0)

Besides, maybe we cannot trust the greater good of the greater community. Maybe the spammers outnumber the masses?

We are borgspam? We are all spammers at heart?

[edited by: Webwork at 6:40 pm (utc) on Dec. 12, 2006]

hutcheson

5:36 pm on Dec 12, 2006 (gmt 0)

The CAPTCHA "solution" keeps coming up, but it's a solution to a problem we don't have.

No doubt, it's very good for keeping bots out of forms on low-to-medium-visibility websites.

For high-visibility sites, there is a high motivation for the botmakers to break CAPTCHA--and in this as in all other areas, automated attackers have an inherent advantage over visible automated defenses. And the ODP is a high-visibility site.

And secondly, my impression is that bots are not a significant unsolved problem for the ODP: our approach is we let the bots in, then WE attack THEM where THEY are the fixed visible defenders. We have the inherent advantage, and they have the insoluble problem.

So what's our actual problem?

Professional spammers: people who have memorized the process because they do it several times a day, repeating every week or month. No one spammer causes much trouble, it's just that there are a few thousand of them, new ones appearing every day, old ones giving up in frustration and despair. But these people are the ones who go around suggesting every new site to dozens of "directories" and "blogs" and anywhere else user input goes. The ODP is near the top of their target list. So we make them waste another thirty seconds of mickey-mousing each time they submit, will that slow them down? No, they'll STILL HIT US FIRST, but they'll just drop off a blog or two at the bottom of their target list. (Which helps US not at all!)

So: CAPTCHA simply will not address the kind of spam we face!

The people who follow our submittal policies--suggesting their one unlistable site once--aren't a problem, even though many of them suggest unlistable sites. We can ignore them in any analysis: their effect is dwarfed by the professional recidivists. These people might or might not be put off by CAPTCHA: it doesn't matter and it's not worth asking the question.

But the most important problem is the issue of who CAPTCHA would put off. It should be obvious that CAPTCHA wouldn't slow a spammer down at all. And it wouldn't slow you professional SEOers down at all. You get paid, and you'll go through the mickey-mouse if that's what it takes to get your clients' money.

THE ONLY PEOPLE WHO WOULD BE PUT OFF BY MICKEY-MOUSE REQUIREMENTS ARE THE PEOPLE THAT THE SUBMITTAL FORM WAS DESIGNED FOR.

That is, casual well-wishers, the kind of people who AREN'T highly motivated to get past barriers of institutional distrust, but who are willing to repay trust with good information--if it isn't too much work. The kind of people who might become editors if they didn't get the impression that we were trying to make it artificially hard for them to help.

Remember, our "spam" is not one jerk in Moscow or Chicago e-mailing 40 billion possibly impotent Lotharios. Our "spam" is more like what you'd see described as a "distributed denial of service" attack -- done, so far as I've seen, primarily by hand. One jerk doesn't suggest 500 sites a day (well, usually, but that's been taken care of!). Each jerk just stands in line daily or weekly to add his straw to the camel's back that day or week, then disappears.

And ... that pattern of site suggestion, as a pattern, is indistinguishable from the pattern of suggestions from genuine business-to-business web development services, who regularly create websites for actual real-live persons, partnerships, organizations, and companies of people working together in business -- who suggest each site to the ODP as it is published. These suggestors are a help that we don't want to give up. So spotting spam is more complex than just looking at patterns of suggestions.

willybfriendly

5:42 pm on Dec 12, 2006 (gmt 0)

It seems to me that virtually every attempt to discuss DMOZ and any problems or improvements follows a very predictable path.

The Editors know best. The staff knows best. No one from the outside really understands. The system is as good as it can be given the mission of DMOZ and the resources available.

We have all heard it before...

The responses are a sign of a detached, self-absorbed, and ultimately doomed system. I am not criticizing individual editors here, since I know that many of them are in fact dedicated.

However, DMOZ suffers from group think. It shows in threads like this one. It shows in responses to appeals about individual decisions made by individual editors. It shows in the condescending responses towards webmasters that try to engage DMOZ volunteers in dialogue.

The rigidity is killing the project. Look at this thread. Four pages of Why don't you... Yes, but...

WBF

Webwork

6:05 pm on Dec 12, 2006 (gmt 0)

Actually, WBF, to the extent that the existing ODP editors and volunteers wish to perpetuate the existing model it will continue to work - within the limits of the model. The existing model may be the very best model, as hutcheson has done a very good job arguing.

It can be argued that the existing model is broken only by comparison to other models (Wiki, Digg, etc) but some of them are showing cracks with Digg being subjected to efforts to inflate certain Diggs for their promotional value.

No one as yet has fashioned an operating automated directory accretion model built upon community editorial approval. All we see here are attempts to tease out the model and analyze its strengths or weaknesses.

My money is on the possibility of 'collectivizing' at least some part of the DMOZ/directory submission and approval process. IF anyone is positioned to both take the lead and to benefit from the effort - if it succeeds - it's the ODP. My concern (?) is that for want of at least attempting to launch the idea as a controlled experiment the existing ODP may cede oversight of the version of the ODP that the public is most likely to access. I would suggest that the very act of attempting to democratize the submission and approval process might be likely to expand the user base.

And, yes, lurking out there are legions of spammers. The same ones that lurk about Wikipedia, Digg, MySpace and so on. AND the only way to address the "how to" of dealing with spammers is to run tests and deal with spammers' evolving tactics too, which may come down to a future of assigned IP numbers or who knows what. Those who are at work on the "how best to" automatically and democratically deal with the problem likely will have some small advantage going forward over the "we'll do it by hand" crowd.

Webwork

6:43 pm on Dec 12, 2006 (gmt 0)

Can a version of "Digging" be applied to the submission and approval of listings in a directory?

Does Digg work? Why or why not and how might the model apply - or not apply - to building a DMOZ style directory?

Maybe it's time for the DMOZ 'widget' or browser plugin or toolbar? Visit a website. Like it? Submit it in realtime from your browser or desktop within a few clicks?

The existing DMOZ model is valid, is likely to go on and on, and it works. As much has been said about every business that had a status quo, such as classified advertising, print journalism, music distribution, telecommunications, buying airline tickets, booking a hotel, . . .

The impossibilities of effective change, of opening up and democratizing or streamlining a business process, are often or always endless . . . until some start-up proves the assumptions wrong.

[edited by: Webwork at 7:46 pm (utc) on Dec. 12, 2006]

hutcheson

7:56 pm on Dec 12, 2006 (gmt 0)

Might I suggest that question is worth starting a separate thread for?

There are probably people here who have experience in such projects. And there are all sorts of potentially interesting questions:

(1) What kind of abuse-prevention techniques are compatible with Wiki content development processes?

(2) How well do they work against highly-motivated malicious spammers trying to drop links, and how well do they scale up?

(3) How do you build (and do you try to identify) the core community? How do you handle highly-active saboteurs?
