It probably goes without saying, of course, that the categories which rely upon user submissions are also usually the ones with 12-hour turnaround times. (-:
helleborine: So I threw in the towel and searched the ODP data from Google and to my delight, found a well-maintained category, without spam, SEO or other aggressive nonsense. It was like an oasis of peace, I had come to the right place.
Amen to that. There are certain topic areas that are so spammy in all SEs including Google that the results are useless, and the only way to go is with a directory, and for me that's typically dmoz.org, the Google Directory or Alexa.
There's some interesting ideas floating around, but you need to realise that the ODP is pretty conservative when it comes to implementing new things.. this isn't necessarily a bad thing, it shows that there's quite a strong culture at the ODP.
Here's a couple of my own ideas of things that may or may not be cool.
Firstly - accelerated paid reviews. It's never gonna happen with dmoz.org, but there's absolutely no reason whatsoever that a downstream user of the data can't enhance the directory with additional listings. The downstream user could merge their paid data with the ODP's RDF dump. As long as the downstream user puts the correct attribution on the page, then it's all perfectly acceptable. Somebody like Google could probably make a fair profit from enhancing their directory in this way.
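For anyone wondering what that merge might look like downstream, here is a rough Python sketch. It assumes the ODP listings have already been parsed out of the RDF dump into simple records; the `Listing` class, the `"odp"`/`"paid"` source labels, the `[sponsored]` tag and the attribution line are all made up for illustration, and the attribution wording the ODP license actually requires is not reproduced here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    """One directory entry (hypothetical structure, not the ODP schema)."""
    url: str
    title: str
    description: str
    category: str
    source: str  # "odp" or "paid" (labels invented for this sketch)


def merge_listings(odp, paid):
    """Group listings by category, ODP entries first.

    If the same URL appears twice in one category, the first one seen
    (the ODP entry) is kept and the later duplicate is dropped.
    """
    merged = defaultdict(list)
    seen = set()
    for item in list(odp) + list(paid):
        key = (item.category, item.url)
        if key in seen:
            continue
        seen.add(key)
        merged[item.category].append(item)
    return dict(merged)


def render_category(category, listings):
    """Render a plain-text category page, marking paid entries."""
    lines = [category]
    for item in listings:
        tag = " [sponsored]" if item.source == "paid" else ""
        lines.append(f"  {item.title}{tag} - {item.url}")
    if any(item.source == "odp" for item in listings):
        # Placeholder only: use the attribution the ODP license specifies.
        lines.append("  Includes data from the Open Directory Project")
    return "\n".join(lines)
```

Keeping the `source` field on every record is the point of the design: it lets the page label paid entries separately and lets the attribution appear only where ODP data is actually shown.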
Secondly - make the unreviewed sites accessible, either through the dmoz.org interface or as an optional RDF dump - because there *are* times when valuable sites are stuck in the queues, even if it means wading through some spam. Within dmoz.org it would be quite possible to use robots.txt to ensure that the unreviewed sites carry no PageRank, and visitors would examine those sites at their own risk. Downstream users could use whatever technology they thought appropriate to weed out spam from the RDF dump. OK, this is not for 99% of web surfers, but it could be a powerful tool for those who know what they are doing.
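As a rough illustration of the robots.txt idea, the sketch below uses Python's standard `urllib.robotparser` to check that a rule blocks crawlers from an unreviewed area while leaving the reviewed directory open. The `/unreviewed/` path, the `directory.example` host and the `ExampleBot` user agent are assumptions for the example, not anything dmoz.org actually uses. Note that robots.txt only asks well-behaved crawlers not to fetch those pages.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt keeping all crawlers out of the unreviewed area.
ROBOTS_TXT = """\
User-agent: *
Disallow: /unreviewed/
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Unreviewed submissions are off limits to crawlers...
print(parser.can_fetch("ExampleBot", "https://directory.example/unreviewed/Top/Arts/"))  # False
# ...while the reviewed directory stays crawlable.
print(parser.can_fetch("ExampleBot", "https://directory.example/Top/Arts/"))  # True
```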
Don't forget that the "Open" in "Open Directory Project" actually refers to the use of the data - as long as you comply with the license, you can take the data and innovate with it. So instead of saying "wouldn't it be great if the ODP did this.." you could actually go off and *do* it yourself :)
Likewise, direct feedback on site status isn't going to happen. The last thing we want to do is help spammers, so when you see vague answers to questions, then that is probably ODP editors giving as little away as possible to people known to be causing us problems.
No, actually, we already have to do that for every rejection. It's not all that time-consuming, really just takes a minute or two.
I can think of several reasons not to publish our unreviewed queues, though, off the top of my head: 1) the availability of the data would encourage spammers to flood us with useless submissions in new and irritating ways, since this would gain them some minimal amount of publicity for free, 2) having access to these queues would give the cleverer spammers extraneous information about how well different spam techniques work by comparing which submissions are eliminated more quickly, 3) I'm not sure how hard or easy it would be to separate sensitive information from the data before publishing it, such as internal editor discussions or the email addresses of submitters--no one agreed to have their email address shared when they submitted and most submitters probably do not need more deposed-Nigerian-dictator spam.
Anything that would encourage more spamming of the ODP is probably not going to be a popular suggestion--if the spam and security problems associated with it were eliminated, I think providing our downstream users with more information is always nice. But I'm way too non-technical to know how possible that is myself. (-:
The ODP Two Step
Submit in the proper dance hall per the guidelines
Scoot a boot on across the floor and out the door
and seek the next link y'all.
Wrong category.
A gem that I'm delighted to list.
That spammer again.
Wrong category.
A description that makes my eyes bleed.
Wrong category.
A deeplink.
Already listed in proper category.
Wrong category.
Duplicate submission.
A site in Turkish.
Wrong category.
Secret? More like under the blanket, and rightly so.
The free PR would be unreal. I bet if you set this up, within a month you'd be looking at millions in donations.
Just a goofy thought from someone who's in an antihistamine haze :)
Sean.
Too bad you guys didn't read the debian contract closely. Rule 3 would be pretty handy.
3. We will not hide problems
We will keep our entire bug report database open for public view at all times. Reports that people file online will promptly become visible to others.
Don't know if it was based on the debian contract (never heard of that, personally) but even if it was, "based on" wouldn't mean "exactly like". So items that appear on the debian contract but don't on the ODP one would be completely irrelevant to anything.
There are editors' forums for bugs in the editors' interface, and they are open to all applicable users (i.e. editors).
As for bugs in the public interface, if they get reported by the public (in public forums), then, um, we know about them...(and typically some editor will either thank you and pass the message on, or tell you it's already been reported.) I'm not sure how that's different from what Debian does (except with different categories of user). Generally, when the sort or submit functionality gets broken, it gets noticed here. With the very small number of software developers at any one time, we simply don't have (or need) the complexity or formality required to track a compiler, OS kernel, or the like.
It's *not* opening up the entire contents of the unreviewed queue and all the details to the public - that would be daft. But it *could* allow researchers to see what is still waiting to be processed.
There's a proposal at Slashdot to allow subscribers to see rejected stories for the same reason - if you know how to sort the rubbish from the useful stuff yourself you can get more value.
As I tend to say often.. if you think outside the box on what to do with the ODP data then you can do all sorts of clever stuff.
Back to my suggestion of publishing the unreviewed queues.. they wouldn't have any PageRank anyway, it's a power user's tool, you couldn't see anything other than the title and description, and you use it at your own risk.
The downside is it's very likely to create more spam for the editors to have to deal with.
TJ
Aside: Americans may remember the furor surrounding the Clintons' casual distribution of selected files. The problem wasn't so much the violation of privacy (although it is curious that this violation -- much more egregious than any other U.S. administration since Johnson has committed -- didn't so much as cause a peep in protest from the usual self-proclaimed "privacy advocates") as the fact that the files included records of all allegations made to agents: the false ones, the unchecked ones, the wildly insane ones.
The unreviewed queues are full of toxic waste -- sometimes filtered, sometimes unfiltered, sometimes chemically enhanced. They are not useful for any kind of "research" other than figuring out what kinds of spam we have the hardest time spotting.
For a long time I expected someone to take the ODP and add paid listings; that hasn't happened in that form, and I'm not sure exactly why. Perhaps Looksmart's impressive self-immolation gave a bad reputation to all such schemes. And the pure-commercial-play classified-ad "directories" probably don't WANT free listings devaluing the advertisements. And there aren't any other big independent information-directory projects.
The RDF is out there. If you can imagine and implement a cool new kind of added value, it's probably worth a LOT of Google PR ... for whatever incentive that's worth.