- A handful of good Editors, doing their best, outnumbered by thousands of “capricious” editors.
- Spammers draining the (few) resources of well-meaning volunteers, taking time and value away from their precious work.
- Millions of non-indexed sites, unattended.
- Pay per submission is a non-starter.
- Very few resources available to pay editors.
OK, here is a possible solution: a parallel, Dmoz-based, high-profile public directory, addressing those problems from a for-profit perspective. In short, a new, for-profit Dmoz distribution. Think of it like the relation between Slackware Linux and RedHat Linux.
That new distribution will be free to implement a business model that allows committing resources to complement the original Dmoz listing, just as RedHat carries the original Linux kernel and its upgrades, by paying editors to purge the spammers and add new sites to its “extended” Dmoz version.
Of course, Dmoz will not necessarily incorporate the new distribution's changes into its main RDF dump, but the public availability of the new distribution could have a similar effect to the original one.
I’m sure a few of us have the resources to pull this one off.
What do you think?
>if so many people are dissatisfied with DMOZ
I'm not that dissatisfied. Sure, there is abuse of DMOZ by editors/submitters, but we're legitimately doing OK out of it.
I also like the idea Marcos proposes, now that I understand it. Whether it takes off or not, I'm not sure that it matters, but it certainly raises awareness and may prompt someone at DMOZ to do something about the quality of their directory... who knows?
>think the solution will be clear if and when DMOZ recognise that
>there is a problem
That’s a sensitive topic. Who is DMOZ? How much interest do Netscape/AOL/Time Warner have in it?
DMOZ was, like Mozilla, one of the late Netscape projects. At the end, when it was clear they were going to lose the browser war, they surprised us all with the open-source release of the Netscape code, and started the Open Directory Project. A rare move for a public company, but a great one, no doubt. Then AOL took over, and after that many geeks left the company; even Andreessen, the original techno-founder, left too.
Since then, things have been quite different. To their credit, AOL did not kill the ODP, and it has been maintained quite well, but the truth is they have not changed a bit of it since then. No improvements, no low-level work on the code, no adaptation of the inclusion strategy. Heck, you know perfectly well Google changes its algorithm almost weekly, mostly to fight spammers. Big changes at ODP are long overdue.
The ODP is headless: no visible ÜberGeek inner circle is running it, and corporate managers rarely show up. Everything now rests in the hands of the editors, who, no matter how much they work on it, cannot make big changes to the process itself.
Still, the ODP founders knew pretty damn well what they were doing. The ODP has survived all this, and it is arguably the best directory around. Furthermore, the base data is open and available to others. We can try to improve it, and we will. ODP is not going to die; no "editor crisis" will take it down. The data, and the countless efforts of the thousands who have helped build it, will no doubt survive in its present form as Dmoz.org, and probably also in a number of parallel versions.
Netscape didn't start the Open Directory Project. It was bought by them--presumably in an attempt to provide solid support for its continued running so they could use the data for their own directory. They promised, when buying it, to keep the data available to all and free. I doubt that ODP had much to do with AOL buying Netscape. They began using it shortly thereafter, but I don't think it would have made a difference if Netscape hadn't owned the ODP.
>>Since then, things are quite different. To their credit, AOL did not kill the ODP, and it have been maintained quite well, but the truth is they have not changed a bit of it since then. No improves, no low level work on the code, no adaptation of the inclusion strategy.<<
The code has changed so much since AOL bought Netscape it's hard to remember what it was like back then. Most of it is only evident to editors, granted, and usually editors that have been there for a while. But there are visible differences on the outside too: more levels of sorting categories, alternate language links, new levels of editors including cat-editalls and category moderators (catmods). Recently a level of editor called a Greenbuster was introduced.
>>I think the solution will be clear if and when DMOZ recognise that there is a problem. <<
I think everyone at dmoz.org knows there are problems. But editors look at the problems from a different angle and they assign a different priority to some of those problems than submitters do.
>Netscape didn't start the Open Directory Project. It was bought by
>them
Oh, thanks. I didn’t know that. I thought it was an in-house project, maybe started by Netscape engineers.
>I doubt that ODP had much to do with AOL buying Netscape.
I can’t see any relation, either. Netscape was trying to be more geek- and Netizen-friendly, distancing themselves from Microsoft's Big Bad Corporation image in an attempt to win the public relations war, if not the browser war. AOL just acquired it along with the rest of Netscape's assets.
>The code has changed so much since AOL bought Netscape it's hard to
>remember what it was like back then. Most of it is only evident to
>editors, granted
I guess you are right, but from the outside it looks different. We have been using the ODP dump data for 3 years now. The last change to its buggy format was on 2000-11-20...
>I think everyone at dmoz.org knows there are problems. But editors
>look at the problems from a different angle and they assign a
>different priority to some of those problems than submitters do.
True enough.
But internally, I noticed another new feature for editors today. Like one of the changes I noticed _last_ week, it's a tool for helping deal with spam. (I may have mentioned that this sort of thing is appreciated by the editors, and frees them up to concentrate more on legitimate sites, which should be appreciated by honest submitters.)
I'm also noticing a couple of problems today (unrelated to each other or to the new tools), by which I deduce that more tools are in the pipeline. Well, that's life in the programming business.
More generally, there have been an average of 1.5 to 2 programmers working full time on the tools, and they've had some very sharp people. dmoz.org has always been on my "top three list of sites that just work the way you expect", but if you haven't edited since 2000, you wouldn't recognize half the functionality of the editor dashboard.
Just to make it clear, I think ODP is a great, impressive project, and I like it so much I’m thinking of ways to improve it from an outside point of view. I'm not attacking it in any way, but I feel the need to be critical as a way to improve it.
>I know of at least two changes in the RDF format since then: Altlang
Well, maybe so, but we have not seen any, and they are not reported on the official Changes & Errata page. They took out the Netscape data, that’s all, but the format has not changed: the file is still messy and enormous, and unusable for most users.
>But that's sort of like "Oxford Press hasn't done anything in the
>last 300 years, they're still just putting out bits of paper glued
>at the edges."
Not at all. It is not a trivial problem. The dump is a monster file, getting bigger every day, and it’s no secret there are a number of EASY ways to improve that. Heck, Dmoz.org has been saying it would do that since 1999. Most dump users don’t need all the data, just a piece of it: regional or niche data. Still, they are forced to download hundreds of megabytes and try to parse a gigabyte-sized file. Not a problem for us, but a pain in the a** for many users.
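For the "just a piece of it" case, a streaming parser at least avoids loading the whole dump into memory. A minimal Python sketch, assuming the content dump stores each listing as an `ExternalPage` element with the URL in an `about` attribute and a `topic` child holding the category path (check this against the actual file); `extract_branch` and the `SAMPLE` snippet are hypothetical, made up for illustration:

```python
import io
import xml.etree.ElementTree as ET

# A miniature of the ASSUMED dump layout, not copied from a real dump.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<RDF xmlns:r="http://www.w3.org/TR/RDF/" xmlns:d="http://purl.org/dc/elements/1.0/"
     xmlns="http://dmoz.org/rdf/">
  <Topic r:id="Top/Regional/Europe/Ireland"><link r:resource="http://example.ie/"/></Topic>
  <ExternalPage about="http://example.ie/">
    <d:Title>Example IE</d:Title><topic>Top/Regional/Europe/Ireland</topic>
  </ExternalPage>
  <ExternalPage about="http://example.com/">
    <d:Title>Example COM</d:Title><topic>Top/Computers</topic>
  </ExternalPage>
</RDF>"""


def local_name(tag):
    """Strip an XML namespace: '{http://dmoz.org/rdf/}topic' -> 'topic'."""
    return tag.rsplit("}", 1)[-1]


def extract_branch(source, prefix):
    """Yield (url, title, topic) for every listing in `prefix` or below it.

    iterparse hands over each element as soon as it is complete, so the
    whole dump is never built into one tree.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "ExternalPage":
            # The URL is assumed to sit in an 'about' attribute (namespaced or not).
            url = next((v for k, v in elem.attrib.items() if local_name(k) == "about"), None)
            fields = {local_name(child.tag): (child.text or "").strip() for child in elem}
            topic = fields.get("topic", "")
            if url and (topic == prefix or topic.startswith(prefix + "/")):
                yield url, fields.get("Title", ""), topic
            elem.clear()  # drop the processed record's children and attributes
        elif name == "Topic":
            elem.clear()  # category records are not needed here


if __name__ == "__main__":
    for url, title, topic in extract_branch(io.BytesIO(SAMPLE), "Top/Regional/Europe/Ireland"):
        print(url, title, topic, sep="\t")
```

Pass an open file (in binary mode) instead of `io.BytesIO(SAMPLE)` to run it on a real dump. Clearing each record keeps memory far below what a full tree would need, though it does nothing about the download size.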
>But internally, I noticed another new feature for editors today.
Great.
>More generally, there have an average of 1.5 or 2 programmers
Well, that's not exactly an impressive number. Many of my customers have 5 to 10 programmers just to maintain the usual business databases.
Anyhow, if there are only two of them, they are doing a great job; no doubt first-class programmers. Just maintaining the status quo of such a monster service is already an impressive task.
>That new distribution will be free to implement a business model that allows committing resources to complement the original Dmoz listing, just like RedHat carries the original linux Kernel and its upgrades<
OT: If you are not familiar enough with Linux distros, don't use them as examples.
RedHat is full of the latest software, patched by the amateurs at RedHat, as opposed to Slackware, which doesn't have all the bugs RedHat has. We want to compare the final product, not who's using the latest version of XFree.
True. But that may be the price to pay if you want to reach a wider audience, and in any case, you will always be able to choose between them. The same applies to any new ODP distro.
The good thing about ODP is that it is used by Google and I don't expect them to switch to a branch of it that easily.
The problem with directories, though, is getting a critical mass of submitters. The ODP dataset provides a nice start, but checking 3.8 million URLs has to be automated, as I don't think it is feasible to do the checking manually.
The one thing that kills directories is stale data, and Dmoz is not immune. The other factor is that directories rely on a Catch-22: they need sufficient submissions in order to grow. It would be possible to design a system that automatically detects new websites in various TLDs and ccTLDs. (I've done it on a small scale for the .ie ccTLD (daily) and on a limited scale for Irish-owned .com/.net/.org/.info domains (monthly).) Websites detected by this method could become the priority for review by the editors, and this would give the directory an edge over conventional search engines and directories. The only concern with applying this kind of methodology to a large-scale directory would be the hardware and bandwidth requirements. The .ie ccTLD was so easy because only 22,000 or so .ie websites exist, and of those about 14,000 are active. However, a globally distributed system, with each ccTLD processed separately and the results combined into a central dataset that all participants can use, may be feasible.
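As a rough illustration of automating that check, here is a Python sketch using only the standard library: one HEAD request per URL, run in a thread pool, with results sorted into coarse buckets. The function names, bucket labels and the 32-worker default are assumptions, not anything ODP uses. Some servers answer HEAD badly, which is why odd statuses land in a "check" bucket for a human instead of being deleted:

```python
import http.client
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def head_status(url, timeout=10.0):
    """Return the final HTTP status of a HEAD request, or None if unreachable."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        # urlopen follows redirects, so this is the status of the final page.
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (OSError, http.client.HTTPException):
        return None  # DNS failure, refused connection, timeout, garbage reply


def classify(status):
    """Map a status code to a coarse bucket an editor could act on."""
    if status is None:
        return "dead"
    if 200 <= status < 400:
        return "ok"
    if status in (404, 410):
        return "gone"
    return "check"  # 401/403/405/5xx etc.: needs a human look


def check_urls(urls, fetch=head_status, workers=32):
    """Check many URLs concurrently and return {url: bucket}."""
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        statuses = list(pool.map(fetch, urls))
    return {url: classify(status) for url, status in zip(urls, statuses)}
```

The `fetch` parameter lets the checking logic be tested without touching the network; at 3.8 million URLs the real limits are bandwidth and politeness toward the hosts, not the code.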
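Once the raw data exists, the detection step itself boils down to comparing snapshots. A minimal Python sketch, assuming you already have one plain list of registered or live domain names per ccTLD per day (obtaining that list is the hard part and is not shown); all function names are hypothetical. Each ccTLD can be diffed on its own machine and only the short lists of changes shipped to the central queue:

```python
def load_snapshot(lines):
    """Normalise one day's domain list: lowercase, no trailing dot, no blanks or comments."""
    names = set()
    for line in lines:
        name = line.strip().lower().rstrip(".")
        if name and not name.startswith("#"):
            names.add(name)
    return names


def new_domains(yesterday, today):
    """Names present today but not yesterday: candidates for editor review."""
    return sorted(load_snapshot(today) - load_snapshot(yesterday))


def dropped_domains(yesterday, today):
    """Names that disappeared: existing listings worth re-checking for staleness."""
    return sorted(load_snapshot(yesterday) - load_snapshot(today))


def merge_review_queue(per_tld):
    """Combine per-ccTLD results (possibly computed on different machines) into one queue."""
    queue = []
    for tld in sorted(per_tld):
        queue.extend((tld, name) for name in per_tld[tld])
    return queue
```

The dropped list addresses the staleness problem from the same data: a domain that vanishes from the snapshot is a strong hint that its listing needs a look.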
Regards...jmcc
>The good thing about ODP is that it is used by Google and I don't
>expect them to switch to a branch of it that easily.
Well, the way I see it, it’s not about Google. They may be the biggest thing now, but they may not be, say, one or two years from now. Besides, as soon as Aspseek goes distributed, you will see PageRank-based grids popping up all over the place, eventually overtaking Google or whatever is there then.
It’s about improving ODP, nothing more. Hundreds of sites are using ODP data; some may switch, some may use a number of different distros, or just one. But there is definitely room for improvement.
DMOZ.org: like so many other things, spammers and people out for self-interest killed it. Wonderful concept, but an unrealistic model for scaling. For any DMOZ-style directory to work properly, it has to scale well and answer these questions from the get-go:
1. Unpaid volunteers...who wants to work like a dog for free? You need to pay them something.
2. Quality control? Unpaid=no quality control. Need a hierarchy with managers training new editors...
3. A system to deal with self-interest.
===============
Yahoo did use a paid system, but theirs was obviously flawed, and Yahoo, which was built first and foremost as a directory, is no longer a directory but a search engine, and a Google-powered one at that.
The problem with their fee is that it was way too high. Ordinary sites can't afford that. I have 5 sites I would gladly have paid the fee for, IF I was sure they'd get in. But I lost $200 once, when they started the system, and DIDN'T get in! That soured me on ever submitting again. The editor was very rude about it; I was allowed only one email to question their ruling and boom... If they had even charged one larger fee if you got in and some smaller penalty if you didn't, more people would submit. I could have stomached a $25 fee... $25 is enough of a disincentive for most serial spammers... but they were too greedy...
=====
So coming back to square one... take the initial idea of a DMOZ-style directory... charge a small fee, $1 - $5... pay your editors a percentage of that... but they have to go through a process... It's easy to filter out bad editors who are out for self-interest. They will tend to want to do one thing: get their own category in with a few of their domains and then disappear. Well, new editors will have all their actions moderated. DMOZ gave them too much power... Once they've dealt with 100-200 listings, give them unmoderated, but "watched", power... and no payment until they reach certain thresholds...
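The editor progression in that post can be written down as a small set of rules. A Python sketch; the 150-listing cutoff is just a point inside the suggested 100-200 range, while the 500-listing payout threshold and the 30% share are invented placeholders, since the post gives no figures for them:

```python
from dataclasses import dataclass

# Placeholder numbers: only the 100-200 range comes from the post.
UNMODERATED_AFTER = 150    # approved listings before edits go live without review
PAID_AFTER = 500           # approved listings before any payout starts
EDITOR_SHARE_PERCENT = 30  # share of each submission fee paid to the editor


@dataclass
class Editor:
    name: str
    approved: int = 0  # listings that passed review
    earned_cents: int = 0

    @property
    def status(self):
        if self.approved < UNMODERATED_AFTER:
            return "moderated"  # every action queued for a senior editor
        return "watched"  # goes live at once, but is spot-checked

    def record_listing(self, fee_cents):
        """Credit one approved listing; nothing is paid until PAID_AFTER is passed."""
        self.approved += 1
        if self.approved > PAID_AFTER:
            self.earned_cents += fee_cents * EDITOR_SHARE_PERCENT // 100
```

Working in integer cents avoids floating-point rounding in payouts. An editor who only wants to slip in a few of their own domains never gets past the moderated stage, and never earns anything.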
====
A new DMOZ, designed from the ground up, NOT using their RDF at all, might have a chance to work...but the big problem is getting the eyeballs...
Designing a directory from the ground up is possible, though it would be rather time-consuming. However, using the Dmoz RDF has some advantages. It would allow existing software to be used, and it would make the new directory a lot easier to integrate with existing directory sites. While it would not necessarily mean that these sites would stop using the Dmoz feed, it would give them an added service by keeping the new directory's structure and RDF Dmoz-compatible.
Having written programs to import the Dmoz RDF to MySQL and export pages as static HTML, I can see some of the benefits of using RDF. One of these benefits is its organic growth. This kind of structure would be difficult to replicate from the ground up.
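A minimal version of that RDF-to-SQL-to-static-HTML pipeline, sketched in Python with the standard `sqlite3` module standing in for MySQL (the SQL is kept simple enough to port). The table layout and function names are assumptions for illustration, not jmcc's actual programs; the rows would come from whatever RDF parser you use:

```python
import html
import sqlite3


def import_listings(conn, rows):
    """Load (topic, url, title) rows; sqlite3 stands in for MySQL here."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS listing ("
        " topic TEXT NOT NULL, url TEXT NOT NULL, title TEXT,"
        " PRIMARY KEY (topic, url))"
    )
    conn.executemany("INSERT OR REPLACE INTO listing VALUES (?, ?, ?)", rows)
    conn.commit()


def render_category(conn, topic):
    """Return a static HTML page for one category, links sorted by title."""
    cur = conn.execute(
        "SELECT url, title FROM listing WHERE topic = ? ORDER BY title", (topic,)
    )
    items = "\n".join(
        f'<li><a href="{html.escape(url)}">{html.escape(title or url)}</a></li>'
        for url, title in cur
    )
    return (
        f"<html><head><title>{html.escape(topic)}</title></head>\n"
        f"<body><h1>{html.escape(topic)}</h1>\n<ul>\n{items}\n</ul></body></html>\n"
    )
```

Escaping every title and URL matters with directory data: listings are user-submitted, so an unescaped `&` or `<` would break the page or open it to injected markup.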
Regards...jmcc
It's much easier for a person to find a search site, decide it's not what they are looking for, and then move on to another directory or engine (Google) that does give them good results than to spend time providing feedback on inappropriate listings to a directory they've just discovered and have no real need to deal with.
Well, not that much: the more relevant sites will come first. The first results, unless voted out, will mostly be Dmoz-edited sites. New sites, including spam, will be added at the end on a first-come, first-listed basis, and only those few voted "up" will rise to the top.
>Most people just want to find what they are looking for
Right. But they hate spammers, and downgrading a spammer, a very rewarding action, could be just a few votes away. Users do not provide feedback that easily, but they will be tempted to if they dislike (or like) the site they have just seen strongly enough. Normal sites, more or less on topic, will be left untouched, while nasty sites will draw far more votes.
I'd agree with you on this point. This is one of the weaknesses of Dmoz that can actually be solved fairly easily using simple programs. The obvious start is to look for link swamps (such as one large .hk-based company that buys up expired domains) and ban them. Then 301/302 redirect analysis can be used to identify websites that really all point to a single site. Dmoz has perhaps tried to do too many things at once and overloaded the editors with responsibilities that could easily be handled by some relatively simple spiders.
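The ordering rules described in these posts can be sketched as a sort key. In this Python sketch the promotion and delisting thresholds (`PROMOTE_AT`, `DELIST_AT`) are hypothetical numbers, as is the plain up-minus-down vote count; a real system would also need protection against vote stuffing:

```python
from dataclasses import dataclass

# Hypothetical thresholds: nothing in the thread fixes these numbers.
PROMOTE_AT = 5   # net votes a new (non-ODP) site needs to reach the top tier
DELIST_AT = -10  # net votes at which any listing, ODP or not, is dropped


@dataclass
class Listing:
    url: str
    from_odp: bool  # carried over from the editor-reviewed Dmoz data
    submitted: int  # arrival order: lower means earlier
    up: int = 0
    down: int = 0

    @property
    def net(self):
        return self.up - self.down


def rank(listings):
    """Top tier (ODP or voted-up) ordered by votes; everyone else first come, first listed."""
    visible = [item for item in listings if item.net > DELIST_AT]

    def key(item):
        if item.from_odp or item.net >= PROMOTE_AT:
            return (0, -item.net, item.submitted)
        return (1, 0, item.submitted)  # votes below PROMOTE_AT don't reorder new sites

    return sorted(visible, key=key)
```

Keeping the lower tier in pure arrival order means a handful of self-votes cannot move a new site at all until it crosses the promotion threshold.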
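The 301/302 analysis can be reduced to following each recorded redirect chain to its end and grouping listings by the final host. A Python sketch, assuming a spider has already stored each URL's `Location` header in a dict rather than following it; `min_size` and the function names are illustrative only:

```python
from collections import defaultdict
from urllib.parse import urlsplit


def final_target(url, redirects, max_hops=10):
    """Follow a recorded 301/302 chain to its end, stopping on loops or after max_hops."""
    seen = {url}
    for _ in range(max_hops):
        nxt = redirects.get(url)
        if nxt is None or nxt in seen:
            break
        seen.add(nxt)
        url = nxt
    return url


def redirect_clusters(urls, redirects, min_size=2):
    """Group listed URLs by the host they finally land on; keep only groups of min_size+."""
    groups = defaultdict(list)
    for url in urls:
        host = urlsplit(final_target(url, redirects)).hostname
        groups[host].append(url)
    return {host: members for host, members in groups.items() if len(members) >= min_size}
```

A large cluster is not proof of spam (a company may legitimately point several domains at one site), so the output works better as a review queue for editors than as an automatic ban list.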
The navigational aspect is critical. Most people seem to use search engines to navigate sites and as a consequence, they do not rely on the hierarchical directory structure for navigation.
Regards...jmcc
We can deal with most of those problems automatically, or by using a restrictive submission policy (only domain names, no directories, no redirection).
Still, they will be able to place a few spam sites, but those will initially be placed at the bottom of the results. Only voting will allow them to move up, but at the same time they will be exposed to negative voting, which may end up delisting them.
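The restrictive submission policy (only domain names, no directories, no redirection) is simple enough to enforce mechanically. A Python sketch using `urllib.parse.urlsplit`; treating `www.` and bare-domain variants as the same host, and accepting a redirect only when it stays on that host, are assumptions about what "no redirection" should mean. `redirects_to` is a hypothetical hook for whatever `Location` header a crawler saw:

```python
from urllib.parse import urlsplit


def _bare(host):
    """Treat 'www.example.ie' and 'example.ie' as the same site (an assumption)."""
    return host[4:] if host.startswith("www.") else host


def check_submission(url, redirects_to=None):
    """Return None if the URL passes the policy, otherwise a reason for rejecting it."""
    parts = urlsplit(url.strip())
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return "not an http(s) URL"
    if parts.path not in ("", "/") or parts.query or parts.fragment:
        return "only bare domain names are accepted, no directories or pages"
    if redirects_to:
        target = urlsplit(redirects_to).hostname
        # A relative Location (no hostname) stays on the same host.
        if target is not None and _bare(target) != _bare(parts.hostname):
            return "redirects to another site"
    return None
```

Returning a reason string rather than a bare True/False gives the submitter something to act on, which matters if honest users are expected to stick around.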
> Only voting will allow them to go up, but, at the same
> time, they will be exposed to negative voting, what may
> end up delisting them.
It isn't easy to come up with a fair directory. In the
example above the first people in are GREATLY advantaged
over later entries. Who will ever get down to the bottom
of a category to go vote on something good, unless it is
their own site they want to vote up?
I'd propose that anything relying on the proper behavior of
Internet users is not going to work. Every time the rules
for ranking well are known people "cheat" because it is very
valuable to do so.
On another note, I think it is quite expensive to get a new
directory out in front of enough people to get any
reasonable quantity of traffic or submissions. If you put
the DMOZ up on an alternate site tomorrow, with bold new
ways to improve it, who will care?
To do it right you'd need to have some type of budget to get
a basic amount of editing done, generate some publicity and
traffic and then earn enough revenue to promote your site to
the masses to keep the ball rolling. On the other hand, I
am a technical person, so perhaps I'm just bad at these and
overestimate their difficulty.
>In the example above the first people in are GREATLY
>advantaged over later entries.
Just like the DNS system. If you registered good-domain.com before anybody else, you have an advantage over others: the first mover has a natural advantage. Is it unfair? Well, that’s life ;)
>Every time the rules for ranking well are known
>people "cheat" because it is very valuable to do so.
Sure, but still, you can set up rules that are extremely difficult to break and widely recognized as fair. Think democratic elections, or chess.
>On another note, I think it is quite expensive to get a
>new directory out in front of enough people to get any
>reasonable quantity of traffic or submissions.
Not necessarily. I’m sure some of us can pull a few tricks to get some traffic ;)
Anyhow, paid promotion and search engines are not the only ways to get good traffic. How much promotion did the owners of sex.com or nasa.com need to get their first million visitors? :)
Right. That is why human editors are needed. Dmoz.org's human editors ARE needed, doing their great job.
We can add more value to that using robots and programs, but we can't replace them. That is not the goal.
Here you go through all this effort to build a voting system but admit that it isn't likely to really do much for people out of the top zone anyway.
Is the purpose of the directory to identify quality material or is it not? If you want it to be an effective directory you have to leverage the human editing capability and somehow promote quality whether or not it arrives late to the party.
There needs to be a better way to identify quality than to hope that users will scroll through a ton of entries to find the one gem in the pile.
Again, I am very curious about how you'd promote the beast and get things rolling. I find this the more difficult aspect, especially if you don't own a web property with zillions of visitors. Would you try to convert ODP editors or would you build your own pool of editors from scratch?
How would you interest them in participating if you can't offer money? Without a brand or some buzz it seems awful hard to generate excitement and participation these days. The aspect of "coolness" that used to spread by word of mouth in the past has mostly evaporated.
>Here you go through all this effort to build a voting
>system but admit that it isn't likely to really do much
>for people out of the top zone anyway.
We hope the voting system will help keep the spam away, nothing more. If your site is really the gem in the pile, it should be shiny enough to be slowly voted up. It happens all the time at Slashdot, so it is perfectly possible.
>Is the purpose of the directory to identify quality
>material or is it not?
We think directories must identify quality AND list as many sites as possible. That's what we will try to accomplish. We are counting on ODP editors, voting, citation ranking, and a few more tricks. Any other scalable method you could think of would be welcome.
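"Citation ranking" can take many forms; one well-known form is PageRank-style scoring, where a site's score comes from the scores of the sites linking to it. A small power-iteration sketch in Python, assuming a `{site: [linked sites]}` map built by a crawler; the 0.85 damping factor is the commonly quoted default, and nothing here describes what the project actually planned:

```python
def citation_rank(links, damping=0.85, iterations=50):
    """PageRank-style citation scores by power iteration; scores sum to 1.

    links: {site: [sites it links to]}. Sites that only appear as link
    targets are included; sites with no outlinks spread their score evenly.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Score held by "dangling" sites (no outlinks) is shared by everyone.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for target in targets:
                    new[target] += share
        rank = new
    return rank
```

In a directory setting this would be one signal among several: it favours well-linked sites, which helps against fresh spam domains but does nothing for a genuinely new gem that nobody links to yet.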
>Again, I am very curious about how you'd promote the beast
>and get things rolling...
Hey, it’s a kind of magic! ;)