A new Dmoz distribution.

Crisis? What Crisis?

         

Marcos

1:43 am on Oct 3, 2002 (gmt 0)

10+ Year Member



Well, this is what we have so far:

- A handful of good Editors, doing their best, outnumbered by thousands of “capricious” editors.
- Spammers draining the (few) resources of well-meaning volunteers, taking time and value away from their precious work.
- Millions of non-indexed sites, unattended.
- Pay per submission is a non-starter.
- Very few resources available to pay editors.

Ok, here is a possible solution. A parallel, Dmoz-based, high-profile public Directory, addressing those problems from a for-profit perspective. In short, a new, for-profit Dmoz distribution. Think of it like the relation between Slackware Linux and RedHat Linux.

That new distribution will be free to implement a business model that allows committing resources to complement the original Dmoz listing, just like RedHat carries the original Linux kernel and its upgrades, by paying editors to purge the spammers and add new sites to its “extended” Dmoz version.

Of course, Dmoz will not necessarily incorporate the new distribution's changes into its main RDF dump, but the public availability of the new distribution could have a similar effect to the original one.

I’m sure a few of us have the resources to pull this one off.
What do you think?

Jaze

10:02 pm on Oct 3, 2002 (gmt 0)

10+ Year Member



if so many people are dissatisfied with DMOZ

I'm not that dissatisfied. Sure, there is abuse of DMOZ by editors/submitters, but we're legitimately doing OK out of it.

I like the idea that Marcos proposes also, now that I understand it. Whether it takes off or not I'm not sure that it matters but it certainly raises awareness and may prompt someone at DMOZ to do something about the quality of their directory... who knows?

mack

2:59 am on Oct 4, 2002 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I think the solution will be clear if and when DMOZ recognise that there is a problem. It was said in this thread that a few decent, hard-working editors are being overshadowed by thousands of editors who are only there for their own benefit. I think it is the other way round: I think the number of good editors far outweighs the number of corrupt ones. If DMOZ were to concentrate more on weeding out the bad apples, that would for sure be a step in the right direction.

Marcos

2:52 pm on Oct 4, 2002 (gmt 0)

10+ Year Member



Hi mack,

>think the solution will be clear if and when DMOZ recognise that
>there is a problem

That’s a sensible topic. Who is DMOZ? How much interest does Netscape-AOL-TimeWarner have in it?

DMOZ was, like Mozilla, one of the late Netscape projects. At the end, when it was clear they were going to lose the Browser War, they surprised us all with the GNU release of the Netscape code, and started the Open Directory Project. A rare move for a public company, but a great one, no doubt. Then AOL took over, and after that many geeks left the company; even Andreessen, the original techno-founder, left too.

Since then, things are quite different. To their credit, AOL did not kill the ODP, and it has been maintained quite well, but the truth is they have not changed a bit of it since then. No improvements, no low-level work on the code, no adaptation of the inclusion strategy. Heck, you know perfectly well Google is changing the algo on an almost weekly basis, mostly in order to fight spammers. Big changes at ODP are long overdue.

The ODP is headless: no visible ÜberGeek inner circle is running it, and no corporate managers show up that much. Everything now rests in the hands of the Editors, who, no matter how much they work on it, cannot make big changes to the process itself.

Still, the ODP founders knew pretty damn well what they were doing. The ODP has survived all this, and it is arguably the best Directory around. Furthermore, the base data is Open, available to others. We can try to improve it, and we will. ODP is not going to die; no "editor crisis" will take it down. The data and the countless efforts of the thousands who have helped build it will no doubt survive, in its present form as Dmoz.org, and probably also in a number of parallel versions of it.

theseeker

5:39 pm on Oct 4, 2002 (gmt 0)

10+ Year Member



>>DMOZ was, like Mozilla, one of the late Netscape projects. <<
>>they surprised us all with the GNU release of the Netscape code, and started the Open Directory Project. <<

Netscape didn't start the Open Directory Project. It was bought by them--presumably in an attempt to provide solid support for its continued running so they could use the data for their own directory. They promised, when buying it, to keep the data available to all and free. I doubt that ODP had much to do with AOL buying Netscape. They began using it shortly thereafter, but I don't think it would have made a difference if Netscape hadn't owned the ODP.

>>Since then, things are quite different. To their credit, AOL did not kill the ODP, and it has been maintained quite well, but the truth is they have not changed a bit of it since then. No improvements, no low-level work on the code, no adaptation of the inclusion strategy.<<

The code has changed so much since AOL bought Netscape it's hard to remember what it was like back then. Most of it is only evident to editors, granted, and usually editors that have been there for a while. But there are visible differences on the outside too: more levels of sorting categories, alternate language links, new levels of editors including cat-editalls and category moderators (catmods). Recently a level of editor called a Greenbuster was introduced.

>>I think the solution will be clear if and when DMOZ recognise that there is a problem. <<

I think everyone at dmoz.org knows there are problems. But editors look at the problems from a different angle and they assign a different priority to some of those problems than submitters do.

Marcos

7:11 pm on Oct 4, 2002 (gmt 0)

10+ Year Member



Hi theseeker,

>Netscape didn't start the Open Directory Project. It was bought by
>them

Oh, thanks, I didn’t know that. I thought it was an in-house project, maybe started by Netscape engineers.

>I doubt that ODP had much to do with AOL buying Netscape.

I can’t see any relation, either. Netscape was trying to be more geek/Netizen friendly, distancing themselves from Microsoft’s Big Bad Corporation image, in an attempt to win the public relations war, if not the browser war. AOL just acquired it with the rest of Netscape’s assets.

>The code has changed so much since AOL bought Netscape it's hard to
>remember what it was like back then. Most of it is only evident to
>editors, granted

I guess you are right, but from the outside, it looks different. We have been using ODP DUMP data for 3 years now. The last change to its buggy format was on 2000-11-20...

>I think everyone at dmoz.org knows there are problems. But editors
>look at the problems from a different angle and they assign a
>different priority to some of those problems than submitters do.

True enough.

hutcheson

8:31 pm on Oct 4, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I know of at least two changes in the RDF format since then: Altlang links and a third level of subcategory grouping. Of course, I haven't been tracking it closely, so there may have been others. But that's sort of like "Oxford Press hasn't done anything in the last 300 years, they're still just putting out bits of paper glued at the edges."

But internally, I noticed another new feature for editors today. Like one of the changes I noticed _last_ week, it's a tool for helping deal with spam. (I may have mentioned that this sort of thing is appreciated by the editors, and frees them up to concentrate more on legitimate sites, which should be appreciated by honest submitters.)

I'm also noticing a couple of problems today (unrelated to each other or to the new tools), by which I deduce that more tools are in the pipeline. Well, that's life in the programming business.

More generally, there have been an average of 1.5 or 2 programmers working full time on the tools -- and they've had some very sharp people. dmoz.org has always been on my "top three list of sites that just work the way you expect", but if you haven't edited since 2000, you'd not recognize half the functionality of the editor dashboard.

Marcos

8:59 pm on Oct 4, 2002 (gmt 0)

10+ Year Member



Hi hutcheson,

Just to make it clear, I think ODP is a great, impressive project, and I like it so much I'm thinking of ways to improve it from an outside point of view. I'm not attacking it in any way, but I feel the need to be critical as a way to improve it.

>I know of at least two changes in the RDF format since then: Altlang

Well, maybe true, but we have not seen any, and it's not reported on the official Changes & Errata page. They took out the Netscape data, that’s all, but the format has not changed: the file is still messy and enormous, and unusable for most users.

>But that's sort of like "Oxford Press hasn't done anything in the
>last 300 years, they're still just putting out bits of paper glued
>at the edges."

Not at all. It is not a trivial problem. The dump is a monster file, getting bigger and bigger every day, and it’s no secret that there are a number of EASY ways to improve that. Heck, Dmoz.org has been saying it should do that since 1999. Most of the Dump users don’t need all the data, just a piece of it: regional or niche data. Still, they are forced to download hundreds of megabytes and try to parse a gigabyte-size file. Not a problem for us, but a pain in the a** for many users.
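
As a rough illustration of the "just a piece of it" idea, here is a minimal Python sketch of slicing already-extracted dump records by category. It assumes the dump has been reduced to (url, topic) pairs, where the topic is a slash-separated path such as Top/Regional/Europe/Ireland; the function name `slice_by_topic` and the sample data are hypothetical, and parsing the real RDF file is not shown.

```python
def slice_by_topic(records, prefix):
    """Yield (url, topic) pairs whose topic is the prefix or lies beneath it.

    `records` is any iterable of (url, topic) pairs; because this is a
    generator, a large dump can be streamed without loading it into memory.
    """
    base = prefix.rstrip("/")
    for url, topic in records:
        # Match on whole path components, so "Ireland" does not match "Irelandia".
        if topic == base or topic.startswith(base + "/"):
            yield url, topic


# Hypothetical sample records, for illustration only.
sample = [
    ("http://a.example/", "Top/Regional/Europe/Ireland/Business"),
    ("http://b.example/", "Top/Regional/Europe/Irelandia"),
    ("http://c.example/", "Top/Arts/Music"),
    ("http://d.example/", "Top/Regional/Europe/Ireland"),
]
irish = list(slice_by_topic(sample, "Top/Regional/Europe/Ireland"))
```

A regional or niche site could then work from the small slice instead of the full file.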

>But internally, I noticed another new feature for editors today.
Great.

>More generally, there have been an average of 1.5 or 2 programmers

Well, that's not exactly an impressive number. Many of my customers have 5 to 10 programmers just to maintain the usual business databases.
Anyhow, if they are only two, they are doing a great job, no doubt first class programmers. Just maintaining the status quo of such a monster service is already an impressive task.

martin

8:17 pm on Oct 7, 2002 (gmt 0)

10+ Year Member



>In sort, a new, for profit, Dmoz distribution. Think of it like the relation between Slackware Linux and RedHat Linux.

That new distribution will be free to implement a business model that allows committing resources to complement the original Dmoz listing, just like RedHat carries the original linux Kernel and its upgrades<

OT: If you are not familiar enough with Linux distros don't use them as examples.

RedHat is full of the latest software patched by the amateurs at RedHat as opposed to Slackware which doesn't have all the bugs there are in RedHat. We want to compare the final product not who's using the latest version of XFree.

Marcos

8:26 pm on Oct 7, 2002 (gmt 0)

10+ Year Member



>RedHat is full of the latest software patched by the amateurs at
>RedHat as opposed to Slackware which doesn't have all the bugs there
>are in RedHat. We want to compare the final product not who's using
>the latest version of XFree.

True. But it may be the price to pay if you want to reach a wider audience, and in any case, you will always have the possibility to choose between them. The same applies to any new ODP distro.

martin

7:03 am on Oct 8, 2002 (gmt 0)

10+ Year Member



>True. But it may be the price to pay if you want to reach a wider audience, and in any case, you will always have the possibility to choose between them. The same applies to any new ODP distro.

The good thing about ODP is that it is used by Google and I don't expect them to switch to a branch of it that easily.

jmccormac

7:53 am on Oct 8, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



An alternate Dmoz is an interesting project. (I've been using the Ireland/Northern Ireland slices of Dmoz to develop an Irish webpages directory.) The problem with the link farms is actually easy to solve. The larger companies that trawl ODP looking for expired domains tend to use clearly identifiable nameservers. Therefore, identifying these linkswamps by checking the authoritative nameservers for the URLs provides an easy way of stripping the deadwood from the data.
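
A minimal Python sketch of that nameserver check, under stated assumptions: the blocklist entries are made-up placeholder names, and `lookup_nameservers` is a hypothetical caller-supplied function (host to nameserver names) standing in for a real DNS NS query, which is not shown here.

```python
from urllib.parse import urlparse

# Hypothetical blocklist of nameservers associated with link swamps.
SWAMP_NAMESERVERS = {"ns1.linkswamp.example", "ns2.linkswamp.example"}


def hostname(url):
    """Return the lowercase host part of a URL ('' if there is none)."""
    return (urlparse(url).hostname or "").lower()


def flag_link_swamps(urls, lookup_nameservers, blocklist=SWAMP_NAMESERVERS):
    """Split URLs into (kept, flagged) according to their nameservers.

    `lookup_nameservers` maps a host name to an iterable of nameserver
    names; results are cached so each host is looked up only once.
    """
    kept, flagged = [], []
    cache = {}
    for url in urls:
        host = hostname(url)
        if host not in cache:
            # Normalise case and any trailing dot before comparing.
            cache[host] = {ns.lower().rstrip(".") for ns in lookup_nameservers(host)}
        if cache[host] & blocklist:
            flagged.append(url)
        else:
            kept.append(url)
    return kept, flagged


# Illustration with canned lookup data instead of live DNS.
fake_dns = {
    "www.good.example": ["ns.hosting.example."],
    "expired.example": ["NS1.LINKSWAMP.EXAMPLE."],
}
kept, flagged = flag_link_swamps(
    ["http://www.good.example/", "http://expired.example/page"],
    lambda host: fake_dns.get(host, []),
)
```

Keeping the lookup as a parameter means the filtering logic can be checked offline before it is pointed at millions of URLs.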

The problem with directories though is in getting a critical mass of submitters. The ODP dataset provides a nice start but checking 3.8 Million URLs has to be automated as I don't think that it is feasible to do the checking manually.

The one thing that kills directories is the staleness of the data and Dmoz is not immune. The other factor is that directories rely on the Catch 22 situation of getting sufficient submissions to grow. It would be possible to design a system that would automatically detect new websites in various TLDs and ccTLDs. (I've done it on a small scale for the .ie ccTLD (daily) and on a limited scale for Irish-owned .com/.net/.org/.info domains (monthly).) Websites detected by this method could become the priority for review by the editors and this would give it an edge over conventional search engines/directories. The only concern with applying this kind of methodology to a large scale directory would be the hardware and bandwidth requirements. The reason that the .ie ccTLD was so easy is that only 22000 or so .ie websites exist and of those about 14000 are active. However, a globally distributed system with each ccTLD being processed separately and the results being combined into a central dataset that can be used by all participants may be feasible.
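
One way the detect-and-combine step could look, as a hedged Python sketch. It assumes each participant can produce a periodic snapshot list of domain names for its TLD; how those snapshots are obtained is not covered, and the function names and sample data are hypothetical.

```python
def new_domains(previous_snapshot, current_snapshot):
    """Return domains in the current snapshot but not the previous one, sorted."""
    prev = {d.strip().lower() for d in previous_snapshot if d.strip()}
    curr = {d.strip().lower() for d in current_snapshot if d.strip()}
    return sorted(curr - prev)


def merge_review_queues(per_tld_results):
    """Combine per-TLD lists into one central review queue without duplicates.

    `per_tld_results` maps a TLD label to the list of newly detected domains
    reported by whoever processes that TLD.
    """
    queue, seen = [], set()
    for tld in sorted(per_tld_results):
        for domain in per_tld_results[tld]:
            if domain not in seen:
                seen.add(domain)
                queue.append(domain)
    return queue


# Hypothetical snapshots, for illustration only.
yesterday = ["a.ie", "b.ie"]
today = ["B.ie", "c.ie", "a.ie", " d.ie "]
fresh_ie = new_domains(yesterday, today)
central = merge_review_queues({"ie": fresh_ie, "com": ["x.com", "c.ie"]})
```

The central queue is what editors would review first.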

Regards...jmcc

rogerd

6:33 pm on Oct 8, 2002 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



The comment was made above that Yahoo can't afford enough paid editors... I'm not sure if that comment makes sense. At $299 a site, there should be plenty of room to pay a modest fee to an editor. They could even work on a "piece rate" basis, e.g., $30 for each site processed. Even at that modest percentage, a part-time, independent editor could make a killing... Yahoo could afford to recruit the cream of the crop of DMOZ volunteers...

Marcos

7:42 pm on Oct 8, 2002 (gmt 0)

10+ Year Member



Hi martin,

>The good thing about ODP is that it is used by Google and I don't
>expect them to switch to a branch of it that easily.

Well, the way I see it, it's not about Google. They may be the biggest thing now, but may not be so, say, one or two years from now. Besides, as soon as Aspseek goes distributed, you will see PageRank-based grids popping up all over the place, eventually overtaking Google or whatever is there by then.

It's about improving ODP, nothing more. Hundreds of sites are using ODP data; some may switch, some may use a number of different distros, or just one. But there is definitely room for some improvement.

heretic

6:27 pm on Nov 5, 2002 (gmt 0)

10+ Year Member



Some great kernels of ideas, but not all that realistic yet...with some tweaking, I think the original idea could work.

DMOZ.org: like so many other things, spammers and people out for self-interest killed it. Wonderful concept, but unrealistic model to scale well. For any DMOZ-style directory to work properly, it has to scale well and answer these questions from the get go:

1. Unpaid volunteers...who wants to work like a dog for free? You need to pay them something.

2. Quality control? Unpaid=no quality control. Need a hierarchy with managers training new editors...

3. A system to deal with Self-interest.
===============
Yahoo did use a paid system but theirs was flawed obviously and yahoo, which was built first and foremost as a directory, is no longer a directory but a search engine, and google at that.

The problem with their fee is that it was way too high. Ordinary sites can't afford that. I have 5 sites I would have gladly paid the fee for, IF I was sure they'd get in. But I lost $200 bucks once when they started the system, and DIDN'T get in! That soured me from ever submitting again. The editor was very rude about it, I was allowed only one email to question their ruling and boom... If they even had one larger fee if you get in and some smaller penalty if you didn't, then more people would submit. I could have stomached a $25 fee...enough disincentive to most serial spammers is $25...but they were too greedy...

=====
So coming back to square one...take the initial idea of a DMOZ style directory...charge a small fee, $1 - $5....pay your editors a percentage of that...but they have to go through a process...it's easy to filter out bad editors out for self-interest. They will tend to want to do one thing. Edit their own category in with a few domains and then disappear. Well, new editors will have all action moderated. DMOZ gave them too much power... Once they've dealt with 100-200 listings, give them unmoderated, but "watched" power...and no payment to them until they reach certain thresholds...
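
The editor progression described above can be sketched in Python roughly as follows. Every number here is an assumption picked from within the ranges mentioned (100-200 listings, a $1-$5 fee); the 50% editor share, the payout threshold, and all names are hypothetical.

```python
from dataclasses import dataclass

MODERATED_UNTIL = 150   # assumed; the post suggests 100-200 listings
PAYOUT_THRESHOLD = 150  # assumed; no payment before this many listings
FEE_CENTS = 300         # assumed $3 fee, within the $1-$5 range
EDITOR_SHARE = 0.5      # assumed percentage of each fee paid to the editor


@dataclass
class Editor:
    name: str
    approved_listings: int = 0

    def status(self):
        """New editors are fully moderated; experienced ones are only watched."""
        if self.approved_listings < MODERATED_UNTIL:
            return "moderated"
        return "watched"

    def payable_cents(self):
        """Nothing is paid out until the payout threshold is reached."""
        if self.approved_listings < PAYOUT_THRESHOLD:
            return 0
        return int(self.approved_listings * FEE_CENTS * EDITOR_SHARE)


newcomer = Editor("newcomer", approved_listings=10)
veteran = Editor("veteran", approved_listings=200)
```

An editor who adds a few of their own domains and disappears never leaves the moderated state and never reaches a payout.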

====
A new DMOZ, designed from the ground up, NOT using their RDF at all, might have a chance to work...but the big problem is getting the eyeballs...

jmccormac

8:00 pm on Nov 5, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



in msg 44 heretic posted:
"A new DMOZ, designed from the ground up, NOT using their RDF at all, might have a chance to work...but the big problem is getting the eyeballs"

Designing a directory from the ground up is possible though it would be rather time consuming. However, using the Dmoz RDF has some advantages. It would allow existing software to be used and it would make the new directory a lot easier to integrate with existing directory sites. While it would not necessarily mean that these sites would stop using the dmoz feed, it would provide them with an added service by making the new directory structure and RDF Dmoz compatible.

Having written programs to import the Dmoz RDF to MySQL and export pages as static HTML, I can see some of the benefits of using RDF. One of these benefits is its organic growth. This kind of structure would be difficult to replicate from the ground up.

Regards...jmcc

Marcos

8:52 pm on Nov 5, 2002 (gmt 0)

10+ Year Member



We are thinking of using ODP data, adding new sites in real time, without prior approval, for free, and letting users "vote" on the sites they find in the cats.
The voting system would be similar to the Slashdot voting system, which is very difficult to spam. The users will have the power to degrade sites, label sites as off-topic, and, ultimately, delete sites they consider inappropriate or spammy. We would also have some paid editors, but just for conflict resolution. What do you think?

skibum

9:50 pm on Nov 5, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It's gonna take a critical mass of editors or users to keep the spam from getting out of control, assuming enough submitters are attracted to the site to submit sites.

It's much easier for a person to find a search site, decide it's not what they are looking for and then move on to another directory or engine (Google) that does give them good results than to spend time providing feedback on inappropriate listings to a directory they've just discovered and have no real need to deal with.

Brad

10:33 pm on Nov 5, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



skibum has a good point. Most people just want to find what they are looking for as quickly and with as few steps as possible.

Marcos

1:00 am on Nov 6, 2002 (gmt 0)

10+ Year Member



>It's gonna take a critical mass of editors or users to
>keep the spam from getting out of control

Well, not that much; the more relevant sites will come first. The first results, unless voted out, will mostly be Dmoz-edited sites. New sites, including spam, will be added at the end, on a first-come, first-listed basis, and only those few voted "up" will come up first.

>Most people just want to find what they are looking for

Right. But they hate spammers, and downgrading a spammer, a very rewarding action, could be just a few votes away. Users do not provide feedback so easily, but they will be tempted to do it if they dislike (or like) the site they have just seen strongly enough. Normal sites, more or less on topic, will be untouched, while nasty sites will be more heavily voted.

skibum

4:21 am on Nov 6, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



When a spammer targets a directory using multiple sites, multiple submissions of the same site, and useless doorway pages, and there is not an editor there to filter out all that stuff, it can build up fast and they can hit hard. It's a catch 22. It's not good to make the submission process too complicated (because, of course, good submissions are desirable), but if it's too easy and there is not an automatic or human filter in place, an FFA can be the result.

jmccormac

5:09 am on Nov 6, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



In msg 50 skibum wrote:
"When a spammer targets a directory using multiple sites, multiple submissions of the same site, useless doorway pages and there is not an editor there to filter"

I'd agree with you on this point. This is one of the weaknesses of Dmoz that can actually be solved in a fairly easy manner using simple programs. The obvious start is to look for link swamps (such as one large .hk based company that buys up expired domains) and ban them. Then 301/302 analysis can be used to determine websites that are really all pointing to a single site. Dmoz has perhaps tried to do too many things at once and overloaded the editors with responsibilities that could easily be taken care of by some relatively simple spiders.
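
A minimal Python sketch of the 301/302 analysis, assuming a spider has already recorded each redirect as a simple mapping from a URL to the URL it redirects to. The fetching itself is not shown, and the names `final_target` and `group_by_target` are hypothetical.

```python
from collections import defaultdict


def final_target(url, redirects, max_hops=10):
    """Follow recorded 301/302 hops from `url`; stop on a loop or after max_hops."""
    seen = {url}
    for _ in range(max_hops):
        nxt = redirects.get(url)
        if nxt is None or nxt in seen:
            break
        seen.add(nxt)
        url = nxt
    return url


def group_by_target(urls, redirects):
    """Return {final URL: [listed URLs]} for targets shared by 2+ listings."""
    groups = defaultdict(list)
    for url in urls:
        groups[final_target(url, redirects)].append(url)
    return {target: members for target, members in groups.items() if len(members) > 1}


# Hypothetical recorded redirects, for illustration only.
recorded = {
    "http://a.example/": "http://hub.example/",
    "http://b.example/": "http://c.example/",
    "http://c.example/": "http://hub.example/",
}
listed = ["http://a.example/", "http://b.example/", "http://solo.example/"]
duplicates = group_by_target(listed, recorded)
```

Each group with more than one member is a candidate set of listings that really point to a single site.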

The navigational aspect is critical. Most people seem to use search engines to navigate sites and as a consequence, they do not rely on the hierarchical directory structure for navigation.

Regards...jmcc

Marcos

6:32 pm on Nov 6, 2002 (gmt 0)

10+ Year Member



>When a spammer targets a directory using multiple sites,
>multiple submissions of the same site, useless doorway
>pages

We can deal with most of those problems automatically, or using a restrictive submission policy (Only domain names, no directories, no redirection).

Still, they will be able to place a few spam sites, but those will initially be placed at the bottom of the results. Only voting will allow them to go up, but, at the same time, they will be exposed to negative voting, which may end up delisting them.
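
A rough Python sketch of how such an ordering could work, with loudly stated assumptions: listings carry a net vote count, ties go to ODP-edited sites and then to earlier arrivals, and a listing is dropped once its net votes reach an assumed threshold of -5. None of these specifics come from an existing system.

```python
from dataclasses import dataclass

DELIST_AT = -5  # assumed: net votes at or below this value delist a site


@dataclass
class Listing:
    url: str
    arrival: int              # order of arrival; lower means listed earlier
    odp_edited: bool = False  # True for sites taken from the ODP data
    votes: int = 0            # net votes (up minus down)


def ranked(listings):
    """Order visible listings: most net votes first, then ODP-edited, then oldest."""
    visible = [item for item in listings if item.votes > DELIST_AT]
    return sorted(visible, key=lambda item: (-item.votes, not item.odp_edited, item.arrival))


# Hypothetical category contents, for illustration only.
category = [
    Listing("http://odp-site.example/", arrival=0, odp_edited=True),
    Listing("http://new-site.example/", arrival=1),
    Listing("http://voted-up.example/", arrival=2, votes=3),
    Listing("http://spam.example/", arrival=3, votes=-5),
]
order = [item.url for item in ranked(category)]
```

With no votes cast, ODP-edited sites stay on top and new submissions queue up behind them in arrival order.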

vroom

9:48 pm on Nov 8, 2002 (gmt 0)

10+ Year Member



Well, the conversation here hits pretty close to home. Some
thoughts...

> Only voting will allow them to go up, but, at the same
> time, they will be exposed to negative voting, which may
> end up delisting them.

It isn't easy to come up with a fair directory. In the
example above the first people in are GREATLY advantaged
over later entries. Who will ever get down to the bottom
of a category to go vote on something good, unless it is
their own site they want to vote up?

I'd propose that anything relying on the proper behavior of
Internet users is not going to work. Every time the rules
for ranking well are known people "cheat" because it is very
valuable to do so.

On another note, I think it is quite expensive to get a new
directory out in front of enough people to get any
reasonable quantity of traffic or submissions. If you put
the DMOZ up on an alternate site tomorrow, with bold new
ways to improve it, who will care?

To do it right you'd need to have some type of budget to get
a basic amount of editing done, generate some publicity and
traffic and then earn enough revenue to promote your site to
the masses to keep the ball rolling. On the other hand, I
am a technical person, so perhaps I'm just bad at these and
over estimate their difficulty.

Marcos

12:41 am on Nov 9, 2002 (gmt 0)

10+ Year Member



Hi vroom,

>In the example above the first people in are GREATLY
>advantaged over later entries.

Just like the DNS system. If you registered good-domain.com before anybody else, you have an advantage over others: the first mover has a natural advantage. Is it unfair? Well, that's life ;)

>Every time the rules for ranking well are known
>people "cheat" because it is very valuable to do so.

Sure, but still, you can set up rules that are extremely difficult to break and widely recognized as fair. Think Democratic Elections, or Chess Games.

>On another note, I think it is quite expensive to get a
>new directory out in front of enough people to get any
>reasonable quantity of traffic or submissions.

Not necessarily. I'm sure some of us can pull a few tricks to get some traffic ;)

Anyhow, paid promotion and search engines are not the only way to get some good traffic. How much promotion did the owners of sex.com or nasa.com need in order to get their first million visitors? :)

Brad

2:28 am on Nov 9, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>>Only domain names, no directories,

A domain name is not an indication of quality. The advantage a human edited directory has is to be able to distinguish the quality and intent of the site creator better than a robot and a program can.

Marcos

2:35 am on Nov 9, 2002 (gmt 0)

10+ Year Member



>A domain name is not an indication of quality. The
>advantage a human edited directory has is to be able to
>distinguish the quality and intent of the site creator
>better than a robot and a program can.

Right. That is why human editors are needed. Dmoz.org human editors ARE needed, doing their great job.

We can add more value to that using robots and programs, but we can't replace them. That is not the goal.

vroom

9:26 pm on Nov 12, 2002 (gmt 0)

10+ Year Member



Hmm, your dismissal of the fairness doesn't seem to make sense to me.

Here you go through all this effort to build a voting system but admit that it isn't likely to really do much for people out of the top zone anyway.

Is the purpose of the directory to identify quality material or is it not? If you want it to be an effective directory you have to leverage the human editing capability and somehow promote quality whether or not it arrives late to the party.

There needs to be a better way to identify quality than to hope that users will scroll through a ton of entries to find the one gem in the pile.

Again, I am very curious about how you'd promote the beast and get things rolling. I find this the more difficult aspect, especially if you don't own a web property with zillions of visitors. Would you try to convert ODP editors or would you build your own pool of editors from scratch?

How would you interest them in participating if you can't offer money? Without a brand or some buzz it seems awful hard to generate excitement and participation these days. The aspect of "coolness" that used to spread by word of mouth in the past has mostly evaporated.

Marcos

6:49 pm on Nov 13, 2002 (gmt 0)

10+ Year Member



Hi vroom,

>Here you go through all this effort to build a voting
>system but admit that it isn't likely to really do much
>for people out of the top zone anyway.

We hope the voting system will help keep the spam away, nothing else, nothing more. If your site is really the gem in the pile, it must be shiny enough to be slowly voted up. It happens all the time at Slashdot, so it is perfectly possible.

>Is the purpose of the directory to identify quality
>material or is it not?

We think directories must identify quality, AND list as many sites as possible. That's what we will try to accomplish. We are counting on ODP editors, voting, citation ranking, and a few more tricks. Any other scalable method you could think of would be welcome.

>Again, I am very curious about how you'd promote the beast
>and get things rolling...

Hey, it’s a kind of magic! ;)

This 58 message thread spans 2 pages: 58