Making a small directory with a good design is relatively simple. However, the problem with creating the theoretically "best" directory is that it will inevitably outgrow its ability to stay up to date and on top of submissions. The ODP has several thousand active editors but about a million unreviewed sites and probably many more worth listing but as yet unsubmitted. Zeal hasn't yet reached the fame level required to overload its submission and review process. Goguides and Joeant have struggled to get going over what is now quite a long period of time.
Without doubt, the major reason for the success of the ODP is its usage as the Google Directory and hence its significant impact on traffic, especially for smaller and new sites. Without that association, its profile would be low, the PR of its pages small and its worth as an inbound link vastly reduced.
SEs like Teoma and Alltheweb are good examples of search algorithms of a high enough quality to compete with Google, but their profile outside the webmaster population is so low as to be virtually worthless. After Yahoo and Google, people have heard of Ask Jeeves with its TV advertising (UK), MSN and Freeserve as default homepages and SEs like the BBC working off the back of a very high profile site only now entering the search field.
So in my opinion, the major questions are:
1) could we design a directory structure, software base and editing protocol that could outperform the ODP when it reached the size of the ODP? I think the Zeal, Goguides and Joeant models would all break down at that size.
2) if it resorted to paid submission or editing to solve point 1), would the directory provide the relative quality and unbiased listings of the ODP compared with e.g. Yahoo?
3) could it make the associations with major sites (almost inevitably Google) that would raise its profile and make it worthwhile to have a listing?
4) by the time the directory has evolved to the size and complexity required to compete with the ODP, will we still be playing the same game as we are now with PR and SERPs or will you have to pay to get listed by Google at all?
IMO, the biggest gripes against DMOZ are (1) the site is too slow and (2) getting a site included takes too long.
A new directory would have to have the hardware to address the site speed and the manpower to address the inclusion time. And it needs to do that without running into the problem with Yahoo -- it costs too much.
If a directory can have a business model that will address those issues without going bankrupt right away, I'm interested. The best compromise I'm aware of is Zeal, but it is kind of slow too, and it's not completely free. It does, of course, have that goofy linking structure.
I don't know a way to make one that is the best of all worlds since MONEY, like it or not, is important here ... but if somebody does, more power to him.
Hopefully the new hardware soon to arrive at dmoz will help the speed that the site runs at.
How many submissions per day does Zeal receive compared with the ODP?
To help guide your thoughts on a new directory, please keep these things in mind. First of all, money would NOT be an issue for development or for server/bandwidth costs. As any site grows, the resources behind it grow along with it.
The harder part is working out conceptually how the database structure should be designed. To get that right, you will need to think about the flaws and pluses of each existing directory on the net: what makes these sites slow, and what makes them unique and useful.
As the site grows, what would make it easy, or even possible, to expand the site and its hardware? Network design and the size of the user population would need to be kept in mind as well.
How about some kind of self-moderating (Slashdot style) directory, whereby submission to the directory is immediate and does not require an editor?
Users of the directory can then moderate entries - promoting the score of good entries and demoting bad ones.
Like Slashdot, there would be no overall deletion (free speech and all that), only the possibility that an entry was moderated down into oblivion and only viewable by users browsing the directory at that level.
Obviously loads of issues - but, with care and thoughtful development could this be made to work?
I'd be able to give some time (and resources) to this if anybody thinks it could be made to work...
I love the idea and think dmorison has a great idea doing it democratic style. Count me in.
What happened with Marcos?
He told us he was starting a new directory, using the DMOZ dump as seed, a few months ago.
Maybe we could learn from his experience.
/. does not post anything you submit to their front pages. They post all the users' comments on a story, but for a news submission to make the front page it must be approved by an editor. The site kuro5hin will post news and let the members vote for or against it, and with enough positive votes it makes it to the public eye.
What would self moderation entail? Would members/moderators vote on whether a site is good or bad, or would the whole web-surfing public be able to? What would stop people from voting a site down because of a popup, or because a competitor doesn't like it, and so forth?
What other structures could be tried also? Would it be ideal to make the structuring data available as DMOZ has so other engines could utilize it?
Using dmoz seems like you're just recycling dmoz. I think you need to start it from scratch and make it our own.
General directories do not scale. At some point they get too big and unwieldy. Plus, in the general topic area you have to compete directly with the spidering engines, and that is very hard to do.
When they are smaller, general directories can be more useful, but we already have several in Joeant, Goguides and a few more.
The other way to approach the general directory is to be like About.com or Suite 101. You are not trying to list every site on a particular subject, only the 20 or so that matter. And you tell them *why* they matter.
I disagree on automation. The strength of a directory over a search engine is the human-reviewed results - not dreck served up by some robot. Presumably, these are reviewed by somebody that knows a bit about the subject. When editors stop editing, the advantages of a directory are greatly diminished.
IMO, the real way is to have smaller topic specific directories and leave general indexing to Google, ATW and the rest, or go the About.com route described above.
|/. does not post anything you submit to their front pages. |
I wasn't really getting at that aspect of Slashdot, more just the commenting mechanism.
|What would self moderation entail? Would members/moderators vote on whether a site is good or bad, or would the whole web-surfing public be able to? |
At this early stage of thinking, the whole public in my opinion (if they wish to of course).
|What would stop people from voting a site down because of a popup, or because a competitor doesn't like it, and so forth? |
This is why you allow the public as a whole to vote for or against a site. A site cannot be significantly upgraded or downgraded on the strength of one vote alone. I would anticipate people to vote against their competitors, but in the end their vote would be insignificant in the grand scheme of things.
|What other structures could be tried also? Would it be ideal to make the structuring data available as DMOZ has so other engines could utilize it? |
Absolutely. Should I get a mailing list going on this?
Just to add, RE: your comment regarding "what if a user doesn't like pop-ups".
If sufficient weight to such a directory was achieved, the self-moderating aspect would be a great way to get a feel for what was considered acceptable by the browsing public.
If it turned out that sites loaded with pop-ups were continuously moderated down, then it would serve as a deterrent to sites to use pop-ups.
Just an example, I might add - I'm not standing for or against pop-ups in particular.
Structure of the database... DMOZ structure? I hear a lot of people say that the DMOZ db structure is slow and archaic in ways. I'd think that it could be a lesson of what is right and wrong about it, and of how to learn from it to create something new.
Regarding popups, that part wouldn't matter to me; some of the best sites have them. :D How could the site be designed to have people watch the watchers if it were moderated? We all know that dmoz has some corrupt moderators. Even if it were public vote based it would still require some sort of moderation.
Small directories will never go away; in fact I hope more pop up, as they are great resources. However, a general directory that is properly set up and easy to work with, with a structure of information like DMOZ's or Yahoo's, would make a great site.
Relevant categories are a great thing in DMOZ which I think the new directory should also have.
One of the biggest problems I have seen (especially with Dmoz-like directories) is that they grow organically rather than logically. As a result, the directory structure becomes more and more confusing for the user. This tends to drive users towards the search engines. The number of entries in the directory also has an effect on usability and navigation. From having to reverse engineer the Dmoz RDFs to build a simple MySQL/static webpage version, I can appreciate some of the logic of the Dmoz people but I don't think that it translates well for the users.
One of the main weak points of Dmoz is in the country sections. The number of listed sites is rarely more than 25% of the sites related to that country. According to Dmoz, it has 144795 UK websites (combined .uk/com/net/org/info etc). I am currently finishing pre-indexing UK com/net/org websites (the current count is running at 1237162 websites, with probably under another 100K to go) and it seems that the number of excluded websites is really in the millions when you take the .uk websites into consideration. There is of course a qualitative argument here - what is worth including? However, because Dmoz depends on user submissions, it is always going to be lagging behind the real world. The Dmoz dataset may be good for an initial boost but it alone does not provide a good enough data feed. A new directory would have to have other sources of new websites. The same 'lack of coverage' applies to all the countries that I have tracked. (My work involves tracking com/net/org website usage for the main European countries at the moment.) It may be possible to create a network of good country-specific directories rather than one huge generic directory like Dmoz or Yahoo.
The business model of a brand new directory would be the important aspect to get right.
|How could the site be designed to have people watch the watchers if it were moderated? We all know that dmoz has some corrupt moderators. Even if it were public vote based it would still require some sort of moderation. |
Thinking out loud...
I envisage development of such a beast would go along like this:
1. Consider the possible ways in which attempts to abuse the directory could be executed
2. Design the rating and moderation system with (1) in mind such that the directory has as strong a natural defence as possible to the most obvious forms of abuse / spamming.
For example, you could apply a weighting to moderation gestures based on the frequency of moderation. This would mean that a site that was promoted extremely quickly (a sign of a successful spam attempt) could equally be demoted just as rapidly.
And then.. (to cover the situation you describe)
3. Consider how meta-moderation might work in cases where a creative spam attempt was successful in thwarting the natural defences. Similar to Slashdot's "bitch slap".
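To make the weighting idea in step 2 a bit more concrete, here is a minimal sketch of one possible reading of it: each vote counts for less the more votes the entry has already received in a recent window, so a sudden burst of promotion carries little weight. The `Entry` class, the `WINDOW_SECONDS` value and the `1 / (1 + recent)` formula are all assumptions made up for illustration, not a worked-out design.

```python
from dataclasses import dataclass, field

# Hypothetical look-back window: votes inside it count as "recent".
WINDOW_SECONDS = 3600


@dataclass
class Entry:
    url: str
    score: float = 0.0
    vote_times: list = field(default_factory=list)

    def moderate(self, direction: int, now: float) -> float:
        """Apply one +1 or -1 vote at time `now` (seconds); return the weight used."""
        if direction not in (1, -1):
            raise ValueError("direction must be +1 or -1")
        # Count the votes this entry already received inside the window.
        recent = sum(1 for t in self.vote_times if now - t <= WINDOW_SECONDS)
        # Assumed formula: the busier the recent history, the lighter the vote.
        weight = 1.0 / (1 + recent)
        self.score += direction * weight
        self.vote_times.append(now)
        return weight
```

With these assumed numbers, three promotions within a couple of seconds add up to 1 + 1/2 + 1/3 (about 1.83), while three promotions spaced more than an hour apart add up to 3. Whether demotions should be damped in the same way is exactly the kind of thing that would need thought.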
|brotherhood of LAN|
>lesson of what is right and wrong about it
I don't know how it was done, but it was well done (DMOZ at least). The URL structure is "uniform" and easy to use, no wonder it's used by search engines to accompany results. I'm sure this uniformity is the sort of thing the automated engines love.
Think of all the "news [google.com]" categories spread across DMOZ. Just by getting cats with news in the URL you could well be two clicks away from most of the major news events/websites on the web, maybe a good sign of a good structured directory :)
If automation can be done, I'm all for it. Can a directory still be called a directory if it's automated? :)
Things like 404 checking, expired domains, sites changing content etc can maybe be improved upon at the current directories. The only editors I would "need" are to make sure the algo put the page into the right category ;)
From being a DMOZ editor myself I understand that MANY of the sites submitted to my categories are not in the right ones, so moving them is part of the job. If this happens so regularly with DMOZ, a new directory would need a way to handle misplaced listings.
Automated processes of picking up bad links by using robots to flag them are important also.
|don't know how it was done, but it was well done (DMOZ at least). The URL structure is "uniform" and easy to use, no wonder it's used by search engines to accompany results. I'm sure this uniformity is the sort of thing the automated engines love. |
For a search engine to get a potential search index from Dmoz, it is really just a case of extracting the links from the RDFs rather than indexing the whole of Dmoz. I think there was a discussion on this forum last year that involved someone trying to spider all of Dmoz to provide an up-to-date directory (during the time that the Dmoz RDFs were not updated, from September 2002 onwards). But it is a very nice way for a search engine to be more than just a search engine by offering a Dmoz-fed directory.
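As a rough illustration of what "extracting the links from the RDFs" could look like, here is a small sketch that pulls URL and topic pairs out of a text fragment. The `SAMPLE` fragment is written from memory of the general layout of the content dump, so the element names should be treated as an assumption and checked against a real file; `extract_links` is a hypothetical helper. Regular expressions are used here only to keep the sketch short.

```python
import re

# Illustrative fragment only; element names are assumed, not verified
# against a current dump.
SAMPLE = """
<ExternalPage about="http://www.example.com/">
  <d:Title>Example</d:Title>
  <d:Description>An example listing.</d:Description>
  <topic>Top/Computers/Internet</topic>
</ExternalPage>
<ExternalPage about="http://news.example.org/">
  <d:Title>Example News</d:Title>
  <topic>Top/News</topic>
</ExternalPage>
"""

PAGE_RE = re.compile(r'<ExternalPage about="([^"]+)">(.*?)</ExternalPage>', re.S)
TOPIC_RE = re.compile(r"<topic>([^<]+)</topic>")


def extract_links(rdf_text):
    """Return a list of (url, topic) pairs; topic is None if not present."""
    links = []
    for url, body in PAGE_RE.findall(rdf_text):
        match = TOPIC_RE.search(body)
        links.append((url, match.group(1) if match else None))
    return links
```

Calling `extract_links(SAMPLE)` gives the two example URLs paired with their topics. A real dump is very large, so it would have to be read as a stream rather than loaded as one string.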
|Automated processes of picking up bad links by using robots to flag them are important also. |
This is actually a trivial thing to do. A more important aspect would be to have a linkswamp detector that would flag and delete linkswamps and poached domains (websites where the domains are purchased after expiry simply for their PR or Dmoz value). Again, this is also relatively easy to implement on a simple level. The problem with Dmoz is that neither of these things is done on anything approaching an efficient basis.
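For the dead-link half of this, a bare-bones sketch might look like the following. It only covers the simple case (is the page still answering?) and says nothing about linkswamps or poached domains, which would need content checks. The function names and the status buckets ("ok", "dead", "unreachable", "review") are assumptions chosen for the example.

```python
import urllib.error
import urllib.request


def classify_status(status):
    """Map an HTTP status code (or None for no response) to a flag."""
    if status is None:
        return "unreachable"
    if status in (404, 410):
        return "dead"
    if 200 <= status < 400:
        return "ok"
    return "review"  # anything else is left for a human editor


def fetch_status(url, timeout=10):
    """Return the HTTP status for `url`, or None if no response was obtained."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError, ValueError):
        return None  # DNS failure, timeout, refused connection, bad URL


def flag_links(urls, fetch=fetch_status):
    """Return {url: flag} for each URL; `fetch` can be swapped out for testing."""
    return {url: classify_status(fetch(url)) for url in urls}
```

Some servers do not answer HEAD requests properly, so a real checker would probably retry with GET and re-test failures over several days before flagging anything for removal.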
Been over to Zeal and I can't see how to submit a site there. Maybe I'm missing the point of the site?
To submit a site on Zeal, you have to join. To join, you have to pass a quiz on how to write proper site descriptions.
It's Zeal's way of getting site descriptions that half-way make sense, so there's not so much junk for zealots to wade through as DMOZ editors have to.
In answer to an earlier question, Zeal crossed the 250,000 URL mark in March. That does not include the paid side of Looksmart's directory. So in order to catch DMOZ's 3.8 million ... it's just got to grow to 15 times its size.
The directory would have support for multiple languages as there are plenty enough webmasters out there for non-english sites.
Moderation by voting for or against a site is fine in principle, but in practice there are just too many sites and too few votes. Aside from the really high profile ones and the webmaster-relevant ones (e.g. dmoz), how many sites in Alexa have a review and rating? Very few - and of those, almost all have between one and three votes. Easy for the webmaster to promote their own site - that's why the Alexa stars system is currently meaningless.
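To put numbers on the "too few votes" problem, here is a small sketch comparing a plain average with a damped average that pulls low-vote sites towards a neutral score. The damped average is a common general technique, not something anyone in this thread has proposed, and the 1 to 5 star scale, `prior_mean` and `prior_weight` values are assumptions picked for illustration.

```python
def plain_mean(votes):
    """Ordinary average of star ratings; 0.0 if there are no votes."""
    return sum(votes) / len(votes) if votes else 0.0


def damped_mean(votes, prior_mean=3.0, prior_weight=10):
    """Average that behaves as if `prior_weight` neutral votes already existed.

    With few real votes the result stays near `prior_mean`; with many
    votes it approaches the plain average.
    """
    return (prior_mean * prior_weight + sum(votes)) / (prior_weight + len(votes))
```

A webmaster giving their own site a single 5-star vote gets a plain average of 5.0 but a damped average of 35/11, about 3.18. A site with a hundred genuine 5-star votes gets 530/110, about 4.82. It does not solve the traffic problem, but it does stop one or two votes from deciding a ranking.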
I posted a thread on this a few days ago, but the idea seemed to die a death which I thought was a shame.
I'm not sure that the collection of people here are that interested in a collaboration project, although I got a few stickies from the few that were.
|brotherhood of LAN|
Trillian, good thread, I read it, I think the drawback is that the collaboration has to be niche, and someone here may already have vested interests in the niche :)
At least with a directory, a collaboration wouldn't be focusing on any industry in particular, though I guess we'd be competing with people who already have directories.
EW, If anything does become of this, make sure it gets posted here :)
Trillian - can you post (or sticky) a link to the thread you referred to - I couldn't find it...
Any voting system depends on getting enough traffic to make it meaningful.. and here is the problem with sites that require interactivity - they only attract visitors if they already *have* visitors. Slashdot and Webmasterworld are interesting because they're *busy*. The same is true of visitor-ranked lists, you have to have enough votes to make the system meaningful.
DMOZ data is a good starting point though. Both Google and Alexa take the data and do interesting things with it. If you're clever about processing the RDF dump and merging your own data, then you can add sites, change the ordering, remove sites or whatever at will.
Imagine, for instance, an ODP with all the Geocities and Angelfire sites (etc) removed. That would remove a big chunk of cruddy popup infested sites. Remove parts of the directory that aren't relevant and stick to various themes (e.g. Computers) and it gets easier. Allow visitors to add directly to your directory as well as the ODP, maybe for a small one-off fee and it gets interesting.
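As a sketch of that kind of processing, assuming the dump has already been reduced to (url, topic) pairs: drop anything hosted on an excluded free-hosting domain and keep only one branch of the tree. The `EXCLUDED_HOSTS` tuple, the `Top/Computers` prefix and both function names are assumptions for the example.

```python
from urllib.parse import urlsplit

# Hypothetical exclusion list; extend as needed.
EXCLUDED_HOSTS = ("geocities.com", "angelfire.com")


def is_excluded(url, excluded=EXCLUDED_HOSTS):
    """True if the URL's host is an excluded domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain) for domain in excluded)


def filter_listings(listings, topic_prefix="Top/Computers"):
    """Keep (url, topic) pairs under `topic_prefix` that are not excluded."""
    return [
        (url, topic)
        for url, topic in listings
        if (topic == topic_prefix or topic.startswith(topic_prefix + "/"))
        and not is_excluded(url)
    ]
```

Matching on the host rather than on the raw URL string avoids throwing out a page that merely mentions one of these domains somewhere in its path.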
@ d morison
Thread is here:-
I was approaching more from the angle of the people on here doing a collaboration project, but a directory style site was exactly what I had in mind.
Any large directory will need a very good search system. Once you get past a certain size people will prefer to search instead of drilling down through the structure.
This also means you need more keywords for each listing, otherwise you run into the same limitations as with trying to search the ODP.