|What IS the DMOZ? |
Is DMOZ simply an overwhelmed listing service?
The following 9 messages were cut out of the "Does the DMOZ Server Being Down Hurt the ODP's Credibility" thread by webwork - 6:57 pm on Dec. 9, 2006. They raise a somewhat interesting issue that was a bit tangential to the original proposition of the other thread.
I can't find one logical reason for Google to continue using ODP data in the Google Directory.
1. Why doesn't Google start its own directory? Or even buy DMOZ and start charging for listings (wouldn't that be better than paying more than a billion for YouTube?).
2. You submit to DMOZ and wait for a year or so to see IF you can get your site listed.
3. Just because submitting to DMOZ is free doesn't mean that they can do whatever they want. If their editors can't handle the amount of submissions, go charge for them, even a small fee, for God's sake! Even worse, although their editors can't handle the amount of submissions, they don't accept new editors.
4. DMOZ data is unreliable. A large percentage of the links, titles and descriptions are broken.
5. DMOZ policy is very questionable with respect to Google. They allow you to use their data on your site, but Google would ban or at least penalize your site if you did so.
[edited by: Webwork at 12:02 am (utc) on Dec. 10, 2006]
[edit reason] Spliced thread editing [/edit]
> 1. Why doesn't Google start its own directory? Or even buy DMOZ and start charging for listings
It would be against the DMOZ social contract. Of course, Google and anybody else can start their own directory and charge as much as they want.
> 2. You submit to DMOZ and wait for a year or so to see IF you can get your site listed.
Many errors in this statement.
a. you can't submit, you suggest to the editors to look at a site
b. I really don't understand why people wait; haven't you anything useful to do?
c. you cannot get your site listed; it was only a suggestion to the editors, and if they think it is worthwhile to list it they will do so
> 3. Just because submitting to DMOZ is free doesn't mean that they can do whatever they want. If their editors can't handle the amount of submissions, go charge for them, even a small fee, for God's sake! Even worse, although their editors can't handle the amount of submissions, they don't accept new editors.
DMOZ is not a listing service, never has been and never will be. We accept new editors, but only those who are interested in the directory and not those interested in listing their own site(s).
> 4. DMOZ data is unreliable. A large percentage of the links, titles and descriptions are broken.
Did you report them?
It might be that at this moment a small part of the listings is "broken", as we haven't been able to check any of them since the end of October.
> 5. DMOZ policy is very questionable with respect to Google. They allow you to use their data on your site, but Google would ban or at least penalize your site if you did so.
That is all up to Google. Just like DMOZ, they prefer to list only the original source.
|DMOZ is not a listing service, never has been and never will be. |
Yes it is. When you have a site that has tons of links in tons of categories, and a page specifically to let people suggest adding their site ... that's a listing service.
If you don't want to be a listing service, then remove the submission functionality and replace it with a simple Contact Us form. Then, you can go about reviewing and adding sites as you wish. You can even surf for sites yourself if you want to. And when you say "you can't submit, you suggest to the editors to look at a site", it'll be completely true.
Or, if you do want to be a listing service, which everyone expects you to be, then I think it's obvious that it won't work without many many more editors.
|charging for listings |
It would be against the DMOZ social contract
Does that mean you can't charge, or you don't want to? If you want to but feel that you can't, that's easy. Shut down DMOZ, start a new directory with a new name, and give away DMOZ's data to the new directory. New directory, new social contract.
I always see "suggest a site" ..which has a totally different meaning to "submit a site" .."submit" means a listing service .."suggest" means what pagoda is talking about ..and puts the editors and the ODP under no obligation to take any notice of anything "suggested" ..
looking in any dictionary ..on or off line ..submit and suggest are not interchangeable ..they don't ask for "submissions" ..and thus have no
that's not playing semantics ..that's just not trying to read into their texts things to suit my or your anticipations ..being in the ODP is nice if / when it happens ..but it isn't a right ..and thus one has no right to ( female dog ) when one isn't in it ..or when it doesn't do what individuals want it to ..
BTW..I'm not an editor ..but I do consider the ODP to be valuable to surfers ..any value to webmasters should be incidental..
edited for clarity
[edited by: Leosghost at 5:53 pm (utc) on Dec. 9, 2006]
|and a page specifically to let people suggest adding their site |
DMOZ does not have such a page.
You cannot suggest a site to be added. You can only suggest a site to be looked at. A small difference, maybe, but essential for DMOZ.
|then remove the submission functionality |
DMOZ does not have a submission function. Suggestion is completely different from submission.
|You can even surf for sites yourself if you want to. |
That is how I personally find most useful and listable sites.
|charging for listings |
It would be against the DMOZ social contract
Does that mean you can't charge, or you don't want to?
Both. Our social contract prohibits us from charging, and the editors don't want to.
|If you want to but feel that you can't, that's easy. Shut down DMOZ, start a new directory with a new name, and give away DMOZ's data to the new directory. |
Everybody can already do so. The DMOZ data is free to use, you only have to put some links back to DMOZ.
|that's not playing semantics |
Yes it is. Submitting a site means "suggesting that it be reviewed and then added".
We all know that DMOZ doesn't have to do anything. But the *perception* is that since DMOZ is a large directory, and I have a legitimate site, there's no reason why my site shouldn't be included in the directory.
If DMOZ was a large blog, we wouldn't be having this discussion. But it's presented as a directory, and everyone knows that directories can be added to because they're meant to be all-inclusive.
It's time that DMOZ either admits that it can't cope with the number of suggestions, and stops trying ... or finds a way to cope.
>When you have a site that has tons of links in tons of categories, and a page specifically to let people suggest adding their site ... that's a listing service.
And if I had a building with a door, and rooms containing mattresses, would that be an accommodation service? No. All it would be, at most, is a facility that might possibly be usable by the people who were actually providing an accommodation service.
But who knows what it really IS? It might be a hospital. It might be a jail. It might be a research lab. It might be an abandoned mattress factory, being re-used as storage for toxic waste. It might be something else altogether.
So you need to look at the facility more carefully. Does it really have ALL the features that would be needed by the providers of the service you envision?
Fact is, dmoz.org has never had the features needed by someone who wanted to provide a listing service.
For instance, a listing service would be concerned about "service levels" -- that is, how quickly listings were reviewed. But dmoz.org has no way of checking or tracking that; with the database structure, there cannot be any practical way of doing it; and editors have never pretended that was a concern.
A listing service would need to collect fees -- the ODP's charter doesn't allow that at all, and the ODP has collected, as service providers, people who don't want to do that at all. (Anyone who wants to do that can do it themselves, under another name, as you acutely point out. However, the idea that the ODP would have to be shut down first, will not bear close inspection.)
Ergo, it's not a listing service, and it's never going to be of any use to people who want to provide a listing service.
So ... take it down? Right now it IS down. So you can start your own listing service today, and invite anyone who wants to do that, to help you. I'm not interested. But don't let that stop you: I won't try to set your priorities, any more than I'd consider letting you set mine.
If it looks like a duck, smells like a duck, and quacks like a duck, but has the word 'elephant' written on it, what is it?
You can suggest a link to ANY website, you know. Not just to DMOZ. I get emails like that all the time, both for my personal website and (especially) for an educational website I help maintain. If they're relevant, educational, and child-appropriate, I'm always happy to add them to the educational site; they improve our website. But am I *obligated* to add them? Heck no. Do I owe the webmasters who ask me to link to their Viagra and online gambling websites the dignity of a reply? Not any more than I owe one to any other spammer (not at all, in other words.) Does it make our website a "listing service?" Dream on. Our website has its own purpose, and will only link to somebody else's site if I think it adds value for our users... AND if I happen to have time.
Is it that big a stretch to imagine that most of that also applies to a noncommercial directory like the ODP, only on a slightly larger scale?
|Do I owe the webmasters who ask me to link to their Viagra and online gambling websites the dignity of a reply? |
No you don't, but that's not what we're talking about. We're talking about sites just as relevant as the ones already listed, not getting the dignity of a reply.
|Does it make our website a "listing service?" Dream on. Our website has its own purpose, and will only link to somebody else's site if I think it adds value for our users |
The difference is that (1) your site has a main purpose other than linking to sites, and (2) you don't openly invite people to suggest their sites for review.
For the ODP, the question isn't, and never has been, merely the "relevance" of sites. It is "significant unique information" that makes a site listable -- if a site has that, then we'll LOOK for a topic it's relevant to. If a site doesn't have that, it doesn't matter what it's relevant to.
So, forget the "relevance". It's irrelevant.
Listable sites already get a response -- that is, they get listed.
And UNLISTABLE sites don't get a response, which is far better than they deserve!
So, which of those two possibilities do you have a problem with?
|I can't find one logical reason for Google to continue using ODP data in the Google Directory. |
It is the largest (both by participant count and output) effort on the planet to create a list/description of websites that are not utter crap that has even a modicum of editorial rigor and participant vetting.
No further reasons are needed to make it a valuable addition to any SE ranking algorithm.
Note that most of the value of ODP to Google is unaffected by all the valuable websites that are not (yet) listed there.
Hutcheson, your points are reasonable. But if you invite site suggestions, I think each one deserves a yes or no response. With a little automation, it would be easy, and would probably improve everyone's lives.
mcavic, that suggestion sounds a-priori reasonable. The editors discussed this at length, and disagreed. (I was one of those who thought it reasonable.) We took the ODP Show-ye-me approach: that is, those of us who thought it might be a good idea tried it. The test involved dozens of editors, and thousands of site suggestions. (I was one of the more actively involved testers.)
We came, we tested, we were shown. There is no longer any disagreement: it's a bad idea. Its contribution towards a better directory simply isn't significant enough to waste any energy figuring out how to do it.
Lots of people think it's technically easy. Anyone may have an opinion: however, I'll claim the right to having an INFORMED opinion, based on at least eight years of experience in each of optimizing compilers, very large low-level database libraries, and ODP editing. And I've actually tried to map out how such a thing might be done. The specifications, let alone the implementation, were complex enough that implementation would have been a high risk. (This from someone who'd rip out and replace an entire p-code interpreter at the drop of a small hat.)
But, when it gets down to it, we just flat really DON'T want to tell people their suggestion is worthless: in our experience, it can only encourage them to rise to new heights of deviousness; in practice, it has no other effect; it never has any beneficial effect; it harms our anti-spam efforts.
And for sites that do get listed, again we don't HAVE to tell people anything in particular, because we broadcast the fact to the world in general.
So it sounds like a good idea: but anyone who had really thought it through should have recognized that it couldn't possibly have any good effect -- and I don't blame you for not figuring that out, because it certainly took ME (and us) long enough, even looking at the evidence.
But ... we've experimented, we've been shown, and that's the end of all speculating. The issue is dead until there's substantial new evidence, pointing a different direction: and I can't imagine what form that could possibly take in this universe.
I've been using PHP and MySQL for 4 or 5 years, and with them, just about anything is easy. :) But as you point out, it's not a question of "can you".
My thought was that giving someone a definitive no would help indicate that pestering you wouldn't do any good. And you already state that you're very selective, so a rejection doesn't have to mean that the site is worthless.
> We're talking about sites just as relevant as the ones already listed, not getting the dignity of a reply.
All sites "relevant" (but relevant in terms of the DMOZ guidelines) get a reply. They get listed. Maybe the reply sometimes takes a little longer than we all would wish, but eventually the reply will be there.
All other sites (by some editors' estimates, about 90% of suggested sites belong in this category) don't get a reply, as they are not "relevant" for DMOZ.
>I've been using PHP and MySQL for 4 or 5 years, and with them, just about anything is easy. :)
I've cleaned up after people with that approach. Often. It's sort of fun, if you have a keen sense of irony.
>My thought was that giving someone a definitive no would help indicate that pestering you wouldn't do any good.
Understandable. But you'd be amazed how many people said outright, "I want to know when my site is rejected, so I can immediately suggest it again."
Frankly, letting them pester us in ignorance (and therefore inefficiently) is less harmful to us, than making it possible for them to pester us efficiently.
It really doesn't matter: if people are courteous enough to follow the submittal policies, they won't be suggesting again, regardless of what we tell them. And if in spite of the submittal policies they will be suggesting again, I'm not sure what else we could tell them.
There's really nothing to say except things that our mothers would be ashamed to hear us say.
|I've cleaned up after people with that approach. Often. |
What approach? Designing a system around a powerful, simple, and fast language and database? You probably mean cleaned up after people who didn't know what they were doing.
|"I want to know when my site is rejected, so I can immediately suggest it again." |
Ok, but with a proper database, you can store all of the domains ever suggested, and refuse duplicates.
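To make the "store every domain and refuse duplicates" idea concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table name `suggestions`, the helper names, and the normalization rule (lowercase host, leading "www." stripped) are all assumptions made for illustration, not anything the ODP is known to run.

```python
import sqlite3
from urllib.parse import urlparse


def normalize_domain(url: str) -> str:
    """Reduce a URL to a lowercase host name without a leading 'www.'."""
    # urlparse only recognizes the host when a scheme or '//' is present.
    if "//" not in url:
        url = "//" + url
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the hypothetical suggestions table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS suggestions ("
        " domain TEXT PRIMARY KEY,"
        " email TEXT NOT NULL)"
    )
    return conn


def suggest_site(conn: sqlite3.Connection, url: str, email: str) -> bool:
    """Record a suggestion; return False if the domain was seen before."""
    domain = normalize_domain(url)
    if not domain:
        return False
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO suggestions (domain, email) VALUES (?, ?)",
                (domain, email),
            )
    except sqlite3.IntegrityError:
        # The PRIMARY KEY constraint rejects a repeated domain.
        return False
    return True
```

A sketch like this only catches an exact repeat of the same domain; it does nothing about the same content being suggested again under a different domain.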
|The specifications, let alone the implementation, were complex enough that implementation would have been a high risk. |
If you have a Web site that lets editors see the pending URLs, and there are buttons to approve or reject a URL, all you have to do is make those buttons find the email address that sent in that URL, and shoot off an email.
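A rough sketch of what such an approve/reject button handler might do, in Python. The pending queue (a plain dict from URL to suggester's address), the function name `build_decision_email`, and the sender address are all hypothetical; the message is only built here, not sent, and nothing in this thread says the ODP's editing tools work this way.

```python
from email.message import EmailMessage


def build_decision_email(pending, url, approved,
                         sender="noreply@directory.example"):
    """Remove url from the pending queue and return the notice to send.

    pending maps each suggested URL to the email address that sent it in.
    """
    recipient = pending.pop(url)  # raises KeyError if the URL was never suggested
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Your site suggestion: " + (
        "listed" if approved else "not listed"
    )
    outcome = "added to" if approved else "not added to"
    msg.set_content(
        f"The suggestion for {url} was reviewed and {outcome} the directory.\n"
    )
    return msg


if __name__ == "__main__":
    queue = {"http://example.com/": "owner@example.com"}
    notice = build_decision_email(queue, "http://example.com/", approved=False)
    print(notice)
    # Actually sending it would be one more step, for example:
    #   import smtplib
    #   with smtplib.SMTP("localhost") as smtp:
    #       smtp.send_message(notice)
```

This covers only the mechanical part of the question; whether a rejection notice is a good idea at all is a separate matter of policy.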
>We're talking about sites just as relevant as the ones
>already listed, not getting the dignity of a reply.
That's in the eye of the beholder, isn't it? You'd be REALLY surprised how many people think their tangentially related ecommerce site or completely content-free Adsense page is "just as relevant" to our educational children's website as an interactive historical encyclopedia, just because it has the same words in the title.
>You don't openly invite people to suggest their sites for review.
Yes, I do. On our FAQ page, we invite people to send us the URLs of child-appropriate educational sites to add to our links. We also state that we won't respond to link solicitations from other sites. That's an editorial policy, just on a much, much smaller scale than the one the ODP is working with.
The fact of the matter is, unless you've paid someone to link to you from their website, they're under no obligation to do it, no matter how you slice it. It doesn't matter if it's the ODP, my small educational site, Slashdot, or Yahoo's occasional free listing of random websites. Sometimes a website wants to link to you, and sometimes they don't. Except for the internal rules of the website itself, you don't get an appeal. That's really just the way the Internet is... and frankly, I kind of like it that way.
|I can't find one logical reason for Google to continue using ODP data in the Google Directory. |
I don't find logic in the obsession that Webmasters have with the ODP. What gives? The importance of a listing in the ODP at this stage of the game is a moot point. There are bigger fish to fry as they say.
With the ODP, you submit it and forget it. Check back every now and then to see if it's been listed, if not, no big deal.
Google's reliance on ODP data has decreased over the years. It is still a "quality" source of data for their directory.
If you are looking for juice from "trusted" global sites, the new ODPs are (in alphabetical order)...
What percentage of visitors to Google are using the Directory tab?
What percentage of those using the Directory tab are Webmasters?
An ODP listing may not be what it once was. Google has many more "trusted" resources to rely on for its algo. The Wiki is by far the "most trusted" of the lot.
Nothing compares to Human Reviewed Content!
Leave the ODP alone and let them do their thing. :)
>Designing a system around a powerful, simple, and fast language and database?
No, the attitude that causes catastrophic performance problems, in my experience, is the delusion that simplicity and speed inhere in the language or database. They don't, of course: computer science students will know the famous proverb, "There is not now, and never will be, a language in which it is the least bit difficult to write bad programs." (For "bad" you should read "complex, limited, slow, or all the above").
From my perspective, SQL is not and cannot be "fast". It is inherently slow, compared to any kind of competent hand-rolled database/algorithm design. It's only when compared to the programming efforts of nonprogrammers or incompetent programmers (which, face it, IS most of the population even of computer users) that it can seem fast.
Nor is SQL simultaneously "simple" and "powerful". Its "simplicity" comes at the expense of giving up substantial power and flexibility that industrial-strength database systems customarily provide their users.
For trivial problems on tiny databases, SQL trades off mammoth processor overhead (which most people have) for programming skill and time (which are expensive and rare.) For complex problems on extremely large databases, the presence or absence of a Structured Query Language is ... a notational convenience, and nothing more; no substitute at all for the real knowledge and skill required to ensure that all algorithms on the database complete in the required time.
So, your statement tells me everything about your personal experience, and nothing at all about theoretical computer science--or even about the challenges facing the ODP developers, who ARE working on EXTREMELY large databases -- orders of magnitude larger than MySQL can touch -- and on moderately complex performance issues in the algorithms.
Again, the attitude that "I can easily write something that'll run fast enough on my trivial database" in practice, translates into the kind of performance catastrophes that get people like me involved to rip out that simple naive code and replace it by something that really IS fast, over datasets orders of magnitude larger than the toy tests run on the original code.
|SQL is not and cannot be "fast". It is inherently slow, compared to any kind of competent hand-rolled database/algorithm design |
I agree that SQL has a lot of overhead. But if your hand-rolled design doesn't allow you to add a simple feature, and mine does, but both perform well in production, which is better? There's nothing wrong with overhead if you're getting something in return.
|who ARE working on EXTREMELY large databases -- orders of magnitude larger than MySQL can touch |
So how large? I'm working with a U.S. phone directory in a MySQL table with 105 million rows that queries in 0.2 to 0.4 seconds. It's a pain to build and reindex, but then, it's a lot of data.
|computer science students will know the famous proverb |
I graduated in computer science in 2000, in a degree program that emphasized ANSI C, Unix, and PC hardware. But the proverb tells me as much as "there's never been a car invented that would avoid crashing when you point it at a brick wall and floor it."
|I can easily write something |
If I had the time, I'd offer to prove it to you. It would be educational for one of us. Which one, I wouldn't stake my life on. :)
Well, you're seeing how long it takes to build the ODP database. And its size exceeds the maximum size MySQL can manage.
So, 2 terabytes [dev.mysql.com], then?
Hi ODP guys, can one of you give us a hint at the annual costs of running the ODP, in $US?
AOL doesn't tell us that. But can't you guess? Go to your favorite ISP and ask what it'd take to buy and run and maintain, say, ten top-end Sun servers: Solaris or Linux, no need to pay the Microsoft tax or take the Microsoft performance hit. Throw in one developer's hindquarters and a couple of ribs of manager. Age well, and hope something doesn't explode in the oven.
Okay, let's try
10 Sun Fire X4500s including operating system, at $50,000 each (special DMOZ discount :)), total $500,000 spread over 4 years: $125,000 p.a.
1 tech for $100,000 per year
A couple of full-time editors, perhaps job sharing: $150,000 total
Bandwidth cost, hmm, a wilder guess here: $200,000 per annum
Other attributable costs: $100,000
Total guess: $125,000 + $100,000 + $150,000 + $200,000 + $100,000
That's $675,000, give or take a few more wild guesses
Okay, now you can laugh
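For anyone who wants to swap in their own guesses, the tally above can be reproduced in a few lines of Python. Every figure is the guess from this post, not a known ODP cost, and the dictionary keys are just labels chosen for the example.

```python
# All figures are the guesses from the estimate above, in US dollars.
HARDWARE_TOTAL = 10 * 50_000   # ten servers at $50,000 each
HARDWARE_YEARS = 4             # hardware cost spread over four years

annual_costs = {
    "hardware": HARDWARE_TOTAL // HARDWARE_YEARS,
    "tech": 100_000,
    "editors": 150_000,
    "bandwidth": 200_000,
    "other": 100_000,
}

total = sum(annual_costs.values())
print(f"${total:,} per year")  # prints "$675,000 per year"
```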
Back to the original points
|5. DMOZ policy is very questionable with respect to Google |
I think you have this the wrong way round. DMOZ encourages you to use their content. But if you want your site to do well in Google then avoid using DMOZ data - however much you think your users would welcome the content. That is a Google diktat and not a DMOZ policy.
>But if you want your site to do well in Google then avoid using DMOZ data - however much you think your users would welcome the content. That is a Google diktat and not a DMOZ policy.
If you don't have value to add to the ODP data, everybody's probably better off if you just link to directory.google.com, or dmoz.org, or the appropriate subcategory thereof.