unless they lose all their data or applications
According to Rich Skrenta, one of the DMOZ founders, that's exactly what happened. Server crashed, no backups and "during unsuccessful attempts to restore some of the lost data, ops blew away the rest of the existing data on the system." He thinks that they're currently trying to recover stuff from the last RDF dump.
He doesn't hold out much hope for DMOZ's survival under AOL and sounds pretty glum about its continued existence at all.
[Heads up, Webwork, link drop to somebody else's personal blog on the way, though I do think it can be considered somewhat authoritative ;-)]
Rich Skrenta: RIP DMOZ 1998-2006 [skrenta.com]
It just shows you that the editors that post here were either fully in the dark themselves or just didn't want to reveal the full extent of the problem.
So it effectively started 12 months ago when AOL didn't want to invest further in providing backup facilities. That was kept a bit hush-hush, but it makes perfect commercial sense. I think AOL has been brave to keep funding this project anyway, IMO, especially in view of their having to lay off staff elsewhere currently.
Following the recent technical issues, it looks like DMOZ was left without any recovery plan to fall back on. Add to this the AOL staff layoffs, and I would agree with the blogger that DMOZ isn't going to be a priority issue or a likely candidate for further staff resources or investment.
I'm not sure I would agree that, if AOL draws a line under the whole experience, DMOZ could survive on an independent basis; well, certainly not in its current format. For one thing, it requires a person or company that would take on the financial liability, and in its current format that would be pure madness, IMO.
From this point onwards, I think that for it to survive it would need to go commercial like business.com or yahoo.com and charge listing fees, or generate an income in some other way (sponsored adverts, etc.), unless anyone's got a better idea for making it self-funding.
I would be interested in the views of the editors that post here. If the directory went independent and started charging listing fees, would you still wish to work with it, or do you now feel that the directory has reached its natural end and perhaps should be consigned to history?
Also, do you editors feel a bit cheated that, if the "RIP DMOZ" blog's information is correct, you have been kept so much in the dark over this issue? Do you think AOL could have kept you better informed? Or do you disagree with the blog and still hold out hope that matters will continue on as before?
Whatever the outcome of this saga, IMO it looks like DMOZ is going to have to change its format one way or another, and I would agree with the blogger that if it can't, it will finish. Would you agree?
(1) As for editors being in the dark or misinformed: I read skrenta's post, and I read the internal forums, and for that matter, I read these forums. And the more technically astute readers here were thinking pretty much along the same lines as my own mental picture of what was going on. It was obvious almost from the beginning, to any technically astute reader, that there were problems with the backups. I even mentioned that issue several times in public forums. My take is that rich misunderstood both what was said here, and what skrenta said, to make a much bigger difference than there actually was. I don't know that skrenta has any more information about the current situation than we editors have, or has given any really significant details. As I said, I'm fairly technically inclined, and the additional details did not add anything to my understanding of the situation. It's clear enough that the problem wasn't all hardware, and it wasn't all operations. But both had problems. (Someone described it as a "perfect storm", and for anyone who hasn't personally sysadmin'ed a large heterogeneous industrial-strength network, I think that conveys more information than any number of technical trivia.)
Skrenta's opinion on the organizational cause has some truth in it, no doubt (that had been an ongoing concern among editors for years), but I'm not fully of his viewpoint, and I don't think he has the whole picture. (For instance, even in the skrenta days the ODP never had a failsafe server. That was something the AOL folk were planning even before the outage.)
I'm sure that anyone who has extensive experience working with a large corporate sponsor for a non-income-producing activity has a useful opinion about what it takes for such a venture to be successful. That leaves out me and Rich both (and for that matter, skrenta is more of an entrepreneur spirit), so I'll just keep an open mind until I meet someone WITH a clue on THAT subject. But I know this: people who do know more than I do, seem to feel that finding another sponsor would not be difficult. (I've claimed otherwise in this forum before: so this is a tentative retraction of what I said before.)
But corporate sponsorship is not the only option. Wikipedia and Project Gutenberg are independent, and if it looked like AOL was about to abandon the ODP -- which it doesn't -- editors would be considering that option.
(2) Paid site reviews are still dead on arrival. Yahoo does that, and so far we've done better (at what matters to us) than Yahoo has done. We'd have to be insane and stupid both to give up what has worked better, in exchange for what hasn't worked as well! And that would be catastrophically disruptive for the community, most of whom signed on for something else. And besides, you'd have to be a pretty devious shyster as well as crooked to try to break the Social Contract that way--I don't think the ODP has the talent to do that. (There are probably some reasons I overlooked, but three impossible things is as much as I can manage after supper.)
But editors have been discussing what changes might be made. "Going on as before" is an option, but the outage has given us time to think about other possibilities. Of course, as anyone who's ever talked to an editor knows, the most popular change would be getting rid of site suggestions altogether. That's not the approach I prefer, but I could live with it. (I don't think the Social Contract would permit it, by the way.)
The challenge, as always, is to find a way of making use of information from people (site suggestors) most of whom are unreliable and some of whom are downright malicious. Is there a way, I keep asking myself, to let site suggestors build reputations? Or are frequent suggestors so invariably spammers that the issue doesn't arise? I haven't ever come up with a good answer. Other people are asking similar questions, though, and someone may yet come up with a good idea.
(3) The ODP isn't like a ponzi scam. It doesn't have to keep growing to remain alive. It has the flexibility to grow or shrink as circumstances change. The real threat would be a more effective methodology with a similar product -- and so long as no such thing exists, there isn't a threat from that direction.
Thank you also to everyone else who has contributed.
I think there's hope for the project. Perhaps a transfer of "support" to Google? (Frankly, if they don't jump on this then that other bright guy - the one whose company picked up Ask.com - might be savvy enough to step forward.) Perhaps a new version or even two new and 'competing' versions of the project, one with editorial review and one with some version of 'member review'. Evolution and keeping the best of what already works.
Interesting times. I smell life if the project is willing (and a whole bunch of other things needed to sustain life are in place), but not at the cost of compromising the basic tenets of the project.
It's something I do in my spare time, for fun. My work's still up there, available for anybody who happens to be interested in one of the same topics I am to use.
I'm not a tech person, but the explanation they gave in the internal forums for what went wrong seems plausible to me. I've been working with computers for a long time now, and it doesn't faze me.
Bottom line is, none of MY work was lost. I feel bad for the people on the tech end, having to reconstruct stuff from RDFs or wherever to the point where we can resume editing like we were before, in whatever (inscrutable to me) way they do that. But all the categories I organized and descriptions I wrote are still out there as we type, and they'll still be part of the new model, so what's there for ME to feel cheated about?
If I was a tech person on the project I'd probably be pulling my hair out, of course. But as it is, I'm just looking forward to seeing whether we get any neat new functionality out of this once everything's rebuilt. If AOL was going to ditch the project, I think they would have done it as soon as the big crash happened, and they didn't do that, so I'm not too worried. And besides, even if worst did come to worst, the archives aren't going anywhere. Anyone who needs to know about an obscure author would still be able to find everything I've organized and categorized about him if they want.
So no, not too stressed about it, to tell you the truth. *one librarian's two cents*
Many development functions are available again. Not everything: work on data reconstruction and network issues still continues in parallel.
My "feeling" of the state of the system: I would not be surprised at brief recurrent outages for a while: we're operating on new servers and a new network configuration. I'd expect more functionality to show up in dribs and drabs, whenever the techs are confident all the necessary code and data are in place (that is, not on any fixed schedule.) I'd be surprised if site suggestion functionality didn't return, but even more surprised if it returned quickly (i.e. this year.) (But I've been surprised before.)
As I don't believe there IS a schedule, I think it's pointless to ask what it would be if it existed, which it doesn't. It's the mode of "fix the problems, one or two at a time, beginning with those which are most critical to the quality of editors' work, taking as long as it takes and finishing as soon as you can."
With all this uncertainty, it's time for this billion dollar company to decide if they still want to rely on what's now become an unreliable source of information. I suspect that Google will move away from Dmoz and this will be the final blow to this once important web resource.
Would a search enterprise be wiser to adopt and support the theoretically competing model or would they be wiser to shun it?
Does the idea of "keeping one's enemies even closer" make sense in the search world? Why let DMOZ drift into someone else's gravity?
Might a supporting search enterprise 'work with' DMOZ in a mutually beneficial manner? How?
I have a rough instinct that the wiser enterprise would pull DMOZ into its orbit, even at risk of some measure of competition. Why?
1. There might be some data that could be shared that wouldn't be part of the RDF files that could be of value.
2. There might be a partnership in emerging technology that could be more fully vetted by a staff of 1000s of volunteer editors.
3. There might be some patents that could arise from the relationship, where revenues might be shared.
4. In a search world that is laboring to define 'signals or signs of quality' it boggles my mind that some search player hasn't sought to capitalize on a dialogue with a volunteer staff that has been charged with just that task for years.
5. Want to (better) filter spam from your SERPs? I would imagine DMOZ has some data that might help.
These are just thoughts off the top of my head but my basic instinct is that there's value to be yielded in a well managed and supportive relationship with a volunteer project of this size and history.
If Barry Diller is still on his game he and Ask.com might be the angel 'investor' for finding value and mutual benefit in this project.
[edited by: Webwork at 3:57 pm (utc) on Dec. 19, 2006]
$50.00 per submission (2 editors vet, and if not acceptable, NO REFUND and no dispute).
Money from submissions would fund better hardware and software, plus xx full-time editors who would not only surf to find sites for inclusion but also check that current listings are still correct and keep a check on spam inclusion.
The concept of DMOZ is still needed, but funding should be found. Charging for submission would help DMOZ grow and could even be used to tell Joe Public about the resource.
If they do go the Google route, or with any other big boy, the chance of the site being forced to change is greater.
My own view is that Google had a lot of help early on, with DMOZ providing seeding, so it would not do Google any harm to dip into its pocket and just give DMOZ one or two million with no strings attached.
No one is searching DMOZ to find content - it only exists because a bunch of internet academics with too much time on their hands refuse to accept the reality of what is happening with the internet. In my opinion, DMOZ is mid-90s technology that should have died a long time ago. I hope it does go away and the sooner, the better.
[edited by: Webwork at 5:59 pm (utc) on Dec. 19, 2006]
On top of that, maybe even creating an internal PPC engine with minimal text ads...that would be enough to retain 5-7 full-time people who can resolve basic problems, like disaster recovery and programming issues. Once that's done long-range planning can help turn it into something more web 2.0ish where it can be a mix of editors/visitors who run it.
Save the few 100K to buy some Sun servers. Splurge for a backup server if you really want to. ;)
Strange. In all these years no one has tried and succeeded in doing just that.
The problem is that, from a business perspective, there's little it can really offer compared to other, more profitable ventures that most organizations are working on.
I see it from a different perspective. The ODP has a lot to offer a prospective buyer. I believe the official launch date was in 1998 June. That's 9 years of site reviews by humans! I think it would be rather difficult to replace that. There is a solid foundation there to start with. We all know that certain categories need to be restructured, that one is a no brainer. But, for the most part, the ODP is a diamond mine of Data and Editors.
Human Reviewed Content Rules!
It's still an important layer combined with algorithmic results.
Most people "google", not "dmoz". The only real use is from the search engines and SEO's. Search Engines use it because of the human approval of websites listed, so they include it in their algorithm so they don't have to pay humans themselves to do it and haven't gotten their algorithms good enough to do it automatically. And SEO's only try to get their sites in it because they know it's used by the SE's in their algo's (plus it's a link). The normal average surfer doesn't use it, and a lot don't even know what it is.
If the search engines didn't use it in their algos, then no one would care other than the editors themselves. Why are the editors doing this valuable service for the SEs for free? I can understand and do understand the need for DMOZ 10 years ago. But, in today's web environment, all I see are the editors doing free outsourcing work for the SEs.
>Most people "google", not "dmoz".
This could be said about almost any site: most people don't visit it from one year to the next. Are all the sites in the world EXCEPT the Alexa top 100 "of no usefulness?" Do we dump all the llama-breeders' sites because most people will never buy or rent a llama? Are all the books in the library except the top-10 bestsellers "of no usefulness"? Should we go through and discard volume "V", "X", and "Z" of the encyclopedia because the top five volumes get 80% of the use? Should we take all the classical music radio stations off the air because only 2-5% of the population cares about that?
There's an attitudinal difference here. If the ODP can be the best internet resource for some task for just 2-5% of internet users, how many sites on the web can say more? A hundred? A thousand? Absolutely not more than that. That's still in the top one-thousandth of one percent of all websites.
And each person who doesn't use the ODP -- doesn't impact its value positively or negatively. All that affects its value are the people who use it.
This is true for any site: and even in the real world a rarely-read book doesn't detract from the value of the library (unless the library runs out of shelf space, which is not an issue online).
That is a tautology at best--but usually just a delusion.
The ODP replaced directories (like Lycos, Netscape, etc.). And Looksmart replaced directories (like Excite, Altavista, etc.). But ever since the last millennium, search engines have considered search to be a complement to directories (or vice versa).
It's funny, webmasters in here will set up multiple sites presenting the same material -- to focus on DIFFERENT TARGET AUDIENCES -- but can't see how directories and search engines might serve ... different target audiences.
There is no silver bullet. In every field, whether carpentry, rocket science, civil engineering, or internet use, the skilled person will know how to use a variety of tools; and will know which tool works best in which situation. What would an auto mechanic say if you told him he had too many tools, he should get rid of the ones that he only uses on 5% of cars?
Don't take my word for it: just go try it. Then, with two or three words substituted, you'll know what the intelligent internet user would think of a proposal to remove the second-most (or, at worst, perhaps third- or fifth-most) useful tool for internet indexing.
This could be said about almost any site: most people don't visit it from one year to the next. Are all the sites in the world EXCEPT the Alexa top 100 "of no usefulness?"
My post wasn't trying to say it's not useful; I was trying more to say its usefulness is on the back end, not the front end. The back end is usually an expense and supports the front end. In DMOZ's case, it is a back end staffed by unpaid volunteers supporting the front ends of other revenue-generating SEs, none of which are willing to give DMOZ any revenue or even cover any of its operating expenses. AND the editors seem to be supporting this model. That's what I don't get.
I understand why the SE's don't mind using DMOZ as a backend, that's easy, it's free, except for AOL (but they've sure tried to keep from spending the money). I don't understand why the DMOZ community (primarily the editors) allow it. Even if owned by AOL, very few corporations will turn down projects that turn pure expense into income, even if it doesn't completely cover all the expenses.
The SE's will use the "free backend dmoz" until they feel their algorithms can handle the rest or until it dies (which the SE's won't be harmed at all from). DMOZ is just being used and abused, and the editors seem to defend this model.
And no, I'm not a dmoz or directory hater, I'm just giving my viewpoint on the situation, what I feel got them there, and suggest that they need to turn the dmoz into a front end or create a front end for it. A front end that supports the back end.
Maybe it's just a case of the model having changed without the editors realizing it or wanting to acknowledge it, but the model has changed.
It's riddled with problems and not worth anything to anyone. If a big player like Google is going to enter this business, they need to do it the right way. By the time they clean out these problems, they might as well start over with something new. It's a dinosaur from a technology standpoint. Dead links, redirects - it's a nightmare to maintain in its present form.
It served its purpose, but it's time to move on and thank those involved for their heroic effort in building this directory.
You guessed it: I had a bone to pick with these people. I admit it. I'm only human after all.
The lack of a backup is symptomatic. The ODP seemed to have become a playground for amateur bureaucrats.
Two minor points:
ODP editors (who do the editing) are not the ODP's system administrators.
AOL Operations (who do the system administration) are not the ODP's editors.
I guess you're talking about the editors when you say things like "amateur bureaucrats" and "well-meaning 'librarians'" - but what does this have to do with system administration? Or were you talking about AOL Operations? Or both? :-)