You can go to the resource zone and read the rationale of the DMOZ editors as written by meta-editor Mr. DonaldB.
I have to say that I agree with them/him. Status checks end up becoming a source of frustration for submitters and editors alike: submitters ask what's going on, and editors can say nothing except that the site is "awaiting review, feel free to ask for another status check in 6 months."
There may be some reassurance for a webmaster to know that their site submission application hasn't evaporated into electrons... but that's about it.
My own site took approximately 5 months from submission to be included - and since it was my first site, I'll admit that I became far too obsessed with what DMOZ thought about it.
On the face of things, this should make it easier for those editors who volunteered time at the #*$! to spend more time editing.
And even if there was, it would not be made available for public use. Check RZ for the numerous questions about this, which get the same answer again and again.
Even the closing down of the status requests does not stop that question being asked there.
There is no way for anyone, including an editor, to check the status of a site without knowing the exact category it was submitted to.
That's not true.
The problem with letting submitters log in and track the status of their submissions is that it gives spammers a better look at which tactics are working and which aren't, making the problem larger.
Such a system could display "Pending", "Added", or "Deleted" depending on the URL's current status, but most of this info could be gleaned by looking in the category and/or running a search. Any additional information would be used to abuse the system.
Webmasters who submit zero-content sites could read the guidelines and realize their site isn't likely to be added.
I was just saying that you can still check it, but you're right, if it hasn't been moved or otherwise modified you won't be able to tell if it has been deleted or is still just unreviewed (you can't even be sure the person asking gave you the right url).
So I guess what I meant was that with the url you can get some info from the logs and if you see nothing you can look at the site real quick to make an educated guess as to whether most editors would consider it for inclusion or just delete it.
"...the dmoz system does not use a database is the strictest sense of the word. Data is saved in lots of flat files using the disk's directory structure..."
AFAIK, DMOZ was started three years after, for instance, MySQL was started, and I'd imagine there were other free or cheap SQL DBs available.
In any case, it probably wouldn't be that difficult to create a layer that would take SQL commands and store in flat files or take their flat file API (if they even have one) and store to MySQL. And, it probably wouldn't be difficult to take a snapshot of the various queues, put them on another computer(s), and use that to dole out the information to submitters.
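The snapshot idea above could be sketched roughly as follows. Everything about the layout is an assumption made for illustration: the `unreviewed.txt` file name, one URL per line, and one directory per category are hypothetical and are not the real ODP storage format. SQLite stands in for whatever database the second machine would run.

```python
import os
import sqlite3
import tempfile

QUEUE_FILE = "unreviewed.txt"  # hypothetical file name, not the real ODP one


def build_snapshot(queue_root):
    """Copy a flat-file queue tree into an in-memory SQLite table keyed by URL.

    Assumed layout: <queue_root>/<Category>/<Subcategory>/unreviewed.txt,
    one submitted URL per line.
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE pending (url TEXT, category TEXT)")
    db.execute("CREATE INDEX pending_url ON pending (url)")
    for dirpath, _dirs, files in os.walk(queue_root):
        if QUEUE_FILE not in files:
            continue
        # The category name is simply the directory path below the root.
        category = os.path.relpath(dirpath, queue_root).replace(os.sep, "/")
        with open(os.path.join(dirpath, QUEUE_FILE), encoding="utf-8") as fh:
            rows = [(line.strip(), category) for line in fh if line.strip()]
        db.executemany("INSERT INTO pending VALUES (?, ?)", rows)
    db.commit()
    return db


def find_categories(db, url):
    """Return every category whose queue contains the URL."""
    rows = db.execute(
        "SELECT category FROM pending WHERE url = ? ORDER BY category", (url,)
    )
    return [row[0] for row in rows]


def demo_lookup():
    """Build a tiny fake queue tree, snapshot it, and look one URL up."""
    with tempfile.TemporaryDirectory() as root:
        cat_dir = os.path.join(root, "Computers", "Internet")
        os.makedirs(cat_dir)
        with open(os.path.join(cat_dir, QUEUE_FILE), "w", encoding="utf-8") as fh:
            fh.write("http://example.com/\nhttp://example.org/\n")
        db = build_snapshot(root)
        return find_categories(db, "http://example.com/")


if __name__ == "__main__":
    print(demo_lookup())
```

The point of the sketch is only that the lookup runs against a copy indexed by URL, so nobody has to know the category in advance and the live files are never touched.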
But, I thought of a way to do this that I think would make everyone happy. I filled out DMOZ's feedback form asking that someone contact me. I'm not expecting much but you never know.
MySQL may look great now, but in 1998, it might have been an unknown.
It would be great to upgrade the whole database, but who will pay the bill? I think it's more than a day's work.
Remember it's not just the database, but all the controls to manipulate it, extract the RDF, etc., including a large number of editor-developed tools that have become indispensable.
However, most of the time I use Java's or the OS's API to do the drawing.
I'll simply say, "draw a line", and Java or the OS will take care of it for me. Java will do the same: hand it off to the OS. The OS will then hand it off to the driver for the user's monitor, and the driver is the actual software that does the drawing. Each layer is independent, and only knows what it needs to know. No layer tries to reach further down and, for instance, draw directly to the screen.
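A toy version of that layering, with made-up class names (`Toolkit` plays the part of Java's drawing API, and `Driver` just records pixels instead of touching real hardware), might look like this:

```python
class Driver:
    """Bottom layer: the only code that actually 'draws' (here, records pixels)."""

    def __init__(self):
        self.pixels = set()

    def set_pixel(self, x, y):
        self.pixels.add((x, y))


class OperatingSystem:
    """Middle layer: turns a line into pixels and hands them to the driver."""

    def __init__(self, driver):
        self._driver = driver

    def draw_line(self, x0, y0, x1, y1):
        dx, dy = x1 - x0, y1 - y0
        steps = max(abs(dx), abs(dy))
        if steps == 0:
            self._driver.set_pixel(x0, y0)
            return
        # Naive stepping along the longer axis; good enough for a sketch.
        for i in range(steps + 1):
            self._driver.set_pixel(x0 + round(i * dx / steps),
                                   y0 + round(i * dy / steps))


class Toolkit:
    """Top layer: what the application calls. It knows only about the OS API."""

    def __init__(self, operating_system):
        self._os = operating_system

    def draw_line(self, x0, y0, x1, y1):
        self._os.draw_line(x0, y0, x1, y1)


if __name__ == "__main__":
    driver = Driver()
    toolkit = Toolkit(OperatingSystem(driver))
    toolkit.draw_line(0, 0, 3, 0)  # the application just says "draw a line"
    print(sorted(driver.pixels))
```

The application never sees `Driver`, so the driver could be swapped out without the application changing at all.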
Likewise with the ODP software. I realize it's from Netscape, but there's a good chance they designed an API for the file storage.
That means all those tools should only be dealing with the API. A tool that saves submissions should save them through the API, rather than accessing files directly.
I don't know if that's how it works, but that's how I and most other software developers would have done it.
So, as long as there aren't hacks or strange dependencies or the like, hopefully a new implementation of that API could be written that would use a DB instead of files.
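To make that concrete, here is a minimal sketch of one storage API with two interchangeable implementations. All of the names (`SubmissionStore`, `FlatFileStore`, `SqliteStore`, `submit_site`, the `unreviewed.txt` file) are invented for the example; I have no idea what the real ODP code looks like, and SQLite is only standing in for "a DB".

```python
import os
import sqlite3
import tempfile
from abc import ABC, abstractmethod


class SubmissionStore(ABC):
    """The storage API. Tools are written against this and nothing else."""

    @abstractmethod
    def save(self, category, url):
        """Queue a submitted URL under a category."""

    @abstractmethod
    def pending(self, category):
        """Return the queued URLs for a category, oldest first."""


class FlatFileStore(SubmissionStore):
    """Hypothetical flat-file backend: one text file per category directory."""

    def __init__(self, root):
        self._root = root

    def _path(self, category):
        return os.path.join(self._root, *category.split("/"), "unreviewed.txt")

    def save(self, category, url):
        path = self._path(category)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "a", encoding="utf-8") as fh:
            fh.write(url + "\n")

    def pending(self, category):
        path = self._path(category)
        if not os.path.exists(path):
            return []
        with open(path, encoding="utf-8") as fh:
            return [line.strip() for line in fh if line.strip()]


class SqliteStore(SubmissionStore):
    """Database backend offering exactly the same two operations."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS pending "
            "(id INTEGER PRIMARY KEY, category TEXT, url TEXT)"
        )

    def save(self, category, url):
        self._db.execute(
            "INSERT INTO pending (category, url) VALUES (?, ?)", (category, url)
        )
        self._db.commit()

    def pending(self, category):
        rows = self._db.execute(
            "SELECT url FROM pending WHERE category = ? ORDER BY id", (category,)
        )
        return [row[0] for row in rows]


def submit_site(store, category, url):
    """A 'tool' that only talks to the API, so it works with either backend."""
    store.save(category, url)
    return len(store.pending(category))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for store in (FlatFileStore(root), SqliteStore()):
            print(type(store).__name__,
                  submit_site(store, "Computers/Internet", "http://example.com/"))
```

Any tool that reads or writes the files directly, instead of going through something like `SubmissionStore`, is exactly the kind of strange dependency that would make the swap hard.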
I've worked with databases for many years; I've worked on very large, very complex, very heavily used databases. Every job I've ever had, at some point I've run across a database problem where performance was critical (and ordinary competent programmers had produced an unusable system) where I could speed up the critical section by a factor of 100 or more. (That's not 100 percent, that's one hundred times.)
So I think I have a feel for the scope of the problem. The ODP databases have about 100 million records; there is an average of about one update per second (with peaks an order of magnitude or two above that) and easily hundreds or thousands of accesses per second, not even counting the user-based search (there are actually three different search capabilities).
The original ODP back end designers considered using an off-the-shelf database, and rejected it for performance and reliability reasons. Obviously performance and reliability have improved rapidly (at least for Open Source Software -- Microsoft product performance tends to drop 30-80% every major release, and reliability is what it's always been -- frequent crashes guaranteed.) So I don't know if the same decisions would be made today. SQL is convenient for people who don't understand database design, and usable by people who do, but performance has never been its strength... And, of course, "using flat files" doesn't mean what someone who HASN'T written a custom database engine thinks it means.
Anyway: assuming AOL wanted to put a lot of expert database programming time into the ODP, it is not at all clear that rewriting the underlying "custom" database would be a high priority. It is absolutely certain that going to SQL for the "flexibility" gain could have catastrophic performance implications: that "flexibility" works great for one-off reports on tiny databases, but for anything you're allowing editors (or worse, surfers) to generate on the spur of the moment, you'd better have either tuned indexes or a spare googleplex.
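For anyone wondering what "tuned indexes" buys you, here is a small illustration using SQLite, which is only a convenient stand-in and has nothing to do with what the ODP runs. The table and index names are made up. The same ad hoc query is planned as a scan of the whole table without an index, and as an index search once one exists.

```python
import sqlite3


def query_plan(db, sql, params=()):
    """Return SQLite's plan for a query as one string (detail is the last column)."""
    rows = db.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


def demo_plans():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sites (url TEXT, category TEXT)")
    db.executemany(
        "INSERT INTO sites VALUES (?, ?)",
        ((f"http://example.com/{i}", f"Cat/{i % 50}") for i in range(10000)),
    )
    sql = "SELECT category FROM sites WHERE url = ?"
    args = ("http://example.com/1234",)

    before = query_plan(db, sql, args)  # no index: every row is examined
    db.execute("CREATE INDEX sites_url ON sites (url)")
    after = query_plan(db, sql, args)   # index: direct lookup on url
    return before, after


if __name__ == "__main__":
    before, after = demo_plans()
    print("without index:", before)
    print("with index:   ", after)
```

The exact wording of the plan varies between SQLite versions, but the first should be a SCAN and the second a SEARCH using `sites_url`. On ten thousand rows nobody notices the difference; on a hundred million, with arbitrary queries arriving on the spur of the moment, the unindexed case is where the catastrophe comes from.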