"How soon is soon? The ODP is a complex bit of software, and I'm constantly amazed at how well it does what it does. I have confidence in the programmers (and as a programmer, I can say that is something I DON'T have in 95% of the programmers I know)."
Oct 31, 2002 rafalk wrote:
"With regards to the rdf dump, it is *supposed* to be produced on a weekly schedule. Right now it's five weeks behind."
NOW
Do they need a new programmer?
A month ago it was announced that they were down to 11 errors in the database and were hand-correcting them. Then the DMOZ newsletter announced that the RDF dump would be finished by the time we read the newsletter.
I contacted Autumn a couple of months ago, and she said that the dump would be ready as soon as possible.
The dump is not only unavailable, but no information is being given out. The Data Users are the only source of on-site advertising for DMOZ and its recruitment of editors. Netscape states that Data Users are its first priority...in return for the attribution they place on every page of their sites.
Google and every other Data User should be very angry about Netscape's poor performance with such a valuable resource. If I were the founders and the editors, I would demand an accounting; as a Data User I have no voice, and Netscape has demonstrated absolute contempt toward my business problems.
The people in power at DMOZ would have us believe that the lack of a Dump poses little problem because Google crawls. They keep saying that SOMEDAY there will be a dump and a search database on DMOZ. I now fear that the dump will be done, but changed so that the software that the Data Users have purchased will not parse it.
I find it interesting that Netscape remains silent about this disaster. The former DMOZ engineers maintained a message site for Data Users, giving out information, responding to questions...and warning about changes. This site has been abandoned.
And the broken links are still there... dead DMOZ...
I would guess that the last RDF dump Google got was from the 19th of September.
Also suggest reading Google & The ODP [webmasterworld.com]
"The people in power at DMOZ would have us believe that the lack of a Dump poses little problem because Google crawls. They keep saying that SOMEDAY there will be a dump and a search database on DMOZ. I now fear that the dump will be done, but changed so that the software that the Data Users have purchased will not parse it."
Dumpy, the Dmoz RDF is RDF, and the tags are published in the same directory as the RDF. If a new tag is added, then sometimes the software used to parse it has to be updated. This is (or was) a fairly regular event, though it is not the doomsday affair you seem to consider it. (I don't know if you have actually written a parser for the RDF or have just used htDig to create your version by spidering Dmoz. I wrote a parser/database/static HTML page publisher in order to use Dmoz last year. It was not that complex and only took a few hours. ;)) The core of the RDF, its structure and content, is what matters, and all of the parsers tend to work on those.
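To illustrate the point about parsers keying on the core structure, here is a minimal Python sketch (not jmcc's parser, which isn't shown) that streams ExternalPage records out of a content dump and simply ignores any child tag it doesn't know. It assumes the namespace URIs and element names of the classic DMOZ content RDF (ExternalPage with an `about` attribute, d:Title, d:Description, topic); check those against the tag listing published alongside the actual dump. The function name `parse_external_pages` and the `SAMPLE` data are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# Namespace URIs as assumed for the DMOZ content dump; verify against the real file.
NS_DMOZ = "http://dmoz.org/rdf/"
NS_DC = "http://purl.org/dc/elements/1.0/"


def qname(ns, local):
    """Build an ElementTree '{namespace}local' tag name."""
    return "{%s}%s" % (ns, local)


def parse_external_pages(stream):
    """Yield (url, title, description, topic) for each ExternalPage element.

    Hypothetical helper: only the core fields are read, so a newly added
    child tag is skipped instead of breaking the import.
    """
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == qname(NS_DMOZ, "ExternalPage"):
            url = elem.get("about")  # unprefixed attribute, so no namespace
            title = elem.findtext(qname(NS_DC, "Title"), default="")
            desc = elem.findtext(qname(NS_DC, "Description"), default="")
            topic = elem.findtext(qname(NS_DMOZ, "topic"), default="")
            yield url, title, desc, topic
            elem.clear()  # free memory; real dumps are very large


# Hypothetical sample in the assumed dump layout, including an unknown tag.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<RDF xmlns:r="http://www.w3.org/TR/RDF/"
     xmlns:d="http://purl.org/dc/elements/1.0/"
     xmlns="http://dmoz.org/rdf/">
  <Topic r:id="Top/Arts/Animation">
    <catid>3</catid>
    <link r:resource="http://www.example.com/"/>
  </Topic>
  <ExternalPage about="http://www.example.com/">
    <d:Title>Example Site</d:Title>
    <d:Description>A sample listing.</d:Description>
    <topic>Top/Arts/Animation</topic>
    <newTag>added later, ignored by this parser</newTag>
  </ExternalPage>
</RDF>"""

if __name__ == "__main__":
    for record in parse_external_pages(io.BytesIO(SAMPLE.encode("utf-8"))):
        print(record)
```

Streaming with iterparse rather than loading the whole tree is a deliberate choice here, since the full content dump is far too large to hold comfortably in memory.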
Google apparently does spider Dmoz as fodder, but it may just use the RDF to produce its directory.google.com site. Thus a site may appear in Dmoz's directory and then be picked up by Google. It could take anything from a few days to a few weeks before it appears in Google. Google may use freshbot on frequently updated parts of Dmoz, but you would have to ask GoogleGuy about this.
The tag listings from the failed updates seem to indicate that Dmoz is trying to streamline the directory structure and eliminate the linkswamps and cybersquatters rather than making sweeping changes to critical tags. While this may alter the structure of some topics and their subpages, it will probably not break the parsing software that the data users use to import the data. If it did affect the software, then the data users who use POD and similar scripts to create a live directory link to Dmoz would be the first affected.
Regards...jmcc