Do you know of any software/scripts that will parse RDF dumps into a database and then display it on the site? Something usable and customizable.
Processing the RDFs from Dmoz is easy enough. Plug-and-play solutions based on PHP or Perl are also available. These pull the data from Dmoz dynamically, which saves you the hassle of developing your own system. (I could not find any decent programs to process the RDFs into MySQL, so I had to write my own.)
Regards...jmcc
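For anyone wanting a starting point for the "process the RDFs into a database" step, here is a minimal sketch in Python. It is not the program mentioned above. It assumes the dump stores each listing as an ExternalPage element with an about attribute and Title, Description and topic children, and it uses SQLite as a stand-in for MySQL. The links table, the load_external_pages function and the SAMPLE data are made up for illustration.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET

# Tiny made-up sample in the assumed layout of a content RDF dump.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<RDF xmlns:r="http://www.w3.org/TR/RDF/"
     xmlns:d="http://purl.org/dc/elements/1.0/"
     xmlns="http://dmoz.org/rdf/">
  <ExternalPage about="http://example.com/">
    <d:Title>Example</d:Title>
    <d:Description>An example site.</d:Description>
    <topic>Top/Computers/Internet</topic>
  </ExternalPage>
</RDF>"""


def local(name):
    """Strip any '{namespace}' prefix from an element or attribute name."""
    return name.rsplit("}", 1)[-1]


def load_external_pages(rdf_file, conn):
    """Stream ExternalPage records from an RDF dump into a 'links' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS links "
        "(url TEXT PRIMARY KEY, title TEXT, description TEXT, topic TEXT)"
    )
    count = 0
    # iterparse reads the file incrementally instead of loading it in one go.
    for _event, elem in ET.iterparse(rdf_file, events=("end",)):
        if local(elem.tag) != "ExternalPage":
            continue
        url = next((v for k, v in elem.attrib.items() if local(k) == "about"), None)
        fields = {local(child.tag): (child.text or "").strip() for child in elem}
        if url:
            conn.execute(
                "INSERT OR REPLACE INTO links VALUES (?, ?, ?, ?)",
                (url, fields.get("Title", ""), fields.get("Description", ""),
                 fields.get("topic", "")),
            )
            count += 1
        elem.clear()  # drop the children of the record we just stored
    conn.commit()
    return count


conn = sqlite3.connect(":memory:")
loaded = load_external_pages(io.BytesIO(SAMPLE.encode("utf-8")), conn)
```

Matching on local names avoids hard-coding the namespace URIs, which may differ between dump versions. For the real dump you would pass an open file in binary mode rather than the in-memory sample.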
Just my opinion of course... I personally would build my own directory instead, both because it offers the user something that's not identical to what they can get elsewhere, and because it's probably safer in the long run. The way Google touts "original content", I wouldn't be surprised if they eventually dropped dmoz clones out of their index.
>Will it rank high in SEs?
Why would it? There are already 10,000 DMOZ clones out there. If you're linking to it from a high PR site, sure, but otherwise probably not.
Well, the difference would be that I will remove all the dead/expired sites from my directory and also allow users to add their sites right into my directory.
On a small-scale directory this is easy to do. However, Dmoz is a rather large directory with at least 3.5 million links. The problem is that cybersquatters tend to target domains that are in the Dmoz directory, so a domain can change hands between Dmoz updates. This is a big problem with using the Dmoz dataset. Dead sites are easily flagged as such, but if they are reactivated then they have to be checked again. The whole process of dead/expired/active checking has to be run on a continual basis, and it has to be highly automated to be effective.
Everybody seems to think that running a web directory is easy until they try it and learn otherwise. On a small scale it is relatively easy, but when you get to country level it can become a full-time job.
Regards...jmcc
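To make the continual dead/expired/active checking a bit more concrete, here is a rough Python sketch of one recheck pass. Everything in it is an assumption for illustration: the recheck intervals, the dictionary layout of a link record, and the rule that any 2xx/3xx answer counts as active. It only detects whether a site answers. It cannot tell you that a domain has changed hands, which is why a state change is recorded so someone can review it.

```python
import urllib.error
import urllib.request
from datetime import datetime, timedelta

# Hypothetical recheck intervals; tune them to the size of the directory.
RECHECK_AFTER = {
    "active": timedelta(days=30),
    "dead": timedelta(days=7),
}


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if the request failed."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
    except (urllib.error.URLError, OSError):
        return None


def classify(status):
    """Map an HTTP status (or None) to 'active' or 'dead'."""
    return "active" if status is not None and 200 <= status < 400 else "dead"


def recheck(link, now, fetch=fetch_status):
    """Update one link record in place and schedule its next check."""
    state = classify(fetch(link["url"]))
    if state != link.get("state"):
        # A site that died or came back is worth a manual look.
        link["changed_at"] = now
    link["state"] = state
    link["next_check"] = now + RECHECK_AFTER[state]
    return link
```

The fetch argument is there so the scheduling logic can be exercised without touching the network, for example recheck(link, datetime.now(), fetch=lambda url: 200).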
Palehorse: I found the software, but it doesn't seem to support DMOZ dumps. Did you write that part yourself, or do they actually include it but not mention it on their site?
I realize the difficulties involved in running a directory. Maybe not to the full extent, but I know that it's not an easy task, and that it needs continuous development and support.
Good. In that case welcome to the club and good luck with your directory. :) Gossamer Threads is a good program and there is a flat file version available for non-commercial/personal use. The Gossamer Links product is a MySQL based version but it is expensive. However it has a lot more features and is held in fairly high esteem by those who use it. There is an earlier version, Links 2.0 which has a lower licence fee.
Dmoz seems to be getting its act together with updating the RDF files every week or so. The next update should be tomorrow. If your categories are not frequently updated, it may be better to write some kind of low speed crawler that will check the Dmoz pages for updates (the Page Last Updated data is included at the end of each page on Dmoz) and then use a web orientated system to pull the data in. The alternative is to slice your categories from the main content and structure RDFs, check your categories against the 'last updated' data and update accordingly.
Once you have the data in MySQL, it is then just a simple case of generating pages from this in whatever language suits. The hard part is trying to figure out what the Dmoz people were up to when they created the structure. It seems to be a case of fossilisation rather than a coherent structure.
Regards...jmcc
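As an illustration of the "generate pages from the database" step, here is a minimal Python sketch. It assumes a hypothetical links table with url, title, description and topic columns, and uses SQLite in place of MySQL. The render_category function is made up for this example and the HTML layout is only a placeholder.

```python
import html
import sqlite3


def render_category(conn, topic):
    """Build a plain HTML listing for every link filed under one topic."""
    rows = conn.execute(
        "SELECT url, title, description FROM links WHERE topic = ? ORDER BY title",
        (topic,),
    ).fetchall()
    items = "\n".join(
        '<li><a href="{}">{}</a> - {}</li>'.format(
            html.escape(url, quote=True), html.escape(title), html.escape(desc)
        )
        for url, title, desc in rows
    )
    return "<h1>{}</h1>\n<ul>\n{}\n</ul>".format(html.escape(topic), items)


# Small in-memory table so the sketch runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE links (url TEXT, title TEXT, description TEXT, topic TEXT)")
conn.execute(
    "INSERT INTO links VALUES (?, ?, ?, ?)",
    ("http://example.com/?a=1&b=2", "Example <site>", "Demo", "Top/Test"),
)
page = render_category(conn, "Top/Test")
```

Escaping every field matters here because titles and descriptions come from an external dataset and may contain characters that would otherwise break the page.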
Copying the entire DMOZ is not effective in my opinion, but using the data to get a local or a content specific directory off the ground is what DMOZ is partially made for. It is Open Source data for sharing and using.
The gossamer-threads folks have a couple of good programs that work very well to set up directories that are interactive, i.e. people get to edit and add their own links. Sourceforge has some open source scripts that you should be able to modify and use also.
Don't set them up expecting to make big bucks, but they should pay for their own server space after a couple of months of being around.
>it may be better to write some kind of low speed crawler
Good idea
But let me highlight the term low speed -- it's as per their robots instructions:
[dmoz.org...]
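For what "low speed" and "as per their robots instructions" could look like in practice, here is a small Python sketch using the standard urllib.robotparser module. The user agent name, the fallback delay and the sample robots.txt text in the usage lines are assumptions for illustration. The real rules are whatever the site actually publishes.

```python
import time
import urllib.robotparser

# Hypothetical values for illustration only.
USER_AGENT = "my-directory-updater"
DEFAULT_DELAY = 10.0  # seconds between requests when no Crawl-delay is given


def build_rules(robots_txt):
    """Parse the text of a robots.txt file into a RobotFileParser."""
    rules = urllib.robotparser.RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


def polite_urls(urls, rules, sleep=time.sleep):
    """Yield only the URLs the rules allow, pausing before each one."""
    delay = rules.crawl_delay(USER_AGENT) or DEFAULT_DELAY
    for url in urls:
        if not rules.can_fetch(USER_AGENT, url):
            continue  # skip anything the site asks robots not to fetch
        sleep(delay)
        yield url


# Made-up robots.txt content, just to show the behaviour.
rules = build_rules("User-agent: *\nCrawl-delay: 5\nDisallow: /private/\n")
pauses = []
allowed = list(
    polite_urls(
        ["http://example.org/Arts/", "http://example.org/private/x"],
        rules,
        sleep=pauses.append,
    )
)
```

In a real crawler you would leave sleep at its default and fetch each yielded URL. The usage lines replace it with a list append so the example finishes instantly.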