

ODP RDF Dump...using a small section...

Want to make static pages from one subsection


Canton

7:09 am on Dec 14, 2003 (gmt 0)

10+ Year Member



I'm a bit new to taking an RDF data dump (ODP) and using it on a site, but what I'd like to do seems like it should be simple...

I'd like to take one second-level category and use just that portion of the ODP to make static pages for my site (a personal project), then regenerate those static pages every so often to pick up updates (not too often).

The only problem is that the files in the RDF section of the ODP seem jumbled together, with no easy way to find where the second-level category I want is located...

Any experience with grabbing and using a small subsection of the dump without parsing it into a database, etc.?

~Canton

choster

10:11 pm on Dec 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Unfortunately, it's not easy to do, because all sections of the directory are cross-linked with one another: any subcategory labeled with an @ is a virtual subcategory (a symbolic link) whose actual location is in a different branch altogether. You'd need either to exclude those links and remove their labels, or to pick some other point where the parsing would "stop."
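A minimal Python sketch of the "exclude those links" option. It assumes structure.rdf marks real subcategories with <narrow> elements and the @ cross-links with <symbolic> elements, and that the namespace URIs are the ones below; check both against the header of your own copy, which may also use other link element names. The sample excerpt and subcategory names are made up for illustration. Only <narrow> links that stay under the chosen prefix are kept, which gives the walk a fixed point where it "stops."

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed to match the ODP dump header (verify in your copy).
NS = {
    "r": "http://www.w3.org/TR/RDF/",
    "d": "http://purl.org/dc/elements/1.0/",
    "odp": "http://dmoz.org/rdf/",
}
R_ID = "{%s}id" % NS["r"]
R_RES = "{%s}resource" % NS["r"]

# Hypothetical structure.rdf excerpt; element names and paths are assumptions.
SAMPLE = """<RDF xmlns:r="http://www.w3.org/TR/RDF/"
     xmlns:d="http://purl.org/dc/elements/1.0/"
     xmlns="http://dmoz.org/rdf/">
  <Topic r:id="Top/Regional/Europe/Ireland">
    <d:Title>Ireland</d:Title>
    <narrow r:resource="Top/Regional/Europe/Ireland/Travel_and_Tourism"/>
    <narrow r:resource="Top/Regional/Europe/Ireland/Government"/>
    <symbolic r:resource="Irish_Music:Top/Arts/Music/Regional/Ireland"/>
  </Topic>
</RDF>"""


def real_subcategories(topic, root_prefix):
    """Return child category paths that stay inside root_prefix.

    Only <narrow> children are read; <symbolic> children (the '@'
    cross-links into other branches) are never looked at. A narrow link
    pointing outside the prefix is dropped as well.
    """
    kept = []
    for child in topic.findall("odp:narrow", NS):
        path = child.get(R_RES, "")
        if path == root_prefix or path.startswith(root_prefix + "/"):
            kept.append(path)
    return kept


root = ET.fromstring(SAMPLE)
for topic in root.findall("odp:Topic", NS):
    print(topic.get(R_ID))
    for path in real_subcategories(topic, "Top/Regional/Europe/Ireland"):
        print("  ", path)
```

Running this over every Topic under your prefix gives the tree of "real" categories; the @ entries simply never appear, so there are no labels to remove afterwards.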

For smaller projects, many webmasters use screen-scraping scripts (plenty of commercial and noncommercial tools [dmoz.org] are out there).

Canton

12:34 am on Dec 16, 2003 (gmt 0)

10+ Year Member



Thanks choster - offhand, can you specifically recommend a good commercial script that does this? If so, and you have a minute, please sticky me.

Thanks again,

~Canton

jmccormac

4:49 am on Dec 18, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The content.rdf and structure.rdf files have to be used together. The sublevels are clearly defined in both files, so it is possible to slice each RDF file at the upper and lower limits of the section you require. However, I am not sure you could parse the data from both files efficiently enough to generate static pages directly. It is possible, but it would take a lot of regexp work. It is simpler to parse the data into SQL and load it into a database, then use a page generation script to generate static web pages from the database. (This is essentially what I do with the Irish section of the RDFs.) Luckily the Irish section of Dmoz is very small: approximately 10k websites.
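A rough Python sketch of the parse-into-SQL-then-generate route. SQLite stands in for whatever database you use, the prefix Top/Regional/Europe/Ireland is only an example, and the element names (ExternalPage with an about attribute, topic, d:Title, d:Description) are assumptions about the content.rdf layout, as is the made-up sample. It also assumes your copy of the dump parses as well-formed XML; if it does not, you are back to the regexp route. iterparse streams the file so the whole dump never has to sit in memory, and each listing is kept or dropped by its <topic> path rather than its position in the file. Only content.rdf is handled here; category titles and subcategory links from structure.rdf would need a second pass.

```python
import html
import io
import os
import sqlite3
import tempfile
import xml.etree.ElementTree as ET

# Namespaced tag prefixes (assumed to match the dump header).
D = "{http://purl.org/dc/elements/1.0/}"
ODP = "{http://dmoz.org/rdf/}"

# Hypothetical content.rdf excerpt; layout assumed, URLs made up.
SAMPLE = b"""<RDF xmlns:r="http://www.w3.org/TR/RDF/"
     xmlns:d="http://purl.org/dc/elements/1.0/"
     xmlns="http://dmoz.org/rdf/">
  <Topic r:id="Top/Arts"><catid>2</catid></Topic>
  <ExternalPage about="http://arts.example/">
    <d:Title>Arts Example</d:Title>
    <d:Description>Outside the slice.</d:Description>
    <topic>Top/Arts</topic>
  </ExternalPage>
  <Topic r:id="Top/Regional/Europe/Ireland"><catid>3</catid></Topic>
  <ExternalPage about="http://ireland.example/">
    <d:Title>Ireland Example</d:Title>
    <d:Description>Inside the slice.</d:Description>
    <topic>Top/Regional/Europe/Ireland</topic>
  </ExternalPage>
</RDF>"""


def in_slice(path, prefix):
    return path == prefix or path.startswith(prefix + "/")


def load_slice(source, prefix, db):
    """Stream content.rdf and store only listings whose <topic> is under prefix."""
    db.execute("CREATE TABLE IF NOT EXISTS page "
               "(topic TEXT, url TEXT, title TEXT, description TEXT)")
    context = ET.iterparse(source, events=("start", "end"))
    _event, root = next(context)  # the <RDF> root element
    for event, elem in context:
        if event != "end":
            continue
        if elem.tag == ODP + "ExternalPage":
            topic = elem.findtext(ODP + "topic", "")
            if in_slice(topic, prefix):
                db.execute("INSERT INTO page VALUES (?, ?, ?, ?)",
                           (topic, elem.get("about", ""),
                            elem.findtext(D + "Title", ""),
                            elem.findtext(D + "Description", "")))
            root.clear()  # drop finished elements so memory stays flat
        elif elem.tag == ODP + "Topic":
            root.clear()
    db.commit()


def write_pages(db, out_dir):
    """Write one static HTML file per topic and return the file names."""
    written = []
    topics = [r[0] for r in db.execute("SELECT DISTINCT topic FROM page ORDER BY topic")]
    for topic in topics:
        rows = db.execute("SELECT url, title, description FROM page "
                          "WHERE topic = ? ORDER BY title", (topic,)).fetchall()
        items = "\n".join('<li><a href="%s">%s</a> - %s</li>'
                          % (html.escape(u), html.escape(t), html.escape(d))
                          for u, t, d in rows)
        name = topic.replace("/", "_") + ".html"
        with open(os.path.join(out_dir, name), "w", encoding="utf-8") as f:
            f.write("<html><head><title>%s</title></head><body>\n<ul>\n%s\n</ul>\n"
                    "</body></html>\n" % (html.escape(topic), items))
        written.append(name)
    return written


db = sqlite3.connect(":memory:")
load_slice(io.BytesIO(SAMPLE), "Top/Regional/Europe/Ireland", db)
print(write_pages(db, tempfile.mkdtemp()))
```

For the real dump you would pass the path to your unpacked content.rdf instead of the BytesIO sample, and point write_pages at your site's directory. Keeping the database step in the middle means a refresh is just "reload the slice, regenerate the pages."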

If you only require a few pages, it may be simpler to download the relevant pages from Dmoz and slice and dice the HTML. An even simpler solution might be some kind of open-source PHP or Perl program that actively pulls the data from the Dmoz website and integrates it with your website.
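For the download-and-slice route, Python's standard html.parser module is enough to pull the listings out of a saved category page. This sketch collects absolute http(s) links with their anchor text and skips relative links; treating relative links as in-site navigation is an assumption about the pages you save, and the sample HTML is made up.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, anchor text) pairs for absolute http(s) links."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> we are currently inside, if kept
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if href.startswith(("http://", "https://")):
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(page_html):
    parser = LinkCollector()
    parser.feed(page_html)
    parser.close()
    return parser.links


# Hypothetical fragment of a saved category page.
SAMPLE = ('<ul><li><a href="http://ireland.example/">Ireland Example</a>'
          ' - A site.</li><li><a href="/Regional/">Up</a></li></ul>')
print(extract_links(SAMPLE))
```

The resulting (url, title) pairs can be dropped into whatever page template you already use; for a handful of pages this avoids touching the RDF files at all.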

Regards...jmcc