DMOZ RDF dump problems

Old and bad RDF files

         

slovenac

4:33 pm on Feb 11, 2003 (gmt 0)

10+ Year Member



I am trying to use the DMOZ RDF dumps, but they seem to be badly broken and out of date.

The RDF is not well-formed XML, and no parser I tried will parse it. Some attributes are missing the closing ", probably because of unusual encodings for some languages.

Some resources have duplicate entries in the dump.

The dumps are as old as hell. tags.html says 'Rdfs not pushed'. Does that mean they are not dumping because of the problems?

Can anyone please direct me to more info on these topics, or comfort me with some gentle words about a speedy recovery?

Slovenac

tschild

4:58 pm on Feb 11, 2003 (gmt 0)

10+ Year Member



Do you refer to the newest RDF dump (without catid tags) at [rdf.dmoz.org...] or to the old files at [dmoz.org...]?

As to the closing quotes, are you sure your parser is UTF-8 aware?
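
A quick way to test the encoding theory without involving an XML parser at all is to decode the file line by line as UTF-8 and note where that fails. Below is a minimal Python sketch of that idea. The function name find_invalid_utf8 and its limit parameter are made up for illustration, and the reported column is a byte offset within the line, not a character count.

```python
def find_invalid_utf8(path, limit=10):
    """Return up to `limit` (line_number, byte_column, reason) tuples
    for lines that cannot be decoded as UTF-8."""
    problems = []
    # Read in binary mode so Python does not try to decode for us.
    # Splitting on b"\n" is safe: that byte never occurs inside a
    # multi-byte UTF-8 sequence.
    with open(path, "rb") as f:
        for line_no, raw in enumerate(f, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError as err:
                # err.start is the 0-based byte index of the bad sequence.
                problems.append((line_no, err.start + 1, err.reason))
                if len(problems) >= limit:
                    break
    return problems


if __name__ == "__main__":
    import sys

    for line_no, col, reason in find_invalid_utf8(sys.argv[1]):
        print(f"line {line_no}, byte {col}: {reason}")
```

If this returns an empty list for content.rdf.u8, the bytes are valid UTF-8 and the parser failure probably has a different cause than the encoding.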

heini

5:09 pm on Feb 11, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Welcome to WebmasterWorld, both of you!

The problems with the RDF dump have been one of the hottest ODP-related topics in this forum over the last few weeks. You'll find lots of info here, this thread being one of many:
[webmasterworld.com...]

g1smd

8:48 pm on Feb 11, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The format of the ODP RDF dump is NOT the same as the format described by the W3C RDF standard. The ODP version uses a somewhat earlier (and now non-standard) format.

slovenac

9:02 am on Feb 12, 2003 (gmt 0)

10+ Year Member



tschild wrote: "Do you refer to the newest RDF dump (without catid tags) at [rdf.dmoz.org...] or to the old files at [dmoz.org...]?
As to the closing quotes, are you sure your parser is UTF-8 aware?"

I tried Xerces-C++ 2.2.0 and Expat. Both should be OK with UTF-8, but both break. I am using the old files from dmoz/rdf.

This is what the Xerces SAXCount utility prints:

Fatal Error at file E:\DMOZ RDF\content.rdf.u8, line 4691225, char 31

Message: Invalid character (Unicode: 0x3)
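
The character reported here, 0x3, is a control character. XML 1.0 does not allow control characters below 0x20 other than tab, line feed and carriage return, so a conforming parser has to reject it even if the file is otherwise valid UTF-8. A rough Python sketch for locating such bytes, and optionally writing a cleaned copy, might look like this. The function names are hypothetical, the columns are byte offsets (so they may not match the character column Xerces prints), and the check covers only this control range, not the other code points XML 1.0 excludes.

```python
import re

# Bytes below 0x20 that XML 1.0 forbids: everything except
# tab (0x09), line feed (0x0A) and carriage return (0x0D).
# In UTF-8 these byte values only ever encode the control characters
# themselves, so scanning raw bytes is safe.
ILLEGAL_XML_BYTES = re.compile(rb"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def find_illegal_control_chars(path, limit=10):
    """Return up to `limit` (line_number, byte_column, byte_value) tuples."""
    hits = []
    with open(path, "rb") as f:
        for line_no, raw in enumerate(f, start=1):
            for m in ILLEGAL_XML_BYTES.finditer(raw):
                hits.append((line_no, m.start() + 1, raw[m.start()]))
                if len(hits) >= limit:
                    return hits
    return hits


def strip_illegal_control_chars(src_path, dst_path):
    """Copy src_path to dst_path without the forbidden control bytes.

    Returns the number of bytes removed."""
    removed = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for raw in src:
            cleaned, count = ILLEGAL_XML_BYTES.subn(b"", raw)
            removed += count
            dst.write(cleaned)
    return removed
```

Running find_illegal_control_chars on content.rdf.u8 first shows how widespread the problem is before deciding whether stripping the bytes is acceptable. Stripping silently changes the data, so it is a workaround rather than a fix for the dump itself.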