Importing ODP RDF Dump into MySQL

Need a script.

         

Tobsn

10:26 pm on Feb 15, 2002 (gmt 0)



Is there a way to import the RDF dump from ODP
into MySQL or a MySQL supported format?
Please PM to dmoz@tobi.li
Big thx!

greets,
Tobsn

heini

1:40 pm on Feb 21, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hello Tobsn, welcome to WebmasterWorld!

I haven't tried it, but you could check out the SourceForge section here:
[sourceforge.net...]

If you find a good one, please report back :)

Brett_Tabke

3:27 am on Mar 7, 2002 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Nope, I've not seen any that can do a MySQL dump at all. The ones that dump to flat HTML files work pretty well, but that takes some serious disk space.

nosanity

10:07 pm on Mar 7, 2002 (gmt 0)

10+ Year Member



A while ago I spent some time trying to import the ODP data into a MySQL database, but dropped the idea when my approach turned out to need a very large amount of memory.
Today it dawned on me, though. Since the ODP data is just XML, you could read only the information between certain tags, insert that into the database, and then read the next "set" of data; the memory requirement would be quite low. The insert and update process would be quite CPU intensive, however: a huge number of loops (several million) interpreting data and inserting it into the database, and then the same again whenever you download new ODP data to update the database. The next task in my head is to bring that number down.
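
The read-one-set, insert, move-on idea can be sketched in Python with a streaming parser. This is only an illustration under loud assumptions: the element and attribute names (`Topic`, `Title`, `id`) are simplified stand-ins and the real dump's names may differ, the sample data is made up, and SQLite from the standard library stands in for MySQL so the sketch runs anywhere. The function name `import_topics` is hypothetical.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET

# Made-up miniature of a structure file; NOT the real ODP layout.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<RDF>
  <Topic id="Top/Arts"><Title>Arts</Title></Topic>
  <Topic id="Top/Business"><Title>Business</Title></Topic>
  <Topic id="Top/Computers"><Title>Computers</Title></Topic>
</RDF>
"""

def import_topics(source, conn, batch_size=1000):
    """Stream <Topic> elements into a table without loading the whole file."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS topics (id TEXT PRIMARY KEY, title TEXT)"
    )
    context = ET.iterparse(source, events=("start", "end"))
    _event, root = next(context)       # first event is the start of the root
    batch, total = [], 0
    for event, elem in context:
        if event != "end" or elem.tag != "Topic":
            continue
        batch.append((elem.get("id"), elem.findtext("Title", default="")))
        root.clear()                   # drop finished elements to keep memory flat
        if len(batch) >= batch_size:
            conn.executemany("INSERT OR REPLACE INTO topics VALUES (?, ?)", batch)
            conn.commit()
            total += len(batch)
            batch = []
    if batch:                          # flush the final partial batch
        conn.executemany("INSERT OR REPLACE INTO topics VALUES (?, ?)", batch)
        conn.commit()
        total += len(batch)
    return total

if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    print(import_topics(io.BytesIO(SAMPLE), db, batch_size=2))
```

Batching the inserts is one way to cut the per-row overhead of "several million loops"; the batch size here is an arbitrary guess, not a measured value.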

If anyone has read much about the new AMD CPU technology, you will get a better understanding as to what I am talking about.
[amd.com...]

Suppose you had a main process running as the heart, controlling two other processes (much like the north bridge and south bridge on a motherboard). The NB process would handle memory and data interpretation, while the SB process would handle file input and database output. You could then split the work down the center by dividing the file in half, quite similar to SMP. (Yes, I am a geek, I know.) The application would be a huge project, but I believe this would cut down on CPU cycles, memory usage, and the time it takes to import the entire ODP database.
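
A much smaller cousin of that design can be sketched in Python as two stages joined by a bounded queue: one stage interprets records, the other writes them out in batches. This is an assumption-laden illustration, not the full north bridge/south bridge scheme: it uses threads rather than separate processes, it does not split the file in half, and `store_batch` is a hypothetical callback that would wrap the real database insert. In CPython, threads mostly help by overlapping parsing with database I/O rather than by using two CPUs.

```python
import queue
import threading

STOP = object()  # sentinel telling the writer that parsing is finished

def parser_stage(records, out_q):
    """Interpretation side: push each parsed record onto the queue."""
    for rec in records:
        out_q.put(rec)          # blocks when the queue is full, capping memory
    out_q.put(STOP)

def writer_stage(in_q, store_batch, batch_size=500):
    """Output side: collect records and hand them over in batches."""
    batch = []
    while True:
        rec = in_q.get()
        if rec is STOP:
            break
        batch.append(rec)
        if len(batch) >= batch_size:
            store_batch(batch)
            batch = []
    if batch:                   # flush the final partial batch
        store_batch(batch)

def run_pipeline(records, store_batch, batch_size=500, max_pending=5000):
    """Controller: start the writer, feed it, and wait for it to finish."""
    q = queue.Queue(maxsize=max_pending)
    w = threading.Thread(target=writer_stage, args=(q, store_batch, batch_size))
    w.start()
    parser_stage(records, q)
    w.join()

if __name__ == "__main__":
    stored = []
    fake_records = ((i, "category %d" % i) for i in range(1050))
    run_pipeline(fake_records, stored.append)
    print([len(b) for b in stored])
```

The sketch has no error handling: if `store_batch` raises, the writer thread dies and the parser can block on a full queue, so a real importer would need to deal with that.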

Also, considering that MySQL is one of the lowest-end database applications, you might want to consider using at LEAST PostgreSQL or Interbase. I would use Oracle.

Well my two cents, anyone care to try the development? *grin*

-noSanity

bird

10:52 pm on Mar 7, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Did you look at perllib/Dmoz/rdf2db.pl in the odptools? Seems to be exactly what is asked for here:

SYNOPSIS
rdf2db [-d] [-f] [-i] [-p] [-c] structure|profiles|content|content_about

DESCRIPTION
This program downloads any rdf from dmoz, parses and imports the contents into MySQL table of the same name as the rdfname given at the command line.

nosanity

11:03 pm on Mar 7, 2002 (gmt 0)

10+ Year Member



FYI: Anyone looking at ODPTools on the SourceForge site will notice the files section is empty. Try using CVS. Also, it should be noted that the project is quite confused about what stage it is at: Planning, Pre-Alpha, and Production all at the same time. Don't get as confused as me. :)

-noSanity

amoore

11:10 pm on Mar 7, 2002 (gmt 0)

10+ Year Member




Well my two cents, anyone care to try the development? *grin*

Sure. Anyone willing to pay for it? I've done one top level of the directory before, and I don't think doing the rest would be too bad. It does take a while, though.

goa103

3:37 pm on Apr 25, 2002 (gmt 0)



Hello,

I have a few questions about ODP Data and RDF Dump:
1) What kind of DB can support such a HUGE amount of data? (I think MySQL can't handle a 500 MB database.)
2) Is there any ODP parser or converter available? (A serious product, not a small PHP or Perl script.)

Thanks,
JM

bird

5:20 pm on Apr 25, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I think MySQL can't handle a 500 MB database

What exactly makes you think so?

How Big Can MySQL Tables Be? [mysql.org]

MySQL Version 3.22 has a 4G limit on table size. With the new MyISAM table type in MySQL Version 3.23, the maximum table size is pushed up to 8 million terabytes (2 ^ 63 bytes).

[...]

This means that the table size for MySQL databases is normally limited by the operating system.
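
The quoted figures are easy to check with a few lines of Python. This is just a worked bit of arithmetic, assuming binary units throughout (that "4G" means 4 GiB and a "terabyte" means 2^40 bytes); the variable names are made up for the illustration.

```python
MIB = 2 ** 20
GIB = 2 ** 30
TIB = 2 ** 40

limit_3_22 = 4 * GIB        # MySQL 3.22 table limit quoted above
limit_myisam = 2 ** 63      # MyISAM limit in MySQL 3.23 quoted above
odp_guess = 500 * MIB       # the 500 MB figure from the question

# 2**63 / 2**40 = 2**23, i.e. about 8 million terabytes
print(limit_myisam // TIB)              # 8388608

# Even the old 4 GiB limit leaves room for a 500 MB table several times over
print(limit_3_22 // odp_guess)          # 8
```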

jatar_k

5:24 pm on Apr 25, 2002 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Welcome to WebmasterWorld, goa103.

I can attest that MySQL can handle those sizes. I have a DB in excess of 1 GB with no problems.

goa103

7:36 pm on Apr 25, 2002 (gmt 0)



To jatar_k:

Thanks! Well, I guess the things I heard were only rumours. But I wonder if you have a REAL example (a company, a working site) of a BIG MySQL DB. Who uses it? I thought pros only used Oracle, Access...

Maybe I should ask about this somewhere else? There's no forum on MySQL.com, and I have so many questions about databases: backup, security, ...

nosanity

5:58 am on Apr 26, 2002 (gmt 0)

10+ Year Member



Access? Now that is funny. Oracle is designed for huge databases, but you cannot leave out MySQL. It does not have nearly as many features, but it has never crashed on me, and recovery from someone else's database has always been easy. Two other databases I would not count out are Interbase and PostgreSQL. Take a look at all three; the price is much better than Oracle's.

noSanity

goa103

2:18 pm on Apr 26, 2002 (gmt 0)



Hi,

Well, I used Access for a game project in a very big company and it worked perfectly, but there were only 20 of us on the project :). As for PostgreSQL, it's not free (I think) and it doesn't work under Windows :(. I think MySQL is a good deal because it's free, powerful and, moreover, easy to install: setup.exe :). I just want to know whether any pros use it to manage complex projects...

nosanity

9:37 pm on Apr 28, 2002 (gmt 0)

10+ Year Member



I have of course set up a customer database system and scheduled report manager using MySQL for an SEO company. It works wonderfully. Roughly 90k records throughout all the databases I have in there.

-noSanity

DerOle

3:31 pm on Apr 29, 2002 (gmt 0)



I've written an ODP data parser, which parses the RDF file and inserts it into a MySQL DB.
It currently parses only the structure; the content may come later.
You can get it here:
[ohardt.com...]

Bye,
Oliver

quickhollywood

12:51 pm on May 6, 2002 (gmt 0)



Well, after hours of trial and error, I finally managed to put together a makeshift parser. The parser part was easy, but I wanted my parser to handle the UTF-8 to Unicode conversion, actually create the directories, and break up the XML into saved ADO recordset XML so I can access the data that way.

In any case, it seems some sort of easy parser is needed... I know I've spent plenty of time looking for one. If those who are currently working on one don't have one finished, I'll try to wrap mine up into a user-friendly application.

[quickhollywood.com...]

TeddyBare69

5:33 am on Jun 2, 2002 (gmt 0)



I had just started working on this when I saw your message. I don't know if what I have already is what you are looking for. I have PHP code that will insert the content.rdf.u8 file into two tables in MySQL. It still has one major data issue: it doesn't really like dmoz's formatting. Netscape has that problem with all of their products, though. I don't know if I can just paste the code here on the board or not. Can somebody tell me whether it is OK to paste the code, or suggest a better way of making it available?

Currently it gets about 200,000 links with 14,000 categories before it finds an XML formatting issue it can't deal with and stops. The nice part is it does this in 9 minutes.

Once I find a fix for the code I will post a message again letting people know where it is and how to get it.

DerOle

11:51 am on Jun 2, 2002 (gmt 0)



Hi,

dmoz/ODP is having some problems with the RDF dump. It contains some strange bytes which render the XML data useless, because every parser will report "not well formed" errors.

I've written a small Java program which removes these bytes so you can parse it later.

You can get it here:

[ohardt.com...]
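
For anyone who wants to see the general shape of such a clean-up pass, here is a minimal Python sketch. It is not the Java program linked above, and it rests on an assumption the thread does not confirm: that the "strange bytes" are either invalid UTF-8 sequences or control characters that XML 1.0 does not allow. The names `clean_line` and `clean_file` are hypothetical.

```python
import re

# Characters XML 1.0 forbids in document text: C0 controls other than
# tab, LF and CR, plus U+FFFE and U+FFFF.
_INVALID_XML = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")

def clean_line(raw: bytes) -> str:
    """Drop undecodable bytes, then strip characters XML 1.0 rejects."""
    text = raw.decode("utf-8", errors="ignore")   # silently skips bad bytes
    return _INVALID_XML.sub("", text)

def clean_file(src_path: str, dst_path: str) -> None:
    """Stream the dump line by line so a ~900 MB file never sits in memory."""
    with open(src_path, "rb") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for raw in src:          # splitting bytes on LF is safe for UTF-8
            dst.write(clean_line(raw))

if __name__ == "__main__":
    dirty = b"<d:Title>Caf\xc3\xa9\x01 \xff</d:Title>\n"
    print(clean_line(dirty))
```

Dropping bytes silently loses data, so a more careful version might log the offsets it touches; that is left out to keep the sketch short.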

I can also help you with publishing your PHP code.
I wrote such a parser in PHP too some time ago, but PHP had major problems handling the 900 MB file. Some strange errors occurred, which is why I switched to Java.

Bye,

Oliver

TeddyBare69

2:03 pm on Jun 2, 2002 (gmt 0)



Funny story: I started with your program, mainly because I didn't want to write it myself. I have programmed multiple apps in Java, but not for a long time, and I didn't like it very much while I was doing it. When I tried to use your app I couldn't seem to get the right jar files in the right places to get it to work. That is when I did a quick search for PHP and XML parsing and found a couple of really good examples, and within about 2 hours I was parsing XML for the first time. The learning experience alone was worth the two hours. Looking at the structure file, though, I think I have a few more hours of learning to go.

I am curious about a performance comparison between PHP and Java. How long does your Java application take to insert the content and the structure file? I will take another look at your code and see if I can figure out how to put the data integrity fix into my PHP code, and then we can do a quick comparison. Java and PHP have about the same turnaround time for cranking out functioning code, but if one runs faster than the other it would be worth learning to like it. As you read in the first message, I have a server already set up with PostgreSQL, MySQL and PHP that has no production on it yet. I will definitely let you know if I get stuck in any major way.

Tobsn

1:33 am on Jun 24, 2002 (gmt 0)



www.s3x.biz
Done.
;o)
I parse it with php4.5-dev (a special edition only for me) under win32 with expat.
s3x.biz just has a "demo" DB with max(5) items per category. The parsing takes just 1 hour for the full content _and_ structure.

greets,
tobsn