greets,
Tobsn
I haven't tried it, but you could check out the SourceForge section here:
[sourceforge.net...]
If you find a good one, please report back :)
If you have read much about the new AMD CPU technology, you will have a better understanding of what I am talking about.
[amd.com...]
Suppose you had a main process running as the heart, controlling two other processes (quite similar to the north bridge and south bridge). The NB process would handle memory and data interpretation, while the SB process would handle file input and database output. You could then split the work down the center by dividing the file in half, which would be quite similar to SMP. (Yes, I am a geek, I know.) The application would be a huge project, but I believe this would cut down on CPU cycles, memory usage, and the amount of time it would take to import the entire ODP database.
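To make the "dividing the file in half" step a bit more concrete, here is a minimal Python sketch. The function name `split_offsets` is made up for illustration, and it assumes a worker can start reading at any line boundary, which nothing in this thread confirms for the RDF dump. It only computes the byte ranges; handing each range to a worker process is left out.

```python
import os

def split_offsets(path, parts=2):
    """Return (start, end) byte ranges that cover the file, each ending on a line boundary."""
    size = os.path.getsize(path)
    ranges = []
    start = 0
    with open(path, "rb") as f:
        for i in range(1, parts + 1):
            if i == parts:
                # The last range always runs to the end of the file.
                end = size
            else:
                # Jump to the nominal cut point, then advance to the end of that line.
                f.seek(max(start, size * i // parts))
                f.readline()
                end = min(f.tell(), size)
            if end > start:
                ranges.append((start, end))
            start = end
    return ranges
```

Each worker would then seek to its `start` and stop reading at its `end`, so no line is handled twice.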
Also, considering that MySQL is one of the lowest-end database applications, you might want to consider using at LEAST PostgreSQL or Interbase. I would use Oracle.
Well, that's my two cents. Anyone care to try the development? *grin*
-noSanity
SYNOPSIS
rdf2db [-d] [-f] [-i] [-p] [-c] structure|profiles|content|content_about
DESCRIPTION
This program downloads any RDF file from dmoz, parses it, and imports the contents into a MySQL table with the same name as the RDF name given on the command line.
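For anyone wondering what the parse-and-import step could look like, here is a rough Python sketch. It is not how rdf2db works internally. It uses SQLite from the standard library as a stand-in for MySQL, and the element and attribute names (`ExternalPage`, `about`, `Title`, `topic`) as well as the `content` table layout are assumptions about the dump, so check them against the real file first.

```python
import sqlite3
import xml.etree.ElementTree as ET

def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]

def import_pages(source, conn):
    """Stream-parse the dump and insert one row per ExternalPage element (assumed name)."""
    conn.execute("CREATE TABLE IF NOT EXISTS content (url TEXT, title TEXT, topic TEXT)")
    count = 0
    # iterparse reads the input incrementally instead of loading the whole file at once.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) != "ExternalPage":
            continue
        fields = {local(child.tag): (child.text or "") for child in elem}
        conn.execute(
            "INSERT INTO content VALUES (?, ?, ?)",
            (elem.get("about", ""), fields.get("Title", ""), fields.get("topic", "")),
        )
        elem.clear()  # drop this element's children and text once the row is stored
        count += 1
    conn.commit()
    return count
```

`source` can be a filename or an open binary file, and `conn` is any `sqlite3` connection, for example `sqlite3.connect("odp.db")`.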
I have a few questions about ODP Data and RDF Dump:
1) What kind of DB can support such a HUGE amount of data? (I think MySQL can't handle a 500 MB database.)
2) Is there any ODP parser or converter available (a serious product, not a small PHP or Perl script)?
Thanks,
JM
What exactly makes you think so?
How Big Can MySQL Tables Be? [mysql.org]
MySQL Version 3.22 has a 4G limit on table size. With the new MyISAM table type in MySQL Version 3.23, the maximum table size is pushed up to 8 million terabytes (2 ^ 63 bytes).[...]
This means that the table size for MySQL databases is normally limited by the operating system.
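To put those numbers next to the 500 MB figure from the question, here is a quick back-of-the-envelope check in Python. It assumes "4G" means 4 GiB (2^32 bytes) and that a terabyte is counted as 2^40 bytes; the variable names are just for illustration.

```python
TIB = 2 ** 40                 # bytes per terabyte, counted in binary units (assumption)
myisam_limit = 2 ** 63        # MyISAM limit quoted above, in bytes
old_limit = 2 ** 32           # the "4G" limit of version 3.22, assumed to be 4 GiB
dump_size = 500 * 2 ** 20     # the 500 MB database from the question, in bytes

limit_tib = myisam_limit // TIB   # 2**23 terabytes, i.e. roughly 8 million
share = dump_size / old_limit     # fraction of the old 4G limit a 500 MB table would use

print(limit_tib)  # 8388608
print(share)      # 0.1220703125
```

So a 500 MB table would use about 12% of even the old 4G limit.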
Thanks! Well, I guess the things I heard are only rumours. But I wonder if you have a REAL example, a company or working site with a BIG MySQL DB. Who uses it? I thought pros only used Oracle, Access...
Maybe I should talk about it somewhere else? There's no forum on MySQL.com. I have so many questions about DBs: backup, security, ...
noSanity
Well, I used Access for a game project in a very big company and it worked perfectly, but there were only 20 of us on the project :). About PostgreSQL... it's not free (I think) and it doesn't work under Windows :(. I think MySQL is a good deal because it's free, powerful and, moreover, easy to install: setup.exe :). I just want to know if some pros use it to manage complex projects...
Bye,
Oliver
In any case, it seems some sort of easy parser is needed... I know I've spent plenty of time looking for one. If those who are currently working on one don't have one finished, I'll try to wrap mine up into a user-friendly application.
[quickhollywood.com...]
Currently it gets through about 200,000 links in 14,000 categories before it hits an XML formatting issue it can't deal with and stops. The nice part is that it does this in 9 minutes.
Once I find a fix for the code I will post a message again letting people know where it is and how to get it.
dmoz/ODP is having some problems with the RDF dump. It contains some strange bytes which render the XML data useless, because every parser will report "not well formed" errors.
I've written a small Java program which removes these bytes so you can parse it later.
You can get it here:
[ohardt.com...]
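For anyone who wants to see the general idea without the download, here is a minimal Python sketch of that kind of cleanup. It is not the Java program linked above. It assumes the "strange bytes" are invalid UTF-8 sequences and control characters that XML 1.0 forbids, which I have not checked against the actual dump, and the names `clean_line` and `clean_file` are made up for illustration.

```python
import re

# Characters XML 1.0 does not allow: control characters other than
# tab, line feed and carriage return, plus U+FFFE and U+FFFF.
_ILLEGAL_XML = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")

def clean_line(raw):
    """Decode one line of bytes, dropping invalid UTF-8 and forbidden characters."""
    text = raw.decode("utf-8", errors="ignore")  # silently skips undecodable bytes
    return _ILLEGAL_XML.sub("", text)

def clean_file(src, dst):
    """Copy src to dst line by line so the whole dump never sits in memory."""
    with open(src, "rb") as fin, open(dst, "w", encoding="utf-8", newline="") as fout:
        for raw in fin:
            fout.write(clean_line(raw))
```

Note that dropping bytes silently can hide real data problems, so it may be worth counting how many lines were changed.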
I can also help you with publishing your PHP code.
I wrote such a parser in PHP too some time ago, but PHP had major problems handling the 900 MB file. Some strange errors occurred; that's why I switched to Java.
Bye,
Oliver
I am curious about a performance comparison between PHP and Java. How long does your Java application take to insert the content and the structure file? I will take another look at your code and see if I can figure out how to put the data integrity fix into my PHP code, and then we can do a quick comparison. Java coding and PHP coding have about the same turnaround time for cranking out functioning code, but if one runs faster than the other it would be worth learning to like it. As you read in the first message, I have a server already set up with PostgreSQL, MySQL and PHP that has no production on it yet. I will definitely let you know if I get stuck in any major way.
greets,
tobsn