XML Datafeeds to Database


dylanz - 5:40 am on Jan 15, 2009 (gmt 0)


I'm getting a huge product XML file (3 GB) from Commission Junction. My current issues are the following, and I would love any insight or suggestions on any of them:

1. The file doesn't validate against its DTD.

I'm using "xmllint" to verify, and the file is indeed broken. That's not the end of the world, as I'm running a couple of "tr" commands to strip out all kinds of funky characters and get the file into working order.
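
For illustration, here is a minimal Python sketch of that kind of cleanup step. It assumes the "funky characters" are the ASCII control bytes that XML 1.0 forbids (everything below 0x20 except tab, LF, and CR); the post does not say which characters are actually stripped, and the function names are hypothetical. The data is processed in chunks, so a 3 GB file never has to fit in memory.

```python
import io
import re

# Assumption: the offending bytes are C0 control characters, which XML 1.0
# does not allow (tab 0x09, LF 0x0A, and CR 0x0D are legal and are kept).
_INVALID_XML_BYTES = re.compile(rb"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def strip_invalid_xml_bytes(chunk: bytes) -> bytes:
    """Remove control bytes that are illegal in XML 1.0 from one chunk."""
    return _INVALID_XML_BYTES.sub(b"", chunk)


def clean_stream(src, dst, chunk_size=1 << 20):
    """Copy src to dst in fixed-size chunks, dropping illegal control bytes.

    Each removed character is a single byte, and in UTF-8 these byte values
    never occur inside a multi-byte sequence, so chunk boundaries are safe.
    """
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(strip_invalid_xml_bytes(chunk))


if __name__ == "__main__":
    # Tiny in-memory demo; real use would pass files opened in binary mode.
    dirty = io.BytesIO(b"<name>Widget\x00\x1f</name>\n")
    clean = io.BytesIO()
    clean_stream(dirty, clean)
    print(clean.getvalue())
```

For the real feed, the two streams would be files opened with modes "rb" and "wb". This only handles illegal control bytes; it does not fix other DTD validation errors.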

2. Getting the data into my database.

Currently, I'm using libxml's SAX parser to read the file as a stream, which keeps the memory footprint low. However, for each node, I have to do a read against my database to see if that product exists, then update that record or create a new one if it doesn't. This approach is unfortunately going to take hours (it has been running for 5+ hours already).
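
One possible way to avoid the per-node lookup is to let the database do the "update or create" itself and to send rows in batches. The sketch below is only an illustration under several assumptions: it uses Python's standard library (xml.etree.ElementTree.iterparse and sqlite3) rather than the libxml SAX setup described above, the element names product, sku, name, and price are hypothetical (the real Commission Junction schema is not shown here), and sku is assumed to be a unique key.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET


def iter_products(source):
    """Stream (sku, name, price) tuples from the feed without loading it all.

    Assumption: each record is a <product> element with <sku>, <name>, and
    <price> children. These names are placeholders, not the real schema.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _event, root = next(context)  # first event is the start of the root
    for event, elem in context:
        if event == "end" and elem.tag == "product":
            yield (elem.findtext("sku"), elem.findtext("name"),
                   elem.findtext("price"))
            root.clear()  # drop finished elements so memory stays flat


def load(conn, rows, batch_size=1000):
    """Upsert rows in batches, committing once per batch; returns row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS products "
        "(sku TEXT PRIMARY KEY, name TEXT, price TEXT)"
    )
    # INSERT OR REPLACE swaps out any existing row with the same primary
    # key, so no separate SELECT is needed to check whether it exists.
    sql = "INSERT OR REPLACE INTO products (sku, name, price) VALUES (?, ?, ?)"
    batch, total = [], 0
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            conn.executemany(sql, batch)
            conn.commit()
            total += len(batch)
            batch.clear()
    if batch:  # flush the final partial batch
        conn.executemany(sql, batch)
        conn.commit()
        total += len(batch)
    return total


if __name__ == "__main__":
    feed = io.BytesIO(
        b"<catalog>"
        b"<product><sku>A1</sku><name>Old</name><price>1.00</price></product>"
        b"<product><sku>A1</sku><name>New</name><price>2.00</price></product>"
        b"</catalog>"
    )
    conn = sqlite3.connect(":memory:")
    load(conn, iter_products(feed))
    print(conn.execute("SELECT * FROM products").fetchall())
```

Most databases have an equivalent single-statement upsert (for example MySQL's INSERT ... ON DUPLICATE KEY UPDATE), so the same batching idea should carry over; whether it is fast enough for a 3 GB feed would need to be measured.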

Any suggestions? Could I do the process differently, or speed it up in any way?

Any feedback will be appreciated! Thanks!


Thread source: http://www.webmasterworld.com/xml/3826763.htm