Forum Moderators: coopster


Searching large XML files with PHP

         

eelixduppy

7:04 am on Jul 8, 2008 (gmt 0)



This is just a hypothetical question here, but I'm curious how you would search an XML file (specifically talking about large XML files).

There are two basic ways that PHP can parse XML:
1) SimpleXML [php.net] (tree-based, DOM-style parser)
2) Expat [php.net] (event-based, SAX-style parser)

The latter isn't used as much from what I've seen, but I have used it from time to time for various things. It's just that SimpleXML is so....simple ;)

Anyway, when you are dealing with large files in PHP, SimpleXML has to load the entire contents of the file into memory before the script can parse it. This can slow things down, or actually kill the script if the max memory limit is reached. Expat, however, doesn't need the whole file in memory, and this is where I run into trouble deciding which would be better for searching large files. I do know, though, that SimpleXML supports XPath, which is used to perform queries within XML, so that is a plus for SimpleXML.
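
For illustration, here is a minimal sketch of the XPath route with SimpleXML. The sample document, the element names (catalog, book, price, title) and the helper findExpensiveTitles() are all hypothetical; a real script would more likely call simplexml_load_file() on the large file, which is exactly where the memory cost comes in.

```php
<?php
// Hypothetical sample data standing in for a (much larger) XML file.
$sampleXml = <<<XML
<catalog>
  <book id="1"><title>PHP Basics</title><price>10</price></book>
  <book id="2"><title>XML in Depth</title><price>25</price></book>
</catalog>
XML;

/**
 * Return the titles of all <book> elements priced above $minPrice.
 * The whole document is parsed into memory before the query runs.
 */
function findExpensiveTitles(string $xml, float $minPrice): array
{
    $doc = simplexml_load_string($xml);
    if ($doc === false) {
        return []; // not well-formed XML
    }

    // sprintf('%F') keeps the number locale-independent inside the query.
    $query = sprintf('/catalog/book[price > %F]/title', $minPrice);
    $nodes = $doc->xpath($query);

    // xpath() can return false/null on failure; treat that as "no matches".
    return array_map('strval', $nodes ?: []);
}

$titles = findExpensiveTitles($sampleXml, 20.0);
```

The query itself is a one-liner, which is the appeal; the trade-off is that the full tree has to exist in memory first.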

Anyone have thoughts on this? I realize there are a few PEAR classes that handle XML, too, but I haven't played with those at all.

jatar_k

4:25 pm on Jul 8, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Not to distract, but if the files are that large, then maybe XML is the issue.

[webmasterworld.com...]

coopster

5:26 pm on Jul 8, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Expat uses file system functions to read and parse the file. Have you measured the performance of repeated disk reads versus a one-time read with in-memory processing? I would be curious to see the difference.
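
As a rough sketch of what that looks like with PHP's XML Parser (Expat) functions: the file is read in chunks with fread() and each chunk is fed to xml_parse(), so only one chunk plus the current element's text is held at a time. The function name streamFindText(), the 8192-byte chunk size and the "collect text of one tag and test it" logic are assumptions for illustration, and nested elements with the same name are not handled.

```php
<?php
/**
 * Stream through an XML file and return the text of every <$tag> element
 * whose text contains $needle (case-insensitive).
 */
function streamFindText(string $path, string $tag, string $needle): array
{
    $matches = [];
    $inTag   = false;
    $buffer  = '';

    $parser = xml_parser_create();
    // Keep element names as written instead of upper-casing them.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    xml_set_element_handler(
        $parser,
        // Start tag: begin collecting text when the wanted element opens.
        function ($p, $name, $attrs) use (&$inTag, &$buffer, $tag) {
            if ($name === $tag) {
                $inTag  = true;
                $buffer = '';
            }
        },
        // End tag: test the collected text, then stop collecting.
        function ($p, $name) use (&$inTag, &$buffer, &$matches, $tag, $needle) {
            if ($name === $tag) {
                if (stripos($buffer, $needle) !== false) {
                    $matches[] = $buffer;
                }
                $inTag = false;
            }
        }
    );

    // Character data may arrive in several pieces, so append rather than assign.
    xml_set_character_data_handler(
        $parser,
        function ($p, $data) use (&$inTag, &$buffer) {
            if ($inTag) {
                $buffer .= $data;
            }
        }
    );

    $fh = fopen($path, 'rb');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }

    while (!feof($fh)) {
        $chunk = fread($fh, 8192); // chunk size is an arbitrary choice
        if ($chunk === false) {
            fclose($fh);
            throw new RuntimeException("Read error on $path");
        }
        // The third argument tells the parser whether this is the last chunk.
        if (!xml_parse($parser, $chunk, feof($fh))) {
            $error = xml_error_string(xml_get_error_code($parser));
            fclose($fh);
            throw new RuntimeException("XML error: $error");
        }
    }

    fclose($fh);
    return $matches;
}
```

Wrapping a call to this and to a SimpleXML equivalent in microtime(true) and memory_get_peak_usage() readings would be one way to get the comparison asked about here.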

You typically know the file size of your XML doc, so setting the time limit to zero when processing large XML files is a first step. As you mentioned, the memory limit is another value to set.
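
A minimal sketch of those two settings, placed at the top of the script that does the processing. The '512M' figure is a placeholder, not a recommendation; pick a value based on the file sizes you actually expect.

```php
<?php
// 0 removes the execution time limit for this script run.
set_time_limit(0);

// Raise the memory ceiling for this script only.
// '512M' is a hypothetical value chosen for illustration.
ini_set('memory_limit', '512M');
```

Both calls affect only the current request, so they do not change php.ini for the rest of the site.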

Other options are to use system commands like grep for quick scans.
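
A quick sketch of the grep idea, assuming a Unix-like host where grep is on the PATH and exec() is not disabled. The helper name grepCount() is made up. Note that grep -c counts matching lines, not occurrences, and it knows nothing about XML structure, so an XML file written on a single line will report at most 1.

```php
<?php
/**
 * Count the lines in $path that contain the literal string $needle,
 * using the system grep. Returns 0 when nothing matches.
 */
function grepCount(string $path, string $needle): int
{
    // -c: print a count of matching lines, -F: treat the pattern as a fixed string.
    // escapeshellarg() guards against shell injection from either argument.
    $cmd = 'grep -c -F -- ' . escapeshellarg($needle) . ' ' . escapeshellarg($path);

    $lines  = [];
    $status = 0;
    exec($cmd, $lines, $status);

    // grep exits with 0 on a match, 1 on no match, and >1 on a real error.
    if ($status > 1) {
        throw new RuntimeException("grep failed with status $status");
    }

    return (int) ($lines[0] ?? 0);
}
```

This suits a fast "is it in there at all?" check before deciding whether a full parse is worth it.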

eelixduppy

5:57 pm on Jul 8, 2008 (gmt 0)



Thanks guys. I'm going to do some testing within the next couple of days. I'll get back with anything I find.

dreamcatcher

6:11 pm on Jul 8, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Is reading the file into a string any quicker than using the XML functions?

dc

eelixduppy

6:15 pm on Jul 8, 2008 (gmt 0)



Reading it into a string would be like using SimpleXML, in that it loads the entire XML document into memory. The string might take up less memory, however, since SimpleXML builds an object tree that mirrors the structure of the XML.
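
For comparison, a plain string search might look like the sketch below. The helper name fileContains() is hypothetical. It still reads the whole file into memory, and because it ignores XML structure it will also match text inside tag names, attributes and comments.

```php
<?php
/**
 * Report whether the raw file contents contain $needle anywhere.
 * No XML parsing is done; this is a byte-level substring check.
 */
function fileContains(string $path, string $needle): bool
{
    $contents = file_get_contents($path); // loads the entire file
    if ($contents === false) {
        throw new RuntimeException("Cannot read $path");
    }

    return strpos($contents, $needle) !== false;
}
```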