xml parser functions- need to understand

help understanding this function

         

Tanushiheadbash

5:29 pm on Dec 22, 2006 (gmt 0)

10+ Year Member



Hi
I have been using PHP for a while, initially using others' scripts, sometimes modifying them, and lately writing a lot more of my own stuff. On the whole I am gradually building up a fair amount of experience and am starting to branch out from the absolute basics to more of the functions out there. In particular I want to start using the XML functions more. I have some in use already, but most are pretty much as used in books and adapted, or are cobbled-together things. I do want to really understand how this works. This bit of code comes from the PHP manual. I primarily use PHP4 still for server reasons, though most if not all of the code I have written is PHP5 compatible. I have gone through this a dozen times and there are some bits of it I am really not grasping, even though I have trawled the manual and other books. Could someone explain? Call me a noob!

The three variables passed into the startElement function ($parser, $name and $attrs): where exactly are these coming from, what do they contain, and why? They just seem to magically appear!

<?php
$file = "data.xml";
$depth = array();

function startElement($parser, $name, $attrs)
{
    global $depth;
    for ($i = 0; $i < $depth[$parser]; $i++) {
        echo " ";
    }
    echo "$name\n";
    $depth[$parser]++;
}

function endElement($parser, $name)
{
    global $depth;
    $depth[$parser]--;
}

$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElement", "endElement");
if (!($fp = fopen($file, "r"))) {
    die("could not open XML input");
}

while ($data = fread($fp, 4096)) {
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d",
            xml_error_string(xml_get_error_code($xml_parser)),
            xml_get_current_line_number($xml_parser)));
    }
}
xml_parser_free($xml_parser);
?>
Also, when I use this bit of script exactly as is, though with a simple XML file of my own as input, why does it return these notices:
Notice: Undefined offset: 2 in C:\www\webroot\test.php on line 8
Notice: Undefined offset: 2 in C:\www\webroot\test.php on line 12

I have tried it with various other XML files with the same result. It seems to do what it's meant to, i.e. show the tag structure of the XML feed, but as I say always with these notices. Any pointers would be most welcome. As I say, I could carry on blindly, making things work and just accepting that they do, but I won't be learning by doing so.

Many thanks in advance, Kenny

Psychopsia

6:47 pm on Dec 22, 2006 (gmt 0)

10+ Year Member



startElement function $parser, $name and $attrs where exactly are these coming from and what do they contain and why?

Hi!

They are function arguments [php.net] passed to the function when you call it.

alfaguru

6:51 pm on Dec 22, 2006 (gmt 0)

10+ Year Member



Kenny, the "parser" value passed in is a handle which represents the XML parser. It's not a good idea to use it as an offset into an array as in a future version of the library functions it might not be a scalar value. Its type is not guaranteed.

The notice you are getting is because the depth value is not initialised. A simple test will get rid of the problem:


if(!isset($depth[$parser])) $depth[$parser] = 0;

But since you only have one parser in use at a time, it would make more sense to initialise $depth = 0 instead of array() and replace all occurrences of $depth[$parser] with $depth.
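To illustrate that second suggestion, here is a minimal sketch of the manual's example with a plain integer counter. It assumes a single parser, reads from a made-up XML string rather than data.xml, and collects the output in a hypothetical $outline variable so it is easy to inspect. Avoiding $depth[$parser] also sidesteps the array-key issue entirely: the "offset: 2" in the notices is most likely the parser resource being cast to its integer ID, and in PHP 8 the parser is an object that cannot be used as an array key at all.

```php
<?php
// Sketch only: the manual's example with one integer depth counter.
// The XML string and $outline are assumptions made for illustration.
$depth = 0;     // one parser in use, so one counter is enough
$outline = "";  // collected output, echoed at the end

function startElement($parser, $name, $attrs)
{
    global $depth, $outline;
    // Indent two spaces per nesting level, then print the tag name.
    $outline .= str_repeat("  ", $depth) . "$name\n";
    $depth++;
}

function endElement($parser, $name)
{
    global $depth;
    $depth--;
}

$xml = "<catalog><book><title>Example</title></book></catalog>";

$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElement", "endElement");
if (!xml_parse($xml_parser, $xml, true)) {
    die(sprintf("XML error: %s at line %d",
        xml_error_string(xml_get_error_code($xml_parser)),
        xml_get_current_line_number($xml_parser)));
}
// The parser is released automatically when the script ends.

echo $outline;
```

With case folding left at its default, the tag names come out in upper case, and no notices are raised because $depth is always defined.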

Returning to your original question, the way it works is that the parser steps through the XML document, looking at each node in turn. Whenever it reaches an element's opening tag it calls the start handler you've registered, passing it the parser handle plus the tag name and attributes it has extracted; at each closing tag it calls the end handler.

So you create a parser and tell it the names of your functions, then pass it a document to parse. The parser calls back the functions you've specified with data from the document.
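The callback idea can be shown without any XML at all. The sketch below is a toy stand-in, not how the XML extension is actually implemented: register_handler(), run_toy_parser() and printTag() are hypothetical names invented for this illustration. The point is only that registering a handler stores its name, and it is the library code that later calls it and supplies the arguments.

```php
<?php
// Toy stand-in for the XML parser, for illustration only.
// register_handler(), run_toy_parser() and printTag() are made-up
// names; none of them belong to PHP's XML extension.
$handler_name = null;

function register_handler($name)
{
    global $handler_name;
    $handler_name = $name;  // just remember the name; nothing is called yet
}

function run_toy_parser($tags)
{
    global $handler_name;
    $calls = 0;
    foreach ($tags as $tag => $attrs) {
        // The "library" supplies the three arguments here, not your code.
        call_user_func($handler_name, "toy-parser", $tag, $attrs);
        $calls++;
    }
    return $calls;
}

function printTag($parser, $name, $attrs)
{
    echo "$name has " . count($attrs) . " attribute(s)\n";
}

register_handler("printTag");
$calls = run_toy_parser(array(
    "book"  => array("id" => "1"),
    "title" => array(),
));
```

In the real script, xml_set_element_handler() plays the role of register_handler() and xml_parse() plays the role of run_toy_parser().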

The advantage of this approach is it's fairly simple. The disadvantage is it's hard to process complex documents this way. Use the DOM parser if you need more flexibility.
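For comparison, here is a rough sketch of the same tag outline done with the DOM approach. It assumes PHP 5's DOM extension (the DOMDocument class) is available; PHP 4 shipped a different, incompatible DOM XML extension, so this will not run there as written. The outlineNode() function and the XML string are made up for illustration.

```php
<?php
// Sketch assuming PHP 5's DOM extension; outlineNode() is a
// hypothetical helper and the XML string is made up.
function outlineNode($node, $depth = 0)
{
    $out = "";
    // Only element nodes are printed; text nodes are skipped.
    if ($node->nodeType == XML_ELEMENT_NODE) {
        $out .= str_repeat("  ", $depth) . $node->nodeName . "\n";
        foreach ($node->childNodes as $child) {
            $out .= outlineNode($child, $depth + 1);
        }
    }
    return $out;
}

$doc = new DOMDocument();
$doc->loadXML("<catalog><book><title>Example</title></book></catalog>");
$outline = outlineNode($doc->documentElement);

echo $outline;
```

Here the whole document is loaded into memory first and your own code walks the tree, so the depth is simply a function argument and no global state or callbacks are needed. The DOM keeps tag names as written, so they are not upper-cased.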

Tanushiheadbash

8:26 pm on Dec 22, 2006 (gmt 0)

10+ Year Member



Hi
Thanks for the quick response, though I am not entirely sure you have answered my question, or maybe it's me being thick! I understand the principle of how a user-defined function works, in that if I write a function that takes the arguments $a, $b and $c I could say:
function myFunction($a, $b, $c) {
    // do something with them here
}
and then to use it I could for example do
myFunction(1, 2, 3); which would substitute 1, 2, 3 for the arguments $a, $b, $c.
So this far I am perfectly happy. In this XML parser example, which as I say I took directly from the PHP manual, the function is defined as
function startElement($parser, $name, $attrs)
{
    global $depth;
    for ($i = 0; $i < $depth[$parser]; $i++) {
        echo " ";
    }
    echo "$name\n";
    $depth[$parser]++;
}
the arguments are $parser, $name and $attrs.

But when the function is called it's done like this:
xml_set_element_handler($xml_parser, "startElement", "endElement");

Where do $parser, $name and $attrs come from? I don't see anything being passed into startElement.
I presume they are somehow inferred by xml_set_element_handler, but how do I know which is which from the XML file?
Am I making sense? Probably not!
... ah, wait a minute, I think I have just stumbled on the answer looking back at the PHP manual!
Quote:-
"The function named by start_element_handler must accept three parameters: start_element_handler ( resource parser, string name, array attribs)


parser
The first parameter, parser, is a reference to the XML parser calling the handler.

name
The second parameter, name, contains the name of the element for which this handler is called. If case-folding is in effect for this parser, the element name will be in uppercase letters.

attribs
The third parameter, attribs, contains an associative array with the element's attributes (if any). The keys of this array are the attribute names, the values are the attribute values. Attribute names are case-folded on the same criteria as element names. Attribute values are not case-folded.

The original order of the attributes can be retrieved by walking through attribs the normal way, using each(). The first key in the array was the first attribute, and so on. "
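As a small illustration of the attribs parameter described in that quote, the sketch below records each attribute the start handler receives. The XML string and the $seen array are made up for the example, and it walks the array with foreach rather than each(), since each() was deprecated in PHP 7.2 and removed in PHP 8.0.

```php
<?php
// Sketch: inspect the $attrs array the parser hands to the start handler.
// The XML string and $seen are assumptions made for illustration.
$seen = array();

function startElement($parser, $name, $attrs)
{
    global $seen;
    // foreach visits the attributes in their original document order.
    foreach ($attrs as $attr_name => $attr_value) {
        $seen[] = "$name: $attr_name=$attr_value";
    }
}

function endElement($parser, $name)
{
    // Nothing to do at closing tags in this example.
}

$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_parse($xml_parser, '<book id="b1" lang="en"/>', true);

print_r($seen);
```

With case folding on (the default), the element and attribute names arrive in upper case while the values are left alone, so $seen holds "BOOK: ID=b1" and "BOOK: LANG=en". Calling xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, 0) before parsing should keep the names as written in the document.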

I get it now. Duh, I feel stupid for asking! I just hadn't followed this through properly; I see now where they come from.

So sorry to have been so dumb!