Home / Forums Index / Code, Content, and Presentation / PHP Server Side Scripting
Forum Library, Charter, Moderators: coopster & jatar k

PHP Server Side Scripting Forum

    
Parsing XML
outputs when I don't want it to
lorax




msg:4622630
 9:57 pm on Nov 11, 2013 (gmt 0)

I'm working on a PHP function that parses an XML feed into HTML - a combination of something I got from W3 Schools and my own modifications. The script parses the XML feed the way it should but it immediately outputs to screen wherever the code block exists in the page structure.

I've been trying to modify this - to put the output into a variable that I can echo later and in the proper place. The reason I want to do this is because I need conditional logic in the script to evaluate which XML feed is needed. I can't find what initiates the output. Looking at the XML functions on PHP.net isn't very helpful - I can't tell which function is responsible.

Here's the core script with a sample conditional. If you can help, I'd appreciate it.


function parseme($file) {
$xml_parser = xml_parser_create();
// use case-folding so we are sure to find the tag in $map_array
xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, true);
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");
if (!($fp = fopen($file, "r"))) {
die("could not open XML input");
}

while ($data = fread($fp, 4096)) {
if (!xml_parse($xml_parser, $data, feof($fp))) {
die(sprintf("XML error: %s at line %d",
xml_error_string(xml_get_error_code($xml_parser)),
xml_get_current_line_number($xml_parser)));
}
}
xml_parser_free($xml_parser);
}
if(is_page('89')) {
$coursetitle = 'PageTitle';
$courses = parseme("http://webaddresstothefeed");
}


 

swa66




msg:4622634
 10:15 pm on Nov 11, 2013 (gmt 0)

What are your startElement, endElement and characterData functions ?

lorax




msg:4622769
 12:23 pm on Nov 12, 2013 (gmt 0)

Thanks for the reply. I found a work around but I'd still like to know what actually outputs the data.


function startElement($parser, $name, $attrs)
{
global $map_array;
if (isset($map_array[$name])) {
echo "<$map_array[$name]>";
}
}

function endElement($parser, $name)
{
global $map_array;
if (isset($map_array[$name])) {
echo "</$map_array[$name]>";
}
}

function characterData($parser, $data)
{
echo $data;
}

lorax




msg:4622796
 3:00 pm on Nov 12, 2013 (gmt 0)

** Modified approach **
I moved the functions off to an include file that gets called by each site as needed.

This function now looks like:


function parseme($subject) {
$file = "XMLFEEDHERE".$subject;

function startElement($parser, $name, $attrs)
{
global $map_array;
if (isset($map_array[$name])) {
echo "<$map_array[$name]>";
}
}

function endElement($parser, $name)
{
global $map_array;
if (isset($map_array[$name])) {
echo "</$map_array[$name]>";
}
}

function characterData($parser, $data)
{
echo $data;
}

$xml_parser = xml_parser_create();
// use case-folding so we are sure to find the tag in $map_array
xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, true);
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");
if (!($fp = fopen($file, "r"))) {
die("Could not open XML input. Please <a href=\"mailto:webmaster@\">contact us</a> to let us know there was a problem.");
}

while ($data = fread($fp, 4096)) {
if (!xml_parse($xml_parser, $data, feof($fp))) {
die(sprintf("XML error: %s at line %d || Please <a href=\"mailto:webmaster@\">contact us</a> to let us know there was a problem.",
xml_error_string(xml_get_error_code($xml_parser)),
xml_get_current_line_number($xml_parser)));
}
}
xml_parser_free($xml_parser);
}


Then I call the function from within my template like so:


<?php parseme('subject'); ?>


The function works great when I call the parseme() function once. If I call it again (to get two subjects instead of just one for example) the script hangs - nothing is output (not even HTML) after the first time I call parseme(). Which leads me to wonder a) what's hanging up the function and b) how do I find it?
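
One possible cause of the second-call failure, which the thread never confirms: a named function declared inside another function is defined when the outer function runs, so calling the outer function a second time tries to declare it again, and PHP stops with a "Cannot redeclare" fatal error. With error display turned off, that would look like the page simply ending. A minimal sketch, assuming that is what is happening; parseTwice() and wrapSubject() are hypothetical names used only for illustration:

```php
<?php
// Hypothetical outer function that declares a helper inside its own body,
// as parseme() does with startElement(), endElement() and characterData().
function parseTwice($subject)
{
    // Guard: declare the helper only on the first call. Without this check,
    // the second call would try to declare wrapSubject() again and PHP
    // would stop with a "Cannot redeclare" fatal error.
    if (!function_exists('wrapSubject')) {
        function wrapSubject($s)
        {
            return "<p>$s</p>";
        }
    }

    return wrapSubject($subject);
}

$first  = parseTwice('CJ');
$second = parseTwice('SO');

echo $first, "\n", $second, "\n";
```

Moving the three handlers out of parseme() to the top level of the include file would avoid the problem without needing a guard.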

swa66




msg:4622825
 5:14 pm on Nov 12, 2013 (gmt 0)

I found a work around but I'd still like to know what actually outputs the data.

See the echo in each of the callback functions (startElement, endElement and characterData): those do the output ...
If you modify them to append what they would output to a global variable instead, you'd be set.

You link those callback functions to your parser using this code:

xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");


Does that answer your question ?

lorax




msg:4622829
 6:13 pm on Nov 12, 2013 (gmt 0)

Yes indeed it does. I saw the echo but for some reason was thinking it wouldn't output to screen because it was inside the function.

Any thoughts on the second issue?

phranque




msg:4622852
 7:40 pm on Nov 12, 2013 (gmt 0)

"echo" by default goes to STDOUT, which for a server process is the response that gets sent to the browser.
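
Because echo writes to the response, one way to get that text into a variable without rewriting the handlers is PHP's output buffering. A minimal sketch, assuming the handlers keep using echo; noisyHandler() and captureOutput() are hypothetical names standing in for the thread's callbacks and are not part of the original script:

```php
<?php
// Hypothetical stand-in for a handler that echoes, like characterData().
function noisyHandler($data)
{
    echo $data;
}

// Run a callback with output buffering switched on and return
// whatever it printed instead of sending it to the browser.
function captureOutput(callable $fn)
{
    ob_start();             // start collecting output in a buffer
    $fn();                  // anything echoed here goes into the buffer
    return ob_get_clean();  // return the buffer contents and discard the buffer
}

$html = captureOutput(function () {
    noisyHandler('<p>');
    noisyHandler('Hello');
    noisyHandler('</p>');
});

// $html can now be echoed later, wherever it belongs in the page.
echo $html, "\n";
```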

phranque




msg:4622853
 7:46 pm on Nov 12, 2013 (gmt 0)

Any thoughts on the second issue?

try using flush:
http://php.net/manual/en/function.flush.php [php.net]

lorax




msg:4622872
 9:43 pm on Nov 12, 2013 (gmt 0)

Meh... no love there either but thanks.

swa66




msg:4622951
 11:47 am on Nov 13, 2013 (gmt 0)

Your 2nd issue ... not sure without some testing myself.

some observations:
- you don't seem to call fclose() on the file handle
This is not a problem when the script exits (all file handles get closed automatically), but it can be if the script keeps running with a handle that has already been read to the end of the file ...
- feof() can hang in certain conditions according to the manual: [us3.php.net...] you might have one of those due to the above
- AFAIK the parser only accumulates the input till it is told the xml is complete. This is done by setting the 3rd parameter to true on the xml_parse() call. So you'll not get gradual output as the input is read, but all in one go instead.

Other than that: try logging where the script stalls out on you on the second call.

lorax




msg:4622991
 4:47 pm on Nov 13, 2013 (gmt 0)

Thanks swa66. I tried the suggestions but again nothing changed. I'm working on building an error handler.

lorax




msg:4623003
 5:32 pm on Nov 13, 2013 (gmt 0)

"echo" by default goes to STDOUT, which for a server process is the response that gets sent to the browser.


So if I change those "echo"s to variables, how would I get the output to show? I assume I'd need to return something from parseme() but what?

xml_parse simply returns 1 or 0. I don't see a way to get the result into a variable that I can use.

lorax




msg:4623004
 5:43 pm on Nov 13, 2013 (gmt 0)

Just for grins I changed the "echo"s to vars and tried the function. To my surprise it didn't change a thing.


function startElement($parser, $name, $attrs)
{
global $map_array;
if (isset($map_array[$name])) {
$startme = "<$map_array[$name]>";
}
}

function endElement($parser, $name)
{
global $map_array;
if (isset($map_array[$name])) {
$endme = "</$map_array[$name]>";
}
}

swa66




msg:4623136
 2:07 am on Nov 14, 2013 (gmt 0)

Does your code create a $map_array?
-> otherwise those echo's never did anything (the one in the function characterData will).

Also a variable in a function has a limited scope ... if you want to use it outside the function, you'll have to make it global ...

swa66




msg:4623137
 2:13 am on Nov 14, 2013 (gmt 0)

So if I change those "echo"s to variables, how would I get the output to show?


Try this:


$output='';
//[...]
function characterData($parser, $data) {
global $output;
$output.=$data;
}
//[...]
//[call xml_parse as you need]
//[...]
//[where you need the output]
echo $output;
//[...]
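
The outline above can be filled in as a complete function. This is a minimal sketch under stated assumptions, not the poster's code: it swaps the global variable for a closure that captures $output by reference, it parses a string rather than a file handle, and collectCharacterData() and the sample XML are made up for illustration.

```php
<?php
// Hypothetical helper: parse an XML string and return its character data.
function collectCharacterData($xml)
{
    $output = '';
    $parser = xml_parser_create();

    // The closure appends to $output by reference, so no global is needed.
    xml_set_character_data_handler($parser, function ($parser, $data) use (&$output) {
        $output .= $data;
    });

    // Third argument true: this is the final (and only) piece of input.
    $ok = xml_parse($parser, $xml, true);

    // The parser is released when $parser goes out of scope at return.
    return $ok ? $output : null;
}

// Assumed sample input, shaped like a single-course feed.
$sample = '<courseinfo><course code="CJ 101"><![CDATA[<p>Intro</p>]]></course></courseinfo>';
$text = collectCharacterData($sample);

echo $text, "\n";
```

Returning the string from the function lets the caller decide where to echo it.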

lorax




msg:4623239
 2:32 pm on Nov 14, 2013 (gmt 0)

Thanks swa66. I'll see what I can do with that info today.

lorax




msg:4623249
 2:54 pm on Nov 14, 2013 (gmt 0)

Update and clarification. This is a WordPress install. I use a global functions file (in a directory outside of all our WordPress installs) and then use custom functions inside the functions file for each install.

The global functions file contains this code block:

// CourseLeaf XML Parser
function parseme($subject) {
$file = "XMLFEEDHERE".$subject;

$output='';

function startElement($parser, $name, $attrs)
{
global $map_array;
if (isset($map_array[$name])) {
$startme = "<$map_array[$name]>";
}
}

function endElement($parser, $name)
{
global $map_array;
if (isset($map_array[$name])) {
$endme = "</$map_array[$name]>";
}
}

function characterData($parser, $data)
{
global $output;
$output.=$data;
}

$xml_parser = xml_parser_create();
// use case-folding so we are sure to find the tag in $map_array
xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, true);
xml_set_element_handler($xml_parser, $startme, "endElement");
xml_set_character_data_handler($xml_parser, "characterData");
if (!($fp = fopen($file, "r"))) {
die("Could not open XML input.");
}

while ($data = fread($fp, 4096)) {
if (!xml_parse($xml_parser, $data, feof($fp))) {
die(sprintf("XML error: %s at line %d",
xml_error_string(xml_get_error_code($xml_parser)),
xml_get_current_line_number($xml_parser)));
}
}
xml_parser_free($xml_parser);
}


The functions file for the WP install has this code block - for the output.


<div class="format_text">
<?php echo parseme('CJ');?>
<?php echo parseme('SO');?>
</div>



Prior to making the change suggested by swa66 I was able to get output from the first parseme() call but not the second. Now I get nothing after the page's opening div.

swa66




msg:4623283
 5:13 pm on Nov 14, 2013 (gmt 0)

I'd close the filehandle right after (or before) the xml_parser_free() call.
fclose($fp);

Next you'll need to print the $output somewhere appropriate...

At first, try printing it right after the first call to parseme(), because the second call is likely still hanging as it was before.

Wordpress ... complicates things as you know.

Do you have a $map_array variable defined ? [I guess not] -> output it to try to see what's in the array (print_r() will help)

xml_set_element_handler($xml_parser, $startme, "endElement");

Are you sure that should not read
xml_set_element_handler($xml_parser, "startElement", "endElement");

?

lorax




msg:4623616
 8:58 pm on Nov 15, 2013 (gmt 0)

Getting closer ~ sort of.

Digging into XML functions I've learned that the start and end elements don't apply to my situation. The data I have isn't complex at all. Here's a sample:


<courseinfo>
<course code="CJ 101">
<![CDATA[
<div class="courseblock"> <p class="courseblocktitle"><strong>CJ&#160;101. Introduction to Criminal Justice. 3 Credits.</strong></p> <p class="courseblockdesc"> A general survey of the principles, system, and process of criminal justice. Introduction to conceptions and definitions of crime, criminal law, and due process. Examination of the organization and operation of the three basic components of the criminal justice system -- the police, the courts, and corrections -- individually and in relationship to one another. Offered in fall semesters.<br /> </p> </div>
]]>
</course>
<course code="CJ 102">
<![CDATA[
<div class="courseblock"> <p class="courseblocktitle"><strong>CJ&#160;102. Substantive Criminal Law. 3 Credits.</strong></p> <p class="courseblockdesc"> This course presents the development of criminal law in the United States and discusses its principles, sources, distinctions, and limitations. The following topics are covered in detail: criminal liability; offenses against persons, property, public peace and public justice; preparatory activity crimes; and defenses available to those charged with criminal activity. Offered spring semester.<br /> </p> </div>
]]>
</course>
...
</courseinfo>

I was able to manipulate the output (like getting rid of the line breaks), but running the function twice still doesn't work right. It truncates the page output after the first time the function is called - no error, no closing HTML - it just quits.

And that's with the 'die' function commented out.

So, with the above XML feed, the startElement and endElement functions are irrelevant - I've stripped out the code and I still get the same output (when I run parseme once).

The characterData is the only function that seems to do anything. Here's the current incarnation:

function characterData($parser, $data)
{
global $output;
$output .= str_replace("<br />\n","",str_replace("&#160;"," ",$data));
}


I've unset the variables ($data, $subject) and closed the file ($fp). Nothing.

Then I began testing outputs like so:


// Testing block
if(isset($GLOBALS['output'])) $answer = "it's global"; elseif (isset($output)) $answer = "it's local";
return $answer." -> ".gettype($output);


The odd thing is - output is NULL. There's nothing in it. So either I'm scoping it improperly or something is killing it.

lorax




msg:4623618
 9:03 pm on Nov 15, 2013 (gmt 0)

To be clear, there are two issues that I see. The first is just getting control over the concatenation of the string results of the parser ($data and/or $output) into something I can return for use.

And then, why the script breaks when it's called back to back like so

parseme('cj');
parseme('so');

lorax




msg:4623638
 9:51 pm on Nov 15, 2013 (gmt 0)

Finally!

It was the XML parser itself that was causing the issues. Probably because my XML feed is so simple. I opted to read in the stream and then just parse it without the xml functions giving me this:

function parseme($subject) {
$data = '';

// Open the feed; return an empty string if it cannot be opened.
if (!($fp = fopen("http://catalog.example.com/ribbit/?page=getcourse.rjs&subject=".$subject, "r"))) {
//die("Could not open XML input. Please <a href=\"mailto:webmaster@example.com\">contact us</a> to let us know there was a problem.");
return '';
}

// Read the feed in 4 KB chunks, stripping the CDATA markers from each chunk.
while ($chunk = fread($fp, 4096)) {
$data .= str_replace(']]>','',str_replace('<![CDATA[','',$chunk));
}

// Close the handle before returning; code placed after a return never runs.
fclose($fp);
return $data;
}

[edited by: phranque at 10:35 am (utc) on Nov 17, 2013]
[edit reason] exemplified domain [/edit]

swa66




msg:4623681
 3:13 am on Nov 16, 2013 (gmt 0)

That's dangerous ...
What if it's not all in the same chunk ?
Better read it all into the data variable and then do the str_replace ...

If that's all you need, maybe [php.net...] would be of more use to you. It's pretty easy to use and actually parses the xml (storing it in an array, so easy to access it).

This is probably the right spot to start with simplexml: [php.net...]
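
A minimal sketch of what the SimpleXML route could look like for a feed shaped like the sample posted earlier. The inline XML is a shortened, made-up version of that sample, and reading it from a string instead of the live feed URL is an assumption made so the example is self-contained.

```php
<?php
// Shortened stand-in for the feed; the real one comes from the catalog URL.
$xml = <<<XML
<courseinfo>
<course code="CJ 101"><![CDATA[<div class="courseblock">Introduction to Criminal Justice</div>]]></course>
<course code="CJ 102"><![CDATA[<div class="courseblock">Substantive Criminal Law</div>]]></course>
</courseinfo>
XML;

// LIBXML_NOCDATA merges CDATA sections into ordinary text nodes.
$root = simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_NOCDATA);

$courses = [];
if ($root !== false) {
    foreach ($root->course as $course) {
        // Attributes are read with array syntax, element text with a string cast.
        $courses[(string) $course['code']] = trim((string) $course);
    }
}

foreach ($courses as $code => $html) {
    echo $code, ': ', $html, "\n";
}
```

The parser removes the CDATA markers itself, so no str_replace() calls are needed.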

lorax




msg:4623827
 6:53 pm on Nov 16, 2013 (gmt 0)

Not being sarcastic but I don't see the issue? The chunks are concatenated then returned. Or at least that's what I thought I was doing?

swa66




msg:4623847
 8:57 pm on Nov 16, 2013 (gmt 0)

The issue is one of crossing boundaries.

e.g.: You look for "<![CDATA["
- what if "<![CD" was the end of a chunk you read: it would not be recognised as "<![CDATA[".
- the next chunk would start with "ATA[" and it too would not be recognised as "<![CDATA["
After concatenation the two pieces form "<![CDATA[" again, but by then the replacement has already run on each chunk, so the marker is never removed.

So you need to concatenate them first and only then do the string replacements.

I'd replace
while ($chunk = fread($fp, 4096)) {
{
$data .= str_replace(']]>','',str_replace('<![CDATA[','',$chunk));
}
}

with

while ($chunk = fread($fp, 4096)) {
$data .= $chunk;
}
$data = str_replace(']]>','',str_replace('<![CDATA[','',$data));
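
The boundary problem can be reproduced without a network connection. In this minimal sketch, str_split() with a deliberately tiny chunk size of 5 stands in for fread() with 4096, and the feed string is made up for illustration:

```php
<?php
$feed    = 'ab<![CDATA[<p>Hi</p>]]>cd';
$markers = ['<![CDATA[', ']]>'];

// Simulate chunked reads; the opening marker is split across three chunks.
$chunks = str_split($feed, 5);

// Approach 1: replace inside each chunk, then concatenate.
$perChunk = '';
foreach ($chunks as $chunk) {
    $perChunk .= str_replace($markers, '', $chunk);
}

// Approach 2: concatenate first, then replace once on the whole string.
$whole = str_replace($markers, '', implode('', $chunks));

echo $perChunk, "\n"; // the split "<![CDATA[" marker survives
echo $whole, "\n";    // both markers are removed
```

The closing marker happens to fall inside a single chunk here, so only the opening marker is missed; with real data, which marker gets missed depends on where the 4096-byte boundaries land.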

lorax




msg:4624035
 9:11 pm on Nov 17, 2013 (gmt 0)

I took the weekend off. Will look into this on Monday AM. Thanks again, swa66.

lorax




msg:4624223
 1:01 pm on Nov 18, 2013 (gmt 0)

Took me a weekend away from this to fully grasp the way the error could occur. Had my DUH! moment this am while lying in bed thinking about the day. :)


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
Terms of Service ¦ Privacy Policy ¦ Report Problem ¦ About
© Webmaster World 1996-2014 all rights reserved