I apologize if this has been asked and answered. I couldn't find a good solution anywhere. Thought I'd ask the experts. : )
I'm parsing an XML file, and I have run across a problem in the output. SimpleXML is having "issues" because some of the data contains unescaped characters such as "&", "<", ">", etc.
What I'm looking for is a regex pattern to replace all of the special characters in the text BETWEEN the XML tags, and not within the tags themselves. I tried htmlentities() on the whole file, but that escapes the tags too and screws up all of the XML.
So, I need:
<tag name="tag">Replace the < and & signs but not the ...</tag>
Any guidance? TIA!
Well, the regex was really super simple. The following gathers what I need:
preg_match_all("/<proto name=\"packet\">(.*?)<\/proto>/is", $xml, $matches);
From this I get my array of matches. Now my problem is updating the data. I tried this:
foreach ($matches[1] as $match) {
    $replaced = str_replace(array("&", "<", ">"), array("&amp;", "&lt;", "&gt;"), $match);
    $xml = str_replace($match, $replaced, $xml);
}
I just get a blank page.
What would be the fastest way to update these items? This file is currently 3 MB, but it could get MUCH larger.
TIA!
$tagStart = '<proto name="packet">';
$tagEnd = '</proto>';
$tagStartLen = strlen($tagStart);
$tagEndLen = strlen($tagEnd);
$corrected = '';
$ptr = 0;
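while (($posStart = strpos($xml, $tagStart, $ptr)) !== false) {
    // Append text up to and including tagStart, advance ptr.
    $corrected .= substr($xml, $ptr, $posStart - $ptr + $tagStartLen);
    $ptr = $posStart + $tagStartLen;
    // Find tagEnd.
    $posEnd = strpos($xml, $tagEnd, $ptr);
    if ($posEnd === false) {
        // No tagEnd: treat everything up to the end of the text as tag content.
        $posEnd = strlen($xml);
    }
    // Append escaped tag content, advance ptr to beginning of tagEnd.
    // htmlspecialchars() with ENT_NOQUOTES escapes only &, < and >; htmlentities()
    // would also emit HTML-only entities (e.g. &eacute;) that XML parsers reject.
    $corrected .= htmlspecialchars(substr($xml, $ptr, $posEnd - $ptr), ENT_NOQUOTES);
    $ptr = $posEnd;
}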
// Append rest of text.
$corrected .= substr($xml, $ptr);
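Once $corrected has been built, it can be handed to SimpleXML. Here is a minimal sketch, assuming a small made-up sample string in place of the real file. Calling libxml_use_internal_errors() collects any parse errors, so a failure shows up as readable messages rather than a blank page:

```php
<?php
// Hypothetical sample standing in for the $corrected string built by the loop.
$corrected = '<root><proto name="packet">a &lt; b &amp; c</proto></root>';

// Collect libxml parse errors instead of letting them surface as warnings.
libxml_use_internal_errors(true);
$doc = simplexml_load_string($corrected);

if ($doc === false) {
    // Print every parse error with the line it occurred on.
    foreach (libxml_get_errors() as $error) {
        echo trim($error->message), ' on line ', $error->line, "\n";
    }
    libxml_clear_errors();
} else {
    // SimpleXML decodes the entities again when the node is cast to a string.
    echo (string) $doc->proto, "\n";
}
```

If the escaping worked, the text read back from the node should contain the original "<" and "&" characters again.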
Hope this helps.
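For comparison, the match-and-replace idea from the question can also be done in a single pass with preg_replace_callback(). This is only a sketch: the function name escapePacketContent is made up for illustration, and passing false for double_encode is an assumption that content which is already escaped should be left alone. On a multi-megabyte file the lazy (.*?) may run into PCRE limits, in which case the call returns null, so that case is checked explicitly:

```php
<?php
// Hypothetical helper: escapes the text inside <proto name="packet"> elements.
function escapePacketContent(string $xml): ?string
{
    return preg_replace_callback(
        '/(<proto name="packet">)(.*?)(<\/proto>)/is',
        function (array $m): string {
            // Escape only the element content; the tags in $m[1] and $m[3] stay as-is.
            // double_encode = false leaves existing entities such as &amp; untouched.
            return $m[1] . htmlspecialchars($m[2], ENT_NOQUOTES, 'UTF-8', false) . $m[3];
        },
        $xml
    );
}

// Made-up sample input with one raw "<", one raw "&" and one existing entity.
$xml = '<root><proto name="packet">1 < 2 & 3 &amp; 4</proto></root>';
$fixed = escapePacketContent($xml);

if ($fixed === null) {
    // preg_* functions return null on failure, e.g. when a PCRE limit is hit.
    echo 'PCRE error code: ', preg_last_error(), "\n";
} else {
    echo $fixed, "\n";
}
```

The strpos() loop above avoids the regex engine entirely, so it is probably the safer choice for very large files. The callback version is shorter when the input is known to be small.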