Here is the dilemma. I can't for the life of me get a grip on the supposed "simplicity" of regular expressions. I understand, sort of, what PHP's preg_replace and preg_match functions do, but writing the expressions that make them do exactly what they are supposed to is driving me to drink.
All I need is to parse a page and replace any <sup></sup> tags with some alternatives. To make this a bit clearer: when I locate a <sup> tag, I also need to know what is contained within it, e.g. <sup>1</sup>. In some perhaps really crazy cases these tags might contain linked numbers, <sup><a href="#">1</a></sup>, and finally some wise guys might even have a moment of self-reflection and decide in all their wisdom to apply CSS to these: <sup class="Retarded"><a href="#">1</a></sup>. There is no reason for these to be applied, but from a user's perspective I have to be prepared for this.
When users write a <sup></sup> tag, they do so in order that my script can recognize it and use it as intended. So here is what I have. I use a copy of an HTML page from Wiki.. saved as myfile.html, which already has some <sup></sup> tags, including ones with CSS classes, ids, and links.
$source = "myfile.html";
$fp = fopen($source, "r") or die("Couldn't open $source");
$line = ''; // initialize before appending with .=
while (!feof($fp)) {
    $line .= fgets($fp, 1024);
}
fclose($fp);
// Remove the garbage tags for testing (allowed tags are listed without separators)
$line = strip_tags($line, "<sup><div><a>");
preg_match_all('/<sup>(.*)<\/sup>/s', $line, $result);
for ($i = 0; $i < count($result[0]); $i++) {
    // Used for testing the contents of $result
    echo "Works: " . $result[0][$i] . "\n";
}
foreach ($result as $key => $val) {
    // If $result were working, we could use its data to make the necessary changes.. but!
}
Once I had the $result data changed to what I needed, I could then use something like:
preg_replace('/(<sup[^>]*>)(.*?)(<\/sup>)/is','$changes',$somesource);
As I mentioned, I have had to piece the expressions together from bits and pieces. I haven't a clue whether they are right or wrong and, honestly, I don't know whether the whole approach is right. So I need your expertise to steer me right.
Thanks in advance.
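For readers following along, here is a minimal sketch of why the preg_match_all pattern in the question may not behave as hoped. It assumes a made-up sample string (not the actual page from the question) and compares the original pattern with an adjusted one. The adjusted pattern is only one possible fix: [^>]* allows attributes on the opening tag, \b keeps it from matching longer tag names, and (.*?) stops at the first closing tag instead of the last.

```php
<?php
// Hypothetical sample input, invented for illustration only.
$html = '<sup>1</sup> text <sup class="ref"><a href="#">2</a></sup>';

// Pattern from the question: a literal <sup> (no attributes) and a greedy (.*).
preg_match_all('/<sup>(.*)<\/sup>/s', $html, $greedy);

// Adjusted pattern: attributes allowed, lazy capture, case-insensitive.
preg_match_all('/<sup\b[^>]*>(.*?)<\/sup>/is', $html, $lazy);

// The greedy version swallows everything up to the last </sup> as one match.
echo count($greedy[0]), "\n";

// The lazy version yields one entry per tag, with the inner content in group 1.
print_r($lazy[1]);
```

With this sample the greedy pattern reports a single match spanning both tags, while the adjusted pattern captures "1" and the linked "2" separately.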
So basically you want to strip any attributes or additional HTML that may be contained within the <sup></sup> tags? Something like this?
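// Option 1: strip every tag except <sup> first, then drop the attributes
$pattern = "/<sup[^>]*>(.+?)<\/sup>/is";
$new_string = preg_replace($pattern,'<sup>\\1</sup>',strip_tags($string,'<sup>'));
// Option 2: leave the rest of the page alone and keep only the first run of text inside each <sup>
$pattern = "/<sup[^>]*>(<[^>]*>)*([^<]+).*?<\/sup>/is";
$new_string = preg_replace($pattern,'<sup>\\2</sup>',$string);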
I'm sure the pattern can be optimized a bit, but this will do the job.
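If the goal is to swap each <sup> for different markup rather than just clean it up, preg_replace_callback may be a better fit, since the callback sees the contents of every tag. The following is only a sketch under assumptions: replace_sup is a hypothetical helper name, the "[n]" output format is invented for illustration, and the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper: rewrites every <sup ...>...</sup> into alternative markup.
// The "[n]" output format is an assumption chosen only for illustration.
function replace_sup(string $html): string
{
    $out = preg_replace_callback(
        '/<sup\b[^>]*>(.*?)<\/sup>/is',
        function (array $m): string {
            // $m[1] is whatever sat between the tags; drop nested tags such as <a>.
            $text = trim(strip_tags($m[1]));
            return '[' . $text . ']';
        },
        $html
    );

    // preg_replace_callback returns null on a regex error; fall back to the input.
    return $out ?? $html;
}

$sample = 'Fact<sup class="x"><a href="#">1</a></sup> and more<sup>2</sup>.';
echo replace_sup($sample), "\n"; // prints: Fact[1] and more[2].
```

The callback is the place to put whatever "alternative" is actually wanted; anything outside the <sup> tags is left untouched.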