Forum Moderators: coopster


Preg_replace and preg_match functions for PHP help

         

realnoodle

2:36 am on Aug 1, 2007 (gmt 0)




Okay, I've watched this forum's posts for a very long time, and I know that if anyone out there can help me get this unbelievably frustrating issue resolved, aye, that be you fellas!

Here is the dilemma. I can't for the life of me grasp the stated "simplicities" of regular expressions. I understand, sort of, what PHP's preg_replace and preg_match functions do, but writing the expressions that make them do exactly what they are supposed to is driving me to drink.

I need nothing more than to parse a page, replacing any <sup></sup> tags with some alternatives. To make this a bit clearer: when I locate a <sup> tag, I also need to know what is contained within it, e.g. <sup>1</sup>. In some perhaps really crazy cases these tags might contain linked numbers, <sup><a href="#">1</a></sup>, and finally some wise guys might even have a moment of self-reflection and decide in all their wisdom to apply CSS to these: <sup class="Retarded"><a href="#">1</a></sup>. There is no reason for these to be applied, but from a user's perspective I have to be prepared for this.

When a user writes a <sup></sup> tag, they do so in order that my script can recognize it and use it as intended. So here is what I have. I use an HTML page copied from Wiki.. as myfile.html; it already has some <sup></sup> tags, and these also include CSS, ids, and links.

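$source = "myfile.html";
$fp = fopen($source, "r") or die("Couldn't open $source");
$line = ""; // initialize before appending to avoid an undefined-variable notice
while (!feof($fp)) {
    $line .= fgets($fp, 1024);
}
fclose($fp);
// Remove the garbage tags for testing; allowed tags are listed back to back, without commas
$line = strip_tags($line, "<sup><div><a>");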

preg_match_all('/<sup>(.*)<\/sup>/s',$line,$result);

for ($i = 0; $i < count($result[0]); $i++) {
// Used for Testing the Contents of $result
echo "Works: ".$result[0][$i]."\n";
}

foreach($result as $key => $val) {
// if $result were working, we could use our $result data to make the necessary changes.. but!
}
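
A minimal sketch of the matching step, assuming the goal is to capture the inner content of each <sup> element even when the opening tag carries attributes. The sample string and the variable name $html are made up for illustration; only preg_match_all and strip_tags come from the post.

```php
<?php
// Hypothetical sample input covering the three cases described in the post.
$html = '<p>a<sup>1</sup> b<sup><a href="#">2</a></sup> c<sup class="x"><a href="#">3</a></sup></p>';

// \b keeps the pattern from matching longer tag names that merely start with "sup";
// [^>]* allows any attributes on the opening tag;
// .*? is lazy, so each match ends at the nearest </sup> rather than the last one.
preg_match_all('/<sup\b[^>]*>(.*?)<\/sup>/is', $html, $result);

// $result[0] holds the full matches, $result[1] the captured inner content.
foreach ($result[1] as $i => $inner) {
    // strip_tags() drops nested markup such as the <a> wrapper.
    echo $i, ': ', strip_tags($inner), "\n";
}
```

With this input the loop prints 1, 2 and 3, one per line. The pattern posted above differs in two ways: it has no [^>]* for attributes, and its greedy .* runs from the first <sup> to the last </sup> in the file.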

Once I had changed the $result data to what I need, I could then use something like:
preg_replace('/(<sup[^>]*>)(.*?)(<\/sup>)/is','$changes',$somesource);
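
The match-then-replace plan can also be done in one pass with preg_replace_callback, which hands each match to a function and substitutes whatever it returns. This is only a sketch: the helper name replace_sup and the bracketed output format are placeholders, since the post does not say what the replacement markup should be.

```php
<?php
// Hypothetical helper: rewrites every <sup>...</sup> element in a single pass.
function replace_sup($html)
{
    return preg_replace_callback(
        '/<sup\b[^>]*>(.*?)<\/sup>/is',
        function ($m) {
            // $m[1] is the inner content; remove nested tags and stray whitespace.
            $text = trim(strip_tags($m[1]));
            // Placeholder replacement markup; swap in whatever is actually needed.
            return '[' . $text . ']';
        },
        $html
    );
}

echo replace_sup('x<sup class="ref"><a href="#">1</a></sup> y<sup>2</sup>'), "\n";
```

For the sample string this prints x[1] y[2]. The callback is the place to inspect what each tag contained before deciding how to rewrite it.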

As I mentioned, I have had to piece the expressions together from bits and pieces, and I haven't a clue whether they are right or wrong. Honestly, I don't even know whether the whole approach is right. Thus, I need your expertise to steer me right.

Thanks in Advance.

eelixduppy

6:22 am on Aug 1, 2007 (gmt 0)



Welcome to WebmasterWorld!

So basically you want to strip any attributes or additional HTML that may be contained within the <sup></sup> tags?

Something like this?


$pattern = "/<sup[^>]*>(.+?)<\/sup>/i"; // lazy +? so each match stops at the nearest </sup>
$new_string = preg_replace($pattern,'<sup>\\1</sup>',strip_tags($string,'<sup>'));

eelixduppy

6:35 am on Aug 1, 2007 (gmt 0)



Actually, that solution will strip the tags that you want to keep outside of the <sup> tag. So how about something like this?


$pattern = "/<sup[^>]*>(<[^>]*>)*([^<]+).*?<\/sup>/i"; // lazy .*? so two <sup> elements on one line are not merged
$new_string = preg_replace($pattern,'<sup>\\2</sup>',$string);

I'm sure the pattern can be optimized a bit, but this will do the job.
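
A quick check of a pattern of this shape against the three cases from the original question, plus one line holding two <sup> elements. The sample strings are made up, and the sketch assumes a lazy `.*?` together with the `s` modifier so that each element is matched separately, even if it spans lines.

```php
<?php
// Same structure as the pattern above: skip any leading nested tags, capture the
// first run of text, then lazily skip the rest up to the nearest </sup>.
$pattern = '/<sup[^>]*>(<[^>]*>)*([^<]+).*?<\/sup>/is';

// Hypothetical test inputs based on the cases described in the question.
$samples = array(
    '<sup>1</sup>',
    '<sup><a href="#">1</a></sup>',
    '<sup class="note"><a href="#">1</a></sup>',
    'a<sup id="x">1</sup> b<sup><a href="#">2</a></sup>',
);

foreach ($samples as $s) {
    // \\2 refers to the second group, the bare text inside the element.
    echo preg_replace($pattern, '<sup>\\2</sup>', $s), "\n";
}
```

The first three samples each come out as <sup>1</sup>, and the last as a<sup>1</sup> b<sup>2</sup>.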