Reg Expressions AArrggghhh!

Should be a simple search and replace, right?

royalelephant

5:34 pm on Jan 2, 2005 (gmt 0)


I need to search and replace lines of HTML code using regular expressions. In the example below, everything between HREF="/ and LAST_MODIFIED="0"> varies from entry to entry and has to be ignored.

Can someone write the general expression I need to use? I've been trying to get it right to no avail.

Search For...

<DT><A HREF="/variable_directory_name/dir/pagename.shtml" ADD_DATE="1043532847" LAST_VISIT="1043532847" LAST_MODIFIED="0">

Replace With...

<DT><A HREF="/variable_directory_name/dir/pagename.shtml" ADD_DATE="1043532847" LAST_VISIT="1043532847" LAST_MODIFIED="0"> <img src="/gr/gs/graphicname.gif">

Salsa

7:30 pm on Jan 2, 2005 (gmt 0)


Just to make sure I understand you correctly, are you wanting to add

<img src="/gr/gs/graphicname.gif">

...after each occurrence of

<DT><A HREF="/ ... LAST_MODIFIED="0">

...?

ergophobe

9:37 pm on Jan 2, 2005 (gmt 0)


Also, I assume that graphicname.gif is not going to be the same in every case. Where will this filename come from?

royalelephant

11:02 pm on Jan 2, 2005 (gmt 0)


Yes, I want to add

<img src="/gr/gs/graphicname.gif">

...after each occurrence of

<DT><A HREF="/ ... LAST_MODIFIED="0">

"I assume that graphicname.gif is not going to be the same in every case. Where will this filename come from?"

Oh, no, graphicname.gif will be the same for every insertion.

Salsa

12:21 am on Jan 3, 2005 (gmt 0)


Okay, see if this will work for you:

$html_file = ''; // placeholder: the HTML file contents go here
$image_tag = "<img src=\"/gr/gs/graphicname.gif\">";
$regex_pattern = "#(<DT><A HREF=\"/.*LAST_MODIFIED=\"0\">?)(?!$image_tag)#i";
while (preg_match($regex_pattern, $html_file, $matches)) {
    $html_file = preg_replace($matches[1], $matches[1].$image_tag, $html_file);
}

This isn't tested, but I think it might work out of the box. The way it's meant to work is that whatever the first set of () in the pattern matches is stored in $matches[1], and it should match all the variations that you want. The .* matches any number of characters (other than a newline) between the literal strings on either side.

The second set of () is a negative lookahead assertion: it rejects any match that already has the image tag right after it. That part of the pattern lets you make the replacements in a while loop that runs until all of them are done.

At the end of the pattern, the i after the # boundary is to make the search
case-insensitive. If you are certain about the cases of the tags, you can omit
it if you want.
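
To make those three pieces concrete (the capture group, the lookahead, and the i flag), here is a minimal sketch. The two-line sample input is made up for illustration, and preg_quote() is added on the assumption that the image tag should be matched literally; neither is part of the code above.

```php
<?php
// Hypothetical sample: the first tag has no image yet, the second already does.
$image_tag = '<img src="/gr/gs/graphicname.gif">';
$html_file = '<DT><A HREF="/dir_one/dir/page.shtml" ADD_DATE="1" LAST_VISIT="1" LAST_MODIFIED="0">' . "\n"
           . '<dt><a href="/dir_two/dir/page.shtml" ADD_DATE="2" LAST_VISIT="2" LAST_MODIFIED="0">'
           . $image_tag . "\n";

// preg_quote() escapes regex metacharacters in the image tag (the dots, for example).
$pattern = '#(<DT><A HREF="/.*LAST_MODIFIED="0">)(?!' . preg_quote($image_tag, '#') . ')#i';

// Group 1 of each match ends up in $matches[1].
$count = preg_match_all($pattern, $html_file, $matches);

echo $count, "\n";          // number of tags still missing the image
echo $matches[1][0], "\n";  // the text captured by the first set of ()
```

With this sample only the first line should be reported: the second is skipped by the lookahead even though its tag names are lowercase, because the i flag lets them match and the image tag that follows then rules the line out.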

As I said, this is untested, so if you run into problems you can't solve,
just post the error messages you get.

I hope this helps.

Salsa

3:50 pm on Jan 3, 2005 (gmt 0)


I thought I might have been a bit bold to imagine that the code I posted last night could work out of the box, so I couldn't resist testing it. Of course, it didn't work. So, try this:

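// Loop until no tag is left without the image after it
while (preg_match('@(<DT><A HREF="/.*LAST_MODIFIED="0">)(?!<img src="/gr/gs/graphicname.gif">)@i', $html_file, $matches))
{
    // preg_quote() keeps characters such as . or ? in the matched tag from being read as regex syntax
    $pattern = "#".preg_quote($matches[1], "#")."#";
    $replacement = $matches[1]."<img src=\"/gr/gs/graphicname.gif\">";
    $html_file = preg_replace($pattern, $replacement, $html_file);
}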
It's a bit crude, but at least it worked with a test $html_file requiring four iterations.
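
Since preg_replace and preg_replace_callback replace every match in one call, the loop may not be needed at all. The following is only a sketch of that single-pass idea, not something tested against the real file: the function name add_image_after_links and the sample input are hypothetical, and it assumes no attribute value contains a > character, so [^>]* can stand in for .* without running past the end of the tag.

```php
<?php
// Hypothetical helper, not from the thread: appends $image_tag after each
// matching <DT><A ...> tag that is not already followed by it.
function add_image_after_links(string $html, string $image_tag): string
{
    // [^>]* stays inside the current tag (assumes no > inside attribute values).
    // The lookahead skips tags already followed by the image, with or without
    // whitespace in between, so running the function twice changes nothing.
    $pattern = '#(<DT><A HREF="/[^>]*LAST_MODIFIED="0">)(?!\s*'
             . preg_quote($image_tag, '#') . ')#i';

    // A callback avoids having $ or \ in the tag read as backreferences.
    $result = preg_replace_callback($pattern, function (array $m) use ($image_tag) {
        return $m[1] . $image_tag;
    }, $html);

    // preg_replace_callback returns null only on a regex engine error.
    return $result ?? $html;
}

// Made-up sample input with two entries.
$tag = '<img src="/gr/gs/graphicname.gif">';
$sample = '<DT><A HREF="/a/dir/one.shtml" ADD_DATE="1" LAST_VISIT="1" LAST_MODIFIED="0">' . "\n"
        . '<DT><A HREF="/b/dir/two.shtml" ADD_DATE="2" LAST_VISIT="2" LAST_MODIFIED="0">' . "\n";

$once  = add_image_after_links($sample, $tag);
$twice = add_image_after_links($once, $tag);
echo $once;
```

The second call is there to show that the lookahead makes the operation safe to repeat; $twice should be identical to $once.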

I wish you well.


royalelephant

10:23 pm on Jan 3, 2005 (gmt 0)


Thank you, Salsa! Have a great '05! (Yes, it looks good now.) :)