regex problem

         

force123

6:06 pm on Jul 28, 2009 (gmt 0)

10+ Year Member



Hi,

I need to search all the text in a piece of HTML content for a string, but ignore anything inside the tags themselves (tag names and attributes).

For example:

$content = '<a class="text">text</a> This is some text and other elements <b class="text">Goes here</b>';

$check = 'text';

I've gone this far:

$content = preg_replace('~(<[^>]+>)?' . preg_quote($check, '~') . '(</[^>]+>)?~', '$1<b>'.$check.'</b>$2', $content);

but it still fails. It correctly skips the attribute of the a tag, but it also bolds the match inside the class attribute of the b tag:

<a class="text"><b>text</b></a> This is some <b>text</b> and other elements <b class="<b>text</b>">Goes here</b>

By "ignoring" I mean skipping only the data between < and >, not the whole element. (Otherwise I would use strip_tags().)

Gibble

6:56 pm on Jul 28, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



First, download the tool Expresso if you don't have it already.

It's the most useful tool I have ever found when trying to write a regex that works.

I'll play with it, but with that tool you should be able to figure it out in no time.

idfer

10:36 pm on Jul 28, 2009 (gmt 0)

10+ Year Member



You could look at it this way: you want to match the text only when it is not followed by a > with no < in between, because that would mean the match sits inside a tag. You can do this with a negative lookahead assertion [(?!...)]:

$content = preg_replace('~' . preg_quote($check, '~') . '(?![^<]*>)~', '<b>'.$check.'</b>', $content);
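To see the lookahead at work on the example from the first post, here is a minimal, self-contained sketch. The function name highlight_text() is made up for illustration, and it uses $0 (the matched text) in the replacement instead of concatenating $check, which is a small variation so that a $ or backslash in $check is not misread as a backreference.

```php
<?php
// Hypothetical helper: wraps every occurrence of $check in <b>...</b>,
// skipping occurrences that sit between a < and its closing >.
function highlight_text($content, $check)
{
    // (?![^<]*>) fails the match when a > can be reached from here
    // without first crossing a <, i.e. when we are inside a tag.
    $pattern = '~' . preg_quote($check, '~') . '(?![^<]*>)~';

    // $0 is the whole match, so $check never appears in the replacement string.
    return preg_replace($pattern, '<b>$0</b>', $content);
}

$content = '<a class="text">text</a> This is some text and other elements <b class="text">Goes here</b>';

echo highlight_text($content, 'text'), "\n";
// <a class="text"><b>text</b></a> This is some <b>text</b> and other elements <b class="text">Goes here</b>
```

One caveat: the pattern assumes every > in the input closes a tag. A bare > in the text itself (for example "text > 5") would also suppress the match, so encode those as &gt; first or use a real HTML parser if that can happen.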

Hope this helps.

force123

3:29 am on Jul 29, 2009 (gmt 0)

10+ Year Member



Yeah, it works. :)
I didn't know about the negative lookahead [(?!...)]. :)
Thanks A LOT!