regex problem

force123

6:06 pm on Jul 28, 2009 (gmt 0)

Hi,

I need to search all the text in a piece of HTML content but ignore the tag attributes.

For example:

$content = '<a class="text">text</a> This is some text and other elements <b class="text">Goes here</b>';

$check = 'text';

I've gone this far:

$content = preg_replace('~(<[^>]+>)?' . preg_quote($check, '~') . '(</[^>]+>)?~', '$1<b>'.$check.'</b>$2', $content);

but it still fails. It correctly ignores the attributes of the a tag, but it also bolds the match inside the attribute of the b tag:

<a class="text"><b>text</b></a> This is some <b>text</b> and other elements <b class="<b>text</b>">Goes here</b>

By ignoring, I mean ignoring only the data between < and >, not the whole tag (otherwise I would use strip_tags()).

Gibble

6:56 pm on Jul 28, 2009 (gmt 0)

First, download the tool Expresso if you don't have it already.

It's the most useful tool I have ever found when trying to write a regex that works.

I'll play with it, but with that tool you should be able to figure it out in no time.

idfer

10:36 pm on Jul 28, 2009 (gmt 0)

You could look at it this way: you want to match text that's not followed by a > without any intervening <'s. You can do this with a negative lookahead assertion [(?!...)]:

$content = preg_replace('~' . preg_quote($check, '~') . '(?![^<]*>)~', '<b>'.$check.'</b>', $content);
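
To make the lookahead idea concrete, here is a minimal sketch wrapped in a function. The name highlight_text() and the sample input are assumptions made up for illustration, not something from the original post. It uses preg_replace_callback() so that a $ or \ in the search term is not interpreted as a backreference in the replacement.

```php
<?php
// Hypothetical helper (name chosen for this example): wraps every occurrence
// of $needle in <b>...</b>, skipping occurrences that sit inside a tag.
function highlight_text(string $content, string $needle): string
{
    // (?![^<]*>) rejects a match when a '>' can be reached without first
    // crossing a '<', which is the case for text between '<' and '>'.
    $pattern = '~' . preg_quote($needle, '~') . '(?![^<]*>)~';

    // The callback returns the replacement literally, so no escaping of
    // '$' or '\' is needed.
    return preg_replace_callback(
        $pattern,
        function (array $m): string {
            return '<b>' . $m[0] . '</b>';
        },
        $content
    );
}

// Assumed sample input: the word appears in two attributes and two text nodes.
$content = '<a class="text">text</a> This is some text and other elements <b class="text">Goes here</b>';
echo highlight_text($content, 'text'), "\n";
// <a class="text"><b>text</b></a> This is some <b>text</b> and other elements <b class="text">Goes here</b>
```

This relies on the markup being reasonably clean: a literal > in the visible text, or a < inside an attribute value, would throw the lookahead off, so treat it as a quick fix rather than a real HTML parser.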

Hope this helps.

force123

3:29 am on Jul 29, 2009 (gmt 0)

Yeah, it works. :)
I didn't know about [(?!...)] :)
Thanks A LOT!