Forum Moderators: coopster & jatar k

RegExp - Find non hyperlink words

     

milanmk

8:14 am on Aug 2, 2007 (gmt 0)

5+ Year Member



I want to find all the non-hyperlinked words in a given set of paragraphs. The words can be inside other tags, such as bold or italics, but must not be inside an anchor tag.

I am currently using the following regular expression to find the word "keyword":

[^>]\bkeyword\b[^</a>]

But it still matches the word "keyword" inside <a href="http://example.com">abc keyword xyz</a>.

Any suggestions to improve my RegExp pattern?

Milan

milanmk

2:46 pm on Aug 3, 2007 (gmt 0)

5+ Year Member



Anyone...

Milan

RonPK

4:52 pm on Aug 3, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It's not clear to me what you want: to find all non-hyperlinked words in a string, or to find a particular non-hyperlinked keyword in the string.

borntobeweb

4:55 pm on Aug 3, 2007 (gmt 0)

5+ Year Member



Hi Milan. You could always remove all the hyperlinks first and take the words from whatever's left, so:

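// Remove whole hyperlinks, anchor text included ("i": any tag case, "s": links may span lines).
$noLinks = preg_replace('{<a\b.*?</a>}is', '', $origText);
// Remove whatever HTML tags are left.
$noTags = preg_replace('{<.*?>}s', '', $noLinks);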

$noTags should then contain everything from the original text minus the hyperlinks and HTML tags. Hope this helps.
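To get from the stripped text to an actual list of words, something like the following might do. It is only a sketch: the sample string and the $words variable are made up for illustration, and the \b after <a plus the i and s flags are precautions so that tags such as <abbr> are not treated as links and links spanning several lines are still removed.

```php
<?php
// Illustrative input, not taken from a real page.
$origText = 'One keyword, <em>second keyword</em>, <a href="example">third keyword</a>';

// Drop whole hyperlinks, then any remaining tags.
$noLinks = preg_replace('{<a\b.*?</a>}is', '', $origText);
$noTags  = preg_replace('{<.*?>}s', '', $noLinks);

// Split on runs of non-word characters and discard empty pieces.
$words = preg_split('/\W+/', $noTags, -1, PREG_SPLIT_NO_EMPTY);

print_r($words);
```

Here $words ends up holding One, keyword, second, keyword; the words inside the link are gone.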

milanmk

6:51 pm on Aug 3, 2007 (gmt 0)

5+ Year Member



I want to replace a particular word with a hyperlink, but only if that word is not already inside anchor text.

The word "keyword" should be replaced with a hyperlink in the following string
"this is a keyword"

but not in any of the following
"this is a <a href="http://example.com">keyword</a> and another <a href="http://example.com">abc keyword def</a>"

The expression [^>]\bkeyword\b[^</a>] still matches the "keyword" in the second anchor text in the above example.

Milan

RonPK

8:27 pm on Aug 3, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sorry, beats me.

borntobeweb

8:46 pm on Aug 3, 2007 (gmt 0)

5+ Year Member



Heh, looks like I got it totally wrong. So you want to take a string, e.g.

'One keyword, <em>second keyword</em>, <a href="example">third keyword</a>'

and change it into:

'One <a href="linktokeyword">keyword</a>, <em>second <a href="linktokeyword">keyword</a></em>, <a href="example">third keyword</a>'

Is that it? BTW, the regex part [^</a>] is a character class: it matches any single character that is not <, /, a, or >. It does not mean "any string that isn't </a>".
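A quick way to see the character-class behaviour for yourself is a few preg_match calls. This is just a sketch; the variable names are invented for the demonstration.

```php
<?php
// [^</a>] on its own: exactly one character that is none of '<', '/', 'a', '>'.
$class = '{^[^</a>]$}';
$isOther = preg_match($class, 'x');   // 'x' is allowed
$isA     = preg_match($class, 'a');   // 'a' is one of the excluded characters
$isTwo   = preg_match($class, 'xy');  // the class consumes a single character only

// The full pattern from the thread.
$pattern = '{[^>]\bkeyword\b[^</a>]}';

// Linked keyword surrounded by spaces: the spaces satisfy both classes.
$insideLink = preg_match($pattern, '<a href="http://example.com">abc keyword xyz</a>');

// Unlinked keyword at the very end: there is no character left for [^</a>].
$atEnd = preg_match($pattern, 'this is a keyword');

var_dump($isOther, $isA, $isTwo, $insideLink, $atEnd);
```

So the pattern both matches a keyword inside a link and misses an unlinked keyword that ends the string, because each class has to consume one real character.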

milanmk

9:47 pm on Aug 3, 2007 (gmt 0)

5+ Year Member



Exactly, borntobeweb. I got your point about the [^</a>] pattern.

Could you suggest corrections to the regular expression?

Milan

borntobeweb

10:11 pm on Aug 3, 2007 (gmt 0)

5+ Year Member



I can't think of any single regex that can do that. You can do it the not-so-quick but dirty way:

// Step 1: wrap every occurrence of keyword in a hyperlink, including those already inside links.
$step1 = preg_replace('{\bkeyword\b}', '<a href="link">keyword</a>', $origText);

// Step 2: unwrap the nested hyperlinks that step 1 created inside the original links.
$step2 = preg_replace('{(<a[^<]*)<a href="link">keyword</a>([^<]*</a>)}', '\1keyword\2', $step1);

This works as long as there are no other HTML tags inside the original hyperlinks.
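If links can contain other tags, one alternative is to split the text on whole links and tags and only touch the plain-text pieces in between. The function below is a rough sketch of that idea, not something tested on real pages: the name linkKeywordOutsideAnchors and its parameters are made up, it needs PHP 7 for the type hints, and it assumes links are not nested in the input.

```php
<?php
// Hypothetical helper: link $keyword everywhere except inside <a>...</a> or inside a tag.
function linkKeywordOutsideAnchors(string $html, string $keyword, string $url): string
{
    // Split on whole anchor elements or on any single tag, keeping them in the result.
    // Even indexes are plain text; odd indexes are the captured anchors and tags.
    $parts = preg_split('{(<a\b.*?</a>|<[^>]+>)}is', $html, -1, PREG_SPLIT_DELIM_CAPTURE);

    $pattern = '~\b' . preg_quote($keyword, '~') . '\b~';
    $open = '<a href="' . htmlspecialchars($url) . '">';

    for ($i = 0; $i < count($parts); $i += 2) {
        // A callback avoids any special meaning of $ or \ in the replacement text.
        $parts[$i] = preg_replace_callback($pattern, function ($m) use ($open) {
            return $open . $m[0] . '</a>';
        }, $parts[$i]);
    }

    return implode('', $parts);
}

$origText = 'One keyword, <em>second keyword</em>, <a href="example">third keyword</a>';
$linked = linkKeywordOutsideAnchors($origText, 'keyword', 'linktokeyword');

$nested = linkKeywordOutsideAnchors('<a href="x"><b>keyword</b> here</a> and keyword', 'keyword', 'linktokeyword');

echo $linked, "\n", $nested, "\n";
```

Because tags are split out as well, a keyword that appears inside an attribute value is left alone. For anything more complicated than this, an HTML parser is probably the safer tool.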

[edited by: eelixduppy at 5:25 pm (utc) on Feb. 21, 2008]

milanmk

10:38 am on Aug 4, 2007 (gmt 0)

5+ Year Member



It's working fine now. I just made a small change in step2, from '\1keyword\2' to '$1keyword$2', and it did the trick.
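The thread does not say why that change made a difference. In a single-quoted PHP string, preg_replace accepts both the \1 and the $1 form, so one possible explanation (an assumption only) is that the replacement was written in double quotes, where "\1" is an octal escape for a control character and never reaches the regex engine as a backreference. The snippet below sketches the difference; the sample string is made up.

```php
<?php
// What step1 would produce for a keyword that was already linked (illustrative).
$step1 = '<a href="example">third <a href="link">keyword</a></a>';
$pattern = '{(<a[^<]*)<a href="link">keyword</a>([^<]*</a>)}';

// Single quotes: all three notations refer to the captured groups.
$backslash = preg_replace($pattern, '\1keyword\2', $step1);
$dollar    = preg_replace($pattern, '$1keyword$2', $step1);
$braced    = preg_replace($pattern, '${1}keyword${2}', $step1);

// Double quotes: PHP turns "\1" and "\2" into the bytes 0x01 and 0x02 first.
$doubleQuoted = preg_replace($pattern, "\1keyword\2", $step1);

var_dump($backslash, $dollar, $braced, bin2hex($doubleQuoted));
```

The ${1} form is the one to reach for when a backreference is directly followed by a digit.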

Thank you for your continued patience.

Milan

 
