Forum Moderators: coopster & jatar k

RegExp - Find non hyperlink words

     
8:14 am on Aug 2, 2007 (gmt 0)

Full Member

10+ Year Member

joined:Jan 4, 2006
posts:307
votes: 0


I want to find all the non-hyperlinked words in a given set of paragraphs. The words can be inside other tags, such as bold or italics, but must not be inside an anchor tag.

I am currently using the following regular expression to find the word "keyword":

[^>]\bkeyword\b[^</a>]

But it still matches the word "keyword" inside <a href="http://example.com">abc keyword xyz</a>.

Any suggestions for improving my RegExp pattern?

Milan

2:46 pm on Aug 3, 2007 (gmt 0)

Full Member

10+ Year Member

joined:Jan 4, 2006
posts:307
votes: 0


Anyone...

Milan

4:52 pm on Aug 3, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 22, 2002
posts:1749
votes: 0


It's not clear to me what you want: to find all non-hyperlinked words in a string, or to find a particular non-hyperlinked keyword in the string.

4:55 pm on Aug 3, 2007 (gmt 0)

Junior Member

5+ Year Member

joined:Mar 31, 2007
posts:85
votes: 0


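Hi Milan. You could always remove all the hyperlinks first and take the words from whatever is left:

// Remove every <a>...</a> element together with its text (\b keeps <abbr> and similar tags safe).
$noLinks = preg_replace('{<a\b[^>]*>.*?</a>}is', '', $origText);
// Strip the remaining HTML tags, keeping their text.
$noTags = preg_replace('{<[^>]*>}', '', $noLinks);

$noTags should then hold everything in the original text minus the hyperlinks and HTML tags. Hope this helps.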
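
To get from the stripped text to an actual word list, one more split is needed. A minimal sketch, assuming PHP 7 or later and plain ASCII words; the helper name wordsOutsideLinks is made up for illustration. It replaces removed markup with a space rather than an empty string so that words on either side of a tag are not glued together.

```php
<?php
// Hypothetical helper, not from the thread: returns the words in $html
// that are not inside an <a>...</a> element.
function wordsOutsideLinks(string $html): array
{
    // Drop each anchor element with its text. The s flag lets . span
    // newlines, the i flag ignores tag case.
    $noLinks = preg_replace('{<a\b[^>]*>.*?</a>}is', ' ', $html);
    // Drop the remaining tags but keep the text between them.
    $noTags = preg_replace('{<[^>]*>}', ' ', $noLinks);
    // Split on runs of non-word characters and discard empty pieces.
    return preg_split('{\W+}', $noTags, -1, PREG_SPLIT_NO_EMPTY);
}

$words = wordsOutsideLinks(
    'One keyword, <em>second keyword</em>, <a href="example">third keyword</a>'
);
print_r($words); // One, keyword, second, keyword
```

For text with non-ASCII letters the split pattern would probably need the u flag and a Unicode-aware class such as [^\p{L}\p{N}]+ instead of \W+.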

6:51 pm on Aug 3, 2007 (gmt 0)

Full Member

10+ Year Member

joined:Jan 4, 2006
posts:307
votes: 0


I want to replace a particular word with a hyperlink, but only if that word is not already inside anchor text.

The word "keyword" should be replaced with a hyperlink in the following string:
"this is a keyword"

but not inside either of the anchors in the following string:
"this is a <a href="http://example.com">keyword</a> and another <a href="http://example.com">abc keyword def</a>"

The expression [^>]\bkeyword\b[^</a>] still matches the "keyword" in the second anchor text in the above example.

Milan

8:27 pm on Aug 3, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 22, 2002
posts:1749
votes: 0


Sorry, beats me.

8:46 pm on Aug 3, 2007 (gmt 0)

Junior Member

5+ Year Member

joined:Mar 31, 2007
posts:85
votes: 0


Heh, looks like I got it totally wrong. So you want to take a string, e.g.

'One keyword, <em>second keyword</em>, <a href="example">third keyword</a>'

and change it into:

'One <a href="linktokeyword">keyword</a>, <em>second <a href="linktokeyword">keyword</a></em>, <a href="example">third keyword</a>'

Is that it? BTW, the regex part [^</a>] matches any single character that is not <, /, a, or >. It does not mean "any string that isn't </a>".
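
The character-class point can be checked directly. A small sketch, assuming PHP's PCRE functions; the variable names are only for illustration. It tests the class on single characters and then runs the pattern from the question on the linked sample text.

```php
<?php
// [^</a>] is a negated character class: it consumes exactly one character
// that is none of '<', '/', 'a', '>'.
$class = '{^[^</a>]$}';

var_dump(preg_match($class, 'x'));  // int(1): 'x' is not in the set
var_dump(preg_match($class, 'a'));  // int(0): 'a' is excluded
var_dump(preg_match($class, '/'));  // int(0): '/' is excluded
var_dump(preg_match($class, 'xy')); // int(0): only one character is allowed

// The pattern from the question, run on linked text. The keyword is
// surrounded by spaces, and a space satisfies both character classes.
$pattern = '{[^>]\bkeyword\b[^</a>]}';
$linked  = '<a href="http://example.com">abc keyword xyz</a>';

var_dump(preg_match($pattern, $linked, $m)); // int(1)
var_dump($m[0]);                             // string(9) " keyword "
```

The pattern only rejects a keyword that directly touches the > of the opening tag or the < of the closing tag, which is why the first link in the example is skipped and the second is not.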

9:47 pm on Aug 3, 2007 (gmt 0)

Full Member

10+ Year Member

joined:Jan 4, 2006
posts:307
votes: 0


Exactly, borntobeweb. I got your point about the [^</a>] pattern.

Could you suggest corrections to the regular expression?

Milan

10:11 pm on Aug 3, 2007 (gmt 0)

Junior Member

5+ Year Member

joined:Mar 31, 2007
posts:85
votes: 0


I can't think of any single regex that can do that. You can do it the not-so-quick but dirty way:

// Replace every occurrence of keyword with a hyperlink.
$step1 = preg_replace('{\bkeyword\b}', '<a href="link">keyword</a>', $origText);

// Remove the inner hyperlinks that step1 created inside existing links.
$step2 = preg_replace('{(<a[^<]*)<a href="link">keyword</a>([^<]*</a>)}', '\1keyword\2', $step1);

This works as long as there are no other HTML tags inside the original hyperlinks.
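
For comparison, here is a different technique from the two-step replace above: a single pass with preg_replace_callback, where whole anchors and whole tags are matched first and handed back unchanged, so only a keyword outside them gets wrapped. This is a sketch under assumptions, not a tested production routine: the helper name linkKeyword is made up, the keyword is assumed to be a plain word, and the HTML is assumed to be well formed without nested anchors.

```php
<?php
// Hypothetical helper: wraps $keyword in a link to $url unless it already
// sits inside an <a> element or inside a tag's attributes.
function linkKeyword(string $html, string $keyword, string $url): string
{
    // Alternatives are tried left to right at each position:
    //   1. a complete <a ...>...</a> element (case-insensitive),
    //   2. any other single tag,
    //   3. the keyword as a whole word.
    $pattern = '~(?i:<a\b[^>]*>.*?</a>)|<[^>]*>|\b'
             . preg_quote($keyword, '~') . '\b~s';

    $result = preg_replace_callback(
        $pattern,
        function (array $m) use ($url): string {
            // Anchors and tags start with '<' and are returned untouched.
            if ($m[0][0] === '<') {
                return $m[0];
            }
            return '<a href="' . htmlspecialchars($url) . '">' . $m[0] . '</a>';
        },
        $html
    );

    // preg_replace_callback returns null on a PCRE error; fall back to the input.
    return $result ?? $html;
}

echo linkKeyword(
    'One keyword, <em>second keyword</em>, <a href="example">third keyword</a>',
    'keyword',
    'linktokeyword'
), "\n";
```

Because an existing anchor is consumed as one match, tags nested inside it and repeated keywords inside it are left alone. A real HTML parser is still the safer choice for markup that is not under your control.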

[edited by: eelixduppy at 5:25 pm (utc) on Feb. 21, 2008]

10:38 am on Aug 4, 2007 (gmt 0)

Full Member

10+ Year Member

joined:Jan 4, 2006
posts:307
votes: 0


It's working fine now. I just made a small change in step2, from '\1keyword\2' to '$1keyword$2', and it did the trick.
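
For anyone reading later, here is a sketch of the two-step version run end to end on the sample string from earlier in the thread. The link target linktokeyword is the placeholder used in that sample, and the ${1} brace form is my own choice here: it is equivalent to $1 but keeps the reference clearly separate from the literal text that follows it.

```php
<?php
$origText = 'One keyword, <em>second keyword</em>, <a href="example">third keyword</a>';

// Step 1: link every occurrence, including the one already inside a link.
$step1 = preg_replace(
    '{\bkeyword\b}',
    '<a href="linktokeyword">keyword</a>',
    $origText
);

// Step 2: unwrap any link that step 1 nested inside an existing link.
// Group 1 is the outer opening tag plus the text before the keyword,
// group 2 is the text after the keyword plus the outer closing tag.
$step2 = preg_replace(
    '{(<a[^<]*)<a href="linktokeyword">keyword</a>([^<]*</a>)}',
    '${1}keyword${2}',
    $step1
);

echo $step2, "\n";
```

One limit worth knowing: if an existing link contains the keyword more than once, step 2 leaves the nested links in place, because [^<]* cannot cross the second inner tag.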

Thank you for your continued patience.

Milan