PHP Server Side Scripting Forum

RegExp - Find non hyperlink words

 8:14 am on Aug 2, 2007 (gmt 0)

I want to find all the non hyperlinked words from a given set of paragraphs. The words can be inside other tags like bold or italics but should not be inside an anchor tag.

I'm currently using the following regular expression to find the word "keyword":


But it still matches the word "keyword" inside <a href="http://example.com">abc keyword xyz</a>.

Any suggestions to improve my RegExp pattern?




 2:46 pm on Aug 3, 2007 (gmt 0)




 4:52 pm on Aug 3, 2007 (gmt 0)

It's not clear to me what you want: find all non hyperlinked words in a string, or a particular non hyperlinked keyword in the string.


 4:55 pm on Aug 3, 2007 (gmt 0)

Hi Milan. You could always remove all the hyperlinks first and take the words from whatever's left, so:

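// Remove whole anchor elements. \b keeps <abbr> etc. safe; "i" ignores case, "s" lets . span newlines.
$noLinks = preg_replace('{<a\b[^>]*>.*?</a>}is', '', $origText);
// Strip the remaining tags but keep the text between them.
$noTags = preg_replace('{<[^>]*>}', '', $noLinks);

$noTags should then hold everything from the original text minus the hyperlinks and the HTML tags. Hope this helps.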
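
To get from the stripped text to an actual list of words, a split on non-word characters is one option. The following is only a sketch: the sample string and the $words variable are made up for illustration, and the patterns assume reasonably well-formed markup with no anchors nested inside other anchors.

```php
<?php
// Hypothetical sample input, not taken from the original post.
$origText = 'One keyword, <em>second keyword</em>, <a href="http://example.com">third keyword</a>';

// Drop whole anchor elements, including the text they wrap.
$noLinks = preg_replace('{<a\b[^>]*>.*?</a>}is', '', $origText);
// Drop the remaining tags but keep their text.
$noTags = preg_replace('{<[^>]*>}', '', $noLinks);

// Split what is left on runs of non-word characters, discarding empty pieces.
$words = preg_split('/\W+/', $noTags, -1, PREG_SPLIT_NO_EMPTY);

print_r($words); // One, keyword, second, keyword
```

The word inside the anchor ("third keyword") is gone, while the one inside <em> survives.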


 6:51 pm on Aug 3, 2007 (gmt 0)

I want to replace a particular word with a hyperlink, but only if that word is not already inside anchor text.

The word "keyword" should be replaced with a hyperlink in the following string:
"this is a keyword"

but not in any of the following
"this is a <a href="http://example.com">keyword</a> and another <a href="http://example.com">abc keyword def</a>"

The expression [^>]\bkeyword\b[^</a>] still matches the "keyword" in the second anchor's text in the example above.



 8:27 pm on Aug 3, 2007 (gmt 0)

Sorry, beats me.


 8:46 pm on Aug 3, 2007 (gmt 0)

Heh, looks like I got it totally wrong. So you want to take a string, e.g.

'One keyword, <em>second keyword</em>, <a href="example">third keyword</a>'

and change it into:

'One <a href="linktokeyword">keyword</a>, <em>second <a href="linktokeyword">keyword</a></em>, <a href="example">third keyword</a>'

Is that it? BTW, the regex part [^</a>] matches any single character that is not <, /, a, or >. Not any string that isn't </a>.
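
The difference between a character class and "not followed by this string" can be checked quickly. This is a small sketch; the test strings and variable names are made up, and the negative lookahead (?!...) is shown only as the standard PCRE way of saying "not followed by", not as something used in the thread.

```php
<?php
// [^</a>] is a character class: exactly ONE character that is none of '<', '/', 'a', '>'.
$classX     = preg_match('{^[^</a>]$}', 'x'); // 1: 'x' is allowed
$classA     = preg_match('{^[^</a>]$}', 'a'); // 0: 'a' is excluded
$classSlash = preg_match('{^[^</a>]$}', '/'); // 0: '/' is excluded

// "Not followed by the string </a>" needs a negative lookahead instead.
$directClose = preg_match('{keyword(?!</a>)}', 'keyword</a>');     // 0: </a> follows immediately
$laterClose  = preg_match('{keyword(?!</a>)}', 'keyword def</a>'); // 1: </a> comes later, so this still matches

var_dump($classX, $classA, $classSlash, $directClose, $laterClose);
```

The last line shows why a plain (?!</a>) is not enough for the "abc keyword def" case either: the lookahead has to skip over the rest of the link text before it looks for </a>.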


 9:47 pm on Aug 3, 2007 (gmt 0)

Exactly, borntobeweb. Got your point on the [^</a>] pattern.

Could you suggest corrections in the regular expression?



 10:11 pm on Aug 3, 2007 (gmt 0)

I can't think of any single regex that can do that. You can do it the not-so-quick but dirty way:

// Replace all keyword by hyperlink.
$step1 = preg_replace('{\bkeyword\b}', '<a href="link">keyword</a>', $origText);

// Remove inner hyperlinks created by step1 above.
$step2 = preg_replace('{(<a[^<]*)<a href="link">keyword</a>([^<]*</a>)}', '\1keyword\2', $step1);

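This works as long as there are no other HTML tags inside the original hyperlinks and each original hyperlink contains the keyword at most once. With two keywords in one link, step2 finds no match and both inner links stay nested.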

[edited by: eelixduppy at 5:25 pm (utc) on Feb. 21, 2008]
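
Under the same assumption (no other tags inside the original hyperlinks), the two steps can also be folded into one pass with a negative lookahead: skip any keyword that is followed by non-"<" characters and then </a>. This is a sketch, not the method used in the thread; linkKeyword() and its parameters are hypothetical names chosen for illustration.

```php
<?php
// Hypothetical helper: wrap $keyword in a link unless it already sits inside anchor text.
function linkKeyword(string $text, string $keyword, string $url): string
{
    // (?![^<]*</a>) rejects a match when the next tag after the keyword is </a>,
    // i.e. when the keyword is inside anchor text that has no nested tags.
    $pattern = '~\b' . preg_quote($keyword, '~') . '\b(?![^<]*</a>)~';

    // A callback avoids any special meaning of $ or \ in the replacement string.
    return preg_replace_callback(
        $pattern,
        function (array $m) use ($url): string {
            return '<a href="' . htmlspecialchars($url) . '">' . $m[0] . '</a>';
        },
        $text
    );
}

$origText = 'One keyword, <em>second keyword</em>, <a href="example">third keyword</a>';
$linked   = linkKeyword($origText, 'keyword', 'linktokeyword');

echo $linked, "\n";
```

Unlike the two-step version, this leaves a link alone even when the keyword appears in it more than once. It still gets confused by tags nested inside a link, and it will happily rewrite a keyword inside an attribute such as alt="keyword", so a DOM parser is the safer route for arbitrary HTML.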


 10:38 am on Aug 4, 2007 (gmt 0)

It's working fine now. I just made a small change in step2 from '\1keyword\2' -> '$1keyword$2' and it did the trick.

Thank you for your continued patience.
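
One plausible reason the switch mattered, assuming the replacement had been typed in double quotes (the thread does not say): in a single-quoted PHP string both \1 and $1 reach preg_replace unchanged and both mean "group 1", but in a double-quoted string "\1" is an octal escape that PHP converts to the byte chr(1) before preg_replace sees it. The sample text and variable names below are made up for illustration.

```php
<?php
$text = '<a href="x">old</a>';

// Single quotes: both spellings arrive intact and both refer to group 1.
$single_backslash = preg_replace('{(<a[^>]*>)old}', '\1new', $text);
$single_dollar    = preg_replace('{(<a[^>]*>)old}', '$1new', $text);

// Double quotes: "\1" becomes chr(1); "$1" is untouched because $1 is not a valid variable name.
$double_backslash = preg_replace('{(<a[^>]*>)old}', "\1new", $text);
$double_dollar    = preg_replace('{(<a[^>]*>)old}', "$1new", $text);

var_dump($single_backslash === $single_dollar);         // bool(true)
var_dump($double_backslash === chr(1) . 'new</a>');     // bool(true): the opening tag is lost
var_dump($double_dollar === '<a href="x">new</a>');     // bool(true)
```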


WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved