Forum Library, Charter, Moderators: coopster & jatar k

PHP Server Side Scripting Forum

    
RegExp - Find non hyperlink words
milanmk




msg:3411241
 8:14 am on Aug 2, 2007 (gmt 0)

I want to find all the non-hyperlinked words in a given set of paragraphs. The words can be inside other tags such as bold or italics, but should not be inside an anchor tag.

I am currently using the following regular expression to find the word "keyword":

[^>]\bkeyword\b[^</a>]

But it still matches the word "keyword" inside <a href="http://example.com">abc keyword xyz</a>.

Any suggestions to improve my RegExp pattern?

Milan

 

milanmk




msg:3412745
 2:46 pm on Aug 3, 2007 (gmt 0)

Anyone...

Milan

RonPK




msg:3412923
 4:52 pm on Aug 3, 2007 (gmt 0)

It's not clear to me what you want: to find all non-hyperlinked words in a string, or to find a particular non-hyperlinked keyword in the string.

borntobeweb




msg:3412929
 4:55 pm on Aug 3, 2007 (gmt 0)

Hi Milan. You could always remove all the hyperlinks first and take the words from whatever's left, so:

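// Remove every hyperlink together with its anchor text (\b keeps <abbr> etc. intact).
$noLinks = preg_replace('{<a\b.*?</a>}is', '', $origText);
// Strip the remaining HTML tags, keeping the text between them.
$noTags = preg_replace('{<.*?>}s', '', $noLinks);

$noTags should then hold everything from the original text minus the hyperlinks and HTML tags. Hope this helps.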

milanmk




msg:3413046
 6:51 pm on Aug 3, 2007 (gmt 0)

I want to replace a particular word with a hyperlink, but only if that word is not already inside anchor text.

The word "keyword" should be replaced with a hyperlink in the following string
"this is a keyword"

but not in any of the following
"this is a <a href="http://example.com">keyword</a> and another <a href="http://example.com">abc keyword def</a>"

This expression [^>]\bkeyword\b[^</a>] is still matching the "keyword" in second anchor text in the above example.

Milan

RonPK




msg:3413130
 8:27 pm on Aug 3, 2007 (gmt 0)

Sorry, beats me.

borntobeweb




msg:3413144
 8:46 pm on Aug 3, 2007 (gmt 0)

Heh, looks like I got it totally wrong. So you want to take a string, e.g.

'One keyword, <em>second keyword</em>, <a href="example">third keyword</a>'

and change it into:

'One <a href="linktokeyword">keyword</a>, <em>second <a href="linktokeyword">keyword</a></em>, <a href="example">third keyword</a>'

Is that it? BTW, the regex part [^</a>] is a character class: it matches any single character that is not <, /, a, or >, not any string that isn't </a>.
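
To make the character-class point concrete, here is a minimal sketch that runs the original pattern against two sample strings. The sample strings and the choice of ~ as delimiter are assumptions made for illustration only.

```php
<?php
// The pattern from the first post; '~' is used as the delimiter so that
// '/' needs no escaping (an arbitrary choice for this sketch).
$pattern = '~[^>]\bkeyword\b[^</a>]~';

// Sample inputs, made up for illustration.
$linked   = '<a href="http://example.com">abc keyword xyz</a>';
$adjacent = '<a href="http://example.com">keyword</a>';

// "keyword" is preceded by a space (which is not '>') and followed by a
// space (which is none of '<', '/', 'a', '>'), so the pattern matches
// even though the word sits inside a link.
var_dump(preg_match($pattern, $linked));   // int(1)

// Here "keyword" is preceded by '>' and followed by '<', so both
// single-character classes reject it and nothing matches.
var_dump(preg_match($pattern, $adjacent)); // int(0)
```

In other words, the two classes only inspect the one character on each side of the word, which is why the pattern cannot tell whether the word is inside an anchor.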

milanmk




msg:3413173
 9:47 pm on Aug 3, 2007 (gmt 0)

Exactly, borntobeweb. Got your point on the [^</a>] pattern.

Could you suggest corrections in the regular expression?

Milan

borntobeweb




msg:3413186
 10:11 pm on Aug 3, 2007 (gmt 0)

I can't think of any single regex that can do that. You can do it the not-so quick but dirty way:

// Replace every occurrence of keyword with a hyperlink.
$step1 = preg_replace('{\bkeyword\b}', '<a href="link">keyword</a>', $origText);

// Remove inner hyperlinks created by step1 above.
$step2 = preg_replace('{(<a[^<]*)<a href="link">keyword</a>([^<]*</a>)}', '\1keyword\2', $step1);

This works as long as each original hyperlink contains no other HTML tags and contains the keyword at most once.
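
For links that do contain other tags, one possible alternative is a single pass with preg_replace_callback that matches whole links and tags first and copies them through unchanged, so only free-standing occurrences of the word get wrapped. This is a rough sketch rather than a tested drop-in: linkKeyword() and its parameter names are hypothetical, it assumes PHP 7 or later, it matches the word case-insensitively, and it assumes reasonably well-formed markup without nested anchors.

```php
<?php
/**
 * Wrap each free-standing occurrence of $word in a link to $url, leaving
 * existing anchors and tag attributes untouched.
 * linkKeyword() is a hypothetical helper written for this sketch.
 */
function linkKeyword(string $html, string $word, string $url): string
{
    $pattern = '~<a\b[^>]*>.*?</a>'   // alternative 1: a whole existing link
             . '|<[^>]*>'             // alternative 2: any other tag
             . '|\b(' . preg_quote($word, '~') . ')\b~is'; // alternative 3: the word

    return preg_replace_callback(
        $pattern,
        function (array $m) use ($url): string {
            // Group 1 is only present when the third alternative matched.
            if (isset($m[1])) {
                return '<a href="' . htmlspecialchars($url) . '">' . $m[1] . '</a>';
            }
            return $m[0]; // existing link or tag: copy through unchanged
        },
        $html
    );
}

// Sample input, made up for illustration.
$in = 'One keyword, <em>second keyword</em>, <a href="example">third <b>keyword</b></a>';
echo linkKeyword($in, 'keyword', 'linktokeyword'), "\n";
```

Because the earlier alternatives consume links and tags before the word alternative gets a chance, the third keyword above stays as it was even though it is wrapped in a <b> tag. Regular expressions are not an HTML parser, so for messy markup a DOM-based approach would likely be more robust.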

[edited by: eelixduppy at 5:25 pm (utc) on Feb. 21, 2008]

milanmk




msg:3413524
 10:38 am on Aug 4, 2007 (gmt 0)

It's working fine now. I just made a small change in step2, from '\1keyword\2' to '$1keyword$2', and it did the trick.

Thank you for your continued patience.

Milan
