
Forum Moderators: coopster & jatar k


Need help with preg_replace

   
2:33 pm on Dec 26, 2008 (gmt 0)

5+ Year Member



I've got articles that contain certain words I'd like to replace with links.

These words are between <p></p> tags. The word is identified by having a space in front of it and after it. I only want to replace the word with the link.

This is what I have done so far (the test word is hello), but it doesn't work:

Code:


$patterns[0] = "/<p>.+\s(hello)\s.+<\/p>/i";
$replacements[0] = "AAA"; // for testing
$pageContent = preg_replace($patterns, $replacements, $pageContent);

Thanks in advance.

6:45 pm on Dec 26, 2008 (gmt 0)

WebmasterWorld Administrator coopster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



I only want to replace the word with the link.

Then you have to capture the rest of your pattern as well, so you can put it back in the replacement. Note the additional parentheses added to the pattern and the use of the backreferences ($1 and $3) in the replacement:

$patterns[0] = "/(<p>.*\b)(hello)(\b.*<\/p>)/is"; 
$replacements[0] = "$1AAA$3"; // for testing

I used word boundaries here rather than space characters as the word may appear at the beginning or end of the element text with no space characters between it and the end. For example ...
<p>hello</p>
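
To make the capture-group idea concrete, here is a minimal sketch that runs the pattern above against a few single-paragraph strings. The sample strings and the variable names ($samples, $results) are made up for illustration; only the pattern and replacement come from the post.

```php
<?php
// Pattern and replacement from the post above.
// Groups 1 and 3 capture the text around the word so it can be put back.
$pattern = '/(<p>.*\b)(hello)(\b.*<\/p>)/is';
$replacement = '$1AAA$3';

// Hypothetical sample paragraphs, not taken from the thread.
$samples = [
    '<p>hello</p>',                 // word touches both tags, no spaces
    '<p>Say hello to everyone</p>', // word surrounded by spaces
    '<p>Othello</p>',               // "hello" inside a longer word
];

$results = [];
foreach ($samples as $html) {
    $results[] = preg_replace($pattern, $replacement, $html);
}

print_r($results);
// The first two paragraphs get AAA in place of the word;
// the third is left alone because \b does not match inside "Othello".
```

Because the leading .* is greedy, a string holding several paragraphs would probably see only its last occurrence replaced, so this sketch feeds in one paragraph at a time.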
8:38 pm on Dec 26, 2008 (gmt 0)

5+ Year Member



Thanks coopster, that got me going in the right direction.

I modified it a bit so that it's ungreedy and won't match words that are already inside HTML tags:


$patterns[0] = "/(<p>[^<]*\b)(hello)(\b[^>]*<\/p>)/isU";

So it will match

<p>Hello there</p>

but not


<p><a href="/">Hello</a> there</p>

or


<p><table class="hello"><tr>...</tr></table> there</p>
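
A quick way to check those three cases is to run preg_match over them. This is only a sketch; the array names are invented, and the sample strings are the ones listed above.

```php
<?php
// Pattern from the post above: [^<]* keeps the head from crossing a tag,
// and the U modifier makes the quantifiers ungreedy.
$pattern = '/(<p>[^<]*\b)(hello)(\b[^>]*<\/p>)/isU';

$samples = [
    '<p>Hello there</p>',
    '<p><a href="/">Hello</a> there</p>',
    '<p><table class="hello"><tr>...</tr></table> there</p>',
];

$matched = [];
foreach ($samples as $html) {
    // preg_match returns 1 on a match and 0 when nothing matches.
    $matched[] = preg_match($pattern, $html) === 1;
}

var_dump($matched);
// Only the first sample matches; in the other two a tag opens
// before the word is reached, so [^<]* cannot get there.
```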

[edited by: eelixduppy at 6:38 am (utc) on Dec. 27, 2008]
[edit reason] disabled smileys [/edit]

8:53 pm on Dec 26, 2008 (gmt 0)

WebmasterWorld Administrator coopster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Yeah, I was thinking about that when I threw the expression down the first time. I was actually going to do that for you, but when you said the word was in paragraph tags I figured that was it, nothing more, so I left it as is. Are you certain you want your second tag marker like that? The regex as it stands, with the end tag marker match of
[^>]*
will not match on something like
<p>Hello there <b>darkage</b>!</p>
I would think you will want to keep the dot metacharacter there. If not, since you are no longer using the dot metacharacter you can drop that "s" modifier.

You shouldn't need that ungreedy modifier. Were you having unexpected results without it?
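
The difference between the two tail markers can be seen side by side in a short sketch. The variable names are hypothetical, and the lazy .*? in the second pattern is an assumption added here so the tail stops at the first closing tag; it is not something either poster wrote.

```php
<?php
$html = '<p>Hello there <b>darkage</b>!</p>';

// Tail written as [^>]*: it cannot get past the ">" of the nested <b> tag.
$strictTail = '/(<p>[^<]*\b)(hello)(\b[^>]*<\/p>)/i';

// Tail written with the dot metacharacter (lazy here by assumption),
// so it can run through nested inline tags up to the closing </p>.
$dotTail = '/(<p>[^<]*\b)(hello)(\b.*?<\/p>)/is';

$strictResult = preg_replace($strictTail, '$1AAA$3', $html);
$dotResult    = preg_replace($dotTail, '$1AAA$3', $html);

echo $strictResult, "\n"; // unchanged, the pattern never matches
echo $dotResult, "\n";    // the word is replaced, the <b> element is kept
```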

9:35 pm on Dec 26, 2008 (gmt 0)

5+ Year Member



The ungreedy modifier is there because I have several paragraphs <p></p> and want to make the "smallest" match, although it might not be needed since I'm negating the '<' and '>'.

I have an issue, though, with negating the '<' and '>'. I can't replace multiple keywords with links within the same paragraph: once the first replacement is done, the paragraph contains <a href="" ... so all other matches on that paragraph fail.

I have to look into the best way to solve it (avoid replacing keywords that have already been replaced and are therefore links, or in other words, avoid links in links - get it? :-).

Any input?

3:26 pm on Dec 27, 2008 (gmt 0)

WebmasterWorld Administrator coopster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Depending on which version of PHP you are running, you may be able to use Recursive patterns [php.net]. They are a bit more advanced but pretty powerful once you get a grip on them. The PHP manual page I linked to there has some information but the perlre [perldoc.perl.org] manual page has much more information.
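
For comparison, here is a rough sketch of a different approach that does not use recursive patterns at all: split the HTML into tags and text, and run the replacement only on text that is not inside an existing link. The function name linkKeywords and the $links map (lowercase keyword to URL) are hypothetical, and the tag splitting is naive, so attribute values containing ">" would trip it up. It assumes PHP 7 or later for the type declarations.

```php
<?php
// Hypothetical helper, not from the thread: links keywords only in text
// that sits outside existing <a> elements. $links maps lowercase word => URL.
function linkKeywords(string $html, array $links): string
{
    if (!$links) {
        return $html;
    }

    // Split into tags and text chunks, keeping the tags in the result.
    $parts = preg_split('/(<[^>]+>)/', $html, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    // One alternation so all keywords are handled in a single pass per chunk.
    $words = array_map(function ($w) {
        return preg_quote($w, '/');
    }, array_keys($links));
    $pattern = '/\b(' . implode('|', $words) . ')\b/i';

    $insideAnchor = false;
    foreach ($parts as $i => $part) {
        if (preg_match('/^<[^>]+>$/', $part)) {
            // Track whether we are between <a ...> and </a>.
            if (preg_match('/^<a\b/i', $part)) {
                $insideAnchor = true;
            } elseif (preg_match('/^<\/a\s*>/i', $part)) {
                $insideAnchor = false;
            }
            continue; // never touch the tags themselves
        }
        if ($insideAnchor) {
            continue; // text of an existing link stays as it is
        }
        $parts[$i] = preg_replace_callback($pattern, function ($m) use ($links) {
            $url = $links[strtolower($m[1])];
            return '<a href="' . htmlspecialchars($url) . '">' . $m[1] . '</a>';
        }, $part);
    }

    return implode('', $parts);
}

$out = linkKeywords(
    '<p>Hello world, <a href="/x">hello again</a></p>',
    ['hello' => '/hello', 'world' => '/world']
);
echo $out, "\n";
```

Since every keyword is matched in one pass over the original text chunks, freshly inserted links are never scanned again, which sidesteps the links-in-links problem. A DOM parser such as PHP's DOMDocument would likely be sturdier for messy markup.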