Question about conditional text replacement

Forum Moderators: coopster


Looking for a method to use

Aleister

3:02 am on Nov 11, 2006 (gmt 0)

Here is my situation - I have a string which contains some HTML. I want to replace text A with text B. The only problem is, I only want the replacement to occur in instances where A is not part of a URL or image link.

For example, here is a string I would perform this replacement on:

This is a test! <a href="test.com">test</a> <img src="test.jpg" alt="test" />

In that example, the word 'test' is all over the place. I would only want to replace the first instance of it though, because the rest are all in either a link or image tag.

I was thinking about ereg_replace, but could not get it sorted out properly. Any thoughts? :)

coopster

1:55 pm on Nov 12, 2006 (gmt 0)

A regular expression is the likely tool to use. I prefer the Regular Expression Functions (Perl-Compatible) [php.net] engine over the Regular Expression Functions (POSIX Extended) [php.net] engine though. preg_replace [php.net] should do the trick. What have you got so far for your pattern?

Aleister

8:47 pm on Nov 12, 2006 (gmt 0)

I really do not have anything that comes close to doing this yet :) I know how to search for text within a certain tag, but I do not know how to search for text that is not in a set number of tags.

I have limited experience with regular expressions in general. While I do use them for basic tasks, I have not been able to find any examples of anything similar to this.

Edit: What if I did something like this:

1) Find instances of A in links and images and rename them to something else - Z

2) Perform the normal search and replace for A to B

3) Afterwards, change Z back to A
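The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `replace_outside_links` and the placeholder string are hypothetical, and the placeholder is assumed never to appear in the input or to contain A.

```php
<?php
// Hypothetical helper; the name and the placeholder value are illustrative only.
function replace_outside_links(string $html, string $a, string $b): string
{
    // Z: a placeholder assumed never to occur in the input.
    $z = "\x00PLACEHOLDER\x00";

    // Step 1: inside <a>...</a> elements and <img> tags, rename A to Z.
    $html = preg_replace_callback(
        '~<a\b[^>]*>.*?</a>|<img\b[^>]*>~is',
        function (array $m) use ($a, $z): string {
            return str_replace($a, $z, $m[0]);
        },
        $html
    );

    // Step 2: normal search and replace of A with B.
    $html = str_replace($a, $b, $html);

    // Step 3: change Z back to A.
    return str_replace($z, $a, $html);
}

$in = 'This is a test! <a href="test.com">test</a> <img src="test.jpg" alt="test" />';
echo replace_outside_links($in, 'test', 'demo'), "\n";
// prints: This is a demo! <a href="test.com">test</a> <img src="test.jpg" alt="test" />
```

A regex like this does not parse HTML, so unusual markup (for example a `>` inside an attribute value) could still trip it up.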

[edited by: Aleister at 9:05 pm (utc) on Nov. 12, 2006]

pixeltierra

9:27 pm on Nov 12, 2006 (gmt 0)

I agree with coopster: use preg_replace, as you can limit the number of replacements performed.

And you don't even have to learn regex, since your text is always just a string.

Do yourself a favor, though, and learn regex. It's painful at first, but liberating for the rest of your life.
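A minimal sketch of the limit idea, assuming the search text is a plain string: `preg_quote` escapes it so no hand-written pattern is needed, and the fourth argument of `preg_replace` caps the number of replacements. The variable names are illustrative.

```php
<?php
$content = 'This is a test! <a href="test.com">test</a>';
$a = 'test';
$b = 'demo';

// preg_quote() escapes regex metacharacters in $a, so a plain string
// can safely be used as the pattern.
$pattern = '/' . preg_quote($a, '/') . '/';

// The fourth argument limits the number of replacements; 1 replaces
// only the first match.
$result = preg_replace($pattern, $b, $content, 1);

echo $result, "\n";
// prints: This is a demo! <a href="test.com">test</a>
```

Note that a limit only helps when the first occurrence happens to be the one outside the tags; it does not by itself tell link text apart from ordinary text.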

Aleister

10:06 pm on Nov 12, 2006 (gmt 0)

I was able to do what I needed using this method:

function protect_text($content, $a) {
return stripslashes(str_replace($a, "TEMPTEXT", $content));
}
// protect text between < and >
$content = preg_replace("^(<)\n?([\S\s]*?)\n?(>)^ie", "'<' . protect_text('$2', $a) . '>'", $content);
// protect text in hrefs
$content = preg_replace("^(<a)\n?([\S\s]*?)\n?(</a>)^ie", "'<a' . protect_text('$2', $a) . '</a>'", $content);
// replace A with replacement text B
$content = str_replace($a, $b, $content);
// set TEMPTEXT back to A
$content = str_replace('TEMPTEXT', $a, $content);

(It has been simplified a bit, to make it easier to read)

This is very far from optimal, and I am sure there are many cases that this will not handle very well, such as newlines in the href or img tag, but it works somewhat :)

If someone has a better way to do this, I would love to hear it :) I found many sites where people were asking about this, but I did not ever see anything aside from this type of solution.
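For readers on current PHP: the `e` modifier used above was deprecated in PHP 5.5 and removed in PHP 7.0, so that code no longer runs as written. One possible alternative, offered only as a sketch and not as the method from this thread, is to split the string on tags with `preg_split` and replace only in the pieces between them. The function name `replace_in_text_only` is hypothetical.

```php
<?php
// Hypothetical helper: replaces $a with $b only in text that lies outside
// tags and outside <a>...</a> elements.
function replace_in_text_only(string $content, string $a, string $b): string
{
    // Delimiters are whole <a>...</a> elements or any single tag.
    // The s flag lets . match newlines, and [^>]* already spans newlines.
    $parts = preg_split(
        '~(<a\b[^>]*>.*?</a>|<[^>]*>)~is',
        $content,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );

    // With a single capture group, even indexes hold plain text and
    // odd indexes hold the captured delimiters.
    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            $parts[$i] = str_replace($a, $b, $part);
        }
    }

    return implode('', $parts);
}

$in = 'This is a test! <a href="test.com">test</a> <img src="test.jpg" alt="test" />';
echo replace_in_text_only($in, 'test', 'demo'), "\n";
// prints: This is a demo! <a href="test.com">test</a> <img src="test.jpg" alt="test" />
```

Because nothing is renamed temporarily, this variant needs no placeholder such as TEMPTEXT, and it also works on a fragment that contains no tags at all.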

[edited by: Aleister at 10:07 pm (utc) on Nov. 12, 2006]

mcibor

10:48 pm on Nov 12, 2006 (gmt 0)

I tested this, and it seems to work:

/(>[^<]*?)(test)/

It should check for all appearances of test between > and < (outside tags)

<p>This is a test! <a href="test.com">test</a> <img src="test.jpg" alt="test" /></p>

Hope this helps you.
I'm not sure how it will work with preg_replace. I tested it with preg_match and it finds the first test (in $result[2]). I don't know how to change it to find just test.
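As a rough sketch of how the pattern might be used with `preg_replace`: the first group can be written back with `$1` so that only the second group (the word itself) is swapped. The replacement word `demo` is just an example.

```php
<?php
$html = '<p>This is a test! <a href="test.com">test</a> <img src="test.jpg" alt="test" /></p>';

// $1 restores the ">" and the text before the word; only "test" is replaced.
$result = preg_replace('/(>[^<]*?)(test)/', '$1demo', $html);

echo $result, "\n";
// prints: <p>This is a demo! <a href="test.com">demo</a> <img src="test.jpg" alt="test" /></p>
```

The attribute values inside the tags are left alone, but the link text is replaced as well, and each match needs its own preceding `>`, so only the first occurrence in each run of text is changed.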

Michal

Aleister

11:06 pm on Nov 12, 2006 (gmt 0)

mcibor: Thanks, but I am not sure that would work for what I need. Wouldn't it also catch link text, since that is between > and < as well?

<a href="test">this text here</a>

And if the string I am processing does not start/end with a tag, it would not catch anything at all:

sample text

[edited by: Aleister at 11:07 pm (utc) on Nov. 12, 2006]

mcibor

1:08 pm on Nov 13, 2006 (gmt 0)

In HTML everything starts and ends with a tag - everything is between <html> and </html>

I thought you wanted to change all text, including text appearing in links.

You would have to add a "not followed by </a>" clause.
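One way such a clause might look, as a sketch only, is a negative lookahead that rejects a match when the rest of the same text run ends in `</a>`. It assumes lowercase `</a>` and link text without nested tags.

```php
<?php
$html = '<p>This is a test! <a href="test.com">test</a> <img src="test.jpg" alt="test" /></p>';

// (?![^<]*</a>) fails the match if the text up to the next "<" is
// immediately followed by a closing </a>, i.e. the word is link text.
$pattern = '~(>[^<]*?)(test)(?![^<]*</a>)~';

$result = preg_replace($pattern, '$1demo', $html);

echo $result, "\n";
// prints: <p>This is a demo! <a href="test.com">test</a> <img src="test.jpg" alt="test" /></p>
```

Link text wrapped in further tags, such as `<a href="x"><b>test</b></a>`, would still slip through, which is part of why this is hard to get fully right with a single pattern.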

Hard thing to do, really.

Sorry for misunderstanding.
Michal

Aleister

12:24 am on Nov 14, 2006 (gmt 0)

mcibor: You did not misunderstand, I was just being overly picky ;)

The supplied content is actually just part of an HTML page, so it would not necessarily have the tags, but that is fine - and for what this is needed for, it is most likely the best option.

In theory I could also just add a greater-than character to the beginning of the string and a less-than character to the end, and remove them afterwards. :)
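A small sketch of that wrapping idea, assuming the `/(>[^<]*?)(test)/` pattern from earlier in the thread: a `>` is placed in front and a `<` behind, so a tagless fragment looks like text between two tags, and both characters are stripped again afterwards.

```php
<?php
$fragment = 'sample test';

// Wrap the fragment so it appears to sit between two tags.
$wrapped = '>' . $fragment . '<';

// Apply the replacement, keeping the leading part via $1.
$replaced = preg_replace('/(>[^<]*?)(test)/', '$1demo', $wrapped);

// Drop the first and last character that were added above.
$result = substr($replaced, 1, -1);

echo $result, "\n";
// prints: sample demo
```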

A 100% solution would require too many 'what ifs' and would be a regex nightmare anyway :)

[edited by: Aleister at 12:25 am (utc) on Nov. 14, 2006]