RegExp: add <i> </i> around " "

sarka86

7:44 am on May 1, 2008 (gmt 0)

Good morning!

I have a string that contains one or more quotations between " and ". I'd like to wrap each quotation in italic tags, but I have no idea how to do that.

For example:
$str = "and he said: \"I am your father\"";
magic_function($str) outputs "and he said: <i>\"I am your father\"</i>"

Could someone be so kind as to help me?

Thanks a lot!

PHP_Chimp

11:18 am on May 1, 2008 (gmt 0)

Although this won't cope with nested quotes, it will put the italic tags around double-quoted strings.


function italic_quotes($string) {
    $pattern = '%(".+?")%s';
    $replacement = '<i>$1</i>';
    $string = preg_replace($pattern, $replacement, $string);
    return $string;
}

For the sort of simple string you have this should work OK.

[edited by: PHP_Chimp at 11:19 am (utc) on May 1, 2008]

sarka86

12:27 pm on May 1, 2008 (gmt 0)

Wow! Thanks a lot.
Could you also be so kind as to explain how it works?
I've read the docs for preg_replace, but I don't get, for example, the meaning of $1.

PHP_Chimp

12:55 pm on May 1, 2008 (gmt 0)

OK, a basic explanation...I apologize if you know most of it already ;)

$pattern = '%(".+?")%s'; ... should actually be $pattern = '%("[^"]+?")%';

The parentheses capture a ", followed by one or more characters that are not a " (the +), followed by the closing ". The ? after the + makes the match lazy, which means that you will get the shortest matching pattern.
So if you have -
he said "hello" then "bye"
You want to match "hello" and "bye" separately, not the whole of "hello" then "bye". So you need the shortest matching pattern.
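
To make the shortest-match point concrete, here is a small sketch that runs three variants of the pattern against the example sentence. The variable names are only illustrative and are not from the posts above.

```php
<?php
$text = 'he said "hello" then "bye"';

// Greedy: .+ grabs as much as possible, so one match spans both quotations.
preg_match_all('%".+"%s', $text, $greedy);

// Lazy: .+? stops at the first closing quote, giving two separate matches.
preg_match_all('%".+?"%s', $text, $lazy);

// Negated class: [^"]+ cannot cross a quote, so it needs no lazy modifier.
preg_match_all('%"[^"]+"%', $text, $negated);

print_r($greedy[0]);  // one element: "hello" then "bye"
print_r($lazy[0]);    // two elements: "hello" and "bye"
print_r($negated[0]); // the same two elements as the lazy version
```

With this input the lazy and negated-class versions find the same matches; only the greedy one runs the two quotations together.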

$replacement = '<i>$1</i>';

The captured patterns are stored so you can refer to them. You can use either \1 or $1 for the first pattern, \2 or $2 for the second, and so on for up to 99 patterns. The $ version is the suggested one, as there is then no confusion with the other escape sequences that all start with a \. It is also why you can get away with using ' around the replacement: it is the regex engine that substitutes the $1, not PHP's string engine.
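
As a quick illustration of the reference forms, the sketch below applies the same pattern with $1, \1 and the braced ${1} form, which preg_replace also accepts when a literal digit has to follow the reference. The variable names are made up for this example.

```php
<?php
$text = 'he said "hello"';

// $1 and \1 both refer to the first captured group.
$dollar    = preg_replace('%("[^"]+")%', '<i>$1</i>', $text);
$backslash = preg_replace('%("[^"]+")%', '<i>\1</i>', $text);

// ${1} delimits the group number when a literal digit comes right after it.
$braced = preg_replace('%("[^"]+")%', '${1}1', $text);

echo $dollar, "\n";    // he said <i>"hello"</i>
echo $backslash, "\n"; // he said <i>"hello"</i>
echo $braced, "\n";    // he said "hello"1
```

All three replacement strings are single-quoted, so PHP passes them to the regex engine untouched.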

<edit>
The reason for the change in $pattern is that . matches any character, including ", so the original relies entirely on the lazy ? to stop at the first closing quote. The negated class [^"] can never run past a quote, which makes the second version the safer choice with multiple quotations. It will still not work with nested quotes, although nested quotes should be single quotation marks, not double, if we are getting into grammar.

An improvement to this function would be to use curly quotes. So the below will turn the start and end quotes into nice curly ones (I am assuming that you are writing in English, but you could change the code to suit any language).


$pattern = '%"([^"]+?)"%';
$replacement = '&#8220;<i>$1</i>&#8221;';
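
For reference, a minimal sketch of the curly-quote version wrapped into a function. The name curly_italic_quotes is hypothetical, and the redundant ? after the negated class is dropped.

```php
<?php
// Hypothetical helper name; the pattern and replacement follow the lines above.
function curly_italic_quotes($string) {
    // Capture only the text between the straight quotes, then re-wrap it
    // in curly-quote entities and italic tags.
    return preg_replace('%"([^"]+)"%', '&#8220;<i>$1</i>&#8221;', $string);
}

echo curly_italic_quotes('he said "hello" then "bye"'), "\n";
// he said &#8220;<i>hello</i>&#8221; then &#8220;<i>bye</i>&#8221;
```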

[edited by: PHP_Chimp at 1:07 pm (utc) on May 1, 2008]

sarka86

1:30 pm on May 1, 2008 (gmt 0)

Hello again!

Your explanation is very interesting. The trick with ? is great.
I have another question:
Can I also tell it to match only text that is not enclosed in certain other characters?
Sticking with my example, let's assume there are formatting tags:

$string = "<div style=\"border:0px;\">He said \"Hello!\"</div>";

If I apply the function now, I get:

"<div style=<i>\"border:0px;\"</i>>He said <i>\"Hello!\"</i></div>";

Can I skip the occurrences that are inside < >?

Thanks!

PHP_Chimp

3:37 pm on May 1, 2008 (gmt 0)

The answer is yes. However, your regex-fu is going to get a workout if you want to do it properly.

The full solution would involve lookaheads; however, there is a poorer solution below that is a lot easier.

The poor solution would be to check for <.+?>, as this should match a tag.


$pattern = '%(?:<.+?>)?"([^"]+?)"(?:<.+?>)?%';
$replacement = '&#8220;<i>$1</i>&#8221;';

may work for you, but it isn't very good.

[edited by: eelixduppy at 4:25 pm (utc) on May 1, 2008]
[edit reason] disabled smileys [/edit]

sarka86

4:49 pm on May 1, 2008 (gmt 0)

Thank you again for your help. I thought it would be difficult.
Unfortunately, this second solution doesn't seem to work. For example, with the string:

print preg_replace($pattern, $replacement, "<div style=\"value\">text \"quote\"</div>");

I still get the wrong output.
But I get your idea: transform only the text between " that sits between <*> and <*>.

g1smd

6:42 pm on May 1, 2008 (gmt 0)

That should be between <*> and </*>, I think.

sarka86

8:11 pm on May 1, 2008 (gmt 0)

Hello and welcome to this thrilling discussion :)

Isn't </*> a subset of <*>? And what happens with tags like <br />?
I guess <*> should work better.
Anyway, it doesn't work.
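
One reason the tag-aware pattern still fails is that both tag groups in it are optional, so it can match "value" inside the opening tag anyway. A possible alternative, sketched here and not taken from the thread, is to let the pattern consume whole tags as one alternative and quotations as the other, then hand tags back unchanged through preg_replace_callback. The function name italic_quotes_outside_tags is hypothetical, and the sketch assumes no attribute value contains a > character; it is not a substitute for a real HTML parser.

```php
<?php
// Hypothetical helper: italicise quotations in text, leaving tags untouched.
function italic_quotes_outside_tags($html) {
    // First alternative consumes a whole tag; second captures a quotation.
    $pattern = '%<[^>]*>|"([^"]+)"%';
    return preg_replace_callback($pattern, function ($m) {
        // Group 1 is only present when the quotation alternative matched.
        if (isset($m[1])) {
            return '&#8220;<i>' . $m[1] . '</i>&#8221;';
        }
        return $m[0]; // a tag: return it unchanged
    }, $html);
}

echo italic_quotes_outside_tags('<div style="value">text "quote"</div>'), "\n";
// <div style="value">text &#8220;<i>quote</i>&#8221;</div>
```

Because a tag is matched as a single unit, any quotes inside it are never seen by the second alternative, and self-closing tags such as <br /> are handled the same way as any other tag.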