Forum Moderators: coopster

RegEx replace inside/outside of captured match

         

scott182

1:06 am on Nov 9, 2005 (gmt 0)

10+ Year Member



This is really more of a general regular expressions question, but I'm working on it in the context of PHP.

I am starting with a string of words that I want to split on spaces. The string may contain one or more quoted phrases with spaces inside them, and those phrases shouldn't be split. Instead, the spaces inside the quotes should be changed to underscores before the whole string is split on spaces.

For example, the string $category_string may look like:


first second third "fourth cat" fifth sixth "seventh cat item"

Then $category_string should be split into $category_array:


$category_array = explode(' ', $category_string);

I know how to do this using a preg_replace() statement if there were only two words inside the quoted string, but if there are more than two, it becomes a problem.

On a similar note, I am trying to convert some HTML to LaTeX, and need to replace some special characters, such as &nbsp;, but only _outside_ of any <pre> </pre> tags (this becomes 'verbatim' in LaTeX, and some characters don't need to be replaced).

Thanks for any assistance.
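
For the second question (replacing entities only outside <pre> blocks), one possible approach is to split the HTML on the <pre> blocks while keeping them in the result, then run the replacement only on the pieces in between. A rough sketch, assuming PHP 7 or later and no nested <pre> tags; replace_outside_pre() and the '&nbsp;' => '~' mapping are illustrative names and values, not anything from this thread:

```php
<?php
// Hypothetical helper: apply $map (search => replacement) only outside <pre>...</pre>.
function replace_outside_pre(string $html, array $map): string
{
    // PREG_SPLIT_DELIM_CAPTURE keeps the captured <pre> blocks in the result,
    // so even indexes hold the text between blocks and odd indexes hold the blocks.
    $parts = preg_split('#(<pre\b[^>]*>.*?</pre>)#is', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    if ($parts === false) {
        return $html; // regex engine error: leave the input untouched
    }

    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            // Outside <pre>: do the literal replacements.
            $parts[$i] = strtr($part, $map);
        }
    }

    return implode('', $parts);
}

$html = 'a&nbsp;b <pre>x&nbsp;y</pre> c&nbsp;d';
echo replace_outside_pre($html, ['&nbsp;' => '~']), "\n";
```

Here the &nbsp; inside the <pre> block is left alone and the other two become ~ (the LaTeX non-breaking space). Turning <pre> itself into a verbatim environment would be a separate step.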

jatar_k

9:52 pm on Nov 9, 2005 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



I don't really have the answer to your regex questions but I, as always, have a question.

What generates $category_string?

Is this something you have no control over? This seems like a scenario of programming for all eventualities, and that is never fun.

Is there no way of normalizing this string when it is generated?

coopster

10:04 pm on Nov 9, 2005 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



I would just take *two* whacks at it, using the PCRE /e modifier [php.net] to evaluate the replacement string as PHP code.
$string = 'first second third "fourth cat" fifth sixth "seventh cat item"';
print "$string\n";
$string = preg_replace('/"([^"]+)"/e', "preg_replace('/\s+/', '_', '$1')", $string);
print "$string\n";
$array = preg_split("/[\s]+/", $string, -1, PREG_SPLIT_NO_EMPTY);
print_r($array);
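
Note for anyone reading this on a current PHP version: the /e modifier was deprecated in PHP 5.5 and removed in PHP 7.0, so the snippet above will not run there. A minimal sketch of the same two-pass idea using preg_replace_callback() follows; the closure is a substitution for the /e evaluation, not part of the original answer:

```php
<?php
// Sketch: same two-pass approach, with a callback instead of the /e modifier.
$string = 'first second third "fourth cat" fifth sixth "seventh cat item"';

// Pass 1: inside each quoted phrase, turn runs of whitespace into underscores.
// Only capture group 1 is returned, so the surrounding quotes are dropped.
$string = preg_replace_callback(
    '/"([^"]+)"/',
    function (array $m) {
        return preg_replace('/\s+/', '_', $m[1]);
    },
    $string
);

// Pass 2: split on runs of whitespace, discarding empty pieces.
$array = preg_split('/\s+/', $string, -1, PREG_SPLIT_NO_EMPTY);

print_r($array);
```

The resulting array should hold seven items, with fourth_cat and seventh_cat_item each kept as a single entry.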

scott182

1:15 am on Nov 10, 2005 (gmt 0)

10+ Year Member



Thanks coopster. This works perfectly. This is how I planned to do it at first, but I didn't know about the /e. Regular expressions are fun to work with, but there's just way too much to know.

jatar_k, the string comes from a text input box in an HTML form, so I don't have any control over it, aside from putting a "separate with spaces" note by the text box. That's why I wanted a way to handle multiple-word categories. However, I really don't need to program for all eventualities. I'm creating a to-do/notes list similar to del.icio.us for use by only a few people in our lab, so I can just yell at them if they try to do anything crazy!

jatar_k

3:40 am on Nov 10, 2005 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



>> so I can just yell at them if they try to do anything crazy

that's the best way ;)