$str = preg_replace('#\s{2,}#', ' ', $str);
$str = preg_replace('#\s?([=),;+])\s?#', '$1', $str);

// using /x here for readability
$str = preg_replace('#
\s?([=),;+])\s? |
(\s){2,}
#x', '$1$2', $str);

\s?([=),;+])\s?
Don't you mean \s*([=),;+])\s* to account for punctuation preceded or followed by multiple spaces?
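To make the difference between \s? and \s* concrete, here is a minimal sketch run on a made-up input string (the input is an assumption, not from the thread). It only shows what each pattern does on its own; if runs of whitespace have already been collapsed by an earlier step, \s? may well be enough.

```php
<?php
// Hypothetical input: two spaces on each side of '='.
$input = 'a  =  b';

// \s? consumes at most ONE whitespace character on each side.
$optional = preg_replace('#\s?([=),;+])\s?#', '$1', $input);

// \s* consumes ALL whitespace on each side.
$greedy = preg_replace('#\s*([=),;+])\s*#', '$1', $input);

var_dump($optional); // 'a = b' (one space left on each side)
var_dump($greedy);   // 'a=b'
```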
$str = preg_replace('#
\s*
(
[=),;+] |
\s
)+
\s*
#x', '$1', $str);

[=),;+]|\s
Do they have to be separate at all? \s*([=),;+\s])\s*
It does create a tiny bit of a hiccup if the string is made up entirely of spaces: RegEx picks up all of them and then has to take one step back for “Oh, oops, I was supposed to leave room for the last space”. But you do end up capturing either the punctuation or a single space, which is what you want to end up with.
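A small sketch of the merged character class, wrapped in a helper so it can be tried on a couple of inputs. The function name collapse() and the sample strings are made up for illustration.

```php
<?php
// Hypothetical helper around the single-pattern suggestion.
function collapse(string $str): string
{
    // Capture ONE punctuation or whitespace character and drop the
    // whitespace on either side of it.
    return preg_replace('#\s*([=),;+\s])\s*#', '$1', $str);
}

// Whitespace around punctuation disappears entirely.
$a = collapse('x  =  1 ;   y');   // 'x=1;y'

// A run of spaces: \s* first grabs all three, the class then fails on 'b',
// and the engine backs off one space so the class can capture it.
$b = collapse('a   b');           // 'a b'

var_dump($a, $b);
```

Note that the captured character is whatever whitespace sits last in the run, so a lone tab or newline would be kept as-is rather than turned into a plain space.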
Pattern can contain close-parenthesis but not open-parenthesis? Or is that just an artifact of posting?
wouldn't it unnecessarily replace every whitespace with a whitespace?
I guess. End result the same, but a smidgen of extra work. If there's an enormous number of them, you could do some benchmark testing to see if there's any meaningful difference.
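One way such a benchmark might look, as a rough sketch only. The test data, the round count, and the use of microtime() are all assumptions; real timings depend on the input and the PHP build, so no numbers are claimed here.

```php
<?php
// Hypothetical test data: the same sloppy line repeated many times.
$input  = str_repeat("a  =  b ,   c   d\n", 1000);
$rounds = 200;

// Variant A: collapse whitespace runs first, then strip around punctuation.
$start = microtime(true);
for ($i = 0; $i < $rounds; $i++) {
    $twoStep = preg_replace('#\s{2,}#', ' ', $input);
    $twoStep = preg_replace('#\s?([=),;+])\s?#', '$1', $twoStep);
}
$timeA = microtime(true) - $start;

// Variant B: one pattern with \s folded into the character class.
$start = microtime(true);
for ($i = 0; $i < $rounds; $i++) {
    $merged = preg_replace('#\s*([=),;+\s])\s*#', '$1', $input);
}
$timeB = microtime(true) - $start;

printf("two-step: %.4fs, merged: %.4fs\n", $timeA, $timeB);
```

On this particular input both variants produce the same string, which is worth checking before comparing the timings.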
// convert comments, line breaks, and tabs to a single whitespace
$str = preg_replace('#/\*[^*]*\*+([^/][^*]*\*+)*/|\r\n|\r|\n|\t#', ' ', $str);
// remove repeating whitespace
$str = preg_replace('#\s{2,}#', ' ', $str);
// remove leading or trailing whitespace, or whitespace
// around certain characters
$str = preg_replace('#^\s|\s$|\s?([=),;+])\s?#', '$1', $str);
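For anyone wanting to try the three steps together, here is a sketch that wraps them in a function and runs them on a made-up snippet. The name minify() and the sample input are assumptions for illustration; the patterns are the ones posted above.

```php
<?php
// Hypothetical wrapper around the three steps.
function minify(string $str): string
{
    // convert comments, line breaks, and tabs to a single whitespace
    $str = preg_replace('#/\*[^*]*\*+([^/][^*]*\*+)*/|\r\n|\r|\n|\t#', ' ', $str);
    // remove repeating whitespace
    $str = preg_replace('#\s{2,}#', ' ', $str);
    // remove leading/trailing whitespace and whitespace around = ) , ; +
    $str = preg_replace('#^\s|\s$|\s?([=),;+])\s?#', '$1', $str);
    return $str;
}

// Made-up input with comments, a tab, and stray spaces.
$sample = "/* set up */\nvar a = 1 ;\n\tvar b = a + 2 ; /* done **/\n";

var_dump(minify($sample)); // 'var a=1;var b=a+2;'
```

The order matters: the last step only looks at one whitespace character on each side, so it relies on the second step having already collapsed longer runs.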
((/\*+[^*]*\*+/(\s*/\*+[^*]*\*+/)*))
(wow! those asterisks make it look confusing, don't they?)
[\r\n\t]+
doesn't "\s" include [\n\r\t], as well as ' '?
Yes, that's what I meant by “non-space spaces” :) If you wanted to pick out all space characters except the literal " " you'd have to do something like [^\S ] meaning “neither a non-space nor a literal %20 space”.
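A quick sketch of how [^\S ] differs from plain \s. The input string is made up; each match is replaced with an underscore so the result is easy to read.

```php
<?php
// Hypothetical input: a literal space, a tab, a newline, and a CRLF pair.
$input = "a b\tc\nd\r\ne";

// \s matches every whitespace character, the literal space included.
$all = preg_replace('#\s#', '_', $input);

// [^\S ] matches whitespace EXCEPT the literal space.
$marked = preg_replace('#[^\S ]#', '_', $input);

var_dump($all);    // 'a_b_c_d__e'
var_dump($marked); // 'a b_c_d__e'
```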
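// using /x for readability
// (the replacement ' ' is assumed here; it was not given in the post,
// but it matches the earlier comment-stripping step)
$str = preg_replace('#
(
(
/\*+
[^*]*
\*+
/
(
\s*
/
\*+
[^*]*
\*+
/
)*
)
)#x', ' ', $str);

$str = preg_replace('#
/\*
[^*]*
\*+
(
[^/]
[^*]*
\*+
)*
/#x', ' ', $str);

// First option, 0.95533585548401s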
$str = preg_replace('#^ +| +$| *([=),;+ ])+ *#', '$1', $str);
$str = preg_replace('#/\*+[^*]*\*+([^/][^*]*\*+)*/|\s+#', ' ', $str);
// Second option, 0.62886810302734s
$str = preg_replace('#^ +| +$| *([=),;+ ])+ *#', '$1',
preg_replace('#/\*+[^*]*\*+([^/][^*]*\*+)*/|\s+#', ' ', $str)
);

It was all those slashes that prompted me to experiment, since the italics in [ code ] made it look as if there are three kinds of slash.
Fun fact: I actually have no idea what # means in those preg_replace expressions :)
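For what it's worth, the # is just the pattern delimiter. PHP's preg functions expect the pattern to be wrapped in a pair of delimiters (any character that is not alphanumeric, a backslash, or whitespace), with modifiers such as x placed after the closing one. A minimal sketch follows; the sample strings and the simplified comment pattern are made up for illustration and are not the patterns from this thread.

```php
<?php
$str = 'a   b';

// The same pattern with three different delimiters; all behave identically.
$hash  = preg_replace('#\s{2,}#', ' ', $str);
$slash = preg_replace('/\s{2,}/', ' ', $str);
$tilde = preg_replace('~\s{2,}~', ' ', $str);

// Why # is handy here: with / as the delimiter, every literal slash in the
// pattern has to be escaped. Simplified comment matcher, for illustration.
$withHash  = preg_replace('#/\*.*?\*/#s', '', 'x/* c */y');
$withSlash = preg_replace('/\/\*.*?\*\//s', '', 'x/* c */y');

var_dump($hash, $slash, $tilde);  // 'a b' three times
var_dump($withHash, $withSlash);  // 'xy' twice
```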