Forum Library, Charter, Moderators: coopster & jatar k

PHP Server Side Scripting Forum

u modifier
What does it do exactly?
Patrick Taylor

 10:01 am on Apr 7, 2014 (gmt 0)

My function uses negative matching to allow only specified characters:

function allowedChars($str) {

    $charArray = array(
        /* Symbols etc */
        '-', '_', '\,', '\.', ' ',
        /* Numbers */
        '1', '2', '3', '4', '5', '6', '7', '8', '9', '0',
        /* English */
        'A', /* etc */
        /* Latin */
        /* Greek */
    );

    $allowed = implode("", $charArray);
    // Strip everything except $allowed
    // u -> UTF-8 validity of the pattern is checked
    $str = preg_replace('/[^' . $allowed . ']+/u', '', $str);

    return $str;
}

It works, but only after I added the u modifier to the regex. Previously, some 'non-allowed' characters came out as the invalid-character symbol instead of being stripped out altogether. I have read that the modifier checks for UTF-8, but why does the regex not work properly without it?



 3:49 pm on Apr 7, 2014 (gmt 0)

Does your permitted list include actual Greek characters? You would expect to get issues if you're looking for UTF-8 (i.e. multi-byte) characters and you haven't told the regex engine to treat the pattern and subject as UTF-8.
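To make the multi-byte point concrete, here is a minimal sketch of what can go wrong. The helper names and the two-character allowed list are made up for illustration and are not the original poster's code. Without the u modifier PCRE compares single bytes, so an allowed "Ώ" (bytes CE 8F) also whitelists the byte 8F on its own. A disallowed "я" (bytes D1 8F) then loses only its first byte, and the leftover 8F is what shows up as the invalid-character symbol.

```php
<?php
// Hypothetical helpers, for illustration only.
function stripWithoutU($allowed, $subject) {
    // No u modifier: the character class is a set of single bytes.
    return preg_replace('/[^' . $allowed . ']+/', '', $subject);
}

function stripWithU($allowed, $subject) {
    // u modifier: pattern and subject are read as UTF-8 characters.
    return preg_replace('/[^' . $allowed . ']+/u', '', $subject);
}

// Byte escapes are used so the example does not depend on the file's encoding.
$allowed = "a\xCE\x8F"; // "a" and Greek "Ώ" (U+038F, bytes CE 8F)
$subject = "a\xD1\x8F"; // "a" and Russian "я" (U+044F, bytes D1 8F)

echo bin2hex(stripWithoutU($allowed, $subject)), "\n"; // 618f: stray byte 8F survives
echo bin2hex(stripWithU($allowed, $subject)), "\n";    // 61: "я" removed as a whole
```

Whether a given character is mangled or removed cleanly without the modifier depends on which bytes happen to be shared with the allowed list, which would explain why only some characters misbehaved.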

Tangential: Does the implode/explode/array arrangement work more speedily than a direct
"replace [^list-of-approved-characters-here] with ''"?

:: detour to look up whether php has a \P{ASCII} or equivalent notation ::
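On the property-notation detour: PCRE does offer \p{...} classes, and with the u modifier they can stand in for long hand-typed lists. The sketch below is an alternative technique, not the original poster's approach. It assumes PCRE was built with Unicode property support, and the function name and the particular scripts chosen are hypothetical. Note that a script class such as \p{Cyrillic} admits every letter of that script, which is broader than a hand-picked list.

```php
<?php
// Hypothetical helper: keep Latin, Greek and Cyrillic letters, ASCII digits,
// space, underscore, comma, full stop and hyphen; strip everything else.
function stripByScript($str) {
    return preg_replace(
        '/[^\p{Latin}\p{Greek}\p{Cyrillic}0-9 _,.\-]+/u',
        '',
        $str
    );
}

// "abc Ωя!@" written with byte escapes so the file encoding does not matter.
$input = "abc \xCE\xA9\xD1\x8F!@";
echo bin2hex(stripByScript($input)), "\n"; // hex of "abc Ωя"
```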

Patrick Taylor

 4:13 pm on Apr 7, 2014 (gmt 0)

Greek, Turkish, Russian, Ukrainian, Czech, Polish and Latvian.

(UTF-8 encoding is specified)

The aim is to approve various characters (they just happen to be in an array) and to remove everything else from a body of text. I thought I'd got the regex right with '/[^' . $allowed . ']+/'. However, before some of the foreign characters were added to the approved array, when I ran (for example) Russian characters through the function, the returned string contained invalid character symbols instead of having them removed. The u modifier has corrected things (they are now stripped out), but I don't understand why it was needed or why the original regex didn't work properly.
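For the array-of-approved-characters approach, one possible sketch is to escape each entry with preg_quote() rather than hand-escaping, and to always apply the u modifier. The function name and signature here are hypothetical; this is not the code from the first post, only an illustration of the same idea.

```php
<?php
// Hypothetical variant: the allowed characters are passed in as an array.
function stripDisallowed($str, array $allowedChars) {
    // preg_quote() escapes regex metacharacters (including "-" and the
    // "/" delimiter), so entries such as "." or "-" need no manual backslash.
    $quoted = array_map(function ($c) {
        return preg_quote($c, '/');
    }, $allowedChars);

    // u modifier: treat the class members and the subject as UTF-8 characters.
    return preg_replace('/[^' . implode('', $quoted) . ']+/u', '', $str);
}

// "я" (bytes D1 8F) and "!" are not in the list, so both are removed.
echo stripDisallowed("a-b.c\xD1\x8F!", array('a', 'b', 'c', '-', '.')), "\n"; // a-b.c
```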


 12:36 pm on Apr 18, 2014 (gmt 0)

The PHP documentation is often lacking when it comes to PCRE. When I am working on advanced regexes, I defer to the PCRE man page rather than the PHP docs, and specifically the man page for the PCRE library version that is currently installed in the server's PHP installation. Yes, this latter piece is crucial.

You can see which PCRE library version is in your current setup by calling phpinfo().
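If a full phpinfo() page is more than you need, the same version string is also exposed as the PCRE_VERSION constant. The sketch below reads it and probes whether the u modifier and \p{...} properties are usable; the function name is hypothetical and the probe is only a rough check, not an official feature test.

```php
<?php
// Hypothetical helper: report the PCRE version and a rough Unicode check.
function pcreInfo() {
    return array(
        // Version string of the PCRE library PHP is using.
        'version' => PCRE_VERSION,
        // True when a UTF-8 "é" (bytes C3 A9) matches \p{L} under the u modifier.
        'unicode' => @preg_match('/^\p{L}$/u', "\xC3\xA9") === 1,
    );
}

print_r(pcreInfo());
```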

By the way, lots of discussion and comments going back years can be found in the PHP docs:


 12:43 pm on Apr 18, 2014 (gmt 0)

This may be helpful as well with regard to PHP versus PCRE documentation, which version is installed, and which features are available:

Patrick Taylor

 2:21 pm on Apr 19, 2014 (gmt 0)

Many thanks. I have now bookmarked that thread in my "incredibly useful" folder.


 3:14 pm on Apr 19, 2014 (gmt 0)

You are very welcome. I should have linked to the perlre docs in that original thread rather than assuming anybody reading it would know what Perl is and how to use Perl regular expressions. Here is the current online man page for Perl regular expressions:

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved