
Forum Moderators: coopster & jatar k


u modifier

What does it do exactly?

     

Patrick Taylor

10:01 am on Apr 7, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



My function uses negative matching to allow only specified characters:

function allowedChars($str) {

    $charArray = array(
        /* Symbols etc */
        '-', '_', '\,', '\.', ' ',
        /* Numbers */
        '1', '2', '3', '4', '5', '6', '7', '8', '9', '0',
        /* English */
        'A', /* etc */
        /* Latin */
        /* etc */
        /* Greek */
        /* etc */
    );

    // Join the allowed list into the body of a negated character class
    $allowed = implode("", $charArray);
    // Strip everything except $allowed
    // u -> pattern and subject are treated as UTF-8 (and checked for validity)
    $str = preg_replace('/[^' . $allowed . ']+/u', '', $str);

    return $str;

}


It works, but only since I added the u-modifier to the regex. Before that, some 'non-allowed' characters came out as the invalid character symbol instead of being stripped out altogether. I have read that the modifier checks for UTF-8, but why does the regex not work properly without it?

lucy24

3:49 pm on Apr 7, 2014 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



Does your permitted list include actual Greek characters? You would expect to get Issues if you're looking for UTF-8 (i.e. multi-byte) characters and you haven't specified UTF-8 encoding.

Tangential: Does the implode/explode/array arrangement work more speedily than a direct "replace [^list-of-approved-characters-here] with ''"?

:: detour to look up whether php has a \P{ASCII} or equivalent notation ::
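For comparison, a minimal sketch of what the direct version might look like, assuming PHP 7+ and a PCRE build with Unicode property support. The function names are hypothetical and the script list (Latin, Greek, Cyrillic) is only an assumption about what the approved array is meant to cover. As far as I know PCRE has no \p{ASCII}, so the byte range \x00-\x7F stands in for it here.

```php
<?php
// Hypothetical helper (not from the thread): keep a few symbols, digits,
// and any character belonging to the Latin, Greek, or Cyrillic scripts.
// The /u modifier makes PCRE read pattern and subject as UTF-8.
function allowedByScript(string $str): ?string
{
    return preg_replace('/[^\-_,. 0-9\p{Latin}\p{Greek}\p{Cyrillic}]+/u', '', $str);
}

// Hypothetical helper: strip everything outside the ASCII range.
function stripNonAscii(string $str): ?string
{
    return preg_replace('/[^\x00-\x7F]+/u', '', $str);
}

// "Café αβ" followed by a Han character and "!"
$sample = "Caf\u{E9} \u{3B1}\u{3B2}\u{4E2D}!";

echo allowedByScript($sample), "\n"; // the Han character and "!" are removed
echo stripNonAscii($sample), "\n";   // only the ASCII characters remain
```

Both helpers return null if the subject is not valid UTF-8, which is how preg_replace() reports that error under /u.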

Patrick Taylor

4:13 pm on Apr 7, 2014 (gmt 0)




Greek, Turkish, Russian, Ukrainian, Czech, Polish and Latvian.

(UTF-8 encoding is specified)

The aim is to approve various characters (they just happen to be in an array) and to remove everything else from a body of text. I thought I'd got the regex right with '/[^' . $allowed . ']+/'. But before some of the foreign characters had been added to the approved array, I ran Russian text (for example) through the function, and the returned string contained invalid character symbols where the Russian characters should have been removed. The u-modifier has corrected things, and they are now stripped out, but I don't understand why it helps or why the original regex didn't work properly.
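The symptom described here can be reproduced in a few lines. This is only a sketch with two characters chosen for illustration (they are assumptions, not taken from the actual approved array): Greek δ is the two bytes CE B4 in UTF-8, and Russian д is D0 B4. Without the u-modifier PCRE treats the pattern and the subject as sequences of single bytes, so the negated class means "any byte except CE or B4". Half of the Russian character is removed and the other half survives as an orphaned byte, which the browser shows as the invalid character symbol.

```php
<?php
// Assumed example characters, written as raw UTF-8 bytes:
$delta = "\xCE\xB4"; // Greek small letter delta, U+03B4 (allowed)
$de    = "\xD0\xB4"; // Cyrillic small letter de, U+0434 (not allowed)

// Without /u: the class is a set of bytes, "anything except CE or B4".
// Byte D0 is stripped, byte B4 is kept because it also occurs in delta.
$bytewise = preg_replace('/[^' . $delta . ']+/', '', $de);

// With /u: the class is a set of characters, "anything except U+03B4".
// The whole Cyrillic character is one code point and is stripped.
$unicode = preg_replace('/[^' . $delta . ']+/u', '', $de);

echo bin2hex($bytewise), "\n";         // b4
echo var_export($unicode, true), "\n"; // ''
```

The more multi-byte characters the allowed list contains, the more lead and continuation bytes end up "allowed" on their own, so fragments of unrelated characters slip through unless the u-modifier is set.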

coopster

12:36 pm on Apr 18, 2014 (gmt 0)

WebmasterWorld Administrator coopster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



The PHP documentation is often lacking when it comes to PCRE. When I am working on advanced regexes I defer to the PCRE man page rather than the PHP docs, and specifically to the man page for the PCRE library version that is actually installed with the PHP build on the server. Yes, that last part is crucial.

You can see which PCRE library version is in your current setup by calling phpinfo().
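If a full phpinfo() dump is more than is needed, the predefined PCRE_VERSION constant carries the same information. A small sketch, assuming the usual "number date" format of that string; the variable name is only illustrative:

```php
<?php
// PCRE_VERSION is a predefined PHP constant holding the version and
// release date of the PCRE library that PHP was built against.
echo PCRE_VERSION, "\n";

// Assumption: the string starts with the numeric version followed by a
// space, so the first space-separated token is the version number.
$pcreNumber = explode(' ', PCRE_VERSION)[0];
echo $pcreNumber, "\n";
```

The numeric part can then be compared against the version that a given man page documents.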

By the way, lots of discussion and comments going back years can be found in the PHP docs:
[php.net...]

coopster

12:43 pm on Apr 18, 2014 (gmt 0)




This may be helpful as well with regard to PHP versus PCRE documentation, which version is installed and which features are available:
[webmasterworld.com...]

Patrick Taylor

2:21 pm on Apr 19, 2014 (gmt 0)




Many thanks. I have now bookmarked that thread in my "incredibly useful" folder.

coopster

3:14 pm on Apr 19, 2014 (gmt 0)




You are very welcome. I should have linked to the perlre docs in that original thread rather than assuming anybody reading it would know what Perl is and how to use Perl regular expressions. Here is the current online man page for Perl regular expressions:
[perldoc.perl.org...]
 
