"\w" pattern and locale matching

"\w" regexp pattern will not match "éèàçù"

Marino

4:21 pm on Feb 6, 2007 (gmt 0)

Hello,

The PHP documentation states:
---------------------
\W
any "non-word" character

[...]

A "word" character is any letter or digit or the underscore character, that is, any character which can be part of a Perl "word". The definition of letters and digits is controlled by PCRE's character tables, and may vary if locale-specific matching is taking place. For example, in the "fr" (French) locale, some character codes greater than 128 are used for accented letters, and these are matched by \w.
---------------------

However, my French accented characters are not matched, and I can't find where to configure "PCRE's character tables" so that my locale is taken into account.

phpinfo() reports:
Configure Command [...] '--with-pcre-regex=/usr'

pcre
PCRE (Perl Compatible Regular Expressions) Support enabled
PCRE Library Version 5.0 13-Sep-2004

Any clues would help.

Thanks in advance,

Marino

Marino

4:28 pm on Feb 6, 2007 (gmt 0)

Me again,

The solution may be here, but it is a bit too complex for me:

www.ugcs.caltech.edu/manuals/libs/pcre-6.4/pcreapi.html#SEC9

Marino

6:33 pm on Feb 6, 2007 (gmt 0)

I tried

setlocale(LC_ALL, 'fr_FR');

... but it made no difference.
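
One thing that may be worth checking at this point is the return value of setlocale(): it returns false when the requested locale is not installed, and in that case the call changes nothing. The sketch below is only an illustration. The candidate locale names are assumptions (the names available differ from system to system), and the test byte "\xE9" assumes ISO-8859-1 input. PCRE's locale tables describe single bytes, so they would probably not help with UTF-8 text, where "é" takes two bytes.

```php
<?php
// Candidate locale names are assumptions; the names actually installed
// vary by system (compare with the output of `locale -a`).
$candidates = array('fr_FR.ISO-8859-1', 'fr_FR@euro', 'fr_FR', 'fra');

// setlocale() tries each name in turn and returns the one it applied,
// or false when none of them is available.
$applied = setlocale(LC_CTYPE, $candidates);

if ($applied === false) {
    echo "No French locale could be set, so the locale is unchanged\n";
} else {
    echo "LC_CTYPE is now $applied\n";
    // "\xE9" is the single byte for "é" in ISO-8859-1 (not UTF-8).
    // Whether it matches depends on the locale that was applied.
    var_dump(preg_match('/^\w$/', "\xE9"));
}
```

Only LC_CTYPE is set here rather than LC_ALL, because character classification is the only category this test needs.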

My pages use the UTF-8 charset, so maybe I should use the UTF-8 mode \p{L} syntax described in the PHP doc ("Unicode character properties" at fr3.php.net/manual/en/reference.pcre.pattern.syntax.php), but I get this message:

Warning: preg_replace() [function.preg-replace]: Compilation failed: support for \P, \p, and \X has not been compiled at offset 2 in yada yada...
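
The warning suggests that this PCRE build lacks Unicode property support. A possible way to cope, sketched below under assumptions, is to probe for \p{L} at runtime and fall back to an explicit character class. hasUnicodeProperties() is a hypothetical helper, not a PHP built-in. The fallback list of accented letters is illustrative rather than complete. The sketch also assumes that the /u (UTF-8) modifier itself is available and that this file is saved as UTF-8.

```php
<?php
// Hypothetical helper (not a PHP built-in): reports whether this PCRE
// build can compile \p{L}. The @ operator silences the "has not been
// compiled" warning; preg_match() returns false when compilation fails.
function hasUnicodeProperties()
{
    return @preg_match('/\p{L}/u', 'a') === 1;
}

if (hasUnicodeProperties()) {
    // Unicode letters, Unicode numbers and the underscore.
    $word = '/^[\p{L}\p{N}_]+$/u';
} else {
    // Fallback: ASCII word characters plus a hand-picked list of French
    // accented letters. The list is illustrative, not exhaustive.
    $word = '/^[A-Za-z0-9_àâäçéèêëîïôöùûüÿœæÀÂÄÇÉÈÊËÎÏÔÖÙÛÜŸŒÆ]+$/u';
}

// Test the chosen pattern against a few accented letters.
var_dump(preg_match($word, 'éèàçù'));
```

The explicit class is tedious to maintain, so it is probably only worth keeping until PCRE can be rebuilt with Unicode property support.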