The only thing I'm not able to do is use "preg_match_all()" with the Unicode modifier "/u". It just doesn't work.
$text = "été" ;
preg_match_all('/[\w]+/u', $text, $matchs);
$matchs = $matchs[0];
$matchs would only contain "t" here, as if "é" were not alphanumeric!
In fact, what I'm trying to write is a function that splits a UTF-8 string into words.
I thought:
$mots = mb_split("\W", $text) ;
would do the job, but it seems that this function is not aware that some extended characters are not alphanumeric (e.g. "´").
Any help will be really appreciated.
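One possible workaround, as a minimal sketch: as far as I know, whether "\w" covers accented letters under "/u" depends on the PHP and PCRE build, so the pattern below names the Unicode properties explicitly instead. It assumes PCRE was compiled with UTF-8 and Unicode property support; the function name utf8_words() is made up for this example.

```php
<?php
// Hypothetical helper (the name is an assumption, not from this thread).
// Assumes PCRE was built with UTF-8 and Unicode property support.
function utf8_words($text)
{
    // \p{L} = any Unicode letter, \p{N} = any Unicode number.
    // Naming the properties avoids relying on how \w behaves under /u.
    $count = preg_match_all('/[\p{L}\p{N}_]+/u', $text, $matches);
    if ($count === false) {
        // Pattern failed, e.g. the input is not valid UTF-8.
        return array();
    }
    return $matches[0];
}

// "\xC3\xA9" is "é" and "\xC2\xB4" is "´", written as UTF-8 bytes
// so the source file itself stays plain ASCII.
print_r(utf8_words("\xC3\xA9t\xC3\xA9"));  // one word: "été"
print_r(utf8_words("aaa\xC2\xB4bbb"));     // two words: "aaa", "bbb"
```

"´" (U+00B4) is a symbol, not a letter or number, so it ends up acting as a delimiter here.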
But even if I use "mb_regex_encoding('UTF-8');" it doesn't work.
mb_split() still thinks that "´" is alphanumeric even though, I think, this character should be treated as a delimiter.
Try it:
mb_regex_encoding('UTF-8');
$words = mb_split("\W", "aaa´bbb") ;
Or, if the editor you use doesn't fully support UTF-8:
mb_regex_encoding('UTF-8');
$words = mb_split("\W", "aaa\xC2\xB4bbb") ;
I would be SO happy to find a solution for that problem...
I can't believe it's so hard, I'm only trying to get the words of a UTF-8 string. :-(
The only way I found that seemed to work was:
preg_match_all('/[\w]+/u', "aaa´bbb", $words);
on my Windows XP box.
On my Linux machine (my main web server) it doesn't work and I have no idea why!
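A difference like this between two machines may come from the PCRE library each PHP build uses. Here is a small diagnostic sketch to run on both boxes and compare; it only reports what the local build does and makes no claim about what the result should be. The function name pcre_unicode_report() is hypothetical.

```php
<?php
// Hypothetical diagnostic helper (the name is an assumption).
// Run it on both machines and compare the output.
function pcre_unicode_report()
{
    $e = "\xC3\xA9"; // "é" as UTF-8 bytes, keeps this file plain ASCII

    return array(
        'php_version'  => PHP_VERSION,
        'pcre_version' => PCRE_VERSION,
        // Does /u work at all? (needs UTF-8 support in PCRE)
        'utf8_mode'    => @preg_match('/^.$/u', $e),
        // Does \p{L} work? (needs Unicode property support in PCRE)
        'unicode_prop' => @preg_match('/^\p{L}$/u', $e),
        // Does \w treat "é" as a word character under /u?
        'w_matches_e'  => @preg_match('/^\w$/u', $e),
    );
}

var_dump(pcre_unicode_report());
```

In each of the three checks, 1 means the pattern matched, 0 means it did not, and false means the pattern itself was rejected by that build.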