Forum Moderators: coopster

A-Z regex for international languages

         

optik

5:35 pm on Mar 17, 2009 (gmt 0)

10+ Year Member



Hi

What are the best methods for using a regex to cover a number of international alphabets?

henry0

10:11 pm on Mar 17, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I might be wrong, and I'd love to see how it can be done, but I do not think there is such a (very useful) regex.
I would use a switch and treat one language at a time.
Of course, it could be done by allowing a long list of the special characters belonging to each language, such as n with tilde, the inverted question mark, accents, etc.
But you could also end up allowing too much.
Example: if you allow the inverted question mark, you have to allow the regular question mark too.
Personally, I disallow the question mark, so you see where I'm going.

rob7591

11:43 pm on Mar 17, 2009 (gmt 0)

10+ Year Member



So you're trying to match all characters including ones such as åéáê etc? Do you want punctuation as well?

You can use \x to match a character's hex code (and search through ranges).

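Ç is a letter in several alphabets (French, Portuguese and Turkish, for example), and the range from Ç (\x80) to Ñ (\xA5) appears to have most of the special accented characters and whatnot. Note that these are the codes from DOS code page 437; in ISO-8859-1 the accented letters sit mostly in \xC0-\xFF instead, so the right range depends on the encoding of your text.
"/[\x80-\xA5]/" should match a good portion of the characters that you are looking for. (Of course, you can put a-z0-9 etc. inside the same brackets to use them together.)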
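
A minimal sketch of how a byte range like this behaves, assuming the \x80-\xA5 codes are the code page 437 ones (where é is the single byte 0x82). The variable names and sample strings are made up for illustration, and the strings are written as byte escapes so the encoding of the script file does not matter. The same word stored as UTF-8 does not match, because é is then two bytes and 0xC3 falls outside the range.

```php
<?php
// Byte-range class from the post; it matches single bytes, not characters.
$pattern = '/^[a-zA-Z\x80-\xA5]+$/';

// Hypothetical inputs, written as byte escapes.
$cp437 = "caf\x82";      // "café" where é is the single byte 0x82 (code page 437)
$utf8  = "caf\xC3\xA9";  // "café" in UTF-8, where é is the two bytes 0xC3 0xA9

$cp437Matches = preg_match($pattern, $cp437) === 1;
$utf8Matches  = preg_match($pattern, $utf8) === 1;

var_dump($cp437Matches); // bool(true)
var_dump($utf8Matches);  // bool(false)
```

So a hex range is only safe when you know exactly which single-byte encoding the input uses.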

phranque

6:31 am on Mar 18, 2009 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



i would suggest using the \w metacharacter (matches any "word" character) which is available with Perl Compatible Regular Expression Syntax [php.net].
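
One caveat with \w is that it also accepts digits and the underscore, and without the u modifier its handling of non-ASCII bytes depends on the locale. As an alternative that is not mentioned in this thread, PCRE's Unicode properties can be used on UTF-8 input. The sketch below assumes the text is UTF-8 and that PCRE was built with Unicode support; `isLetters` is a made-up helper name.

```php
<?php
// Hypothetical helper: true when $s consists only of Unicode letters (\p{L})
// and combining marks (\p{M}). The u modifier makes PCRE treat the pattern
// and the subject as UTF-8.
function isLetters(string $s): bool
{
    // preg_match returns false on invalid UTF-8, so this yields false then too.
    return preg_match('/^[\p{L}\p{M}]+\z/u', $s) === 1;
}

var_dump(isLetters("caf\xC3\xA9")); // "café" in UTF-8: bool(true)
var_dump(isLetters("\xD0\x9F\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82")); // Cyrillic "Привет": bool(true)
var_dump(isLetters("abc_1")); // underscore and digit are rejected: bool(false)
```

Add \p{N} or specific punctuation to the character class if digits or other characters should be allowed.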

optik

8:23 pm on Mar 20, 2009 (gmt 0)

10+ Year Member



The \w works nicely, although htmlentities then messes it up and replaces the characters with the wrong codes; for example, é gets changed to Ã©.

phranque

11:17 am on Mar 21, 2009 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



have you set the locale to include the language-specific codes?

optik

3:46 pm on Mar 21, 2009 (gmt 0)

10+ Year Member



No, am I meant to? How would I do that?

optik

1:32 pm on Mar 23, 2009 (gmt 0)

10+ Year Member



Here is how to set htmlentities to use the UTF-8 character set:

htmlentities($s, ENT_NOQUOTES,"UTF-8");
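
A short sketch of why the third argument matters, assuming the input string is UTF-8 (the sample string and variable names are made up). PHP versions before 5.4 defaulted to ISO-8859-1, which reads the two bytes of a UTF-8 é as two separate characters.

```php
<?php
$s = "caf\xC3\xA9"; // "café" encoded as UTF-8

// Told the input is UTF-8, htmlentities sees one character, é.
$right = htmlentities($s, ENT_NOQUOTES, "UTF-8");

// Told the input is ISO-8859-1, it treats the two bytes as Ã and ©.
$wrong = htmlentities($s, ENT_NOQUOTES, "ISO-8859-1");

echo $right, "\n"; // caf&eacute;
echo $wrong, "\n"; // caf&Atilde;&copy;
```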