PHP, Apache, MySQL and foreign characters

How to make \w match ö or ñ?

         

gethan

8:34 am on Feb 24, 2004 (gmt 0)

I have a site that is written in English, but due to its global nature it often gets foreign characters added.

E.g. Scandinavian æ and å, Hungarian ö, Spanish ñ, etc.

I'm not talking about Unicode (Chinese characters etc., though I'd really like to know more about these).

I currently have a directory structure based on place names. For now I use o instead of ö in the directory names, and use preg to replace "\W" with "-" to remove punctuation characters.
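For readers following along, the current approach described above might look roughly like this sketch. The function name ascii_slug and the $map table are made up for illustration, only a handful of characters are mapped, and collapsing runs with \W+ is an assumption (the post does not say whether runs are collapsed). It also assumes UTF-8 source text and PHP 7 or later for the type declarations.

```php
<?php
// Hypothetical transliteration table; a real one would need many more entries.
$map = ['ö' => 'o', 'ñ' => 'n', 'å' => 'a', 'æ' => 'ae'];

// Hypothetical helper: map accented characters to plain ASCII, then turn
// every run of non-word characters into a single hyphen.
function ascii_slug(string $name, array $map): string
{
    // strtr() with an array replaces each key with its value (longest match first).
    $ascii = strtr($name, $map);

    // Without the /u modifier and in the default "C" locale,
    // \W matches anything outside [A-Za-z0-9_].
    $slug = preg_replace('/\W+/', '-', $ascii);

    // Drop hyphens left over at either end.
    return trim((string) $slug, '-');
}

echo ascii_slug('Malmö, Sverige', $map), "\n";
```

Note that any accented character missing from the table ends up as a hyphen, which is the limitation the rest of the post is trying to get around.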

What I'd like to do is allow all foreign characters that are included in the Extended ASCII set in filenames and directory names. (Extended ASCII is missing the long ö & ü from the Hungarian character set ... sigh)

So what problems have I not thought of? My initial tests suggest that Apache and the OS (Linux) will be fine with retrieving these.

Next, the preg functions: \w doesn't match these types of characters. Reading the egrep man page suggests that if I set the locale with setlocale(LC_CTYPE, ''); I can make this match. The problem is that it only works for one language. Since PHP 4.3 I can pass in multiple languages, but this seems crazy... is there a simpler way?
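One possible way to avoid the locale dependency is to match on Unicode character properties instead of \w. This is only a sketch under some assumptions: the strings are UTF-8 (text stored in a single-byte Extended ASCII encoding would have to be converted first), PCRE was built with Unicode property support, and the PHP version is newer than the 4.3 mentioned above (7 or later for the type declarations used here). The function name place_slug is hypothetical.

```php
<?php
// Hypothetical helper: keep letters, combining marks and digits from any
// script, and replace every other run of characters with a single hyphen.
function place_slug(string $name): string
{
    // /u makes PCRE treat pattern and subject as UTF-8;
    // \p{L} = letters, \p{M} = combining marks, \p{N} = numbers.
    $slug = preg_replace('/[^\p{L}\p{M}\p{N}]+/u', '-', $name);

    if ($slug === null) {
        // preg_replace() returns null on error, e.g. invalid UTF-8 input.
        return '';
    }

    // Drop hyphens left over at either end.
    return trim($slug, '-');
}

echo place_slug('Malmö, Sverige'), "\n";
echo place_slug('A Coruña / España'), "\n";
```

Because the character classes are defined by Unicode rather than by the current locale, the same pattern covers Scandinavian, Hungarian and Spanish letters at once. Whether the resulting names are safe as directory names still depends on the filesystem and on how Apache decodes the request URL.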

[php.net...]

As always any ideas or suggestions are much appreciated.

jatar_k

8:49 pm on Feb 24, 2004 (gmt 0)

We go round and round this charset issue every once in a while. I don't really know the answer, but maybe these threads can add some insight (or confusion):

UTF-8, ISO-8859-1, PHP and XHTML [webmasterworld.com]
Saving foreign characters into the database [webmasterworld.com]
Matching international characters [webmasterworld.com]
Greek letters (Numeric Character References) in MySQL? [webmasterworld.com]
Removing umlauts from PHP [webmasterworld.com]