In our HTML application, up to 5 different languages can be displayed on the same page at the same time.
We've chosen UTF-8 because it seems to be the future, and it's already ripe enough to use now.
We haven't encountered any problems so far, as long as we stay in UTF-8 the whole time. That means using it both for inserts/updates and in the page itself, via <meta http-equiv="content-type" content="text/html; charset=UTF-8">.
(We did have to do some utf8_encode()ing on text we imported from existing tables into the new structure.)
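For anyone wondering what that conversion step actually does to the imported text, here is a minimal sketch, assuming the old tables held ISO-8859-1 (Latin-1) data. The helper name latin1ToUtf8() and the sample string are made up for illustration; it mirrors the byte-level mapping that utf8_encode() performs rather than replacing it.

```php
<?php
// Hypothetical helper (not part of the original code): converts an
// ISO-8859-1 string to UTF-8 one byte at a time.
function latin1ToUtf8(string $latin1): string
{
    $utf8 = '';
    $length = strlen($latin1);
    for ($i = 0; $i < $length; $i++) {
        $byte = ord($latin1[$i]);
        if ($byte < 0x80) {
            // ASCII bytes are identical in both encodings
            $utf8 .= $latin1[$i];
        } else {
            // 0x80-0xFF become a two-byte UTF-8 sequence
            $utf8 .= chr(0xC0 | ($byte >> 6)) . chr(0x80 | ($byte & 0x3F));
        }
    }
    return $utf8;
}

$imported  = "M\xFCnchen";            // "München" as stored in a Latin-1 table
$converted = latin1ToUtf8($imported);
echo bin2hex($converted), "\n";       // the single byte fc is now c3 bc
```

One thing to watch: running the conversion a second time on text that is already UTF-8 encodes each byte again, which is where the familiar "Ã¼" garbage comes from. So each imported row should be converted exactly once.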
Up to now we've only tested it with run-of-the-mill characters like üÜöÖäÄß. We will be testing with Japanese, Chinese, Korean and Arabic characters in about 1-2 months.
We don't expect any problems with data storage (inserting/updating). Where we are sure we will see odd results is with data retrieval: ORDER BY and the like.
We have MySQL 3.23.47 and, at least for the foreseeable future, aren't able to adjust its configuration. (That depends on our server provider.)
Perhaps there are others out there who'd like to share their experience(s) with us.
function textIn( $text, $noNewLine = false, $specialChars = true )
{
    // remove backslash escaping (e.g. as added by magic quotes)
    $text = stripslashes($text);
    // $prePend is used for any additional replacements
    $prePend = array( "<script>"=>"", "</script>"=>"", // for security
                      "\r\n"=>"\n", "\n\r"=>"\n", "\r"=>"\n"); // only \n as newline
    // $trans translates HTML entities into their decoded values
    // (assumes the table is ISO-8859-1, the default in older PHP versions)
    $trans = get_html_translation_table(HTML_ENTITIES);
    $trans = array_flip( $trans );
    $trans = array_merge($prePend, $trans);
    // this replaces all keys found in $text with their values
    $text = strtr($text, $trans);
    // UTF-8 encode the text
    $text = utf8_encode( $text );
    if( $noNewLine ){
        // optional, if new lines are not wanted:
        // replace any run of whitespace with a single space
        $text = preg_replace("/\s+/", " ", $text);
    }
    if( $specialChars ){
        // optional, replace <>&" with their HTML entities
        // (the charset name has to be a quoted string)
        $text = htmlspecialchars($text, ENT_COMPAT, "UTF-8");
        // Does anybody know exactly what specifying UTF-8 here does?
    }
    return $text;
}
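On the question in the last comment, here is a small experiment rather than a definitive answer. As far as the PHP manual describes it, the third argument tells htmlspecialchars() which encoding the input is in, and in current PHP versions input that is not valid in that encoding yields an empty string unless ENT_SUBSTITUTE or ENT_IGNORE is set. The sample strings are invented, and older PHP releases may behave differently.

```php
<?php
// Sample inputs, invented for illustration.
$valid   = "M\xC3\xBCnchen <b>";   // ü as a valid UTF-8 sequence
$invalid = "M\xFCnchen <b>";       // ü as a lone Latin-1 byte, not valid UTF-8

// Valid UTF-8: only < > & " are replaced, the ü bytes pass through.
$a = htmlspecialchars($valid, ENT_COMPAT, 'UTF-8');

// Invalid UTF-8 declared as UTF-8: an empty string comes back.
$b = htmlspecialchars($invalid, ENT_COMPAT, 'UTF-8');

// The same bytes declared as ISO-8859-1 are accepted as they are.
$c = htmlspecialchars($invalid, ENT_COMPAT, 'ISO-8859-1');

var_dump($a === "M\xC3\xBCnchen &lt;b&gt;");
var_dump($b === '');
var_dump($c === "M\xFCnchen &lt;b&gt;");
```

So in textIn() the charset argument matters mostly as a safety net: if text somehow reaches htmlspecialchars() without having been converted to UTF-8, the result is empty rather than silently mis-encoded.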
We don't want to give too much away about the intent or nature of the project until it's launched. Suffice to say, it's inspired by the spirit of "copyleft" as used in open source [opensource.org] and the GNU General Public License [gnu.org]. You could think of it as a form of "living" archive.
Where we expect to have problems is with sorting, i.e. MySQL's ORDER BY. For example, how will MySQL order UTF-8 Chinese characters? Certainly not A-Z.
As yet we have not found anything about UTF-8 and MySQL. Perhaps somebody has a few tips or links regarding this, as well as about UTF-8 in general.
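To get a feel for what a server with no UTF-8 awareness might do, here is a rough PHP sketch of a plain byte-by-byte sort. This is an assumption about the likely behaviour, not something verified on MySQL 3.23.47: a column compared as raw bytes should come out in this order, while a non-binary column is sorted by the server's configured character set and may differ. The word list is invented, and the file is assumed to be saved as UTF-8.

```php
<?php
// Invented sample data; assumes this file itself is saved as UTF-8.
$words = ["zebra", "Äpfel", "apple", "中文", "Zoo"];

// strcmp() compares raw bytes. For UTF-8 that equals Unicode code point
// order: uppercase ASCII, then lowercase ASCII, then accented Latin
// letters, then CJK characters.
usort($words, 'strcmp');

print_r($words);
```

The result is consistent and repeatable, but "Äpfel" lands after "zebra" instead of next to "apple", so it is not what a German reader would call alphabetical. Anything closer to dictionary order would need language-aware collation, which byte comparison cannot provide.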
Here's a list of utf-8 and charset links we've found useful.
[w3.org...]
[unicode.org...]
[hclrss.demon.co.uk...]
[lcweb.loc.gov...]
[zsigri.tripod.com...]
[geocities.com...]
[czyborra.com...]