|Encoding Characters and Multi-Language Websites|
On a multi-language Asian website, we have some encoding problems.
For example, when you receive an email in Chinese, it appears correctly on some computers but is impossible to read on others (both computers have Chinese character encoding support installed, and both can browse Chinese websites).
Do you know of any website that explains character encoding systems? I really need to solve this problem. I can't send email to customers if they can't read it.
What is the best encoding system we can use (for 7 Asian languages - Thai, Chinese (Simplified and Traditional), Korean, Indonesian, Japanese)?
Thanks a lot
Which character set are you using when displaying the page?
Are the characters in Unicode?
UTF-8 Unicode should work in your case.
We have no problems with Chinese (all versions), Japanese, and Korean. I have no experience with Thai or Indonesian, but given the above, they should be no problem either.
UTF-8 only works if the characters are actually in Unicode, though.
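To make that point concrete, here is a minimal Python sketch. The sample string and the helper name `decode_or_none` are only illustrations. It shows that the same Chinese text produces different bytes in Big5 and in UTF-8, so labelling Big5 bytes as UTF-8 does not make them readable:

```python
# Illustrative sample: the two characters for "Chinese (language)".
text = "中文"

big5_bytes = text.encode("big5")
utf8_bytes = text.encode("utf-8")

def decode_or_none(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None

# Same characters, different byte sequences.
print(big5_bytes != utf8_bytes)

# Big5 bytes only make sense when read back as Big5.
print(decode_or_none(big5_bytes, "big5") == text)
print(decode_or_none(big5_bytes, "utf-8"))
```

In other words, content stored in a legacy encoding has to be converted to Unicode; changing the declared charset alone is not enough.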
OK...this gets complicated
UTF-8 (Unicode) covers most languages and if you create text in Unicode it can be read by a huge range of software...however
some countries hit the Internet in a big way before Unicode was widespread...so they have a lot of people using a different character encoding system for their language...relatively simple examples are Shift_JIS in Japan and Windows-1251 in Russia...Chinese is more complicated since the mainland generally uses simplified characters while Taiwan and Hong Kong generally use traditional ones, and the two developed separate encoding systems (GB for simplified, Big5 for traditional)...add UTF-8 to complete the set
so...it gets complicated if you want 100% accessibility...you'll need to offer at least two versions of Japanese (though few people now require shift_JIS) and three versions of Chinese
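One way to keep several versions manageable, sketched here in Python under the assumption that the content is kept once as Unicode text and the legacy versions are generated from it. The variant names in `VARIANTS` and the helper names are hypothetical; the codec names are standard Python codecs:

```python
# Hypothetical site variants mapped to Python codec names.
VARIANTS = {
    "ja-utf8": "utf-8",
    "ja-sjis": "shift_jis",
    "zh-hant-big5": "big5",
    "zh-hans-gb": "gb2312",
    "zh-utf8": "utf-8",
}

def render_variant(text, codec):
    """Encode Unicode text for one variant.

    Characters the target codec cannot represent are written as numeric
    character references, which browsers can still display in HTML.
    """
    return text.encode(codec, errors="xmlcharrefreplace")

def render_all(text, variant_names):
    """Encode one Unicode source text for each named variant."""
    return {name: render_variant(text, VARIANTS[name]) for name in variant_names}

japanese_pages = render_all("日本語", ["ja-utf8", "ja-sjis"])
traditional_page = render_variant("繁體中文", VARIANTS["zh-hant-big5"])
simplified_page = render_variant("简体中文", VARIANTS["zh-hans-gb"])
```

Each generated page would also need to declare the matching charset, otherwise browsers may guess wrong.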
when it comes to email you need to use the standard system for communicating across the language barrier...send in your own language with instructions on how to find an online automatic translation service unless you have staff who can communicate fluently in the relevant language
[edited by: tedster at 12:54 am (utc) on May 11, 2004]
some further useful sites with information on character encoding and multi-lingual web sites
That's a nice set of references, Eric. I was immediately able to fill in some gaps in my knowledge.
I'm currently setting up a site to cover pretty much everything I've learned about i18n...it'll be up in a few weeks time
I hope the moderators will make an exception to let Eric post the URL to his site once he has it running, even if it might be considered self-promotion.
I've read pretty extensively at Jukka Korpela's and Alan Wood's sites, but it's such a complex issue that more perspectives can't hurt.
One more link that I like
It has relatively detailed info on every Unicode character and a pretty good search function. It doesn't really have any info that addresses the original poster's question, but I find it a handy resource.
I'll set it as the site in my user profile, that's the simple answer
Thanks. Anything to help avoid getting those funny looking characters to show up.
Actually, I think I can do "international" (read: anglo-euro) pretty well. I still see a lot of sites, though, where they obviously have no clue that things will look wrong if they don't get the encoding right.
Thanks, I will have a look at it, but three versions in Chinese, two in Japanese, etc. will be really complicated. We have a lot of content!
Now, if I want to use this kind of system to have a 100% compatible website covering all the Asian languages, is there any way to automatically select the right encoding per user? Or will users have to select the right site version themselves?
I've just posted an explanation of my favoured system in the Asian Search Engines Forum
basically you need to look up content negotiation...this will allow a visitor to be directed to a page according to their browser settings...however not everyone will want to read the site in the language the browser is set to (e.g. if they are in an Internet cafe, or visiting a client/supplier abroad and using their desktop)...so ONLY use content negotiation on index.html in each language and point all internal links to the home page at default.html
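For anyone wondering what language negotiation looks like in practice, here is a rough Python sketch of choosing a language from the browser's `Accept-Language` header. Web servers can usually do this through configuration, so this is only an illustration; the function names and the set of offered languages are assumptions:

```python
def parse_accept_language(header):
    """Return the language tags from an Accept-Language header, most preferred first."""
    weighted = []
    for position, part in enumerate(header.split(",")):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0  # a tag without a q-value counts as fully preferred
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        if quality > 0:
            # Sort by descending quality, then by original position.
            weighted.append((-quality, position, tag.strip().lower()))
    return [tag for _, _, tag in sorted(weighted)]

def pick_language(header, offered, default="en"):
    """Pick the first requested language the site offers, else the default."""
    for tag in parse_accept_language(header):
        if tag in offered:
            return tag
        primary = tag.split("-")[0]  # fall back from "ko-kr" to "ko"
        if primary in offered:
            return primary
    return default

OFFERED = {"en", "th", "zh", "zh-tw", "ko", "id", "ja"}  # hypothetical site languages
choice = pick_language("zh-TW,zh;q=0.8,en;q=0.5", OFFERED)
```

As the post says, this would only run on the entry page; every other page keeps whatever language the visitor picked.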
use a language switching page that covers all the languages offered...on that you need the two-letter language code, the type of encoding, the name of the language in that language and in English, and a short piece of descriptive text as spider food
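A small Python sketch of how the entries on such a language switching page could be generated. The URLs, the blurbs, and the choice of UTF-8 for every entry are placeholders for illustration; the language codes and native names are the standard ones:

```python
from html import escape

# One entry per language version: code, encoding, native name, English name,
# link target (hypothetical paths), and a short descriptive text.
LANGUAGES = [
    {"code": "th", "charset": "utf-8", "native": "ไทย", "english": "Thai",
     "href": "/th/default.html", "blurb": "Site content in Thai"},
    {"code": "zh", "charset": "utf-8", "native": "中文", "english": "Chinese",
     "href": "/zh/default.html", "blurb": "Site content in Chinese"},
    {"code": "ko", "charset": "utf-8", "native": "한국어", "english": "Korean",
     "href": "/ko/default.html", "blurb": "Site content in Korean"},
    {"code": "id", "charset": "utf-8", "native": "Bahasa Indonesia", "english": "Indonesian",
     "href": "/id/default.html", "blurb": "Site content in Indonesian"},
    {"code": "ja", "charset": "utf-8", "native": "日本語", "english": "Japanese",
     "href": "/ja/default.html", "blurb": "Site content in Japanese"},
]

def render_switcher(languages):
    """Render the language list as an HTML unordered list."""
    items = []
    for lang in languages:
        # Escape every value so it is safe inside attributes and text.
        safe = {key: escape(value) for key, value in lang.items()}
        items.append(
            '<li><a href="{href}" hreflang="{code}" lang="{code}">{native}</a>'
            " ({english}, {charset}): {blurb}</li>".format(**safe)
        )
    return "<ul>\n" + "\n".join(items) + "\n</ul>"

switcher_html = render_switcher(LANGUAGES)
```

Because the page mixes several scripts, it would itself most likely need to be served as UTF-8.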