Forum Moderators: incrediBILL

Encoding for Thai and Vietnamese

How to set the charset, or go to Unicode?

     
10:11 pm on Jan 18, 2004 (gmt 0)

Junior Member

10+ Year Member

joined:Mar 24, 2003
posts:99
votes: 0


I'm researching encodings for Thai and Vietnamese language sites. Should I stick with a language-specific charset, e.g. for Thai:
<meta http-equiv="Content-Type" content="text/html; charset=windows-874">

or go to Unicode/UTF-8?

Your thoughts?

9:28 am on Jan 19, 2004 (gmt 0)

Preferred Member

10+ Year Member

joined:Aug 7, 2003
posts:408
votes: 0


I would use UTF-8 for these languages.
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">
10:43 pm on Jan 19, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


A lot of Thai websites use:

<meta http-equiv="Content-Type" content="text/html; charset=TIS-620">

Some use:

<meta http-equiv="Content-Type" content="text/html; charset=windows-874">
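
To make the difference between these charsets concrete, here is a minimal Python sketch. It assumes Python's built-in `tis_620` and `cp874` codecs as stand-ins for the TIS-620 and windows-874 labels; the sample string and the helper names are illustrative only. It shows that the Thai letters map to the same single bytes in both Thai charsets, that UTF-8 needs three bytes per Thai letter, and that a character such as the euro sign is available in windows-874 but not in TIS-620.

```python
# Sample text: "Thai language", 7 code points (illustrative choice).
THAI = "\u0e20\u0e32\u0e29\u0e32\u0e44\u0e17\u0e22"  # ภาษาไทย


def encoded_sizes(text):
    """Return the encoded length in bytes of text under each codec."""
    return {name: len(text.encode(name)) for name in ("tis_620", "cp874", "utf-8")}


def can_encode(text, codec):
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    print(encoded_sizes(THAI))
    print("same Thai bytes:", THAI.encode("tis_620") == THAI.encode("cp874"))
    for codec in ("tis_620", "cp874", "utf-8"):
        print(codec, "can encode the euro sign:", can_encode("\u20ac", codec))
```

For Thai-only pages the two legacy charsets are interchangeable in practice; they diverge only for the extra punctuation and symbols that windows-874 adds.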

2:19 am on Jan 20, 2004 (gmt 0)

Administrator from JP 

WebmasterWorld Administrator bill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Oct 12, 2000
posts:14487
votes: 49


I know that a lot of developers in Japan and China still shy away from Unicode for page encoding. I wouldn't be surprised if other Asian language character sets had similar problems with it. Although Unicode may seem like a panacea, there are still a number of perceived problems with it. Do a survey of some of the leading sites in those languages and see what they use.
8:53 am on Jan 20, 2004 (gmt 0)

Preferred Member

10+ Year Member

joined:Aug 7, 2003
posts:408
votes: 0


On the other hand, if you have pages in several languages, it's much easier to use UTF-8 on ALL pages. I have tested it in more than 60 languages and it works perfectly.
9:11 am on Jan 20, 2004 (gmt 0)

Administrator from JP 

WebmasterWorld Administrator bill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Oct 12, 2000
posts:14487
votes: 49


You've tested 60 languages with local operating systems and browsers? Do those include Thai and Vietnamese? I'm not saying that UTF-8 isn't great for a number of languages...it's just that I've heard reports of certain Asian languages where it hasn't worked. Make sure you test with local webmasters who may know about potential problems.
12:48 pm on Jan 20, 2004 (gmt 0)

Preferred Member

10+ Year Member

joined:Aug 7, 2003
posts:408
votes: 0


Yes. Thai and Vietnamese are amongst them, and they display correctly.

Note: you must save all documents as Unicode documents, or it won't work...
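
A rough way to check the point about saving documents: the bytes on disk have to match the charset the page declares. The sketch below is only an illustration; the file name, the sample page, and the helper names `save_page` and `decodes_as` are made up for the example. It writes the same page once as windows-874 (`cp874` in Python) and once as UTF-8, then tests whether the saved bytes decode as UTF-8.

```python
import os
import tempfile

# Hypothetical sample page that declares UTF-8 and contains Thai text.
PAGE = (
    '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">'
    "<p>\u0e20\u0e32\u0e29\u0e32\u0e44\u0e17\u0e22</p>"
)


def save_page(path, html, encoding):
    """Write html to path using the given file encoding."""
    with open(path, "w", encoding=encoding, newline="") as f:
        f.write(html)


def decodes_as(path, declared):
    """Return True if the raw bytes of the file decode under the declared charset."""
    with open(path, "rb") as f:
        raw = f.read()
    try:
        raw.decode(declared)
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "page.html")
        save_page(path, PAGE, "cp874")   # saved in a legacy Thai charset
        print("cp874 file valid as UTF-8:", decodes_as(path, "utf-8"))
        save_page(path, PAGE, "utf-8")   # saved as a Unicode (UTF-8) document
        print("utf-8 file valid as UTF-8:", decodes_as(path, "utf-8"))
```

A successful decode does not prove the declaration is right, since some byte sequences are valid in more than one charset, but a failed decode is a reliable sign that the file was saved in a different encoding from the one declared.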

4:17 am on Jan 21, 2004 (gmt 0)

Administrator from JP 

WebmasterWorld Administrator bill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Oct 12, 2000
posts:14487
votes: 49


Could I ask what browsers and which versions you tested with and in combination with what operating systems? IE has always been quite good with Unicode display. It's Netscape and some others that were problems from what I recall.

You would probably want to find a breakdown of what the most popular browsers (and versions) were for those respective language markets and consider those factors as well. Then of course it may depend on your niche market's users...but you all know that.

9:58 am on Jan 21, 2004 (gmt 0)

Preferred Member

10+ Year Member

joined:Aug 7, 2003
posts:408
votes: 0


bill, all modern browsers have built-in Unicode support, so that's no problem. All operating systems support Unicode, so that's no problem either.

You'll find more information on this site: [alanwood.net...]

The major reason why we use UTF-8 on all our pages is this: when you have a mix of several languages (for example Vietnamese, Chinese, Korean and English) on the same Web page, all characters display properly.
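
The mixed-language argument can be illustrated with a short Python sketch. The sample string and the candidate list are assumptions chosen for the example (Python's `cp1258`, `gb2312` and `euc_kr` codecs stand in for typical Vietnamese, Chinese and Korean legacy charsets). It tries to encode one string containing Vietnamese, Chinese, Korean and English with each candidate and reports which ones can represent the whole string.

```python
# Illustrative mixed-language text: Vietnamese, Chinese, Korean, English.
MIXED = "Tiếng Việt / 中文 / 한국어 / English"

# Candidate charsets to try (an assumed, non-exhaustive selection).
CANDIDATES = ["cp1258", "gb2312", "euc_kr", "utf-8"]


def codecs_that_can_encode(text, candidates):
    """Return the candidates that can encode every character in text."""
    usable = []
    for name in candidates:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue  # at least one character is missing from this charset
        usable.append(name)
    return usable


if __name__ == "__main__":
    print(codecs_that_can_encode(MIXED, CANDIDATES))
    print(codecs_that_can_encode("English", CANDIDATES))
```

Each legacy charset covers its own script plus ASCII, so plain English passes everywhere, while the mixed string only fits in a Unicode encoding.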

That's why all (smart) translation bureaus use UTF-8.

6:24 am on Jan 22, 2004 (gmt 0)

Administrator from JP 

WebmasterWorld Administrator bill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Oct 12, 2000
posts:14487
votes: 49


tombola, don't get me wrong...I think Unicode is a great idea. I've been waiting for years for it to work out the kinks on the Asian language side...I'm waiting for someone to prove to me that it's 100% ready. ;)

Take a look at this article: A peek at Unicode's soft underbelly [www-106.ibm.com]
This came up in a recent discussion [webmasterworld.com] (msg#9) we had over in the Asia Pacific Forum. I get wary when people tout UTF-8 as the solution to encoding problems because I've heard a lot to the contrary. I'm really just playing devil's advocate here waiting for some of the old Unicode pros to show themselves.

9:45 am on Jan 22, 2004 (gmt 0)

Preferred Member

10+ Year Member

joined:Aug 7, 2003
posts:408
votes: 0


ok bill, I rest my case.

... but I'll stick to UTF-8 ;-)

10:36 am on Jan 22, 2004 (gmt 0)

Full Member

10+ Year Member

joined:May 27, 2003
posts:242
votes: 0


thai-language.com, which is apparently a resource for learning Thai, suggests using Unicode for Thai.
ht*p://www.thai-language.com/default.asp?tab=5

Vietnamese is essentially Latin a-z plus all sorts of accents and diacritical marks, which Unicode handles easily. That it isn't a more complicated graphic script suggests Unicode will probably be the best choice for Vietnamese as well.
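
As a small illustration of the "Latin letters plus marks" description, the following Python sketch uses the standard `unicodedata` module to decompose Vietnamese text into base letters and combining marks. The sample phrase and the helper name `strip_marks` are choices made for this example only.

```python
import unicodedata

# "Tiếng Việt" written with precomposed characters (illustrative sample).
VIET = "Ti\u1ebfng Vi\u1ec7t"


def strip_marks(text):
    """Decompose text (NFD) and drop the combining marks, keeping base letters."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    # Show the base letter and marks that make up a single Vietnamese vowel.
    for ch in unicodedata.normalize("NFD", "\u1ec7"):
        print(f"U+{ord(ch):04X}", unicodedata.name(ch))
    print(strip_marks(VIET))
```

One caveat: the letter đ has no canonical decomposition, so it survives this kind of stripping unchanged.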

Of course, the best way would be to ask local Thai and Vietnamese users, who might be able to advise on the preferred system in those countries.