Forum Moderators: bill

Unicode vs legacy charsets

4:15 pm on Nov 13, 2009 (gmt 0)

Preferred Member

10+ Year Member

joined:Dec 7, 2005
votes: 0

I'm in the planning stages of some website translations, but I've hit a patch of confusion. Take Thai, for example, one of the languages I'm going to be using...

The Thai-specific charset is TIS-620, but if I mix languages on the page, should I use UTF-8? Which is the more efficient use of data? Can someone put together a short idiot's guide for me? Why can't I just use UTF-8 for everything?

2:42 am on Nov 14, 2009 (gmt 0)

Administrator from JP 

bill: WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member, Top Contributors of the Month

joined:Oct 12, 2000
votes: 167

Thai is one of those languages I need to play with someday. That's a market I'm not experienced with yet. A few years ago I would have said that the native charset would have been preferable, but now when I look at the Thai versions of sites like Yahoo or Google they're all using UTF-8. They use multiple languages on the page, but they don't mark them up the way the W3C recommends.
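To put rough numbers on the "efficient use of data" question, here is a minimal Python sketch rather than a benchmark. The sample strings and the `encoded_size` helper are my own illustration, not taken from any of the sites mentioned. It shows the trade-off: TIS-620 stores each Thai character in one byte, UTF-8 needs three, but TIS-620 cannot represent text from other scripts at all.

```python
# Compare how many bytes the same text needs in TIS-620 and in UTF-8.
# The sample strings and helper name are illustrative assumptions.

thai = "สวัสดี"               # "hello" in Thai, 6 code points
mixed = thai + " こんにちは"   # the same Thai text followed by Japanese


def encoded_size(text, charset):
    """Return the byte length of text in charset, or None if the
    charset has no way to represent some character in text."""
    try:
        return len(text.encode(charset))
    except UnicodeEncodeError:
        return None


thai_tis = encoded_size(thai, "tis-620")     # 1 byte per Thai character
thai_utf8 = encoded_size(thai, "utf-8")      # 3 bytes per Thai character
mixed_tis = encoded_size(mixed, "tis-620")   # None: Japanese is not in TIS-620
mixed_utf8 = encoded_size(mixed, "utf-8")    # UTF-8 covers both scripts

print(thai_tis, thai_utf8, mixed_tis, mixed_utf8)
```

Keep in mind that HTML tags, attribute names, and URLs are ASCII and take one byte per character in both encodings, so the difference for a whole page is smaller than the threefold difference for the Thai text alone.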

This article on the W3C site shows what they consider to be the way to handle language tags in your headers and inline: Internationalization Best Practices: Specifying Language in XHTML & HTML Content [w3.org]

I think you can use UTF-8 for everything as long as you specify the language properly in your headers and inline if necessary.
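As a rough illustration of "UTF-8 for everything, with the language specified", the sketch below builds a small page in Python. The function names `render_page` and `lang_span` are hypothetical, and the markup is only my reading of the general approach: declare the main language on the `html` element, mark inline changes of language with a `lang` attribute, and declare the character encoding separately from the language.

```python
from html import escape


def lang_span(lang, text):
    """Wrap a run of text in a span that declares its language inline."""
    return f'<span lang="{escape(lang)}">{escape(text)}</span>'


def render_page(page_lang, title, body_html):
    """Build a minimal HTML page: the language goes on the html element,
    the character encoding goes in a separate meta declaration."""
    return (
        "<!DOCTYPE html>\n"
        f'<html lang="{escape(page_lang)}">\n'
        "<head>\n"
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n'
        f"<title>{escape(title)}</title>\n"
        "</head>\n"
        f"<body>\n{body_html}\n</body>\n"
        "</html>\n"
    )


# A Thai page that contains one Japanese phrase.
body = "<p>สวัสดี " + lang_span("ja", "こんにちは") + "</p>"
page = render_page("th", "ตัวอย่าง", body)

# The bytes sent to the browser must really be UTF-8 to match the declaration.
page_bytes = page.encode("utf-8")
```

The charset declaration only tells the browser how to decode the bytes; it says nothing about which language the text is in, which is why the `lang` attributes are still needed.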