Chinese text - charset

yllai

1:01 am on Jan 2, 2007 (gmt 0)

10+ Year Member



I want to display Chinese text on my website. I know that I have to set a meta tag. So, what is the most common charset for all browsers?

I have seen some web pages containing Chinese text that display it nicely, while others require the user to change the language encoding manually. Can this problem be solved? How? Where can I get more info about this?

Thanks...

bill

1:21 am on Jan 2, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The most standard Chinese charsets are:
  • GB2312 - Simplified Chinese (mainland)
  • Big5 - Traditional Chinese (Hong Kong, Taiwan)

Some people have reported successful use of UTF-8 as well, but keep in mind that you'll get the best compatibility with the aforementioned charsets.
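To make the difference between these charsets concrete, here is a minimal sketch in Python. The sample string and the helper names are assumptions chosen for illustration, not anything from this thread. It shows that the same two characters become different byte sequences in each charset, which is why a page read with the wrong encoding turns into garbage.

```python
# Illustrative only: SAMPLE and both helper names are hypothetical.
SAMPLE = "中文"  # two Chinese characters


def encoded_sizes(text, charsets=("gb2312", "big5", "utf-8")):
    """Return {charset: number of bytes} for the same text in each charset."""
    return {name: len(text.encode(name)) for name in charsets}


def misread(text, actual, assumed):
    """Encode with `actual`, then decode as if the bytes were `assumed`.

    This mimics a browser guessing the wrong encoding. Undecodable bytes
    are replaced rather than raising an error.
    """
    return text.encode(actual).decode(assumed, errors="replace")


if __name__ == "__main__":
    print(encoded_sizes(SAMPLE))
    print(misread(SAMPLE, "gb2312", "big5"))  # not the original text
```

The byte counts differ between the double-byte charsets and UTF-8, and the misread result no longer matches the original, which is the symptom yllai describes when a visitor has to switch the encoding by hand.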

yllai

9:10 am on Jan 2, 2007 (gmt 0)

10+ Year Member



Thanks for the advice.

commanderW

8:14 pm on Jan 2, 2007 (gmt 0)

10+ Year Member



Also - check out the Unihan database at unicode.org
[unicode.org...]

shutuzhe

9:10 am on Jan 11, 2007 (gmt 0)

10+ Year Member



GBK is better now

bill

2:46 am on Jan 12, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



shutuzhe wrote: "GBK is better now"

Better than which encoding? I can see the advantages as GBK is a superset of GB2312, but I've not seen it in use much.
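The superset relationship can be checked with a short sketch. This assumes Python's built-in `gb2312` and `gbk` codecs reflect the two standards closely enough for illustration; the sample characters and the `can_encode` helper are hypothetical choices of mine.

```python
def can_encode(text, charset):
    """Return True if every character in `text` exists in `charset`."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


COMMON = "中文"  # everyday characters, present in both charsets
RARE = "镕"      # a character often cited as missing from GB2312

if __name__ == "__main__":
    for charset in ("gb2312", "gbk"):
        print(charset, can_encode(COMMON, charset), can_encode(RARE, charset))
    # Text that fits in GB2312 produces the same bytes under GBK.
    print(COMMON.encode("gb2312") == COMMON.encode("gbk"))
```

So a page declared as GBK can carry anything a GB2312 page can, plus characters such as rarer name characters that GB2312 lacks.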

DamonHD

1:37 pm on Jan 12, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi Bill (and others),

Are there actually any hard numbers on what percentage of the browsers used by real Chinese native-language users, in the PRC, the rest of AsiaPac and elsewhere, actually support (and have decent fonts for):

UTF-8
Big-5
etc?

and what any trends are?

Would I get useful/accurate data by collecting HTTP headers?

I'm using UTF-8 for all languages in all countries, and would need to see real value in making a special case for my PRC server or users. My conversion rates there are still low, but I suspect UTF-8 browser support problems are only a small part of the issue!
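On the header-collection idea: one rough way to gather data would be to log the `Accept-Charset` request header where a browser sends one, and tally the preferences. The sketch below is an assumption about how that could be done in Python, with a hypothetical function name; it says nothing about installed fonts, and not every browser sends this header.

```python
def parse_accept_charset(header):
    """Parse an Accept-Charset value into (charset, q) pairs, best first.

    Items without an explicit q-value default to 1.0. Malformed q-values
    are treated as 0.0 rather than raising.
    """
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, params = item.partition(";")
        q = 1.0
        for param in params.split(";"):
            key, _, value = param.strip().partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((name.strip().lower(), q))
    # sorted() is stable, so ties keep the order the browser sent them in.
    return sorted(prefs, key=lambda pair: -pair[1])


if __name__ == "__main__":
    print(parse_accept_charset("GB2312,utf-8;q=0.7,*;q=0.7"))
```

Counting how often `utf-8` (or the `*` wildcard) appears across real requests would give at least a lower bound on declared UTF-8 support among visitors.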

Rgds

Damon

bill

3:13 pm on Jan 13, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The CNNIC site has some interesting reports available in English [cnnic.net.cn], but I didn't see that particular stat. Offhand I would say that there is a much larger portion of Chinese speakers who would use GB2312 rather than Big5.

However, if you've already got a working UTF-8 site, I don't know that you'd see much benefit in changing it over. You could invest that time and money in testing the UTF-8 site and making sure there are no issues for your audience.

inbound

3:25 pm on Jan 13, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There are 3 main things that you can do to help a browser use the correct character set:

1. Use the correct HTTP Header to set the character set for the document (very simple in PHP)

2. Declare the character set in the META tags of the document

3. (and often overlooked) Use <form accept-charset="?"> in forms to state that data should be in the given charset.

Using all 3 may be a little over the top but it often makes life easier in the long run (what if your Header and META charsets are different and browsers change to being more strict?)
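The three points above can be sketched together. This is not PHP and not anyone's production code; it is a minimal Python illustration, with a hypothetical `build_response` helper and a made-up `/submit` form target, of driving the header, the META tag and the form attribute from one setting so they cannot disagree.

```python
from html import escape

CHARSET = "utf-8"  # single source of truth for all three declarations


def build_response(body_text, charset=CHARSET):
    """Return (headers, payload) with the charset declared three ways."""
    html = (
        "<html><head>\n"
        # (2) META declaration inside the document
        f'<meta http-equiv="Content-Type" content="text/html; charset={charset}">\n'
        "</head><body>\n"
        # (3) ask the browser to submit form data in the same charset
        f'<form action="/submit" method="post" accept-charset="{charset}">\n'
        f"<p>{escape(body_text)}</p>\n"
        '<input type="text" name="q"><input type="submit">\n'
        "</form>\n"
        "</body></html>\n"
    )
    payload = html.encode(charset)  # the bytes must match what is declared
    headers = {
        # (1) HTTP header declaration
        "Content-Type": f"text/html; charset={charset}",
        "Content-Length": str(len(payload)),
    }
    return headers, payload


if __name__ == "__main__":
    headers, payload = build_response("中文")
    print(headers)
```

Because the page bytes are encoded with the same value that is declared, switching the whole site from, say, GB2312 to UTF-8 becomes a one-line change.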

Also, you should be aware that handling data from different (or potentially different) charsets can be problematic in PHP. UTF-8 causes many issues with PHP4 (as will other multi-byte charsets) because its string functions work at the byte level. I don't know about other languages, as PHP is the one we use (as we don't have many such issues now).
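The byte-level problem is easy to demonstrate outside PHP as well. The following Python sketch (sample text and helper name are my own assumptions) treats the encoded page as raw bytes, the way a byte-oriented string function would, and shows what goes wrong when you count or cut it.

```python
TEXT = "中文"
RAW = TEXT.encode("utf-8")  # what a byte-oriented string function sees


def safe_truncate(data, max_bytes, charset="utf-8"):
    """Cut to at most `max_bytes` and drop a trailing partial character.

    errors="ignore" discards bytes that do not form a complete character,
    so the result is always valid text. Note that it would also silently
    drop any other invalid bytes in the input.
    """
    return data[:max_bytes].decode(charset, errors="ignore")


if __name__ == "__main__":
    print(len(TEXT), "characters,", len(RAW), "bytes")
    try:
        RAW[:4].decode("utf-8")  # cuts the second character in half
    except UnicodeDecodeError as exc:
        print("naive byte cut broke the text:", exc)
    print(safe_truncate(RAW, 4))
```

A length check or substring taken at the byte level can therefore report the wrong size or split a character, which is the kind of issue described for multi-byte charsets.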

DamonHD

7:12 pm on Jan 13, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi,

Thanks for that. I'm doing (1), I'm not doing (2), and (3) isn't applicable (I have no forms).

Rgds

Damon