GB2312, which in practice often means GBK, is better.
GBK contains most, but not all, of the characters in Big5.
I'll second GB2312
The GB part loosely stands for National Standard.
In the US you are familiar with ANSI standards, in Europe EN, in Britain BS, and so on.
What do you mean by "I'll second"? Do you prefer the second one?
I get a lot of people asking if our webmail client supports big5. I'm not sure if that means it's the most popular or most common, though.
|What do you mean by "I'll second"? Do you prefer the second one? |
That means, "if you want a second opinion" I would also recommend using GB2312. That's the one I see the most and the one that I've used. Sorry for the confusion.
GB2312 is for "Simplified Chinese", which is used by 1.3 billion people in mainland China. Big5 is for "Traditional Chinese", which is used by 30 million people in Taiwan. I believe there are some online tools to convert one to the other.
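For anyone who wants to check the simplified/traditional split for themselves, here is a minimal sketch using the codecs that ship with Python's standard library. The helper name `can_encode` is made up for this post, and the single sample character is only an illustration, not a survey of either charset.

```python
def can_encode(text, encoding):
    """Return True if every character in text exists in the given charset."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

simplified = "汉"   # simplified form of "han"
traditional = "漢"  # traditional form of the same character

# Print, per charset, whether it can hold the simplified and traditional form.
for name in ("gb2312", "gbk", "big5"):
    print(name, can_encode(simplified, name), can_encode(traditional, name))
```

GB2312 should accept only the simplified form, Big5 only the traditional form, and GBK both.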
So there is a direct mapping between those 2?
Or is there only a one-way mapping?
Logically, shouldn't there be a 100% 1:1 mapping from Big5 to the simplified version?
So it means "official".
There's no formula for converting between GBK and Big5; you need a mapping table.
Some characters created in HK/TW may not be included in the GBK charset,
and Big5 does not contain Simplified characters.
Most, but not all, characters can be mapped 1:1.
There are also some characters with the "same meaning" but a "different look".
GBK ~= GB2312 + Big5
So you may also need a map to convert "simplified" <-> "traditional" within GBK itself (GBK <-> GBK).
There is also GB18030, which includes more characters but is not as widely supported; it can be mapped to Unicode, which can be encoded as UTF-8.
I don't know much about Unicode, but if possible I'd recommend trying Unicode (UCS-2 or UTF-8) as the internal encoding, since it can be mapped to and from both GB2312 and Big5.
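The "Unicode as internal encoding" idea can be sketched roughly like this in Python, whose standard codecs already carry the mapping tables. The function name `transcode` and the sample string are assumptions for illustration; real input would come from a file or a request body.

```python
def transcode(data, src, dst, errors="strict"):
    """Decode bytes from src into a Unicode str, then encode that str into dst."""
    text = data.decode(src)               # bytes -> Unicode (the internal form)
    return text.encode(dst, errors=errors)

# Stand-in for bytes read from a Big5 document.
big5_bytes = "漢字".encode("big5")

# Big5 -> GBK works here because GBK also contains the traditional form.
gbk_bytes = transcode(big5_bytes, "big5", "gbk")

# Big5 -> UTF-8 always works, since UTF-8 can encode any Unicode character.
utf8_bytes = transcode(big5_bytes, "big5", "utf-8")

# GB2312 lacks the traditional character, so strict mode would raise;
# errors="replace" substitutes "?" for anything that cannot be encoded.
lossy = transcode(big5_bytes, "big5", "gb2312", errors="replace")
```

Note that this only changes the byte encoding. It does not turn traditional characters into simplified ones; that still needs a separate character map.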
|I know the GBs are Simplified Chinese. |
What does that mean? Are Latin characters used instead of Chinese characters?
Chinese written out in a romanized system (Latin characters) is called pinyin. Simplified Chinese was created in China in the 1950s. It still uses Chinese characters, but some of them have been simplified, as the name suggests.
|There's no formula for converting between GBK and Big5; you need a mapping table. Some characters created in HK/TW may not be included in the GBK charset, and Big5 does not contain Simplified characters.... |
Theoretically this may be true, but I think at least 99% of frequently used words are common to both, and the conversion is not hard.
There are online tools to do it. Search "Big5/GB convertor" on Google. mandarintools.com is a good place to start. Some time ago I noticed an online service that converts whole web pages on the fly between these two encodings, but I can no longer remember the website.
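To make the "most but not all characters map 1:1" point from earlier in the thread concrete, here is a rough sketch of a table-driven converter. The four-entry table is hypothetical and far too small for real use (actual converters carry thousands of entries), and the names `TRAD_TO_SIMP`, `SIMP_TO_TRAD`, and `to_simplified` are made up for this example.

```python
# Hypothetical, tiny mapping table; a real one has thousands of entries.
TRAD_TO_SIMP = {
    "漢": "汉",
    "語": "语",
    "發": "发",  # two different traditional characters ...
    "髮": "发",  # ... share one simplified form
}

def to_simplified(text):
    """Replace each traditional character that has an entry in the table."""
    return text.translate(str.maketrans(TRAD_TO_SIMP))

# Build the reverse direction: one simplified character may have
# several traditional candidates, so the values are lists.
SIMP_TO_TRAD = {}
for trad, simp in TRAD_TO_SIMP.items():
    SIMP_TO_TRAD.setdefault(simp, []).append(trad)

print(to_simplified("漢語"))
print(SIMP_TO_TRAD["发"])
```

Traditional to simplified is a plain lookup, but the reverse direction has to pick among candidates, which is why a character-by-character table alone cannot give a perfect round trip.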