
Forum Moderators: bill


Gb2312

I am confused!

     
10:19 am on Dec 24, 2007 (gmt 0)

New User

10+ Year Member

joined:Dec 24, 2007
posts: 4
votes: 0


Greetings from a non-Chinese in China.
While surfing I found a thread from back in 2004 about different language codes in China. I THINK this is the board, so please don't nail me up if I have the wrong one.
I also need to make it clear I am NOT a designer or coder, I'm the other end of the horse, but I need to do this out of desperation, so please reply as if I were a 3 year old!
OK, I have a web site, built from a template which I drastically changed. Surprisingly it runs well, ...in English.
It needs to be bilingual with Chinese [mainland], so I set the charset to Unicode UTF-8.
I code in Linux using Opera and it looks wonderful!
But so many people cannot read it that I was advised to use GB2312, so I switched. The problem is that then NO ONE can read it, not even me on my system; it looks Chinese but is actually garbage. Switch back to Unicode and some can.
I have been over and over it, as I thought at first I had mistyped [not my strong point!] or missed a code or < etc. somewhere, but it is fine.
Now I am so confused I see myself coming back. Can any kind soul please put me out of my misery?
Why can it be read by some in UTF, including all my office machines, yet look like garbage to most other Chinese users, but when I reset the charset it can't be read by anyone? Actually, the reason isn't so important, I just need to know what to do.
I have asked other Chinese designers, including the guy who does a lot of work for us who all say it "LOOKS" ok, but they have no idea.
I did have the lang code in the HEAD details, I was told to put it in body, I did, no difference.
Currently it is coded thus:
<html lang="zh">
<meta http-equiv="Content-language" content="zh" />
<meta http-equiv="Content-Type" content="text/html; charset=gb2312" />
Oh, it's Christmas eve, Guess what I want for Christmas?
Ho ho ho
Thanks
peter
2:58 pm on Dec 24, 2007 (gmt 0)

Administrator from JP 

WebmasterWorld Administrator bill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Oct 12, 2000
posts:15136
votes: 167


Welcome to WebmasterWorld, wordperfect.

Maybe I've had too much Christmas cheer...but it seems that you have a GB2312 page and perhaps too much UTF8 data.

GB2312 is the recommended format for Chinese sites. It's the most compatible with the widest variety of Chinese browsers.

3:34 pm on Dec 24, 2007 (gmt 0)

New User

10+ Year Member

joined:Dec 24, 2007
posts: 4
votes: 0


Hello Bill, Merry Christmas.... enjoy the cheer!
Thanks for your comment, sorry, but maybe I have not had ENOUGH xmas cheer... not sure I understand.
UTF data?
4:36 pm on Dec 24, 2007 (gmt 0)

Administrator from JP 

WebmasterWorld Administrator bill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Oct 12, 2000
posts:15136
votes: 167


UTF-8 and GB2312 aren't exactly the same. ;)
7:26 pm on Dec 24, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Oct 21, 2002
posts:1051
votes: 0


You don't say if your Chinese text is entered directly on the page or from a CMS system. Either way the text on the page has to be encoded in the same encoding you specify in the meta tag with "charset=". If someone has created the Chinese text in an editor, then you need to convert it before pasting it into the page.
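
To make the point above concrete, here is a minimal Python sketch of what happens when the bytes on the page and the declared charset disagree. The sample text is made up purely for illustration; it is not from the site in question.

```python
# Sample Chinese text, chosen only for illustration.
text = "中文网站"

# The same four characters produce different bytes in each encoding.
utf8_bytes = text.encode("utf-8")    # 3 bytes per character here
gb_bytes = text.encode("gb2312")     # 2 bytes per character

print(len(utf8_bytes), len(gb_bytes))  # 12 8

# Bytes read back with the matching encoding round-trip cleanly.
assert utf8_bytes.decode("utf-8") == text
assert gb_bytes.decode("gb2312") == text

# UTF-8 bytes interpreted as GB2312 (what a browser does when the meta tag
# says gb2312 but the file was saved as UTF-8) do not give the text back.
garbled = utf8_bytes.decode("gb2312", errors="replace")
print(garbled != text)  # True
```

Changing only the "charset=" value in the meta tag changes how the browser reads the bytes; it does not change the bytes themselves.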

You can get convertors which do this, typically by pasting the original text into the convertor and then copying out the result. I use NJStar Word Processor to create the text and this has a facility for copying directly into GB2312 or Unified-simplified. (Plus also Big5 and Unicode-traditional.)

One point to note is that Chinese uses double-byte characters. So if a single byte is missing from the start of the text the result can look like Chinese but is actually garbage.
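
The missing-byte effect can be sketched in a few lines of Python. Again the sample text is only an assumption for illustration, and the exact garbage you see will depend on the text.

```python
# Sample text for illustration only.
text = "中文网站"
gb_bytes = text.encode("gb2312")   # two bytes per Chinese character

# Simulate losing the first byte, e.g. through a bad copy or paste.
shifted = gb_bytes[1:]

# Every remaining byte pair is now misaligned, so the decoder pairs the
# second byte of one character with the first byte of the next.
misread = shifted.decode("gb2312", errors="replace")

print(misread)           # other characters and/or replacement marks
print(misread != text)   # True
```

Because the pairs are shifted rather than destroyed, some of them still happen to be valid GB2312 codes, which is why the result can look like Chinese to a non-reader.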

10:20 pm on Dec 24, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 27, 2001
posts:1472
votes: 0


Check the server headers for content-type too.
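
One possible way to check this from Python's standard library is sketched below. The function names are my own, and any URL passed to server_charset is whatever page you want to test. If the server sends a charset in the Content-Type header, browsers give it priority over the meta tag in the page.

```python
from email.message import Message
from urllib.request import urlopen


def charset_from_content_type(header_value):
    """Return the lowercased charset named in a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


def server_charset(url):
    """Fetch a page and report the charset in its Content-Type response header."""
    with urlopen(url) as response:
        return charset_from_content_type(response.headers.get("Content-Type", ""))


# Offline examples of the header parsing:
print(charset_from_content_type("text/html; charset=GB2312"))  # gb2312
print(charset_from_content_type("text/html"))                  # None
```

If server_charset reports something other than the charset in your meta tag, the server configuration is overriding the page.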
1:55 am on Dec 25, 2007 (gmt 0)

New User

10+ Year Member

joined:Dec 7, 2007
posts: 5
votes: 0


GB2312 is simplified Chinese
Big5 is traditional Chinese
3:35 am on Dec 25, 2007 (gmt 0)

New User

10+ Year Member

joined:Dec 24, 2007
posts: 4
votes: 0


Cheers for that everyone. Yep, I am au fait with the different codings, and GB2312 vs UTF-8.
I need the GB2312 code as it is to be read by Mainland Chinese, not HK or Taiwan.
I had originally set it to UTF, my system reads that, being English, as do others but the majority are set to GB2312.
So I altered the charset code, thinking it was that simple.
However, reality bites hard.
Thanks HarryM, maybe you are closer to the problem, although I need to research to find out exactly what you mean!
I use Gedit on Linux set to Chinese but I am not sure which flavour was used as a Chinese native typed the info.
I then loaded that HTML page directly to my site via the ftp.
So I need to dig and experiment a bit to see what gives, the code or my sanity!
Again, ta for your help all.
Peter
5:24 am on Jan 8, 2008 (gmt 0)

New User

10+ Year Member

joined:Dec 24, 2007
posts: 4
votes: 0


Greetings,

First, thanks to everyone who replied but a special big thank you to those who took the time to read and think about what my real problem was and offer a practical solution.
I put this on several sites and got a lot of help.
Actually HarryM on this board was on the right track.
It was from his comments that I finally found the answer.
At the risk of going on I will detail what happened in case anyone else has the problem in the future.

The original English was typed in Gedit on Linux, using charset ISO 8859-1.
The Chinese was typed in Chinese, in Chinese Word on a native Chinese Win XP O/S, then transferred to me via flash disk.
The Chinese site charset was originally UTF-8.
When it became apparent that the majority of users could not read the Chinese I switched the code to GB2312 but that did not cure the problem.
From here a number of Chinese designers offered help, but could not solve the problem, some even rewrote the text again, and again transferred it to me.
The closest was a suggestion that I needed to put an instruction in the meta code that the body text was GB2312, but as to how, no one was sure.
This is actually the key, but I missed that sign post.
In the end the solution was to use "SAVE AS" when editing and select the GB2312 encoding, easy in Linux.
On Win 2003 Server it was suggested to use Star WP, which proved a no go for me because when I tried to convert, the text became a series of ? marks, making editing impossible, and it also uploaded to my site via FTP as ? marks.
Not the look I wanted.
In the end I downloaded a free to trial [sorry!] copy of EC character encoding software which was ridiculously simple and effective.
I recommend this and will buy it if I ever have to do another Chinese site and need to use Windows.
The only hassle, if you can call it that, is that once it is converted, either by Linux or by the EC converter, the scribble LOOKS Chinese but is actually garbage in the editor, BUT, miraculously, displays as real Putonghua on the site.
So any subsequent editing means a reconvert back to UTF or Big5, then a convert back to GB2312 before uploading. There may, surely must be, a better way, but I was just happy after 6 weeks to have found a solution.
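
One possible way to avoid the convert-back-and-forth cycle is to keep an editable master copy in UTF-8 and generate the GB2312 upload from it. The following is only a rough Python sketch of that idea; the function names and file paths are hypothetical, and it assumes the master's meta tag declares charset=utf-8.

```python
import re
from pathlib import Path


def to_gb2312_page(utf8_html: str) -> bytes:
    """Rewrite the charset declaration and encode the page as GB2312."""
    declared = re.sub(r"charset=utf-?8", "charset=gb2312",
                      utf8_html, flags=re.IGNORECASE)
    # Raises UnicodeEncodeError if a character has no GB2312 code,
    # rather than silently writing ? marks.
    return declared.encode("gb2312")


def convert_file(master_path, output_path):
    """Read the UTF-8 master and write the GB2312 copy for upload."""
    html = Path(master_path).read_text(encoding="utf-8")
    Path(output_path).write_bytes(to_gb2312_page(html))


# Example with an in-memory page (sample content for illustration only):
page = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" /><p>中文</p>'
converted = to_gb2312_page(page)
print(b"charset=gb2312" in converted)  # True
```

Edits are then always made to the UTF-8 master, which displays properly in the editor, and only the generated file is uploaded.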
Maybe someone far more clever than me can add to this for future users.
What did puzzle me for a while was why both the original Chinese typed script, produced on a Chinese machine, and the copy made by a Chinese designer [which I forgot to say ran faultlessly on his site when he trialled it] collapsed when I ran them up on mine.
The only thing I can think of is that at the time of saving to flash disk it was fine, but during the loading to my English O/S it was converted to Big5 and from here it all went wrong.
Again, this is just a guess.
So, the answer is that as well as having charset gb2312 stated in the meta code, one must also have the body text saved in GB2312, otherwise, chaos.
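
A quick way to test that rule before uploading is sketched below in Python. The function names are hypothetical, and the check is only a rough one: it finds the first "charset=" declaration in the raw file and tries to decode the whole file with it.

```python
import re


def declared_charset(raw: bytes):
    """Return the first charset declared in the raw page bytes, or None."""
    match = re.search(rb"""charset=["']?([A-Za-z0-9_-]+)""", raw,
                      flags=re.IGNORECASE)
    return match.group(1).decode("ascii").lower() if match else None


def body_matches_declaration(raw: bytes) -> bool:
    """True if the page bytes decode cleanly with the declared charset."""
    charset = declared_charset(raw)
    if charset is None:
        return False
    try:
        raw.decode(charset)
    except (UnicodeDecodeError, LookupError):
        return False
    return True


# Sample pages for illustration only.
meta = '<meta http-equiv="Content-Type" content="text/html; charset=gb2312" />'
good_page = (meta + "中文网站").encode("gb2312")   # declared and saved as GB2312
bad_page = (meta + "中文网站").encode("utf-8")     # declared GB2312, saved as UTF-8

print(body_matches_declaration(good_page))  # True
print(body_matches_declaration(bad_page))   # False
```

A False result means the declaration and the saved bytes disagree. A True result is not proof of a correct page, since wrongly encoded bytes can occasionally decode without error, but it catches the mismatch described in this thread.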
And don't try to do it at 3 AM after 6 weeks of hassle when the mind is fuzzy and convert EVERYTHING on the page to GB2312 as I first did, then wonder why the bloody page is in Chinese OK but won't display [can I say the B... word?]
The HTML code doesn't like to be set in GB2312!
Once I worked that out it was, relatively, plain sailing.
So again, cheers to all.
Happy New Year!