I have a non-commercial .com site hosted in the UK. It uses XHTML 1.0 Transitional, with charset ISO-8859-1. I am proposing to make a duplicate of part of the site which should be readable to the majority of Chinese users and also Asian search engines. The pages contain mainly images and very little text, so translation is not an issue. The new pages will be written in Mandarin using Simplified Characters.
My question is, what encoding is best? And how do I implement it?
The &#xxxxx; codes are the easiest because I can create them with Word 2000. They seem to render correctly in all the browsers I have tried, no matter what encoding I set on the page. However, I am based in the UK, so I don't know if this would be the case in Asia.
If this is the way to go, what do I set my encoding to?
The obvious alternative is GB2312. But if this is the way to go, is there an easy method of conversion from characters produced in Word 2000? Do I also have to save the pages in any special way? (They are all .php)
If GB2312 is the way to go, what do I declare in my pages? I have tried various combinations, but nothing seems to automatically set my browser to the correct encoding. My current header is:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"../DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
I realise I will have to change the charset, but I am unclear about the "en". In the HTML statement, do they both get replaced by zh?
Also is it necessary to include an XML declaration, such as:
<?xml version="1.0" encoding="gb2312"?>
Thanks in advance
Harry
Thanks for the reply. This is the first time I have ever had to become involved in encoding, so I am still lost.
I understand that &#xxxxx; codes are ASCII codes. They render OK if I have my charset=iso-8859-1. They also look OK if I have charset=gb2312, but mine is all Western software and I don't know if these fonts are normally available on Chinese PCs. As I am only just starting on writing the Chinese pages, I want to make sure I get all encoding issues sorted out before I get too far.
From what you say, they should render OK on Japanese PCs. As a side issue, can Japanese speakers read Chinese simplified characters as Kanji, or only the traditional Chinese versions? In fact, do many people read Kanji these days?
Harry
My Chinese site isn't XHTML yet, but here's what I've used:
<html lang="zh">
<head>
<meta http-equiv="content-type" content="text/html;charset=gb2312">
<meta http-equiv="content-language" content="zh">
Your current header declares the page as English with a Western European encoding. If you're going to use Chinese text on the page, you'll have to change that. Maybe one of our Chinese members could tell us the proper way to do this for XHTML.
Lots of questions, but I will do my best.
I understand that &#xxxxx; codes are ASCII codes. They render OK if I have my charset=iso-8859-1. They also look OK if I have charset=gb2312, but mine is all Western software and I don't know if these fonts are normally available on Chinese PCs. As I am only just starting on writing the Chinese pages, I want to make sure I get all encoding issues sorted out before I get too far.
From what you say, they should render OK on Japanese PCs.
As a side issue, can Japanese speakers read Chinese simplified characters as Kanji, or only the traditional Chinese versions?
In fact do many people read Kanji these days?
Looking at the source of this page [linguaitaliana.com] can help you understand how these codes would work in a web page made with a simple ASCII editor.
The only question remaining is should I go with GB2312 encoding or HTML numeric? From just looking at what's out there, it seems GB2312 would be the way to go, but I would welcome any comments before I start the hard stuff - getting to grips with all the pinyin I learnt years ago, and which is now very, very rusty.
Incidentally, does anyone know of a good on-line Chinese glossary or source of internet terms?
Harry
The only question remaining is should I go with GB2312 encoding or HTML numeric?
The good thing about using the numeric codes is that it is easier to maintain the code in an ASCII editor. If you happen to save a GB2312 encoded file in the wrong format, some automatic conversion could cause problems you can only see if you can read Chinese.
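For anyone who wants to see the two options side by side, here is a minimal Python sketch (Python is only an assumption here, nothing in this thread depends on it). It converts a short sample string into numeric character references and into GB2312 bytes, and then shows roughly what a wrong "save as" looks like when GB2312 bytes are read back as ISO-8859-1. The sample string is purely illustrative.

```python
# Sketch only: the sample text is an illustrative assumption.
text = "\u4e2d\u6587"  # the two characters for "Chinese (language)"

# Option 1: numeric character references. The result is pure ASCII,
# so it survives any ASCII editor and any declared charset.
numeric = text.encode("ascii", "xmlcharrefreplace").decode("ascii")

# Option 2: raw GB2312 bytes (two bytes per Chinese character).
gb_bytes = text.encode("gb2312")

# The hazard: the same bytes misread as ISO-8859-1 turn into
# accented Latin letters, which looks "valid" to a Western editor.
mojibake = gb_bytes.decode("iso-8859-1")

print(numeric)         # &#20013;&#25991;
print(gb_bytes)        # b'\xd6\xd0\xce\xc4'
print(repr(mojibake))
```

The numeric form can be pasted straight into a page whatever its declared charset, while the GB2312 form is only correct if the file is saved and declared as GB2312.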
OTOH, some special browsers (e.g. those embedded in a mobile phone or specially made for blind people) could have problems with something as unusual as text written in these numeric codes. The same potential problem applies to niche search engines. The numeric codes also use more bandwidth, but that's no problem in your case since you wrote "the pages contain mainly images and very little text".
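To put a rough number on the bandwidth point, the following Python sketch compares the byte count of a short string written as numeric references with the same string in GB2312. The function name and the sample string are assumptions made up for this illustration.

```python
def encoded_sizes(text):
    """Return (size as numeric references, size as GB2312), in bytes."""
    as_refs = text.encode("ascii", "xmlcharrefreplace")
    as_gb = text.encode("gb2312")
    return len(as_refs), len(as_gb)

# Four common simplified characters ("Chinese website"), illustrative only.
sample = "\u4e2d\u6587\u7f51\u7ad9"
refs_size, gb_size = encoded_sizes(sample)
print(refs_size, gb_size)  # 32 8
```

Each character costs eight bytes as a five-digit reference against two bytes in GB2312, which only matters on text-heavy pages.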
<?xml version="1.0" encoding="GB2312"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="zh" lang="zh">
<head>
<meta http-equiv="Content-Language" content="zh" />
<meta http-equiv="Content-Type" content="text/html; charset=GB2312" />
However, this may not be 100% foolproof. I still have one problem which I have raised on the Apache forum, but so far no reply.
I created a test GB2312 page and placed it on my live server. All browsers automatically switch encoding correctly between this and my other pages.
However only Opera switches encoding automatically when accessing similar pages on my local host test server. IE and Mozilla both remain set to Western European, although the pages render OK if I set the encoding manually. I am no expert on Apache and as I installed the local host server myself, I suspect the problem is something missing in the config file.
As I use the local host server to develop the pages, this is a bit of a pain. :(
Harry
HTTP/1.1 200 OK
Date: Sat, 21 Feb 2004 13:32:48 GMT
Server: Apache/1.3.26 (Unix) PHP/4.3.4 mod_perl/1.27 mod_ssl/2.8.10 OpenSSL/0.9.6a
X-Powered-By: PHP/4.3.4
Connection: close
Content-Type: text/html
Unfortunately I can't use the utility on the identical page served by my localhost:8080/. Perhaps it's possible, but I don't know how.
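For the localhost case, a small script can stand in for an online header checker. The Python sketch below is only an assumption about how one might do it: it pulls the charset parameter out of a Content-Type value, and includes a helper that sends a HEAD request to a local server. The host, port and path in the commented-out call are placeholders. A charset sent in the HTTP header normally takes priority over the meta tag, so comparing this value between the two servers may show where they differ.

```python
import http.client

def charset_from_content_type(value):
    """Return the charset parameter of a Content-Type value, or None."""
    for part in value.split(";")[1:]:
        name, _, val = part.strip().partition("=")
        if name.strip().lower() == "charset":
            return val.strip().strip('"') or None
    return None

def fetch_content_type(host, port, path="/"):
    """Send a HEAD request and return the raw Content-Type header (or None)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().getheader("Content-Type")
    finally:
        conn.close()

# The live-server dump above has no charset parameter in Content-Type:
print(charset_from_content_type("text/html"))                  # None
print(charset_from_content_type("text/html; charset=GB2312"))  # GB2312

# Against a local test server (host, port and path are placeholders):
# print(fetch_content_type("localhost", 8080, "/test.php"))
```

If the local server reports a charset other than GB2312, the server configuration rather than the page markup would be the first place to look.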
I have added some more details at the Apache forum where I also raised the problem. I suspect it is an Apache problem.
[webmasterworld.com...]
Thanks for your time, takagi. If you want to look at the test pages, I could sticky you the url.
Harry