My website is in English, but aimed at a world audience. So do you think this is the best charset to use?
I notice that w3.org and other validator sites use it, but a lot of the major news websites do not. Any thoughts?
Googly
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> [w3.org] since it is the recommended encoding [ietf.org] for internet documents by the IETF. However, if you are concerned about full browser support, you might want to stick with ISO-8859-1. More information on this subject can be found at the W3C's Web Internationalization Resources [w3.org] page, including the draft W3C Character Model [w3.org]. An individual at the University of Waikato [webteam.waikato.ac.nz] has also done some research into this subject.
The reason I ask is that, as I understand it, that encoding would be a 7-bit ASCII character set, which is pretty restrictive in an internationalization sense.
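To make the 7-bit limitation concrete, here is a minimal Python sketch. The sample word and the helper name `can_encode` are my own, purely for illustration. It checks whether a word with one accented letter can be represented in three of the charsets discussed in this thread.

```python
# Illustrative sample; any string with a non-ASCII character behaves the same way.
sample = "café"


def can_encode(text, charset):
    """Return True if every character in text exists in the given charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


# us-ascii only covers code points 0-127, so the accented letter does not fit.
results = {name: can_encode(sample, name) for name in ("us-ascii", "iso-8859-1", "utf-8")}
print(results)
```

If I have this right, only us-ascii rejects the word, which is the restriction in question.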
I just found what appears to be another good tutorial on the subject at Webreference.com [webreference.com].
Oh, and here's another interesting internationalization site: i18nGurus.com [i18ngurus.com]. Google-fu! Hai ya! ;)
In short, if you are writing strictly in ASCII English with no special characters, I don't think it matters too much what encoding you use, as long as you use one. However, I'd go for the defaults for the languages in question, which were ISO-8859-1 for HTML up to 3.2, and Unicode for HTML 4 / XHTML 1+. But don't take my word for it. Do your research. I'm no expert by any means, and I could be wrong. ;)
<edited>In fact, I am wrong on one point. HTML 4.01 explicitly states that there is no default character encoding [w3.org]. However, in both XHTML 1.0 [w3.org] and XHTML 1.1 (Module-Based XHTML) [w3.org] it is mentioned that the default character encoding is UTF-8 or UTF-16.</edited>
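The point above about plain ASCII text not caring much which encoding you pick can be checked with a short Python sketch. The sample sentence is made up for the test. It encodes the same ASCII-only string with several charsets and compares the resulting bytes.

```python
# Made-up sample containing only characters from the 7-bit ASCII range.
sample = "Plain English text, no special characters."

encodings = ("us-ascii", "iso-8859-1", "windows-1252", "utf-8")
encoded = {name: sample.encode(name) for name in encodings}

# A set keeps only distinct byte strings, so one entry means all outputs match.
all_identical = len(set(encoded.values())) == 1
print(all_identical)

# UTF-16 is not byte-compatible with ASCII, so it is compared separately.
utf16_differs = sample.encode("utf-16") != sample.encode("utf-8")
print(utf16_differs)
```

UTF-16 is the exception here because it uses at least two bytes per character, so the declared charset does matter for it even with pure ASCII text.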
For XHTML and XML pages, <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> is more in order. It may also be worth using for HTML 4.01 pages, but I think it can sometimes cause problems in Netscape 4, and may not work in earlier versions at all.
Don't forget to also include a !DOCTYPE declaration as the very first line of the file.
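As a rough sketch of putting the two pieces together, the Python below builds a small page with the DOCTYPE on the first line and the meta tag in the head, then encodes the bytes with the same charset the tag declares. The function name `build_page` and the sample text are hypothetical, and the HTML 4.01 Strict DOCTYPE is just one possible choice.

```python
import html

# HTML 4.01 Strict is assumed here; substitute whichever DOCTYPE the page uses.
DOCTYPE = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
           '"http://www.w3.org/TR/html4/strict.dtd">')


def build_page(title, paragraph, charset="UTF-8"):
    """Return page bytes whose actual encoding matches the declared charset."""
    markup = "\n".join([
        DOCTYPE,  # very first line of the file
        "<html>",
        "<head>",
        f'<meta http-equiv="Content-Type" content="text/html; charset={charset}">',
        f"<title>{html.escape(title)}</title>",
        "</head>",
        "<body>",
        f"<p>{html.escape(paragraph)}</p>",
        "</body>",
        "</html>",
        "",
    ])
    # Encode with the same charset the meta tag announces.
    return markup.encode(charset)


page = build_page("Charset test", "Bullet \u00b7 and bar \u00a6 survive the round trip.")
print(page.decode("utf-8"))
```

The main design point is that the charset name is used twice, once in the tag and once for the actual encoding, so the label and the bytes cannot drift apart.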
But the validator says:
"Unless another character encoding is required, it is encouraged that the character set "us-ascii" be used and the name "us-ascii" be used to specify this character set. This is the most commonly used character set on the Internet."
W3C Internationalization [w3.org]
I've been reading a whole bunch since Googly started this thread.
Not fully understanding charsets 48 hours ago, I did a global find and replace on one site and switched everything to UTF-8. What a mistake that was. I had to call my host and restore a backup from the evening before. The site originally used the windows-1252 charset, which I typically use on all of my pages. All those little characters I've been using, like ¦ (broken bar) and · (small bullet), turned into question marks, along with a few others.
I'm sticking with windows-1252 for now, as it appears to be working everywhere as far as I know. I was about ready to do a global find and replace and change to iso-8859-1, but I'm not too sure I want to do that yet! ;)
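My guess at what happened in the find-and-replace story above, shown as a small Python sketch: the charset label changed but the bytes in the files stayed windows-1252. That diagnosis is an assumption on my part, and the sample navigation text is invented.

```python
# Invented sample using the two characters mentioned in the post.
original = "Home \u00a6 About \u00b7 Contact"

# Bytes as an editor saving in windows-1252 would write them to disk.
stored = original.encode("windows-1252")

# Changing only the declared charset makes the reader decode those bytes as UTF-8.
# The lone bytes 0xA6 and 0xB7 are invalid UTF-8 and become replacement characters.
mislabeled = stored.decode("utf-8", errors="replace")

# Converting the bytes themselves, and only then declaring UTF-8, keeps the text intact.
converted = stored.decode("windows-1252").encode("utf-8")
restored = converted.decode("utf-8")

print(mislabeled)
print(restored)
```

So switching to UTF-8 should be workable as long as the files are actually re-saved or converted to UTF-8 at the same time the meta tag is changed.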
utf-8 (Unicode, worldwide)
utf-16 (Unicode, worldwide)
iso-8859-1 (Western Europe)
iso-8859-2 (Central Europe)
iso-8859-3 (Southern Europe)
iso-8859-4 (Baltic Rim)
iso-8859-5 (Cyrillic)
iso-8859-6-i (Arabic)
iso-8859-7 (Greek)
iso-8859-8-i (Hebrew)
iso-8859-9 (Turkish)
iso-8859-10 (Latin 6)
iso-8859-13 (Latin 7)
iso-8859-14 (Celtic)
iso-8859-15 (Latin 9)
us-ascii (basic English)
euc-jp (Japanese, Unix)
shift_jis (Japanese, Win/Mac)
iso-2022-jp (Japanese, email)
euc-kr (Korean)
gb2312 (Chinese, simplified)
gb18030 (Chinese, simplified)
big5 (Chinese, traditional)
tis-620 (Thai)
koi8-r (Russian)
koi8-u (Ukrainian)
macintosh (MacRoman)
windows-1250 (Central Europe)
windows-1251 (Cyrillic)
windows-1252 (Western Europe)
windows-1253 (Greek)
windows-1254 (Turkish)
windows-1255 (Hebrew)
windows-1256 (Arabic)
windows-1257 (Baltic Rim)
The us-ascii charset covers only basic English, so I would stay away from it unless you are appealing strictly to an English-speaking audience in a basic sort of way. ;)
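To show why the choice from the list above matters, here is a minimal Python sketch that decodes one and the same byte under a few of those charsets. The byte value 0xE9 is an arbitrary pick of mine, and the charset names are spelled as Python's codecs accept them.

```python
# One arbitrary byte from the upper half of the 8-bit range.
raw = bytes([0xE9])

# A handful of the single-byte charsets from the list.
charsets = ("iso-8859-1", "iso-8859-5", "iso-8859-7", "windows-1252", "koi8-r")

# Map each charset name to the character it assigns to that byte.
decoded = {name: raw.decode(name) for name in charsets}

for name, char in decoded.items():
    print(f"{name:>12}: {char} (U+{ord(char):04X})")
```

The same byte comes out as a Latin, Cyrillic, or Greek letter depending on the label, which is why a page needs to state its charset and state it correctly.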