<meta http.....charset=us-ascii">

Is this the best charset to use?

         

Googly

3:37 pm on Nov 27, 2002 (gmt 0)



I have recently been trying to improve my code so that it conforms to W3C standards. According to the validators, the preferred choice is now:
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">

My website is in English, but aimed at a world audience. So do you think this is the best charset to use?

I notice that w3.org and other validator sites use it, but a lot of the major news websites do not. Any thoughts?

Googly

moonbiter

4:40 pm on Nov 27, 2002 (gmt 0)



Personally, I favor
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
[w3.org] since UTF-8 is the encoding the IETF recommends [ietf.org] for Internet documents. However, if you are concerned about full browser support, you might want to stick with ISO-8859-1. More information on this subject can be found at the W3C's Web Internationalization Resources [w3.org] page, including the draft W3C Character Model [w3.org].

An individual at the University of Waikato [webteam.waikato.ac.nz] has also done some research into this subject.

Googly

5:01 pm on Nov 27, 2002 (gmt 0)



So I should ignore the validator's advice?

moonbiter

5:04 pm on Nov 27, 2002 (gmt 0)



Where/when is the validator giving you this advice?

I ask because, as I understand it, us-ascii is a 7-bit ASCII character set, which is pretty restrictive for internationalization.
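To make the 7-bit limit concrete, here is a minimal Python sketch. The helper name `fits_in_ascii` and the sample strings are made up for illustration; the only assumption is that Python's built-in `ascii` codec stands in for the us-ascii charset.

```python
# US-ASCII only defines code points 0-127, so anything beyond plain
# English letters, digits and basic punctuation cannot be encoded.
def fits_in_ascii(text: str) -> bool:
    """Return True if every character in text is representable in US-ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical sample strings, not taken from any real page.
print(fits_in_ascii("Hello, world"))       # plain English text
print(fits_in_ascii("caf\u00e9 \u00a35"))  # e-acute and pound sign
```

A page whose text fails this kind of check would need character references or a wider encoding.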

I just found what appears to be another good tutorial on the subject at Webreference.com [webreference.com].

Oh, and here's another interesting internationalization site: i18nGurus.com [i18ngurus.com]. Google-fu! Hai ya! ;)

In short, if you are writing strictly in ASCII English with no special characters, I don't think it matters too much what encoding you use, as long as you declare one. However, I'd go for the defaults for the languages in question, which were ISO-8859-1 for HTML up to 3.2, and Unicode for HTML 4 / XHTML 1+. But don't take my word for it. Do your research. I'm no expert by any means, and I could be wrong. ;)

<edited>In fact, I am wrong on one point. HTML 4.01 explicitly states that there is no default character encoding [w3.org]. However, in both XHTML 1.0 [w3.org] and XHTML 1.1 (Module-Based XHTML) [w3.org] it is mentioned that the default character encoding is UTF-8 or UTF-16.</edited>
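The point that the encoding hardly matters for pure ASCII text can be checked with a short Python sketch. The sample sentence is invented, and the codec names are Python's aliases for the charsets discussed in this thread. UTF-16 is left out on purpose, since it uses two bytes per character and would not match.

```python
# Hypothetical sample: English text using only ASCII characters.
text = "Plain English text, no special characters."

# Python codec names for the charsets discussed in this thread.
encodings = ["ascii", "iso-8859-1", "windows-1252", "utf-8"]

# Encode the same text under each charset and collect the raw bytes.
encoded = {name: text.encode(name) for name in encodings}

# If the set of distinct byte strings has one member, every charset
# produced exactly the same bytes for this text.
all_same = len(set(encoded.values())) == 1
print(all_same)
```

The byte sequences only start to diverge once the text contains characters above code point 127.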

g1smd

12:35 am on Nov 28, 2002 (gmt 0)



If using HTML 4.01, I would probably use <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">, as I believe it is better supported on non-Windows machines than encodings like Windows-1252. I also think the US-ASCII option is too limiting.
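As a rough illustration of where ISO-8859-1 and Windows-1252 differ, the Python sketch below tests a few characters against both codecs. The helper `encodable` and the chosen sample characters are assumptions for the example, not anything a validator reports.

```python
def encodable(char: str, encoding: str) -> bool:
    """Return True if char can be represented in the given encoding."""
    try:
        char.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Sample characters chosen for illustration only.
samples = {
    "euro sign": "\u20ac",
    "left curly quote": "\u201c",
    "bullet": "\u2022",
    "middle dot": "\u00b7",
}

# Compare each character against both charsets.
for label, char in samples.items():
    in_latin1 = encodable(char, "iso-8859-1")
    in_cp1252 = encodable(char, "windows-1252")
    print(f"{label}: iso-8859-1={in_latin1}, windows-1252={in_cp1252}")
```

Windows-1252 assigns printable characters to bytes that ISO-8859-1 reserves for control codes, so a page that uses those characters but is labelled ISO-8859-1 is relying on the browser to be forgiving.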

For XHTML and XML pages, <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> is more appropriate. You could also use it for HTML 4.01 pages, but I think it can sometimes cause problems in Netscape 4, and it may not work at all in earlier versions.

Don't forget to also include a !DOCTYPE declaration as the very first line of the file.

Googly

9:20 am on Nov 28, 2002 (gmt 0)



Well, moonbiter, I'm using CSE HTML Validator, and in my file I'm using
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">

but the validator says:
"Unless another character encoding is required, it is encouraged that the character set "us-ascii" be used and the name "us-ascii" be used to specify this character set. This is the most commonly used character set on the Internet."

g1smd

9:32 pm on Nov 28, 2002 (gmt 0)



That must be a quirk of the CSE validator, as I have never seen that as a recommendation elsewhere. I see no mention of this anywhere on W3C for example. Indeed most Netscape browsers use the ISO-8859-1 character set as the default.

pageoneresults

12:07 am on Nov 29, 2002 (gmt 0)



Languages, countries, and the charsets typically used for them...

W3C Internationalization [w3.org]

I've been reading a whole bunch since Googly started this thread.

Not fully understanding charsets 48 hours ago, I did a global find and replace on one site and switched everything to UTF-8. What a mistake that was. I had to call my host and restore a backup from the evening before. The site originally used the windows-1252 charset, which I typically use on all of my pages. All those little characters I've been using, like ¦ (broken bar) and · (small bullet), turned into question marks, along with a few others.
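The question marks are what you would expect if only the charset label changed while the files stayed saved as windows-1252, which is presumably what a find and replace on the meta tag does. A minimal Python sketch of that situation, using an invented navigation string:

```python
# Hypothetical navigation text using the two characters mentioned above.
original = "Home \u00a6 About \u00b7 Contact"  # U+00A6 broken bar, U+00B7 middle dot

# Bytes as an editor would save them in windows-1252.
raw_bytes = original.encode("windows-1252")

# What happens when the page is labelled charset=UTF-8 but the bytes are
# still windows-1252: the lone bytes 0xA6 and 0xB7 are not valid UTF-8,
# so each one becomes the replacement character U+FFFD.
misread = raw_bytes.decode("utf-8", errors="replace")
print(misread)

# Converting the bytes themselves, not just the label, avoids the problem.
utf8_bytes = raw_bytes.decode("windows-1252").encode("utf-8")
print(utf8_bytes.decode("utf-8") == original)
```

Browsers typically draw the replacement character as a question mark or a question mark in a diamond, which matches the symptom described.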

I'm sticking with windows-1252 for now, since as far as I know it works everywhere. I was about ready to do a global find and replace and change to iso-8859-1, but I'm not too sure I want to do that yet! ;)

pageoneresults

12:15 am on Nov 29, 2002 (gmt 0)



Here are the charsets as shown in the W3C Select Encoding dropdown menu...

utf-8 (Unicode, worldwide)
utf-16 (Unicode, worldwide)
iso-8859-1 (Western Europe)
iso-8859-2 (Central Europe)
iso-8859-3 (Southern Europe)
iso-8859-4 (Baltic Rim)
iso-8859-5 (Cyrillic)
iso-8859-6-i (Arabic)
iso-8859-7 (Greek)
iso-8859-8-i (Hebrew)
iso-8859-9 (Turkish)
iso-8859-10 (Latin 6)
iso-8859-13 (Latin 7)
iso-8859-14 (Celtic)
iso-8859-15 (Latin 9)
us-ascii (basic English)
euc-jp (Japanese, Unix)
shift_jis (Japanese, Win/Mac)
iso-2022-jp (Japanese, email)
euc-kr (Korean)
gb2312 (Chinese, simplified)
gb18030 (Chinese, simplified)
big5 (Chinese, traditional)
tis-620 (Thai)
koi8-r (Russian)
koi8-u (Ukrainian)
macintosh (MacRoman)
windows-1250 (Central Europe)
windows-1251 (Cyrillic)
windows-1252 (Western Europe)
windows-1253 (Greek)
windows-1254 (Turkish)
windows-1255 (Hebrew)
windows-1256 (Arabic)
windows-1257 (Baltic Rim)

us-ascii is for basic English, so I would stay away from that one unless you are appealing strictly to an English-speaking audience in a basic sort of way. ;)

Googly

9:23 am on Nov 29, 2002 (gmt 0)



Thanks for your help guys!