Forum Moderators: Robert Charlton & goodroi
weather ❄ hard winter predicted
Do you think UTF-8 is better than ISO-8859-1, increasing the chances that Google users' browsers will display the symbols properly?
Arial Unicode
Code2000 (third-party)
DejaVu Sans (also third-party)
Menlo
Quivira (third-party)
unifont
Zapf Dingbats (this sounds like a "Duh!" but isn't, because this is a new Unicode font, entirely separate from the legacy Dingbats font)
Are dingbat symbols like the snowflake (&#10052;, i.e. U+2744) "non-standard", and the other, more usual ones (Wikipedia link) "standard", in your eyes?
It should make absolutely no difference. In spite of the term "charset", the encoding of a page has no effect on the characters it is able to display.
Content is composed of a sequence of characters. Characters represent letters of the alphabet, punctuation, etc. But content is stored in a computer as a sequence of bytes, which are numeric values. Sometimes more than one byte is used to represent a single character. Like codes used in espionage, the way that the sequence of bytes is converted to characters depends on what key was used to encode the text. In this context, that key is called a character encoding.
--
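A quick sketch of the bytes-vs-characters point from the quoted passage, using the thread title as sample text (Python purely for illustration):

```python
text = "weather \u2744 hard winter predicted"   # U+2744 is the snowflake

# The same characters become different byte sequences depending on the encoding.
utf8_bytes = text.encode("utf-8")
print(len(text))        # 31 characters
print(len(utf8_bytes))  # 33 bytes: the snowflake alone takes three (e2 9d 84)

# ISO-8859-1 simply has no code for U+2744, so encoding fails outright:
try:
    text.encode("iso-8859-1")
except UnicodeEncodeError:
    print("snowflake is not representable in ISO-8859-1")
```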
An HTML page can only be in one encoding. You cannot encode different parts of a document in different encodings.
A Unicode-based encoding such as UTF-8 can support many languages and can accommodate pages and forms in any mixture of those languages. Its use also eliminates the need for server-side logic to individually determine the character encoding for each page served or each incoming form submission. This significantly reduces the complexity of dealing with a multilingual site or application.
--
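The "any mixture of languages" point can be demonstrated in a few lines (a sketch; the sample string and codec names are mine, using standard Python codec aliases):

```python
mixed = "price, \u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ac, \u05e2\u05d1\u05e8\u05d9\u05ea"  # Latin + Greek + Hebrew

# UTF-8 holds the whole mixture in a single encoding:
utf8 = mixed.encode("utf-8")

# Each single-language legacy charset fails on the other script:
for charset in ("iso-8859-7", "iso-8859-8"):   # Greek, Hebrew
    try:
        mixed.encode(charset)
    except UnicodeEncodeError:
        print(charset, "cannot hold the whole string")
```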
Why does the browser still not recognize the encoding?
Let's say, for example, that you saved your data as UTF-8. Although you saved your data in the right encoding, and even if you declared in the page that the page encoding is UTF-8, your server may still be serving the page with an accompanying HTTP header that says it is something else.
http://www.w3.org/International/questions/qa-choosing-encodings
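What a browser does when a wrong HTTP header wins can be simulated by decoding UTF-8 bytes with the Latin-1 "key" (a sketch; the sample string is made up):

```python
# The bytes on the wire are UTF-8...
page_bytes = "h\u00e9llo \u2744".encode("utf-8")

# ...but the Content-Type header claims ISO-8859-1, so the browser decodes
# the same bytes with the wrong key and renders mojibake:
garbled = page_bytes.decode("iso-8859-1")
print(garbled)   # "hÃ©llo â" followed by two invisible control characters
```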
Example of this principle at work: say you have &theta; in your HTML, but the output is in Latin-1 (which, understandably, does not understand Greek), the following process will occur (assuming you've set the encoding correctly using %Core.Encoding):
The Encoder will transform the text from ISO 8859-1 to UTF-8 (note that the entity is preserved here since it doesn't actually use any non-ASCII characters): &theta;
The EntityParser will transform all named and numeric character entities to their corresponding raw UTF-8 equivalents: θ
HTML Purifier processes the code: θ
The Encoder now transforms the text back from UTF-8 to ISO 8859-1. Since Greek is not supported by ISO 8859-1, it will be either ignored or replaced with a question mark: ?
http://htmlpurifier.org/docs/enduser-utf8.html#whyutf8
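The four steps above can be mimicked with the standard library (a rough sketch, not HTML Purifier itself; html.unescape stands in for the EntityParser):

```python
import html

source = "&theta;"                       # what sits in the Latin-1 HTML

# 1. Encoder: Latin-1 -> UTF-8 (the entity is pure ASCII, so nothing changes)
text = source.encode("iso-8859-1").decode("iso-8859-1")

# 2. EntityParser: named/numeric entities -> raw UTF-8 characters
parsed = html.unescape(text)             # Greek small theta

# 3. (purification happens here; the text is untouched in this example)

# 4. Encoder: UTF-8 -> Latin-1; Greek has no slot, so it degrades
result = parsed.encode("iso-8859-1", errors="replace").decode("iso-8859-1")
print(result)   # "?"
```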
Backtrack. Is there any evidence that Google knows what a dingbat is?
Since Greek is not supported by ISO 8859-1, it will be either ignored or replaced with a question mark: ?
Why does the browser still not recognize the encoding?
Let's say, for example, that you saved your data as UTF-8. Although you saved your data in the right encoding, and even if you declared in the page that the page encoding is UTF-8, your server may still be serving the page with an accompanying HTTP header that says it is something else.
They don't fit there, in our eyes.