Any hope of a better HTML?

         

troels nybo nielsen

9:14 am on Jan 19, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I write my websites in a language with an alphabet of 29 letters, but only 26 of them are represented in html. The remaining 3 letters may be created artificially with codes, but that is an inferior solution to a problem that simply should not be there. And there are other languages that are served even worse.

I have read that the problem derives from the fact that html was originally invented by an English-speaking person.

That fact does not explain why something better was not created when other languages began to use html. What are the reasons? Technical limitations? Political decisions?

And is there any hope of a better, richer and more useful html?

DrDoc

9:21 am on Jan 19, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That's not true ;)
You can use any characters you want, provided the document is served with a proper character set definition.

tombola

9:23 am on Jan 19, 2004 (gmt 0)

10+ Year Member



I can't answer your questions, but when you use the UTF-8 character set, it will display all characters properly.

<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">
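A minimal sketch of what that declaration promises, written in Python purely for illustration. The letters æ, ø and å are an assumed example (the thread never names the language), and `page` is a made-up document:

```python
# Hypothetical page; the letters æ, ø, å are assumed examples, not from the thread.
page = (
    "<html><head>"
    '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">'
    "</head><body>\u00e6 \u00f8 \u00e5</body></html>"
)

# What the server sends: the text encoded as UTF-8 bytes.
raw = page.encode("utf-8")

# What a browser does once it trusts the declared charset: decode as UTF-8.
shown = raw.decode("utf-8")

# True when the three letters survive the round trip unchanged.
print(shown == page)

# Each of the three letters takes two bytes in UTF-8, so this prints 6.
print(len("\u00e6\u00f8\u00e5".encode("utf-8")))
```

The declaration only helps if the file really is saved as UTF-8; the tag describes the bytes, it does not convert them.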

troels nybo nielsen

9:25 am on Jan 19, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sounds interesting. Could you be more specific while still trying to use terms understandable to an ignoramus like me?

tombola

9:34 am on Jan 19, 2004 (gmt 0)

10+ Year Member



Here is a good introduction to Unicode:

[tbray.org...]

troels nybo nielsen

10:02 am on Jan 19, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks for your help.

Sorry if I sounded rude and arrogant and indeed came close to accusing somebody of something. And sorry for making a rash statement based on ignorance.

Funny. Usually it's the other way around: Other people complaining about something missing. Me telling them that indeed it's there and the only problem is their own inability to see it.

I have some reading to do. I'll be back with more questions if there is still something that I do not understand.

DrDoc

4:18 pm on Jan 19, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Basically, for your needs, Troels, all you need to worry about is using the ISO-8859-1 charset (also known as "Latin-1" or "Western European"). It offers support for a ton of characters besides A-Z. In fact, it offers support for all characters in HTML's Latin-1 entity set (such as ßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ as well as ×÷¿¾½¼»º¹¸·¶µ³²±°¯®¬«ª©¨§¦¥¤£¢¡)
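A rough way to check this claim, sketched in Python with the standard `html.entities` table. The range 0xA1 to 0xFF is the printable upper half of ISO-8859-1; the variable names are only illustrative:

```python
from html.entities import codepoint2name

# The printable upper half of ISO-8859-1 (inverted exclamation mark through y-diaeresis).
sample = "".join(chr(cp) for cp in range(0xA1, 0x100))

# Every one of these characters fits in a single ISO-8859-1 byte.
encoded = sample.encode("iso-8859-1")
one_byte_each = len(encoded) == len(sample)

# Every one of them also has a named HTML entity.
entities = {ch: "&" + codepoint2name[ord(ch)] + ";" for ch in sample}

print(one_byte_each)
print(entities["\u00e6"], entities["\u00f8"], entities["\u00e5"])

# The reverse does not hold: the euro sign has an entity but no ISO-8859-1 byte.
try:
    "\u20ac".encode("iso-8859-1")
    euro_fits = True
except UnicodeEncodeError:
    euro_fits = False
print(codepoint2name[0x20AC], euro_fits)
```

So the charset covers the Latin-1 entities, while entities such as `&euro;` fall outside it and still have to be written as entities or served in another encoding.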

TryAgain

9:35 pm on Jan 19, 2004 (gmt 0)

10+ Year Member



DrDoc,

If I'm not mistaken, ISO-8859-1 is a subset of UTF-8, so is there a reason why not to use UTF-8 instead?

(This is an honest question, nothing personal, just in case, people seem to have such long toes these days.)

g1smd

10:38 pm on Jan 19, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I suppose that ISO-8859-1 could be regarded as a sort of a subset, only because UTF-8 includes just about everything.

Don't forget that there are also ISO-8859-2, ISO-8859-3, ISO-8859-4 right through to ISO-8859-16 for other parts of the planet outside of the Western World.
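The "sort of a subset" point can be made concrete with a small Python sketch. It assumes nothing beyond the two standard codecs: the character numbers agree, but the bytes only agree for plain ASCII.

```python
# Code points: every ISO-8859-1 byte value maps to the Unicode code point of the same number.
same_numbers = all(bytes([b]).decode("iso-8859-1") == chr(b) for b in range(256))

# Bytes: plain ASCII text is identical in both encodings...
ascii_same = "html".encode("iso-8859-1") == "html".encode("utf-8")

# ...but a character above 0x7F is one byte in ISO-8859-1 and two bytes in UTF-8.
latin1_bytes = "\u00e6".encode("iso-8859-1")  # the letter ae
utf8_bytes = "\u00e6".encode("utf-8")

print(same_numbers, ascii_same)
print(latin1_bytes, utf8_bytes)
```

In other words, a file saved as ISO-8859-1 is not automatically valid UTF-8; it has to be re-saved or converted before the UTF-8 declaration is truthful.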

tombola

8:59 am on Jan 20, 2004 (gmt 0)

10+ Year Member



It's my experience that when you have pages in several languages that use non-Latin characters, it's much easier to use UTF-8 encoding for ALL pages, instead of choosing an appropriate character set for each language.
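A sketch of why one encoding for all pages is simpler, in Python. The three sample strings and the helper `usable_charsets` are made up for illustration; only the codec names are real:

```python
CANDIDATES = ("iso-8859-1", "iso-8859-5", "iso-8859-7", "utf-8")

def usable_charsets(text):
    """Return the candidate charsets that can represent every character in text."""
    usable = []
    for charset in CANDIDATES:
        try:
            text.encode(charset)
        except UnicodeEncodeError:
            continue  # at least one character has no byte in this charset
        usable.append(charset)
    return usable

# Hypothetical page snippets in three scripts.
samples = {
    "latin": "\u00e6\u00f8\u00e5",     # ae, o-slash, a-ring
    "cyrillic": "\u043d\u0430\u0448",  # three Cyrillic letters
    "greek": "\u03b1\u03b2\u03b3",     # alpha, beta, gamma
}

for name, text in samples.items():
    print(name, usable_charsets(text))
```

Each legacy charset fits only its own script, so a site using them needs a per-language choice, while UTF-8 appears in every list.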

DrDoc

3:41 pm on Jan 20, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If I'm not mistaken, ISO-8859-1 is a subset of UTF-8, so is there a reason why not to use UTF-8 instead?

For Latin-1, no... For other languages? Maybe...
Use UTF-8 if you are positive the page really contains Unicode-encoded characters. Otherwise, declare the specific charset.

For example, a lot of Russian Web sites would look something like this when viewed as UTF-8:

Íàøè óñëóãè

But try viewing that using the windows-1251 charset :)
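The experiment can be reproduced in a few lines of Python. This is a sketch that assumes the garbled sample is windows-1251 bytes that were read as ISO-8859-1, which is what its character pattern suggests:

```python
garbled = "Íàøè óñëóãè"

# Recover the original bytes by undoing the ISO-8859-1 reading.
raw = garbled.encode("iso-8859-1")

# Read the same bytes with the charset the page was actually written in.
fixed = raw.decode("windows-1251")
print(fixed)

# The same bytes are not valid UTF-8 at all, so a strict decoder rejects them.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print(valid_utf8)
```

The bytes never change; only the charset used to interpret them does.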

tombola

8:56 pm on Jan 20, 2004 (gmt 0)

10+ Year Member



For example, a lot of Russian Web sites would look something like this when viewed as UTF-8:

Íàøè óñëóãè

DrDoc, the only reason why you see characters like these (Íàøè óñëóãè) is that - apparently - you have not installed the Cyrillic character set (to display Russian language characters) on your system.

No matter which Russian character set these Russian Webmasters use (KOI8-R, ISO-8859-5, UTF-8), if you have installed the corresponding character set of that language, you will see the page correctly.
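A small Python sketch of that point. The phrase is what the garbled sample reads as under windows-1251; the loop and the variable names are only illustrative:

```python
text = "\u041d\u0430\u0448\u0438 \u0443\u0441\u043b\u0443\u0433\u0438"  # the Russian sample phrase

charsets = ("koi8-r", "iso-8859-5", "windows-1251", "utf-8")
encoded = {name: text.encode(name) for name in charsets}

# Each charset produces different bytes for the same text...
all_different = len(set(encoded.values())) == len(charsets)

# ...yet decoding with the charset that was declared always gives the text back.
round_trips = all(encoded[name].decode(name) == text for name in charsets)

# Decoding with a charset other than the one used gives something else.
mismatch = encoded["koi8-r"].decode("windows-1251")

print(all_different, round_trips)
print(mismatch == text)
```

What matters is that the declared charset matches the bytes actually sent; any of the four works when it does.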

I recommend UTF-8 to everybody who has pages in many different languages, just to make things easy; otherwise you have to find out which character set is the most appropriate for every language...

We have pages in more than sixty (60) languages, and all pages display correctly thanks to UTF-8 encoding :-)

DrDoc

9:28 pm on Jan 20, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



No, I have the Cyrillic charset installed... The page was just created in an editor defaulting to ISO-8859-1, while it tells the browser to display it as Cyrillic.