international characters

how necessary is the & --- ; version?

         

jcmoon

7:35 pm on Aug 9, 2005 (gmt 0)

10+ Year Member



I'm doing some sites that use international, yet Latin-based, character sets. For example, the German Ä (= &Auml;), ß (= &szlig;), ü (= &uuml;), etc.

What I want to know is, for the paragraphs, words, etc., do I have to make it

Garantie f&uuml;r den st&ouml;rungsfreien

or can I get away with
Garantie für den störungsfreien

?

Obviously, the latter version will save me much time. When I look at the source of various international sites, I see it both ways - with the specially-encoded characters, and with just the characters themselves.

When you do international stuff, what do you choose?

encyclo

11:18 pm on Aug 9, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you want to be 110% sure, then use the character entities. However, I rarely if ever use them, as the overwhelming majority of targeted users will have no problem with the accented characters themselves.

You mention German, which like French (my sites are in French) is a western European language. The key to success is to declare your charset carefully and explicitly. You have two options. The first is the legacy encoding ISO-8859-1, which is what I use most often. Note that there is some inconsistency between ISO-8859-1 and windows-1252, a Microsoft-created variant which Windows usually (silently) uses in place of ISO-8859-1. As long as you are validating your pages, you should have no real difficulty.

ISO-8859-1 is more widely supported than the alternative, UTF-8. UTF-8 is a better choice if you have documents in a wide variety of languages, as it can represent characters for almost all living languages, from English to Japanese. Most editors can save as UTF-8, and most modern browsers render it correctly. You may experience problems with older browsers such as Netscape 4.
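To make the charset point concrete, here is a minimal Python sketch (Python is only my choice for illustration, nobody in this thread is using it). It shows how the same German phrase becomes different bytes under ISO-8859-1 and UTF-8, and what a visitor would see if the declared charset did not match the bytes actually saved. All variable names are made up for the example.

```python
# Illustrative sketch only: how one phrase is stored under two encodings.
phrase = "Garantie für den störungsfreien"

latin1_bytes = phrase.encode("iso-8859-1")  # one byte per character
utf8_bytes = phrase.encode("utf-8")         # ü and ö take two bytes each

# Bytes decoded with the charset they were written in come back intact.
assert latin1_bytes.decode("iso-8859-1") == phrase
assert utf8_bytes.decode("utf-8") == phrase

# UTF-8 bytes read as ISO-8859-1 (i.e. a wrong charset declaration)
# turn each accented letter into two junk characters.
misread = utf8_bytes.decode("iso-8859-1")

print(len(latin1_bytes), len(utf8_bytes))
print(misread)
```

The point is that neither encoding is wrong on its own; trouble starts only when the file is saved in one and declared as the other.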

tedster

12:14 am on Aug 10, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I like to be 110% sure when I can (it's the way I am). I use Homesite for authoring, and it has an automatic one-click conversion that works either way. If none of your applications will do this, there are free and inexpensive utilities available that will convert extended characters to HTML entities.
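For anyone without such a tool, a conversion like this is small enough to script. The following is a rough Python sketch, not what Homesite does internally; the function name `to_entities` is hypothetical. It relies on the standard library's `html.entities.codepoint2name` table for the named entities and falls back to numeric references for anything the table does not cover. It deliberately leaves ASCII alone, so it does not escape `&`, `<` or `>`.

```python
# Illustrative sketch: convert accented characters to HTML entities and back.
import html
from html.entities import codepoint2name


def to_entities(text: str) -> str:
    """Replace every non-ASCII character with an HTML entity."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            out.append(ch)                           # plain ASCII stays as-is
        elif code in codepoint2name:
            out.append(f"&{codepoint2name[code]};")  # named, e.g. &uuml;
        else:
            out.append(f"&#{code};")                 # numeric fallback
    return "".join(out)


encoded = to_entities("Garantie für den störungsfreien")
print(encoded)                 # Garantie f&uuml;r den st&ouml;rungsfreien
print(html.unescape(encoded))  # back to the accented characters
```

`html.unescape` covers the "works either way" part: it turns entities back into the real characters.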

encyclo

1:57 pm on Aug 10, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You might get problems if a user is attempting to view your page in a DOS-based browser, if your charset is defined incorrectly, or if you have a visit from someone using a home-grown browser in Inner Mongolia which does not support ISO-8859-1. You should also use character entities if you are using HTML versions prior to HTML 4.0. When authoring content in Windows, you can also run into validation problems with certain characters (for example the apostrophe) that you copy/paste from Windows applications using windows-1252. I have also experienced problems with RSS feeds - certain character entities are only valid in Netscape RSS v.0.91, not in later versions - another reason to use the true characters!

A lot of programs such as Homesite which produce static pages automatically convert accented characters into the equivalent entities, but in most cases this is an unnecessary step. If you are using a content management system or automated page generation packages, then just use the standard characters.

The future is undoubtedly UTF-8. I am still mostly sticking to ISO-8859-1 as I have a lot of legacy content which would need to be converted, but with UTF-8 you will never need a character entity again. All major modern (version 5.x and higher) browsers support it, and all the major search engines (Google, Yahoo, MSN) use UTF-8 by default.
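The copy/paste apostrophe problem is easy to reproduce. Here is a small Python sketch, assuming the pasted text contains the curly apostrophe U+2019 that Windows applications tend to insert; the sample string and variable names are invented for illustration, and the final replacement is just one possible workaround.

```python
# Illustrative sketch: the "smart" apostrophe pasted from Windows applications.
pasted = "l\u2019école"   # U+2019, the curly apostrophe

# windows-1252 has a slot for it (byte 0x92) ...
print(pasted.encode("cp1252"))

# ... but ISO-8859-1 does not, so a strict encode fails.
try:
    pasted.encode("iso-8859-1")
    fits_latin1 = True
except UnicodeEncodeError:
    fits_latin1 = False

# Going the other way, byte 0x92 read as ISO-8859-1 is an invisible
# control character (U+0092), not an apostrophe.
as_latin1 = b"\x92".decode("iso-8859-1")
as_cp1252 = b"\x92".decode("cp1252")

# One possible workaround: swap in the plain ASCII apostrophe before saving.
cleaned = pasted.replace("\u2019", "'")
print(cleaned.encode("iso-8859-1"))
```

The accented é survives in both encodings; only the characters windows-1252 places in the 0x80 to 0x9F range cause trouble.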

jcmoon

7:48 pm on Aug 10, 2005 (gmt 0)

10+ Year Member



Thanks for the advice. I'm actually not using a CMS, nor am I using Homesite. I'm using more of an all-purpose editor called TextPad.

The pages do declare charset ISO-8859-1, so for now things will do fine. I do like the idea that UTF-8 (the most universal) is the future ... but on what timeframe? What's the adoption rate? Or are we waiting for the W3C to rubber-stamp it?