

Inserting UTF-8 characters into HTM files

Characters as is, as named entities, or as decimal references?

Alina

10:51 am on Apr 14, 2009 (gmt 0)

10+ Year Member



Please can you help? We need to use the UTF-8 charset in our .htm pages. If we use UTF-8, how should characters in words such as "kliknite" and "zpusobem" be represented in our *.htm pages? Should they:

(a) be written as the actual characters, i.e. "klikněte" and "způsobem"?
(b) be written as named entity references wherever possible, for example &eacute;?
(c) be written as decimal character references, for example klik&#283;te zp&#367;sobem?
(d) be written as a combination of the above?

What is the correct way of encoding these characters?

PS: I typed "kliknite" and "zpusobem" in Czech, but the forum's posting system appears to have displayed them differently from what I entered; I'm not sure why. Point (c) above gives the correct decimal representation.

encyclo

11:41 pm on Apr 14, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you are using UTF-8, then option (a) above is by far the best choice. In fact, being able to enter the characters directly, without awkward workarounds, is exactly what makes UTF-8 so useful and versatile.
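A minimal sketch of option (a) in Python, assuming a hypothetical page template and function name (write_utf8_page): the file is encoded as UTF-8 and the page declares that encoding, so the Czech letters are stored as they are.

```python
from pathlib import Path

# Hypothetical minimal page template. The meta element declares the
# encoding; ideally the server's Content-Type header says the same thing.
PAGE = """<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>{title}</title>
</head>
<body>
<p>{body}</p>
</body>
</html>
"""


def write_utf8_page(path: Path, title: str, body: str) -> bytes:
    """Save the page with characters such as ě and ů stored directly (option (a)).

    title and body are assumed to be already escaped for HTML.
    """
    html_text = PAGE.format(title=title, body=body)
    # Encode explicitly instead of relying on the platform's default encoding.
    data = html_text.encode("utf-8")
    path.write_bytes(data)
    return data
```

For example, write_utf8_page(Path("example.htm"), "Test", "Klikněte způsobem") stores "ě" in the file as the two bytes C4 9B; no entity is needed as long as the browser is told the file is UTF-8.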

The only exceptions are that you should use &amp; for the ampersand, and &lt; and &gt; for literal < and > characters, for example in code samples. Personally, I also prefer using an entity reference when I need a non-breaking space: &nbsp;.
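Those exceptions are easy to apply automatically. A minimal sketch using Python's standard html.escape with quote=False, which replaces exactly &, < and > and leaves everything else, including Czech letters, untouched. The helper name escape_text and the extra non-breaking-space rule (following the preference above) are illustrative assumptions.

```python
import html


def escape_text(text: str) -> str:
    """Escape text for HTML content while keeping UTF-8 characters as is."""
    # quote=False: only & -> &amp;, < -> &lt;, > -> &gt; (quotes are left alone).
    escaped = html.escape(text, quote=False)
    # Optional: write non-breaking spaces (U+00A0) as &nbsp;. This runs after
    # escaping, so the & of &nbsp; is not turned into &amp;.
    return escaped.replace("\u00a0", "&nbsp;")


print(escape_text("if (a < b && b > c) klikněte"))
# prints: if (a &lt; b &amp;&amp; b &gt; c) klikněte
```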

(Note that WebmasterWorld does not declare its pages as UTF-8, which is why some UTF-8 characters will not display correctly here.)

swa66

12:33 am on Apr 15, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Personally, I edit everything on a Unix host that isn't really set up for I18N, so I stick to (b) all the way. It never goes wrong and never causes any issues, as long as I remember the entity name. (To be honest, I'd have no idea what to use for some of the typically Czech characters, but that's why you have a browser.)

Alina

9:28 am on Apr 15, 2009 (gmt 0)

10+ Year Member



After reading your posts, I think that options (a) and (b) together will suffice.

PS: encyclo, I had read your earlier post "Character encoding, entity references and UTF-8 (A short introduction)", but I was still not sure, so I thought I would ask.

Thank you both very much for your help.