Converting from ISO-8859-1 to UTF-8

How to directly enter HTML entity reference characters?

         

directrix

11:23 am on Mar 12, 2008 (gmt 0)

10+ Year Member



I'm in the process of converting my website from ISO-8859-1 to UTF-8. I've read encyclo's excellent reference post [webmasterworld.com]. I see the ability to enter various characters directly (rather than using HTML entity references) as a big plus, but I'm puzzled as to how to actually enter them. I'm using UltraEdit on a Windows machine. How would I enter, for instance, a non-breaking space (&nbsp;) and a minus sign (&minus;)?


davidpbrown

12:25 pm on Mar 12, 2008 (gmt 0)

10+ Year Member



As I understand it, ISO-8859-1 is a direct subset of UTF-8, in the same way that ASCII is a subset of ISO-8859-1. That is, you create the characters in the same way and just save the file as UTF-8.

The only complication I've encountered is the byte order mark (BOM), which some editors add and which then spoils output for the web. Check that your editor saves without a BOM and you shouldn't have a problem.
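To make the BOM point concrete, here is a minimal Python sketch that checks whether a file's bytes start with the UTF-8 BOM (the three bytes EF BB BF) and removes them. The function name `strip_utf8_bom` is made up for illustration, and this is only one way to do it, not something any particular editor does internally.

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 byte order mark, if one is present."""
    # codecs.BOM_UTF8 is the byte string b'\xef\xbb\xbf'
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

# A page saved with a BOM, and the same page saved without one
with_bom = codecs.BOM_UTF8 + b"<html></html>"
without_bom = b"<html></html>"

print(strip_utf8_bom(with_bom) == without_bom)
```

Reading the file in binary mode and passing its contents through a function like this leaves BOM-free files unchanged.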

encyclo

4:54 pm on Mar 12, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



As I understand it, ISO-8859-1 is a direct subset of UTF-8

Unfortunately, that's not the case - US-ASCII is a subset of UTF-8, but non-ASCII characters in ISO-8859-1 are encoded differently than in UTF-8.
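A short Python sketch may help show the difference. It encodes three sample characters both ways and prints the resulting bytes in hex; the helper name `encoded_bytes` and the choice of sample characters are just assumptions for the illustration.

```python
def encoded_bytes(ch: str):
    """Return the (ISO-8859-1, UTF-8) byte sequences for a single character."""
    return ch.encode("iso-8859-1"), ch.encode("utf-8")

# An ASCII letter, the no-break space (U+00A0) and e-acute (U+00E9)
samples = ["A", "\u00a0", "\u00e9"]

for ch in samples:
    latin1, utf8 = encoded_bytes(ch)
    print(f"U+{ord(ch):04X}  ISO-8859-1={latin1.hex()}  UTF-8={utf8.hex()}")
```

The ASCII letter comes out as the same single byte in both encodings, while the two non-ASCII characters are one byte in ISO-8859-1 and two bytes in UTF-8.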

I can't give specific instructions in terms of composing special characters in UltraEdit, as I don't use either that editor or indeed Windows at all. However, you should check out the table of characters available in the Accessories / System Tools directory off the Windows Start button. There are always going to be difficulties entering characters that are not represented on your keyboard.

davidpbrown

5:04 pm on Mar 12, 2008 (gmt 0)

10+ Year Member



Or might [save-as UTF-8] do the translation without corrupting those characters?
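For comparison, this is roughly what a correct conversion has to do, sketched in Python: decode the bytes as ISO-8859-1 into characters, then encode those characters as UTF-8. Whether a given editor's "save as UTF-8" does exactly this is not something the thread confirms; the function name `latin1_to_utf8` is hypothetical.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8 without losing any characters."""
    # Every byte value is valid in ISO-8859-1, so the decode step cannot fail.
    text = data.decode("iso-8859-1")
    return text.encode("utf-8")

# "café" as stored in an ISO-8859-1 file: the e-acute is the single byte E9
original = b"caf\xe9"
converted = latin1_to_utf8(original)

print(converted.hex())
```

The key point is that the conversion goes through characters, not bytes: copying the ISO-8859-1 bytes unchanged into a file labelled UTF-8 is what produces corrupted output.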

penders

1:04 am on Mar 15, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



How would I enter, for instance, a non-breaking space (&nbsp;) and a minus sign (&minus;)?

As encyclo suggests, check out the Windows "Character Map". The &nbsp; (numeric character reference &#160;) can be typed on a Windows machine by pressing Alt+0160 (as indicated in Character Map). However, the &minus; (numeric character reference &#8722;) does not seem to have a direct keyboard shortcut. You can copy and paste it from Character Map. If you need to type it a lot, then perhaps save it as a macro (if your editor supports it).
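If existing pages already contain many entity references, an alternative to retyping them is to convert them in bulk. The following is a minimal Python sketch using the standard library's `html.unescape`; the wrapper name `entities_to_characters` is made up for the example.

```python
import html

def entities_to_characters(text: str) -> str:
    """Replace named and numeric character references with the characters themselves."""
    return html.unescape(text)

# Named and numeric forms of the no-break space (U+00A0) and minus sign (U+2212)
named = entities_to_characters("5&nbsp;&minus;&nbsp;3")
numeric = entities_to_characters("5&#160;&#8722;&#160;3")

print(named == numeric)
print([f"U+{ord(c):04X}" for c in named])
```

Note that `html.unescape` also converts `&amp;`, `&lt;` and `&gt;`, so running it over a whole HTML file would damage the markup; it is better suited to individual text fragments, and the result still has to be saved as UTF-8.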


directrix

6:11 pm on Mar 15, 2008 (gmt 0)

10+ Year Member



Thanks, everyone. I've managed to enter characters in UltraEdit directly (e.g. Alt+0160 for the non-breaking space) and using copy and paste. UltraEdit seems to have pretty good support for UTF-8 and Unicode.

I wonder what proportion of visitors will see what I intend when using the true characters in UTF-8 versus HTML entities in ISO-8859-1? Is it likely to be more? Or less? What factors affect whether the visitor will see what I intend? Operating system? Font? Browser?