HTML entities

Which ones are 'safe' with ISO 8859-1?

         

directrix

1:59 am on Sep 27, 2005 (gmt 0)

10+ Year Member



I have a site that specifies encoding ISO 8859-1. Currently, I use only characters less than or equal to 127, plus the ISO 8859-1 symbol entities and character entities in the range 160-255.

I'd like to use some of the other HTML entities, such as the double quotation marks (“ and ”) and the ellipsis (…), but I'm concerned that they may not be supported by older browsers. (Or maybe not even by recent browsers outside the US and Western Europe?)

I've tested the characters in IE5, IE6, Firefox 1.0.7, and Opera 8.5, and all looks OK. Does anyone know whether older browsers support these three characters? What about users in, say, India?

[edited by: BlobFisk at 3:04 pm (utc) on Sep. 27, 2005]
[edit reason] Fixed Smilies [/edit]
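
For anyone wanting to check their own text the same way, here is a minimal sketch (not from the original post) that tells you which characters fit directly in ISO 8859-1 and which would need an entity. It relies on Python's standard `html.entities.codepoint2name` table, which lists the HTML 4 named entities; the helper names `needs_entity` and `entity_for` are made up for illustration. It says nothing about whether a given browser actually renders the entity.

```python
from html.entities import codepoint2name  # HTML 4 named entities, keyed by code point


def needs_entity(ch: str) -> bool:
    """Return True if ch cannot be stored as a single ISO 8859-1 byte."""
    try:
        ch.encode("iso-8859-1")
        return False
    except UnicodeEncodeError:
        return True


def entity_for(ch: str) -> str:
    """Return a named entity if HTML 4 defines one, else a numeric reference."""
    cp = ord(ch)
    name = codepoint2name.get(cp)
    return f"&{name};" if name else f"&#{cp};"


if __name__ == "__main__":
    # Curly quotes and the ellipsis, plus one Latin-1 character for contrast.
    for ch in "\u201c\u201d\u2026\u00e9":
        if needs_entity(ch):
            print(f"U+{ord(ch):04X} needs {entity_for(ch)}")
        else:
            print(f"U+{ord(ch):04X} fits in ISO 8859-1")
```

The curly quotes and the ellipsis all fall outside the 0-255 range, which is why they have to be written as entities on an ISO 8859-1 page.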

us60

9:53 am on Sep 27, 2005 (gmt 0)



Looks good in Netscape Communicator 4.7, my test-case browser.

I tossed your paragraph into my localhost web site and brought the browser to it.

I'd like to use some of the other HTML entities, such as the double quotation marks (“ and ”) and the ellipsis (…), but I'm concerned that they may not be supported by older browsers.
As seen on Netscape
Larry

moltar

10:31 am on Sep 27, 2005 (gmt 0)

encyclo

4:21 pm on Sep 27, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



How old a browser do you need to test on, directrix? I have converted quite a few sites to UTF-8 rather than the legacy ISO-8859-1 - that will allow you to use the whole gamut of extended characters directly without having to use character entities at all, but support is limited to more modern browsers (IE 5+, Mozilla and Netscape 6+ - not NN4).

directrix

8:58 pm on Sep 27, 2005 (gmt 0)

10+ Year Member



Thanks everyone.

encyclo, I still have several IE4 and NN4 visitors, so I'm wary of moving to UTF-8. I also have lots of visitors from India, so I want to be sure that any change I make does not cause their browsers to fall back on some local default charset.

Having said that, ISO 8859-1 is so restrictive (thanks, moltar) that I may convert to UTF-8 if IE4/NN4 numbers fall. Could you point me to a document (possibly via sticky) that explains how to make the conversion?

moltar

9:48 pm on Sep 27, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you want to go that far back into history, then you might consider following HTML 3.2 specifications. Character Entities for ISO Latin-1 [w3.org].

encyclo

1:21 am on Sep 28, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you have any significant audience at all using IE4 or NN4, then I can't recommend UTF-8. Both browsers claim to support it, but both suffer from strange bugs. However, IE5+ and the Mozilla-based browsers (including Firefox, the Mozilla suite and Netscape 6+) all support UTF-8 perfectly (as does Konqueror/Safari).

For conversion to UTF-8, I use iconv [gnu.org] on Linux, automated via a batch process. My OS (Ubuntu Linux) is fully UTF-8 enabled by default, whereas Windows uses its own encoding (usually windows-1252), which, by the way, is not fully compatible with ISO-8859-1: it assigns printable characters such as curly quotes to bytes 128-159, which ISO-8859-1 reserves for control codes. On the Windows side, I believe there are several tools, and most text editors (but not Notepad) can save files as UTF-8, as can Dreamweaver.
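
If iconv isn't available, the same conversion can be sketched in a few lines of Python. This is only an illustration, not the batch process described above; the function name `to_utf8` is hypothetical. It also shows why the source encoding matters: the byte 0x93 is a left curly quote in windows-1252 but a control code in ISO-8859-1.

```python
def to_utf8(data: bytes, source_encoding: str = "iso-8859-1") -> bytes:
    """Decode bytes from the stated source encoding and re-encode as UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


if __name__ == "__main__":
    # 0xE9 is e-acute in both ISO-8859-1 and windows-1252.
    print(to_utf8(b"caf\xe9"))

    # 0x93 means different things depending on the declared source encoding.
    print(to_utf8(b"\x93", "windows-1252"))  # left double quotation mark
    print(to_utf8(b"\x93", "iso-8859-1"))    # C1 control character
```

If files were saved on Windows but are labelled ISO-8859-1, converting them as windows-1252 is usually what keeps the curly quotes intact, but that is an assumption to verify against your own files.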

However, if you want to stick to numerical entities, IE4+ and NN4+ support the full list of HTML 4.0 character entities, whereas IE3/NN3 only support the HTML 3.2 entities. As for the numerical entities you mention, I believe (not tested) that you will need a browser which can handle UTF-8 correctly - in which case, you might as well go for real UTF-8 and use the real character rather than the entity.
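
A small sketch of the numeric route, for what it's worth. In HTML 4 a numeric reference such as `&#8220;` names a Unicode code point regardless of the page's declared encoding, so an ISO 8859-1 file can carry it as plain ASCII; whether an old browser renders it is a separate question this code can't answer. The function name `to_latin1_html` is made up for illustration; `xmlcharrefreplace` is a standard Python codec error handler.

```python
import html


def to_latin1_html(text: str) -> bytes:
    """Encode as ISO 8859-1, writing anything outside it as &#NNNN; references."""
    return text.encode("iso-8859-1", "xmlcharrefreplace")


if __name__ == "__main__":
    sample = "\u201cHi\u201d\u2026 caf\u00e9"
    encoded = to_latin1_html(sample)
    print(encoded)

    # Round trip: decode the bytes, then resolve the references again.
    restored = html.unescape(encoded.decode("iso-8859-1"))
    print(restored == sample)
```

Characters that ISO 8859-1 already covers (such as é) are written as single bytes; only the quotes and ellipsis become references.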

One issue with UTF-8 is the installed fonts. If you are dealing with just English, then there is no difference as you are not going to be using many characters outside the US-ASCII subset, but if you are using languages such as Japanese or Korean or some of the Indian languages, then the end user has to have the appropriate fonts to display all the characters.