Need character entity for Trademark symbol

#153 deprecated and &174; is not the right one

         

Lorel

10:52 pm on Feb 17, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hi

I found the list on W3.org of character entities and it lists Trademark as &174; but that produces an R with a circle around it.

I'm looking for a small TM in superscript.

Is the trademark symbol no longer used, and has it been replaced with the (R) symbol?

thanks,
Lorel

DrDoc

10:55 pm on Feb 17, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



&trade; = ™
&reg; = ®
&copy; = ©

twist

11:29 pm on Feb 17, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



ASCII Character number 169 is and has been the copyright symbol as far as I know.

&#169; = ©
&#153; = ™
&#174; = ®

Do a search on google for HTML ASCII, there are plenty of sites with tables.

>#153 deprecated and &174; is not the right one

I am not sure why the W3C would choose to deprecate characters in the ASCII table. Could you please give a link showing where it talks about this? Thanks.

ergophobe

12:49 am on Feb 18, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It is not deprecated - it was never a valid value in HTML.

ISO-8859-1 reserves values 128-159 for control characters. These values are undefined and always have been for

- ASCII (which only goes to 127; there is no such thing as ASCII character 153, 169 or 174. ASCII is a seven-bit system so that programs can use the other bit to check for parity and so forth).

- ISO-8859-1 (which reserves these values for non-displaying control characters).

- Unicode (which overlaps with ISO-8859-1 for 0-255)

These values are defined for Windows-1252, will cause you problems in almost any other character set, and should never be used in HTML. You should use the Unicode value, that is &#8482; in decimal notation.
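
A minimal Python sketch of the point above, using only the standard codecs (the variable names are illustrative and not from the thread). It shows that byte 153 is the trademark sign only under Windows-1252, while the Unicode code point is 8482:

```python
# Byte 153 (0x99) means different things depending on the character set.
raw = bytes([153])

as_cp1252 = raw.decode("cp1252")    # Windows-1252: the trademark sign
as_latin1 = raw.decode("latin-1")   # ISO-8859-1: a C1 control character

print(repr(as_cp1252), ord(as_cp1252))   # '™' 8482
print(repr(as_latin1), ord(as_latin1))   # '\x99' 153

# The portable numeric entity uses the Unicode code point, not the byte value.
entity = f"&#{ord(as_cp1252)};"
print(entity)                             # &#8482;
```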

Tom

choster

3:53 am on Feb 18, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you prefer numerical or Unicode entities, try

trademark = &#8482; = ™ (or &#x2122;)
registered trademark = &#174; = ®
service mark = &#8480; = ℠
copyright = &#169; = ©
sound recording copyright = &#8471; = ℗
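
For anyone who would rather generate such a list than look it up, here is a rough Python sketch. The `numeric_entities` helper is a made-up name; it relies only on the built-in `ord` and the standard library's `html.unescape`.

```python
import html

def numeric_entities(ch):
    """Return (decimal, hexadecimal) numeric character references for one character."""
    code = ord(ch)
    return f"&#{code};", f"&#x{code:X};"

symbols = {
    "trademark": "\u2122",
    "registered trademark": "\u00AE",
    "service mark": "\u2120",
    "copyright": "\u00A9",
    "sound recording copyright": "\u2117",
}

for name, ch in symbols.items():
    dec, hexa = numeric_entities(ch)
    # Round-trip through the standard parser to confirm both forms decode back.
    assert html.unescape(dec) == ch and html.unescape(hexa) == ch
    print(f"{name} = {dec} (or {hexa}) = {ch}")
```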

thehittmann

7:03 am on Feb 18, 2004 (gmt 0)

10+ Year Member



™ is the one that i get from dmx 2004 :-)

my $0.02

Lorel

2:28 pm on Feb 18, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hi,

if I understand correctly I should be using
charset=iso-8859-1

and always use Unicode entities like the ones Choster posted above.

thanks for the help folks,

Lorel

ergophobe

4:35 pm on Feb 18, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month




>if I understand correctly I should be using
>charset=iso-8859-1

Maybe, maybe not. You should use the correct character set. Support for ISO-8859-1 is probably the second most widespread (after ASCII), but if you want to use true typographic markup (em dashes and such), you would probably want to use UTF-8.


>and always use Unicode entities like the ones Choster posted above.

More or less. As I said in the original post, if you are going to serve up pages as iso-8859-*, you should always use Unicode entities for code points that have numbers 128-159 decimal in the Windows-1252 character set. However, if your underlying text from your word processor or whatever is Unicode text and you serve pages up as UTF-8, these "same" characters should render just fine ("same" meaning they look pretty much like one another, though if you look at them with a hex editor, they will not be the same).

Basically it depends on how many such characters you are using. Examples are curly quotes (single and double), the florin sign, en and em dashes.
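
To see what "not the same in a hex editor" means in practice, here is a small Python sketch. The sample string is invented for illustration; it encodes the same text both ways and compares the bytes:

```python
# Curly quotes, em dash, florin sign and trademark sign.
text = "\u201cFig\u201d \u2014 \u0192 \u2122"

cp1252_bytes = text.encode("cp1252")
utf8_bytes = text.encode("utf-8")

print(cp1252_bytes.hex(" "))   # one byte per character
print(utf8_bytes.hex(" "))     # two or three bytes for each special character

# Same characters after decoding with the matching charset...
assert cp1252_bytes.decode("cp1252") == utf8_bytes.decode("utf-8")
# ...but different bytes on disk and on the wire.
assert cp1252_bytes != utf8_bytes
```

The bytes only turn back into the intended characters if the page is declared with the charset that was actually used to write it.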

* Far and away the best discussion of the problem in general and particularly as it relates to HTML presentation is offered on Jukka Korpela's page On the use of some MS Windows characters in HTML [cs.tut.fi]. He has a complete chart of problem values.

* See also the Windows-1252 code point table near the bottom of the Wikipedia article on ISO-8859-1 [en.wikipedia.org] with problem characters highlighted.

* And Chris Wendt's comments [lists.w3.org] from way back in 1998.

* The quick converter from codeside is an easy way to convert Windows-1252 to numeric entities [code.cside.com].
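
The same kind of conversion can be sketched in a few lines of Python. This is not the linked tool's code, only an illustration of the idea, and `cp1252_to_entities` is a hypothetical helper name:

```python
def cp1252_to_entities(raw: bytes) -> str:
    """Decode Windows-1252 bytes and replace every non-ASCII character
    with a decimal numeric character reference."""
    text = raw.decode("cp1252")
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

# Trademark sign, curly double quotes and an en dash as Windows-1252 bytes.
sample = b"Acme\x99 \x93quoted\x94 \x96 done"
print(cp1252_to_entities(sample))
# Acme&#8482; &#8220;quoted&#8221; &#8211; done
```

Note that this sketch does not escape `<`, `>` or `&`; it only deals with characters outside ASCII.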

Tom

twist

5:21 pm on Feb 18, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



So let me get this straight,

If I set my character type to charset=iso-8859-1, and want to produce the copyright symbol these websites with charts suggest the following,

http://slackerhtml.tripod.com/html/ascii.html suggests using ©

http://www.ascii.cl/htmlcodes.htm states this at the top of its page,

Standard ASCII set, HTML Entity names, ISO 10646, ISO 8879, ISO 8859-1 Latin alphabet No. 1
Browser support: All browsers

and says to use &copy; or &#169;

[w3schools.com...] says to just use ©

There are many other websites that all say to use either &copy; or &#169;

So what's the deal, is everybody just doing it wrong?

ergophobe

5:36 pm on Feb 18, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



None of them are wrong. All are valid (except, as mentioned, numbers 128-159), though support does vary. In any case, it's becoming less and less of an issue. I believe that in order of degree of support for characters not given named entities prior to html 4, it would be

decimal numeric entities
named entities
hexadecimal numeric entities

Modern browsers support all of them pretty well. However, modern browsers also support UTF-8 just fine, so the whole thing is becoming less and less important. Looking way ahead in internet time, in ten years you could probably get rid of most entities except &lt; &gt; &amp; etc., which must be retained because of their special meaning in markup.
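
As a rough illustration of that end state, the Python sketch below escapes only the markup-significant characters and leaves everything else as literal UTF-8 text. The sample string is invented; `html.escape` is the standard library function.

```python
import html

text = "Acme\u2122 widgets: 2 < 3 & 5 > 4"

# Escape only the characters with special meaning in markup.
escaped = html.escape(text, quote=False)
print(escaped)   # Acme™ widgets: 2 &lt; 3 &amp; 5 &gt; 4

# The page is then written out as UTF-8 and must be declared as such.
page_bytes = escaped.encode("utf-8")
```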

Tom

Lorel

6:23 pm on Feb 18, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hi Ergophobe

I have just one more question.
you said:
>As I said in the original post, if you are going to serve up pages as iso-8859-*, you should always use Unicode entities for code points that have numbers 128-159 decimal in the Windows-1252 character set.

Can you tell me what is meant by the Windows-1252 set?

Is this how Windows views the characters?

I have a Mac, BTW. Both unicode and Ascii work fine on my computer, IE 5.1.

Thanks for your help
Lorel

ergophobe

6:48 pm on Feb 18, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Mmmm where to start.

The brief answer: it's just a character set like any other. Certain hex numbers have certain values, and those are not necessarily the same for a given character encoding. There is no such thing as "plain text" as all text has to be encoded into numbers and those numbers depend on the character set you are using.

I don't know about Macs, but there is no reason that you couldn't have Windows-1252 installed on your Mac. Perhaps, for example, if you use older Microsoft products, they may depend on that character set and may have installed it (newer MS products use Unicode).

For Western European languages there are four main encodings you might find being used on a Windows machine:

- ASCII (no accents or fancy punctuation, so English only - think a standard US typewriter)

- ISO-8859-* family (accents for most Euro languages and some other characters, but not large character sets - only 256 code points each - so each region needs its own encoding. Western Europe uses 8859-1, aka Latin-1)

- Unicode, which uses a variety of encodings (UTF-8, UTF-16LE, UTF-16BE) but is a single standard that covers all European characters, many other languages and a lot of fancy punctuation (about 64K characters in its basic plane, with room for over a million code points).

- Windows-1252 which takes unused code points in ISO-8859-1 and assigns values for commonly used characters, such as em dashes and curly quotes.

These determine the actual underlying *number* used to represent a character. Unicode values overlap perfectly with iso-8859-1 for the first 256 characters and iso-8859-1 overlaps with ascii for the first 128 characters. However, values 128-159 in Windows-1252 do not overlap with any other character set that I know of and those values are reserved for control characters in iso-8859-1. That means the display value for those numbers is undefined in any character set other than Windows-1252. So it is up to the OS or user agent to decide what to do with a code like &#153;. The display value is only defined if the page is being served up (using an XML declaration or a meta charset tag) as Windows-1252.
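
These overlaps can be checked with Python's built-in codecs. This is only a sketch, and the variable names are made up for the example:

```python
# ISO-8859-1 byte values equal Unicode code points for 0-255.
latin1_matches_unicode = all(
    bytes([i]).decode("latin-1") == chr(i) for i in range(256)
)

# ASCII agrees with ISO-8859-1 for 0-127.
ascii_matches_latin1 = all(
    bytes([i]).decode("ascii") == bytes([i]).decode("latin-1")
    for i in range(128)
)

# Windows-1252 departs from ISO-8859-1 only in 128-159.
# Positions Windows-1252 leaves unassigned decode to U+FFFD here.
differs = [
    i for i in range(256)
    if bytes([i]).decode("cp1252", errors="replace")
    != bytes([i]).decode("latin-1")
]

print(latin1_matches_unicode, ascii_matches_latin1)
print(min(differs), max(differs), len(differs))
```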

If, when you put a &#153; in a page, it renders as you expect on your Mac, that means you have the Windows-1252 character set on your machine [edit: see correction below]. However, you can't count on others having that character set or, if they do, you can't count on them interpreting the code as you wish. Indeed, if the page is iso-8859-1 or utf-8, those numbers should NOT be displayed as anything other than a box or a question mark. One could consider it a browser bug if the browser actually displayed anything in that case.

[correction: older Macs have a charset known as MacRoman which differs from both ISO-8859-1 and Windows-1252 in the 80-9F (128-159) range, but I don't know what the overlap is. Some code points may render the same in Windows-1252 and MacRoman - I don't know]
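
One way to check the overlap is to compare the two code pages position by position. The sketch below assumes Python's `mac_roman` and `cp1252` codecs match the charsets discussed here; `decode_byte` is a hypothetical helper.

```python
def decode_byte(value, charset):
    """Decode one byte; positions the charset leaves undefined return None."""
    try:
        return bytes([value]).decode(charset)
    except UnicodeDecodeError:
        return None

shared = []
for value in range(0x80, 0xA0):
    win = decode_byte(value, "cp1252")
    mac = decode_byte(value, "mac_roman")
    if win is not None and win == mac:
        shared.append(value)

print("positions rendering the same:", [hex(v) for v in shared])
print("trademark byte in MacRoman:", "\u2122".encode("mac_roman").hex())
```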

Whew! Huge post and I'm not even sure I answered your question...

Tom