lucy24 - 6:57 am on Jul 26, 2012 (gmt 0)
Yes, that definitely sounds like something that was originally created in Windows-Latin-1 being viewed in UTF-8. A number of common characters, including all curly quotes (single and double), the oe ligature and the long dash, sit in the 0x80-0x9F range of Windows-Latin-1 (Windows-1252), and a lone byte in that range is never valid UTF-8. Canonically you will get the Unicode Replacement Character (U+FFFD), which looks like a black diamond with a question mark in the middle, but some applications will simply not display the character at all.
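If you want to see the effect for yourself, here is a minimal Python sketch. It assumes the file was saved as Windows-1252 (Python calls that codec "cp1252") and is then read leniently as UTF-8; the sample string is made up for illustration.

```python
# Curly-quoted text as a Windows-1252 editor would save it.
original = "\u201cHi\u201d"                 # left and right double curly quotes
raw = original.encode("cp1252")              # one byte per quote: 0x93 and 0x94

# Read the same bytes as UTF-8. 0x93 and 0x94 are not valid on their own,
# so a lenient decoder substitutes U+FFFD for each of them.
misread = raw.decode("utf-8", errors="replace")

# Read them with the encoding they were written in and the text comes back.
recovered = raw.decode("cp1252")

print(raw)
print(repr(misread))
print(recovered == original)
```

A strict decoder (the default `errors="strict"`) would raise `UnicodeDecodeError` instead of substituting anything, which is roughly the "not displayed at all" end of the spectrum.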
Don't be misled by the term "charset" though. All browsers can display all characters; the only difference is what they look like in the page source. The proper word is "file encoding". (There is an arcane historical reason for the term "charset" but I have long since forgotten it :()
Check the preferences for all text editors. Wherever there is an option for saving UTF-8 documents either with or without BOM, say without.
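If you are not sure whether an editor has already slipped a BOM into a file, something along these lines can check for it and remove it. This is only a sketch: `has_utf8_bom` and `strip_utf8_bom` are names I made up, and the sample bytes stand in for a real file read in binary mode.

```python
import codecs

def has_utf8_bom(data: bytes) -> bool:
    """True if the bytes start with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)

def strip_utf8_bom(data: bytes) -> bytes:
    """Return the bytes without a leading UTF-8 BOM, if there was one."""
    return data[len(codecs.BOM_UTF8):] if has_utf8_bom(data) else data

# Stand-in for open(path, "rb").read() on a file saved "with BOM".
sample = codecs.BOM_UTF8 + "caf\u00e9".encode("utf-8")

print(has_utf8_bom(sample))
print(strip_utf8_bom(sample).decode("utf-8"))
```

Python's built-in "utf-8-sig" codec does the same thing on decode: it drops a leading BOM if present and otherwise behaves like plain UTF-8.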
Finally: If you use a text editor to work on html, make sure options such as "smart quotes" are OFF. You don't want any fancy characters sneaking in unless you deliberately put them there.
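One way to catch fancy characters that sneaked in is to list everything outside the ASCII range and eyeball the result. A rough sketch, with a made-up function name (`find_non_ascii`) and a made-up HTML sample in which an editor has "smartened" the quotes around an attribute value:

```python
import unicodedata

def find_non_ascii(text):
    """Return (line, column, char, name) for every character above U+007F."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 0x7F:
                hits.append((line_no, col, ch, unicodedata.name(ch, "UNKNOWN")))
    return hits

# Line 1 has a deliberate curly apostrophe in displayed text;
# line 2 has curly quotes where the markup needs straight ones.
sample = '<p class="note">It\u2019s here</p>\n<a href=\u201cx.html\u201d>link</a>'

for hit in find_non_ascii(sample):
    print(hit)
```

The script cannot tell a deliberate curly quote from an accidental one; it only tells you where to look.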
The problem is that some character sets use "curly quotes" which look like a superscript comma, a dot with a tail.
Careful. Curly quotes are in addition to, not instead of, the "typewriter" characters that live down in the ASCII range. So you can use curly quotes for displayed text without breaking your internal HTML, which requires straight quotes (single or double).
If you are a coward you can use entities like &ldquo; and &rsquo; or their numeric equivalents ;) But it makes your code unreadable, and adds several bytes to each character.
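To put a number on "several bytes", here is a small Python comparison of a named entity, a numeric entity and the literal character. The sample strings are mine; `html.unescape` is the standard-library function that turns entities back into characters.

```python
import html

named = "&ldquo;Hi&rdquo;"        # named entities
numeric = "&#8220;Hi&#8221;"      # decimal numeric entities
literal = "\u201cHi\u201d"        # the curly quotes themselves

# All three spellings describe the same displayed text.
print(html.unescape(named) == literal)
print(html.unescape(numeric) == literal)

# Size on disk for each spelling.
sizes = {
    "named entities": len(named.encode("utf-8")),
    "numeric entities": len(numeric.encode("utf-8")),
    "literal, UTF-8": len(literal.encode("utf-8")),
    "literal, Windows-1252": len(literal.encode("cp1252")),
}
for label, size in sizes.items():
    print(label, size)
```

Each entity here costs 7 bytes, against 3 bytes for the literal quote in UTF-8 or 1 byte in Windows-1252.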
The apostrophe and the single closing quote are the same character. Most of the good stuff lives in the 2000-206F (hex) range.
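For anyone who wants to poke around that range, a quick Python sketch that prints the code point and official Unicode name of a few of the usual suspects. The selection of characters and the helper name `in_general_punctuation` are my own choices, not anything standard.

```python
import unicodedata

# 2000-206F (hex) inclusive; Unicode calls this block "General Punctuation".
GENERAL_PUNCTUATION = range(0x2000, 0x2070)

def in_general_punctuation(ch):
    """True if the character's code point falls in 2000-206F (hex)."""
    return ord(ch) in GENERAL_PUNCTUATION

# Single and double curly quotes, en dash, em dash, ellipsis.
for ch in "\u2018\u2019\u201c\u201d\u2013\u2014\u2026":
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), in_general_punctuation(ch))
```

U+2019 shows up as RIGHT SINGLE QUOTATION MARK, which is the same character you use for a curly apostrophe. Not everything lives in this range, though: the oe ligature is U+0153, well below it.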