1.) use this meta tag:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
2.) Get the client to add their text with a cms.
1.) use this meta tag:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
Just to note, the document will need to be saved as UTF-8 encoding as well (without a BOM - signature) - so your editor will need to support this. And presumably if you are using a CMS to store content in a DB then the DB will need to support/save as UTF-8?
if it is going to be all Armenian
If possible I would like some English characters on the same page (if that's not possible, those pages will be all Armenian).
the document will need to be saved as UTF-8 encoding as well (without a BOM - signature)
I use Dreamweaver. A typical test page I just created has this in the css file:
if you are using a CMS to store content in a DB then the DB will need to support/save as UTF-8
Our CMS edits HTML pages directly and does not use a DB.
I do not know Dreamweaver's handling of files. The easiest way to see whether your CMS does it right is to try: create a test file and open it in a browser. If the Armenian is displayed correctly, you are on the right track. In View > Character Encoding, verify that the option UTF-8 is selected: that tells you that the page was rendered using UTF-8 definitions.
J
Why does it need to be saved in UTF-8? The source file I'm using is in ANSI and it displays in HTML with the headers in UTF-8, because I read some values from the DB that are in UTF-8. In fact, if I encode the source file as UTF-8 and I use session_start or something else that uses the headers, it sends some weird characters first and fails. So all my source files are encoded in ANSI even though they display UTF-8 characters in Portuguese and so on.
Can anyone confirm which Windows text editor can convert from ANSI to UTF-8?
There is a bewildering list here:
[alanwood.net...]
I can see that some say they handle this or that format - but I need to get my client to CONVERT from one format to another (ANSI to UTF-8).
Reading other threads I see recommendations for WordPad and TextPad. Can anyone confirm which editor will convert ANSI Armenian to UTF-8 Armenian?
So would Notepad++ allow him to paste ANSI text into Notepad++ and then (still in Notepad++) convert it to UTF-8?
Or is another text editor better for this?
Penders: "Just to note, the document will need to be saved as UTF-8 encoding as well (without a BOM - signature) - so your editor will need to support this. And presumably if you are using a CMS to store content in a DB then the DB will need to support/save as UTF-8?"
mikewm: "Why does it need to be saved in UTF-8? The source file I'm using is in ANSI and it displays in HTML with the headers in UTF-8, because I read some values from the DB that are in UTF-8. In fact, if I encode the source file as UTF-8 and I use session_start or something else that uses the headers, it sends some weird characters first and fails. So all my source files are encoded in ANSI even though they display UTF-8 characters in Portuguese and so on."
Why does it need to be saved in UTF-8?
The document is ANSI but you are telling the browser to display it as UTF-8. Try typing the copyright symbol (Alt+0169 on Windows), save it as ANSI, tell the browser it's UTF-8 and it won't display correctly. Save it as ANSI, display it as ANSI - OK. Save it as UTF-8, display it as UTF-8 - OK. The copyright symbol is a single byte (0xA9) in ANSI but two bytes (0xC2 0xA9) in UTF-8, so the same bytes can't be read both ways.
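The byte-level difference can be sketched in Python (just an illustration of the mismatch - the actual files under discussion are HTML/PHP):

```python
text = "\u00a9"  # the copyright symbol

ansi_bytes = text.encode("cp1252")   # Windows "ANSI" (Western European code page)
utf8_bytes = text.encode("utf-8")

print(ansi_bytes)   # b'\xa9'      - one byte in ANSI
print(utf8_bytes)   # b'\xc2\xa9'  - two bytes in UTF-8

# A browser told "this is UTF-8" cannot make sense of the lone 0xA9 byte:
try:
    ansi_bytes.decode("utf-8")
except UnicodeDecodeError:
    print("invalid as UTF-8 - the browser shows a replacement character instead")
```

This is exactly the failure mode: the bytes on disk and the charset declared to the browser have to agree.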
I read some values from DB that are in UTF-8.
A test: these two 'ANSI' characters "Ï€" are in fact the single UTF-8 character for the 'Greek Small Letter Pi' (U+03C0). Change the character encoding in your browser to UTF-8 and you will see the UTF-8 character as intended. The other characters remain the same, yet they are ANSI (their byte values happen to be identical in UTF-8). This is how your webpage is coping.
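The same round trip can be shown in Python (a sketch, using cp1252 to stand in for 'ANSI'):

```python
pi = "\u03c0"  # GREEK SMALL LETTER PI

utf8_bytes = pi.encode("utf-8")
print(utf8_bytes)                    # b'\xcf\x80' - pi is two bytes in UTF-8

# Read those same two bytes as Windows ANSI (cp1252) and you get
# the two-character mojibake shown above:
print(utf8_bytes.decode("cp1252"))   # 'Ï€'
```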
if I encode the source file to UTF-8 and I use session_start or something that uses the headers, it sends some wierd characters before and fails.
Warning: session_start(): Cannot send session cache limiter - headers already sent...?
This sounds as if you are including the BOM (Byte Order Mark) when you save the file as UTF-8. It must be omitted. The BOM occupies the first 3 bytes of the file (although invisible in your text editor when viewed as UTF-8) - and, importantly, it sits before your "<?php ...". Unfortunately, as far as I'm aware, PHP does not understand the BOM: it treats those bytes as output (some weird characters) sent before the headers, and consequently fails.
In Notepad++ this is Format > Encode in UTF-8 without BOM. In Notepad2 this is the other way round; you explicitly have to request 'with Signature' in order to get a BOM, simply picking UTF-8 does not include it.
See this recent thread for more info on the BOM (and removing it): [webmasterworld.com...]
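Those 3 signature bytes are easy to see from Python (a sketch - PHP itself just receives the raw bytes sitting before `<?php`). Python's "utf-8-sig" codec writes the BOM and plain "utf-8" does not, roughly the same choice as Notepad2's 'with Signature' vs Notepad++'s 'without BOM':

```python
import codecs

print(codecs.BOM_UTF8)   # b'\xef\xbb\xbf' - the 3 bytes a BOM adds

with_bom = "<?php".encode("utf-8-sig")
without  = "<?php".encode("utf-8")

print(with_bom)   # b'\xef\xbb\xbf<?php' - 3 bytes of "output" before <?php
print(without)    # b'<?php'
```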
So, you seem to be OK in your particular situation providing you keep to the basic set of characters in your HTML/PHP document or use numeric char refs. Certainly something to be aware of. What is best practice in this case? Have a mixture of character encodings, or go 100% UTF-8?
kapow: A typical test page I just created has this in the css file:
@charset "utf-8";
Do external CSS files need to be UTF-8 encoded? Do they contain any content? Do they contain any non-ANSI chars?
So would Notepad++ allow him to past ANSI text into Notepad++ and then (still in Notepad++) convert to UTF-8?
Yes, I believe so (as mikewm mentions above), but be sure to pick "Format > Convert to UTF-8 without BOM", not simply the "Encode in" option, as that only relabels the bytes and could lose data!
What is best practice in this case? Have a mixture of character encodings or go 100% UTF-8?
To quote encyclo, from this thread: ANSI, Unicode, UTF-8, and the path of most resistance! [webmasterworld.com]
>> 3.) Should I save all the files as utf-8?
The answer to question three is yes, keep it consistent - go utf-8 for everything by default - even if you think the page only contains ASCII characters you will be saving yourself a lot of hassle in the long run. Note: don't use anything outside the usual ASCII range for PHP function names and such.
...and my table's collation in the database is set to utf8_bin.
I really don't know much about the database specifics here I'm afraid. But isn't the collation just the rules that govern how characters are compared, not how they are actually stored/encoded?
[dev.mysql.com...]
utf8_bin - Binary collation (case-sensitive comparison ?)
Looking around, I would guess that the correct character set encoding to use is simply 'utf8'. And perhaps set after having connected to the DB like:
SET NAMES utf8
Just a thought... if data has already been written to the DB with a latin charset, it may need to be converted... read as latin, written back as utf8? (not sure)
This is rather speculative, however, so maybe the Database forum [webmasterworld.com] can offer more sound advice. In fact I notice that the latest thread "MySQL converting character sets question [webmasterworld.com]" perhaps deals with a related topic (although no replies as yet).
I would be interested to know the outcome of this.