The UTF-8 meta tag you have in your HTML source may be overridden by a Content-Type header sent by the web server software. Headers are sent before the HTML source, and a charset declared there takes precedence over the meta tag. You can check this with a server header checker. There are tool websites which offer that functionality (WebmasterWorld has such a header checker in the subscription area, for example), or you can use the Live HTTP Headers add-on for Firefox. The way to change this header differs depending on the server software (Apache, IIS, Nginx) you use.
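If you'd rather check from a script than from a tool website, here is a minimal Python sketch of the same idea. The function names (`charset_from_content_type`, `effective_charset`, `fetch_header_charset`) are made up for this example, and the precedence rule is simplified to "a charset in the header beats the meta tag"; real browsers apply a few more rules than that.

```python
from email.message import Message
import urllib.request


def charset_from_content_type(header_value):
    """Return the charset named in a Content-Type header value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() lower-cases the value and returns None if absent
    return msg.get_content_charset()


def effective_charset(http_header, meta_charset):
    """Simplified rule: a charset in the HTTP header overrides the meta tag."""
    from_header = charset_from_content_type(http_header)
    if from_header:
        return from_header
    return meta_charset.lower() if meta_charset else None


def fetch_header_charset(url):
    """Ask a live server which charset it declares (needs network access)."""
    with urllib.request.urlopen(url) as response:
        return response.headers.get_content_charset()


# The server says Latin-1, the page says UTF-8: the header wins.
print(effective_charset("text/html; charset=ISO-8859-1", "UTF-8"))
# No charset in the header: the meta tag is used.
print(effective_charset("text/html", "UTF-8"))
```

Call `fetch_header_charset` with your own page's URL to see what your server actually sends.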
Also, make sure that you save the actual HTML text file as UTF-8. By default, most text editors save as Latin-1 or Mac OS Roman. You can change it in the 'preferences' of your text editor (it may be called something different in Windows). If you don't save the file as UTF-8, then even with the declaration in your file header, it won't work.
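To see why the declaration alone is not enough, here is a small Python sketch showing that the same text becomes different bytes depending on how it is saved. The helpers `is_valid_utf8` and `reencode` are names invented for this example, and `reencode` assumes you already know the file was saved as Latin-1.

```python
text = "caf\u00e9"  # "café", written with an escape so the script itself is encoding-safe

latin1_bytes = text.encode("latin-1")  # b'caf\xe9'      (one byte for the accented e)
utf8_bytes = text.encode("utf-8")      # b'caf\xc3\xa9'  (two bytes for the accented e)


def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes can be decoded as UTF-8 without errors."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def reencode(data: bytes, source_encoding: str = "latin-1") -> bytes:
    """Convert bytes from an assumed source encoding to UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


print(is_valid_utf8(latin1_bytes))  # False: a Latin-1 file is not valid UTF-8 here
print(is_valid_utf8(utf8_bytes))    # True
print(reencode(latin1_bytes) == utf8_bytes)  # True
```

A browser told "this is UTF-8" but handed the Latin-1 bytes hits the same problem as `is_valid_utf8` above, which is where the funny characters come from.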
Thanks for your help guys. I'll implement these ideas when I'm next on this project and post my results.
No... Actually, Commander - it worked! It had just converted the foreign characters to funny little things. Having re-inserted them it works a treat!
Thanks for the help!
One more question though, please.
What's the difference between saving with and without a BOM (byte order mark)? Which one should I use?
The BOM (in UTF-8, the three bytes EF BB BF at the very beginning of a file, which signal the encoding rather than the file type) causes text editors to automatically switch to the right encoding setting when a file is loaded from disk. But I have seen some strange browser behaviour when serving pages with a BOM in them. For web serving purposes, it is therefore better to leave it out.
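If you want to check whether a file already has a BOM, or remove one, this can be done in a few lines of Python. This is only a sketch: `has_utf8_bom` and `strip_utf8_bom` are names made up for the example, while `codecs.BOM_UTF8` and the `utf-8-sig` codec are part of the standard library.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """True if the bytes start with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the bytes without a leading UTF-8 BOM, if there was one."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data


page = codecs.BOM_UTF8 + b"<html></html>"
print(has_utf8_bom(page))    # True
print(strip_utf8_bom(page))  # b'<html></html>'

# When decoding to text, the "utf-8-sig" codec drops a leading BOM for you.
print(page.decode("utf-8-sig"))  # <html></html>
```

To clean a real file, read it in binary mode (`open(path, "rb")`), pass the bytes through `strip_utf8_bom`, and write the result back in binary mode.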
Glad I could be of assistance -
UTF-8 does not require a BOM, and in some browsers it will give you a blank line or funny little things at the beginning of a document. So it's best not to use it. Save your files as UTF-8 with no BOM.
Read this - [w3.org ]
or this - [w3.org ]
and since you are working heavily with international alphabets, try to find the time to read all of this - [w3.org ]
If you are using Windows you may need to know that
"A particular protocol (e.g. Microsoft conventions for .txt files) may require use of the BOM on certain Unicode data streams, such as files. When you need to conform to such a protocol, use a BOM."
from here [unicode.org ]
I would like to add that when you set your text editor to save as UTF-8, you also have to tell it which encoding to use when opening files. If you choose UTF-8 for opening as well, your editor may refuse to open Latin-1 or Mac OS Roman encoded files. In that case you get a 'cannot open file' warning.
Text Editor on OS X has the open option Automatic. This opens any file.
But then remember that your text editor will still save all files, and convert all files that you open, to UTF-8.
So if you suddenly have trouble opening files after changing both the 'open as' and 'save as' settings in your text editor to UTF-8, just go in and change those settings back to default or play around with them. Usually you will just have to reset the text editor to open files as Latin-1 if you run across a file that won't open, since that is the more common default encoding. Then switch back to UTF-8 when you need to open the UTF-8 files again.
But I recommend using the 'Automatic' setting for opening, and UTF-8 for saving.
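For anyone doing this in a script instead of an editor, the "automatic for open, UTF-8 for save" idea can be roughly sketched in Python like this. It is a crude approximation and not how any particular editor works: it assumes anything that is not valid UTF-8 is Latin-1, so a Mac OS Roman file would be decoded with the wrong characters. The function names are made up for the example.

```python
def decode_auto(data: bytes) -> str:
    """Try UTF-8 first (ignoring a leading BOM), then fall back to Latin-1."""
    try:
        return data.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Assumption: non-UTF-8 input is Latin-1. This decode cannot fail,
        # because every byte value maps to some Latin-1 character.
        return data.decode("latin-1")


def normalise_to_utf8(data: bytes) -> bytes:
    """Open 'automatically', then save as UTF-8 without a BOM."""
    return decode_auto(data).encode("utf-8")


print(normalise_to_utf8(b"caf\xe9"))          # b'caf\xc3\xa9' (was Latin-1)
print(normalise_to_utf8(b"caf\xc3\xa9"))      # b'caf\xc3\xa9' (already UTF-8)
print(normalise_to_utf8(b"\xef\xbb\xbfcaf"))  # b'caf' (BOM dropped)
```

Trying UTF-8 first is the safer order, because Latin-1 text containing accented characters will usually fail to decode as UTF-8, while UTF-8 bytes always "succeed" as Latin-1 and just come out garbled.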
Wow, thanks for such comprehensive replies peeps. Really appreciate it.
I have already begun reading through those pages, Commander. Thanks!
Really can't express my gratitude enough. It really had me stumped!