UTF-8 Encoded HTML page, malformed HTML source?

Jan_Jaap

7:51 pm on Oct 12, 2006 (gmt 0)

10+ Year Member



Hi!

I am trying to make my forum work with any language, but I am running into the following problem:

Everything works fine now: everything from the database to the output of the scripts is UTF-8 encoded, and the site displays special characters correctly. However, when you click "View source", the UTF-8 encoded characters appear malformed in the HTML source. This is a real problem, since Google will also index these malformed characters, as we discovered in the past.

Does anyone know a solution for this?

I use the following setup:
- utf8 encoded database
- utf8 fetching of the database in PHP (required)
- utf8 header in PHP scripts
- utf8 header via .htaccess
- utf8 metatag in the HTML

I hope someone knows the solution for this!

Best Regards,
Jan Jaap Hakvoort

encyclo

12:58 am on Oct 13, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If the characters display correctly when viewing in the browser, then they are correctly encoded, assuming that you are declaring UTF-8 and that UTF-8 is the selected charset in the browser. How exactly are you viewing source? And when you say that Google indexes the malformed characters, is this because you are viewing the source of the Google cached page? How were the pages/template created (which editor)?
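To illustrate why the bytes themselves can be fine even when a source viewer shows garbage, here is a minimal Python sketch. The sample string and the choice of windows-1252 as the "wrong" charset are assumptions for illustration only; your view-source program may fall back to a different one.

```python
# A sample string with a non-ASCII character (hypothetical, for illustration).
text = "café"

# These are the bytes a correctly configured UTF-8 page would send.
raw = text.encode("utf-8")

# Decoded with the declared charset, the text round-trips unchanged.
as_utf8 = raw.decode("utf-8")

# Decoded as windows-1252, the very same bytes look "malformed":
# the two-byte sequence for "é" is shown as two separate characters.
as_cp1252 = raw.decode("cp1252")

print(as_utf8)    # café
print(as_cp1252)  # cafÃ©
```

If what you see in "View source" resembles the second line, the page is most likely encoded correctly and only the viewer is interpreting it with the wrong charset.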

There are two possibilities. The first is that the font used by your view-source program does not include glyphs for the characters (which is not really a problem and can be overcome by choosing a Unicode font in that program). The second is that you have an issue with a byte-order mark at the beginning of your document - the BOM is optional and should be omitted when using UTF-8 on the web, but certain editors (in particular Notepad) add it.
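If you want to check the BOM possibility, something along these lines should work. It is only a rough sketch in Python: the helper names `has_utf8_bom` and `strip_utf8_bom` are made up for this post, and you would feed them the raw bytes of your own template or script files.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 byte-order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 BOM; other data is returned unchanged."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data


# Hypothetical file content, as an editor that adds a BOM might save it.
sample = codecs.BOM_UTF8 + "<html>".encode("utf-8")

print(has_utf8_bom(sample))    # True
print(strip_utf8_bom(sample))  # b'<html>'
```

A BOM that gets read as windows-1252 typically shows up as the three characters "ï»¿" at the very top of the source, which is an easy thing to look for before running any script.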