Forum Moderators: coopster

How to serve XML files with special characters

Special characters are not getting through

         

asantos

10:42 pm on Dec 22, 2008 (gmt 0)

10+ Year Member



Hi,
I'm serving XML files through a get.php file.

This is the header:
header('Content-type: application/xml; charset="UTF-8"',true);
echo '<?xml version="1.0" encoding="UTF-8"?>';
echo '<data>';

...and footer:
echo '</data>';die();

Almost everything works great. A few hours ago I tested data with special characters (Spanish, German and Portuguese), and this was the result:

<?xml version="1.0" encoding="UTF-8"?>
<data>
<meta>
<new>2</new>
<total>2</total>
</meta>
<list>
<item>
<id_msg>5</id_msg>
<name>John Smith</name>
<date>02:32</date>
<msg>Im fine. How are you?</msg>
</item>
<item>
<id_msg>15</id_msg>
<name>Roman Polanski</name>
<date>Aug 30</date>
<msg>Hi, these are special characters: spanish ? ? ? ? ?, deutsch ? ?, portuguese ?</msg>
</item>
</list>
</data>

The last <msg> node should contain:
spanish á é í ó ú ñ, german ä ë, portuguese ç

The data served inside the <data> node comes from a UTF-8 MySQL database.

Why are the original characters not getting through?

IanKelley

2:45 am on Dec 23, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Are you maybe running them through a PHP string function that isn't UTF-8 aware (i.e. most of them) between the database and the output?
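
To illustrate what "not UTF-8 aware" means here, a small sketch: PHP's core string functions work on bytes, so a multibyte character can be miscounted or split. The variable names are just for illustration, and the string is written with byte escapes so the result doesn't depend on how the file itself is saved.

```php
<?php
// "ñ" is two bytes in UTF-8 (0xC3 0xB1).
$s = "\xC3\xB1";

// strlen() counts bytes, not characters.
$byteLength = strlen($s);

// substr() cuts on byte offsets, so it can return half a character.
$firstByte = bin2hex(substr($s, 0, 1));

// strrev() reverses bytes, which scrambles the two-byte sequence.
$reversed = bin2hex(strrev($s));

echo $byteLength, "\n"; // 2
echo $firstByte, "\n";  // c3
echo $reversed, "\n";   // b1c3
```

If nothing like this sits between the query and the echo, the string functions are probably not the culprit.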

asantos

9:01 pm on Dec 23, 2008 (gmt 0)

10+ Year Member



I get the data directly from the database with ADOdb for PHP. No string functions in between.

Could it be a charset setting in the ADOdb configuration? I have read that ADOdb doesn't support that property for MySQL connections.

IanKelley

11:04 pm on Dec 23, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I've never used ADOdb (or any similar library), so I can't say for sure, but yes, it's definitely possible that something happening in one of its routines is messing up the character encoding.

What happens if you print multibyte strings directly from your script?

Example:

echo 'Iñtërnâtiônàlizætiøn';

asantos

11:29 pm on Dec 23, 2008 (gmt 0)

10+ Year Member



@IanKelley
It prints:

Iñtërnâtiônàlizætiøn
(the PHP file is UTF-8 encoded)

If I hard-code the "Iñtërnâtiônàlizætiøn" value in the XML, it prints fine. It only gets mangled when I read it from the DB (which uses UTF-8):
I?t?rn?ti?n?liz?ti?n
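
One way to narrow this down is to look at the raw bytes that come back from the database instead of the rendered characters. A minimal sketch, assuming the value is available as a PHP string; `describeBytes` is a made-up helper, not part of PHP or ADOdb. It prints the hex bytes and uses preg_match() with the /u modifier, which only matches when the subject is valid UTF-8.

```php
<?php
// Hypothetical helper: show the raw bytes of a string and whether they
// form valid UTF-8.
function describeBytes($s)
{
    $valid = preg_match('//u', $s) === 1;
    return bin2hex($s) . ($valid ? ' (valid UTF-8)' : ' (not valid UTF-8)');
}

$utf8   = "\xC3\xB1"; // "ñ" encoded as UTF-8
$latin1 = "\xF1";     // "ñ" encoded as ISO-8859-1 (Latin-1)

echo describeBytes($utf8), "\n";   // c3b1 (valid UTF-8)
echo describeBytes($latin1), "\n"; // f1 (not valid UTF-8)
```

If the database value shows up as a single byte like f1, the data is probably being converted to Latin-1 somewhere on the connection; if it shows up as 3f (a literal question mark), the character was already replaced before it reached PHP.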

IanKelley

11:33 pm on Dec 23, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yeah, that leaves no doubt: it's something in the DB abstraction layer that's doing it.

My suggestion would be to ditch the database abstraction; it's HIGHLY overrated.

You will end up with a much more efficient script and no multibyte character issues.

asantos

12:12 am on Dec 24, 2008 (gmt 0)

10+ Year Member



@IanKelley:
"My suggestion would be to ditch the database abstraction; it's HIGHLY overrated."

Sorry, I don't quite understand that. Could you be more specific? Thanks!

IanKelley

3:27 am on Dec 24, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Instead of using ADOdb to access MySQL, use the built-in PHP database functions:

http://php.net/mysql

asantos

4:54 pm on Dec 29, 2008 (gmt 0)

10+ Year Member



Hi IanKelley,
thank you very much for the help, but I solved it while keeping ADOdb for PHP by adding this line:

$cnn->execute('SET NAMES utf8');
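
For anyone finding this later, a rough sketch of where that line fits: it has to run on the same connection, before the queries whose results you output. SET NAMES sets character_set_client, character_set_connection and character_set_results for that connection. The class below is only a stand-in that records statements so the example runs without a database; `RecordingConnection`, `useUtf8` and the table name are made up.

```php
<?php
// Stand-in for the ADOdb connection object. It only records the SQL it is
// given; a real connection would send each statement to MySQL.
class RecordingConnection
{
    public $statements = array();

    public function execute($sql)
    {
        $this->statements[] = $sql;
        return true;
    }
}

// Hypothetical helper: declare the connection charset before anything else.
function useUtf8($cnn)
{
    $cnn->execute('SET NAMES utf8');
    return $cnn;
}

$cnn = useUtf8(new RecordingConnection());
$cnn->execute('SELECT id_msg, name, msg FROM messages'); // made-up query

print_r($cnn->statements);
```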

;)