parsing foreign characters in xml

my é needs to be &#233;

         

thing3b

2:55 am on Jan 17, 2005 (gmt 0)

10+ Year Member



I have an XML file that contains characters like é, for example:


<entry>
<term>Acher&eacute;</term>
<use>Atcher&eacute;</use>
</entry>

What I am trying to do is get the xml file to be parsed by php and then turn the data into a viewable webpage.

I think that the &eacute; somehow needs to be a numeric character reference such as &#233; (with the semicolon on the end).

When creating the parser I am using...

$xml_parser = xml_parser_create('UTF-8');

and if I use html_entity_decode on the string before parsing it, it says: "XML error: not well-formed (invalid token) at line 101"
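One possible way around that error, offered as a sketch rather than a tested fix for this exact file: html_entity_decode also decodes &amp; and &lt;, and its output bytes may not match the encoding the parser expects, either of which could explain the "not well-formed" message. The sketch below instead rewrites only the named entities that XML does not predefine into numeric references such as &#233;. The function names named_entities_to_numeric and utf8_code_point are hypothetical, and the code assumes PHP 7 or later rather than the 4.3.10 mentioned in this thread.

```php
<?php
// Entity names that XML itself predefines; these are left untouched.
const XML_PREDEFINED = ['amp', 'lt', 'gt', 'quot', 'apos'];

// Hypothetical helper: code point of a single UTF-8 encoded character.
function utf8_code_point(string $char): int
{
    $b = array_values(unpack('C*', $char));
    switch (count($b)) {
        case 1:
            return $b[0];
        case 2:
            return (($b[0] & 0x1F) << 6) | ($b[1] & 0x3F);
        case 3:
            return (($b[0] & 0x0F) << 12) | (($b[1] & 0x3F) << 6) | ($b[2] & 0x3F);
        default:
            return (($b[0] & 0x07) << 18) | (($b[1] & 0x3F) << 12)
                | (($b[2] & 0x3F) << 6) | ($b[3] & 0x3F);
    }
}

// Hypothetical helper: turn HTML 4.01 named entities into numeric references.
function named_entities_to_numeric(string $xml): string
{
    return preg_replace_callback(
        '/&([A-Za-z][A-Za-z0-9]*);/',
        function (array $m): string {
            if (in_array($m[1], XML_PREDEFINED, true)) {
                return $m[0]; // valid XML already
            }
            $decoded = html_entity_decode($m[0], ENT_QUOTES | ENT_HTML401, 'UTF-8');
            if ($decoded === $m[0]) {
                return $m[0]; // unknown name: leave it for the parser to report
            }
            return '&#' . utf8_code_point($decoded) . ';';
        },
        $xml
    );
}
```

Running the file contents through named_entities_to_numeric() before handing them to xml_parse() should leave markup and &amp; intact while making &eacute; acceptable to an XML parser that has no DTD to define it.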

If I do not use html_entity_decode or set the parser to UTF-8, the characters are turned into question marks, as the documentation states: "If PHP encounters characters in the parsed XML document that can not be represented in the chosen target encoding, the problem characters will be "demoted". Currently, this means that such characters are replaced by a question mark." (http://nz.php.net/xml)

Any ideas?
Thank you very much to all of you who help.

Oh and PHP Version 4.3.10

ergophobe

5:54 pm on Jan 17, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You shouldn't get question marks with utf-8. Are you looking at the output in a browser or a text editor? Have you tried ISO-8859-1?

In most cases, I find that if the encodings are set right, you don't have to convert to entities except for a few characters (&amp;) and in URLs and such.

I'm not familiar with the xml_parser functions though.

thing3b

12:14 am on Jan 20, 2005 (gmt 0)

10+ Year Member



Thanks for that, you have been a great help. I am quite annoyed at how easy the solution was though. All I did was change the HTML from

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

to

<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">

Looks like my PHP skills are up to scratch, just my html is not anymore :)
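For anyone landing here later, a rough sketch of why the charset change probably mattered: the bytes the parser hands back depend on its target encoding, and the page has to declare the same charset or the browser shows question marks or garbage. The helper name collect_element_text is hypothetical, the sample uses &#233; rather than &eacute; so that it is valid XML on its own, and the code assumes PHP 7 or later with the xml extension.

```php
<?php
// Hypothetical helper: text of each element keyed by element name, with
// character data delivered in $targetEncoding. Elements that share a
// name have their text concatenated.
function collect_element_text(string $xml, string $targetEncoding): array
{
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, $targetEncoding);
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    $text = [];
    $stack = [];

    xml_set_element_handler(
        $parser,
        function ($p, string $name, array $attrs) use (&$stack, &$text) {
            $stack[] = $name;
            if (!isset($text[$name])) {
                $text[$name] = '';
            }
        },
        function ($p, string $name) use (&$stack) {
            array_pop($stack);
        }
    );

    // The handler can fire several times per text node, so append.
    xml_set_character_data_handler(
        $parser,
        function ($p, string $data) use (&$stack, &$text) {
            if ($stack) {
                $text[end($stack)] .= $data;
            }
        }
    );

    if (xml_parse($parser, $xml, true) !== 1) {
        $message = (string) xml_error_string(xml_get_error_code($parser));
        throw new RuntimeException($message);
    }

    return $text;
}
```

With a target encoding of ISO-8859-1 the é comes back as the single byte 0xE9, which only displays correctly on a page declared as ISO-8859-1; with UTF-8 it comes back as the two bytes 0xC3 0xA9 and the page should say charset=UTF-8. Either pairing should work as long as the two agree.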

ergophobe

4:44 pm on Jan 21, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Look on the bright side. Just think how much more annoyed you would be if the solution were really really hard.