xml and UTF-8

uuml undefined entity?

         

matthias

5:54 pm on Aug 4, 2002 (gmt 0)

10+ Year Member



I serve an XML document with UTF-8 encoding, but I am not able to include a '&uuml;' (ü) in the document. All browsers keep telling me that uuml is an unknown entity. How else should I code 'ü'?

tedster

12:40 am on Aug 5, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm pretty sure XML requires a numeric Unicode character reference instead of the named HTML entity: &#252; in this case.
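
For anyone who wants to check this quickly, here is a minimal sketch in Python (not PHP, purely for illustration) using the standard library's XML parser. The helper name `parses` and the sample `<word>` element are made up for this example. The named entity should be rejected because the document declares no DTD that defines it, while the decimal and hexadecimal character references should be accepted.

```python
import xml.etree.ElementTree as ET


def parses(xml_text):
    """Return the root element's text, or None if the parser rejects the document."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError:
        # Raised for documents that are not well-formed, e.g. an undefined entity.
        return None


# Named HTML entity: not defined in plain XML without a DTD.
named = parses("<word>M&uuml;ller</word>")

# Decimal and hexadecimal character references for U+00FC (ü).
numeric = parses("<word>M&#252;ller</word>")
hexadecimal = parses("<word>M&#xFC;ller</word>")

print(named, numeric, hexadecimal)
```

Only `&lt;`, `&gt;`, `&amp;`, `&apos;` and `&quot;` are predefined in XML, which is why the HTML name fails here.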

matthias

4:54 am on Aug 5, 2002 (gmt 0)

10+ Year Member



Thank you! Looks good.

The second part of the question would be: how can I translate characters into Unicode using PHP?

(Or should I ask such an additional but related question separately in the PHP forum?)

Thors Hammer

5:00 am on Aug 5, 2002 (gmt 0)

10+ Year Member



Try this link, it might help

[html.about.com...]

Hope it helps :)

Thor

matthias

12:47 pm on Aug 5, 2002 (gmt 0)

10+ Year Member



If anybody cares, I found an xmlentities function on [docs.akbkhome.com...] which solves my problem.
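
The linked function is not reproduced here, so the following is only a guess at the general idea, sketched in Python rather than PHP: escape the XML markup characters, then replace every non-ASCII character with a numeric character reference. The name `xml_entities` is hypothetical and is not the function from the link.

```python
from xml.sax.saxutils import escape


def xml_entities(text):
    """Hypothetical helper: make text safe to embed in an XML document.

    Escapes &, < and >, then turns every non-ASCII character into a
    decimal numeric character reference such as &#252;.
    """
    escaped = escape(text)  # & -> &amp;, < -> &lt;, > -> &gt;
    # The "xmlcharrefreplace" error handler emits &#NNN; for characters
    # that cannot be represented in ASCII.
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


result = xml_entities("Müller & Söhne")
print(result)
```

The output is pure ASCII, so it should survive regardless of the encoding declared for the document.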

ergophobe

9:27 pm on Aug 5, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Matthias,

I have wondered about the same thing and have been meaning to experiment with the XML functions native to PHP. Have you tried just typing whatever text you want without entities, but in ISO-8859-1, and then using utf8_encode()? Here's the manual page:

utf8_encode

(PHP 3 >= 3.0.6, PHP 4 >= 4.0.0)
utf8_encode -- encodes an ISO-8859-1 string to UTF-8

Description

string utf8_encode ( string data)

This function encodes the string data to UTF-8, and returns the encoded version. UTF-8 is a standard mechanism used by Unicode for encoding wide character values into a byte stream. UTF-8 is transparent to plain ASCII characters, is self-synchronized (meaning it is possible for a program to figure out where in the bytestream characters start) and can be used with normal string comparison functions for sorting and such. PHP encodes UTF-8 characters in up to four bytes, like this:

Table 1. UTF-8 encoding

bytes  bits  representation
1      7     0bbbbbbb
2      11    110bbbbb 10bbbbbb
3      16    1110bbbb 10bbbbbb 10bbbbbb
4      21    11110bbb 10bbbbbb 10bbbbbb 10bbbbbb

Each b represents a bit that can be used to store character data.
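
To make the table concrete, here is a small sketch in Python (not PHP) that applies the two-byte row to 'ü' (U+00FC) by hand and compares the result with re-encoding an ISO-8859-1 byte, which is roughly the conversion the manual page describes. The function name `encode_two_byte` is made up for this illustration.

```python
def encode_two_byte(code_point):
    """Encode a code point in the range U+0080..U+07FF using the
    two-byte pattern 110bbbbb 10bbbbbb from the table."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("code point does not fit the two-byte form")
    lead = 0b11000000 | (code_point >> 6)            # upper 5 of the 11 bits
    cont = 0b10000000 | (code_point & 0b00111111)    # lower 6 bits
    return bytes([lead, cont])


# 'ü' is the single byte 0xFC in ISO-8859-1.
latin1_bytes = b"\xfc"

# Decode as ISO-8859-1, then re-encode as UTF-8.
utf8_bytes = latin1_bytes.decode("latin-1").encode("utf-8")

manual_bytes = encode_two_byte(0xFC)
print(utf8_bytes, manual_bytes)
```

Both routes should give the same two bytes, 0xC3 0xBC. If the input is already UTF-8, running it through an ISO-8859-1 to UTF-8 conversion again would double-encode it.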

matthias

10:13 pm on Aug 5, 2002 (gmt 0)

10+ Year Member



Yes, I saw this one too, but it didn't work.