XML XSLT Sablotron Strangeness...

         

rabbit_fufu

5:13 pm on Mar 21, 2004 (gmt 0)




Hi Guys,

I've actually found a workaround for this issue already, but I'm not sure I fully understand what is happening here, so I thought I'd bring it up to see if anyone has any input...

Here's the scenario. I'm using the dom_xml functions to create XML data from a recordset retrieved from MySQL. I then perform an XSL transform on the XML data with Sablotron.

So far so good. In practice, however, I found that Sablotron would occasionally hit a fatal error ("invalid token"). After much cussing I traced the problem to certain Cyrillic characters (e.g. #156). As a solution I'm simply stripping all Cyrillic characters out of my resultset before I turn it into XML. It works fine now, but being relatively new to this I'm nevertheless interested in learning a bit more about where the fault actually lies. Is it...

1) Certain Cyrillic characters are not legal in XML, and the dom_xml functions are not doing the conversion job properly... So the problem lies in PHP's DOM functions.

2) Certain Cyrillic characters are not legal in an XSL stylesheet.

3) The characters are legal, but there is some kind of bug in Sablotron itself that causes it to conk out when it encounters certain high ascii characters...

If anyone has encountered these sorts of troubles before I'd certainly be interested in learning a bit more about what is actually going on.

Thanks!

-trav

ergophobe

5:53 pm on Mar 21, 2004 (gmt 0)




What character encoding are you using?


"high ascii characters..."

Well, we can rule that out, because you are not using high ASCII characters. The Western "extended ASCII" sets (ISO-8859-1, Windows-1252) contain no Cyrillic characters.

My guess is that you are using multiple character sets.

- Your document is set to a character encoding from the Unicode or ISO-8859 families of character encodings.

- Your data is using a Windows or Mac encoding.

The problem is that certain byte values (128-159 decimal) that the Windows encodings use for printable characters are reserved for control codes in ISO-8859, and a single byte in that range is not valid UTF-8. Therefore, you need to convert your input text to the proper character encoding, or you need to set your XML page to the same character encoding as your input.

For more information see

[cs.tut.fi...]
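
The clash described above can be seen by decoding the one byte from the first post (156) under several encodings. This is only a minimal sketch, written in Python for illustration rather than in the PHP used in the thread. Which encoding the original data was actually stored in is not stated in the thread, so the candidates below are assumptions, and `is_valid_utf8` is a hypothetical helper name.

```python
import unicodedata

# The single byte value reported in the first post.
raw = bytes([156])

# The same byte means different things depending on the encoding assumed.
as_cp1252 = raw.decode("cp1252")      # Windows Western European: a printable letter
as_cp1251 = raw.decode("cp1251")      # Windows Cyrillic: a printable letter
as_latin1 = raw.decode("iso-8859-1")  # ISO-8859-1: a C1 control code


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


for label, char in (("cp1252", as_cp1252), ("cp1251", as_cp1251), ("latin1", as_latin1)):
    # unicodedata.category() reports "Cc" for control characters.
    print(label, "U+%04X" % ord(char), unicodedata.category(char))

print("valid UTF-8 on its own:", is_valid_utf8(raw))
```

If the parser is expecting UTF-8, a lone byte like this is malformed input, which would be consistent with an "invalid token" error without any bug in the parser itself.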

rabbit_fufu

7:09 pm on Mar 21, 2004 (gmt 0)




Hi there, thanks for the nudge in the right direction. This is still a bit confusing to me, but I think I'm on the right path now... After more reading it looks as though the internal encoding DOM uses when storing the XML doc is UTF-8, so any non-UTF-8 characters need to be converted before storing, and then converted back to ISO-8859-1 on a dump_mem. Anyhow, things seem to be working now (knock on wood!)

-trav
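
The convert-in, convert-out approach described in the last post might look roughly like the sketch below. It is written in Python for illustration, not with the PHP dom_xml functions from the thread. It assumes the database bytes are Windows-1252, which the thread never confirms, and the names `to_text` and `rows_to_xml` and the sample rows are hypothetical.

```python
import xml.etree.ElementTree as ET


def to_text(raw: bytes, source_encoding: str = "cp1252") -> str:
    """Decode database bytes using the encoding they were stored in (assumed)."""
    return raw.decode(source_encoding)


def rows_to_xml(rows, source_encoding: str = "cp1252") -> ET.Element:
    """Build a small XML tree from raw rows, decoding each one first."""
    root = ET.Element("rows")
    for raw in rows:
        ET.SubElement(root, "row").text = to_text(raw, source_encoding)
    return root


# Hypothetical sample data: the first row contains byte 156.
rows = [b"c\x9cur", b"plain ascii"]
root = rows_to_xml(rows)

# Serialize as UTF-8 for the transform step.
utf8_xml = ET.tostring(root, encoding="utf-8")

# Serialize as ISO-8859-1 for final output; characters that ISO-8859-1
# cannot represent are written as numeric character references.
latin1_xml = ET.tostring(root, encoding="iso-8859-1")

print(utf8_xml)
print(latin1_xml)
```

Decoding once at the boundary keeps everything inside the XML tree in a single, known encoding, so nothing has to be stripped from the data.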