Msg#: 3185025 posted 12:26 pm on Dec 11, 2006 (gmt 0)
I have an XML feed which includes a description element like the one below:

<description>
<![CDATA[
<img src='http://www.example.com/images/1_thumb.jpg' border='0' /><br/>
Text description of item
]]>
</description>
At the moment I'm parsing it so that the entire element is output "as is", with the image, then the line break, then the text description.
Is there any way of parsing this so that I can get the image and the text description as separate elements?
P.S. My parsing knowledge is minimal; so far I have just adapted an existing script to do this for me.
Msg#: 3185025 posted 4:28 pm on Dec 11, 2006 (gmt 0)
When you wrap content in a <![CDATA[ ... ]]> section, you're asking the XML parser to treat the contents as plain text. CDATA is not parsed by definition, so the escaped pseudo-elements within it (the <img> and <br/> in your example) are not part of the document tree.
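To illustrate the point, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The <item> wrapper and the variable names are assumptions made for the example; only the description content comes from the original post. It shows that the description element has no child nodes and that the markup arrives as one text string.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-item feed; the <item> wrapper is an assumption for illustration.
FEED = (
    "<item><description><![CDATA[<img src='http://www.example.com/images/1_thumb.jpg' "
    "border='0' /><br/> Text description of item]]></description></item>"
)

item = ET.fromstring(FEED)
description = item.find("description")

# The CDATA content is exposed as text, not as child elements.
child_count = len(description)   # number of child elements under <description>
raw_text = description.text      # the unparsed markup, as a single string

print(child_count)
print(raw_text)
```

So as far as the XML parser is concerned, there is no img element to select.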
That said, there are parser-specific extensions that will read text into nodes, such as saxon:parse(). Check the documentation on your parser to see if such a function is supported.
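If your parser has no such extension, one workaround is a second pass: take the text out of the description element, then feed that string to a separate HTML parser. The sketch below is not saxon:parse(); it is a different technique using Python's standard html.parser module, and the names DescriptionSplitter, split_description and SAMPLE are hypothetical. It assumes the first img tag is the one you want.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class DescriptionSplitter(HTMLParser):
    """Collects img src values and text from an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.image_urls = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.image_urls.append(src)

    def handle_data(self, data):
        # Keep only non-blank text runs.
        if data.strip():
            self.text_parts.append(data.strip())


def split_description(xml_source):
    """Return (image_url, text) for the first <description> child of the root."""
    root = ET.fromstring(xml_source)
    raw = root.findtext("description") or ""   # first pass: CDATA as text
    splitter = DescriptionSplitter()
    splitter.feed(raw)                          # second pass: parse it as HTML
    splitter.close()                            # flush any buffered trailing text
    image = splitter.image_urls[0] if splitter.image_urls else None
    return image, " ".join(splitter.text_parts)


# Hypothetical sample; the <item> wrapper is an assumption.
SAMPLE = (
    "<item><description><![CDATA[ <img src='http://www.example.com/images/1_thumb.jpg' "
    "border='0' /><br/> Text description of item ]]></description></item>"
)

image_url, text = split_description(SAMPLE)
print(image_url)
print(text)
```

An HTML parser is used for the second pass because feed descriptions often contain markup that is not well-formed XML.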
Msg#: 3185025 posted 4:45 pm on Dec 12, 2006 (gmt 0)
An RSS parser or a CSS parser (i.e. a browser) is consumer-level, and is probably going to be more "forgiving." But an XML parser is not supposed to parse the contents of a CDATA section; it passes them through as text. That is an XML-wide rule, not something unique to XSL.
As I noted, there are extensions to the common parsers which will force them to interpret the contents of a CDATA section as XML nodes, but these are specific to the parser being used.