Splitting Description field

How can I separate these items

WebWalla

12:26 pm on Dec 11, 2006 (gmt 0)

I have an XML feed which includes a description element like the one below:


<description>
 <![CDATA[ <img src='http://www.example.com/images/1_thumb.jpg' border='0' /><br/>
 Text description of item
 ]]> 
</description>

At the moment I'm parsing it so that the entire element is output "as is", with the image, then the line break, then the text description.

Is there any way of parsing this so that I can get the image and the text description as separate elements?

P.S. My parsing knowledge is minimal; at the moment I have just adapted an existing script to do this for me.

Thanks!

choster

4:28 pm on Dec 11, 2006 (gmt 0)

When you wrap content in a <![CDATA[ ]]> section, you're asking the XML parser to treat the contents as plain text. By definition, CDATA is not parsed as markup, so the tag-like strings inside it (the <img> and <br/> in your example) are not part of the document tree.

That said, there are parser-specific extensions that will read text into nodes, such as saxon:parse(). Check the documentation on your parser to see if such a function is supported.
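To illustrate the point about CDATA, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. It is only an assumption that Python is available to you (rss2html itself is a different script), and the FEED_ITEM name is made up for the example. It shows that the description element has no child nodes and that the image markup arrives as one string.

```python
import xml.etree.ElementTree as ET

# The description element from the question, copied as a string.
FEED_ITEM = """<description>
 <![CDATA[ <img src='http://www.example.com/images/1_thumb.jpg' border='0' /><br/>
 Text description of item
 ]]>
</description>"""

desc = ET.fromstring(FEED_ITEM)

# The <img> and <br/> are not elements in the tree: the element has no children.
child_count = len(desc)

# Everything inside the CDATA section comes back as a single text string.
raw_text = desc.text

print(child_count)
print(raw_text.strip())
```

So any splitting has to happen on that string, after the XML parser has finished with it.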

WebWalla

8:12 pm on Dec 11, 2006 (gmt 0)

From my limited knowledge, that's what I thought. But then the owner of the feed said:

"RSS parser could do it - so could CSS"

Is it really not possible then?

Thanks.

choster

8:21 pm on Dec 11, 2006 (gmt 0)

CSS or XSL? They do very different things.

WebWalla

8:37 pm on Dec 11, 2006 (gmt 0)

The quote is verbatim - CSS.

But could it be done with XSL? If so, can you give me an indication how?

[edited by: WebWalla at 8:44 pm (utc) on Dec. 11, 2006]

choster

4:45 pm on Dec 12, 2006 (gmt 0)

An RSS parser or a CSS parser (i.e. a browser) is consumer-level, and is probably going to be more "forgiving." But an XML parser is supposed to treat the contents of a CDATA section as plain text, not as markup. That is an XML-wide rule, not something unique to XSL.

As I noted, there are extensions to the common parsers which will force them to interpret the contents of a CDATA section as XML nodes, but these are specific to the parser being used.
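If no such extension is available, another route is a second parsing pass over the text that the XML parser hands back. The following is a rough sketch in Python, standard library only, and not a feature of rss2html. The DescriptionSplitter class, the split_description function and the FEED_ITEM name are hypothetical, invented for this example. It also assumes the description always holds one image followed by plain text, as in the sample.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# The description element from the question, copied as a string.
FEED_ITEM = """<description>
 <![CDATA[ <img src='http://www.example.com/images/1_thumb.jpg' border='0' /><br/>
 Text description of item
 ]]>
</description>"""


class DescriptionSplitter(HTMLParser):
    """Collects image URLs and text from an HTML fragment (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.image_urls = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img ... /> are routed here as well.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.image_urls.append(src)

    def handle_data(self, data):
        # Keep only non-empty text, with surrounding whitespace trimmed.
        stripped = data.strip()
        if stripped:
            self.text_parts.append(stripped)


def split_description(description_xml):
    """Return (image_url, text) for one description element."""
    # First pass: the XML parser returns the CDATA contents as plain text.
    raw = ET.fromstring(description_xml).text or ""
    # Second pass: treat that text as an HTML fragment.
    splitter = DescriptionSplitter()
    splitter.feed(raw)
    splitter.close()
    image = splitter.image_urls[0] if splitter.image_urls else None
    return image, " ".join(splitter.text_parts)


image_url, text = split_description(FEED_ITEM)
print(image_url)
print(text)
```

The same two-pass idea should carry over to other languages: let the XML parser return the string, then hand that string to an HTML parser rather than trying to make the XML parser see inside the CDATA section.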

WebWalla

7:58 pm on Dec 12, 2006 (gmt 0)

OK, I get what you're saying.

I'm using the rss2html script to parse this feed. I think I'll just have to wait until the feed owner changes the format.

Thanks for the info.