
XML Development Forum

    
Transforming DOM to XML output
Fotiman




msg:3766234
 4:40 pm on Oct 15, 2008 (gmt 0)

I'm working on a Java application which transforms a DOMSource object (javax.xml.transform.dom.DOMSource) to a StreamResult object (javax.xml.transform.stream.StreamResult) which is then written to a file. This seems to output empty elements using the short notation:

<myelement />

Unfortunately, a 3rd party that I'm working with needs the XML to use the long format with a closing tag:

<myelement></myelement>

This question may be somewhat Java specific, but does anyone know of a way to configure the transformer to output the long format?
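For reference, the setup described above can be reproduced with a minimal sketch like the one below. The class name, the helper methods, and the use of a StringWriter instead of a file are assumptions for illustration. The exact form of the empty element depends on the Transformer implementation in use, so no particular output is claimed here.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class SerializeEmptyElement {
    // Serializes a DOM document with the default identity Transformer.
    // A StringWriter stands in for the file used in the original post.
    public static String serialize(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Builds <root> containing one childless <myelement> (hypothetical sample).
    public static Document buildSample() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element root = doc.createElement("root");
        root.appendChild(doc.createElement("myelement"));
        doc.appendChild(root);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        // Typically prints the short form for the empty element.
        System.out.println(serialize(buildSample()));
    }
}
```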

 

httpwebwitch




msg:3766458
 9:00 pm on Oct 15, 2008 (gmt 0)

The self-closing format is the "proper" format for a node with no children (to an XML parser the two forms are equivalent). But I know what you mean - some HTML tags (like <script> and <textarea>) mustn't be self-closed like that, or Bad Things May Happen.

I have that problem with a certain .NET XSLT parser. It has problems with all empty nodes - and it totally barfs on "<textarea></textarea>". There is no config option available to fix it, so we've had to put spaces in all our nodes to keep them from self-closing... it's a nasty solution that has caused several other problems further up the stack.

<node>&#160;</node>

You might also try:

<node>&NULLENTITY;</node>
where NULLENTITY is declared to be NULL (or, an empty string) in the DTD, like this:
<!ENTITY NULLENTITY "">

Not being a Javaist, I don't know if any of the above will work.
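On the Java side, the padding idea could be applied to the DOM before it is handed to the transformer. This is only a sketch: the class and method names are made up for illustration, and it changes the data, since the consumer will see a non-breaking space (or whatever filler is passed in) as the element's content.

```java
import org.w3c.dom.Node;

public class PadEmptyElements {
    // Appends a text node containing `filler` to every element that has
    // no children, and returns how many elements were padded.
    public static int pad(Node node, String filler) {
        if (node.getNodeType() == Node.ELEMENT_NODE && !node.hasChildNodes()) {
            node.appendChild(node.getOwnerDocument().createTextNode(filler));
            return 1;
        }
        int padded = 0;
        for (Node child = node.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            padded += pad(child, filler);
        }
        return padded;
    }
}
```

Calling `PadEmptyElements.pad(doc, "\u00A0")` on the Document before the transform would mirror the `<node>&#160;</node>` workaround above. Whether the character is written as `&#160;` or as a raw character depends on the serializer and the output encoding.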

Fotiman




msg:3766466
 9:14 pm on Oct 15, 2008 (gmt 0)

Thanks. In our case, the output is an RSS feed. But the company that ingests it must not be using a standard parser. Thanks for the suggestions. If I can't find a way to force the end tag to be generated, then perhaps an empty string will work.

httpwebwitch




msg:3767095
 2:32 pm on Oct 16, 2008 (gmt 0)

If you have the XML as a string, maybe there's a reliable way to do it with a regex replace(), like:

<([^\s]*)([^>]*)/>
replace with
<$1$2></$1>

Fotiman




msg:3773037
 6:53 pm on Oct 24, 2008 (gmt 0)

Thanks. I'm using Java's String.replaceAll method like this:

s = s.replaceAll("<([^\\s]*)([^>]*)/>", "<$1$2></$1>");

That regex turned this:

<media:thumbnail height="100" url="http://example.com/a.jpg" width="133"/>

into this:

<media:thumbnail height="100" url="http://example.com/a.jpg" width="133"><//media:keywords>>

Close, but not quite. Any suggestions?

httpwebwitch




msg:3773076
 7:52 pm on Oct 24, 2008 (gmt 0)

<media:thumbnail height="100" url="http://example.com/a.jpg" width="133"><//media:keywords>>

OK, that's just weird. Where did "/media:keywords>" come from? It's not in the matched string; in that spot should be "media:thumbnail".

Is something amiss with the Java replaceAll() method?

Fotiman




msg:3773098
 8:18 pm on Oct 24, 2008 (gmt 0)

The element before this one. Here's a more complete XML snippet:

<media:keywords>Example</media:keywords>
<media:thumbnail height="100" url="http://example.com/a.jpg" width="133"><//media:keywords>>

Fotiman




msg:3773104
 8:23 pm on Oct 24, 2008 (gmt 0)

<([^\\s]*)

Matches:
< + any non-whitespace character (including /)

Perhaps I need this:
<([^\\s/]*)([^>]*)/>

?
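A small sketch comparing the two patterns from this thread on a two-element sample. The class name, constants, and the keyword text "foo bar" are assumptions for illustration. With keyword text that contains a space, the original pattern starts its match at the closing </media:keywords> tag: group 1 accepts both "/" and ">", and [^>]* then runs across the line break up to the "/>", which produces the "<//media:keywords>>" tail reported above.

```java
public class SelfClosingRegexDemo {
    // Original pattern from the thread: group 1 accepts '/', so the match
    // can begin at a closing tag such as </media:keywords>.
    public static final String ORIGINAL = "<([^\\s]*)([^>]*)/>";

    // Amended pattern from the thread: '/' is excluded from group 1.
    public static final String AMENDED = "<([^\\s/]*)([^>]*)/>";

    // Hypothetical sample: the keyword text contains a space.
    public static final String SAMPLE =
            "<media:keywords>foo bar</media:keywords>\n"
            + "<media:thumbnail height=\"100\" url=\"http://example.com/a.jpg\" width=\"133\"/>";

    // Rewrites <name attrs/> as <name attrs></name> using the given pattern.
    public static String expand(String xml, String pattern) {
        return xml.replaceAll(pattern, "<$1$2></$1>");
    }

    public static void main(String[] args) {
        System.out.println(expand(SAMPLE, ORIGINAL)); // tail: <//media:keywords>>
        System.out.println(expand(SAMPLE, AMENDED));  // tail: </media:thumbnail>
    }
}
```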

Fotiman




msg:3773118
 8:45 pm on Oct 24, 2008 (gmt 0)

Just gave it a try and that seemed to be it. :)
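One caveat that did not come up in this thread: the amended pattern still lets group 1 contain ">", so an input like <a>text<b/></a> can match starting at <a>. A possible hardening, offered only as a sketch, is to exclude ">" from the name group and require at least one character. The class name below is hypothetical, and a regex over serialized XML will still mishandle comments, CDATA sections, and attribute values that contain ">".

```java
public class StricterSelfClosingRegex {
    // Group 1: the tag name, with no whitespace, '/' or '>' and at least
    // one character. Group 2: the attributes, up to the closing "/>".
    public static final String STRICT = "<([^\\s/>]+)([^>]*)/>";

    // Rewrites <name attrs/> as <name attrs></name>.
    public static String expand(String xml) {
        return xml.replaceAll(STRICT, "<$1$2></$1>");
    }

    public static void main(String[] args) {
        System.out.println(expand("<a>text<b/></a>"));
        System.out.println(expand("<media:thumbnail width=\"133\"/>"));
    }
}
```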

httpwebwitch




msg:3773226
 2:07 am on Oct 25, 2008 (gmt 0)

excellent!


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved