

Transforming DOM to XML output

   
4:40 pm on Oct 15, 2008 (gmt 0)

fotiman (WebmasterWorld Senior Member)



I'm working on a Java application which transforms a DOMSource object (javax.xml.transform.dom.DOMSource) to a StreamResult object (javax.xml.transform.stream.StreamResult) which is then written to a file. This seems to output empty elements using the short notation:

<myelement />

Unfortunately, a 3rd party that I'm working with needs the XML to use the long format with a closing tag:

<myelement></myelement>

This question may be somewhat Java specific, but does anyone know of a way to configure the transformer to output the long format?
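
For anyone who wants to reproduce the behaviour, here is a minimal sketch of the setup described above, assuming the default JAXP implementation that ships with the JDK. The class name `EmptyElementDemo`, the `root` wrapper element, and the `StringWriter` (standing in for the file) are illustrative choices, not taken from the original application.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class EmptyElementDemo {
    // Builds a DOM with one childless element and serializes it with the
    // default identity transformer (DOMSource -> StreamResult).
    static String serialize() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element root = doc.createElement("root"); // hypothetical wrapper
            root.appendChild(doc.createElement("myelement")); // no children
            doc.appendChild(root);

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            StringWriter out = new StringWriter(); // stands in for the file
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    public static void main(String[] args) {
        // The childless element usually comes out in the short form; the
        // exact spelling (with or without a space) depends on the serializer.
        System.out.println(serialize());
    }
}
```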

9:00 pm on Oct 15, 2008 (gmt 0)

httpwebwitch (WebmasterWorld Administrator)



The closed format is the "proper" format for a node with no children. But I know what you mean - some HTML tags (like <script> and <textarea>) mustn't be closed like that, or Bad Things May Happen.

I have that problem with a certain .NET XSLT parser. It has problems with all empty nodes - and it totally barfs on "<textarea></textarea>". There is no config option available to fix it, so we've had to put spaces in all our nodes to keep them from self-closing... it's a nasty solution that has caused several other problems further up the stack.

<node>&#160;</node>

You might also try:

<node>&NULLENTITY;</node>
where NULLENTITY is declared to be NULL (or, an empty string) in the DTD, like this:
<!ENTITY NULLENTITY "">

Not being a Javaist, I don't know if any of the above will work.

9:14 pm on Oct 15, 2008 (gmt 0)

fotiman (WebmasterWorld Senior Member)



Thanks. In our case, the output is an RSS feed. But the company that ingests it must not be using a standard parser. Thanks for the suggestions. If I can't find a way to force the end tag to be generated, then perhaps an empty string will work.

2:32 pm on Oct 16, 2008 (gmt 0)

httpwebwitch (WebmasterWorld Administrator)



If you have the XML as a string, maybe there's a reliable way to do it with a regex replace()

like,

<([^\s]*)([^>]*)/>
replace with
<$1$2></$1>

6:53 pm on Oct 24, 2008 (gmt 0)

fotiman (WebmasterWorld Senior Member)



Thanks. I'm using Java's String.replaceAll method like this:

s = s.replaceAll("<([^\\s]*)([^>]*)/>", "<$1$2></$1>");

That regex turned this:

<media:thumbnail height="100" url="http://example.com/a.jpg" width="133"/>

into this:

<media:thumbnail height="100" url="http://example.com/a.jpg" width="133"><//media:keywords>>

Close, but not quite. Any suggestions?

7:52 pm on Oct 24, 2008 (gmt 0)

httpwebwitch (WebmasterWorld Administrator)



<media:thumbnail height="100" url="http://example.com/a.jpg" width="133"><//media:keywords>>

OK, that's just weird. Where did "/media:keywords>" come from? It's not in the matched string; in that spot, the closing tag at the end of the line, should be "media:thumbnail".

Is something amiss with the Java replaceAll() method?

8:18 pm on Oct 24, 2008 (gmt 0)

fotiman (WebmasterWorld Senior Member)



The element before this one. Here's a more complete XML snippet:

<media:keywords>Example</media:keywords>
<media:thumbnail height="100" url="http://example.com/a.jpg" width="133"><//media:keywords>>

8:23 pm on Oct 24, 2008 (gmt 0)

fotiman (WebmasterWorld Senior Member)



<([^\\s]*)

Matches:
< followed by any run of non-whitespace characters (including / and >)

Perhaps I need this:
<([^\\s/]*)([^>]*)/>

?
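
A small sketch of the diagnosis above: it prints what capture group 1 holds for the first match under each pattern. The class and method names are hypothetical, and the keyword text `foo, bar` is an assumption (the thread's simplified snippet shows `Example`); a value containing whitespace is what reproduces the `<//media:keywords>>` output reported earlier.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCaptureDemo {
    // Returns capture group 1 of the first match, or null if nothing matches.
    static String firstGroup(String regex, String xml) {
        Matcher m = Pattern.compile(regex).matcher(xml);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Assumed input: keyword text with a space in it.
        String xml = "<media:keywords>foo, bar</media:keywords>\n"
                + "<media:thumbnail height=\"100\" "
                + "url=\"http://example.com/a.jpg\" width=\"133\"/>";

        // Original pattern: the match starts at the preceding closing tag.
        System.out.println(firstGroup("<([^\\s]*)([^>]*)/>", xml));
        // prints: /media:keywords>

        // Revised pattern: '/' is excluded from the name, so the closing
        // tag can no longer be captured as group 1.
        System.out.println(firstGroup("<([^\\s/]*)([^>]*)/>", xml));
        // prints: media:thumbnail
    }
}
```

With the original pattern, group 1 is then reused in the replacement's `</$1>`, which is where the stray `/media:keywords>` in the closing tag comes from.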

8:45 pm on Oct 24, 2008 (gmt 0)

fotiman (WebmasterWorld Senior Member)



Just gave it a try and that seemed to be it. :)

2:07 am on Oct 25, 2008 (gmt 0)

httpwebwitch (WebmasterWorld Administrator)



excellent!
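
One caveat for later readers: the revised pattern's first group still accepts `>`, so it can misfire when a self-closing element directly follows an opening tag with no whitespace in between, as in `<item><a/></item>`. The sketch here is a stricter variant, not something from the thread; the class name `ExpandEmptyElements` and the test strings are hypothetical. It assumes attribute values contain no literal `<` or `>`, and it would still rewrite a `/>` that appears inside a comment or CDATA section, so treat it as a stopgap rather than a real XML serializer setting.

```java
import java.util.regex.Pattern;

class ExpandEmptyElements {
    // Group 1 (name): one or more chars that are not whitespace, '/', '<', '>'.
    // Group 2 (attributes): lazily matched text containing no '<' or '>'.
    // Optional whitespace before "/>" is dropped from the output.
    private static final Pattern SELF_CLOSING =
            Pattern.compile("<([^\\s/<>]+)([^<>]*?)\\s*/>");

    static String expand(String xml) {
        return SELF_CLOSING.matcher(xml).replaceAll("<$1$2></$1>");
    }

    public static void main(String[] args) {
        String tight = "<item><a/></item>";

        // The thread's revised pattern lets group 1 swallow "item><a".
        System.out.println(tight.replaceAll("<([^\\s/]*)([^>]*)/>", "<$1$2></$1>"));
        // prints: <item><a></item><a></item>

        // The stricter pattern keeps the element name inside one tag.
        System.out.println(expand(tight));
        // prints: <item><a></a></item>

        System.out.println(expand("<myelement />"));
        // prints: <myelement></myelement>
    }
}
```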