XML/XSL

Converting XML files to another XML file

         

TDarksword

12:59 pm on Mar 28, 2004 (gmt 0)

10+ Year Member



Hmm, as there doesn't appear to be an XML/XSL forum, I guess this will go here.

I have an XML file that is an export from a central server (in other words, I can't get the way it is dumped changed :-) ). Unfortunately, it is not quite in the form I'd like. I've heard I can transform one XML file into another using XSL, but I'm not sure how to do it (ideally it needs to be done server-side as well).

The current XML layout is as follows:

serverdump.xml

<?xml version="1.0" encoding="UTF-8"?>
<all_server_dump>
  <clusters>
    <cluster name="X4452L" cluster_id="1">
      <servers>
        <server name="H2212A" server_id="1">
          <users>
            <user user_id="77397610">
              <name>Tony Harper</name>
              <grade>Administrative Assistant</grade>
              <grade_type_id>1</grade_type_id>
              <legacy>reviews</legacy>
              <email>outlook</email>
              <office>msoffice</office>
              <update_timestamp>199093</update_timestamp>
            </user>
            <user user_id="88495343">
              <name>Nicola Smith</name>
              etc

Each server cluster (there are 20 of them) has 4 servers within it. Users can exist on more than one server within the cluster.

What I need to do is divide the main dump into one dump per server cluster, and make each cluster dump look like this:

serverdump_X4452L.xml

<?xml version="1.0" encoding="UTF-8"?>
<users>
  <user user_id="77397610">
    <name>Tony Harper</name>
    <grade>Administrative Assistant</grade>
    <grade_type_id>1</grade_type_id>
    <legacy>reviews</legacy>
    <email>outlook</email>
    <office>msoffice</office>
    <update_timestamp>199093</update_timestamp>
    <servers_on>"H2212A,H2212C,H2212D"</servers_on>
  </user>
  <user user_id="88495343">
    <name>Nicola Smith</name>
    etc

As you can see, as well as splitting the server clusters up, I need to rearrange the data slightly: sort it primarily by user instead of by server, and add an element to each user that contains a concatenated list of the servers the user is on.

Mike_Levin

5:57 pm on Mar 28, 2004 (gmt 0)

Hi TDarksword,

It looks like you're trying to output for a different file for each server, is that correct? If so, this is very doable, but you're running up against one of the incompatibilities between XSL parsers.

What server-side programming technology do you plan to use? Is it Windows or Linux? Theoretically it shouldn't matter, because XSL parsing is XSL parsing, but Microsoft's MSXML object doesn't support the xsl:document tag. I usually use an oldie but goodie called XT [blnz.com] for this, but you will need to be able to run Java.

Let me know if you want to take the next step. If you've never done this before, it can be a strange trip. XSL template matching is somewhat mind-blowing at first (probably the reason it's not more widely talked about).

TDarksword

7:34 am on Apr 1, 2004 (gmt 0)

Sorry it's taken so long to get back to you; I've been trying to get hold of the webhost to find out which server it's on (pretty sure it's Linux), but it appears he's on holiday.

"It looks like you're trying to output for a different file for each server, is that correct? If so, this is very doable, but you're running up against one of the incompatibilities between XSL parsers."

Yes, I am trying to output a different file for each server cluster.

davidpbrown

8:40 am on Apr 1, 2004 (gmt 0)

Hi TDarksword,

Welcome to WebmasterWorld.

My limited knowledge of XSLT suggests using XSLT 2.0, which has xsl:result-document for generating multiple output documents, even from many inputs. You'd need the Saxon 7.0 processor, which I expect is available on a variety of platforms.

If you don't get a complete answer here, you could ask the experts at Mulberry Technologies XSL-List [mulberrytech.com].

Mike_Levin

1:31 pm on Apr 1, 2004 (gmt 0)

You don't have to do XSLT processing on the server in real time. You can do it directly on your desktop, or some other computer you set up with a scheduled task. Just grab the XML from the central server, do the transform, and send the resulting files back to where you need them to go (permissions allowing).

TDarksword

2:07 pm on Apr 1, 2004 (gmt 0)

The reasons I'd like to do it server-side are:
1. The server export is updated hourly.
2. I won't be the only one using the resulting files. If I'm off for a few weeks on holiday and the transform runs on my computer, the files won't be updated on the website for those 2 weeks :), whereas if it's done server-side they will be (I have a page script that checks whether there's an update and, if there is, downloads it to the server and runs the script the XSLT will go in).

(Note: the XML example isn't the full XML file. I can't post that for security reasons, and it includes some time-sensitive information.)

davidpbrown

6:40 pm on Apr 1, 2004 (gmt 0)

TDarksword,

I misread your first post and took it that you had multiple input files. Seeing there's only one, I thought I'd have a go.

Bear in mind this is at the limit of my XSLT experience, but it works for me on a file similar to the one you describe. I don't know how well optimised it is, and there is probably something similar available in XSLT 1.0, but I've jumped into XSLT 2.0 and know nothing of the workarounds you'd need.

Note: it is XSLT 2.0, and you will need the Saxon processor from [saxon.sourceforge.net ] (currently the only processor for XSLT 2.0). I don't know which platforms it works on, but expect you'll find one to suit.

Also: you need to replace PATHTODIRECTORY with the path to the output directory, e.g. 'C:/Output/'.

XSL follows..
Hope it helps.

<?xml version="1.0"?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output method="xml" indent='yes' encoding="utf-8"/>

  <xsl:template match="cluster">
    <xsl:variable name="filename" select="concat('file:///','PATHTODIRECTORY','serverdump_',@name,'.html')"/>
    <xsl:result-document href="{$filename}">
      <users>
        <xsl:for-each select=".//user[not(@user_id=preceding::*/@user_id)]">
          <xsl:sort select="."/>
          <user>
            <xsl:attribute name="user_id" select="./@user_id"/>
            <xsl:copy-of select="name"/>
            <xsl:copy-of select="grade"/>
            <xsl:copy-of select="grade_type_id"/>
            <xsl:copy-of select="legacy"/>
            <xsl:copy-of select="email"/>
            <xsl:copy-of select="office"/>
            <xsl:copy-of select="update_timestamp"/>
            <servers_on>
              <xsl:variable name="user" select="./@user_id" />
              <xsl:for-each select="../../..//user[@user_id=$user]">
                <xsl:value-of select="../../@name"/>
                <xsl:if test="position()!=last()">, </xsl:if>
              </xsl:for-each>
            </servers_on>
          </user>
        </xsl:for-each>
      </users>
    </xsl:result-document>
  </xsl:template>

</xsl:transform>

--added--
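I've just added <xsl:sort select="."/>, which sorts on the string value of each user element; in practice that means by name. To sort by user_id ascending instead, use <xsl:sort select="@user_id" data-type="number"/>.
Also, one flaw I'm aware of is that if a user exists in more than one cluster, they don't show in any but the first, because preceding:: looks back through the whole document rather than just the current cluster.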
(I did paste the code in with indents but they've been stripped off.)
--/added--
Another flaw, obviously: there's no conflict management. If the data in a user's records were to differ, the output would give the first occurrence of it.
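For what it's worth, the cross-cluster flaw can probably be avoided with XSLT 2.0's xsl:for-each-group, which groups only the user elements of the current cluster by @user_id instead of testing against preceding::, and it also makes the server list a single expression. This is a sketch written against the final XSLT 2.0 recommendation (the Saxon 7 builds discussed here implemented earlier drafts, so details may differ). The outdir parameter and its default value are assumptions standing in for the PATHTODIRECTORY placeholder, and xsl:copy-of select="*" copies every child element of the user rather than the explicit list above.

```xml
<?xml version="1.0"?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output method="xml" indent="yes" encoding="utf-8"/>

  <!-- Hypothetical parameter: output directory as a URI ending in '/'.
       The default is only an example; Saxon takes name=value pairs after
       the stylesheet, e.g.
       java -jar saxon7.jar serverdump.xml transform.xsl outdir=file:///C:/Output/ -->
  <xsl:param name="outdir" select="'file:///C:/Output/'"/>

  <xsl:template match="cluster">
    <xsl:result-document href="{$outdir}serverdump_{@name}.xml">
      <users>
        <!-- Group the user elements of this cluster only, so a user who
             also appears in another cluster is still listed here -->
        <xsl:for-each-group select=".//user" group-by="@user_id">
          <xsl:sort select="@user_id" data-type="number"/>
          <!-- Inside the group, "." is the first occurrence of the user,
               so its details win, just as in the original stylesheet -->
          <user user_id="{@user_id}">
            <xsl:copy-of select="*"/>
            <servers_on>
              <!-- Every server in this cluster holding the user, comma-separated -->
              <xsl:value-of select="string-join(current-group()/ancestor::server/@name, ', ')"/>
            </servers_on>
          </user>
        </xsl:for-each-group>
      </users>
    </xsl:result-document>
  </xsl:template>
</xsl:transform>
```

Conflicting records are still not reconciled: each user's details come from the first occurrence in the cluster.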

TDarksword

6:45 am on Apr 2, 2004 (gmt 0)

Thank you, that looks nearly exactly what I want :)

A couple of quick questions though...
1. In what directory does the Saxon processor need to be located? (The root of the website, I assume.)

2. The output appears to be an HTML file. What would I need to do if I wanted the output as an XML file? (Ideally I'd like XML output that I can process further: I'd like users of the website to be able to sort the info however they like, and I can also run simpler transforms on it, for example to display users who have been updated within the last day.)

davidpbrown

7:12 am on Apr 2, 2004 (gmt 0)

1.
I've not installed it on a server, but expect it's similar to a PC.

Install Saxon in the directory of your choice and, as long as java can be found on your PATH, you can call it from the command line as
--------
java -jar C:\saxon\saxon7.jar input.xml transform.xsl >output.html
where C:\saxon\saxon7.jar is your location for saxon7.jar
--------
I don't know how you'd call it from a script.

2.
The output is XML. My mistake: the file is named .html; if you change that to .xml, the name will match what the file actually is.
ie. 'serverdump_',@name,'.html')"/>
should be 'serverdump_',@name,'.xml')"/>

If you then want HTML, you can run a second transform that contains an HTML template and uses xsl:value-of, e.g.
<?xml version="1.0"?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output method="html" indent='yes' encoding="utf-8"/>
  <xsl:template match="/">
    <html><body>
      <table>
        <xsl:for-each select="//name">
          <tr><td><xsl:value-of select="."/></td></tr>
        </xsl:for-each>
      </table>
    </body></html>
  </xsl:template>
</xsl:transform>
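Since the aim is to let website users sort the per-cluster files however they like, one option is to pass the sort field in as a stylesheet parameter. A minimal sketch, assuming a hypothetical parameter named sortby that holds the name of a child element of user (name, grade, and so on), and assuming the input is one of the per-cluster files such as serverdump_X4452L.xml, with users as its root:

```xml
<?xml version="1.0"?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output method="html" indent="yes" encoding="utf-8"/>

  <!-- Hypothetical parameter: name of the child element to sort on,
       e.g. sortby=grade after the stylesheet on the Saxon command line -->
  <xsl:param name="sortby" select="'name'"/>

  <xsl:template match="/">
    <html><body>
      <table>
        <tr><th>Name</th><th>Grade</th><th>Servers</th></tr>
        <!-- Sort each user by the child whose element name matches $sortby -->
        <xsl:for-each select="users/user">
          <xsl:sort select="*[local-name() = $sortby]"/>
          <tr>
            <td><xsl:value-of select="name"/></td>
            <td><xsl:value-of select="grade"/></td>
            <td><xsl:value-of select="servers_on"/></td>
          </tr>
        </xsl:for-each>
      </table>
    </body></html>
  </xsl:template>
</xsl:transform>
```

The sort is alphabetical; numeric fields such as grade_type_id or update_timestamp would also need data-type="number" on the xsl:sort.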

davidpbrown

7:19 am on Apr 2, 2004 (gmt 0)

One side effect is that calling Saxon like that generates an output.html which is empty.

On my to-do list is finding how to call Saxon without generating this redundant file.

TDarksword

9:19 am on Apr 2, 2004 (gmt 0)

Many thanks again:)

TDarksword

12:26 pm on Apr 7, 2004 (gmt 0)

Hmm, I can't get Saxon to run properly, and as far as I can tell the Saxon documentation is useless if you're not used to Java :) (i.e. how do I install it from scratch, assuming I have no Java or Saxon on my system at all? I'm pretty sure I'm missing some class libraries but have no idea where to get them or where to put them.)

When I do: run java -jar saxon7.jar input.xml transform.xsl >output.html

I get the following errors:


Set uncaught java.lang.Throwable
Set deferred uncaught java.lang.Throwable
>
VM Started:
Exception occurred: java.lang.ClassNotFoundException (uncaught)"thread=main", java.net.URLClassLoader$1(), line=199 bci=72

Mike_Levin

1:53 pm on Apr 7, 2004 (gmt 0)

I find that the best thing to do is to live with the redundant file. You can make it a log file. The idea is that it is the normal output file that would occur if the xsl:result-document were not being used.
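Following that idea, the principal output could be turned into a small log of the files written. A sketch, assuming the same PATHTODIRECTORY placeholder as the stylesheet above; the log and written element names are made up for illustration, and the cluster file body is simplified to a plain copy of the user elements:

```xml
<?xml version="1.0"?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output method="xml" indent="yes" encoding="utf-8"/>

  <!-- The principal result (what ends up in output.html via ">") becomes
       a log listing each cluster file that was written -->
  <xsl:template match="/">
    <log>
      <xsl:apply-templates select="//cluster"/>
    </log>
  </xsl:template>

  <xsl:template match="cluster">
    <xsl:variable name="filename"
        select="concat('file:///', 'PATHTODIRECTORY', 'serverdump_', @name, '.xml')"/>
    <xsl:result-document href="{$filename}">
      <users>
        <!-- Simplified body; the per-user logic would go here -->
        <xsl:copy-of select=".//user"/>
      </users>
    </xsl:result-document>
    <!-- This element goes to the log, not to the cluster file -->
    <written file="{$filename}" users="{count(distinct-values(.//user/@user_id))}"/>
  </xsl:template>
</xsl:transform>
```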

davidpbrown

2:09 pm on Apr 7, 2004 (gmt 0)

edit: Mike_Levin got there first and his answer may be more relevant.
====
Saxon needs JDK 1.4 (the Java Development Kit).

It's been a while since I've installed Java, but go to [java.sun.com ]; under Download Bundles, click on Separate Bundles = J2SE 1.4.2, accept the terms and conditions, and you'll be offered different flavours.

Once it's installed, you'll need to set your PATH to include the path to Java, which, if memory serves, means adding x/y/z/java/bin/, but it should be clear from the docs. If not, try searching Google for something like 'setting path to java 1.4.2 Linux'.

HTH

TDarksword

4:01 pm on Apr 7, 2004 (gmt 0)

OK, possibly some confusion there :)

I don't care about the output.html (I'm not getting that far, and could live with it if I was :) ).

I have JDK 1.4.2 SDK (Win XP) installed at H:\j2sdk1.4.2_04\, with java.exe and jav.exe in H:\j2sdk1.4.2_04\bin.

When I run jdb.exe and enter: run java -jav path\saxon7.jar path\serverdump.xml path\transform.xsl >output.html
I get the errors detailed above.

I think it's tied up with Java not being able to find the classes stored in the subdirectories of saxon7-9-1\source\, but I don't know where they should be or how to point Java at them.

davidpbrown

6:08 pm on Apr 7, 2004 (gmt 0)

Could it be that you have typed -jav, not -jar?

java -jav path\saxon7.jar path\serverdump.xml path\transform.xsl >output.html

should be
java -jar path\saxon7.jar path\serverdump.xml path\transform.xsl >output.html

-----
added
Also, have you set your PATH?
e.g.
PATH = "C:\WINDOWS;C:\WINDOWS\COMMAND;C:\j2sdk1.4.2_02\bin;C:\XML-MSV"

TDarksword

6:42 pm on Apr 7, 2004 (gmt 0)

Hmm, it was tied up somewhere with the CLI I was using. I used a different CLI and it worked fine :) (until it hit a space in one of the server names :))
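If the failure comes from the space ending up in the output file URI (the cluster name is part of href), one workaround is to replace spaces before building the file name. A minimal sketch, assuming that is the cause; the safe-name variable and the example name in the comment are hypothetical, and the body is simplified to a plain copy of the users:

```xml
<?xml version="1.0"?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output method="xml" indent="yes" encoding="utf-8"/>

  <xsl:template match="cluster">
    <!-- translate() turns every space into an underscore, so a hypothetical
         name such as "X4452 L" gives serverdump_X4452_L.xml -->
    <xsl:variable name="safe-name" select="translate(@name, ' ', '_')"/>
    <xsl:variable name="filename"
        select="concat('file:///', 'PATHTODIRECTORY', 'serverdump_', $safe-name, '.xml')"/>
    <xsl:result-document href="{$filename}">
      <users>
        <!-- Simplified body; only the file naming differs from the original -->
        <xsl:copy-of select=".//user"/>
      </users>
    </xsl:result-document>
  </xsl:template>
</xsl:transform>
```

Spaces inside element content such as servers_on are not a problem; only the href value has to be a valid URI.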

davidpbrown

6:55 pm on Apr 7, 2004 (gmt 0)

Glad to hear it.. I was out of ideas.
:)

TDarksword

2:17 pm on Apr 8, 2004 (gmt 0)

Heh, another little thing I've come across...


<servers_on>
<xsl:variable name="user" select="./@user_id" />
<xsl:for-each select="../../..//user[@user_id=$user]">
<xsl:value-of select="../../@name"/>
<xsl:if test="position()!=last()">, </xsl:if>
</xsl:for-each>

This leaves me with the server names squashed together (e.g. h2332hh3332hh2223h). How do I get the string to be at least space-delimited (h2332h h3332h) or comma/space-delimited (h2332h, h3332h)?
I'm not quite sure how this bit works, so I can't do it myself :)

davidpbrown

3:13 pm on Apr 8, 2004 (gmt 0)

That's what the
<xsl:if test="position()!=last()">, </xsl:if>
does.
I can't suggest why it isn't working for you now; it did for me.
--------
added..

I suppose you could try explicitly marking the output there as text:

<xsl:if test="position()!=last()">
<xsl:text>, </xsl:text>
</xsl:if>

TDarksword

5:59 pm on Apr 8, 2004 (gmt 0)

Heh, it works, but it leaves the servers as one word (e.g. server1server2server3), which is a bit hard to read when I output it. Ideally I need some spaces in there somewhere :)
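One way to sidestep the position()/last() test entirely, assuming the stylesheet runs as XSLT 2.0 (version="2.0", as above), is to build the list in a single expression with string-join(), which puts the separator between items itself. The thread doesn't establish why the separator is lost, so this is a workaround rather than a diagnosis. A sketch of a drop-in replacement for the servers_on block, keeping the same $user variable and paths:

```xml
<servers_on>
  <xsl:variable name="user" select="@user_id"/>
  <!-- string-join() puts ", " between the server names itself, so no
       position()/last() test is needed -->
  <xsl:value-of select="string-join(../../..//user[@user_id = $user]/../../@name, ', ')"/>
</servers_on>
```

XSLT 2.0's xsl:value-of also has a separator attribute, so <xsl:value-of select="../../..//user[@user_id = $user]/../../@name" separator=", "/> should give the same result.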