Remove duplicates in XSLT 1.0

   
9:05 am on Aug 23, 2011 (gmt 0)




Hi,
I am stuck with a problem removing duplicate nodes.
My XML looks like:

...
<stat>
  <overview>

    <tools>
      <item resp="abc"><link id="...">Item 1</link></item>
      <item resp="abc"><link id="...">Item 2</link></item>
      <item resp="abc"><sub><link id="...">Item 3</link></sub></item>
    </tools>

    <tools>
      <item resp="abc"><link id="...">Item 4</link></item>
      <item resp="abc"><sub><link id="...">Item 1</link></sub></item>
    </tools>

    <tools>
      <item resp="abc"><link id="...">Item 1</link></item>
      <item resp="abc"><link id="...">Item 5</link></item>
    </tools>

  </overview>
</stat>


Now I want to output each unique "Item" with @resp=abc exactly once.
Printing all of them is no problem:

<ul>
  <xsl:for-each select="//overview/tools//item[contains(@resp,$role)]">
    <xsl:apply-templates select="."/>
  </xsl:for-each>
</ul>

That gives me:
Item 1
Item 2
Item 3
Item 4
Item 1
Item 1
Item 5

But my desired outcome, removing duplicates, is:
Item 1
Item 2
Item 3
Item 4
Item 5

I tried to compare each item's name with all preceding ones and skip those that already exist:

<xsl:for-each select="//overview/tools//item[contains(@resp,$role) and not(text() = preceding::item[contains(@resp,$role)]/text())]">
  <xsl:apply-templates select="."/>
</xsl:for-each>

but the output is the same, duplicates included.

Any help is much appreciated!
Regards /Claes
1:21 pm on Aug 23, 2011 (gmt 0)

httpwebwitch (WebmasterWorld Administrator)



Hi Claes100,
I suspect that XSLT is capable of this, but it won't be easy and I don't know the answer. My take on XSLT is "GIGO" - Garbage In, Garbage Out... When I'm transforming XML I do expect that the XML is going to be formatted and sorted and deduped and packaged up appropriately. The tools available in XSLT are not ideal for doing this kind of work.

Approaching the same problem, I'd be inclined to change the XML input rather than knit together complex XSLT templates to do data-manipulation jobs that XSLT isn't suited for. My language of preference is PHP, and I'm pretty sure I could dedupe the XML in about 20 lines of code, with 30 minutes of work, and the result would execute faster than the same thing done with XSLT.

Do you have control over the source of the XML?

If XSLT is your only option, there may be a way.

Here's a snippet I found on the interwebz. It checks each node to see whether any of its preceding siblings has the same value:


<xsl:template match="@*|node()">
  <xsl:if test="not(node()) or not(preceding-sibling::node()[.=string(current())])">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:if>
</xsl:template>
source [jguru.com]
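
A hedged note on why the attempt in the first post keeps its duplicates: in the sample XML, `item` has no text-node children of its own (the label sits inside `link`, sometimes under `sub`), so `text()` selects nothing and the `not(...)` test is true for every item. The snippet above looks only at `preceding-sibling` nodes, so it would probably not catch duplicates that live in different `tools` blocks either. A minimal sketch that compares the whole string value of each item against every earlier item, assuming `$role` is a stylesheet parameter (defaulted to 'abc' here) and that one plain `<li>` per label is acceptable output (the thread never shows the real `item` template):

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="yes"/>

  <!-- Assumption: the role arrives as a parameter; 'abc' is only a default. -->
  <xsl:param name="role" select="'abc'"/>

  <xsl:template match="/">
    <ul>
      <xsl:for-each select="//overview/tools//item[contains(@resp, $role)]">
        <!-- String value of the whole item, so labels nested in link or
             sub/link are compared the same way. -->
        <xsl:variable name="label" select="normalize-space(.)"/>
        <!-- Emit the item only if no earlier matching item has the same label. -->
        <xsl:if test="not(preceding::item[contains(@resp, $role)]
                          [normalize-space(.) = $label])">
          <li><xsl:value-of select="$label"/></li>
        </xsl:if>
      </xsl:for-each>
    </ul>
  </xsl:template>
</xsl:stylesheet>
```

For the sample in the first post this should list Item 1 through Item 5 once each. The `preceding::` scan is quadratic in the number of items, which is fine for short lists.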
1:43 pm on Aug 23, 2011 (gmt 0)




Thanks!
No, unfortunately I don't have control over the XML source...
I'll give it a try tomorrow, and yes, it would have been easier (and more fun) to solve it in PHP... :)
/Claes
8:39 pm on Aug 30, 2011 (gmt 0)




Just wanted to post a short comment: I didn't need to test the snippet above after all. The structure of the XML has been changed, so duplicates will no longer exist.
Thanks anyways!
/Claes
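
For anyone landing on this thread with the same problem on a larger document: XSLT 1.0 has a common idiom for removing duplicates, usually called Muenchian grouping, built on `xsl:key`. Below is a sketch for the XML in the first post, under two assumptions: the role is hard-coded as 'abc', because XSLT 1.0 does not allow variable references in the `match` or `use` attributes of `xsl:key`, and the key name `item-by-label` is made up for this example.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="yes"/>

  <!-- Index every matching item by its normalized string value.
       'item-by-label' is a hypothetical name; 'abc' is hard-coded because
       xsl:key cannot reference variables in XSLT 1.0. -->
  <xsl:key name="item-by-label"
           match="overview/tools//item[contains(@resp, 'abc')]"
           use="normalize-space(.)"/>

  <xsl:template match="/">
    <ul>
      <!-- Keep an item only if it is the first node the key returns
           for its own label (key() returns nodes in document order). -->
      <xsl:for-each select="//overview/tools//item[contains(@resp, 'abc')]
                            [generate-id() =
                             generate-id(key('item-by-label',
                                             normalize-space(.))[1])]">
        <li><xsl:value-of select="normalize-space(.)"/></li>
      </xsl:for-each>
    </ul>
  </xsl:template>
</xsl:stylesheet>
```

The key is built once per document, so each lookup avoids rescanning all earlier items the way a `preceding::` comparison does.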