XML Development Forum

    
xslt and css
vero

5+ Year Member



 
Msg#: 3798217 posted 1:50 pm on Dec 2, 2008 (gmt 0)

I'm very new to this and have a large number of files that were converted from HTML to XML. Portions of the text within a content tag are supposed to be bold when displayed. This is how they look now, basically nested (I know, I know, bad, but I didn't realize this when I did it).

<content>
<paragraph>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Praesent justo. <strong>Quisque et velit</strong> - Lorem ipsum dolor sit amet.
</paragraph>
</content>

In my xslt, I now display the content of the paragraph with

<p>
<xsl:value-of select="content/paragraph"/>
</p>

And I use a CSS stylesheet to make the font Arial.

Now I want to make what's between the strong tags bold/strong. I looked at the excellent haiku example in the library and tried adding "for-each", but that just listed everything between the strong tags separately, in addition to the paragraph, not within the paragraph.

Is there a way to get this effect as it's written now? Like some kind of if-else thing? Or do I have to go and re-do all the XML files, adding non-strong tags to all that isn't supposed to be strong? (AArrrgh!)

Thanks!

 

httpwebwitch

WebmasterWorld Administrator httpwebwitch is a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 3798217 posted 4:40 pm on Dec 2, 2008 (gmt 0)

vero, you're correct to wallow in self-loathing. marked up HTML can just be put intact into a CDATA section but I'll spare you the lecture ;)

Can you do a search-and-replace through all the XML files, and enclose all your <content> with <![CDATA[ ... ]]> ? and change your <paragraph>'s to <p> ? I bet a regular expression could take care of that pretty easily. Just make sure you make a backup of the original files before you start doing global string replacements!

barring that I'll need to do a little experimentation before I can answer your question

what happens if you add "disable-output-escaping" to the <xsl:value-of> command? (must check the manual for exact syntax...)

vero




 
Msg#: 3798217 posted 5:08 pm on Dec 2, 2008 (gmt 0)

Thanks so much for the suggestions.
I tried in the xslt:
<xsl:value-of select="content/paragraph" disable-output-escaping="yes"/>
But it still was non-bold.

Then I added CDATA around the strong tags

<paragraph>blah blah <![CDATA[<strong>]]>blah blah<![CDATA[</strong>]]> blah blah</paragraph>

and it did display correctly

But... I assume this means that an xml parser wouldn't know that a term was supposed to be "different" from the other text - is that right?

So just to be completely correct, I think I may just go wallow in self-loathing for a day or so, then re-do the files the right way. But it's good to have an alternative - so thank you!

httpwebwitch




 
Msg#: 3798217 posted 6:13 pm on Dec 2, 2008 (gmt 0)

actually, you ought to add the CDATA around the entire HTML fragment.

<content>
<![CDATA[
<p>lorem ipsum <b>bold</b></p>
<p>lorem ipsum <b>bold</b></p>
]]>
</content>

that way the entire HTML content is non-parsed, and the XML/XSLT transformation won't barf even if your HTML isn't perfectly formed! Not that I would accuse you of creating imperfect HTML... but...

When you have HTML inside XML (as in your situation), and it gets parsed (which is a precondition of transforming it with XSLT), the real DOM tree gets butchered. Consider this example:

<content>
<paragraph>Lorem ipsum <strong>bold text</strong> dolor sinc</paragraph>
</content>

When parsed, the DOM looks like this:

<content>
<paragraph>
[text node: "Lorem ipsum "]
<strong>bold text</strong>
[text node: " dolor sinc"]
</paragraph>
</content>

When you parse the <content> node of that example, <paragraph> actually has 3 children, not one. If you do complex HTML markup with tables, ordered lists, etc the result can be nearly unrecognizable. The HTML gets chopped to bits, making any XPATH and XSL terrifically awkward.
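To see the split for yourself, here's a minimal sketch (my own illustration, assuming <content> is the document element exactly as in the example above) that counts the children of <paragraph> and then prints its string value:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Child nodes of <paragraph>: a text node, <strong>, another text node -->
    <xsl:value-of select="count(content/paragraph/node())"/>
    <xsl:text>&#10;</xsl:text>
    <!-- String value: every descendant text node concatenated, markup dropped -->
    <xsl:value-of select="content/paragraph"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Run against the example it should report 3 children. The second line shows why a plain xsl:value-of loses the bold: it only ever returns the concatenated text of the node, so the <strong> markup is gone before the browser sees it.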

going forth, you can do a search-and-replace replacing
"<content>" with "<content><![CDATA["
"</content>" with "]]></content>"
"<paragraph>" with "<p>"
"</paragraph>" with "</p>"

<strong> can be left as-is, because it's the semantic equivalent of <b>, and it's valid HTML
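As a rough illustration, after those four replacements the sample from the first post would end up looking something like this (nothing is changed here beyond the replacements listed above):

```xml
<content><![CDATA[
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Praesent justo. <strong>Quisque et velit</strong> - Lorem ipsum dolor sit amet.
</p>
]]></content>
```

To the XML parser, <content> now holds a single run of text; the <p> and <strong> tags are just characters inside it until the browser gets hold of them.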

Put disable-output-escaping="yes" on the <xsl:value-of> element to prevent your angle brackets from being rendered as &lt; and &gt;, and you'll be done.
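A minimal sketch of what that could look like in the stylesheet, assuming the <content> elements now hold CDATA-wrapped HTML. The html/body wrapper and the //content path are just for illustration; adjust them to your real document structure.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <xsl:template match="/">
    <html>
      <body>
        <!-- Emit each CDATA-wrapped fragment as-is, without escaping < and > -->
        <xsl:for-each select="//content">
          <xsl:value-of select="." disable-output-escaping="yes"/>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

One caveat: disable-output-escaping is an optional feature in XSLT 1.0, so a processor is allowed to ignore it. As far as I know, Firefox's built-in processor does, so test in whatever environment actually runs your transformation.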

I use an enhanced text editor for searching and replacing in files; most good text editors have some such feature

There might indeed be an elegant way to avoid all this with XSLT, but I'm taking this "rewrite the XML" possibility to its conclusion in case you want to motor ahead with it - the benefit being you'll end up with better XML source files.
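For what it's worth, one way that XSLT-only route could go is to swap xsl:value-of for xsl:apply-templates and give <strong> its own template. This is only a sketch, assuming your files look like the <content>/<paragraph>/<strong> sample from the first post; the html/body wrapper is my own addition.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <xsl:template match="/">
    <html>
      <body>
        <xsl:apply-templates select="//content/paragraph"/>
      </body>
    </html>
  </xsl:template>

  <!-- Each <paragraph> becomes a <p>. apply-templates walks its child nodes
       in document order instead of flattening them into one string. -->
  <xsl:template match="paragraph">
    <p>
      <xsl:apply-templates/>
    </p>
  </xsl:template>

  <!-- A nested <strong> is re-emitted as an HTML <strong>. -->
  <xsl:template match="strong">
    <strong>
      <xsl:apply-templates/>
    </strong>
  </xsl:template>

</xsl:stylesheet>
```

Plain text nodes need no template of their own, because the built-in template rule copies their text to the output. With this approach the nested markup works in your favour and the XML files can stay as they are.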

Cheers

vero




 
Msg#: 3798217 posted 6:43 pm on Dec 2, 2008 (gmt 0)

If I'm going to re-do it, and really wanted it to be correct, would the right way be:

<paragraph>
<sentence stype="notstrong">Lorem ipsum dolor sit amet, consectetur adipiscing elit. Praesent justo. </sentence>
<sentence stype="strong">Quisque et velit.</sentence>
<sentence stype="notstrong">Lorem ipsum dolor sit amet.</sentence>
</paragraph>

and then use xsl:choose ?

httpwebwitch




 
Msg#: 3798217 posted 7:10 pm on Dec 2, 2008 (gmt 0)

I guess it depends if you need to parse the text content, using XPATH or XSLT. Creating new nodes called <paragraph> and <sentence> means they're part of the XML node tree, so you can "match" them with XPATH, iterate through them with <xsl:for-each>, etc.
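For what it's worth, the xsl:choose version you're asking about might look something like this. It's a sketch only, assuming the <sentence stype="..."> markup from your post; the single space emitted after each sentence is my own addition so the sentences don't run together.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <xsl:template match="paragraph">
    <p>
      <xsl:for-each select="sentence">
        <xsl:choose>
          <!-- Sentences flagged as strong get wrapped in <strong> -->
          <xsl:when test="@stype = 'strong'">
            <strong><xsl:value-of select="."/></strong>
          </xsl:when>
          <!-- Everything else is written out as plain text -->
          <xsl:otherwise>
            <xsl:value-of select="."/>
          </xsl:otherwise>
        </xsl:choose>
        <xsl:text> </xsl:text>
      </xsl:for-each>
    </p>
  </xsl:template>

</xsl:stylesheet>
```

It works, but notice that every sentence needs its own element and attribute just to carry one bit of information, which is part of why I wouldn't go this way for display-only text.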

But, as you've found, try displaying the whole <content> node as marked-up HTML and you'll find it more difficult than necessary

Personally I'd just wrap the HTML as CDATA and leave it at that. If I ever needed to grab the bold text out of the HTML, there are other ways to do it using regular expressions and whatnot.

Interesting detail:
If you wrap the HTML as CDATA, your text can include HTML entities like &copy;, and they'll render in the browser as ©. Inside an ordinary XML node, you can't use entities like that unless a DTD declares them; XML itself predefines only &lt;, &gt;, &amp;, &apos; and &quot;.
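A small illustration of the difference. The <examples> wrapper and the "Example Co." text are made up for the sketch:

```xml
<examples>
  <!-- Inside CDATA the parser treats &copy; as plain characters and passes
       them through untouched; the browser resolves the entity later. -->
  <content><![CDATA[<p>&copy; 2008 Example Co.</p>]]></content>

  <!-- Outside CDATA, &copy; would be a parse error unless a DTD declares it.
       A numeric character reference needs no DTD. -->
  <content><p>&#169; 2008 Example Co.</p></content>
</examples>
```

Keep in mind that the CDATA version only renders as the symbol if the text reaches the browser unescaped, which in practice means outputting it with disable-output-escaping. Otherwise the ampersand gets escaped and you'll see the literal entity text on the page.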

vero




 
Msg#: 3798217 posted 8:28 pm on Dec 2, 2008 (gmt 0)

Excellent points - (which is why you are the moderator and I am the lowly junior member)
Thanks again!

httpwebwitch




 
Msg#: 3798217 posted 5:18 am on Dec 3, 2008 (gmt 0)

no problem vero, I'm glad to help

Go over to New To Web Development [webmasterworld.com] and pay it forward
