Home / Forums Index / Code, Content, and Presentation / XML Development
Forum Library, Charter, Moderators: httpwebwitch

XML Development Forum

    
HTML entities within XSL stylesheet UTF-8 - arrgh!
Help me somebody
GordonS

5+ Year Member



 
Msg#: 331 posted 11:37 am on Apr 30, 2006 (gmt 0)

I am attempting to use ASP.NET to transform an XML data document into XHTML with an XSL stylesheet.

The problem is that no matter what I try, HTML entities such as the non-breaking space (&#160;) are displayed properly in the browser, but in the source code they are munged and appear as ┬.

This wouldn't matter except Google also sees these munged source characters, causing our Google listings to look like garbage.

If I look at the source code in Firefox, it's fine. It's only a problem when viewed through Internet Explorer, Opera or Google.

Why is this and how do I solve it?

Here's an example input XSL:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="no" method="xml" doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN" doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" />

<xsl:template name="page">
<html>
<body>
space&#160;here
</body>
</html>
</xsl:template>
</xsl:stylesheet>

And here's the output source, copied from Internet Explorer View Source:

<html><body>
space here
</body></html>

Anybody seen this and what did you do?

 

encyclo

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 331 posted 5:27 pm on Apr 30, 2006 (gmt 0)

I'm not too familiar with XSL, but it looks like the parser is recognizing and interpreting the &#160; as a non-breaking space and printing the literal character rather than keeping the entity reference. However, you have an underlying encoding problem: the document seems to be UTF-16, not UTF-8. The ┬ is a giveaway: if you read a (two-byte) UTF-16 document in a non-UTF-16-aware user agent (Opera, IE, Google, but not Firefox), the ┬ represents the first byte of the two-byte character.

Are you able to define the output charset? If so, it needs to be ISO-8859-1 or (better) UTF-8.

macrost

10+ Year Member



 
Msg#: 331 posted 3:16 pm on May 3, 2006 (gmt 0)

Give this a shot. I know it's not the most elegant way to accomplish this, but it should get you in the right direction.

<xsl:text disable-output-escaping="yes">&amp;nbsp;</xsl:text>

GordonS

5+ Year Member



 
Msg#: 331 posted 7:40 pm on May 4, 2006 (gmt 0)

Thanks, that does actually work.

What is baffling me though is why Google, Opera, etc. seem to be seeing my files as UTF-16 when they are clearly UTF-8 encoded. I'm sending UTF-8 Content-Type headers, I have UTF-8 declared in the XML and XSL templates, and I have saved all my files in UTF-8 format. And yet they still seem to be coming out the other end as UTF-16.

I am at no point using ASP.NET strings - just an XslTransform and XmlDocuments.

I even see the BOM (Byte Order Mark) at the start of my file, as shown below:

´╗┐<?xml version="1.0" encoding="utf-8"?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html><body>
space&#187;here
</body></html>
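
One way to check what those leading characters are is to look at the raw bytes. The sketch below is in Python purely for illustration (the thread itself uses ASP.NET). It encodes a non-breaking space in UTF-8 and UTF-16, then decodes the UTF-8 byte order mark and the UTF-8 non-breaking space through CP850, a single-byte DOS code page. CP850 is an assumption about how the source is being displayed, not something confirmed in this thread. Under that assumption the result matches the ´╗┐ and ┬ quoted above.

```python
import codecs

NBSP = "\u00a0"  # the character that &#160; refers to

# How the same character is stored in each encoding.
utf8_nbsp = NBSP.encode("utf-8")       # bytes C2 A0
utf16_nbsp = NBSP.encode("utf-16-le")  # bytes A0 00

# Assumption: the viewer decodes the bytes as CP850 instead of UTF-8.
bom_seen = codecs.BOM_UTF8.decode("cp850")  # BOM bytes EF BB BF
nbsp_seen = utf8_nbsp.decode("cp850")

print(utf8_nbsp.hex(), utf16_nbsp.hex())
print(bom_seen == "´╗┐")            # BOM shows up as three junk characters
print(nbsp_seen.startswith("┬"))    # first byte of the NBSP shows up as ┬
```

A UTF-16 byte order mark would be FF FE or FE FF rather than EF BB BF, so a visible ´╗┐ is at least consistent with UTF-8 bytes being shown through the wrong code page.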

Totally baffled - any advice much appreciated.

G.

mrMister

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 331 posted 2:23 pm on May 8, 2006 (gmt 0)

Change your xsl:output to this.

<xsl:output encoding="utf-8" indent="no" method="xml" doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN" doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" />
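
The encoding attribute tells the serializer which bytes to write for each character. As a rough illustration outside XSLT, the sketch below uses Python's standard xml.etree.ElementTree serializer to write the same element as UTF-8 and as US-ASCII. This is a stand-in chosen for demonstration, and ASP.NET's XslTransform may behave differently. The `body` element is a hypothetical substitute for the transformed page.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the transformed page body.
body = ET.Element("body")
body.text = "space\u00a0here"

# UTF-8 can represent U+00A0, so it is written as the raw bytes C2 A0.
as_utf8 = ET.tostring(body, encoding="utf-8")

# US-ASCII cannot, so ElementTree falls back to a numeric character reference.
as_ascii = ET.tostring(body, encoding="us-ascii")

print(as_utf8)
print(as_ascii)
```

In this sketch the ASCII output keeps the non-breaking space as &#160; in the source, which is one reason an ASCII output encoding is sometimes used when literal non-ASCII bytes cause trouble downstream.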

ExecWrkn

5+ Year Member



 
Msg#: 331 posted 8:25 pm on May 17, 2006 (gmt 0)

You could also try

<!DOCTYPE names [
<!ENTITY nbsp "&#160;">
<!ENTITY quote "&#34;">
]>

Put this just above the <xsl:stylesheet> element, and then use &nbsp; in the stylesheet.
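
It may help to see what a parser does with such a declaration. The sketch below feeds a similar internal DTD subset to Python's standard xml.etree.ElementTree parser, used here only as a convenient stand-in for the .NET parser. The `names` root element and its text are hypothetical test content.

```python
import xml.etree.ElementTree as ET

# Internal DTD subset like the one above, wrapped around a hypothetical element.
doc = """<!DOCTYPE names [
<!ENTITY nbsp "&#160;">
]>
<names>space&nbsp;here</names>"""

root = ET.fromstring(doc)

# The parser expands &nbsp; to the same character that &#160; produces.
print(root.text == "space\u00a0here")
```

In this parser, at least, &nbsp; and &#160; end up as the same character after parsing. The declaration makes the stylesheet easier to read, but what appears in the output source still depends on the output encoding.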

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
Home ¦ Free Tools ¦ Terms of Service ¦ Privacy Policy ¦ Report Problem ¦ About ¦ Library ¦ Newsletter
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved