HTML entities within XSL stylesheet UTF-8 - arrgh!

Help me somebody

         

GordonS

11:37 am on Apr 30, 2006 (gmt 0)

10+ Year Member



I am attempting to use ASP.NET to transform an XSL stylesheet and XML data document into XHTML.

The problem is that no matter what I try, HTML entities such as the non-breaking space (&nbsp;, written as &#160; in the stylesheet) are displayed properly in the browser, but in the source code they are munged and appear as Â.

This wouldn't matter except Google also sees these munged source characters, causing our Google listings to look like garbage.

If I look at the source code in Firefox, it's fine. It's only a problem when viewed through Internet Explorer, Opera or Google.

Why is this and how do I solve it?

Here's an example input XSL:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="no" method="xml" doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN" doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" />

<xsl:template name="page">
<html>
<body>
space&#160;here
</body>
</html>
</xsl:template>
</xsl:stylesheet>

And here's the output source, copied from Internet Explorer View Source:

<html><body>
spaceÂ here
</body></html>

Anybody seen this and what did you do?

encyclo

5:27 pm on Apr 30, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm not too familiar with XSL, but it looks like the parser is recognizing and interpreting the &#160; as a non-breaking space and printing the literal character rather than keeping the entity reference. However, you have an underlying encoding problem: the document seems to be UTF-16, not UTF-8. The Â is a giveaway: if you read a (two-byte) UTF-16 document in a non-UTF-16-aware user agent (Opera, IE, Google, but not Firefox), then the Â represents the first byte of the two-byte character.

Are you able to define the output charset? If so, it needs to be ISO-8859-1 or (better) UTF-8.
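
One way to check which encoding actually produces the stray Â is to look at the bytes directly. The sketch below is in Python purely for illustration (it is not part of the ASP.NET setup in this thread), and it assumes the receiving client falls back to ISO-8859-1 when it does not pick up the real charset. The helper name misread_as_latin1 is made up for this example.

```python
# The character behind &#160; (non-breaking space), encoded two ways.
NBSP = "\u00a0"
text = "space" + NBSP + "here"

utf8_bytes = text.encode("utf-8")       # NBSP becomes the two bytes C2 A0
utf16_bytes = text.encode("utf-16-le")  # every character becomes two bytes


def misread_as_latin1(data: bytes) -> str:
    """Decode bytes the way a client assuming ISO-8859-1 would (assumption)."""
    return data.decode("latin-1")


utf8_misread = misread_as_latin1(utf8_bytes)
utf16_misread = misread_as_latin1(utf16_bytes)

# Show the raw bytes and what each misreading looks like.
print(utf8_bytes.hex(" "))
print(repr(utf8_misread))
print(utf16_bytes.hex(" "))
print(repr(utf16_misread))
```

Under those assumptions, UTF-8 bytes read as ISO-8859-1 give a single Â in front of the space, while UTF-16 bytes read the same way give a NUL byte after every character and no Â at all. Comparing that against what View Source shows may help narrow down which charset the client is really applying.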

macrost

3:16 pm on May 3, 2006 (gmt 0)

10+ Year Member



Give this a shot. I know it's not the most elegant way to accomplish this, but it should get you in the right direction.

<xsl:text disable-output-escaping="yes">&amp;nbsp;</xsl:text>

GordonS

7:40 pm on May 4, 2006 (gmt 0)

10+ Year Member



Thanks, that does actually work.

What is baffling me, though, is why Google, Opera, etc. seem to be seeing my files as UTF-16 when they are clearly UTF-8 encoded. I'm sending UTF-8 Content-Type headers, I have UTF-8 declared in the XML and XSL templates, and I have saved all my files in UTF-8 format. And yet they still seem to be coming out the other end as UTF-16.

I am at no point using ASP.NET strings - just an XslTransform and XmlDocuments.

I even see the BOM (Byte Order Mark) at the start of my file, as shown below:

<?xml version="1.0" encoding="utf-8"?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html><body>
space&#187;here
</body></html>

Totally baffled - any advice much appreciated.

G.
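
Since the BOM came up: the first few bytes of the response are a quick way to tell which encoding the server really emitted. Here is a minimal sketch in Python (again only as an illustration, not the ASP.NET code from this thread). The function name sniff_bom and the sample byte strings are made up for the example; the BOM constants come from Python's standard codecs module.

```python
import codecs

# Longer BOMs are checked first, because the UTF-32 LE BOM
# begins with the same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return the encoding named by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


# Hypothetical sample: a UTF-8 response that starts with a BOM.
sample = codecs.BOM_UTF8 + '<?xml version="1.0" encoding="utf-8"?>'.encode("utf-8")
print(sniff_bom(sample))

# What the UTF-8 BOM looks like to a client that assumes ISO-8859-1.
bom_as_latin1 = codecs.BOM_UTF8.decode("latin-1")
print(repr(bom_as_latin1))
```

If the saved response starts with the bytes EF BB BF, it is UTF-8 with a BOM; a client that shows those three bytes as visible characters is most likely not treating the page as UTF-8.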

mrMister

2:23 pm on May 8, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Change your xsl:output to this.

<xsl:output encoding="utf-8" indent="no" method="xml" doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN" doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" />
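
To illustrate why the output encoding matters for &#160;, here is a rough sketch using Python's standard xml.etree.ElementTree as a stand-in serializer. It is not XslTransform and may not behave identically; it only shows the general idea that the parser resolves the character reference, and the serializer then decides how to write the character back out for the requested encoding.

```python
import xml.etree.ElementTree as ET

# The parser resolves &#160; into the actual non-breaking space character.
body = ET.fromstring("<body>space&#160;here</body>")
parsed_text = body.text

# Serialized as UTF-8, the character is written as the two bytes C2 A0.
as_utf8 = ET.tostring(body, encoding="utf-8")

# Serialized as US-ASCII, the character cannot be represented directly,
# so the serializer falls back to a numeric character reference.
as_ascii = ET.tostring(body, encoding="us-ascii")

print(as_utf8)
print(as_ascii)
```

With a serializer that behaves this way, a UTF-8 output is correct as long as the client also reads it as UTF-8, while an ASCII output sidesteps the issue by keeping every non-ASCII character as a numeric reference.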

ExecWrkn

8:25 pm on May 17, 2006 (gmt 0)

10+ Year Member



You could also try

<!DOCTYPE names [
<!ENTITY nbsp "&#160;">
<!ENTITY quote "&#34;">
]>

just above the <xsl:stylesheet> declaration, and then use &nbsp; in your templates.