

MIME types, text/xml and Firefox

Am I misunderstanding the spec?

     

encyclo

3:28 am on Jan 28, 2006 (gmt 0)

WebmasterWorld Senior Member encyclo is a WebmasterWorld Top Contributor of All Time 10+ Year Member



OK, so I have a simple XML document:

<?xml version="1.0" ?>
<element>content</element>

I serve the file with the MIME type text/xml over HTTP and view the result in Firefox. I have not specified a character encoding, either in the HTTP headers or within the document, and there is no BOM.

What should the charset be? From reading RFC 3023 [ietf.org], I get the impression it should be US-ASCII, and that it should be UTF-8 only if I serve the file as application/xml. However, Firefox treats it as UTF-8 even with text/xml. So who is wrong, Firefox or me?
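For reference, the rule being asked about can be written down as a short Python sketch. It is only an illustration of how RFC 3023 is read here: an explicit charset parameter is authoritative, text/xml without one defaults to US-ASCII, and application/xml without one falls back to the XML rules (BOM, then encoding declaration, then UTF-8). The function names are hypothetical, and the sniffing is deliberately simplified (no UTF-32, EBCDIC, or BOM-less UTF-16 detection).

```python
import re

# Matches an encoding declaration in an ASCII-compatible XML prolog.
_DECL = re.compile(rb"^<\?xml[^>]*?encoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']")


def sniff_xml_encoding(body: bytes) -> str:
    """Simplified XML detection: BOM first, then declaration, then UTF-8."""
    if body.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if body.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "utf-16"
    match = _DECL.match(body)
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"


def rfc3023_charset(content_type: str, body: bytes) -> str:
    """Return the encoding RFC 3023 prescribes for this Content-Type value."""
    media_type, _, rest = content_type.partition(";")
    media_type = media_type.strip().lower()

    # Collect parameters such as charset=... from the header value.
    params = {}
    for item in rest.split(";"):
        name, sep, value = item.partition("=")
        if sep:
            params[name.strip().lower()] = value.strip().strip('"')

    if "charset" in params:
        return params["charset"].lower()   # explicit charset always wins
    if media_type == "text/xml":
        return "us-ascii"                  # the document itself is not consulted
    if media_type == "application/xml":
        return sniff_xml_encoding(body)    # defer to the XML rules
    raise ValueError(f"not an XML media type covered here: {media_type}")


doc = b'<?xml version="1.0" ?>\n<element>content</element>\n'
print(rfc3023_charset("text/xml", doc))         # us-ascii
print(rfc3023_charset("application/xml", doc))  # utf-8
```

Under this reading, the document in the question is US-ASCII when served as text/xml, and Firefox's UTF-8 result corresponds to the application/xml branch.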

confuzed2

4:38 am on Jan 28, 2006 (gmt 0)

10+ Year Member



Anne van Kesteren has an explanation on his weblog. Do a Google search for the following text: "text/xml is seriously broken over HTTP". Be sure to read the comments.

HTH,
CK

encyclo

4:56 pm on Jan 28, 2006 (gmt 0)

WebmasterWorld Senior Member encyclo is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Thanks confuzed2, I'm aware of that article, but I can't find any meaningful specification or explanation for Firefox's behavior. Mozilla usually prides itself on being standards-compliant, especially for its XML parser, so I would be surprised if the choice to treat a text/xml document as UTF-8 was accidental.

confuzed2

6:41 pm on Jan 28, 2006 (gmt 0)

10+ Year Member



If the RFC 3023 and text/xml discussions in Bugzilla don't help, I'm at a complete loss. Please let us know what you find out.

Thanks,
CK

iamlost

9:31 pm on Jan 28, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



RFC 3023 is one of the most ignored and violated RFCs in Internet software. Many (most?) implementations ignore the HTTP headers and look directly at the encoding declared inside the XML, defaulting to UTF-8 if there is none.

This is for a very practical 'real world' reason: US-ASCII covers a very small part of the real world. Consider KOI8-R (or Big5 or UTF-16 or...) encoded XML sent via HTTP as text/xml but without a charset: an RFC 3023 compliant parser would treat it as US-ASCII and serve up glop. These products simply decided that a UTF-8 default would cause fewer complaints than US-ASCII and so informally 'revised' the standard.

I do not know if FF behaves this way or for this reason but if so it would certainly be with the majority.
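A rough Python sketch of the lenient behavior described above, for comparison with the strict reading. This is an assumption about what "ignore the headers" amounts to, not the code of any particular product: the HTTP header is not consulted at all, the internal encoding declaration is used when present, and UTF-8 is the fallback. The function name and the sample document are made up for illustration.

```python
import re

# Matches an encoding declaration in an ASCII-compatible XML prolog.
_DECL = re.compile(rb"^<\?xml[^>]*?encoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']")


def lenient_decode(body: bytes) -> str:
    """Decode XML using its own declaration, falling back to UTF-8."""
    match = _DECL.match(body)
    encoding = match.group(1).decode("ascii") if match else "utf-8"
    return body.decode(encoding)


# Hypothetical KOI8-R document, as it might arrive with a bare text/xml header.
text = '<?xml version="1.0" encoding="KOI8-R"?><p>Привет</p>'
body = text.encode("koi8_r")

# The lenient approach recovers the original text.
print(lenient_decode(body) == text)  # True

# The strict text/xml default (US-ASCII) cannot decode the same bytes.
try:
    body.decode("ascii")
except UnicodeDecodeError as exc:
    print("strict US-ASCII decode fails:", exc.reason)
```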

encyclo

1:44 am on Jan 31, 2006 (gmt 0)

WebmasterWorld Senior Member encyclo is a WebmasterWorld Top Contributor of All Time 10+ Year Member



iamlost wrote: "I do not know if FF behaves this way or for this reason but if so it would certainly be with the majority."

I found a few Bugzilla conversations and a few submitted patches, so I get the impression that later Firefox versions are going to implement RFC 3023 more strictly.

I found the best answer to my question in an XML.com article with the rather apt title XML on the Web Has Failed [xml.com]. RFC 3023 dictates the primacy of HTTP over an internally-declared charset, but real-world implementations have to ignore that primacy simply because of the sheer number of feeds that would be considered ill-formed if RFC 3023 were followed to the letter. A real eye-opener.

I was aware of the strong recommendations never to use text/xml for anything, but the reasons why are much clearer to me now!
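To make the primacy problem concrete, here is a small Python sketch that compares the charset HTTP imposes with the one the document declares for itself. It assumes the text/xml case, where a missing charset parameter means US-ASCII, and the helper names and sample feed are hypothetical. Encoding names are normalized with codecs.lookup so that aliases such as latin-1 and iso-8859-1 compare equal.

```python
import codecs
import re

# Matches an encoding declaration in an ASCII-compatible XML prolog.
_DECL = re.compile(rb"^<\?xml[^>]*?encoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']")


def effective_and_declared(http_charset, body):
    """Return (charset imposed by HTTP for text/xml, charset the XML declares)."""
    # No charset parameter on text/xml means US-ASCII under RFC 3023.
    effective = codecs.lookup(http_charset or "us-ascii").name
    match = _DECL.match(body)
    declared = codecs.lookup(match.group(1).decode("ascii")).name if match else None
    return effective, declared


def is_conflict(http_charset, body):
    """True when the document declares an encoding that HTTP overrides."""
    effective, declared = effective_and_declared(http_charset, body)
    return declared is not None and declared != effective


# Hypothetical feed that declares ISO-8859-1 internally.
feed = '<?xml version="1.0" encoding="iso-8859-1"?><p>caf\u00e9</p>'.encode("latin-1")

print(is_conflict(None, feed))          # True: bare text/xml imposes US-ASCII
print(is_conflict("iso-8859-1", feed))  # False: header and document agree
print(is_conflict("utf-8", feed))       # True: header overrides the declaration
```

A feed in the first or third situation is the kind that a strict RFC 3023 parser would have to reject, even though its internal declaration is accurate.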
 
