
MIME types, text/xml and Firefox

Am I misunderstanding the spec?

3:28 am on Jan 28, 2006 (gmt 0)

Senior Member from CA 

WebmasterWorld Senior Member encyclo is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Aug 31, 2003
posts:9063
votes: 2


OK, so I have a simple XML document:

<?xml version="1.0" ?>
<element>content</element>

I serve the file with the MIME type text/xml over HTTP and view the result in Firefox. I have not specified any character encoding, either via HTTP or within the document, and there is no BOM.

What should the charset be? On reading RFC 3023 [ietf.org], I get the impression it should be US-ASCII; only if I serve the file as application/xml should it default to UTF-8. However, Firefox treats it as UTF-8 even with text/xml. So who is wrong, Firefox or me?
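For reference, my reading of the RFC 3023 defaults can be sketched as a tiny function (the function name is mine, not from any library, and this ignores the BOM case since there is none here):

```python
from typing import Optional

def rfc3023_default_charset(media_type: str, charset_param: Optional[str]) -> str:
    """Effective charset for an XML entity received over HTTP, per RFC 3023.

    A charset parameter in the Content-Type header is authoritative.
    Without one, text/* defaults to US-ASCII (inherited from RFC 2046),
    while application/xml falls back to the XML declaration or UTF-8.
    """
    if charset_param:
        return charset_param.lower()
    if media_type.startswith("text/"):
        # RFC 3023 section 3.1: text/xml with no charset => us-ascii
        return "us-ascii"
    # application/xml: use the in-document encoding declaration, else UTF-8
    return "utf-8"

print(rfc3023_default_charset("text/xml", None))          # us-ascii
print(rfc3023_default_charset("application/xml", None))   # utf-8
print(rfc3023_default_charset("text/xml", "ISO-8859-1"))  # iso-8859-1
```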
4:38 am on Jan 28, 2006 (gmt 0)

New User

10+ Year Member

joined:Aug 30, 2004
posts:4
votes: 0


Anne van Kesteren has an explanation on his weblog. Do a Google search for the phrase "text/xml is seriously broken over HTTP", and be sure to read the comments.

HTH,
CK

4:56 pm on Jan 28, 2006 (gmt 0)

Senior Member from CA 

WebmasterWorld Senior Member encyclo is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Aug 31, 2003
posts:9063
votes: 2


Thanks confuzed2, I'm aware of that article, but I can't find any meaningful specification or explanation for Firefox's behavior. Usually Mozilla prides itself on being standards-compliant, especially for its XML parser, so I would be surprised if the choice of treating a text/xml document as UTF-8 was accidental.
6:41 pm on Jan 28, 2006 (gmt 0)

New User

10+ Year Member

joined:Aug 30, 2004
posts:4
votes: 0


If the 3023 and text/xml discussions in Bugzilla don't help, I'm at a complete loss. Please let us know what you find out.

Thanks,
CK

9:31 pm on Jan 28, 2006 (gmt 0)

Senior Member from CA 

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 25, 2003
posts:889
votes: 56


RFC 3023 is one of the most ignored/violated RFCs in Internet software. Many (most?) products ignore the HTTP headers and look directly at the internal XML encoding declaration, defaulting to UTF-8 if there is none.

This is for a very practical, real-world reason: US-ASCII covers only a small part of the real world. Consider a KOI8-R (or Big5, or UTF-16, or...) encoded XML document sent via HTTP as text/xml but without a charset parameter: an RFC 3023-compliant parser would serve up US-ASCII glop. These products simply decided that a UTF-8 default would cause fewer complaints than US-ASCII, and so informally 'revised' the standard.
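Roughly, the detection those products do amounts to something like this (a sketch of the general approach, not any particular parser's code):

```python
import re

def sniff_xml_encoding(data: bytes) -> str:
    """Guess an XML document's encoding the way most real-world software does:
    check for a BOM, then the encoding pseudo-attribute in the XML
    declaration, then fall back to UTF-8 -- ignoring the US-ASCII default
    that RFC 3023 would impose for text/xml without a charset parameter."""
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if data.startswith(b"\xff\xfe") or data.startswith(b"\xfe\xff"):
        return "utf-16"
    m = re.match(rb'<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']', data)
    if m:
        return m.group(1).decode("ascii").lower()
    return "utf-8"

print(sniff_xml_encoding(b'<?xml version="1.0" encoding="KOI8-R"?><a/>'))  # koi8-r
print(sniff_xml_encoding(b'<element>content</element>'))                   # utf-8
```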

I do not know if FF behaves this way or for this reason but if so it would certainly be with the majority.

1:44 am on Jan 31, 2006 (gmt 0)

Senior Member from CA 

WebmasterWorld Senior Member encyclo is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Aug 31, 2003
posts:9063
votes: 2


I do not know if FF behaves this way or for this reason but if so it would certainly be with the majority.

I found a few Bugzilla conversations and a few submitted patches, so I get the impression that later Firefox versions are going to implement RFC 3023 more strictly.

I found the best answer to my question in an XML.com article with the apt title XML on the Web Has Failed [xml.com]. RFC 3023 gives the HTTP-declared charset primacy over an internally-declared one, but real-world implementations have to ignore that primacy simply because of the sheer number of feeds that would be considered ill-formed if RFC 3023 were followed to the letter. A real eye-opener.

I was aware of the strong recommendations never to use text/xml for anything, but the reasons why are much clearer to me now!
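For what it's worth, the practical takeaway seems to be: send application/xml with an explicit charset, both in the header and in the XML declaration, so there is nothing left to guess. A minimal sketch using Python's standard http.server (the handler name and setup are illustrative, not from the thread):

```python
from http.server import BaseHTTPRequestHandler

# Explicit charset in both the HTTP header and the XML declaration means
# neither RFC 3023's US-ASCII default nor browser sniffing comes into play.
CONTENT_TYPE = "application/xml; charset=utf-8"
XML_BODY = '<?xml version="1.0" encoding="UTF-8"?>\n<element>content</element>'

class XMLHandler(BaseHTTPRequestHandler):
    """Serves the sample document with an unambiguous encoding."""

    def do_GET(self):
        body = XML_BODY.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", CONTENT_TYPE)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```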