Non-unicode characters in XML syntax

     
2:33 pm on Oct 25, 2008 (gmt 0)

Moderator This Forum from CA 

WebmasterWorld Administrator httpwebwitch is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Aug 29, 2003
posts:4059
votes: 0


A well-intentioned attempt to make XML less exclusive to certain ethnic groups actually risks causing breakage for those it's intended to help.

XML co-inventor Tim Bray and others have raised a last-minute objection to the planned XML Fifth Edition working its way through the World Wide Web Consortium (W3C). They say it could make it harder to program with or parse some legacy XML documents.

"programmers writing in scripts such as Amharic or Cherokee, which have been added since then [1998, when XML 1.0 was created], can't use their characters in tag or attribute names."

source [theregister.co.uk]

also see Tim Bray's reaction [tbray.org]

1:33 am on Oct 26, 2008 (gmt 0)

WebmasterWorld Administrator httpwebwitch is a WebmasterWorld Top Contributor of All Time 10+ Year Member


The point here is that Unicode keeps growing (for example, the more recent addition of the characters used to write Cherokee and Amharic), but the set of characters allowed in tag, entity and attribute names in XML has not grown with it. XML 1.0 Fifth Edition plans to remedy that by bringing the XML spec in line with current Unicode. However, as Tim points out:
"the change introduces an inconsistency between XML 1.0 and XML Namespaces 1.0, which is intolerable. They have to be either revised together or not at all."
source [tbray.org]
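
To make the name rules concrete, here is a rough Python sketch of the relaxed name check as I read the NameStartChar and NameChar productions in the Fifth Edition text. The function names and the range tables are my own transcription, not anything taken from a parser, so treat it as an illustration rather than a reference implementation. It does not reproduce the older, much longer per-character tables from the earlier editions.

```python
# Sketch of the XML 1.0 Fifth Edition name rules (NameStartChar / NameChar).
# Ranges are hand-transcribed from the spec; helper names are hypothetical.

NAME_START_RANGES = [
    (0x3A, 0x3A),        # ":"
    (0x41, 0x5A),        # A-Z
    (0x5F, 0x5F),        # "_"
    (0x61, 0x7A),        # a-z
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF),
    (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD),
    (0x10000, 0xEFFFF),
]

# Characters allowed after the first position, in addition to the above.
NAME_EXTRA_RANGES = [
    (0x2D, 0x2E),        # "-" and "."
    (0x30, 0x39),        # 0-9
    (0xB7, 0xB7),        # middle dot
    (0x300, 0x36F),      # combining diacritical marks
    (0x203F, 0x2040),
]


def _in_ranges(code_point, ranges):
    """Return True if code_point falls inside any inclusive (lo, hi) range."""
    return any(lo <= code_point <= hi for lo, hi in ranges)


def is_name_start_char(ch):
    return _in_ranges(ord(ch), NAME_START_RANGES)


def is_name_char(ch):
    return is_name_start_char(ch) or _in_ranges(ord(ch), NAME_EXTRA_RANGES)


def is_fifth_edition_name(name):
    """Check a candidate tag or attribute name against the relaxed rules."""
    if not name:
        return False
    return is_name_start_char(name[0]) and all(is_name_char(c) for c in name[1:])


if __name__ == "__main__":
    cherokee = "\u13E3\u13B3\u13A9"        # Cherokee letters
    amharic = "\u12A0\u121B\u122D\u129B"   # Ethiopic letters
    for candidate in ("title", cherokee, amharic, "1abc"):
        print(repr(candidate), is_fifth_edition_name(candidate))
```

Under these ranges the Cherokee and Ethiopic names pass because both scripts fall inside the broad 0x37F-0x1FFF block, whereas a name starting with a digit is still rejected. Whether a given parser accepts such names depends on which edition it implements.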
12:29 pm on Oct 27, 2008 (gmt 0)

Administrator

WebmasterWorld Administrator coopster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:July 31, 2003
posts:12533
votes: 0


Some interesting responses in there regarding the whitespace characters, especially from mainframe/midrange programmers. The comments seem a tad off topic to me, though, unless I am missing the connection?
1:33 pm on Oct 27, 2008 (gmt 0)

WebmasterWorld Administrator httpwebwitch is a WebmasterWorld Top Contributor of All Time 10+ Year Member


Yeah, I agree, the comments do go a little off the rails.

That IBM whitespace episode seems to be a sore topic among XML RFCgazers. Keep in mind the type of personality who pays attention to all the granular details of XML specs: that's the same personality to whom those things would matter, a LOT.