Forum Moderators: httpwebwitch

Non-unicode characters in XML syntax

   
2:33 pm on Oct 25, 2008 (gmt 0)

WebmasterWorld Administrator httpwebwitch is a WebmasterWorld Top Contributor of All Time 10+ Year Member



A well-intentioned attempt to make XML less exclusive to certain ethnic groups actually risks causing breakage for those it's intended to help.

XML co-inventor Tim Bray and others have raised a last-minute objection to the planned XML Fifth Edition working its way through the World Wide Web Consortium (W3C). They say it could make it harder to program with or parse some legacy XML documents.

"programmers writing in scripts such as Amharic or Cherokee, which have been added since then [1998, when XML 1.0 was created], can't use their characters in tag or attribute names."

source [theregister.co.uk]

also see Tim Bray's reaction [tbray.org]

1:33 am on Oct 26, 2008 (gmt 0)

WebmasterWorld Administrator httpwebwitch is a WebmasterWorld Top Contributor of All Time 10+ Year Member



The point here is that Unicode keeps growing (for example, the more recent addition of the characters used to write Cherokee and Amharic), but the set of characters allowed in tag, entity and attribute names in XML has not grown with it. The XML 1.0 Fifth Edition plans to remedy that by bringing the XML spec into line with Unicode. However, as Tim points out:

"the change introduces an inconsistency between XML 1.0 and XML Namespaces 1.0, which is intolerable. They have to be either revised together or not at all."

source [tbray.org]
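
For anyone who wants to see what the relaxed rule looks like in practice, here is a minimal Python sketch of the Name check, using the NameStartChar and NameChar ranges from the Fifth Edition text (section 2.3). The function and constant names are my own, for illustration only, and this is not how any particular parser implements it. The older editions instead listed the allowed letters from the Unicode 2.0 tables, which is why scripts added later, such as Cherokee and Ethiopic (used for Amharic), were left out.

```python
# Code point ranges for NameStartChar, as listed in XML 1.0 Fifth Edition, section 2.3.
NAME_START_RANGES = [
    (0x3A, 0x3A),            # ":"
    (0x41, 0x5A),            # A-Z
    (0x5F, 0x5F),            # "_"
    (0x61, 0x7A),            # a-z
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF),
    (0x200C, 0x200D), (0x2070, 0x218F), (0x2C00, 0x2FEF),
    (0x3001, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFFD),
    (0x10000, 0xEFFFF),
]

# Extra code points allowed after the first character (NameChar minus NameStartChar).
NAME_EXTRA_RANGES = [
    (0x2D, 0x2E),            # "-" and "."
    (0x30, 0x39),            # 0-9
    (0xB7, 0xB7),            # middle dot
    (0x300, 0x36F),          # combining diacritical marks
    (0x203F, 0x2040),
]


def _in_ranges(ch, ranges):
    """Return True if the character's code point falls in any (low, high) range."""
    cp = ord(ch)
    return any(low <= cp <= high for low, high in ranges)


def is_name_start_char(ch):
    return _in_ranges(ch, NAME_START_RANGES)


def is_name_char(ch):
    return is_name_start_char(ch) or _in_ranges(ch, NAME_EXTRA_RANGES)


def is_fifth_edition_name(name):
    """Check a string against Name ::= NameStartChar (NameChar)*."""
    if not name:
        return False
    return is_name_start_char(name[0]) and all(is_name_char(c) for c in name[1:])


if __name__ == "__main__":
    cherokee = "\u13E3\u13B3\u13A9"       # Cherokee letters
    amharic = "\u12A0\u121B\u122D\u129B"  # Ethiopic letters, as used for Amharic
    for candidate in (cherokee, amharic, "item-1", "1item"):
        print(repr(candidate), is_fifth_edition_name(candidate))
```

Under these ranges the Cherokee and Amharic names pass, while a name that starts with a digit still fails. Note that this only covers the XML 1.0 side; it says nothing about whether the same string is acceptable under Namespaces 1.0, which is the mismatch Tim is objecting to.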
12:29 pm on Oct 27, 2008 (gmt 0)

WebmasterWorld Administrator coopster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Some interesting responses in there with regard to the whitespace characters, especially from mainframe/midrange programmers. The comments seem a tad off topic to me though, unless I am missing the connection?

1:33 pm on Oct 27, 2008 (gmt 0)

WebmasterWorld Administrator httpwebwitch is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Yeah, I agree, the comments do go a little off the rails.

That IBM whitespace episode seems to be a sore topic among XML RFC-gazers. Keep in mind the type of personality who pays attention to all the granular details of XML specs: that's the same personality to whom those things would matter a LOT.

 
