|XML is 10|
10th anniversary of xml, and it hasn't aged a day
In 1999, I bought an XML manual. Apart from describing a schema-like validation syntax that didn't catch on, it's as accurate today as it was then. XML is solid, stable, and ... inert. In an industry where we expect technologies to change and innovate, the stability of XML is psychologically disappointing.
In other words, XML book sales are in the toilet.
Tomorrow, the XML specification [w3.org] will be 10 years old. In those 10 years, XML has seen some fascinating spinoff innovations, but the spec itself has remained unchanged. There is no XML 2.0 or 3.0, nor does there seem to be any need or desire for one.
XML has had a busy decade. XML inbred with its cousin HTML to produce the uptight offspring we know as Strict XHTML. XML begat RSS, the primary format for sharing news headlines. XPath was created as an essential language for finding nodes, and XSLT became XML's "killer app", easing the transformation from XML into other formats. .NET uses XML extensively for config settings. XML became the language of choice for REST APIs.
There hasn't been an innovation in XML in years - only different technologies that use XML in new ways. The most interesting news coming out of the XML world is when another technology tries to do what XML does, resulting in headlines like "XYZ - Better than XML?"
I assert that the simplicity and extensibility of XML led to its ubiquity; and being a simple language, it doesn't need to innovate. XML is the Baking Soda of the web. It has a million uses, but the world doesn't need a new and improved baking soda.
Marking XML's 10th anniversary with a look into its past and future, the horizons look pretty flat in both directions... and solid. You can build your megaplatform on the bedrock of XML, because it's not going anywhere, and you'll never have to upgrade.
I worked with a nightmare called X.409 [portal.acm.org], which was sort of a "flexible container" like XML.
I don't miss it one teeny bit.
I believe in XML. I consider it to be a data transfer medium, and it does that extremely well. The "X" in XML is what it's all about.
The wheels often come off when we try to shoehorn it into programming languages and data manipulation syntaxes.
I believe in XSLT [w3schools.com] as well, but it is a very specific tool, meant for very specific purposes. I think it suffers when people try to shoehorn it into other uses.
I think XSLT, and, to a lesser extent, XML, have been miserably served by their community. The documentation is often unreadable (I have learned to throw the book out after a few intro pages and go it alone. It took a long time for me to come to that conclusion).
There are a great many very intelligent and creative people in the XML community that are so abstracted from the practical implementations of their work that they are well-nigh useless. XSLT is a perfect example. The books universally stink. They have all the information, and they are technically accurate, but they simply can't convey their content in a useful fashion. Almost every example I see out there is based on XSLT 2 [w3.org] and XPath 2 [w3.org], yet there are no practical, free and open source implementations of XSLT/XPath 2. Indeed, the author of much of the standard sells his XSLT 2 processor. XSLT 1 is a widely-available standard (libxslt [xmlsoft.org]), and is even embedded in PHP 5 [us2.php.net].
I remember being in a conversation about this with someone in the XSLT community, and mentioned how important the PHP implementation was to people "getting" XSLT, and was stunned by their response. They simply didn't care. PHP is an "amateur" language, and they were actually very much against it being implemented that way, as it would encourage "unqualified" people to start using XSLT.
This was pretty much exactly the attitude of the Internet community when The September that Never Ended occurred. The "old guard" were horrified, and still, to this day, complain about it.
If AOL hadn't opened to the 'Net, the Internet would still be a geeky footnote.
Sometimes, you just need to open the gates and let the peasants come in...
XML PHP [us2.php.net]
While we're at it...
PHP DOM [us2.php.net]
PHP JSON [us2.php.net]
SimpleXML is only good for very, very small pieces of XML. I had to use it recently and it kills performance; you have to use a SAX-style parser instead.
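For anyone who hasn't met the SAX style, here is a minimal sketch in Python rather than PHP, using the standard library's `xml.sax`. The `ItemCounter` handler, the `summarize` helper and the sample catalog are all made up for illustration. The point is only that the parser hands over one event per tag, so no tree is ever held in memory:

```python
import io
import xml.sax


class ItemCounter(xml.sax.ContentHandler):
    """Counts <item> elements and sums their 'price' attribute as events stream by."""

    def __init__(self):
        super().__init__()
        self.count = 0
        self.total = 0.0

    def startElement(self, name, attrs):
        # Called once per opening tag; nothing is kept except the running totals.
        if name == "item":
            self.count += 1
            self.total += float(attrs.get("price", "0"))


def summarize(xml_bytes):
    """Stream through the document and return (item count, sum of prices)."""
    handler = ItemCounter()
    xml.sax.parse(io.BytesIO(xml_bytes), handler)
    return handler.count, handler.total


# Hypothetical sample document; a real feed would come from a file or a socket.
sample = (
    b"<catalog>"
    + b"".join(b'<item price="%d"/>' % n for n in range(1, 1001))
    + b"</catalog>"
)

count, total = summarize(sample)
print(count, total)
```

Whether this beats a tree-based parser in a given application depends on document size and on what you do with the data, so measure before switching.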
What we really need is a binary format which will use less memory, disk and network overhead. People always dismiss it because it makes the XML non-human-readable, but XML over a couple of meg without line breaks is not human-readable anyway. The tools could always make the binary format transparent so that the text can be dumped out if needed.
I don't think of XML as "human readable." However, text compresses (and encrypts) very well, is a universally-recognized, cross-platform and cross-language standard and needs no special transport mechanisms.
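The "text compresses very well" part is easy to check. A rough sketch in Python with the standard `gzip` module follows. The payload is a made-up, highly repetitive document, so real-world ratios will differ:

```python
import gzip

# Hypothetical repetitive XML payload, built only for illustration.
rows = "".join(
    f"<row><id>{i}</id><name>user{i}</name><active>true</active></row>"
    for i in range(5000)
)
xml_doc = f"<rows>{rows}</rows>".encode("utf-8")

compressed = gzip.compress(xml_doc)      # what would travel over the wire
restored = gzip.decompress(compressed)   # what the receiver parses

print(len(xml_doc), len(compressed))
```

Note that this only measures size on the wire. The receiver still has to inflate and parse the full text, which is a separate cost.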
httpwebwitch makes a point that I'd never considered before: XML is the only standard that I've ever seen that hasn't changed; even in two years, let alone ten.
Wow. That's pretty heavy.
By the way, X.409 was a binary format. You laid it out in text, then "compiled" it into binary.
What about HTTP 1.1? Or FTP? Or POP/SMTP? Even HTML has barely changed over the last 10 years; HTML 4 certainly hasn't seen any improvements. XHTML is just HTML 4 with a strict XML syntax, so it doesn't count.
Let's not even talk about ASCII or CSV ;)
It's unlikely that we would need extra features in XML, since its entire purpose is to be extensible in itself (in the same way we can extend HTTP with extra X- headers). We have seen many, many changes in technologies which are used to process XML (XSLT, XPath, XQuery, XLink, etc.) and in specific formats (XHTML, XForms, RDF, RSS, etc.).
P.S. Compressing XML in transit doesn't help much, because it has to be inflated before it is parsed, which takes up a lot of CPU and memory. A binary XML format would not need that, because it could be parsed much more quickly and with less RAM.
happy birthday, XML!
My knowledge of XML is limited to the basic syntax, but I have to say that as a data storage format it is superb (and I am a very difficult guy to impress).
I appreciate that there can be issues with speed and size, but for any specific purpose, a compiler can be produced very easily to convert to a binary format (and a decompiler to do the reverse).
I'm only using XML in one application at the moment. In that, I read the data into a hierarchical data structure in a single pass, and then read the data from that structure as and when it is required. When I wrote the code I was concerned that it might be a tad slow, but it's great. I even ensured that the saved output is beautifully formatted so that it can be hand-edited should the need arise.
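That read-once, query-later approach can be sketched roughly as follows. This is not the poster's code: it is a Python illustration using the standard `xml.etree.ElementTree` and `xml.dom.minidom` modules, and the `settings` document and the helper names `element_to_dict` and `pretty` are invented for the example.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom


def element_to_dict(elem):
    """Recursively convert an element into a plain nested dict."""
    return {
        "tag": elem.tag,
        "attrs": dict(elem.attrib),
        "text": (elem.text or "").strip(),
        "children": [element_to_dict(child) for child in elem],
    }


def pretty(elem):
    """Serialize an element with indentation so it can be hand-edited."""
    raw = ET.tostring(elem, encoding="unicode")
    return minidom.parseString(raw).toprettyxml(indent="  ")


# Hypothetical settings file content.
source = (
    '<settings><window width="800" height="600"/>'
    "<user><name>Ann</name></user></settings>"
)

root = ET.fromstring(source)   # parse the whole document once
tree = element_to_dict(root)   # then query the in-memory structure as needed

print(tree["children"][0]["attrs"]["width"])
print(pretty(root))
```

The nested dict is just one possible shape for the in-memory structure. Keeping the parsed `Element` tree itself and querying that would work equally well.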
As others have pointed out, XML is not unique in reaching 10 years without change. I've just been studying UTF-8 which is a little older I think. Again, it is a near-perfect solution (for character encoding). If Windows used UTF-8, I could use one simple/fast set of string comparison functions instead of three (two of which are somewhat slower).
|XSLT is a perfect example. The books universally stink. |
Translated: of the books you personally read, none were well-suited to your needs, style of learning, or both.
Michael Kay's "XSLT 2nd Edition" has perennially handled nearly all my XSLT questions. The XPath section is poorly organized and handled, but a quick Google search that includes the term "Michael Kay" usually takes me right to the answer in those cases. I've written a large amount of XSLT, and Kay's book has answered most of the questions.
A technology only interests me when it affects me. XHTML has.
About the only time I can remember using xml directly was after being invited to a Google Beta where they made me provide them data as xml. It was a total pain and unnecessary. A simple flat file would have been much easier to produce and do the trick.
Oh, and if you dig down into vbulletin, you see a well thought out use of xml. That was the first time I really understood its power...
I'm not saying that xml isn't great. Just that I don't think it directly hits many of us very often. Only indirectly.
I bet most people will just gloss over this thread because it goes over most people's heads...
|Michael Kay's "XSLT 2nd Edition" |
a.k.a. "the Big Red Book". If you work in XSLT every day, as I do, it's your dog-eared, bookmark-stuffed companion. For all its worthiness, half of what is in the Big Red Book is stuff I can't use, because even newfangled PHP5 doesn't support XSLT2.
It's no coincidence that Kay's name is attached to the only XSLT manual worth having. He edited the XSLT 2.0 specification and wrote the Saxon processor. Quite a unique individual: mild-mannered, polite in correspondence and very analytically minded. I admire his work.
Besides the Big Red Book, there's a very active email list hosted by Mulberry (where Kay and others are regulars), and the occasional blogger posting nifty transformation techniques.
Documentation support for XSLT really isn't that bad... The real loser in the XML world is XSD Schema. I have a total hate-on for Schema, because it's difficult to learn, impossible to find good resources, and it has baffling limitations, which arise in the XML forum with surprising regularity. (along the lines of, "it can do A, B, C, E, and F... what about D? Can't I do D with Schema? And why can't it do B and C at the same time?")
One of my first experiences with XML was creating little snips of data to be fed into Macromedia Flash. That plugin had a nice built-in XML parser. Flash made me learn XML.
All data formats have their pros and cons... CSV, JSON, SQL, binary - all are good at what they do. It's silly to argue whether XML is better or worse than other carriers; it's all situational. However, since becoming intimate with the gamut of XML technologies, I've found myself using XML for all sorts of things where I might have otherwise used SQL or CSV or plain TXT.
|Translated: of the books you personally read, none were well-suited to your needs, style of learning, or both. |
|Michael Kay's "XSLT 2nd Edition" |
You mean the XSLT 2 book [amazon.co.uk]?
I have a subscription to O'Reilly Safari. It's not in there. If it's as good as you say, I'll get it. I know who Kay is. I've had dealings with him, and agree with your assessment. We got Saxon SA [saxonica.com] for our infrastructure system, then didn't use it, because PHP plays such a huge role for us.
I think I mentioned, somewhere, that all the help and documentation are for XSLT 2, which, combined with $2.50 will get me a Venti Latté at Starbucks. The same goes for the mulberry list [mulberrytech.com], which does have the XSLT "heavy hitters" on it. I've learned not to ask XSLT questions on that list, because I get tired of being told XSLT 2 answers.
I'm not kidding. The XSLT community has completely abandoned XSLT 1, and has absolutely no patience whatsoever for people who use it.
We actually use XSL-FO and XSLT in my "day job" quite a bit, and XML is a "Swiss army knife" for us.
I agree with httpwebwitch's complaint about Schema, but I've run into similar issues with XSLT 1 (which is why there is an XSLT 2).
It would be great to have some company over in the XML Forum [webmasterworld.com]. I certainly don't claim to be an expert. I'm just a poor schlub who had to larn this here new-fangled X-stuff on my lonesome, and have some compassion for others in the same boat. I'm thrilled to have someone as experienced as httpwebwitch over there.
There's a great, funny, poignant article by Tim Bray about the people who worked on XML: XML People [tbray.org]
|...Jon [Bosak] decided that TimBL’s W3C should make SGML happen on the Web. The W3C didn’t see it that way and ignored him; since he wouldn’t shut up and go away, they told him sure, he could launch an “SGML on the Web” activity if he did all the work, but to shut up and go away while he did it. |
We’ll jump a little bit ahead in the story here; Jon’s SGML-on-the-Web eventually became known as “Extensible Markup Language” (XML for short) and is now a certifiably Big Deal on the Internet...
There's really not much to XML, itself. It's tags someone else might understand, and want you to use, closing tags or slashing empties, and namespaces. With HTML, you have to learn what all the tags do, and where they might go. XML is just - you name the tags. End of book.
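Those few rules (close your tags, slash your empties, namespace what needs it) are about all a parser checks. A small Python sketch using the standard `xml.etree.ElementTree` module shows them in action. The `order` document, the `urn:example:inventory` namespace and the `is_well_formed` helper are invented for the example:

```python
import xml.etree.ElementTree as ET

# Made-up document: you name the tags yourself.
doc = """<order xmlns:inv="urn:example:inventory">
  <customer>Ann</customer>
  <inv:item sku="A1"/>
  <note></note>
</order>"""

root = ET.fromstring(doc)
# ElementTree reports namespaced tags as {namespace-uri}localname.
tags = [child.tag for child in root]
print(tags)


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<a><b/></a>"))   # slashed empty element
print(is_well_formed("<a><b></a>"))    # <b> is never closed
```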
As for XSLT, this is what helped draw people into XML. It's slow. But it works beautifully - with-the-grain as it were in creating HTML, also XML, text, even pdf (xsl-fo). It is useful to separate out the data collection from the page generation. But there's typically a lot of text. Text handling can be slow. XSLT tends to be slow.
In addition, some XSLT gurus (only some of them) seem to get on a purist jag where only precise definitions will do, needlessly complicating things for those who are using the namespace's commands (elements) and arguments (attributes) properly. And then there's XPath: what the context is at a given moment is often the question during execution of an XSLT script. What IS position() #1? And so on.
It's okay that you need a choose, when and otherwise for every if/else conditional. But that verbosity becomes a problem when lots of data is transmitted. Proprietary formats were likely pretty efficient, without a lot of duplication. In XML the duplication shows up not only in elements but in attributes too: you have to repeat the attribute name every time within an element. And if values that are attributes in the source are transmitted as elements in the XML, you still have the duplication. XML is very verbose in that way. JSON does make more sense, and solves the universality problem. Like XML, it's not some proprietary format.
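The verbosity point can be put in numbers with a rough Python sketch (standard library only). The three records and the helper names `as_elements`, `as_attributes` and `as_json` are invented for the example, and the exact sizes depend entirely on the data:

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical records; any tabular data would show the same pattern.
records = [{"id": i, "name": f"user{i}", "active": True} for i in range(3)]


def as_elements(rows):
    """One child element per field: every name appears twice (open and close tag)."""
    root = ET.Element("rows")
    for row in rows:
        node = ET.SubElement(root, "row")
        for key, value in row.items():
            ET.SubElement(node, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


def as_attributes(rows):
    """One attribute per field: every name appears once per row."""
    root = ET.Element("rows")
    for row in rows:
        ET.SubElement(root, "row", {k: str(v) for k, v in row.items()})
    return ET.tostring(root, encoding="unicode")


def as_json(rows):
    """Compact JSON with no extra whitespace."""
    return json.dumps(rows, separators=(",", ":"))


for label, text in [
    ("elements", as_elements(records)),
    ("attributes", as_attributes(records)),
    ("json", as_json(records)),
]:
    print(label, len(text))
```

JSON written this way still repeats the key names in every record, so it is smaller here but not free of duplication either.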
I think that if it hadn't been for M$ adopting XPath/XSLT and providing its XML/XSLT v1 program modules, XML would have been another of those standards that didn't take, like the latest from ECMA. Perhaps we'll see what comes in as the 'new' HTML.
Not if you use PHP 5's [us3.php.net] built-in libxslt [xmlsoft.org] implementation.
In my applications, it's stunningly fast. It's much faster than the pure PHP stuff, and it has actually been so zippy that I deliberately do stuff in XSLT that I could do in PHP.
However, libxslt is XSLT 1 [w3.org], and the XSLT GURUs (Good Understanding, but Relatively Useless) don't think of either PHP or XSLT 1 as worthy of consideration.
I don't know how MS' implementation goes, but I've heard it's also very fast.
MS' .NET implementation (C#) is blazingly fast. I once had my hands all over a C#.NET platform which used XSLT extensively; it handled *millions* of transformations per day without complaint, and easily handled huge bursts of traffic without any noticeable lag. In fact the XSLT transformations were usually the fastest proc in the engine; getting the XML always took 1000% longer (a fake stat) than transforming it for output.
The key to fast XSLT is to keep your XSLT transformations on the server, not in the client. Client-side XSLT is sweet when you're delivering content for consumption as XML, so it looks pretty in a browser. But for most applications of XML as a data carrier, the interface is HTML and the XML/XSLT layer should be hidden in a rendering proc before HTML output is sent to the client. One technique that speeds things along is caching the compiled XSLT as an object, rather than parsing the text version each time it's used.
As I gain more experience with JSON and XML, I am gaining clarity into the strengths and usefulness of each. Especially in AJAX applications, JSON is handling more of the problems that I used to solve with XML. But there are still lots of situations where XML trumps a JSON implementation.
I disagree with the assertion that XML wouldn't have "taken" without XSLT. I was using XML for lots of things before my first experimental steps into XSLT just a few years ago. But I acknowledge that now that XSLT is in my toolbox, it's become so ubiquitous I rarely see any XML without immediately thinking of how my app could/should use XSLT to transform it.
> where XML trumps a JSON implementation. <
Because of the related software, or because XML actually is better for storing the info than JSON? XML is verbose. Thus, JSON. XSLT is verbose, as well, for the same reason (though it's what I use to generate pages for various sites). What data would be better represented in XML?
> compiled XSLT <
I don't know if that's possible with msxml. But again, I don't generate the pages in real time. They are generated, then uploaded. I'm tied to msxsl because I use extensions. Those calls may - additionally - slow things down.
Happy Birthday XML !
And welcome, Protocol Buffers!
And thanks Httpwebwitch for keeping us updated.
I am just starting out and still find it difficult to choose between XML/XSL, JSON, XML/DOM... Or maybe PB?
tomda, that's a big topic for a new thread - so let's start one!
Too late, I have flagged this thread :)
|choose between XML/XSL, JSON, XML/DOM... Or maybe PB |
There's a saying in the US, that if your only tool is a hammer, all your problems look like nails.
I firmly believe that one of the biggest problems people have with XML is that they try to use it for the wrong things. Httpwebwitch points that out in his initial posting. XML has a purpose, and there are better standards for more specific needs. I tend to like JSON for my AJAX work. I could use XML, but that would be awkward. However, I prefer XML for my behind-the-scenes server-to-server data transfers. JSON would be completely unsuitable for that.