Squeezing more juice from the XML fruit

XML content management, structure & semantics

         

aspr1n

7:02 pm on Nov 4, 2002 (gmt 0)

Hi all,

This post is as much about me thinking out loud as it is about a specific problem I have, so I hope I'm not reinventing the wheel too much here, and I would certainly like to hear the community's thoughts on this.

I have noticed a number of topics on content management of XML (some Qs posted by me), and have seen a number of interesting replies, specifically dealing with storage of XML/XHTML in databases.

I've seen a number of "neat tricks" for dealing with tags in table fields, specifically aimed at letting non-techies create content while still expressing a document structure and style. For example, programmatically stripping or adding the <p> tag around paragraphs through a PHP function, and suchlike.

The conventional wisdom (pls correct me here!) seems to be, for example, one table field for my doc title, another perhaps for page heading & keywords, and another for the body content, all combined at runtime to create a usable, validating XML document.
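To make that concrete, here is a rough sketch of the field-per-part approach, written in Python rather than PHP purely for illustration. The field names (title, keywords, heading, body) and the sample row are made up, and I'm assuming the body is stored as tag-less text with a blank line between paragraphs:

```python
from xml.sax.saxutils import escape, quoteattr

def wrap_paragraphs(text):
    """Turn tag-less text (blank line between paragraphs) into <p> elements."""
    paras = [p.strip() for p in text.split("\n\n") if p.strip()]
    return "\n".join(f"    <p>{escape(p)}</p>" for p in paras)

def build_page(row):
    """Recombine separately stored fields into one XHTML document."""
    keywords = quoteattr(row["keywords"])  # returns the value already quoted
    return (
        '<html xmlns="http://www.w3.org/1999/xhtml">\n'
        "  <head>\n"
        f"    <title>{escape(row['title'])}</title>\n"
        f'    <meta name="keywords" content={keywords} />\n'
        "  </head>\n"
        "  <body>\n"
        f"    <h1>{escape(row['heading'])}</h1>\n"
        f"{wrap_paragraphs(row['body'])}\n"
        "  </body>\n"
        "</html>"
    )

# Hypothetical row, as it might come back from a database query.
row = {
    "title": "Fax cover",
    "keywords": "fax, template",
    "heading": "Fax cover sheet",
    "body": "First paragraph.\n\nSecond paragraph, with an & in it.",
}
page = build_page(row)
print(page)
```

Note that the document structure lives entirely in build_page; the stored fields know nothing about it.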

However, it strikes me that this approach is somewhat denigrating what XML was and is fundamentally all about: not just expressing document structure and semantics, but the relationships between them.

A well structured XML document should surely "stand on its own". By reducing XML to fields in a database, we lose all the document structure until runtime, when hopefully it is recombined correctly. This is emphasised by the "neat tricks" for adding and removing XHTML tags: the tags are there to provide document semantics, not as an annoyance to be removed for something better.

By storing tag-less "XML" (AKA text) content in "logical" chunks in fields, all we are doing is trying to describe the document with a database field rather than the XML itself: "field1" is for doc titles, "field2" is for meta keywords, "field3" is for H1 headings and so forth.

When in actual fact what we should be doing is storing the document as a single XML entity, where the content describes itself completely. I think this is a hangover from the DB experts, who are traditionally used to splitting data streams into logical chunks to describe the information, rather than letting the data describe itself.

The DB experts would no doubt argue about indexing and quantifying the content etc., but that is what the XML structure is there to do. <h1> is there to describe the most important heading; we don't need a database field to do that for us, quite apart from the fact that in 6 months' time it might become an <h2> or <p> instead.

If you want to index for keywords, that's what <meta keywords> is there for, or <meta description> for descriptions etc. By ignoring these we lose the structure and the idea of the document as a single entity, and thus the entire point of XML in the first place. I am not suggesting that storing XML in DBs is of no value, just that the DB is there to store and provide the content, not to describe it; as far as the DB is concerned, it should just be a lump of text.
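As a minimal sketch of what I mean, assuming the whole XHTML document comes back from a single text field, the indexable bits can be read straight out of the markup. This uses Python's standard ElementTree module; the describe function and the sample document are hypothetical:

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

def describe(doc_text):
    """Pull keywords, description and main heading out of a stored document."""
    root = ET.fromstring(doc_text)
    meta = {
        m.get("name"): m.get("content")
        for m in root.iter(f"{XHTML}meta")
        if m.get("name")
    }
    h1 = root.find(f".//{XHTML}h1")
    return {
        "keywords": [k.strip() for k in meta.get("keywords", "").split(",") if k.strip()],
        "description": meta.get("description"),
        # itertext() also picks up text inside inline children such as <em>
        "heading": "".join(h1.itertext()) if h1 is not None else None,
    }

# Hypothetical content of the single database field.
stored = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Fax cover</title>
    <meta name="keywords" content="fax, template" />
    <meta name="description" content="Cover sheet for outgoing faxes" />
  </head>
  <body><h1>Fax <em>cover</em> sheet</h1><p>Body text.</p></body>
</html>"""

info = describe(stored)
print(info)
```

If the <h1> later becomes an <h2>, only the lookup in describe changes; the stored data and the table layout stay as they are.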

This then brings me neatly back to my thoughts on content management. I think what we as developers (content providers etc) really need to do is expose our CSS styles to the content creator for usage, so it can be stored as an integral part of the document. This is where a database really might help.

Imagine providing a "user.css" that describes the styles a user is allowed to apply to a particular portion of content. Now imagine all those styles are stored in a DB. The "user.css" can be dynamically created, "custom built" at runtime according to the general rights of the user and their specific rights to a document template.
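Something along these lines, as a sketch only. The styles table, its columns, the numeric min_role level and the sample rules are all assumptions of mine; SQLite is used just to keep the example self-contained:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE styles (
    selector     TEXT,
    declarations TEXT,
    template     TEXT,
    min_role     INTEGER   -- lowest role level allowed to use the rule
);
INSERT INTO styles VALUES
  ('h3.heading',     'font-family: Verdana; color: black', 'fax', 1),
  ('h3.heading-red', 'font-family: Verdana; color: red',   'fax', 1),
  ('p.legal',        'font-size: 80%',                     'fax', 2);
""")

def build_user_css(conn, template, role_level):
    """Return the CSS rules this role may apply within the given template."""
    rows = conn.execute(
        "SELECT selector, declarations FROM styles "
        "WHERE template = ? AND min_role <= ? ORDER BY selector",
        (template, role_level),
    )
    return "\n".join(f"{sel} {{ {decl}; }}" for sel, decl in rows)

user_css = build_user_css(conn, "fax", 1)
print(user_css)
```

A user at role level 1 never sees the p.legal rule, so the editor has nothing to offer them beyond the two heading styles.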

Who needs MS Word, when all this can be deployed and built on the fly?

Ironically, I wrote a little app in 1999, almost identical to the link below, to do just this because I thought it was cool and might be useful, without realising why it really was so powerful. Check out this link and imagine a user, instead of applying HTML tags, applying styles from their dynamically created "user.css", built according to the content type they are creating, then saving the now complete XML doc into a single database field.

[uk.f116.mail.yahoo.com...] (unfortunately only seems to work properly in IE)

This, I would suggest, is the answer, at least in part, to the content conundrum.

Answers on a postcard please…..

asp

lorax

7:09 pm on Nov 4, 2002 (gmt 0)

Interesting, but what about a true XML database combined with XSLT?

andreasfriedrich

7:48 pm on Nov 4, 2002 (gmt 0)

Add an XML editor that uses XSL and you have the UI. This is a pretty good solution for larger documentation, books, longer articles, etc.

For other data-driven applications (catalogues, booking systems, ecommerce solutions, directories) I would stick with an RDBMS and an HTML forms-based UI.

I don't like editors like the one you pointed to or editors based on IE's ActiveX editing component. They don't go well with the idea of XML representing structure, not layout.

As for the argument of non-techies adding content, I would expect them to be able to fill out HTML forms and, if necessary, to learn some simplified markup to mark up the logical structure of the text they are entering. In the XML editor approach an author would certainly know when he wants to write a heading, paragraph, code example, add a picture, etc. Otherwise he can't be a very good author.

>>Imagine providing a "user.css", that describes styles allowed to be applied by a user to a particular portion of content

I don't even want to think of such a horror scenario. All I ever want the user to be able to do is logical markup. Having users play around with styles just keeps them from doing other, more productive work.

Andreas

lorax

8:06 pm on Nov 4, 2002 (gmt 0)

>>Add an XML editor that uses XSL and you have the UI.

I was looking into XMetal and XMLSpy not too long ago. One is for programmer types and the other more applicable to users as a tool to add content.

Re: XML db. Best for static content as the structure is inherently verbose (the nature of XML). If content is to be changed often, then I agree with Andreas - a relational DB is the next best option.

aspr1n

12:41 am on Nov 5, 2002 (gmt 0)

I'm not arguing that there's a problem storing in an RDBMS; just don't split the XML content across relational tables. As for queries etc., for performance we should really be using SAX2 (Apache Xerces has an excellent SAX interface).
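For anyone who hasn't used SAX, the idea is that the parser streams events at you instead of building a whole tree in memory. A minimal sketch follows, using Python's built-in xml.sax module rather than Xerces (my substitution, purely to keep the example short); the HeadingCollector class and the sample document are made up:

```python
import xml.sax

class HeadingCollector(xml.sax.ContentHandler):
    """Collect the text of every <h1> without building a tree in memory."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._buffer = None  # list of text chunks while inside an <h1>

    def startElement(self, name, attrs):
        if name == "h1":
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate until </h1>.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "h1" and self._buffer is not None:
            self.headings.append("".join(self._buffer))
            self._buffer = None

def find_headings(doc_text):
    """Stream through a stored document and return its <h1> texts."""
    handler = HeadingCollector()
    xml.sax.parseString(doc_text.encode("utf-8"), handler)
    return handler.headings

sample = "<html><body><h1>Fax <em>cover</em> sheet</h1><p>Body.</p></body></html>"
headings = find_headings(sample)
print(headings)
```

The same handler pattern works for pulling out meta tags or any other element you want to query on.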

There's no reason why you'd need to use IE's ActiveX editing components in a browser UI; JavaScript works fine.

There's also nothing to suggest that using a GUI editor harms XML's content/structure. I use DW-MX quite a bit and all my code validates fine ;-)

Regarding Andreas's concern over "users", and to clear it up a bit further:

Using an LDAP directory I authenticate myself as user 'xyz'. I need to send a fax to a company.

I call up the "fax template"; my user.css is built from an RDBMS using the authentication tokens provided via LDAP. Within the template I am allowed to use the "Verdana h3 heading" (red or black), and I am allowed to apply <strong> and <em>.

I type the document, apply the styles I'm permitted to use and save the document. The XML content is stored in a DBMS, with a hard reference to the "fax template". As I save, a JavaScript prompt requests <meta description> and <meta keywords>; my LDAP authentication handles the <meta author> information in the background.
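The save step could look roughly like this. It's a sketch under several assumptions: the documents table and its columns are invented, the author name is simply passed in (the LDAP lookup isn't shown), and the sample document has no XML namespace so the element lookups stay simple:

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents ("
    " id INTEGER PRIMARY KEY,"
    " template TEXT NOT NULL,"   # hard reference to the template used
    " content TEXT NOT NULL)"    # the complete document, in one field
)

def save_document(conn, doc_text, template, author, description, keywords):
    """Add the meta elements to <head>, then store the whole document."""
    root = ET.fromstring(doc_text)
    head = root.find("head")
    if head is None:
        raise ValueError("document has no <head> element")
    for name, content in (
        ("author", author),
        ("description", description),
        ("keywords", keywords),
    ):
        ET.SubElement(head, "meta", {"name": name, "content": content})
    xml_text = ET.tostring(root, encoding="unicode")
    cur = conn.execute(
        "INSERT INTO documents (template, content) VALUES (?, ?)",
        (template, xml_text),
    )
    conn.commit()
    return cur.lastrowid

draft = (
    "<html><head><title>Fax</title></head><body>"
    '<h3 class="heading-red">Urgent</h3>'
    "<p>Please <strong>call</strong> back.</p>"
    "</body></html>"
)
doc_id = save_document(conn, draft, "fax", "xyz", "Call-back request", "fax, urgent")
stored = conn.execute(
    "SELECT template, content FROM documents WHERE id = ?", (doc_id,)
).fetchone()
print(stored)
```

Everything that describes the document travels inside the content field; the only thing the table adds is the template reference.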

Document users apply styles; template managers can manipulate a, b & c styles; template owners can manipulate, create & delete templates.
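As a toy illustration of those three roles (the role and action names are mine, and I'm assuming each higher role also keeps the rights of the ones below it):

```python
# Hypothetical role-to-action table for the three roles described above.
ROLE_ACTIONS = {
    "document_user": {"apply_style"},
    "template_manager": {"apply_style", "edit_style"},
    "template_owner": {
        "apply_style",
        "edit_style",
        "edit_template",
        "create_template",
        "delete_template",
    },
}

def can(role, action):
    """Return True if the given role is allowed to perform the action."""
    return action in ROLE_ACTIONS.get(role, set())

print(can("document_user", "apply_style"))
print(can("document_user", "delete_template"))
```

In practice this table would sit in the same RDBMS as the styles, keyed on whatever group the LDAP directory reports for the user.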

The process we've just created is no different from what millions of Microsoft Word users complete every day, only now the content and structure are separate and stored in a DB and, most importantly of all, we have empowered the user while remaining in complete control.

Now add to that the price of an MS Office license, the fact that most people use a minuscule portion of Word's functionality, and the fact that this runs on any platform, anywhere in the world.

...the killer app ;-)

asp