HTML vs. XHTML

What are the advantages of one or the other?

MatthewHSE

1:20 am on Aug 17, 2003 (gmt 0)

I've heard a lot of conflicting information about XHTML and HTML lately. Some say it's easier to make your pages cross-browser compatible if you use XHTML; others say XHTML causes compatibility problems. Some have said search engines have trouble with XHTML, while others are saying that XHTML is cleaner code, which the SE's like.

Suffice it to say, I'm confused. I don't know enough about them to make an intelligent choice myself. I'd sure like to hear what other people have to say in defense of (or against) each method!

MonkeeSage

1:47 am on Aug 17, 2003 (gmt 0)

HTML is the predecessor of XHTML. HTML is at v. 4.01, XHTML is at v. 1.1.

The main differences are:

- Some HTML tags are deprecated in XHTML ( for example).
- XHTML is more "complete" in terms of syntax (i.e., no hanging tags -- all tags must be either self-closed, e.g., <img ... />, or closed normally with a matching end tag).
- XHTML can be served as either XML or HTML mime-type, HTML can only be served as HTML.
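As a rough illustration of the "all tags must be closed" rule, here is a minimal Python sketch. It assumes Python's standard `xml.etree.ElementTree` parser as a stand-in for an XML-aware browser parser. The helper name `is_well_formed` is made up for this example, and it checks well-formedness only, not validity against any XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML, else False.

    Hypothetical helper for illustration: it checks XML well-formedness
    only and does not validate against an XHTML DTD.
    """
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# XHTML style: the empty element is self-closed and the paragraph is closed.
xhtml_fragment = "<p>Line one<br />Line two</p>"

# HTML style: <br> is left open, which HTML allows but XML does not.
html_fragment = "<p>Line one<br>Line two</p>"

print(is_well_formed(xhtml_fragment))
print(is_well_formed(html_fragment))
```

The first fragment parses, while the second is rejected because the open `<br>` never gets a matching end tag.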

Either one works just as well (in my experience) if proper DTDs are used.

Jordan

tedster

3:19 am on Aug 17, 2003 (gmt 0)

...if proper DTDs are used

Here's where I see a crucial differentiation, and one that is very seldom discussed -- the difference between using a transitional DTD and using strict DTD. That is very important, no matter whether you are writing XHTML or HTML. Learning to write strict code is the big deal.

Since XHTML 2 will be a whole new ballgame, the fact that you use XHTML currently is not all that important. But the fact that you build knowledge and skill with strict code rather than transitional code - that IS a big deal. A transition from strict HTML to strict XHTML is relatively painless.

We're all in for a ride transitioning to XHTML 2.0, no matter which way we code right now - but strict DTD's will take your skills farther in the right direction.

NickH

10:34 am on Aug 17, 2003 (gmt 0)

Having just converted my site from HTML 4.01 Transitional to XHTML 1.0 Strict, I concur with MonkeeSage and tedster that the transition to Strict is the significant one. Apart from the issue of certain tags and attributes being unavailable, you also need to ensure inline content is enclosed in an appropriate block-level container.

Interesting authoritative reference: The Web's future: XHTML 2.0 [www-106.ibm.com]

Nick

BjarneDM

9:07 am on Aug 19, 2003 (gmt 0)

The problem with XHTML 1.1 is that it's incompletely supported under IE6, which just treats XHTML as an esoteric variation on HTML. And IE6 simply *cannot* accept pages served as xhtml.

These problems are documented here:
[xml.com...]
[b-spoke.de...]
[hixie.ch...]

The *only* browser capable of treating XHTML as XHTML with the correct content-type is Mozilla, and boy - is that one strict! The smallest of errors and you just get an error message. I've done an XHTML project at the UNI, and the requirements were: 1) XHTML 1.1, and 2) working in IE6. I had to point out to the teacher that you couldn't do both.

So, until XHTML 2.0 is supported I'll stay with HTML 4.01 and use as much of the styling conventions from XHTML as possible on my pages to make the eventual transition easier.

On my **private** pages I do XHTML strict with the correct content-type and don't give a sh*t about IE6 - but that's just not a valid commercial solution.

mattur

3:47 pm on Aug 19, 2003 (gmt 0)

Using XHTML or HTML makes no difference to browser compatibility, accessibility or SE rankings - it's the content and how you use the markup that is important.

If you or your visitors need to programmatically access your web page content, there are better ways of doing this than using XHTML: e.g. using a database/CMS, providing RSS feeds or a SOAP interface (like Google :)).

The idea that humans and machines should use the same interface (web pages) seems fundamentally flawed to me. Why not design each interface to suit the needs of each audience?

Zeldman's infamous NYPL "xhtml benefits":
1. painless transition to xml/"future-proof"
2. Cleaner, more logical [sic] markup
3. Increased interoperability
4. Greater accessibility

Are completely bogus - even Zeldman's book quietly lost the "Forward Compatibility" from the title...

No-one has found a valid reason for xhtml to exist as yet, let alone a reason for using it ;)

g1smd

11:16 pm on Aug 19, 2003 (gmt 0)

Having been involved with an XML project at Oasis last year, I still really think that all this stuff is being invented for the sake of it, and in the vain hope that someone will eventually find a useful application for it out in the real world.

I am going to completely ignore XHTML (1.0, 1.1 etc) until version 2 is fully supported. For now, HTML 4.01 does all that I want, and if you hide the CSS from older browsers the pages work in any browser.

pageoneresults

11:53 pm on Aug 19, 2003 (gmt 0)

There are a lot of features available in xhtml. Unfortunately, as stated above, support is not as widespread as it should be to make the transition in a commercial environment.

I'll definitely agree with tedster in regards to making the transition from transitional to strict. I did this a few months back with a few properties that I manage and it was a breeze. Since I was already transitional, the few errors that came up switching to strict were quick fixes. Most of the errors were border attributes. You are more or less forced to take all presentational markup off the page and into css, you have no choice.

A little OT, ever notice when your pages validate at the W3C that you get a totally different color scheme and they start throwing in little tips here and there? Sort of a reward for validating. Some of those tips are priceless. ;)

mattur

12:58 am on Aug 20, 2003 (gmt 0)

I am going to completely ignore XHTML (1.0, 1.1 etc) until version 2 is fully supported

Wow g1smd! I thought I was the only one!

It's such a shame that the w3c have lost it. All the innovation appears to be happening in the blogosphere. Meanwhile the w3c invents more and more increasingly complex ways of doing the same thing.

I read Tantek's log recently, and I think his TBL quotes (from Weaving the Web) speak volumes, eg:

Of course if I had insisted everyone use HTTP, this would also have been against the principle of minimal constraint. If the Web were to be universal, it should be as unconstraining as possible. Unlike the NeXT computer, the Web would come as a set of ideas that could be adopted individually in combination with existing or future parts.

This quote captures a few important principles: modularity, minimalism, and compatibility. Certainly many current W3C specifications attempt to adhere to this principle. Yet, in my opinion many of the specifications are more monolithic in nature (neither modular nor minimal), and were not designed to work well (not compatible) with either existing or future technologies. In fact some even appear to have been designed specifically to antagonistically replace existing technologies. This counter-principle, or syndrome, is better known as NIH (Not Invented Here).

The people of the Internet built the Web, in true grassroots fashion.

Grassroots and decentralized was how the Web was built, not by just one company, nor one product, nor one site.

Philosophically, if the Web was to be a universal resource, it had to be able to grow in an unlimited way. Technically, if there was any centralized point of control, it would rapidly become a bottleneck that restricted the Web's growth, and the Web would never scale up.

Its being "out of control" was very important. I'm sure folks have asked the question whether W3C itself (and all the global DTD URLs pointing there), ironically, could be interpreted as a "centralized point of control", and thus eventually a bottleneck.

MonkeeSage

1:17 am on Aug 20, 2003 (gmt 0)

Personally speaking, I prefer the consistency of the XHTML DTD. But I'm also a symmetry freak, so having all tags closed and all content in block-level tags, &c., might not be such a big deal to others.

But speaking in terms of preparing for XHTML2, it is a BIG STEP I think, because (see NickH's link) what are people going to do when XHTML2 comes out and they not only have to close every tag, but they have to tag every line!?!

People accustomed to doing something like:

Text here 
more text 
and a bit more
...

Are going to find it a real pain to convert, imo. So while it might not be anything to shout about right now...it can be a good stepping stone to the future of markup. Same goes for XML / XUL.

My humble opinion.

Jordan

mattur

2:01 am on Aug 20, 2003 (gmt 0)

..when XHTML2 comes out and they not only have to close every tag, but they have to tag every line!?!

That's the problem with this scenario Jordan: no browser maker can afford to turn off support for tag soup. It would be utter lunacy.

Say a user is looking for Linus Torvalds' homepage. Should Linus drop the other things he does (doesn't he know xhtml2 is the way to go? ;)) to instead waste his precious time learning xhtml to make his pages render exactly as they do right now, but in a different way? Would a browser maker ever drop support for non-validating pages when it meant making a large, highly relevant portion of the web invisible?

It's up to us folks. How the web develops depends on us, not the w3c. They've jumped the shark. :)

tedster

2:12 am on Aug 20, 2003 (gmt 0)

Ever is a very long time. Eventually any standards that make sense (and if that isn't XHTML 2.0 it might be XHTML 3.0) will be adopted as all kinds of new user agents come to life.

But I don't expect to see tag soup disappear for years and years, and therefore browsers will continue to support it and incorporate some very hearty error recovery routines to keep end users happy.

MonkeeSage

2:21 am on Aug 20, 2003 (gmt 0)

mattur:

Yeah, I agree with a lot of what you said, as well as the blog comments you posted previously. Maybe it would be possible to have the best of both worlds, though?

One of the ideas I've been kicking around in my head is that there could be three groups of DTDs instead of two: Strict, Transitional and Deprecated. And browsers could be modularized accordingly. Module for Strict / Transitional as the default engine, and an optional module for Deprecated that (if installed) would autoload when Strict or Transitional parsing failed, or when triggered by certain inline DTDs.

I think a scheme like this would allow for browsers to be smaller and faster by default (all the hoops that have to be jumped through for rendering broken pages could be local to the Deprecated mod), and would still allow all the web pages out there (which is usually as you said--'tag soup') to be viewed.

Just an idea...

Jordan

Ps. Linus is cool enough to just get a whole new element...anything in <linus></linus> tags must be rendered as he likes it. ;)

mattur

2:36 am on Aug 20, 2003 (gmt 0)

I don't know, MonkeeSage. It strikes me that one thing that could really move stuff along the w3c-axis is standard ways to mark up search, page title, global nav, etc. But there is no way the w3c would ever do this 'cos they've hitched themselves to the semantic web bandwagon (machines don't need search boxes, so they're irrelevant ;))

But some of the blog stuff is really impressive: how long will it take the w3c to "standardise" trackbacks? ;)

NickH

10:02 am on Aug 20, 2003 (gmt 0)

Like MonkeeSage, I prefer the symmetry of the XHTML DTD. Of course, you can write Strict HTML 4.01 as per XHTML: closing all tags, using lowercase quoted attributes, and so on. But then it becomes trickier to verify all these things. The W3C Validator won't enforce them, as they're not in the DTD, so you end up relying on pseudo-validators. Easier to convert to XHTML, and have a number of real validators to choose from.

Why put yourself through the hassle of closing tags that don't need to be closed, quoting all attributes... you may ask! Well, if you have a mathematical and/or programming background, it can seem the natural way to code HTML. I also heard it can speed up page rendering, but I imagine that can only be a marginal difference.

I read somewhere -- I think it was in Zeldman's book -- that recent versions of Mozilla/Netscape treat XHTML differently to HTML. I think it was that HTML Strict triggers "almost-standards mode", while XHTML Strict triggers "standards mode" proper. Is that correct? Could this be a good reason for converting to XHTML?

mattur

6:23 pm on Aug 20, 2003 (gmt 0)

No ;)

TheDoctor

12:30 am on Aug 25, 2003 (gmt 0)

No NickH is not correct or No it isn't a good reason?

PeterD

1:14 am on Aug 25, 2003 (gmt 0)

I'm so glad to hear more and more people openly saying that XHTML 1 is an unnecessary distraction. Serve it as 'text/html', and it's simply badly-formed HTML. Serve it as 'application/xhtml+xml' or 'application/xml', and very few browsers can handle it and the slightest error makes the page un-renderable. It's a solution to which there is no problem. XHTML 2 is so radically different that there's not even an advantage to making a gradual transition.
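One common way to live with the two MIME types is server-side content negotiation: send 'application/xhtml+xml' only to clients that explicitly list it in their Accept header, and fall back to 'text/html' otherwise. The following is a minimal Python sketch of that idea, not a complete implementation of HTTP negotiation. The function name `choose_content_type` and the sample header strings are assumptions made for illustration.

```python
def choose_content_type(accept_header: str) -> str:
    """Pick a MIME type for an XHTML page from an HTTP Accept header.

    Hypothetical helper: returns the XHTML type only when the client lists
    it explicitly with a non-zero quality value; wildcards such as */* are
    deliberately not treated as support.
    """
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != "application/xhtml+xml":
            continue
        quality = 1.0  # HTTP default when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed q value as "not accepted"
        if quality > 0:
            return "application/xhtml+xml"
    return "text/html"


# Sample headers, assumed for illustration only.
xml_aware = "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9"
wildcard_only = "image/gif, image/jpeg, */*"

print(choose_content_type(xml_aware))
print(choose_content_type(wildcard_only))
```

Ignoring the wildcard is a deliberate design choice in this sketch: a client that sends only `*/*` claims to accept anything, which says nothing about whether it can actually parse XHTML as XML.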

The W3C has lost touch with reality and gone into orbit. Take a close look at XHTML 2. It's nothing but unnecessary complexification that takes the web out of the hands of ordinary content-creators and puts it back into the hands of a professional markup priesthood.

All I hear is the squeaking of the upgrade treadmill.

aevea

1:57 pm on Aug 25, 2003 (gmt 0)

Alright, I'm new to this so maybe my opinion doesn't count for much but I don't see a big difference between xhtml and html. My html habits were short lived, I made pages with a wysiwyg one month and went to xhtml/css the next.

So what's the big deal about closing a couple of tags? To me, it seems like the real "important skill" for future design is learning to control presentation with css.

MonkeeSage

2:10 pm on Aug 25, 2003 (gmt 0)

PeterD:

"Badly-formed"

Can you explain what you mean by that?

aevea:

CSS is half (or a third, depending on how you slice it), markup is the other half. You need something to style in the first place. ;)

You are correct that there is not that much difference, but there is more to it than just closing tags. That's just the main example; the whole syntax is consistent. It means that browsers can render it faster, with less guesswork, and it habituates good coding skills. It teaches you (contrary to what Pete said) to write well-formed markup that could relatively easily be transitioned into XML, RDF, or other well-formed standards as the web progresses.

Four years down the road an HTML page might have to be converted to (our present incarnation of) XHTML, or something very much like it, before it can be converted to whatever the prevailing standard of the time is. Might as well learn to do it now and save the trouble later.

These are just my personal opinions of course--YOMV--your opinion may vary. ;)

Jordan

mattur

2:58 pm on Aug 25, 2003 (gmt 0)

TheDoctor: I meant this is not a good reason to use XHTML. I've no idea whether Gecko switches from "almost-standards mode" to "standards mode" for XHTML Strict. But I suspect it would make very, very little difference even if it does. ;)

MonkeeSage: any evidence that XHTML renders faster in the real world? I think PeterD meant by "badly-formed" that IE treats XHTML as tag-soup html, so is unlikely to be quicker.

Four years down the road an HTML page might have to be converted to (our present incarnation of) XHTML, or something very much like it

If you foresee this to be a problem for your sites, you would be better off storing your content in a database. Then any kind of transform you may need to do will be significantly quicker since databases were designed for efficient storage and querying of structured information. XHTML is a less efficient storage format since it also has to render in browsers and be human readable.

As NickH said, one can write pages in html 4.01 structured exactly as XHTML pages. What does XHTML get you? Marginally longer pages. ;)

No one will publish XHTML2 pages until the majority of browsers can render them, since it is not backwards compatible. Strewth, we still moan about folks using NN4 and IE5. XHTML2 will require dropping Opera 7, all Gecko browsers, IE6, Safari, all current PDAs and phones... :)

Plus XHTML2 is so different to XHTML1 that there is no upgrade-path. HTML *.* or XHTML1.*, we would have to completely re-write all our pages.

The W3C regards every page as a structured document. Is a checkout page a document or an interface? How about Google's homepage - will anyone ever need to programmatically access this "document"?

PeterD

8:16 pm on Aug 25, 2003 (gmt 0)

"Badly-formed" Can you explain what you mean by that?

Sorry, I wasn't clear. What I mean to say is that xhtml sent as text/html is interpreted as tag soup, relying on the browser's error-handling mechanisms, as Mattur suggested.
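To make the "tag soup" idea concrete, here is a small Python sketch using the standard library's lenient `html.parser.HTMLParser`. It is only a stand-in: real browser error recovery is far more elaborate, and this tokenizer does not infer the missing end tags, it simply does not complain about them. The class name `TagLogger` and the helper `log_tags` are made up for this example.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record start and end tags in the order the lenient parser reports them."""

    def __init__(self):
        super().__init__()
        self.starts = []
        self.ends = []

    def handle_starttag(self, tag, attrs):
        self.starts.append(tag)

    def handle_endtag(self, tag):
        self.ends.append(tag)


def log_tags(markup):
    """Feed markup to a fresh TagLogger and return it (hypothetical helper)."""
    logger = TagLogger()
    logger.feed(markup)
    logger.close()
    return logger


# Nothing here is closed, yet the lenient parser accepts it without error.
soup = "<p>first paragraph<p>second paragraph<br>trailing text"
result = log_tags(soup)
print(result.starts)
print(result.ends)
```

The parser reports three start tags and no end tags, and raises nothing. Deciding where those elements actually end is left to whatever consumes the events, which is the guesswork that a strict XML parser refuses to do.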

MonkeeSage

1:29 am on Aug 26, 2003 (gmt 0)

mattur:

"[...] any evidence that XHTML renders faster in the real world?"

What do you mean by evidence? Well-formed markup renders faster because the browser doesn't have to call error handling routines to guess at where tags are supposed to close, where values are supposed to stop when they have spaces or other non-standard chars in them, &c. It also uses the XML parser, not SGML. I've never held a stopwatch on it though, if that's what you mean.

"Plus XHTML2 is so different to XHTML1 that there is no upgrade-path."

All tags must be closed, all attributes must be lowercase, all values must be quoted, the doctype is modular...from what I've read about XHTML2, that is a nice transition. The only difference with 2 will be what the tags are, but the semantics will be the same.

PeterD:

"What I mean to say is that xhtml sent as text/html is interpreted as tag soup..."

I think you've got that reversed. HTML is (many times) tag soup (doesn't have to be, you could write HTML just like XHTML within its own semantic range), and thus calls for error handling, re-parsing, &c. XHTML, served with any mime-type, cannot be tag soup by definition (doctype definition that is ;) ), or else it is invalid. Valid XHTML is well-formed under any mime-type.

Jordan

ratboy

6:55 am on Aug 26, 2003 (gmt 0)

I use XHTML, more because I wanted to see how it worked in a real production environment, and it's a pain, especially as the pages tend to degrade over time, especially if other people ever touch the code. There are some major bonuses, such as being able to actually validate and debug your pages through sites like the w3c's, that is valuable.

I gave up on HTML 4.01 doctypes because it was harder for me to produce an error-free page with them than with XHTML 1 transitional. Closing all the tags makes good sense, I have to say; it keeps you from getting sloppy.

But overall I agree with Mattur, this is objectively speaking, pretty much a total waste of time, with the one exception of being able to easily debug your code, and tell clients that you produce error free XHTML. But no one will ever see the difference.

But still I go on, writing page after page of error free XHTML 1 transitional, I can't be bothered to create classes and id's for every single image or whatever I put on the page just so I can say it's 'strict', that's just silly as far as I'm concerned, although I'll do it one day, maybe when I get really bored, just to do it.

IE 6 treats declared doc type pages differently than undeclared, so does Mozilla, this can be a major pain, there are page layout things that basically can't be done with mozilla in declared xhtml mode, so one has to wonder what the real point is.

You turn XML into HTML that gets sent to the client, so there is no particular reason to worry that much about the HTML, since it will all pretty much work fine in most browsers out there. So that sort of makes the whole 'XHTML is closer to XML' argument pretty irrelevant. I would say in the real world, the sites using the most XML to HTML are the major online newspapers, and they are churning out the same old html they always have as far as I can see, maybe with a bit more CSS now, but overall the code is just as error filled as it always has been, because they know perfectly well that anybody can read it with any browser out there.

My guess is it's just kind of gratifying knowing that your work doesn't have any mistakes in it.

MonkeeSage

7:18 am on Aug 26, 2003 (gmt 0)

ratboy:

"IE 6 treats declared doc type pages differently than undeclared, so does Mozilla, this can be a major pain [...]"

Certain doctype declarations put the browsers in "Standards Compliance Mode." It leaves out the hacks and limitations that are present in the "Quirks Mode." (Cf. Mozilla's DOCTYPE sniffing [mozilla.org], Mozilla Quirks Mode Behavior [mozilla.org]).

"You turn XML into HTML that gets sent to the client [...]"

Are you talking about using XSL Transforms to generate HTML from XML? Your XML still has to be well-formed, and XSL is much harder to learn than either XML or XHTML, so why not just write valid XHTML to begin with and skip the transformation?

Jordan

mattur

12:18 pm on Aug 26, 2003 (gmt 0)

MonkeeSage wrote:

"[...] any evidence that XHTML renders faster in the real world?"
What do you mean by evidence? Well-formed markup renders faster because the browser doesn't have to call error handling routines to guess at where tags are supposed to close

Not for IE users - that's the point: served "properly" XHTML triggers buggy-html mode. Plus of course you can close all your tags in ordinary HTML. No need whatsoever for XHTML. ;)

It's the Emperor's new clothes: W3C's new XHTML standards have no discernible purpose/advantage, and we've all been too polite to say it ;)

NickH

1:18 pm on Aug 26, 2003 (gmt 0)

Not for IE users - that's the point: served "properly" XHTML triggers buggy-html mode.

I'm not sure what you mean by 'buggy-html mode'. I understood that XHTML 1.0 Strict, served as text/html, triggered standards-compliance mode.

Are you saying that's not the case? Or are you referring to the fact that XHTML 1.1 should not be served as text/html?

Nick

claus

1:30 pm on Aug 26, 2003 (gmt 0)

This is an interesting thread.

>> "valid code is quicker"

I'd like to say that there is no way it can be determined if a page validates without reading and parsing it first. Valid DOC-type and Content-type does not make a valid page, valid markup does. And even valid markup is not sufficient for fast or error-free rendering.

Quirks mode or not, user agents have no way of determining the proper way to render the page unless you write valid code, regardless of (x)(ht)ml flavour. This, in turn, can only be determined after the code is parsed, which is some number of milliseconds after reading the DTD and C-T.

What happens then, is that the UA rendering engine interprets the page content and markup according to the (UA specific) rendering rules and displays the page on screen (or not). It's the (different) rendering rules that cause the problems, it's really not the web standards or markup differences.

>> "future UA's"

I recently got a visit from a cell phone on a HTML 4.01 Transitional page (not wml that is):

Nokia7250/1.0 (3.12) Profile/MIDP-1.0 Configuration/CLDC-1.0 (Google WAP Proxy/1.0)

This one was using a proxy at Google. This is not the point, it is essentially just another rendering engine. I do believe that future UA's will be able to interpret and render [other / lower level] markup languages just like the modern browsers are fully capable of rendering Gopher content.

/claus

MonkeeSage

1:35 pm on Aug 26, 2003 (gmt 0)

mattur:

"...served "properly" XHTML triggers buggy-html mode"

AFAIK, only including an XML declaration (or "prologue") causes quirks mode in IE, but it's hard to know for sure as I haven't been able to find out anything official from MS on what triggers the different modes in IE. Best list I've found is here [hut.fi] (they also have a tester for the different DTDs).
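If the XML declaration really is the trigger, one workaround is to leave it out of pages sent as text/html so that the doctype is the first thing the browser sees. Here is a minimal Python sketch of that idea. The helper name `strip_xml_declaration` is hypothetical, and the sketch takes the quirks-mode claim above as given rather than verifying it.

```python
import re

# Matches an XML declaration such as <?xml version="1.0" encoding="UTF-8"?>
# at the very start of the document, plus any whitespace that follows it.
_XML_DECLARATION = re.compile(r"\A<\?xml\s[^>]*\?>\s*")


def strip_xml_declaration(markup: str) -> str:
    """Return the markup without a leading XML declaration, if it has one.

    Hypothetical helper for illustration; documents that do not start
    with a declaration are returned unchanged.
    """
    return _XML_DECLARATION.sub("", markup, count=1)


page = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
    '<html xmlns="http://www.w3.org/1999/xhtml"></html>'
)

# After stripping, the doctype is the first thing in the document.
print(strip_xml_declaration(page).splitlines()[0])
```

Without the declaration, the character encoding has to be UTF-8 or UTF-16, or be stated some other way, such as in the HTTP Content-Type header.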

"W3C's new XHTML standards have no discernible purpose/advantage..."

I see at least these:
- Gets rid of attributes and elements that have been deprecated by advances in other standards (e.g., CSS, DOM)
- Forces well-formedness (which ensures compatibility / portability with other well-formed standards)
- Enables the possibility for faster rendering (even if it is not yet taken advantage of by any particular browser)
- Modularizes document entities based on specific content types (XHTML 1.1)

I agree that HTML can work just as well with a proper doctype, but I don't think that makes the differences with XHTML meaningless, it just means that browsers haven't caught up with the state of the art yet. All good things come in time though. :)

Jordan

========

added: The formal titles of XHTML 1 and 1.1 might be of interest on this topic:

XHTML™ 1.0 The Extensible HyperText Markup Language: A Reformulation of HTML 4 in XML 1.0

XHTML™ 1.1 - Module-based XHTML

NickH

1:43 pm on Aug 26, 2003 (gmt 0)

AFAIK, only including an XML declaration (or "prologue") causes quirks mode in IE, but it's hard to know for sure as I haven't been able to find out anything official from MS on what triggers the different modes in IE.

To test a particular document, you can check document.compatMode. It should be either 'BackCompat' or 'CSS1Compat'.

Nick

This 52-message thread spans 2 pages.
