

Why most of us should NOT use XHTML


DrDoc

7:39 pm on Apr 1, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Ian Hickson, a member of the Mozilla.org Browser Standards Compliance QA team and an invited expert in the W3C CSS Working Group, explains why XHTML should not be sent as text/html: [hixie.ch...]

walrus

6:19 pm on Apr 2, 2006 (gmt 0)

10+ Year Member



This is an amazing thread. Just wanted to say how great it is to be able to find advice like this.
Last week I spent a lot of time reading up on the pros and cons of XHTML for a client, and decided to aim for HTML 4.01 Strict as much as possible.

DrDoc

6:38 pm on Apr 2, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



One thing that I have a really hard time understanding is why people say that it is OK to serve XHTML with a text/html MIME type. Sending XHTML as text/html forces it to be parsed as HTML 4 ... thus losing any advantages XHTML had in the first place. So, if you want your code to be parsed as HTML 4, why not just use the HTML 4.01 doctype? It will accomplish just that, but without any of the drawbacks!

To go back to the SHORTTAG minimization in HTML 4 ... Saying that "it's ok to send XHTML as HTML 4.01" simply because no browsers have implemented shorttag minimization is dangerous. You are now relying on a browser bug to get your documents to render correctly. I, for one, hope that the Gecko engine will fix this problem and implement shorttag minimization properly, in order to be more fully HTML 4 compliant.

Finally, on to a very important point... We all know by now that XHTML served as text/html is truly "HTML 4.01 in quirksmode". Unfortunately, the W3C validator doesn't reflect this: when validating a document with an XHTML doctype, it should really force a doctype override. Thus, if you have specified XHTML Strict, the validator should validate against HTML 4.01 Strict when the document is sent as text/html, as this is what browsers will treat it as anyway. Further, since XHTML documents sent as text/html are not really XHTML to begin with, none of the XHTML requirements (such as well-formedness, or closing all tags [including empty ones like "<br />"]) are in effect any longer, so you can safely write plain ol' HTML 4.01-compliant markup and throw in the XHTML doctype ... as long as you serve the document as text/html. If that doesn't sound messed up, I don't know what does.

Web Quirksmode 2.0

pageoneresults

7:35 pm on Apr 2, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The XHTML family is the next step in the evolution of the Internet. By migrating to XHTML today, content developers can enter the XML world with all of its attendant benefits, while still remaining confident in their content's backward and future compatibility.

[w3.org...]

Does that mean that the suggestions from the W3 are to be questioned?

DrDoc

7:42 pm on Apr 2, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Pageone, just remember that the W3C's statement is made in light of using the XML prolog and the application/xhtml+xml MIME type. Without the two, there is no "entering the world of XML with its benefits" whatsoever. XHTML without the two is not XHTML. It is HTML 4 in quirksmode.

WITH the two, however ... yes, there are certainly benefits to be had!

XHTML family document types are XML based

henry0

8:42 pm on Apr 2, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I might have to read it a few more times, but I would like to understand why. My pages using
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
validate OK (well, the pure "XHTML" ones do; my PHP pages don't!), so should I go back to HTML 4…?
Plus, a few years ago I went back to university and took a course mostly dedicated to web architecture. XHTML was becoming really big, and we were told to use an XHTML DTD.

So I would like to understand better what's happening.

Robert Lofthouse

9:26 pm on Apr 2, 2006 (gmt 0)

10+ Year Member



Unfortunately, I believe that everyone just ran to XHTML without thinking about the consequences. It was the latest thing, all the standards gurus were going on about it, but nobody really took the time out to see how this new markup language worked, and what effect it had on other technologies.

I used to praise XHTML, but I believe that until it is fully supported (i.e. until we are able to send it with the correct MIME type and prolog), we should use HTML 4.01 Strict instead.

Obviously, if you are developing a site for an intranet where you know everyone in your company is using Firefox, then feel free to use XHTML with the correct prolog and MIME type, but if you're developing a public site, I see no point in using it just yet.

Elijah

11:58 pm on Apr 2, 2006 (gmt 0)

10+ Year Member



Unfortunately, I believe that everyone just ran to XHTML without thinking about the consequences. It was the latest thing, all the standards gurus were going on about it, but nobody really took the time out to see how this new markup language worked, and what effect it had on other technologies.

That pretty much describes me. After reading "Sending XHTML as text/html Considered Harmful", I plan to change my documents from being XHTML 1.0 served as text/html to being HTML 4.01 Strict.

Thanks to DrDoc for posting the link.

encyclo

1:38 am on Apr 3, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



DrDoc:
XHTML served as text/html is truly "HTML 4.01 in quirksmode"

Well, to be fair, a document with an XHTML doctype is parsed in standards-compliance mode rather than quirks mode (except in IE6 when preceded by an XML prolog), but it is true to say it is parsed as "HTML with errors" (the trailing slashes being treated as undefined attributes).

The validator won't complain about XHTML served as text/html for two reasons: firstly, the file is merely being checked for conformity to a published DTD rather than for whether it is being correctly served; and secondly, at least for XHTML 1.0 (not 1.1), text/html is explicitly permitted as an acceptable MIME type for legacy use.

DrDoc:

Saying that "it's ok to send XHTML as HTML 4.01" simply because no browsers have implemented shorttag minimization is dangerous. You are now relying on a browser bug to get your documents to render correctly. I, for one, hope that the Gecko engine will fix this problem and implement shorttag minimization properly, in order to be more fully HTML 4 compliant.

They will never do it, simply because it would break legacy XHTML 1.0/text/html support (which is specifically permitted by the XHTML specification). The SHORTTAG problem is a non-issue because user agents have never been SGML-compliant, and because a significant number of XHTML 1.0 documents exist, whereas no HTML documents using SHORTTAG notation exist, precisely because there has never been any user agent support for it.

Robert Lofthouse:

Unfortunately, I believe that everyone just ran to XHTML without thinking about the consequences. It was the latest thing, all the standards gurus were going on about it, but nobody really took the time out to see how this new markup language worked, and what effect it had on other technologies.

Firstly, welcome to WebmasterWorld Robert, and thank you for your input! I think there has been a shift in the message carried by what we can generally call "standards evangelists" over the last couple of years, for several reasons. When the XHTML specifications were published, they were seen as a great leap forward from the prevalent "tag soup" HTML. There was an assumption that the tools and browser support would rapidly follow, and that XHTML would push HTML into obsolescence.

I count myself in this group: I was already producing XHTML documents served as application/xhtml+xml back in 2001/2002, which functioned in early Mozilla pre-release builds. But the tools and browser support never followed; IE7 will still have no support for XHTML. We misread the future.

The second failure of XHTML was with the implementation: the specification itself is flawed-to-broken. In almost every case where even very experienced developers put a "true" XHTML solution in place for a public (not test) website, the difficulties and disadvantages of using application/xhtml+xml far outweighed the supposed advantages. Most, if not all, early adopters walked away.

The message from standards evangelists is much more measured than in the early days. There was a great deal of "HTML is eeeeviiiil" and "tables are eeeeviiiil", and a certain utopian feel to the refrains. But I think it is now a mistake to suddenly switch to an "XHTML is eeeeviiiil" discourse which is equally uncompromising. The idea is not (as in the early days) to assume that HTML is going to disappear rapidly from the scene and be replaced by XML variants. Rather, it is more important to consider "now-compatibility" than "forwards-compatibility". We messed up predicting the future, so let's talk about what you should be doing here and now to produce robust, well-built, compatible websites.

If you are building a site today, HTML 4.01 Strict is the best choice: it has the best user agent support, has the advantage of legacy support too, and is the easiest for the widest variety of browsers and search-engine spiders to parse and render today.

XHTML 1.0 served as text/html is overall a safe choice too. The HTML MIME type ensures legacy support, the trailing slash problem is far less serious than the vast majority of real tag soup invalidity, many tools now produce XHTML-compatible syntax by default, and the doctype ensures standards-compliance rendering mode in modern browsers. It is also an easier standard to understand than HTML 4.01, as it is based on the simple constructs of XML syntax rather than the arcane rules of SGML.

Is XHTML ideal? No, but neither is using tables for layout. Surely we should pull back from standards extremism and suggest a more pragmatic approach to standards compliance?

A useful reference guide to XHTML from the W3C is: HTML and XHTML Frequently Answered Questions [w3.org].

DrDoc

2:10 am on Apr 3, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Let me just say something I was hoping to avoid having to say in the first place ...

First, let me preface my statement by saying that I am not completely anti-XHTML. As someone cleverly pointed out to me in a sticky, I use XHTML myself (I was wondering how long it would take for anyone to point that out). ;)

What I am against, however, is uneducated and misinformed implementations of XHTML by people who don't fully understand the implications. Way too often have I seen XHTML implementations carried out so carelessly as to miss the target completely. XHTML can, and perhaps should, be a viable alternative for public consumption if, and only if, it is implemented 100% correctly.

What this means is that you should not use XHTML unless:

  1. you are willing to take the time to educate yourself with regards to what XHTML is, and how it differs from regular HTML
  2. you are willing to make the effort of implementing XHTML properly
  3. you have the technical understanding and ability to do both of the above

The education can best be carried out in two ways: first, careful reading and pondering of Ian Hickson's article; second, careful reading and pondering of W3C's XHTML documentation. Read until you understand.

Once you have gained the necessary knowledge about XHTML through the two aforementioned documents, you can move to implement your newfound knowledge. Doing so requires more than mere markup. Reading between the lines in both Ian's article and the W3C's documentation, one more change is necessary: you should ensure that your server does the following, whether directly in your server's settings or through explicit header output from server-side scripting.

To avoid throwing IE into quirksmode, we really want to avoid including the XML prolog on our pages. If we do so, the W3C documentation states that the character set must be specified by a "higher protocol". To you, that means it must be sent by the server. Again, this can be done directly in the server configuration, or by properly outputting the appropriate "Content-Type" header.

Now, in all honesty it should be said that sending XHTML as text/html is not going to cause severe problems in IE. But doing so across the board is also not going to justify using XHTML over HTML 4.01, as "text/html" effectively nullifies the use of XHTML to begin with and removes any of the benefits of choosing XHTML in the first place. You therefore want to check whether the browser sends the HTTP_ACCEPT header.

If HTTP_ACCEPT is being sent and contains application/xhtml+xml, use this content-type header:
Content-Type: application/xhtml+xml;charset=ISO-8859-1

If HTTP_ACCEPT is being sent but does not contain application/xhtml+xml, use this content-type header:
Content-Type: text/html;charset=ISO-8859-1

If HTTP_ACCEPT is not being sent at all, use this content-type header:
Content-Type: application/xhtml+xml;charset=ISO-8859-1

In PHP, that is accomplished as follows:

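<?php
// Serve application/xhtml+xml if the browser explicitly accepts it,
// or if it sends no Accept header at all; otherwise fall back to text/html.
if ((isset($_SERVER["HTTP_ACCEPT"])
        && stristr($_SERVER["HTTP_ACCEPT"], "application/xhtml+xml"))
    || !isset($_SERVER["HTTP_ACCEPT"])) {
    header("Content-Type: application/xhtml+xml;charset=ISO-8859-1");
} else {
    header("Content-Type: text/html;charset=ISO-8859-1");
}
// Declare the default style sheet and script languages for the document.
header("Content-Style-Type: text/css");
header("Content-Script-Type: text/javascript");
?>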

Adjust charset as appropriate. I put the content-style-type and content-script-type headers in there for good measure as well.

Now, by doing that you are not only sending XHTML according to the W3C recommendations, you are truly making your site both forward and backward compatible while doing what XHTML was designed to do -- taking the web one step closer to being able to fully utilize the power of XML.

If you are unable to carry out the above implementation, stick with HTML 4.01, as it is the better tool for the job at this point.

encyclo

2:33 am on Apr 3, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Firstly, the PHP script above is incomplete in that it doesn't take into account the weighting (quality value, or "q") for each MIME type. For example, a standard Firefox HTTP_ACCEPT would be:

text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5

Your code will serve the document as application/xhtml+xml, which would be correct. However, if my HTTP_ACCEPT was:

text/html,application/xhtml+xml;q=0.9,application/xhtml+xml;q=0.8,text/plain;q=0.7,image/png,*/*;q=0.5

Then your code will still serve application/xhtml+xml, despite the stated preference for text/html.
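A minimal sketch of negotiation that does honour q-values, written in the same PHP as the script above. The helper names parse_accept() and choose_content_type() are hypothetical, and the parsing is deliberately simplified: it ignores media-type parameters other than q and does not expand wildcards such as */* against specific types. A missing Accept header still yields application/xhtml+xml, matching the three rules given earlier in the thread.

```php
<?php
// Hypothetical helper: parse an HTTP Accept header into [media type => q-value].
// Simplified sketch: parameters other than q are ignored, and wildcards such as
// text/* or */* are kept as literal keys rather than matched against other types.
function parse_accept(string $header): array
{
    $types = [];
    foreach (explode(',', $header) as $part) {
        $params = array_map('trim', explode(';', $part));
        $type = strtolower(array_shift($params));
        if ($type === '') {
            continue; // skip empty entries, e.g. from a trailing comma
        }
        $q = 1.0; // an entry without q= defaults to 1.0
        foreach ($params as $param) {
            if (stripos($param, 'q=') === 0) {
                $q = (float) substr($param, 2);
            }
        }
        // If a type is listed more than once, keep its highest q-value.
        $types[$type] = max($q, $types[$type] ?? 0.0);
    }
    return $types;
}

// Hypothetical helper: choose between the two MIME types discussed in the thread.
function choose_content_type(?string $accept): string
{
    if ($accept === null) {
        return 'application/xhtml+xml'; // no Accept header sent at all
    }
    $types = parse_accept($accept);
    $xhtml = $types['application/xhtml+xml'] ?? 0.0;
    $html = $types['text/html'] ?? 0.0;
    // Prefer XHTML only if it is explicitly accepted and rated at least as high as HTML.
    return ($xhtml > 0 && $xhtml >= $html) ? 'application/xhtml+xml' : 'text/html';
}

// Example use in a page (not executed here):
// header('Content-Type: ' . choose_content_type($_SERVER['HTTP_ACCEPT'] ?? null) . ';charset=ISO-8859-1');
```

With the Firefox header quoted above, application/xhtml+xml (q=1.0) beats text/html (q=0.9); with the second header, text/html (q=1.0) wins over application/xhtml+xml (q=0.9), which is the behaviour the substring check misses. Breaking ties in favour of XHTML is a judgment call; a more conservative script could prefer text/html on a tie.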

I also use XHTML for one large (30K+ pages) site. I serve it as text/html, I have no XML prolog, it uses transitional markup with tables for layout, and the majority of the pages (which are user-generated) do not validate and are not well-formed. I use XHTML because the script I use produces XHTML-compatible syntax, and I prefer consistency.

Should I try to serve this content as application/xhtml+xml in order to follow the standard better? I won't do it. Real-world requirements mean that application/xhtml+xml is the wrong solution, because the variables involved in processing third-party input make the risk of well-formedness errors too great: with text/html the page still displays, whereas with application/xhtml+xml the page will break. XHTML can't reliably be generated from current tools.
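To make the "page will break" point concrete, here is a small sketch in the thread's PHP that feeds the same ill-formed fragment to libxml's XML parser and to its HTML parser through PHP's DOMDocument class. The helper name parser_accepts() and the sample fragment are made up for illustration, and it assumes PHP's DOM extension is available. libxml is only a stand-in for a browser here: it shows the difference between draconian XML error handling and forgiving HTML parsing, not how any particular browser renders the page.

```php
<?php
// Sketch: run the same markup through libxml's strict XML parser
// (DOMDocument::loadXML) or its error-recovering HTML parser (DOMDocument::loadHTML).
function parser_accepts(string $markup, bool $asXml): bool
{
    $doc = new DOMDocument();
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $ok = $asXml ? $doc->loadXML($markup) : $doc->loadHTML($markup);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok;
}

// Hypothetical user-generated fragment: a bare <br> and a missing </p>.
$fragment = '<div><p>First line<br>second line</div>';

var_dump(parser_accepts($fragment, true));   // XML parser rejects it: not well-formed
var_dump(parser_accepts($fragment, false));  // HTML parser recovers and builds a tree
```

The same fragment written as well-formed XHTML (<br /> and a closing </p>) is accepted by both parsers, which is why a site that cannot guarantee well-formed output is safer with text/html.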

The "MIME-type switching" idea is flawed too, because it actively penalizes conformant browsers when an error inevitably occurs. The document becomes fragile in application/xhtml+xml-aware environments, but remains robust in legacy user agents. Of course, the draconian error handling is already being circumvented by user agents: visit an ill-formed application/xhtml+xml document in recent versions of Opera, and you are given the option of viewing the document anyway as text/html.

Finally, using XHTML and application/xhtml+xml is not necessarily "forwards-compatible", because it is far from certain that future user agents will support what is a very marginal standard. If Microsoft never supports it at all and the browser companies head towards the WHAT-WG's HTML 5, XHTML served as application/xhtml+xml may well turn out to be just a minor early-21st-century fad.

For reference, a semi-related article (which discusses mostly syndication formats, but the same issues exist for XHTML): XML on the Web Has Failed [xml.com].

This 75-message thread spans 8 pages.