The W3C HTML Validator is Broken - HTML forum at WebmasterWorld - WebmasterWorld


DrDoc

10:10 pm on Jul 16, 2007 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Why on earth does the W3C HTML Validator [validator.w3.org] allow developers to use XHTML with a text/html content type?

Yes, I know the spec technically allows for it, even though one "should not" send XHTML as text/html.

What really irks me about this is that even if the validator says that the code is syntactically valid, in real life it is not. XHTML sent as text/html is the exact same thing as invalid HTML.

Browsers encountering XHTML with a text/html content type will NOT render it as XHTML. And, if it is not rendered as XHTML in the first place, why send XHTML code, when the browser will interpret it as HTML? And if it will be interpreted and rendered as HTML, why on earth should the HTML Validator tell me the code is valid, when the same code with an HTML 4.01 doctype would be flagged as invalid?

Conclusion -- the W3C HTML Validator is broken.

When validating an XHTML-formatted document that is served with the text/html content type, the validator should not show the "Congratulations, this document validates as XHTML 1.0 Strict!" message. However, it should not flag the document as invalid either. Instead, it should notify you: "Oops, I see you are using the text/html content type, which means the document will be treated as HTML. Do you wish to revalidate this document as HTML instead? Alternatively, change your Content-Type declaration to application/xhtml+xml."
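To make the proposal concrete, here is a rough Python sketch of the check described above. It is not how the W3C validator works; the function name validator_advice and the message wording are made up for illustration, and the doctype test is a deliberately crude substring match.

```python
def validator_advice(doctype: str, content_type: str) -> str:
    """Return the message a context-aware validator might show (hypothetical)."""
    # Crude assumption: any doctype string mentioning XHTML counts as XHTML.
    is_xhtml_doctype = "XHTML" in doctype.upper()
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";")[0].strip().lower()
    xml_types = {"application/xhtml+xml", "application/xml", "text/xml"}

    if is_xhtml_doctype and media_type == "text/html":
        return (
            "Warning: XHTML doctype served as text/html; it will be treated "
            "as HTML. Revalidate as HTML, or change the Content-Type to "
            "application/xhtml+xml."
        )
    if is_xhtml_doctype and media_type in xml_types:
        return "OK: XHTML served with an XML media type."
    return "OK: validate as HTML."


if __name__ == "__main__":
    print(validator_advice("-//W3C//DTD XHTML 1.0 Strict//EN",
                           "text/html; charset=utf-8"))
```

The point of the sketch is only that the Content-Type header is an input to the decision, alongside the doctype.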

I can't believe this old cow is still such a problem ... and that there are so many clueless developers out there.

If you do not send your XHTML document as application/xhtml+xml you should not be using an XHTML doctype. You should be using an HTML 4.01 doctype instead. No ifs or buts about it.

FAQ: Choosing the best doctype for your site [webmasterworld.com]
Why most of us should NOT use XHTML [webmasterworld.com]

Every web developer should be forced to read those two threads.

DrDoc

9:10 pm on Jul 19, 2007 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Where is the incentive for UA makers to provide better tools if there is no noticeable benefit? Therefore, we all need to do our part in promoting and producing valid HTML. And we also need to start demanding tools that can properly render such valid markup. Then of course, we also need to gently (or forcefully) push others in that same direction.

[edited by: tedster at 3:48 am (utc) on July 25, 2007]
[edit reason] cleaning up a side-topic [/edit]

lavazza

12:02 am on Jul 20, 2007 (gmt 0)

10+ Year Member

And we also need to start demanding tools that can properly render such valid markup

Start?

Opera and Mozilla have been welcoming input for years

Get Involved with Mozilla [mozilla.org]

[forums.mozillazine.org...]
* Firefox Bugs
* Web Development / Standards Evangelism

Opera has a vibrant community [my.opera.com] with, as of 5 days ago, over 900,000 registered members, and they are fairly transparent about which Web Specifications are Supported [opera.com]. Opera also has a host of features that make it easier to check the validity (or otherwise) of markup:
e.g. (on a PC)
F4 loads the Info Panel - showing encoding, MIME type and other info
Shift+F11 toggles between 'full-screen' and 'small-screen' mode
Ctrl + Alt + V loads the W3C HTML validator results page

:)

DrDoc

12:06 am on Jul 20, 2007 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

By "tools" I wasn't just talking about browsers, but also development tools and such. Things have become much better on the tool front the last several years, but there is still a lot that can be improved.

lavazza

12:52 am on Jul 20, 2007 (gmt 0)

10+ Year Member

The word 'render' implied UAs to me

As for development tools, yes please!

A true WYSIWYG - one that can churn out valid, cross-platform compatible, dB driven content - is surely a utopian dream

I've had a play with the W3C's Amaya [w3.org] and, although it looks promising, there's a lot of work to be done

alias

7:53 am on Jul 20, 2007 (gmt 0)

10+ Year Member

[w3.org...]

Why is it allowed to send XHTML 1.0 documents as text/html?
XHTML is an XML format; this means that strictly speaking it should be sent with an XML-related media type (application/xhtml+xml, application/xml, or text/xml). However XHTML 1.0 was carefully designed so that with care it would also work on legacy HTML user agents as well. If you follow some simple guidelines, you can get many XHTML 1.0 documents to work in legacy browsers. However, legacy browsers only understand the media type text/html, so you have to use that media type if you send XHTML 1.0 documents to them. But be well aware, sending XHTML documents to browsers as text/html means that those browsers see the documents as HTML documents, not XHTML documents.

So. Tell me again. Why SHOULD NOT I use XHTML? ;)

Hester

8:31 am on Jul 20, 2007 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

No-one's yet mentioned HTML 5 (the future) or XHTML 2 (irrelevant?). Before, there was a clear path from HTML 4 to XHTML 1, but now we're going back to HTML. XHTML would have been the future, but IE6 held it back.

DrDoc

8:35 am on Jul 20, 2007 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Why? ;)

As you so aptly quoted yourself:

But be well aware, sending XHTML documents to browsers as text/html means that those browsers see the documents as HTML documents, not XHTML documents.

Sending XHTML as HTML means the UA will see tagsoup. It is no different (NO different) from sending XHTML-formatted documents with an HTML 4.01 doctype.
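As a rough illustration of the difference between the two parsing modes, the Python sketch below feeds the same markup to a strict XML parser and to a lenient HTML tokenizer. Python's standard-library parsers are only stand-ins here; they are not what any browser uses, and the helper names are invented for the example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records every start tag the lenient HTML tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parses_as_xml(markup: str) -> bool:
    """True if a strict XML parser accepts the markup as well-formed."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def tags_seen_by_html_parser(markup: str) -> list:
    """Start tags seen by the HTML tokenizer, which never rejects input."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


# Unclosed <br> and <p>: acceptable to an HTML parser, fatal to an XML one.
SLOPPY = "<html><body><p>one<br><p>two</body></html>"
# The same content written as well-formed XML.
STRICT = "<html><body><p>one<br/></p><p>two</p></body></html>"

if __name__ == "__main__":
    for sample in (SLOPPY, STRICT):
        print(parses_as_xml(sample), tags_seen_by_html_parser(sample))
```

The HTML tokenizer reports the same tags for both samples, while the XML parser rejects the sloppy one outright. That is the sense in which well-formedness only matters when the document is actually parsed as XML.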

But, this is not really about whether you should use XHTML or not. There have been plenty of threads on that topic before ... two of which I linked to in my original post. Instead, this thread is about the flawed portrayal of XHTML put forth by the W3C.

I understand why they approached XHTML the way they did. Fine. But it's not legacy UAs that the XHTML problem lies with. It's with new UAs, with spiders ... and with the insufficient information provided to us developers.

An inexperienced or underinformed coder may happily choose to use XHTML, equipped with the text/html content type, without understanding the implications of such a decision.

And, is the W3C there to assist in telling us the implications? No. They will happily lead you to believe that you are now sending "valid" XHTML which will work great, when in reality you are sending broken HTML with an invalid doctype.

[edited by: DrDoc at 8:39 am (utc) on July 20, 2007]

encyclo

10:04 am on Jul 20, 2007 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

However XHTML 1.0 was carefully designed so that with care it would also work on legacy HTML user agents as well.

The above wording is certainly more accurate than saying that XHTML 1.0 is "compatible" with HTML. It is true that XHTML will "work" in legacy user agents, but that does not mean the XHTML is valid HTML.

If you take an XHTML document out of its context - i.e. if you save a file to a disk and view it later - then it may well be served with a different MIME type to the one chosen originally by the author. As the two kinds of permitted MIME types for XHTML (legacy text/html and the later application/* variants) have very significant differences, the original author cannot be sure which MIME type would be used for the document. MIME type is part of the context but not part of the document itself.

This is what the validator is doing - it doesn't validate in context. But documents don't just exist in one single context, so you could argue like the W3C that parsing the document as XML is acceptable. But that makes legacy compatibility a sham, and XHTML a broken standard.

You can go even further and question the choice of language when dealing with user agents, specifically the word "legacy" which is used throughout. Legacy, in the terms of the W3C, is a user agent which supports only HTML, because the assumption was that HTML would be superseded by XHTML. Reality shows us that there has been no interest in adopting the flawed XHTML standard, so HTML-only user agents are not "legacy" programs at all.

lavazza

10:11 am on Jul 20, 2007 (gmt 0)

10+ Year Member

An inexperienced or underinformed coder may happily choose to use XHTML, equipped with the text/html content type, without understanding the implications of such a decision

There seems to have been a sudden increase in the number of (incorrectly declared) XHTML pages, and my guess is that the blame lies with the (sadly popular) so-called WYSIWYGs, which I assume (I use a text editor and have no alternative apps installed) have XHTML and text/html as the default

zCat

12:03 pm on Jul 21, 2007 (gmt 0)

10+ Year Member

Hmm, so if XHTML as text/html is semantically iffy, and application/xhtml+xml won't work universally without hackery, I see no business case for continuing to use XHTML for my current project (where I'd decided to use XHTML as an experiment because I thought it was time to "move ahead").
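For anyone wondering what the "hackery" usually looks like: it tends to be server-side content negotiation on the HTTP Accept header. Below is a simplified Python sketch under stated assumptions; pick_content_type is a made-up name, wildcards such as */* are ignored, and q-values are only checked for being above zero rather than compared with each other.

```python
def pick_content_type(accept_header: str) -> str:
    """Choose a media type for an XHTML page from an HTTP Accept header.

    Simplified: returns application/xhtml+xml only when the client lists
    that exact type with a q-value above zero; otherwise text/html.
    """
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != "application/xhtml+xml":
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable q-value as "not accepted"
        if q > 0:
            return "application/xhtml+xml"
    return "text/html"


if __name__ == "__main__":
    print(pick_content_type("text/html,application/xhtml+xml,*/*;q=0.8"))
    print(pick_content_type("text/html, */*"))
```

Note that the text/html fallback is exactly the case argued about in this thread, so negotiation sidesteps the problem only for clients that ask for the XML type.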

Tap tap tap... OK, HTML 4.01 works just as well. Looks like I'll have to go for transitional at the moment, because my app outputs forms with no block elements between the form tag and the form elements (which 4.01 strict and presumably XHTML require, although why on earth this should be so beats me...)

Disclaimer: I and my apps produce well-formed HTML as a matter of principle, and I develop with usability etc. in mind, so upgrading / adapting the project would be relatively trivial, if and when there's an incentive / need to do so.

Fotiman

3:31 pm on Jul 23, 2007 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Top Contributors Of The Month

Looks like I'll have to go for transitional at the moment, because my app outputs forms with no block elements between the form tag and the form elements (which 4.01 strict and presumably XHTML require, although why on earth this should be so beats me...)

That is an excellent point! Why on earth is that a requirement? I've gotten in the habit of just doing something like:

<form>
<div>
... form elements ...
</div>
</form>

But I can't understand why the form element would only allow block elements (or script elements) as immediate children. I can't think of a good reason.
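For reference, the HTML 4.01 Strict DTD gives FORM the content model (%block;|SCRIPT)+, which is where the rule comes from. A rough Python sketch of a checker for that one rule follows. It is not a validator: the class and function names are invented, it ignores bare text inside the form, and its simple tag stack will be confused by omitted end tags such as an unclosed <p>.

```python
from html.parser import HTMLParser

# %block; elements from the HTML 4.01 Strict DTD, plus SCRIPT.
ALLOWED_IN_FORM = {
    "p", "h1", "h2", "h3", "h4", "h5", "h6", "ul", "ol", "pre", "dl",
    "div", "noscript", "blockquote", "form", "hr", "table", "fieldset",
    "address", "script",
}
# Elements with no end tag; they must never be pushed on the stack.
VOID = {"area", "base", "br", "col", "hr", "img", "input", "link",
        "meta", "param"}


class FormChildChecker(HTMLParser):
    """Collects elements that appear as direct children of <form>
    but are not block-level elements or <script>."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if self.stack and self.stack[-1] == "form" and tag not in ALLOWED_IN_FORM:
            self.problems.append(tag)
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop back to, and including, the matching open tag.
            while self.stack.pop() != tag:
                pass


def direct_inline_children(markup: str) -> list:
    checker = FormChildChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems


if __name__ == "__main__":
    bare = '<form action="/s"><input type="text" name="q"></form>'
    wrapped = '<form action="/s"><div><input type="text" name="q"></div></form>'
    print(direct_inline_children(bare), direct_inline_children(wrapped))
```

Wrapping the controls in a div, as in the snippet above, is what makes the second form pass this check.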

JAB Creations

3:15 am on Jul 25, 2007 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Replying to the original post...

I just want to clarify that Web Developers work on serverside languages specifically...a Web Designer works on clientside code specifically. Either may work on the other...and when working on actual XHTML a Web Designer must have some know-how of serverside languages in order to serve the proper mime.

That being said, I completely agree that my XHTML 1.1 should not validate if the Validator detects an incorrect MIME type such as text/html. I cannot agree that HTML 4 can produce cleaner code than XHTML; look at some of the elements that are allowed to validate even though they are clearly of an adjective nature.

- John

DrDoc

6:33 am on Jul 25, 2007 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Clarification: when I said that HTML 4.01 can produce cleaner code -- that was in direct reference to XHTML served as text/html (which the UA sees as tagsoup brokenage HTML).

This 43-message thread spans 2 pages.