| This 43 message thread spans 2 pages |
|The W3C HTML Validator is Broken|
Why on earth does the W3C HTML Validator [validator.w3.org] allow developers to use XHTML with a text/html content type?
Yes, I know the spec technically allows for it, even though one "should not" send XHTML as text/html.
What really irks me about this is that even if the validator says that the code is syntactically valid, in real life it is not. XHTML sent as text/html is the exact same thing as invalid HTML.
Browsers encountering XHTML with a text/html content type will NOT render it as XHTML. And, if it is not rendered as XHTML in the first place, why send XHTML code, when the browser will interpret it as HTML? And if it will be interpreted and rendered as HTML, why on earth should the HTML Validator tell me the code is valid, when the same code with an HTML 4.01 doctype would be flagged as invalid?
Conclusion -- the W3C HTML Validator is broken.
When validating an XHTML-formatted document, if the text/html content type is used it should not give you the "Congratulations, this document validates as XHTML 1.0 Strict!" However, it should not flag it as invalid either. Instead, it should notify you "oops, I see you are using the text/html content type, which means it will be treated as HTML. Do you wish to revalidate this document as HTML instead? Alternatively, change your Content-Type declaration to application/xhtml+xml."
I can't believe this old cow is still such a problem ... and that there are so many clueless developers out there.
If you do not send your XHTML document as application/xhtml+xml you should not be using an XHTML doctype. You should be using an HTML 4.01 doctype instead. No ifs or buts about it.
FAQ: Choosing the best doctype for your site [webmasterworld.com]
Why most of us should NOT use XHTML [webmasterworld.com]
Every web developer should be forced to read those two threads.
In my very humble opinion ...
By slapping a "valid XHTML" stamp on a site using XHTML and text/html, you are no different from someone who is using XHTML markup, but with an HTML 4.01 doctype, and validating the site using a doctype override.
100% beef, but with 25% soy mixed in because no one can tell the difference.
If CSS 2.1 could be taken back to the drawing board, so can XHTML. And I for one hope they make a complete 180° on the text/html content type.
Is this relevant?
The lang and xml:lang Attributes [w3.org]
|Use both the lang and xml:lang attributes when specifying the language of an element. The value of the xml:lang attribute takes precedence. |
XHTML 1.0 The Extensible HyperText Markup Language (Second Edition)
A Reformulation of HTML 4 in XML 1.0
W3C Recommendation 26 January 2000, revised 1 August 2002
Relevant to XHTML, yes. But not to this topic ;)
From the above referenced thread...
|If you send XHTML as application/xhtml+xml, and the code validates to W3C, does anyone know if that will display ok in legacy UA's? |
|No, if you serve a document as application/xhtml+xml then IE6 (or even IE7) is unable to parse the file, and you will get a download prompt. Same goes for search engine spiders such as Googlebot which do not handle application/xhtml+xml - use it and the page cannot be indexed by any major search engine. |
I'm a little lost here? Why would I want to serve application/xhtml+xml if indexers are unable to parse it?
|I'm a little lost here? Why would I want to serve application/xhtml+xml if indexers are unable to parse it? |
... and if you don't serve it as application/xhtml+xml, why would you use XHTML in the first place?
|why would you use XHTML in the first place? |
Because many of us hopped on the bandwagon way back when, not fully understanding XHTML. Many of us thought it was the next level of HTML when it was not. So, it's just a matter of going back and undoing what was done. In the meantime, those sites continue to perform just as expected. I don't see this being an issue, although I'm not too certain that I'm using XHTML to its fullest extent either. In fact, I know "I'm" not.
[edited by: pageoneresults at 8:02 pm (utc) on July 18, 2007]
Ah, now ... that's a different issue!
I'm not putting any form of blame on anyone who maintains a site that was originally developed using XHTML. I, too, made the same mistake on a few sites back then. I likewise thought that XHTML was the future of HTML, when in reality it was meant to be more of a completely new standard.
So, yes, if you have a successful site coded in XHTML but with the text/html content type -- by all means -- maintain it that way.
What I'm talking about here is new development. No one in their right mind should be developing new sites using XHTML today. Yet, so many still think that XHTML is the future, and that browsers and other user agents are ready for it. And they are under the false impression that text/html simply solves the problem of incompatibility.
And don't give me the argument of wellformedness and clean code. You can get just as clean code (cleaner, in fact) using HTML 4.01. And wellformedness ... well, that's no longer a benefit without application/xhtml+xml.
My issue with the validator is that it still validates the document as if it were sent using application/xhtml+xml, regardless of actual content type. It checks for wellformedness and other things.
What it should do instead, as I mentioned above, is gently remind the developer that XHTML served as text/html will be treated and rendered as HTML by the browser. So now we're back to tagsoup again from the browser's perspective.
Okay, next question...
Why, if IE6 and other UA's cannot parse application/xhtml+xml, would someone want to use it in the first place? I think I know the answer but I don't want to make a fool of myself if I'm wrong being the W3 Groupie that I am. :)
|Why, if IE6 and other UA's cannot parse application/xhtml+xml, would someone want to use it in the first place? |
Well, there are a couple of reasons why you would want to use XHTML with application/xhtml+xml. The reasons are few and far outnumbered by the reasons for deciding against it.
1) Several standards-compliant UAs do support it. Firefox and Opera, to mention two.
2) Using application/xhtml+xml you now know that you can use the document in XML parsers. There are all sorts of fun things to be had with XML parsers. Content scraping for use in your site search engine, for example. Site structure representations with buckets of contents and their internal relationships. Makes for awesome sitemap generation. On large sites it also makes for wonderful structural overviews. A high-level SEO/SEM tool.
3) Content negotiation. Send HTML 4.01 to UAs which do not prefer XHTML, but send XHTML to those who prefer that.
4) Perhaps the biggest of all -- ability to use XML namespaces.
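The content-negotiation idea in point 3 can be sketched in a few lines. This is illustrative only -- the function name is my own invention, and a real implementation should also honour the q-values in the Accept header rather than just checking for the token:

```python
# A minimal content-negotiation sketch (illustrative only): serve
# application/xhtml+xml to user agents that explicitly accept it,
# and fall back to text/html otherwise. The function name is made
# up for this example; it is not part of any spec or library.

def choose_content_type(accept_header):
    """Return the MIME type to serve, based on the HTTP Accept header."""
    # Only pick XHTML when the UA explicitly lists application/xhtml+xml.
    # (IE6/IE7-era Accept headers never include it, so they get text/html.)
    if "application/xhtml+xml" in accept_header:
        return "application/xhtml+xml"
    return "text/html"

# Typical Firefox/Opera-era Accept header:
print(choose_content_type(
    "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9"))
# -> application/xhtml+xml

# IE6-style Accept header (no application/xhtml+xml):
print(choose_content_type("image/gif, image/jpeg, */*"))
# -> text/html
```

Whichever branch is taken, the served document should of course carry the matching doctype: HTML 4.01 for the text/html branch, XHTML for the other.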
There clearly are benefits to using XHTML. The spec itself makes that very clear. The wellformedness and strict structure of XML is but one. But the windows of opportunity it opens up are endless and not possible with plain HTML.
The ability to use XML namespaces is a huge benefit. In fact, it is the only benefit (unless you depend on XML parsers as stated above) truly worth a transition to XHTML, in my opinion. I can see no other reason really worth opting for XHTML over HTML.
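To illustrate point 2 above: because well-formed XHTML is XML, a stock XML parser can consume it directly -- for example, to pull out headings for sitemap generation. A minimal sketch, using Python's standard library; the sample document is invented for the example:

```python
# Sketch: feed an XHTML document to a plain XML parser and extract
# its headings. The sample markup below is made up for illustration.
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

doc = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example</title></head>
  <body>
    <h1>Products</h1>
    <h2>Widgets</h2>
    <h2>Gadgets</h2>
  </body>
</html>"""

tree = ET.fromstring(doc)
# XHTML elements live in the XHTML namespace, so tag names must be
# qualified with it when matching.
headings = [el.text for el in tree.iter()
            if el.tag in (f"{{{XHTML_NS}}}h1", f"{{{XHTML_NS}}}h2")]
print(headings)  # ['Products', 'Widgets', 'Gadgets']
```

None of this works on tag-soup HTML without a forgiving HTML parser in between -- which is precisely the XML-tooling benefit being described.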
[edited by: DrDoc at 8:17 pm (utc) on July 18, 2007]
And, please don't take me wrong.
I am all for W3C. I will stand up for their support any day.
But they are certainly not the best educators about their own technologies and standards. And given the huge authority they are, the lack of clearly spelled-out information is wreaking havoc among us uneducated developers who blindly trust the W3C not to lead us astray.
|No one in their right mind should be developing new sites using XHTML today. |
Especially if UAs cannot parse application/xhtml+xml which is required to serve XHTML documents properly.
I started switching back to HTML 4.01 Strict when you posted the above topic back in April 2006.
Here's the problem. In working with Windows and .NET, it appears that XHTML is the de facto doctype. Or at least that is what I "used" to get from some of my .NET developers. I've since stopped them from doing that. ;)
|In working with Windows and .NET, it appears that XHTML is the de facto doctype. |
Eek. I had no idea. I've had some developers give me XHTML in the past. I thanked them, but asked them to give it to me in HTML. We had a _long_ discussion about why ...
But, I refuse to touch XHTML for any new development these days. Although, as encyclo so nicely pointed out to me in a sticky the other day -- I used to think otherwise. ;)
So, what about validation? Am I wrong in thinking that the W3C handles validation incorrectly?
This thread has been an eye opener.
Personally, I wish UAs followed the standards exactly.
When I write compiled code, if there's a syntactical error, it doesn't compile, and thus, doesn't run.
Why should it be any different for HTML (and its peers)?
If there is an error, it simply shouldn't render the page at all.
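Incidentally, that draconian behaviour is exactly how XML parsers already work -- and it is what kicks in when XHTML is genuinely served as application/xhtml+xml: one well-formedness error and the whole document is rejected rather than patched up. A minimal sketch, assuming Python's standard xml.etree module as the parser:

```python
# Sketch of "draconian" error handling: an XML parser refuses to
# process a document with even one well-formedness error, instead of
# guessing at a recovery the way HTML browsers do.
import xml.etree.ElementTree as ET

# Note the unclosed <p> -- one missing end tag is enough.
broken = ("<html xmlns='http://www.w3.org/1999/xhtml'>"
          "<body><p>unclosed</body></html>")

try:
    ET.fromstring(broken)
    print("parsed")
except ET.ParseError as err:
    print("refused to parse:", err)
```

An HTML browser would silently render that page; an XML parser will not, which is the behaviour being wished for here.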
|So, what about validation? Am I wrong in thinking that the W3C handles validation incorrectly? |
I don't think so. But, this is something that could change. I remember the W3C CSS Validator going through some changes not too long ago that addressed Webmaster complaints.
But, in this instance, I'm not too certain it would do any good. As long as the page validated, I feel many couldn't care less if it was being served as text/html. The task of undoing XHTML tagging is not one that I look forward to, especially on a .NET site, arrrggghhh!
So, in theory, there should be no XHTML websites in the index, none, correct? If the UA cannot parse application/xhtml+xml, then XHTML should be reserved for behind the scenes applications, correct?
For behind the scenes stuff, or in the event that you do content negotiation. You may very well serve XHTML to UAs that can handle it (such as FF and Op), but should serve HTML to those that can't (such as IE and spiders).
These are the sort of reasons why I still stick to HTML 4.01.
I have never seen the need for XHTML, and still cannot.
|Hello, I am an invalid XHTML document. I know I should put </p> to close this paragraph, but I won't! |
Well, how does your browser handle me? I hope you won't get a BSOD
Call me with a different server header and see what will happen. Each of the following links will give you the same invalid XHTML:
You may want to test those headers for a valid XHTML document:
FF and Opera seem to cope reasonably well
IE seems to have one favourite response:
The XML page cannot be displayed
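For anyone who wants to reproduce this test locally, here is a rough sketch of a throwaway server that sends the same invalid XHTML bytes under two different Content-Type headers. The paths and port are arbitrary choices for this example:

```python
# Sketch: serve identical invalid XHTML as text/html (browsers render
# it as tag soup) and as application/xhtml+xml (XML-aware browsers
# refuse it with a parse error). Paths and port are arbitrary.
from wsgiref.simple_server import make_server

INVALID_XHTML = (b"<html xmlns='http://www.w3.org/1999/xhtml'>"
                 b"<body><p>I am an invalid XHTML document"
                 b"</body></html>")

def app(environ, start_response):
    # Pick the Content-Type from the URL path.
    if environ.get("PATH_INFO") == "/xhtml":
        ctype = "application/xhtml+xml"
    else:
        ctype = "text/html"
    start_response("200 OK", [("Content-Type", ctype)])
    return [INVALID_XHTML]

if __name__ == "__main__":
    # Visit http://localhost:8000/html and /xhtml in different browsers.
    make_server("", 8000, app).serve_forever()
```

Loading /html in any browser shows the text; loading /xhtml in an XML-aware browser produces the parse-error page described above.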
|The task of undoing XHTML tagging is not one that I look forward to |
A few of us could probably use a good post on how to do that...
Isn't XHTML now the default doctype for pages created with the newer version of Macromedia Dreamweaver?
Interesting thing is if you use application/xhtml+xml in your XHTML doc and attempt to validate it, the validator still states:
Or is that correct?
|When I write compiled code, if there's a syntactical error, it doesn't compile, and thus, doesn't run. Why should it be any different for HTML (and it's peers)? |
HTML is not a programming language. It doesn't need to be precise.
HTML is just a quick and dirty way to mark up content in an absurdly simple presentational structure, which, seemingly by chance, managed to hit upon a syntax that turned out to be one of the most effective and revolutionary bodges of all time. We wouldn't be having this discussion if it wasn't.
|If there is an error, it simply shouldn't render the page at all. |
Steps to implement your idea:
1. Release a new browser that displays only well-formed/conformant code.
2. When your users complain they can't view 90% (or more) of the web tell them your browser only works with "proper" web pages. If they still complain, tell them TBL said it "should" be like this, and he invented the web, so there.
3. Repeat step 2 until all your users have switched to another browser that displays all the web.
Why aren't browsers strict? Because they don't need to be.
Well... yeah, NOW it's too late to make browsers strict. The point is, they should have been from the start. Instead, they let you leave unclosed tags that should be closed, nest an opening tag inside one element and close it outside, etc, etc.
I realize releasing a new browser that only rendered strict would be doomed to failure...but it's how it should have been from the start.
We as web developers wouldn't be in this mess of wondering why a page renders fine on FF, crappy in IE, somewhat ok in Opera, and horrible in something else...and when we fix it in one, the browsers play musical chairs and we're left scratching our heads in frustration because the specs show it should work.
|Why aren't browsers strict? Because they don't need to be. |
Yes, they do! If I send perfectly valid code, exactly to the standard -- it better render it exactly to the standard as well!
Now, that's not to say they shouldn't have built-in error handling and fallback routines for broken code. They should. But valid and strict code should not be penalized on behalf of all the broken pages.
And the W3C should not encourage "broken" or less-than-optimal markup.
|And the W3C should not encourage "broken" or less-than-optimal markup. |
There will always be "us few" who continue to promote and adhere to the standards "to the best of our understanding". There are still things that I don't fully understand but I know enough to get me in trouble. ;)
I can't complain about the W3 Validators. I use them day in and day out and it doesn't cost me one cent. I am a W3C Major Supporter but I look at that as my donation back to the cause.
[edited by: tedster at 4:56 am (utc) on July 24, 2007]
Well, let's first start with how many use a doctype in the first place ;)
Less than 5%?
And then of those, less than 10% that validate?
|Why aren't browsers strict? Because they don't need to be. |
There is a subtler reason. The web is built on the Internet, and the Internet is awash with software inspired by Postel's principle: Be conservative in what you do; be liberal in what you accept from others.
What is missing at the web end is the first half: being conservative in what you do. Because browsers are liberal in the HTML they interpret, HTML writers have been led to produce the sloppiest, most error-ridden code ever deployed on a computer network.
And that's partially because we, as the end users, have not stood up and complained.
That needs to change. What we all should do is check a sample page from any HTML generator we own. If there are bugs in the generated HTML, then treat that as a serious fault in the product: insist it be fixed under warranty [if by chance you should have such consumer protection] or get your money back.
Within a year or so, all major HTML-emitting products would have versions that actually worked, rather than relying on the goodwill and ingenuity of the HTML rendering industry.
I've seen the light!
I'm going to abandon XHTML because browsers can't be relied upon to handle it correctly. Just as soon as I've stripped out all my CSS and gone back to tables for layout. ;)
In all seriousness, without developers pushing standards ahead of the browsers' capabilities we'd have far slower progress than we already have.
Hixie made some valid points when he wrote the referenced article, but most tend to stress that if your team can't code valid XHTML, then you shouldn't be using an XHTML doctype. Kinda like saying that if you can't drive, you shouldn't be behind the wheel of a car. Isn't that just common sense?
Apologies for taking this further off topic Doc.
I've been holding back to see what everyone else would say first, but there's a very interesting answer to this specific question from the W3C validator team in the (not very) famous Bug #1500:
Bug #1500: XHTML-sent-as-text/html is parsed as XML [w3.org] RESOLVED
From the linked bug report:
|According to the HTML WG, a UA is non-compliant if it handles an XHTML document sent as text/html as XHTML; such a UA must apparently handle the document as HTML regardless of what it looks like. (...) The fact that the validator ignores this means that documents that don't comply to appendix C of XHTML 1.0 are being marked as valid when in fact they aren't conformant and won't be handled correctly. |
I would like to see the validator reject any XHTML-sent-as-text/html as being of the wrong MIME type.
It's an interesting debate (if you like that sort of thing!), and the bug was closed (resolved as invalid) with the following comment:
|Maybe this will all be clarified in a future errata version of XHTML 1.0. In the meantime, I believe the practical course to follow is: |
* to close this bug as "not a bug". There is nothing wrong with parsing XHTML in XML mode
* to keep making progress on integrating the Appendix C checker to the validator - see Bug 4514 - and figure out whether problems raised by the appC checker should be errors or warnings.
To cut a long story short, the argument of the W3C Validator team is that the specification allows the validator to parse XHTML as XML, and therefore show as valid pages which are XML conformant and using correct XHTML syntax. The "blame" is placed not with the validator, but with the XHTML specification, which should be clarified in future errata as the current status of the Media Types section is unclear.
So, to answer DrDoc's specific question, the W3C claims the validator is not broken, but XHTML is broken. Maybe. Perhaps. All clear now? ;)
Aha! The bug you just referred to, encyclo, saved my sanity!
I can live with that reply, although I vehemently disagree. Either the validator or the XHTML standard is broken. One or the other.
I maintain that the validator is broken. XHTML should not be parsed as XML when it is specifically declared to be HTML (through use of the text/html content type declaration).
And even if I grant their point that there is technically nothing wrong with parsing XHTML as XML ... the validator is broken anyway.
I have seen sites validate as XHTML where:
* there is no XML prologue
* attributes are single-quoted
But, of course the W3C validator team is going to want to shift the blame elsewhere. I'm not saying they are wrong. The XHTML spec (especially the content-type section) needs to be updated/clarified/changed. But that does not take away the responsibility from the validator team to do their part. They are still to blame for incorrectly handling XHTML sent as HTML (just as IE is to blame for not handling application/xhtml+xml properly; not the standard).
So, this leads me to the next conclusion: the W3C validator and the XHTML standard are both broken.