1) h**p://www.alistapart.com/about/
A very popular web magazine. Uses XHTML 1.0 Transitional. (I selected the "About" page because it will change less often than the front page.)
2) h**p://www.w3.org/TR/xhtml11/
W3C Recommendation for XHTML 1.1. Uses (surprise surprise!) XHTML 1.1.
3) h**p://www.webstandards.org/about/
Group advocating (guess what?!) web standards. Uses XHTML 1.0 Strict. (Again I chose a fairly static "About" page.)
My question is this: are these three pages valid XHTML?
ALA
• The XML prologue is missing. All XHTML pages are supposed to have the XML prologue unless the encoding is UTF-8 or UTF-16.
W3C
• XHTML 1.1 pages should not be served as text/html (even though they may be). The page is served as text/html, not application/xhtml+xml (which is of course because of the lack of browser support).
WASP
• The XML prologue is missing. All XHTML pages are supposed to have the XML prologue unless the encoding is UTF-8 or UTF-16.
• HTML-style comments are used around inline style sheets. This should not be done, since XML parsers are allowed to silently remove the contents of comments.*
* Note that I didn't check any external style sheets, since only the markup was in question.
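To see why the comment trick is risky once a page is handled as XML, here is a minimal sketch using Python's standard `xml.etree.ElementTree` as a stand-in for "an XML parser" (an assumption: browsers' XML parsers are not guaranteed to behave identically, only allowed to). Its default tree builder discards comments, so rules hidden inside `<!-- -->` never reach the element's text, while the same rules in a CDATA section survive. The names `COMMENTED`, `CDATA` and `style_text` are hypothetical.

```python
import xml.etree.ElementTree as ET

# An embedded stylesheet hidden inside an HTML-style comment.
COMMENTED = '<style type="text/css"><!-- body { color: red; } --></style>'
# The same rules wrapped in a CDATA section instead.
CDATA = '<style type="text/css"><![CDATA[ body { color: red; } ]]></style>'


def style_text(markup: str) -> str:
    """Return the character data the XML parser passes on for <style>."""
    element = ET.fromstring(markup)
    # ElementTree's default tree builder drops comments, so commented-out
    # rules never appear in element.text.
    return (element.text or "").strip()


if __name__ == "__main__":
    print(repr(style_text(COMMENTED)))
    print(repr(style_text(CDATA)))
```

In text/html mode a browser's HTML parser treats `<style>` content differently, which is why the comments "work" on the live page; the sketch only illustrates the XML case.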
ALA: "The XML prologue is missing. All XHTML pages are supposed to have the XML prologue unless the encoding is UTF-8 or UTF-16."
I agree. If you aren't using UTF-8 or UTF-16, there should either be an XML prolog, or the charset should be defined in the HTTP header (which reaches the browser before the page itself). This page (and in fact the whole ALA site) is invalid XHTML 1.0 Transitional. However, it validates.
W3C: "XHTML 1.1 pages should not be served as text/html (even though they may). Page is served as text/html, not application/xhtml+xml (which is of course because of lacking support)"
Close, but no cigar ;) XHTML 1.0 should not be served as text/html, but you may do so. XHTML 1.1 must not be served as text/html. This page is invalid XHTML 1.1. However, it validates.
WASP: "The XML prologue is missing. All XHTML pages are supposed to have the XML prologue unless the encoding is UTF-8 or UTF-16."
They don't need an XML prolog, as the charset is defined as ISO-8859-1 in the http header. No problem there.
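The rule being argued over can be sketched as a small check: an XML declaration is only needed to name the encoding when it is neither UTF-8/UTF-16 nor already given by a higher-level protocol such as the HTTP `Content-Type` header. This is a simplified reading under stated assumptions (it ignores byte-order marks and other detection details); the function names are hypothetical, and the header is parsed with the standard library's `email.message.Message`.

```python
from email.message import Message
from typing import Optional

# Encodings an XML parser is expected to detect without any declaration.
SELF_DESCRIBING = {"utf-8", "utf-16"}


def header_charset(content_type: str) -> Optional[str]:
    """Return the lower-cased charset parameter of a Content-Type value, if any."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def needs_xml_declaration(content_type: str, document_encoding: str) -> bool:
    """Rough check: is an XML declaration needed to name the encoding?"""
    if header_charset(content_type) is not None:
        # HTTP (the higher-level protocol) already names the encoding.
        return False
    return document_encoding.strip().lower() not in SELF_DESCRIBING


if __name__ == "__main__":
    # WASP-style case: ISO-8859-1 declared in the HTTP header.
    print(needs_xml_declaration("text/html; charset=ISO-8859-1", "ISO-8859-1"))
    # ALA-style case: no charset in the header, non-UTF encoding.
    print(needs_xml_declaration("text/html", "ISO-8859-1"))
```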
"HTML style comments are used around inline style sheets. This should not be done since XML parsers are allowed to silently remove the contents of comments."
This doesn't invalidate the XHTML. OK, I agree the inline stylesheets would have no effect, but the comments themselves are valid. So, no problem there either.
They're missing something else. I still haven't found the reference in the specs (the W3C site and documentation are a real nightmare to navigate!), but I think the page is invalid. I'm the least sure about this one, though... ;)
"Close, but no cigar"
Yeah, I didn't feel like looking it up in the specs :)
But, now that I did... I noticed that I am right after all ;) XHTML Media Types [w3.org]
As for WASP... I didn't bother checking the headers for any of these pages (too much work ;))... And I didn't say that all the things I brought up were errors... just "minor flaws", since I figured that would better describe it :)
I couldn't find any other problems with the WASP site though... Care to enlighten us as to what the problem might be?
"inline stylesheets should have no effect"
"XHTML 1.1 must not be served as text/html."
Actually, it's "should not," so it is still valid.
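The "may / should not / must not" argument boils down to a lookup from (document type, media type) to an RFC 2119 keyword. The sketch below is a hypothetical table that paraphrases the W3C "XHTML Media Types" note as it is being discussed here (XHTML 1.0 following Appendix C: text/html MAY; XHTML 1.1: text/html SHOULD NOT); treat the entries as an assumption and check the note itself before relying on them.

```python
# Hypothetical lookup table paraphrasing the W3C "XHTML Media Types" note
# as discussed in this thread; verify against the note before relying on it.
MEDIA_TYPE_RULES = {
    ("XHTML 1.0 (Appendix C)", "text/html"): "MAY",
    ("XHTML 1.0 (Appendix C)", "application/xhtml+xml"): "SHOULD",
    ("XHTML 1.1", "text/html"): "SHOULD NOT",
    ("XHTML 1.1", "application/xhtml+xml"): "SHOULD",
}


def media_type_keyword(doctype: str, content_type: str) -> str:
    """Return the keyword for serving `doctype` with the given Content-Type."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return MEDIA_TYPE_RULES.get((doctype, media_type), "not covered")


if __name__ == "__main__":
    # The W3C spec page from the thread: XHTML 1.1 served as text/html.
    print(media_type_keyword("XHTML 1.1", "text/html; charset=utf-8"))
```

"SHOULD NOT" is a recommendation rather than a conformance requirement, which is the distinction being drawn above.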
Of course, with a little PHP, it's easy enough to have the best of both.
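if (isset($_SERVER['HTTP_ACCEPT']) && strpos($_SERVER['HTTP_ACCEPT'], 'application/xhtml+xml') !== false)
    header('Content-Type: application/xhtml+xml; charset=iso-8859-15'); // client says it accepts XHTML
else
    header('Content-Type: text/html; charset=iso-8859-15'); // fall back for browsers such as IE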
Replacing the charset with your page's charset, of course.
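The same content negotiation can be sketched in Python with a little more care about q-values, so a client that sends `application/xhtml+xml;q=0` is not given XHTML. This is a minimal sketch, not a full RFC-compliant Accept parser: like the PHP above, it only honours an explicit `application/xhtml+xml` and ignores wildcards such as `*/*`. The function names and the default charset are hypothetical; adding `Vary: Accept` tells caches that the response differs by Accept header.

```python
from typing import Dict, Optional


def parse_accept(accept: str) -> Dict[str, float]:
    """Map each media range in an Accept header to its q-value (default 1.0)."""
    prefs: Dict[str, float] = {}
    for item in accept.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_range = parts[0].lower()
        if not media_range:
            continue  # skip empty entries such as a trailing comma
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs[media_range] = q
    return prefs


def choose_headers(accept: Optional[str], charset: str = "iso-8859-15") -> Dict[str, str]:
    """Serve XHTML only if the client explicitly accepts it with q > 0."""
    prefs = parse_accept(accept or "")
    if prefs.get("application/xhtml+xml", 0.0) > 0:
        media_type = "application/xhtml+xml"
    else:
        media_type = "text/html"
    return {
        "Content-Type": f"{media_type}; charset={charset}",
        "Vary": "Accept",  # the response depends on the Accept header
    }


if __name__ == "__main__":
    print(choose_headers("application/xhtml+xml,text/html;q=0.9"))
    print(choose_headers("text/html,*/*"))
```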
It would be nice to have an advisory text, something like "this page is valid xhtml but you should really have an xml prolog" with a link to the discussion on the pros and cons of the prolog.
Interesting to note that both non-compliances are caused by problems with browsers. Even the W3C recognises these practices...
About the xml prolog...
Because XHTML is based on XML, it is common to add an XML declaration at the beginning of the markup... With Internet Explorer, however, if anything appears before the DOCTYPE declaration the page is rendered in quirks mode...
Also, some user agents interpret the XML declaration to mean that the document is unrecognized XML rather than HTML, and therefore may not render the document as expected.
We assume that, because of its tendency to cause Internet Explorer to render in quirks mode, some people prefer not to use the XML declaration for XHTML served as text/html.
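The quoted trigger ("if anything appears before the DOCTYPE declaration") is easy to check mechanically. This is a simplified heuristic sketch: the function name is hypothetical, it treats leading whitespace as harmless (an assumption), and it does not model the rest of Internet Explorer's DOCTYPE sniffing, so a document with no DOCTYPE at all simply returns False here.

```python
import re

DOCTYPE = re.compile(r"<!DOCTYPE", re.IGNORECASE)


def has_content_before_doctype(markup: str) -> bool:
    """True if anything other than whitespace precedes the DOCTYPE.

    Only looks for content ahead of the DOCTYPE (such as an XML
    declaration); returns False when there is no DOCTYPE at all.
    """
    match = DOCTYPE.search(markup)
    if match is None:
        return False
    return markup[:match.start()].strip() != ""


if __name__ == "__main__":
    with_prolog = (
        '<?xml version="1.0" encoding="ISO-8859-1"?>\n'
        '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" '
        '"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">'
    )
    print(has_content_before_doctype(with_prolog))
```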
About the content type...
We recommend the use of XHTML wherever possible; and if you serve XHTML as text/html we assume that you are conforming to the compatibility guidelines in Appendix C of the XHTML 1.0 specification. We recognize that XHTML served as XML is still not widely supported, and that therefore many XHTML 1.0 pages will be served as text/html.
There is a lot of good information about practical xhtml and character encodings in that article [w3.org] so it is well worth a read.
Personally I tend to use HTML4 Strict to avoid these issues. It is close enough to xhtml that I can convert fairly easily in the future but it lets me get on with developing pages now.
Declare encoding for your CSS style sheets too
It is a good idea to always declare the encoding of external CSS stylesheets. (It is not necessary for CSS embedded in a document.) This is done by adding a statement to the top of the file such as:
@charset "utf-8";
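The `@charset` rule only counts if it is the very first thing in the stylesheet and written in exactly that form (double quotes, one space, trailing semicolon), as CSS 2.1 specifies. A minimal sketch of a checker under that reading; the function name is hypothetical, and allowing a leading byte-order mark is an assumption about typical files.

```python
import re
from typing import Optional

# Exact form: @charset "ENCODING"; with double quotes and a single space.
CHARSET_RULE = re.compile(r'@charset "([^"]*)";')


def declared_css_charset(stylesheet: str) -> Optional[str]:
    """Return the encoding named by a leading @charset rule, or None."""
    text = stylesheet[1:] if stylesheet.startswith("\ufeff") else stylesheet
    match = CHARSET_RULE.match(text)  # match() only looks at the start
    return match.group(1) if match else None


if __name__ == "__main__":
    print(declared_css_charset('@charset "utf-8";\nbody { color: black; }'))
    print(declared_css_charset('body { color: black; }\n@charset "utf-8";'))
```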
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
...without an XML prolog and without declaring anything in the <html> element. I was declaring charset and language using meta tags. I've now converted everything over to...
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
Hopefully I've done the right thing. I did not see any adverse effects when using just the XHTML 1.1 DOCTYPE without the prolog and without declaring anything in the <html> element. Am I on the right track here?
The prolog will put IE into Quirks mode. Hasn't that broken your layout?
I tested a few pages before I converted 300+ pages over. No problems whatsoever with layout and I'm testing in IE, Opera and Moz. I see no difference whatsoever. What exactly would I be looking for? Everything validates just fine too.
Basically, quirks mode would mean that IE reverts to its old broken box model, so I would expect to see elements appearing at the wrong width and possibly in the wrong place.
However I guess if you have already coded a layout that looks okay in IE5 (which always uses the broken model) then it won't really matter which mode the browser is in.
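The difference between the two box models is simple arithmetic, which is why layouts without padding or borders on sized elements often look identical in both modes. A minimal sketch (hypothetical function, equal left and right padding and border assumed): in the standards model `width` is the content box and padding plus border are added on; in the old IE5/quirks model `width` already includes them.

```python
from typing import Tuple


def box_widths(width: int, padding: int, border: int, quirks: bool) -> Tuple[int, int]:
    """Return (content width, total width) in px, ignoring margins."""
    sides = 2 * (padding + border)
    if quirks:
        # Broken (IE5 / quirks mode) model: 'width' includes padding and border.
        return max(width - sides, 0), width
    # Standards model: 'width' is the content box; padding and border are added.
    return width, width + sides


if __name__ == "__main__":
    # width: 200px; padding: 10px; border: 1px
    print(box_widths(200, 10, 1, quirks=False))
    print(box_widths(200, 10, 1, quirks=True))
```

So a 200px box with 10px padding and a 1px border takes 222px in standards mode but only 200px in quirks mode.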
Re: WASP. The problem with their site is that they do not define the language of the page. While this obviously means that the page is not WCAG Priority 3 compliant (and makes for poor usability), until I can prove otherwise, I say the page is tentatively valid. I suspect that it's a simple oversight in their case.
For ALA, however, I won't concede. I reckon ALA is invalid, and this is the point I want to make about the validator: it can't check everything - not because it's buggy, but because it only looks at document structure compared with the DTD. ALA is structurally valid, but the XHTML 1.0 spec is more than just that. The validator doesn't check for MIME type, and it doesn't check for character encoding - but that doesn't mean that these two issues don't matter.
Would you suggest removing the prolog and continuing to use the meta tag for declaring the charset?
If the prolog isn't causing trouble, you can keep it. However, the best way of defining the charset is with an HTTP header, so the browser knows what charset to use before it starts to render the document. If you do that, you don't need either the XML prolog or the meta tag in the document itself.
"W3C Recommendation for XHTML 1.1. Uses (surprise surprise!) XHTML 1.1."
Well, that is surprising, considering that the HTML 4.01 specs [w3.org] are written with a transitional DTD and include align="center" and stuff ;)
It could easily report the MIME type. A lot of other spiders already do this. Might want to suggest it as a feature.
The validator does check for character encoding. It complains if the charset isn't declared, and fails to parse the page at all if there are invalid characters in it.
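The "fails to parse the page if there are invalid characters" behaviour amounts to a strict decode with the declared charset. A minimal sketch using Python's codecs (the function name is hypothetical, and this is not how the W3C validator is implemented): bytes saved as ISO-8859-1 but declared as UTF-8 fail at the first non-ASCII byte, while decoding as ISO-8859-1 never fails in Python because every byte value maps to a character.

```python
from typing import Optional


def invalid_byte_offset(data: bytes, charset: str) -> Optional[int]:
    """Offset of the first byte that is not valid in `charset`, or None."""
    try:
        data.decode(charset)  # strict error handling is Python's default
    except UnicodeDecodeError as err:
        return err.start
    return None


if __name__ == "__main__":
    latin1_bytes = "café".encode("iso-8859-1")
    print(invalid_byte_offset(latin1_bytes, "utf-8"))
    print(invalid_byte_offset(latin1_bytes, "iso-8859-1"))
```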