To find out what has changed...
News for the W3C Markup Validator [validator.w3.org]
Changes include:
[edited by: tedster at 4:45 pm (utc) on May 9, 2004]
Then I tried to validate a page created with some blog software. It seems that if the DOCTYPE is HTML, not XHTML, you can't close non-paired tags (i.e. you must write <tag>, not <tag />). Upon encountering one, the parser apparently closes the first unclosed tag, in this case the <head>, with disastrous results.
Is this really the correct behavior? The same page validates okay as XHTML Transitional (apart from some unrelated errors much further down the page).
Viz.:
1: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
2:
3: <html>
4: <head>
8: <meta name="generator" content="Nucleus v2.0" />
9: <meta name="description" content="" />
28: </head>
Note that the <meta> on line 8 is okay, but then it thinks the <head> is closed and it dies thereafter:
Line 9, column 38: document type does not allow element "META" here
The element named above was found in a context where it is not allowed. This could mean that you have incorrectly nested elements -- such as a "style" element in the "body" section instead of inside "head" -- or two elements that overlap (which is not allowed).
Line 28, column 6: end tag for element "HEAD" which is not open
<meta name="description" content="" />
I've addressed this by adding a closing tag...
<meta name="description" content=""></meta>
Works just fine and it validates. I'm reluctant to utilize the "" /> method as it causes some issues that I am not comfortable with. I've been using the </meta> for over a year now and have not seen any issues.
Hello<br />World
should be rendered as
Hello <br>> World
However, no browser does this. Nevertheless, the HTML 4.01 specification clearly states that a line break is represented by <br> and not <br />. Because you're relying on a browser bug (a theoretically perfect browser would render the latter), using XHTML notation in an HTML 4 document is definitely discouraged. Marking trailing slashes as invalid in HTML 4.01 will clear up the discrepancy.
I hope that the W3 stick with this stricter interpretation. In my opinion it improves the validator's HTML parsing abilities, and makes for better coding practices. It will also stop XHTML documents served with an HTML doctype from validating, avoiding the impression that you can switch back and forth without adapting your code.
Reference: [hixie.ch...]
Hello<br />World
should be rendered as
Hello <br>> World
So says Mark Pilgrim, but I couldn't find this anywhere in the Recommendations, the DTDs, or anywhere else, and it's certainly not standard SGML for designating an entity.
I've looked through
*HTML DTD: [w3.org...]
*HTML 4 Rec on SGML Types: [w3.org...]
*HTML Rec on Character Refs: [w3.org...]
and many other places including HTML 2 DTD and Recs
Tom
Ergophobe - I agree with you, and what's more, the W3 agrees with you too (now). The shorttag notation was arcane, never-implemented SGML stuff, not really anything to do with HTML (which, like XML/XHTML, is a subset of SGML). But my point stands whether or not you think shorttag is an issue.
If shorttag is true, then <br /> is valid HTML 4.01, but you shouldn't use it because it doesn't mean what you think it means (despite what every browser in the world does).
If shorttag is false, then <br /> does not give <br>>, but then you have no valid reason to use trailing slashes at all in HTML 4.01, and since the spec says that a line break is represented by <br>, <br /> is invalid HTML 4.01. And, as I said, the W3 now agrees with you: the new validator discounts the possibility that you are using shorttag notation and therefore reports <br /> as invalid. Quod erat demonstrandum.
If you're using XHTML notation such as trailing slashes, then you should use an XHTML doctype, not an HTML one. As I said, this is a good call by the W3, and I hope they stick to it, even though many tools currently ship output that uses valid XHTML notation under an HTML doctype. Such pages are reported as valid by the old validator, but invalid by the new one.
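For anyone who wants to spot-check their own pages for this mismatch before running them through the validator, here is a rough sketch in Python using the standard library's html.parser. It is not what the W3C validator does internally. The class and function names are made up for illustration, and it only checks whether the DOCTYPE text mentions "XHTML"; a page with no DOCTYPE at all is simply skipped.

```python
from html.parser import HTMLParser


class TrailingSlashChecker(HTMLParser):
    """Record the DOCTYPE and every XHTML-style <tag /> that is seen."""

    def __init__(self):
        super().__init__()
        self.doctype = None     # text of the <!DOCTYPE ...> declaration, if any
        self.self_closed = []   # (line, tag) for each <tag /> in the page

    def handle_decl(self, decl):
        # html.parser passes the declaration without the "<!" and ">".
        if decl.upper().startswith("DOCTYPE"):
            self.doctype = decl

    def handle_startendtag(self, tag, attrs):
        # Called only for tags written with a trailing slash, e.g. <br />.
        line, _column = self.getpos()
        self.self_closed.append((line, tag))


def xhtml_syntax_under_html_doctype(markup):
    """Return (line, tag) pairs for <tag /> used under a non-XHTML DOCTYPE."""
    checker = TrailingSlashChecker()
    checker.feed(markup)
    checker.close()
    # Assumption: a DOCTYPE that mentions "XHTML" is an XHTML doctype.
    if checker.doctype is None or "XHTML" in checker.doctype.upper():
        return []
    return checker.self_closed
```

Fed the blog page quoted earlier in the thread (HTML 4.01 Transitional doctype, two <meta ... /> tags), it should report both meta tags; the same markup under an XHTML 1.0 doctype should come back clean.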
Hello <br>> World
Okay, that was confusing me, but I have it sorted out now. What you are saying is that if in your document you have
<br />
it will appear on screen as though the underlying code were
<br>>
This is because HTML allows shorttags so
<em>emphasized</em>
can be written as
<em/emphasized/
and be totally valid HTML (albeit not recognized as such by browsers). That means that
<br/
is also totally valid HTML for a linebreak and anything coming after that will be sent to the screen (by a fully compliant browser), in this case a ">" character.
I get it now!
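To make the shorttag reading above concrete, here is a toy Python scanner that treats a "/" ending a start tag the way the posts above describe it: the tag is finished at the slash, and whatever follows is ordinary text. It is only a sketch under big assumptions: it knows just a handful of empty elements, it ignores ordinary end tags, comments and entities, and the names START_TAG, EMPTY and shorttag_events are invented for this example. It is nowhere near a real SGML parser.

```python
import re

# A few elements that HTML 4.01 declares EMPTY (enough for the examples here).
EMPTY = {"br", "hr", "img", "meta", "link", "input"}

# "<name", optional attributes, then the character that ends the start tag:
# ">" for a normal tag, "/" for the shorttag form.
START_TAG = re.compile(
    r'<([A-Za-z][A-Za-z0-9]*)(?:\s+[A-Za-z-]+(?:="[^"]*")?)*\s*([/>])'
)


def shorttag_events(markup):
    """Return (kind, value) events as a shorttag-aware reader might see them."""
    events = []
    text = []        # character data collected since the last event
    net_stack = []   # elements opened with "<name/" that await a closing "/"

    def flush():
        if text:
            events.append(("text", "".join(text)))
            text.clear()

    pos = 0
    while pos < len(markup):
        match = START_TAG.match(markup, pos)
        if match:
            flush()
            name = match.group(1).lower()
            events.append(("start", name))
            # An empty element has no content, so there is nothing left to close.
            if match.group(2) == "/" and name not in EMPTY:
                net_stack.append(name)
            pos = match.end()
        elif markup[pos] == "/" and net_stack:
            flush()
            events.append(("end", net_stack.pop()))
            pos += 1
        else:
            text.append(markup[pos])
            pos += 1
    flush()
    return events
```

With this, shorttag_events("<em/emphasized/") yields a start, the text "emphasized" and an end for em, while shorttag_events("Hello<br />World") yields the text "Hello", a br start, and then the text ">World". That stray ">" would also be a plausible explanation for the validator output earlier in the thread: text is not allowed inside <head>, so the first <meta ... /> passes, the leftover ">" implicitly closes the head, and the second <meta> is then reported as being in the wrong place.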
I haven't picked it apart yet as I can't find any of my own pages that don't validate. ;)
Very nice new error descriptions. It even gives error descriptions if there is no doctype (it falls back to HTML 4.01 Transitional). I know this from attempting to validate the site I mentioned just now...
hehe you could always have a go with [microsoft.com...] :P
I'm appalled! 164 errors, and NO DOCTYPE?
Sheesh.
I'm appalled! 164 errors, and NO DOCTYPE?
Did you review some of those errors? I'm going to bet that the developer is using an out-of-the-box copy of FP. Based on the signature of those errors, I'm certain they are using FP.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
ergophobe, regarding the minimized tag syntax for empty elements in XHTML 1.0, I would venture to guess that you were thinking it was less troublesome because you were advised to code this way by the standards [w3.org]...
C.2. Empty Elements
Include a space before the trailing / and > of empty elements, e.g. <br />, <hr /> and <img src="karen.jpg" alt="Karen" />. Also, use the minimized tag syntax for empty elements, e.g. <br />, as the alternative syntax <br></br> allowed by XML gives uncertain results in many existing user agents.
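If you want to check a page against the first half of that guideline, a small regular expression is probably enough. The Python sketch below lists tags whose "/>" is not preceded by a space. The names NO_SPACE and tags_missing_space are made up for this example, and the pattern assumes no attribute value itself contains "/>" or an angle bracket.

```python
import re

# A tag whose "/>" directly follows a non-space character, e.g. <br/>.
NO_SPACE = re.compile(r'<[A-Za-z][^<>]*?[^\s/<>]/>')


def tags_missing_space(markup):
    """Return every empty-element tag written without a space before "/>"."""
    return NO_SPACE.findall(markup)
```

For example, tags_missing_space('<br/><hr /><img src="karen.jpg" alt="Karen"/>') returns the <br/> and <img .../> tags but leaves <hr /> alone.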