New W3C Validator

Beta #2

         

pageoneresults

2:22 pm on Apr 18, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



W3C Markup Validation Service v0.6.5 [validator.w3.org]

To find out what has changed...

News for the W3C Markup Validator [validator.w3.org]

Changes include:

  • More explanations for most of the validation error messages
  • New documentation on installing the Markup Validator locally
  • The "fussy" parsing mode is no longer available and will be drastically improved before it comes back - if ever
  • The W3C Link Checker has now been spun off into a standalone product and is no longer bundled with the Markup Validator
  • Stylesheets have been updated, with a more pleasant look
  • Easier navigation, more accessible documentation
  • More "Quality Tips" have been added

[edited by: tedster at 4:45 pm (utc) on May 9, 2004]
[edit reason] fix link [/edit]

moltar

3:25 pm on Apr 18, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Wow! Their descriptions ARE way better than before. They actually have whole paragraphs explaining a problem. Very nice!

ergophobe

5:01 pm on Apr 18, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Very odd. I validated one of my own documents (XHTML 1.0 Strict) and everything was fine.

Then I tried to validate a page created with some blog software. It seems that if the DOCTYPE is HTML, not XHTML, you can't self-close non-paired tags (i.e. you must write <tag>, not <tag />). Upon encountering a />, the parser seems to close the first unclosed tag, in this case the <head>, with disastrous results.

Is this really the correct behavior? The same page validates okay as XHTML Transitional (apart from some unrelated errors much further down the page).

Viz.:

1: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
2:
3: <html>
4: <head>
8: <meta name="generator" content="Nucleus v2.0" />
9: <meta name="description" content="" />
28: </head>

Note that the <meta> on line 8 is okay, but then it thinks the <head> is closed and it dies thereafter:

Line 9, column 38: document type does not allow element "META" here
The element named above was found in a context where it is not allowed. This could mean that you have incorrectly nested elements -- such as a "style" element in the "body" section instead of inside "head" -- or two elements that overlap (which is not allowed).

Line 28, column 6: end tag for element "HEAD" which is not open
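
For anyone who wants to spot these before the validator does, here is a minimal Python sketch. It is not part of the W3C validator; the function name `find_xhtml_style_tags` and both regular expressions are illustrative assumptions. It lists every tag ending in `/>` in a document whose DOCTYPE is HTML rather than XHTML. It is a rough text scan, not a real SGML parse, so treat its output as hints only.

```python
import re

# Matches a DOCTYPE whose public identifier names an HTML (not XHTML) DTD.
HTML_DOCTYPE = re.compile(
    r'<!DOCTYPE\s+html\s+PUBLIC\s+"-//W3C//DTD HTML ', re.IGNORECASE
)

# Matches a start tag written XHTML-style, e.g. <meta ... /> or <br/>.
SELF_CLOSED = re.compile(r"<([A-Za-z][A-Za-z0-9]*)\b[^<>]*/>")


def find_xhtml_style_tags(source):
    """Return (line_number, tag_name) pairs for "/>" tags under an HTML doctype.

    Returns an empty list when no HTML doctype is found, since the
    trailing slash is only a problem in that case.
    """
    if not HTML_DOCTYPE.search(source):
        return []
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        for match in SELF_CLOSED.finditer(line):
            hits.append((number, match.group(1)))
    return hits
```

Run against a page like the one above, it would report both <meta> lines, including the first one that the validator let through.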

pageoneresults

5:07 pm on Apr 18, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I haven't picked it apart yet as I can't find any of my own pages that don't validate. ;)

<meta name="description" content="" />

I've addressed this by adding a closing tag...

<meta name="description" content=""></meta>

Works just fine and it validates. I'm reluctant to utilize the "" /> method as it causes some issues that I am not comfortable with. I've been using the </meta> for over a year now and have not seen any issues.

ergophobe

5:18 pm on Apr 18, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hmmm. I'd never noticed. But then, on my own pages, whenever I use a /> I also have an XHTML Doctype.

I think I had it backwards and thought the /> option was less troublesome.

Thank god there's always grep!

Tom

encyclo

7:11 pm on Apr 18, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Using the trailing slash (XHTML-style) in HTML documents always used to be considered as valid by the validator. However, the meaning is theoretically different in HTML - for example:

Hello<br />World

should be rendered as

Hello <br>&gt; World

However, no browser does this. Nevertheless, the HTML 4.01 specification clearly states that a line break is represented by <br> and not <br />. Because you would be relying on a browser bug (a theoretically perfect browser would render the stray ">"), using XHTML notation in an HTML 4 document is definitely discouraged. Marking trailing slashes as invalid in HTML 4.01 will clear up the discrepancy.

I hope that the W3C sticks with this stricter interpretation. In my opinion it improves the validator's HTML parsing abilities, and makes for better coding practices. It will also stop XHTML documents served with an HTML doctype from validating, avoiding the impression that you can switch back and forth without adapting your code.

Reference: [hixie.ch...]

ergophobe

10:32 pm on Apr 18, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month




Hello<br />World

should be rendered as

Hello <br>&gt; World

So says Mark Pilgrim, but I couldn't find this anywhere in the Recommendations, DTDs or anywhere else, and it is certainly not standard SGML for designating an entity.

I've looked through

*HTML DTD: [w3.org...]
*HTML 4 Rec on SGML Types: [w3.org...]
*HTML Rec on Character Refs: [w3.org...]

and many other places, including the HTML 2 DTD and Recs.

Tom

encyclo

12:00 am on Apr 19, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I think I did my first message in too much of a rush, so my argument is very unclear.

Ergophobe - I agree with you, and what's more, the W3C agrees with you too (now). The shorttag notation was arcane, never-implemented SGML stuff, not really anything to do with HTML (which is an application of SGML, while XML, and so XHTML, is a subset of it). But my point stands whether or not you think shorttag is an issue.

If shorttag is true, then <br /> is valid HTML 4.01, but you shouldn't use it because it doesn't mean what you think it means (despite what every browser in the world does).

If shorttag is false, then <br /> does not give <br> &gt; - but then you have no valid reason to use trailing slashes at all in HTML 4.01, and as the spec. says that a line break is represented by <br>, then <br /> is invalid HTML 4.01. And, as I said, the W3C now agrees with you because the new validator discounts the possibility that you are using shorttag notation, therefore it reports <br /> as invalid. Quod erat demonstrandum.

If you're using XHTML notation such as trailing slashes, then you should use an XHTML doctype, not an HTML one. As I said, this is a good call by the W3C, and I hope they stick to it, despite the fact that many tools currently emit valid XHTML notation alongside an HTML doctype. Their output is reported as valid in the old validator, but invalid in the new one.

ergophobe

1:14 am on Apr 19, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month




Hello <br>&gt; World

Okay, that was confusing me, but I have it sorted out now. What you are saying is that if in your document you have

<br />

it will appear on screen as though the underlying code were

<br>&gt;

This is because HTML allows shorttags so

<em>emphasized</em>

can be written as

<em/emphasized/

and be totally valid HTML (albeit not recognized as such by browsers). That means that

<br/

is also totally valid HTML for a linebreak and anything coming after that will be sent to the screen (by a fully compliant browser), in this case a ">" character.

I get it now!
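
The reading described above can be sketched as a toy Python function. This is only an illustration under stated assumptions: the name `expand_net_shorttags`, the partial `EMPTY_ELEMENTS` set, and the restriction to tags without attributes are all made up for the demo, and it is nowhere near a real SGML parser. It rewrites the shorttag forms into the ordinary markup they would stand for.

```python
import re

# A NET-enabling start tag: "<name/" (whitespace allowed before the slash).
# Assumption for this toy: the tag carries no attributes.
NET_START = re.compile(r"<([A-Za-z][A-Za-z0-9]*)\s*/")

# A few elements the HTML 4.01 DTD declares EMPTY (partial list for the demo).
EMPTY_ELEMENTS = {"br", "hr", "img", "meta", "link", "input"}


def expand_net_shorttags(source):
    """Rewrite "<name/content/" shorttags as ordinary start and end tags."""
    parts = []
    pos = 0
    while True:
        match = NET_START.search(source, pos)
        if match is None:
            parts.append(source[pos:])
            break
        name = match.group(1)
        parts.append(source[pos:match.start()])
        parts.append("<" + name + ">")
        pos = match.end()
        if name.lower() in EMPTY_ELEMENTS:
            # An EMPTY element has no content to delimit, so a ">" that
            # follows "<br/" is plain character data; show it as "&gt;".
            if source.startswith(">", pos):
                parts.append("&gt;")
                pos += 1
            continue
        # For other elements the content runs up to the next "/",
        # which stands in for the end tag.
        end = source.find("/", pos)
        if end == -1:
            parts.append(source[pos:])
            break
        parts.append(source[pos:end] + "</" + name + ">")
        pos = end + 1
    return "".join(parts)
```

With this sketch, `expand_net_shorttags("<em/emphasized/")` gives `<em>emphasized</em>`, and `expand_net_shorttags("Hello<br />World")` gives `Hello<br>&gt;World`, which is the rendering discussed in this thread.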

Purple Martin

5:02 am on Apr 19, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I haven't picked it apart yet as I can't find any of my own pages that don't validate. ;)

hehe you could always have a go with [microsoft.com...] :P

Very nice new error descriptions. It even gives error descriptions if there is no doctype (it falls back to HTML 4.01 Transitional). I know this from attempting to validate the site I mentioned just now...

vkaryl

12:27 am on Apr 20, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



hehe you could always have a go with [microsoft.com...] :P

I'm appalled! 164 errors, and NO DOCTYPE?

Sheesh.

pageoneresults

1:52 am on Apr 20, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Purple Martin, did you notice that it now displays all errors? The current validator only shows the first instance of multiple errors.

I'm appalled! 164 errors, and NO DOCTYPE?

Did you review some of those errors? I'm going to bet that the developer is using an out-of-the-box copy of FP (FrontPage). Based on the signature of those errors, I'm certain they are using FP.

ergophobe

1:59 am on Apr 20, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Still better than Yahoo!
- no detectable character encoding
- no DOCTYPE
- 205 errors

This actual page of WebmasterWorld
- no character encoding detected
- 10 errors

Google home page
- no doctype
- 44 errors

Considering that there's hardly anything on that page, that may be worse even than MS.

Hester

10:39 am on Apr 20, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



37 errors in Google's UK home page!

Most of these are attributes not quoted, and ampersands in URLs. I thought those were ok in HTML? (But not Transitional perhaps?)

g1smd

12:10 am on May 9, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



dmoz.org -- most pages: 0 errors.

bumpaw

2:11 am on May 9, 2004 (gmt 0)

10+ Year Member



Would the following be quirks mode for IE6?

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

It's the first line that has me wondering. A few days ago I was playing with the W3C validator and it seemed to be required, but my old brain remembers that anything before the DOCTYPE would put IE 6 into quirks mode.

photon

4:29 pm on May 9, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes it would. Even the W3C recommends that you not include the XML prolog until browser support for it is more widespread.
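
A quick way to check your own pages for this is sketched below in Python. The function name `content_before_doctype` is hypothetical, and the sketch assumes that leading whitespace or a byte order mark on its own is harmless, which you should verify against the browsers you care about. It simply reports whatever text sits in front of the DOCTYPE.

```python
import re

# The DOCTYPE keyword is matched case-insensitively.
DOCTYPE = re.compile(r"<!DOCTYPE", re.IGNORECASE)


def content_before_doctype(source):
    """Return the text preceding the DOCTYPE, or None if there is no DOCTYPE.

    Assumption: surrounding whitespace and a leading byte order mark are
    ignored, so an empty string means the DOCTYPE effectively comes first.
    """
    match = DOCTYPE.search(source)
    if match is None:
        return None
    return source[: match.start()].lstrip("\ufeff").strip()
```

For the three lines quoted in the question it returns the `<?xml ... ?>` declaration, which is the part that would tip IE 6 into quirks mode.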

coopster

6:05 pm on May 10, 2004 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



I think I had it backwards and thought the /> option was less troublesome.

ergophobe, regarding the minimized tag syntax for empty elements in XHTML 1.0, I would venture to guess that you were thinking it was less troublesome because you were advised to code this way by the standards [w3.org]...


C.2. Empty Elements

Include a space before the trailing / and > of empty elements, e.g. <br />, <hr /> and <img src="karen.jpg" alt="Karen" />. Also, use the minimized tag syntax for empty elements, e.g. <br />, as the alternative syntax <br></br> allowed by XML gives uncertain results in many existing user agents.

ronin

4:41 pm on May 11, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Purely from an aesthetic point of view I love the "Quality Assurance" and the new photos. It's long overdue for standards compliance to be seen generally as a mark of quality engineering that distinguishes a site from those cobbled together in a slapdash manner.