First, what is validating HTML, how do you do it, and what are the benefits to validating? And, what are the hazards of using un-validated HTML?
Second, can somebody please provide a basic, beginner-level description of doctypes, how they're used, and the advantages to using them?
I have a feeling that I'm not losing much right now by not using doctypes or by not validating, but I want to keep up with the times . . .
Thanks,
Matthew
Back in the olden days this guy thought up a great way of writing structured documents, SGML. The problem with SGML is that it is difficult to write by hand. First you had to write this DTD thingy, then you had to write your document, then you had to validate the document against the DTD. Not to mention the myriad shortcut ways of writing SGML.
Then this other guy (I really should go look up names) decided to create HTML, which is an application of SGML. (SGML is to HTML as XML is to XHTML.) When they wrote a "web browser", it had to be very forgiving, because they wanted people to use it, rather than fight it, like they did with SGML.
(This is a broad simplification of everything.) Fast forward several years and all this forgiveness by browsers has led to the standards mess we have today, despite having DTDs for HTML1, 2, 3.2, 4.01.
Validating your document is like having a second pair of eyes, or the spell-checker of HTML. It lets you spot possible errors in your markup.
Moving into the future, if we ever start writing XML (or XHTML served with an XML MIME type), you'll have to have a well-formed document or the browser won't render it at all. Validating regularly catches those errors too.
You validate your document by going to [validator.w3.org...] and filling in the URI field and hitting submit.
The best reason to validate is to find potential errors in your code, and because it's a good habit. Kind of like washing your hands after using the bathroom, even if nobody else is in there to see you not do it.
A doctype and a DTD are related beasts. The doctype comes from the SGML world, and tells you what kind of document this is supposed to be. The DTD says what elements of the document are allowed where and in what order.
For HTML4, there are three doctypes, frameset, transitional, and strict.
Strict is used when you comply completely with HTML4, and don't use any deprecated attributes or elements.
Transitional is when you use deprecated attributes or elements (like center, align=).
Frameset is for when you have a frameset page.
This is the HTML4.01 Strict doctype:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
When you provide the URL, you are saying "I guarantee that my document won't give an error if compared against the DTD at the URL provided." Browsers don't typically check this; that's what the validator is for.
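If it helps to see the pieces of a doctype pulled apart, here is a rough Python sketch. The function name and the regular expression are my own and only cover the three HTML 4.01 doctypes; this is not how a browser or the validator actually parses them.

```python
import re

# Public identifiers of the three HTML 4.01 doctypes
HTML401_DOCTYPES = {
    "-//W3C//DTD HTML 4.01//EN": "Strict",
    "-//W3C//DTD HTML 4.01 Transitional//EN": "Transitional",
    "-//W3C//DTD HTML 4.01 Frameset//EN": "Frameset",
}

# Captures the public identifier and, if present, the DTD URL
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+HTML\s+PUBLIC\s+"([^"]+)"(?:\s+"([^"]+)")?\s*>',
    re.IGNORECASE,
)


def describe_doctype(html_text):
    """Return (flavor, has_url) for an HTML 4.01 doctype, or None if absent or unknown."""
    match = DOCTYPE_RE.search(html_text)
    if not match:
        return None
    flavor = HTML401_DOCTYPES.get(match.group(1))
    if flavor is None:
        return None
    # group(2) is the optional URL of the DTD
    return flavor, match.group(2) is not None


strict = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">'
print(describe_doctype(strict))
```

The first quoted string names the kind of document, and the second is the URL of the DTD it claims to follow.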
What you do gain by providing a doctype with URL in modern browsers is called 'Doctype Switching'. When a doctype with URL is provided, IE6, Gecko-based browsers (Moz, NS, etc.), and a few others (Opera and Safari, I think) will switch to 'standards-mode' rendering. IE5.5 used an incompatible box model, in which CSS width and height include the padding and border. In IE6, you will get the standards-mode box model with a doctype and URL. CSS properties also inherit correctly with a doctype; for example, the font-size doesn't reset for a <table>.
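To make the box-model difference concrete, here is the arithmetic as a small Python sketch. The function name and the sample numbers are made up for illustration, and it only looks at width, padding, and border (margins are ignored).

```python
def rendered_width(css_width, padding, border, standards_mode):
    """Return the total horizontal space a box occupies, in pixels.

    padding and border are per-side values, assumed equal on left and right.
    """
    if standards_mode:
        # W3C box model: 'width' is the content area only,
        # so padding and border are added on top of it.
        return css_width + 2 * padding + 2 * border
    # Old IE box model: 'width' already includes padding and border.
    return css_width


# A box declared as width: 200px; padding: 10px; border: 5px
print(rendered_width(200, 10, 5, standards_mode=True))   # 230
print(rendered_width(200, 10, 5, standards_mode=False))  # 200
```

So the same CSS can come out 30 pixels wider or narrower depending on which mode the doctype puts the browser in.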
PS: I've glossed over a bunch of stuff, and there are exceptions to what I've written, but I didn't want to get bogged down in too many details.
====================
I was not able to extract a character encoding labeling from any of the valid sources for such information. Without encoding information it is impossible to validate the document. The sources I tried are:
The HTTP Content-Type field.
The XML Declaration.
The HTML "META" element.
And I even tried to autodetect it using the algorithm defined in Appendix F of the XML 1.0 Recommendation.
Since none of these sources yielded any usable information, I will not be able to validate this document. Sorry. Please make sure you specify the character encoding in use.
IANA maintains the list of official names for character sets.
====================
What in the world is it talking about!? [grin]
(Thanks for more help, anyone . . . How badly does it show that I'm in over my head? Never mind, don't answer that one! :) )
There are a huge number of character sets out there; think of English vs Japanese. Browsers try to guess which character set you are using; validators expect you to tell them explicitly.
There are two ways of telling the world (web browsers, validators, everyone else) what character encoding you are using:
1. If you can use an .htaccess file put the following line in it:
AddType 'text/html; charset=ISO-8859-1' html
2. Otherwise put the following line in the <head> section of every file:
<META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
Both examples assume you are using ISO-8859-1, one very common encoding. If your pages use a different encoding, put its name there instead.
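If you are curious what the validator is hunting for, here is a simplified Python sketch of the first and third checks from its message (the HTTP header, then the META element). The function name and regular expressions are my own, and the real validator also looks at the XML declaration and tries autodetection, so treat this as an illustration only.

```python
import re

# Finds charset=... inside a Content-Type value such as "text/html; charset=ISO-8859-1"
CHARSET_RE = re.compile(r"charset\s*=\s*[\"']?([A-Za-z0-9._:-]+)", re.IGNORECASE)
# Finds a <meta http-equiv="Content-Type" ...> tag
META_RE = re.compile(
    r"<meta\b[^>]*http-equiv\s*=\s*[\"']?content-type[\"']?[^>]*>",
    re.IGNORECASE,
)


def find_charset(content_type_header, html_text):
    """Return (charset, source), or (None, None) if nothing declares one."""
    # 1. The HTTP Content-Type header wins if it names a charset.
    if content_type_header:
        match = CHARSET_RE.search(content_type_header)
        if match:
            return match.group(1), "HTTP header"
    # 2. Otherwise fall back to the META element in the document.
    meta = META_RE.search(html_text)
    if meta:
        match = CHARSET_RE.search(meta.group(0))
        if match:
            return match.group(1), "META element"
    return None, None


page = '<META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">'
print(find_charset("text/html", page))
```

When both lookups come up empty, you get the "I was not able to extract a character encoding" message quoted above.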
> How badly does it show that I'm in over my head?
Almost all of us are, or were at some stage, way over our heads :)
For most people "HTML 4.01 Transitional" is probably the version of HTML to aim for.
I also tick the boxes for "Show Source", "Show Outline", "Show Parse Tree" and "Verbose Output".
On the design side, make your code tidier by moving all CSS and JS to external files as well.
I'm trying to validate using HTML 4.01 Strict. My first page had 35 errors. Most of them I can fix myself, but I think there are two that I need help with.
First, I got this error before it even began to show errors with my page:
DOCTYPE Override in effect! Any DOCTYPE Declaration in the document has been suppressed and the DOCTYPE for «HTML 4.01 Strict» inserted instead. The document will not be Valid until you alter the source file to reflect this new DOCTYPE.
What does this mean, and what should I do about it?
Second, the validator is complaining about the nowrap in some of my td tags. I need that attribute, though. What CSS can I use to replace the nowrap attribute?
Thanks,
Matthew
I don't think that's quite right. SGML is the mother of all browser MLs. As I understand it, XML, XHTML, and HTML are all simplified versions of SGML and rely on SGML-DTDs for their existence.
Back to the question. Your pages may work fine now but as the browsers become more aligned with standards it will become more imperative that you declare your DOCTYPE so they can render it accurately. Search engines will also rely more heavily on the DOCTYPE (my own theory).
> I got this error before it even began to show errors with my page
That's a warning because you are specifying a doctype for the validator. To get rid of it, just include a doctype in your HTML instead and let the validator detect it.
> the validator is complaining about the nowrap in some of my td tags. I need that attribute, though. What CSS can I use to replace the nowrap attribute?
Something like this, with class="nowrap" added to each td that needs it in place of the nowrap attribute:
td.nowrap {
  white-space: nowrap;
}
There are really only a few differences between XHTML and correct HTML (closing all tags, nesting tags in the right order, etc.) and it may make sense to learn this newer version while you are studying this.
Personally I try to get all new projects to validate as XHTML 1.0 Transitional (and I have to say DW is doing a great job with this: the two latest projects have just validated without error).