html source code validation




Should it form part of doctype or not

bid4abook

9:11 pm on Nov 12, 2005 (gmt 0)

Hello all, I would like to know why the following:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

has ("http://www.w3.org/TR/html4/loose.dtd") included within the text.

The above belongs to a competitor's page. My page is as follows:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

Does the mention of the W3C mean it is more readable,
or am I just being paranoid?

encyclo

1:26 am on Nov 13, 2005 (gmt 0)

Welcome to the forums, bid4abook!

The URL points to the official document type definition (DTD) for the HTML 4.01 Transitional doctype, and including this URL within the doctype declaration is recommended practice by the W3C. Browsers, however, do not actually use this DTD when parsing the page.

The most significant practical effect of including the URL is on rendering mode. Modern browsers such as Firefox, Mozilla, Netscape 6+ and Internet Explorer 6+ use the doctype declaration to decide whether to render a page in "quirks mode" (a backwards-compatible manner) or in "standards compliance mode" (more in accordance with the actual specification). The doctype without the URL, as you are using, triggers quirks mode. See tedster's excellent Quirks Mode vs. Standards Mode overview [webmasterworld.com] for more information.
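For illustration, here is a minimal sketch of a page that uses the full doctype, including the DTD URL. The title and body text are placeholders and are not taken from any real page. The point is only that the doctype sits on the very first line, before the html element.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<!-- placeholder title, for illustration only -->
<title>Example page</title>
</head>
<body>
<!-- placeholder content, for illustration only -->
<p>Placeholder content.</p>
</body>
</html>
```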

Note that the choice of doctype is only relevant to how a page will be displayed in a browser - it has no direct impact on indexation. The "readability" can better be judged by validating your pages:

[validator.w3.org...]

Error-free markup can help avoid parsing problems which could affect rankings.

bid4abook

10:36 am on Nov 13, 2005 (gmt 0)

encyclo, thanks for that. I did as you suggested, and this is the W3C result:

I was not able to extract a character encoding labeling from any of the valid sources for such information. Without encoding information it is impossible to reliably validate the document. I'm falling back to the "UTF-8" encoding and will attempt to perform the validation, but this is likely to fail for all non-trivial documents.

Any pointers as to what is wrong?

encyclo

1:54 pm on Nov 13, 2005 (gmt 0)

You're missing a charset declaration (which defines the character encoding of the page). The easiest way to declare the charset is with a meta tag in the head section of each page:

<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
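As a rough sketch of where the tag goes, assuming a typical head section (the title is a placeholder): the charset declaration is usually placed as early as possible in the head, ahead of the title and any other text, so the browser knows the encoding before it reads them.

```html
<head>
<!-- charset declaration first, before any text content such as the title -->
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<!-- placeholder title, for illustration only -->
<title>Example page</title>
</head>
```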

bid4abook

1:59 pm on Nov 13, 2005 (gmt 0)

encyclo, is that the Western European charset?

encyclo

2:18 pm on Nov 13, 2005 (gmt 0)

Yes, ISO-8859-1 is the most common charset for Western European documents and the most likely actual encoding for English-language content. An alternative to ISO-8859-1 would be UTF-8 [webmasterworld.com].
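If the files are actually saved as UTF-8, the equivalent declaration would look like the sketch below. Whichever charset is declared has to match the encoding the file is really saved in, otherwise accented and other non-ASCII characters may display incorrectly.

```html
<!-- use this only if the page is really saved as UTF-8 -->
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
```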

bid4abook

2:32 pm on Nov 13, 2005 (gmt 0)

Probably a couple of daft questions, but what is the difference between the two, and which is preferred by browsers and search engines?

bid4abook

3:49 pm on Nov 13, 2005 (gmt 0)

encyclo, I read the article. Wow, and heartfelt thanks.