<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
has ("http://www.w3.org/TR/html4/loose.dtd") included within the text.
The above belongs to a competitor's page. My page is as follows:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
Does the mention of the W3C mean it is more readable,
or am I just being paranoid?
The URL points to the official document type definition (DTD) for the HTML 4.01 Transitional doctype, and the W3C recommends including this URL within the doctype declaration. Browsers, however, do not actually use this DTD when parsing the page.
The most significant practical difference is that modern browsers such as Firefox, Mozilla, Netscape 6+ and Internet Explorer 6+ use the doctype declaration on a page to decide whether to render it in "quirks mode" (a backwards-compatible manner) or in "standards compliance mode" (more in accordance with the actual specification). The doctype without the URL, as you are using, triggers quirks mode. See tedster's excellent Quirks Mode vs. Standards Mode overview [webmasterworld.com] for more information.
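To make the doctype switch concrete, here is a rough Python sketch that guesses the mode for the two doctypes posted in this thread. The function name guess_rendering_mode and the regular expression are made up for illustration, and the rules are heavily simplified: it only covers the HTML 4.01 family and is not how any browser actually implements the switch.

```python
import re

# Matches: <!DOCTYPE html PUBLIC "public id" "optional system id (URL)">
# Only PUBLIC doctypes are handled; this is an illustrative simplification.
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+html\s+PUBLIC\s+"([^"]*)"(?:\s+"([^"]*)")?\s*>',
    re.IGNORECASE,
)


def guess_rendering_mode(doctype: str) -> str:
    """Rough guess at the mode an HTML 4.01 doctype triggers (hypothetical helper)."""
    match = DOCTYPE_RE.search(doctype)
    if match is None:
        return "unknown"  # not a doctype this sketch understands
    public_id = match.group(1).upper()
    system_url = match.group(2)  # None when the URL is left out
    if not public_id.startswith("-//W3C//DTD HTML 4.01"):
        return "unknown"  # other doctypes are outside this sketch
    loose = public_id.startswith(
        ("-//W3C//DTD HTML 4.01 TRANSITIONAL//", "-//W3C//DTD HTML 4.01 FRAMESET//")
    )
    if loose and not system_url:
        # Transitional or Frameset without the DTD URL: quirks mode
        return "quirks"
    # With the URL (or with the Strict doctype): standards mode.
    # Some browsers use an "almost standards" variant for Transitional.
    return "standards"


competitor = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" '
              '"http://www.w3.org/TR/html4/loose.dtd">')
mine = '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">'

print(guess_rendering_mode(competitor))
print(guess_rendering_mode(mine))
```

In a real browser you can check the outcome directly: document.compatMode reports "BackCompat" for quirks mode and "CSS1Compat" otherwise.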
Note that the choice of doctype is only relevant to how a page will be displayed in a browser - it has no direct impact on indexation. The "readability" can better be judged by validating your pages:
[validator.w3.org...]
Error-free markup can help avoid parsing problems which could affect rankings.
I was not able to extract a character encoding labeling from any of the valid sources for such information. Without encoding information it is impossible to reliably validate the document. I'm falling back to the "UTF-8" encoding and will attempt to perform the validation, but this is likely to fail for all non-trivial documents.
Any pointers as to what is wrong?
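The validator message quoted above is about a missing character encoding declaration rather than the doctype. The validator looks for one in the HTTP Content-Type header and in the markup itself. As a quick local check, this small Python sketch looks for a charset inside a meta tag; the helper name declared_charset and the sample pages are made up for illustration, and it only inspects the markup, not the HTTP headers your server sends.

```python
import re

# Looks for charset=... inside a <meta> tag, for example:
# <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
META_CHARSET_RE = re.compile(
    r'<meta\b[^>]*\bcharset\s*=\s*["\']?([\w.:-]+)',
    re.IGNORECASE,
)


def declared_charset(html: str):
    """Return the charset named in a <meta> tag, or None (hypothetical helper)."""
    match = META_CHARSET_RE.search(html)
    return match.group(1).lower() if match else None


# Sample pages, invented for this example
with_meta = (
    "<html><head>"
    '<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">'
    "<title>Test</title></head><body></body></html>"
)
without_meta = "<html><head><title>Test</title></head><body></body></html>"

print(declared_charset(with_meta))
print(declared_charset(without_meta))
```

If this returns None for your page and your server does not send a charset in its Content-Type header either, adding a meta tag like the one in the comment (with the encoding your files are actually saved in) should satisfy the validator.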