Problem:
The W3C MarkUp Validation Service (validator.w3.org) won't even begin to validate my site. It says:
I was not able to extract a character encoding labeling from any of the valid sources for such information. Without encoding information it is impossible to validate the document. The sources I tried are:
The HTTP Content-Type field.
The XML Declaration.
The HTML "META" element.
What does this mean?
Here is what I use for my DTD:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
And in the meta tags:
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
See if this helps.
HTTP Content-Type field
If I had to guess, I'd suspect that you aren't using Unicode, but Latin-1. The corresponding code would be
<meta http-equiv="content-type" content="text/html; charset=iso-8859-1" />
When the content-type <meta> line is missing, the W3C validator has rules for trying to work out the correct character set; the error message is simply telling you that it wasn't able to hazard a guess.
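For reference, here is a rough sketch of the three places that error message refers to, as they might look for a UTF-8 XHTML page. The title and paragraph are placeholders, and the header shown in the comment is something the server sends rather than something you type into the page. Any one of the three should be enough for the validator, as long as they don't contradict each other.

```html
<?xml version="1.0" encoding="utf-8"?>
<!-- Source 2: the XML declaration above (XHTML only; it has to be the very first line). -->
<!-- Source 1: the HTTP header, sent by the server and not written in the page:
     Content-Type: text/html; charset=utf-8 -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<!-- Source 3: the META element. -->
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>Placeholder title</title>
</head>
<body>
<p>Placeholder content</p>
</body>
</html>
```

Whichever charset you declare, the file itself has to be saved in that encoding, otherwise accented characters will come out garbled.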
Yes, using the right character set will mean you can use accented characters. But you still must use
&amp; instead of & because the & sign has special meaning in HTML; likewise you must still use &lt; and &gt; instead of < and >.
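A small sketch of how that looks in a page (the sentence is made up for illustration). The accented character is typed directly, which assumes the declared charset matches how the file is saved; the ampersand and angle brackets are written as entities.

```html
<!-- Displays as: Fish & chips at the café: wrap the price in <em> tags -->
<p>Fish &amp; chips at the café: wrap the price in &lt;em&gt; tags</p>
```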
You have added the XML declaration right at the beginning, haven't you?
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="es">
<head>
<title>Your title here</title>
<meta http-equiv="content-type" content="application/xhtml+xml; charset=ISO-8859-1" />
... page follows ...
That's for "ISO-8859-1". You can substitute "utf-8" for it if you like; either way, change whatever characters need it into HTML entities.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<meta http-equiv="content-type" content="application/xhtml+xml; charset=ISO-8859-1" />
But I didn't have that first line (mainly because I've never seen it before).
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title> The Title Goes Here </title>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<meta http-equiv="Content-Language" content="en-gb">
<meta http-equiv="imagetoolbar" content="no">
<meta name="Description" content=" The Description Goes Here ">
<meta name="Keywords" content=" Keyword List Goes Here ">
<meta name="Generator" content="WordPad">
<meta name="Date" content="2003-Mar-01">
<meta name="Author" content="Your Name">
<meta name="MSSmartTagsPreventParsing" content="TRUE">
<meta name="robots" content="index,follow">
<meta http-equiv="Content-Style-Type" content="text/css">
<style type="text/css">@import url(styles.css);</style>
</head>
<body>
then you can get away with a lot less
Not quite that much less though ;)
You still need to use a valid doctype [w3.org]:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
now I'm wondering, "strict" or "loose"?
I would say that depends on how valid, or otherwise, your current code is. If errors are few and far between, it makes sense to go directly to strict. If it is full of errors, you may wish to do as I did: first validate to transitional, then read up on CSS, then go to strict.
The advantages are . . .?
Using the Strict doctype means you can no longer use any of the deprecated elements and attributes, most of which relate to style and positioning which should now be handled by CSS.
If you can stick to the strict doctype then you will end up with html that is much cleaner, shorter and probably more semantic.
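To make that concrete, here is a minimal sketch of the kind of change involved. The class name "offer" and the text are invented for the example; the point is only that the presentational tags allowed under Transitional are replaced by plain markup plus a style rule under Strict.

```html
<!-- Transitional: deprecated presentational elements are allowed. -->
<center><font color="red">Special offer</font></center>

<!-- Strict: <center> and <font> are not allowed, so the styling moves to CSS.
     This rule goes in the <head>, or in an external stylesheet. -->
<style type="text/css">
.offer { color: red; text-align: center; }
</style>

<!-- And the markup in the <body> carries only the content. -->
<p class="offer">Special offer</p>
```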
As Mohamed_E points out, Transitional is easier to achieve and is probably a good starting point when converting an existing site. But if you are creating a new site from scratch then I can't see any reason not to go for strict.
But . . .
. . . if you are creating a new site from scratch then I can't see any reason not to go for strict.
One of the organization members said something like: "we shouldn't be using XHTML. Too many people are still using old enough browsers, HTML 4 should be the newest format we use."
Is this a reason not to use strict? How should I respond?