character encoding labeling

W3C MarkUp Validation Service can't begin to validate my site

         

KeithDouglas

2:07 pm on Mar 1, 2004 (gmt 0)

10+ Year Member



Background:
A non-profit organization I volunteer for has an old, circa 1999, website (frames, lurid colors, etc.). None of the organization's board members are happy with the site. I volunteered to create a completely new website for them and they enthusiastically agreed. I wanted to use a table-free CSS layout. I found a good CSS template that I could borrow for free and I used it. Now the creator of the old website is telling the board that my code isn't good. So I would like to run both sites through an HTML validator to see which has fewer errors.

Problem:
The W3C MarkUp Validation Service (validator.w3.org) won't even begin to validate my site. It says:

I was not able to extract a character encoding labeling from any of the valid sources for such information. Without encoding information it is impossible to validate the document. The sources I tried are:

The HTTP Content-Type field.
The XML Declaration.
The HTML "META" element.

What does this mean?

4css

2:10 pm on Mar 1, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi Keith,

Here is what I use for my dtd
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

And in the meta tags
<meta http-equiv="content-type" content="text/html; charset=utf-8" />

See if this helps.

choster

2:28 pm on Mar 1, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



HTTP Content-Type field
This line tells the validator (or browsing device) how to interpret the characters in the document. Some characters are allowed in certain languages, human or computer, and others are not. Different character sets also assign different characters to the same stored value, so a byte that displays as an accented Roman letter under one character set may display as an entirely different character under a Cyrillic or Katakana one.
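A minimal Python sketch of that point, assuming nothing about your actual pages: the byte value 0xE1 and the three character sets below are arbitrary illustrations, and the helper name `decode_with` is made up for this example.

```python
# One stored byte, read under three different character set labels.
# 0xE1 is an arbitrary example value, not taken from anyone's site.
RAW = b"\xe1"


def decode_with(raw, charset):
    """Interpret the same raw bytes according to the given charset label."""
    return raw.decode(charset)


for charset in ("iso-8859-1", "iso-8859-5", "koi8-r"):
    ch = decode_with(RAW, charset)
    # Print the code point rather than the glyph so the output is plain ASCII.
    print(f"{charset}: U+{ord(ch):04X}")
```

The byte never changes; only the label does. That is why the validator refuses to proceed without one.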

If I had to guess, I'd suspect that you aren't using Unicode, but Latin-1. The corresponding code would be

<meta http-equiv="content-type" content="text/html; charset=iso-8859-1" />

When the content-type <meta> line is missing, the W3C validator has rules for trying to guess the correct character set; the error message is simply telling you that it wasn't able to hazard a guess.
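To make the three sources in the error message concrete, here is a rough Python sketch that looks for an encoding label in each of them, in the order the message lists them. This is not the validator's real algorithm; the function name `find_charset` and the regular expressions are assumptions for illustration only.

```python
import re

# Loose pattern for a charset name such as utf-8 or ISO-8859-1.
_NAME = r"([\w.:-]+)"


def find_charset(content_type_header, document_text):
    """Return (source, charset) for the first label found, or None."""
    # 1. The HTTP Content-Type field, e.g. "text/html; charset=utf-8".
    if content_type_header:
        m = re.search(r"charset\s*=\s*[\"']?" + _NAME, content_type_header, re.I)
        if m:
            return ("http", m.group(1).lower())
    # 2. The XML declaration, which must be the very first thing in the file.
    m = re.match(r"<\?xml[^>]*encoding\s*=\s*[\"']" + _NAME, document_text, re.I)
    if m:
        return ("xml", m.group(1).lower())
    # 3. The HTML META element carrying a content-type with a charset.
    m = re.search(r"<meta[^>]+charset\s*=\s*[\"']?" + _NAME, document_text, re.I)
    if m:
        return ("meta", m.group(1).lower())
    return None


page = '<html><head><meta http-equiv="content-type" content="text/html; charset=iso-8859-1" /></head></html>'
print(find_charset("text/html", page))
```

A page where all three lookups come back empty is the situation the validator is complaining about.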

KeithDouglas

2:45 pm on Mar 2, 2004 (gmt 0)

10+ Year Member



charset=iso-8859-1

I assume this deals with what characters can appear in the text. Correct? For example, do I use just an "&" or a "&amp;"? Likewise for accented characters, and so on? Is there a good source of this sort of information?

grahamstewart

3:12 pm on Mar 2, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This w3c doc on character encodings [w3.org] does a good job of explaining it.

Yes, using the right character set will mean you can use accented characters. But you still must use &amp; instead of & because the & sign has special meaning in html - likewise you must still use &lt; and &gt; instead of < >
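If the page text is generated by a script, the escaping can be automated. A small Python sketch using the standard library's `html.escape`; the wrapper name `to_markup` is just an illustrative choice.

```python
from html import escape, unescape


def to_markup(text):
    """Replace &, < and > with entities; leave every other character alone."""
    # quote=False means " and ' are not touched, only the three markup characters.
    return escape(text, quote=False)


print(to_markup("Tom & Jerry: 1 < 2 > 0"))
```

Accented letters pass through unchanged, since only the three markup characters are special.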

KeithDouglas

4:01 pm on Mar 2, 2004 (gmt 0)

10+ Year Member



you still must use &amp; instead of & because the & sign has special meaning in html - likewise you must still use &lt; and &gt; instead of < >

Aside from the cases you mention (above) I can use normal accented characters, like "á" and not need to use "&aacute;", correct?

Gusgsm

5:48 pm on Mar 2, 2004 (gmt 0)

10+ Year Member



Keith,

You have added the xml declaration just at the beginning, haven't you?:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="es">
<head>
<title>Your title here</title>
<meta http-equiv="content-type" content="application/xhtml+xml; charset=ISO-8859-1" />

... page follows ...

That's for "ISO-8859-1". You can substitute "utf-8" if you like, and you will also need to change any characters that require it into HTML entities.
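As a rough illustration of which characters need converting, Python's built-in `xmlcharrefreplace` error handler turns anything the chosen charset cannot represent into a numeric character reference. The function name `encode_for_charset` and the sample text are assumptions for this sketch, and it produces numeric references rather than named entities.

```python
def encode_for_charset(text, charset):
    """Encode text, replacing unrepresentable characters with &#NNNN; references."""
    return text.encode(charset, errors="xmlcharrefreplace")


sample = "caf\u00e9 \u20ac5"  # "café €5": é exists in ISO-8859-1, the euro sign does not

print(encode_for_charset(sample, "iso-8859-1"))
print(encode_for_charset(sample, "utf-8"))
```

Under ISO-8859-1 only the euro sign is replaced; under UTF-8 nothing is, because UTF-8 can represent every character directly. Note that this does not escape &, < or >, which still need entities in either case.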

Gusgsm

5:50 pm on Mar 2, 2004 (gmt 0)

10+ Year Member



And ...xml:lang="es"> must be xml:lang="en-GB"> or xml:lang="en-US">... or whatever variation of English you choose, of course.

grahamstewart

10:31 am on Mar 3, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It's usually a good idea to avoid the <?xml ... ?> declaration at the beginning of the file, as it puts some browsers (Internet Explorer) into Quirks Mode, which will make your life a lot harder.

w3c mention this in the document above.
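For anyone wanting to check a batch of files for that prolog, a tiny Python sketch follows. It only reports whether a document begins with an XML declaration; it says nothing about how any particular browser picks its rendering mode, and the function name is made up for the example.

```python
def starts_with_xml_declaration(document_text):
    """True if the document opens with an XML declaration."""
    # Ignore a leading byte order mark, which some editors add silently.
    return document_text.lstrip("\ufeff").startswith("<?xml")


with_prolog = '<?xml version="1.0" encoding="ISO-8859-1"?>\n<!DOCTYPE html>'
without_prolog = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN">'

print(starts_with_xml_declaration(with_prolog))
print(starts_with_xml_declaration(without_prolog))
```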

Gusgsm

12:42 pm on Mar 3, 2004 (gmt 0)

10+ Year Member



"w3c mention this in the document above"

oops! You're right :( Sorry. I'll have to recheck.

KeithDouglas

2:14 pm on Mar 3, 2004 (gmt 0)

10+ Year Member



My pages start with


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<meta http-equiv="content-type" content="application/xhtml+xml; charset=ISO-8859-1" />

But I didn't have that first line (mainly because I've never seen it before).

g1smd

2:33 pm on Mar 3, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If you are using plain old HTML, rather than XHTML, on your site, then you can get away with a lot less:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>

<title> The Title Goes Here </title>

<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<meta http-equiv="Content-Language" content="en-gb">
<meta http-equiv="imagetoolbar" content="no">

<meta name="Description" content=" The Description Goes Here ">
<meta name="Keywords" content=" Keyword List Goes Here ">

<meta name="Generator" content="WordPad">
<meta name="Date" content="2003-Mar-01">
<meta name="Author" content="Your Name">

<meta name="MSSmartTagsPreventParsing" content="TRUE">
<meta name="robots" content="index,follow">

<meta http-equiv="Content-Style-Type" content="text/css">
<style type="text/css">@import url(styles.css);</style>
</head>
<body>

DrDoc

5:22 pm on Mar 3, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



then you can get away with a lot less

Not quite that much less though ;)
You still need to use a valid doctype [w3.org]:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">

KeithDouglas

6:54 pm on Mar 3, 2004 (gmt 0)

10+ Year Member



<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">

now I'm wondering, "strict" or "loose"?

DrDoc

2:36 am on Mar 4, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I would say strict...
As long as you're aware of the fact that it is way less forgiving. But it is definitely the way to go.

KeithDouglas

7:56 pm on Mar 4, 2004 (gmt 0)

10+ Year Member



But it is definitely the way to go.

The advantages are . . .?

Mohamed_E

8:39 pm on Mar 4, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



now I'm wondering, "strict" or "loose"?

I would say that depends on how valid, or otherwise, your current code is. If errors are few and far between, it makes sense to go directly to strict. If it is full of errors you may wish to do as I did: first validate to transitional, then read up on CSS, then go to strict.

grahamstewart

8:35 am on Mar 5, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The advantages are . . .?

Using the Strict doctype means you can no longer use any of the deprecated elements and attributes, most of which relate to style and positioning which should now be handled by CSS.

If you can stick to the strict doctype then you will end up with html that is much cleaner, shorter and probably more semantic.

As Mohamed_E points out, Transitional is easier to achieve and is probably a good starting point when converting an existing site. But if you are creating a new site from scratch then I can't see any reason not to go for strict.
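To get a feel for how far a page is from Strict before running it through the validator, a rough Python sketch like the one below can list the deprecated elements it uses. The set holds the elements HTML 4.01 marks as deprecated; the class and function names are invented for the example, and it checks elements only, not deprecated attributes such as align or bgcolor, so it is no substitute for real validation.

```python
from html.parser import HTMLParser

# Elements marked deprecated in HTML 4.01 and therefore absent from the Strict DTD.
DEPRECATED = {"applet", "basefont", "center", "dir", "font",
              "isindex", "menu", "s", "strike", "u"}


class DeprecatedFinder(HTMLParser):
    """Collect (tag, line number) for each deprecated element encountered."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag in DEPRECATED:
            self.found.append((tag, self.getpos()[0]))


def find_deprecated(markup):
    finder = DeprecatedFinder()
    finder.feed(markup)
    finder.close()
    return finder.found


old_page = '<p><font color="red">Hi</font></p>\n<center>x</center>'
print(find_deprecated(old_page))
```

An empty list suggests going straight to Strict is realistic; a long one suggests validating against Transitional first.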

KeithDouglas

3:22 pm on Mar 11, 2004 (gmt 0)

10+ Year Member



Thanks for all of your advice thus far. I now have perfectly valid HTML on the homepage, and just need to do a few simple fixes on other pages. This feels great!

But . . .

. . . if you are creating a new site from scratch then I can't see any reason not to go for strict.

One of the organization members said something like: "we shouldn't be using XHTML. Too many people are still using old enough browsers, HTML 4 should be the newest format we use."

Is this a reason not to use strict? How should I respond?

grahamstewart

4:48 pm on Mar 11, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Is this a reason not to use strict?

Nope - HTML4 Strict is still HTML4, just... umm.. stricter :)

I'm not even sure it's a valid reason not to use XHTML 1.0 since it was specifically designed to be backwards compatible with HTML.