validating html - what is character encoding? - HTML forum at WebmasterWorld

SlowMove

3:02 am on Dec 28, 2003 (gmt 0)

I tried to validate a document at w3.org and got the following message:

I was not able to extract a character encoding labeling from any of the valid sources for such information. Without encoding information it is impossible to validate the document. The sources I tried are:
The HTTP Content-Type field.
The XML Declaration.
The HTML "META" element.
And I even tried to autodetect it using the algorithm defined in Appendix F of the XML 1.0 Recommendation.
Since none of these sources yielded any usable information, I will not be able to validate this document. Sorry. Please make sure you specify the character encoding in use.

I don't really understand it.

keyplyr

4:38 am on Dec 28, 2003 (gmt 0)

Several methods will work. Here's what I use:
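
(The snippet posted here did not survive in this copy of the thread. Since a later reply refers to "the http-equiv meta tag", it was presumably a Content-Type declaration along the following lines. This is only a sketch, not keyplyr's exact code; the ISO-8859-1 value is an assumption borrowed from the .htaccess example further down.)

```html
<!-- Assumed reconstruction: declares the media type and character encoding.
     Goes inside <head>. Replace ISO-8859-1 with the encoding
     your files are actually saved in. -->
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
```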

Mohamed_E

12:35 pm on Dec 28, 2003 (gmt 0)

"I don't really understand it."

For a conceptual discussion of the issues see the W3C discussion of HTML Document Representation [w3.org].

In addition to the http-equiv meta tag, you can specify the charset in your .htaccess file (assuming an Apache server):

AddType 'text/html; charset=ISO-8859-1' html

[added]Note that if you use the meta tag you must include it in every file, while a single .htaccess file will affect all files in that directory and its subdirectories![/added]

keyplyr

6:49 pm on Dec 28, 2003 (gmt 0)

Mohamed_E, I didn't know that and eagerly went to test it. But as written, it does not work, at least with my Apache config. I even tried adding a dot before the extension (.html).

Could there be a typo in your example?

Mohamed_E

7:18 pm on Dec 28, 2003 (gmt 0)

keyplyr,

I doubt it is a typo; that is what cut and paste avoids! Also check the WDG's validator hints at [htmlhelp.com...]

I do not understand much about Apache, but I believe the server administrator defines which parameters users can set in their own .htaccess files. You could get an authoritative answer to that question on the Apache forum.

keyplyr

8:53 pm on Dec 28, 2003 (gmt 0)

As far as I know, I have ALL the modules enabled on the server. I'll look further into it. Any time I can remove a line of code across the board, it's a very good thing. Thanks.

[added]My solution is posted in this thread [webmasterworld.com].[/added]

thehittmann

3:16 pm on Dec 29, 2003 (gmt 0)

For your site to validate at the W3C, you need both a document type (DOCTYPE) declaration and a character set declaration.

The DOCTYPE declaration should be the very first line of every page, before your <html> and <head> tags, and look like this:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

or similar. The character set declaration should be inside the <head> element.
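
The charset example from this post is missing in this copy. As a rough sketch, assuming HTML 4.01 Transitional and ISO-8859-1 (both illustrative choices; the title and body text are placeholders), the two declarations would sit in a page roughly like this:

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<!-- Character encoding declaration. The ISO-8859-1 value is an assumption;
     use the encoding your files are actually saved in. -->
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>Example page</title>
</head>
<body>
<p>Page content goes here.</p>
</body>
</html>
```

Putting the meta element ahead of the title means the encoding is declared before any text that might contain non-ASCII characters.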

I found that even though some of my pages were otherwise valid (according to HTML 4.01), they would not validate unless both of these were present.

g1smd

8:44 pm on Dec 31, 2003 (gmt 0)

Yep, the validator will choke if either of those is missing.

If they are missing, the validator cannot tell whether your page uses the Latin alphabet or some other script such as Greek, Thai, Chinese, Japanese or Arabic; whether your code is HTML 3.2, HTML 4.01 or XHTML; or whether the page is a Frameset page or just a normal page of HTML content.

Those two lines are very important.