<?xml version="1.0" encoding="utf-8"?>
I might have imagined this, but I have a feeling some parsers like to see the encoding attribute too.
Here is what I have:
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
Is this incorrect?
There are actually two separate pages. One is a static XHTML page with the following setup:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
For the other, I have a PHP script that pulls in an XML feed and outputs HTML (not XHTML). The XML document it parses has the following XML declaration:
<?xml version="1.0" encoding="UTF-8"?>
Do I really need it?
Held ;)
An XML declaration is not necessarily needed if all other requirements of the relevant specs are met.
To clarify:
An XML declaration is not needed if UTF-8 or UTF-16 encoding is used.
An XML declaration is not needed if any other encoding is used and that encoding is specified in a higher-level protocol such as HTTP.
Reasoning
An XML declaration is not required in all XML documents; however XHTML document authors are strongly encouraged to use XML declarations in all their documents. Such a declaration is required when the character encoding of the document is other than the default UTF-8 or UTF-16 and no encoding was determined by a higher-level protocol.
[w3.org ]
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Virtual Library</title>
</head>
<body>
<p>Moved to <a href="http://example.org/">example.org</a>.</p>
</body>
</html>
[w3.org ]
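The rule quoted above can be written down as a tiny decision function. This is only a sketch of my reading of the quoted text, not anything taken from the spec itself: the name needs_xml_declaration and its two parameters are made up for illustration, and it treats any charset sent by the higher-level protocol as "encoding determined".

```python
from typing import Optional

# Encodings an XML processor can handle without an encoding declaration.
DEFAULT_XML_ENCODINGS = {"utf-8", "utf-16"}


def needs_xml_declaration(doc_encoding: str, http_charset: Optional[str]) -> bool:
    """Return True when the quoted rule requires an XML declaration.

    doc_encoding: the encoding the document is actually saved in.
    http_charset: the charset sent in the HTTP Content-Type header, or None.
    Both parameter names are hypothetical, chosen for this sketch.
    """
    if doc_encoding.strip().lower() in DEFAULT_XML_ENCODINGS:
        return False  # default encodings never need the declaration
    # Any other encoding needs it unless a higher-level protocol named one.
    return not http_charset


print(needs_xml_declaration("UTF-8", None))               # False
print(needs_xml_declaration("ISO-8859-1", None))          # True
print(needs_xml_declaration("ISO-8859-1", "ISO-8859-1"))  # False
```

The last call is the interesting case: Latin-1 without a declaration is fine as long as the server announces the charset.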
AFAIK an XML declaration before the doctype will cause IE6 to display the document in quirks mode, even though it should be displayed in standards-compliant mode. To get standards mode in IE6 you need to remove the XML declaration.
That will limit you to using UTF-8 or UTF-16, which can cause all sorts of problems. If you want to use Latin-1 you really need an XML declaration.
I presume that is why Nick uses something like this:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "xhtml11.dtd">
<?xml version="1.0" encoding="iso-8859-1"?>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
However, this is not legal XML. The XML spec defines the prolog as follows:
[22] prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?
[23] XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
[24] VersionInfo ::= S 'version' Eq ("'" VersionNum "'" | '"' VersionNum '"')
[25] Eq ::= S? '=' S?
[26] VersionNum ::= ([a-zA-Z0-9_.:] | '-')+
[27] Misc ::= Comment | PI | S
[w3.org ]
The optional XML declaration, if present, must come first. It is followed by zero or more Misc items (comments, processing instructions or white space), which are followed by an optional doctype declaration. The XHTML spec requires a doctype declaration, and the XML declaration is necessary if we want to use an encoding other than UTF-8 or UTF-16.
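If you want to see the ordering rule enforced, any conforming XML parser will do. Here is a rough check using the expat parser from Python's standard library; the helper name is_well_formed and the stripped-down document body are placeholders for illustration only.

```python
from xml.parsers import expat

# Placeholder pieces of a document, assembled in different orders below.
XML_DECL = b'<?xml version="1.0" encoding="iso-8859-1"?>\n'
DOCTYPE = b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "xhtml11.dtd">\n'
BODY = (
    b'<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">'
    b'<head><title>t</title></head><body><p>x</p></body></html>'
)


def is_well_formed(document: bytes) -> bool:
    """Return True if expat parses the document without a well-formedness error."""
    parser = expat.ParserCreate()
    try:
        parser.Parse(document, True)  # True marks this as the final chunk
    except expat.ExpatError:
        return False
    return True


print(is_well_formed(XML_DECL + DOCTYPE + BODY))  # True: declaration first
print(is_well_formed(DOCTYPE + XML_DECL + BODY))  # False: declaration after doctype
print(is_well_formed(DOCTYPE + BODY))             # True: no declaration at all
```

This checks well-formedness only. expat does not fetch the DTD, so it says nothing about validity against XHTML.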
I can think of only one way to satisfy all of these constraints (standards mode in IE6, an encoding other than UTF-8 or UTF-16, and legal XML):
Such a [xml] declaration is required when [...] no encoding was determined by a higher-level protocol.
This quote from the XHTML spec is the key to solving the above problem.
If the character encoding is set via HTTP, there is no need for an XML declaration.
While for some time it was uncertain whether the character encoding could be specified externally, i.e. in such a higher-level protocol, the XML 1.0 Second Edition Specification Errata [w3.org] make this perfectly clear (E23):
Rationale
It was always the intent of the XML 1.0 spec to allow the character encoding to be determined externally. The sentence corrected here was introduced in the second edition.
Andreas
HTTP/1.1 200 OK
Date: Fri, 22 Nov 2002 02:11:01 GMT
Server: Apache/1.3.26 (Unix) PHP/4.2.3 FrontPage/5.0.2.2510
X-Powered-By: PHP/4.2.3
X-Content-Parsed-By: WebDev::ContentManagement
Content-Style-Type: text/css
Expires: Mon, 26 Jul 1997 05:00:00 GMT
Last-Modified: Fri, 22 Nov 2002 02:11:11 GMT
Cache-Control: must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Content-Language: de
Transfer-Encoding: chunked
Content-Type: text/html; charset=ISO-8859-1
The body is the requested resource, e.g. an image, PDF or HTML document.
The last header line (Content-Type) is how you specify the character encoding in HTTP.
In PHP you would use code like this to produce such a header:
header("Content-Type: text/html; charset=ISO-8859-1");
Andreas
[edited by: andreasfriedrich at 2:19 am (utc) on Nov. 22, 2002]
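To check what a server is actually sending, you can pull the charset parameter out of the Content-Type value. A minimal sketch using Python's standard email.message module; the function name charset_from_content_type is made up for this example, and the header values are just strings pasted in by hand rather than fetched from a live server.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(value: str) -> Optional[str]:
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()


print(charset_from_content_type("text/html; charset=ISO-8859-1"))  # iso-8859-1
print(charset_from_content_type("text/html"))                      # None
```

A result of None means HTTP is not determining the encoding, so a document that is not UTF-8 or UTF-16 would still need the XML declaration.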
HTTP/1.1 200 OK
Date: Fri, 22 Nov 2002 02:28:32 GMT
Server: Apache/1.3.26 (Unix) mod_log_bytes/0.3 mod_bwlimited/1.0 PHP/4.2.3 FrontPage/5.0.2.2510 mod_ssl/2.8.10 OpenSSL/0.9.6b
X-Powered-By: PHP/4.2.3
Connection: close
Content-Type: text/html
So I should use hpche's solution? Thanks for the replies.