Forum Moderators: coopster

php parse error

<?xml version="1.0"?>

         

Birdman

11:45 pm on Nov 21, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



<?xml version="1.0"?>
Hello. This tag is throwing errors since I did the mod_rewrite to parse html as php. I had to remove it. Do I really need it? The pages are XHTML 1 transitional.

Robber

11:53 pm on Nov 21, 2002 (gmt 0)

10+ Year Member



Not sure about transitional, but I believe it should be there for strict. Only the version attribute is required in the XML declaration, although I am currently using:

<?xml version="1.0" encoding="utf-8"?>

Might have imagined this but I have a feeling some parsers might like to see the encoding attribute too.

jatar_k

11:57 pm on Nov 21, 2002 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Robber,

are you using this
<?xml version="1.0" encoding="utf-8"?>
in a php parsed document?

Birdman

11:58 pm on Nov 21, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I thought it was part of the DDT, or whatever it's called. I'm not actually trying to do any php with it, but it is being interpreted as such.

Here is what I have:

<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">

Is this incorrect?

jatar_k

12:00 am on Nov 22, 2002 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



I think when you put <? the php parser will try to interpret what follows.

Robber

12:00 am on Nov 22, 2002 (gmt 0)

10+ Year Member



Hi JK,

It sure is. It's the XML feed from Amazon Web Services.

Robber

12:03 am on Nov 22, 2002 (gmt 0)

10+ Year Member



Let me clarify a bit.

There are actually two separate pages. One is a static XHTML page with the following setup:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

With the other I have a php script that pulls in the xml feed and spits out html (not xhtml), the xml doc that is parsed has the following xml declaration:
<?xml version="1.0" encoding="UTF-8"?>

andreasfriedrich

1:23 am on Nov 22, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Birdman wrote
Do I really need it?

Held ;)

An XML declaration is not necessarily needed, if all other requirements of the relevant specs are met.

To clarify:

An XML declaration is not needed if UTF-(8|16) encoding is used.

An XML declaration is not needed if any other encoding is used and that encoding is specified in a higher-level protocol such as HTTP.

Reasoning

An XML declaration is not required in all XML documents; however XHTML document authors are strongly encouraged to use XML declarations in all their documents. Such a declaration is required when the character encoding of the document is other than the default UTF-8 or UTF-16 and no encoding was determined by a higher-level protocol.

[w3.org ]



This is a valid and wellformed XHTML document:

<?xml version="1.0" encoding="UTF-8"?> 
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Virtual Library</title>
</head>
<body>
<p>Moved to <a href="http://example.org/">example.org</a>.</p>
</body>
</html>

[w3.org ]

AFAIK the XML declaration before the doctype will cause IE6 to display the document in quirks mode even though it should be displayed in standards-compliant mode. You need to remove the XML declaration to get standards-compliant mode.

That will limit you to using UTF-8 or UTF-16, causing all sorts of problems. If you want to use Latin-1 you really need an XML declaration.

I presume that is why Nick uses something like this:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "xhtml11.dtd"> 
<?xml version="1.0" encoding="iso-8859-1"?>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">

However, this is not legal XML. The XML spec defines the prolog as follows:

[22] prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?  
[23] XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
[24] VersionInfo ::= S 'version' Eq ("'" VersionNum "'" | '"' VersionNum '"')/* */
[25] Eq ::= S? '=' S?
[26] VersionNum ::= ([a-zA-Z0-9_.:] | '-')+
[27] Misc ::= Comment | PI | S

[w3.org ]

The optional XML declaration is followed by zero or more Misc items (comments, processing instructions or white space), which are followed by an optional doctype declaration. The XHTML spec requires a doctype declaration, and the XML declaration is necessary if we want to use an encoding other than UTF-(8|16).
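To make the ordering rule concrete, here is a rough sketch in PHP. The function name starts_with_xml_declaration() is made up for illustration, it assumes PHP 7 or later (the server in this thread ran PHP 4.2.3), and it is only a regex check against productions [22] to [26], not a real XML parser.

```php
<?php
// Hypothetical helper, not part of the thread: a rough check that a document
// begins with an XML declaration. Per production [22], a declaration may only
// appear at the very start of the prolog, before any doctype.
function starts_with_xml_declaration(string $document): bool
{
    // The VersionNum character class is taken from production [26].
    $pattern = '/^<\?xml\s+version\s*=\s*(["\'])[a-zA-Z0-9_.:-]+\1/';
    return preg_match($pattern, $document) === 1;
}

$xmlDecl = '<?xml version="1.0" encoding="iso-8859-1"?>';
$doctype = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    . '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">';

// Declaration first: matches the prolog grammar.
var_dump(starts_with_xml_declaration($xmlDecl . "\n" . $doctype)); // bool(true)

// Doctype first, declaration second: the ordering that is not legal XML.
var_dump(starts_with_xml_declaration($doctype . "\n" . $xmlDecl)); // bool(false)
```

A check like this only catches the ordering problem. A document that passes it can still be malformed in other ways.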

I can think of only one way to achieve the following:

  • Use IE6 in standards-compliant mode (we cannot use an XML declaration before the doctype)
  • Use Latin-1 encoding (we need an XML declaration)
  • Be XHTML compliant (we need to comply with the XML spec and we need the doctype declaration)

"Such a [xml] declaration is required when [...] no encoding was determined by a higher-level protocol." This quote from the XHTML spec is the key to solving the above problem.

If the character encoding is set via HTTP, there is no need for an XML declaration.

While for some time it was uncertain whether the character encoding could be specified externally, i.e. in such a higher level protocol, the XML 1.0 Second Edition Specification Errata [w3.org] make this perfectly clear (E23):

Rationale
It was always the intent of the XML 1.0 spec to allow the character encoding to be determined externally. The sentence corrected here was introduced in the second edition.

Andreas

Birdman

1:48 am on Nov 22, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



A xml declaration is not needed, if any other encoding is used and that encoding is specified in a higher-level protocol such as HTTP.

>>>such as HTTP

I don't understand this part. Sorry, I'm an idiot;)

Excellent post, as usual andreas!

andreasfriedrich

2:16 am on Nov 22, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



When the server answers a request it sends an HTTP message consisting of a header and a body. A header may look like this:

HTTP/1.1 200 OK 
Date: Fri, 22 Nov 2002 02:11:01 GMT
Server: Apache/1.3.26 (Unix) PHP/4.2.3 FrontPage/5.0.2.2510
X-Powered-By: PHP/4.2.3
X-Content-Parsed-By: WebDev::ContentManagement
Content-Style-Type: text/css
Expires: Mon, 26 Jul 1997 05:00:00 GMT
Last-Modified: Fri, 22 Nov 2002 02:11:11 GMT
Cache-Control: must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Content-Language: de
Transfer-Encoding: chunked
Content-Type: text/html; charset=ISO-8859-1

The body is the requested resource, i.e. image, pdf or html document.

The Content-Type line (shown in red) is how you specify the character encoding in HTTP.

In PHP you would use code like this to produce such a header:

header("Content-Type: text/html; charset=ISO-8859-1");
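A slightly fuller sketch of the same idea, in case it helps. The helper content_type_header() is a made-up name, the typed signature assumes PHP 7 or later, and the headers_sent() guard is my addition: header() only takes effect if nothing has been output yet, so the call has to come before any markup or stray whitespace.

```php
<?php
// Hypothetical helper, not from the thread: formats a Content-Type header
// line that carries the character encoding.
function content_type_header(string $mediaType, string $charset): string
{
    return 'Content-Type: ' . $mediaType . '; charset=' . $charset;
}

// header() must run before any output is sent, so guard the call.
if (!headers_sent()) {
    header(content_type_header('text/html', 'ISO-8859-1'));
}
```

With the encoding declared this way, the document itself can start directly with the doctype.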

Andreas


added: Sorry that the code is displayed in such a small font but that is due to using [ red ] within the above [ pre ] element. Not my fault.

[edited by: andreasfriedrich at 2:19 am (utc) on Nov. 22, 2002]

hpche

2:17 am on Nov 22, 2002 (gmt 0)

10+ Year Member



Would doing something like this work?

<? echo "<?xml version=\"1.0\"?>";?>

andreasfriedrich

2:21 am on Nov 22, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Indeed, this would solve the problem of the parsing error. It will not solve IE6's bug as described above.
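For what it is worth, the echo workaround could be wrapped up roughly like this. The function xml_declaration() is a made-up name and the typed signature assumes PHP 7.1 or later, so treat it as a sketch rather than something tested on the PHP 4 setup in this thread. The point is only that the literal <?xml never appears outside PHP tags, so the parser has nothing to choke on.

```php
<?php
// Hypothetical helper, not from the thread: builds the XML declaration as a
// string so PHP never sees a raw "<?xml" in the template part of the file.
function xml_declaration(string $version = '1.0', ?string $encoding = null): string
{
    $declaration = '<?xml version="' . $version . '"';
    if ($encoding !== null) {
        $declaration .= ' encoding="' . $encoding . '"';
    }
    return $declaration . '?>' . "\n";
}

// Emit the declaration as the very first output of the page.
echo xml_declaration('1.0', 'utf-8');
```

If you control php.ini, turning off the short_open_tag setting should also stop PHP from treating <? as an opening tag, but nobody in this thread has tried that here.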

Andreas

Birdman

2:31 am on Nov 22, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



So I just used Brett's header check tool and get this:

HTTP/1.1 200 OK
Date: Fri, 22 Nov 2002 02:28:32 GMT
Server: Apache/1.3.26 (Unix) mod_log_bytes/0.3 mod_bwlimited/1.0 PHP/4.2.3 FrontPage/5.0.2.2510 mod_ssl/2.8.10 OpenSSL/0.9.6b
X-Powered-By: PHP/4.2.3
Connection: close
Content-Type: text/html

So should I use hpche's solution? Thanks for the replies.

andreasfriedrich

2:37 am on Nov 22, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well, you could use it, but then there is still the IE6 problem.

Did you get this header after adding the

header("Content-Type: text/html; charset=ISO-8859-1");

to the php document you requested using the header checker?

Andreas

Birdman

2:51 am on Nov 22, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Oops! Missed that. I must be tired. It's now showing the character encoding bit, so thank you again AF.

andreasfriedrich

2:54 am on Nov 22, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It is shown in a rather small font after all. ;)