UTF-8? Should I be using this, or...?

         

Wlauzon

9:29 pm on Feb 20, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I currently have my default blank page set up like the snippet below, but I am not sure whether I should be using UTF-8 or something else. I am also not sure whether there is any good reason to keep the "script" statement.
------------

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>

<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
<LINK REL=StyleSheet HREF="fpweb:///windsun.css" TYPE="text/css" MEDIA=screen>
<META http-equiv="Content-Script-Type" content="text/javascript">
<META name="description" content="good stuff here">

<title>New Page 1</title>

</head>

<body>

encyclo

2:10 am on Feb 21, 2006 (gmt 0)

There are several things wrong with your template. The first is that you should virtually never use an XML prolog for XHTML: its presence pushes IE6 into quirks mode. IE7 fixes this behaviour, so it is important to be consistent; otherwise a page that only looks right in IE6 quirks mode could break in IE7. The doctype should be on the very first line of your page.

The second problem is that you are declaring the charset as UTF-8 with the prolog, then redefining it as windows-1252 in the meta element just below. You need to decide which encoding you are using and declare it once, either in a meta element or via an HTTP header.
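As a rough illustration of the conflict described above, here is a minimal Python sketch that pulls the encoding out of the XML prolog and out of the meta element and reports whether they disagree. The helper names (declared_encodings, has_conflict) are made up for this example, and the regular expressions are a quick approximation rather than a real HTML parser.

```python
import re

# Hypothetical helpers for illustration; the regexes are approximate, not a full parser.
PROLOG_RE = re.compile(r'<\?xml[^>]*\bencoding\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)
META_RE = re.compile(r'<meta[^>]*\bcharset\s*=\s*["\']?([\w-]+)', re.IGNORECASE)


def declared_encodings(markup: str) -> dict:
    """Return the encodings named in the XML prolog and in the meta element, if any."""
    found = {}
    prolog = PROLOG_RE.search(markup)
    if prolog:
        found["prolog"] = prolog.group(1).lower()
    meta = META_RE.search(markup)
    if meta:
        found["meta"] = meta.group(1).lower()
    return found


def has_conflict(markup: str) -> bool:
    """True when the page declares more than one distinct encoding."""
    return len(set(declared_encodings(markup).values())) > 1
```

Run against the original template, this would report utf-8 from the prolog and windows-1252 from the meta element, which is exactly the mismatch to resolve by picking one encoding and declaring it once.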

Your meta elements are wrong in two ways. Firstly, you are using capitals, whereas XHTML syntax only allows lower-case element and attribute names. Secondly, you need to close meta elements with a trailing slash.

Finally, you are not declaring the document language (you should add a lang attribute to the <html> tag), and your Content-Script-Type meta element is unnecessary as you declare the type when including a script.

You should put your pages through the HTML validator [validator.w3.org] to better identify problems. :)

Wlauzon

8:47 am on Feb 21, 2006 (gmt 0)

Sigh.. Had 13 errors - mostly the non lower case stuff - before it even got to the <body> part.

OK, let's try this again...
-----------------------------

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html lang='en' xml:lang='en' xmlns='http://www.w3.org/1999/xhtml'>

<!-- #BeginTemplate "main.dwt" -->

<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<link rel="stylesheet" href="stylesheet.css" type="text/css" media="screen" />
<meta name="description" content="stuff here" />

This at least validates now.

JAB Creations

7:35 pm on Feb 21, 2006 (gmt 0)

The encoding you should use depends on the needs of what you are creating the page for. If you're writing the page in English then your encoding should look like this...

<?xml version="1.0" encoding="iso-8859-1"?>

You should also check which encoding the file is actually saved in when you are at the "Save As..." prompt in your editor.
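One way to double-check what the editor really wrote to disk is to try decoding the file's bytes. The sketch below is only an illustration: decodes_as is a made-up helper name, and the check is mainly useful for UTF-8, because iso-8859-1 accepts any byte sequence and so can never fail.

```python
def decodes_as(raw: bytes, encoding: str) -> bool:
    """Return True if the raw bytes are valid in the given encoding."""
    try:
        raw.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# Usage sketch ("page.html" is a placeholder file name):
#   with open("page.html", "rb") as handle:
#       print(decodes_as(handle.read(), "utf-8"))
```

If a page declares UTF-8 but contains accented characters saved as windows-1252, this check would be expected to fail, which is a hint that the declaration and the saved file do not match.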

Not trying to undermine encyclo in ANY way, but Internet Explorer divides what is right from what is practiced. Your encoding is supposed to be declared in your XML declaration in XHTML. You could use meta data, but I believe this method of declaring the encoding has been deprecated. Someone may be able to clarify in which version, though...

XHTML is an application of XML, so you should declare XML first. As a lot of SGML junk is allowed to get through into Transitional XHTML, you can get away without an XML declaration. Like encyclo said, anything before the doctype in IE6 or earlier throws IE into quirks mode, so it is a headache to say the least...

But if you do not plan on using non-English characters then you should stick to iso-8859-1 or, better yet, take other people's advice (like encyclo's) on the encoding topic more specifically. I just wanted to clarify the "Save As..." dialog and where it is proper to declare the encoding on the page itself.

We've been having an interesting discussion about UTF-8 at this thread...
[webmasterworld.com...]

encyclo

8:26 pm on Feb 21, 2006 (gmt 0)

"Internet Explorer divides what is right from what is practiced"

Agreed, but the problem is that there is often a clash between de facto and de jure standards. Virtually every implementation of XHTML is served as HTML, a context where the XML prolog has no meaning. Secondly, as previously mentioned, the presence of a prolog triggers quirks mode in IE6. It shouldn't, but the real-world browser share means that using the prolog must be considered bad practice. Finally, even if you are serving XHTML with an appropriate MIME type, the prolog is optional anyway when using UTF-8.

The combination of the above reasons makes the use of an XML prolog a bad idea. You should declare the encoding either with an HTTP header sent by the server, or with a meta charset element (when serving XHTML as HTML).
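To make the two options above concrete, here is a small Python sketch of how the charset can be read from a Content-Type header, falling back to the meta element when the header carries none. Under HTML 4.01 the HTTP header takes precedence over a meta declaration. The function names are invented for this example and the meta regex is only an approximation.

```python
import re
from email.message import Message
from typing import Optional

# Approximate pattern for a meta charset declaration; not a full HTML parser.
META_RE = re.compile(r'<meta[^>]*\bcharset\s*=\s*["\']?([\w-]+)', re.IGNORECASE)


def charset_from_header(content_type: str) -> Optional[str]:
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lower-cased name, or None if absent


def effective_charset(content_type: str, markup: str) -> Optional[str]:
    """Header charset wins; the meta element is only a fallback."""
    from_header = charset_from_header(content_type)
    if from_header:
        return from_header
    meta = META_RE.search(markup)
    return meta.group(1).lower() if meta else None
```

The practical consequence is that a server configured to send a charset in the header overrides whatever the page's meta element says, so the two should agree.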

Wlauzon

9:46 pm on Feb 21, 2006 (gmt 0)

I have tried it with and without the prolog in IE and Firefox, and in Firefox it seems to make no difference, but in IE it adds a tiny 1 or 2 px margin.. not sure what is going on to cause that. I only tried the UTF-8 prolog so not certain if it was the UTF-8 part or the prolog that was doing it.

I have searched the net for the use of the prolog, and opinions seem to be all over on it, but most larger sites do not seem to have it. But then again, it seems that just having a valid DOCTYPE at all puts me ahead of 90% of the websites out there :/

I am trying to get all this "infrastructure" right the first time, because several hundred pages will be based on it, and it is a lot easier to get it right now than to go back and change it later.

I have seen some sites that recommend that all new sites should be UTF-8, but it also seems that some browsers don't like it.. so not sure what to do there. All we use is English/Euro so not sure if there is any compelling reason to use it.
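For what it is worth, the practical difference between the encodings only shows up outside plain ASCII. The Python sketch below (encoded_forms is a made-up helper name) encodes a string in UTF-8, iso-8859-1 and windows-1252 and returns None where an encoding cannot represent the text. It assumes "English/Euro" content means ASCII plus accented letters and the euro sign.

```python
def encoded_forms(text: str, encodings=("utf-8", "iso-8859-1", "windows-1252")) -> dict:
    """Map each encoding name to the bytes it produces, or None if it cannot encode the text."""
    forms = {}
    for name in encodings:
        try:
            forms[name] = text.encode(name)
        except UnicodeEncodeError:
            forms[name] = None
    return forms


# Plain English text: identical bytes in all three encodings.
print(encoded_forms("New Page 1"))

# The euro sign: three bytes in UTF-8, one byte (0x80) in windows-1252,
# and not representable at all in iso-8859-1.
print(encoded_forms("\u20ac"))
```

So for pure ASCII pages the choice makes no difference to the bytes on disk, but as soon as a euro sign or accented letter appears, the declared encoding has to match the one the file was saved in.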

You would think that W3C would give more info, but if you look at their pages, some have the prolog, some don't, some use UTF-8, some use nothing, some have the ISO..

This is one header from W3C:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>Compound Document Formats (CDF)</title>
<link rel="stylesheet" type="text/css" media="screen" href="/StyleSheets/base.css"/>
<link rel="stylesheet" type="text/css" media="screen" href="style.css"/>
<!-- todo: add handheld, print and alternate styles -->
</head>
<body>

Yet their main page uses UTF-8 and the prolog:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"

So anyway, it appears that for the time being.. there really are no standards?