Character encoding mismatch

arachnoid

2:58 pm on Sep 30, 2007 (gmt 0)

I'm in the final stages of redesigning and recoding a site I look after, including uprating to XHTML 1.0 Strict. I've been doing this for all my sites with no problem, but with this one I'm getting a character encoding mismatch when checking W3C validation.

The error message states:

The character encoding specified in the HTTP header (iso-8859-1) is different from the value in the <meta> element (utf-8). I will use the value from the HTTP header (iso-8859-1) for this validation.

This is my header:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title></title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

and this is from a page on another site which passes validation with no such error:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title></title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

Either I'm blind or these page headers are absolutely identical. Does anyone have a scooby why this is happening and what I can do about it?

If it's in any way relevant, the old site is still 'live' at the moment with the new site in a sub-folder, and the old site IS iso-8859-1, but that's the only difference I can finger between this site and all the others I've done which have passed validation.

Marshall

7:08 pm on Sep 30, 2007 (gmt 0)

Is there by chance an ISO-8859-1 meta tag buried in the <head> in addition to the UTF-8 tag?

Marshall

encyclo

7:22 pm on Sep 30, 2007 (gmt 0)

In Firefox, install the "Live HTTP headers" extension (if you don't have it already). When you load the page, you will see in the response headers that ISO-8859-1 is defined as the document charset.
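For anyone who would rather script this check than install a browser extension, here is a minimal Python sketch using only the standard library. The function names are my own, and `served_charset` assumes the page is reachable over plain HTTP(S) without authentication; treat it as an illustration of the idea, not a replacement for the extension.

```python
from email.message import Message
from urllib.request import urlopen


def charset_from_content_type(value):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()


def served_charset(url):
    """Fetch url and return the charset declared in the HTTP response header."""
    with urlopen(url) as response:
        content_type = response.headers.get("Content-Type", "text/html")
        return charset_from_content_type(content_type)
```

Calling `served_charset` with the address of the page in question returns whatever charset the server announces, regardless of what the meta element inside the page says.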

Assuming you're using Apache, you can either unset the default charset, or (if all your documents use the same charset) you can set the header for your actual charset in .htaccess or httpd.conf.

[httpd.apache.org...]

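To unset the default charset, so that no charset is added to the HTTP header:

AddDefaultCharset Off

To send UTF-8 as the default charset instead:

AddDefaultCharset utf-8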

arachnoid

7:29 pm on Sep 30, 2007 (gmt 0)

> Is there by chance an ISO-8859-1 meta tag buried in the <head> in addition to the UTF-8 tag?

Ahhh ... nothing so deliciously simple, I'm afraid. The header is identical in every respect (except for linked files, etc), to all my other sites which I've had no problem with. The pages were created in DW with XHTML 1.0 Strict + utf-8 selected as the defaults for new page creation and I haven't done anything different with the site to what I've done everywhere else that validates.

Unless the W3C validator swings by the index page on its way to the subfolder and picks up a doctype from there (which seems a bit illogical, but I know very little about its workings), I can't figure where it's getting this from.

I searched for this on the forums before posting, and someone had a similar problem a while back that people suggested had something to do with server configuration. I didn't understand it, hence my posting my problem now.

arachnoid

7:31 pm on Sep 30, 2007 (gmt 0)

> Assuming you're using Apache, you can either unset the default charset, or (if all your documents use the same charset) you can set the header for your actual charset in .htaccess or httpd.conf.

OK ... this is a bit over my head I'm afraid. Is this something I should take up with the hosting company?

jdMorgan

7:40 pm on Sep 30, 2007 (gmt 0)

> The header is identical in every respect (except for linked files, etc)

I'd suggest you re-read encyclo's reply carefully. You need to check your server response headers in addition to the HTML page headers you've posted. Follow his instructions to check them if you have not already done so.

Jim

arachnoid

7:54 pm on Sep 30, 2007 (gmt 0)

> I'd suggest you re-read encyclo's reply carefully. You need to check your server response headers in addition to the HTML page headers you've posted. Follow his instructions to check them if you have not already done so.

Am installing the Firefox extension now, though I'm unlikely to be able to check the server response headers tonight as the server is so slow the pages are timing out. The hosting company did a server migration last week and there have been problems since: from about 6pm Dutch time (where the servers are based), the site is pretty much inaccessible, but I can't take this up with them until tomorrow.

Can you explain to me how it is that the server response headers differ from what I'm instructing it to serve via the HTML page headers? This is not something I've come across before so please bear with my ignorance.

encyclo

8:11 pm on Sep 30, 2007 (gmt 0)

> Is this something I should take up with the hosting company?

> The hosting company did a server migration last week

If this is a new server, then yes you should certainly take up the issue with your hosting provider. The heart of the problem is a misconfigured server - many "stock" installs incorrectly set a default charset which is the same as the default charset of the server's OS. The hosting company needs to comment out this setting in the Apache configuration, especially if this is a shared hosting environment.

According to the specification, a charset defined via an HTTP header cannot be overridden by a meta charset element.

(Incidentally, your meta charset element should ideally be placed above your title element.)
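To make the precedence rule concrete, here is a small Python sketch of how a consumer such as the validator might decide which charset to use. The function names and the regular expression are my own simplification, not the validator's actual code; real parsers handle more cases (attribute order, byte-order marks, and so on).

```python
import re

# Simplified pattern: finds charset=... inside a <meta ...> tag.
META_CHARSET_RE = re.compile(
    r"""<meta[^>]+charset\s*=\s*["']?([A-Za-z0-9_-]+)""", re.IGNORECASE
)


def meta_charset(html):
    """Return the lowercased charset declared in a meta element, or None."""
    match = META_CHARSET_RE.search(html)
    return match.group(1).lower() if match else None


def effective_charset(http_charset, html):
    """Return (charset, source). http_charset is the HTTP header value or None."""
    if http_charset:
        # The HTTP header wins; the meta element is only a fallback.
        return http_charset.lower(), "HTTP header"
    declared = meta_charset(html)
    if declared:
        return declared, "meta element"
    return None, "undeclared"
```

With the header from this thread, `effective_charset("iso-8859-1", page)` reports the HTTP header as the source even though the page itself declares UTF-8. Pass `None` for the header and the meta element takes over, which is the situation on the sites that validate cleanly.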

arachnoid

8:26 pm on Oct 1, 2007 (gmt 0)

> The heart of the problem is a misconfigured server - many "stock" installs incorrectly set a default charset which is the same as the default charset of the server's OS. The hosting company needs to comment out this setting in the Apache configuration, especially if this is a shared hosting environment.

Thanks very much for this encyclo.

I checked the server response headers this morning and right enough ISO-8859-1 is being served. I spoke to the hosting company and they later emailed the site owner to say that because the old site is ISO-8859-1 they've maintained that for the server migration and I should continue to do the same. I don't think they quite get the fact that the entire site is being replaced, so we need to clear that up. Aside from that, is there any objection they could possibly have to switching to UTF-8? It's going to save me a deal of time and hassle when it comes to the German and French editions of the site. I only wish I'd had it already for the Portuguese -- if I have to type in yet another character entity reference ...
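On the character entity point: once the pages are served as UTF-8, existing entity references can be converted to literal characters in bulk. The following is a rough Python sketch under my own assumptions: the function names are hypothetical, only named entities are handled, and ASCII entities such as `&amp;` and `&lt;` are deliberately left alone so the markup stays valid.

```python
import re
from html.entities import name2codepoint

ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def entities_to_text(markup):
    """Replace named entities for non-ASCII characters with the characters themselves."""
    def replace(match):
        codepoint = name2codepoint.get(match.group(1))
        # Leave unknown names and markup-significant ASCII entities untouched.
        if codepoint is None or codepoint < 128:
            return match.group(0)
        return chr(codepoint)

    return ENTITY_RE.sub(replace, markup)


def latin1_to_utf8(data):
    """Re-encode bytes from ISO-8859-1 to UTF-8."""
    return data.decode("iso-8859-1").encode("utf-8")
```

For example, `entities_to_text("informa&ccedil;&atilde;o")` gives `informação`, which can then be saved as UTF-8. Run anything like this on copies of the files first.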