IE ignores specified content encoding

Bashing my head against the wall...

ninja_k

8:04 am on Sep 26, 2005 (gmt 0)



I have a problem, and hopefully I'm not breaking the TOS by describing the target audience...

I run a Japanese/English blog on one of those major services (XXXXX), and it has a fair number of random readers. Some of them get updates via an RSS reader, some via e-mail (generated by the service), and still others just visit the web page every once in a while.

It's the last group that has become a problem with IE 6.0.29. I'm not sure what changed, but IE has turned belligerent, for lack of a better word, and my end users aren't savvy enough to know how to fix things.

Until recently I specified the encoding of the page with
<content="text/html; charset=utf-8" http-equiv="content-type">
but recently, after the blog provider mucked things up by sticking windows-1252 into the server response, I learned that I shouldn't have been using this tag.
According to the W3C, I should have been using
<http-equiv="Content-Type" content="text/html; charset=utf-8">
which was actually rather futile, since the Content-Type header in the server's response overrides anything declared at the page level (as I read the spec).
Anyway, the server has now been fixed to return a content type of just text/html, without specifying the charset. Great, we're back in business. Wrong. The page is broken in IE and I can't for the life of me figure out why.
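For illustration, the precedence described above (HTTP header first, META declaration second, otherwise the browser guesses) can be sketched in Python. This is a simplified model under stated assumptions, not how IE actually works: the regex only recognises a META tag whose http-equiv attribute comes before content, and the function name is hypothetical.

```python
import re
from typing import Optional

# Matches a charset parameter such as: charset=utf-8 or charset="UTF-8"
_CHARSET_RE = re.compile(r'charset\s*=\s*["\']?([A-Za-z0-9._:-]+)', re.IGNORECASE)

# Matches <meta http-equiv="Content-Type" content="..."> (http-equiv before content only)
_META_RE = re.compile(
    r'<meta\s+[^>]*http-equiv\s*=\s*["\']?content-type["\']?[^>]*content\s*=\s*["\']([^"\']*)["\']',
    re.IGNORECASE,
)


def charset_from_content_type(value: str) -> Optional[str]:
    """Extract the charset parameter from a Content-Type value, if present."""
    m = _CHARSET_RE.search(value)
    return m.group(1).lower() if m else None


def resolve_charset(http_content_type: Optional[str], html: str) -> Optional[str]:
    """Hypothetical helper: return the declared charset, or None if the browser must guess."""
    # 1. A charset in the HTTP Content-Type header wins.
    if http_content_type:
        cs = charset_from_content_type(http_content_type)
        if cs:
            return cs
    # 2. Otherwise fall back to a META http-equiv declaration in the page.
    m = _META_RE.search(html)
    if m:
        return charset_from_content_type(m.group(1))
    # 3. Nothing usable was declared; the browser falls back to its own heuristics.
    return None
```

With the header reduced to plain `text/html`, everything hinges on the browser actually recognising the META declaration in the page.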

the start of the page looks like:


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<HTML lang="en">
<HEAD>
<http-equiv="Content-Type" content="text/html; charset=utf-8">
<TITLE>Walking another path - 俺の日本でも大冒険談</TITLE>

On my copy of IE, and on others as well, the page always comes up blank white, with no content displayed. The selected encoding, without fail, is shift_jis. If I *manually* set the page to UTF-8 encoding, the page displays without issue. Firefox is fine and chooses the correct encoding.

I'm not sure what the problem is, I don't think I have any tools available to diagnose it, and it's making me a very unhappy ninja. Has anyone seen this problem before? What am I doing wrong? Are there any tools out there that can help me figure out what is wrong?

And for the kicker: if I load the page source into UltraEdit, UltraEdit correctly detects the character encoding as UTF-8, and IE displays the same HTML correctly when loading it locally.
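One way to repeat UltraEdit's check without an editor is a small Python sketch that strictly decodes the saved page as UTF-8 and reports the first byte offset that is not valid UTF-8. The function name and the file path passed on the command line are hypothetical; this only checks the bytes, not what any browser decides to do with them.

```python
import sys
from typing import Optional


def first_invalid_utf8_offset(body: bytes) -> Optional[int]:
    """Return the offset of the first byte that is not valid UTF-8, or None if it all decodes."""
    try:
        body.decode("utf-8")  # strict mode raises on the first malformed sequence
    except UnicodeDecodeError as err:
        return err.start
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python check_utf8.py saved_page.html
    with open(sys.argv[1], "rb") as f:
        data = f.read()
    offset = first_invalid_utf8_offset(data)
    if offset is None:
        print("valid UTF-8 throughout")
    else:
        print(f"invalid UTF-8 at byte {offset}: {data[offset:offset + 8]!r}")
```

If the bytes are valid UTF-8 throughout, the problem is more likely in how the encoding is declared than in the content itself.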

big thanks in advance,
all the way from Japan!

JAB Creations

11:31 am on Sep 26, 2005 (gmt 0)



Hi Ninja...

I'm not big on encodings and such, but do you have the option of serving different encodings server-side based on the user agent? If so, that is what I would try, as it seems simpler, or at least a quick fix. Either way, best of luck and have fun in Japan! ^.^

John

tomda

11:50 am on Sep 26, 2005 (gmt 0)



I just looked for more info about the lang attribute in the HTML tag (because I've never used it before) and saw this at the W3C. Maybe the way you wrote it is incorrect?
When serving XHTML as text/html, you should use both the lang attribute and the xml:lang attribute in the html element. The xml:lang attribute is the standard way to identify language information in XML. The following shows how you would mark up the previous example for XHTML 1.0 served as text/html.

<html lang="zh-CN" xml:lang="zh-CN" xmlns="http://www.w3.org/1999/xhtml">

Not sure it helps.

bill

12:14 pm on Sep 26, 2005 (gmt 0)



Welcome to WebmasterWorld ninja_k.

By declaring the language in the HTML tag as English, you may be causing the problems. I was making some kids' blogs in Japanese this weekend and just declared the charset, but didn't set the language (which I normally do). When testing the site, it looked fine on both Japanese and English OSes. This seems to work well with XHTML sites... but I need to do some more browser testing.

ninja_k

11:55 am on Sep 27, 2005 (gmt 0)



nothing works...

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<HTML lang="ja-JP" xml:lang="ja-JP" xmlns="http://www.w3.org/1999/xhtml">
<HEAD>
<http-equiv="Content-Type" content="text/html; charset=utf-8">
<TITLE>Walking another path</TITLE>
<LINK REL="alternate" TITLE="Nick's RSS" HREF="http://www.XXXXREMOVEDXXX.com/rss.aspx?user=nicklange" TYPE="application/rss+xml"/>
<style type="text/CSS">

thanks all.
made a couple of changes, no dice.
The original TITLE element (see the OP) being read as shift_jis is what causes the whole thing to garble in IE (the HTML is read wrong... yada yada yada). As a compromise to at least make the thing viewable, I've removed the Japanese from the title so the page will at least load (albeit still interpreted by IE as shift_jis).
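To see why a UTF-8 title turns into garbage under a Shift_JIS guess, a short Python sketch can encode the title as UTF-8 (what the server sends) and decode those same bytes as Shift_JIS (what a browser guessing Shift_JIS would do). This only demonstrates the general mis-decoding; it does not reproduce IE's exact rendering or the blank page.

```python
title = "俺の日本でも大冒険談"

# Bytes actually sent when the page is saved as UTF-8: 3 bytes per character here.
raw = title.encode("utf-8")

# The same bytes interpreted as Shift_JIS; undecodable sequences become U+FFFD.
misread = raw.decode("shift_jis", errors="replace")

print(len(title), "characters ->", len(raw), "bytes")
print("as UTF-8:    ", raw.decode("utf-8"))
print("as Shift_JIS:", misread)
```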

Is this a bug I should be reporting for IE? this is honestly the wackiest thing I've ever seen...

and sadly this is one of those "big company blogs", so there's no serving different versions based on the user agent :(

Why the heck does it ignore


<http-equiv="Content-Type" content="text/html; charset=utf-8">

is there something else I should be putting for IE?

Tidal2

12:35 pm on Sep 27, 2005 (gmt 0)



It may be worth erasing those first 6 lines (down to the <LINK ... tag) and then key them back in by hand.

Just in case you have an invisible control character embedded in there.
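Instead of retyping, a minimal Python sketch can list invisible characters: every control (Cc) or format (Cf) code point except tab, such as a zero-width space or a byte-order mark. The function name is hypothetical, and line-break characters are consumed by `str.splitlines()`, so they are not reported.

```python
import sys
import unicodedata
from typing import List, Tuple


def find_invisible_chars(text: str) -> List[Tuple[int, int, str]]:
    """Return (line, column, code point) for each control or format character except tab."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch == "\t":
                continue  # ordinary indentation, not suspicious
            if unicodedata.category(ch) in ("Cc", "Cf"):
                hits.append((line_no, col, f"U+{ord(ch):04X}"))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python find_invisible.py saved_page.html
    with open(sys.argv[1], encoding="utf-8") as f:
        for line_no, col, cp in find_invisible_chars(f.read()):
            print(f"line {line_no}, column {col}: {cp}")
```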

It's a long shot, but good luck.

Hester

12:41 pm on Sep 27, 2005 (gmt 0)



I've always written my code like this:


<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />

I think you need to specify the META tag there.

If that doesn't work, try having the server send the correct charset instead of none.

Or you could output the exact charset on each page using PHP or ASP. I've taken to doing this recently to force UTF-8 to work. When I'm ready, I'll add a rule to the ".htaccess" file in the root to make the whole site UTF-8 instead.
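The same idea as the PHP/ASP approach, sending the charset in the HTTP header from server-side code, is sketched here as a minimal Python WSGI app. The app and page content are hypothetical; the point is that the body is encoded with exactly the charset the header declares.

```python
from typing import Callable, Iterable, List, Tuple

PAGE = "<title>Walking another path - 俺の日本でも大冒険談</title>"


def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Minimal WSGI app that states the charset in the Content-Type header itself."""
    body = PAGE.encode("utf-8")  # encode with the same charset declared below
    headers: List[Tuple[str, str]] = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

For local testing it can be served with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`. On Apache, the site-wide `.htaccess` route is typically `AddDefaultCharset UTF-8`.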

Angelis

12:44 pm on Sep 27, 2005 (gmt 0)



Try these 3...

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS" />
<meta http-equiv="Content-Language" content="ja" />
</head>
</html>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-2022-jp" />
<meta http-equiv="Content-Language" content="ja" />
</head>
</html>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=euc-jp" />
<meta http-equiv="Content-Language" content="ja" />
</head>
</html>

ninja_k

4:28 pm on Sep 27, 2005 (gmt 0)



mystery solved.

As it turns out, Hester is correct: the actual META in <meta is required. Go figure.

Since I had done a direct copy/paste job from the W3C when I originally put the code in, I didn't even notice that the hosting provider was automatically erasing the lowercase "meta", leaving a tag that IE ignored.

I don't know why they do that, and I'm not inclined to find out. Once I leave this country, I'll start a blog on a server I control again.

Uppercase META is not filtered out, and IE happily complies.
This still doesn't explain why perfectly valid UTF-8 characters were being misinterpreted by IE, but I don't really care at this point either :D (if I ever get bored enough to go control-character hunting... maybe)
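The host's filter is only known from its observed effect, so the following is a hypothetical Python sketch that mimics it (case-sensitive removal of lowercase "meta") together with a simple check for the resulting symptom in the served source: a tag that starts directly with an attribute, such as `<http-equiv=...>`, because its element name was stripped.

```python
import re

# HYPOTHETICAL: mimics the observed behaviour only; the provider's real filter is unknown.
_LOWERCASE_META = re.compile(r"<meta\b\s*")  # case-sensitive on purpose: "META" survives

# A tag whose first token is immediately followed by "=" is an attribute, not an element name.
_NAMELESS_TAG = re.compile(r"<\s*[A-Za-z-]+\s*=")


def hypothetical_host_filter(html: str) -> str:
    """Strip lowercase 'meta' element names, leaving the attributes behind."""
    return _LOWERCASE_META.sub("<", html)


def has_nameless_tag(html: str) -> bool:
    """True if some tag appears to have lost its element name, e.g. <http-equiv="...">."""
    return _NAMELESS_TAG.search(html) is not None
```

Running `has_nameless_tag` over the page source as served, rather than the source you uploaded, would flag this kind of silent rewrite.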

thank you everyone for all your replies!

cheers,
nick