I run a Japanese/English blog on one of those major services (XXXXX), and it has a fair amount of random viewers. Some of them get updates via an RSS reader, some via e-mail (generated by the service), and still others just view the web page every once in a while.
It's the last case that's become a problem with IE 6.0.29. I'm not sure what changed... but IE has gone belligerent, for lack of a better word. And my end user isn't savvy enough to know how to fix things.
Until recently I specified the encoding of the page with
<content="text/html; charset=utf-8" http-equiv="content-type">
but just recently, after said blog provider mucked things up by sticking a windows-1252 charset in the server response, I learned that I shouldn't have been using this tag.
According to the W3C, I should have been using
<http-equiv="Content-Type" content="text/html; charset=utf-8">
which was actually rather futile since the server's content-type response overrides anything at the page level by default (as I read the spec)
anyway, the server has now been fixed to just return a content type of text/html w/o specifying the charset. Great, we're back in business. Wrong. IE display is broken and I can't for the life of me figure out why.
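For anyone following along, here is a rough Python sketch of the precedence described above: a charset in the HTTP Content-Type header wins, and the page-level declaration only counts when the header carries none. The function names `header_charset` and `effective_charset` are made up for illustration, and this models the spec's ordering only, not what IE actually does internally.

```python
from email.message import Message
from typing import Optional


def header_charset(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type header value, if any."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lowercases the value and returns None when absent.
    return msg.get_content_charset()


def effective_charset(content_type: str, meta_charset: Optional[str]) -> Optional[str]:
    """Header charset takes priority; otherwise fall back to the page-level value.

    A result of None means nothing was declared and the browser has to guess.
    """
    return header_charset(content_type) or meta_charset


if __name__ == "__main__":
    # The provider's earlier (broken) response, then the current one.
    print(effective_charset("text/html; charset=windows-1252", "utf-8"))
    print(effective_charset("text/html", "utf-8"))
    print(effective_charset("text/html", None))
```

With the header reduced to plain text/html, everything hinges on the page-level declaration actually being seen by the browser.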
the start of the page looks like:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<HTML lang="en">
<HEAD>
<http-equiv="Content-Type" content="text/html; charset=utf-8">
<TITLE>Walking another path - 俺の日本でも大冒険談</TITLE>
On my copy of IE, and others as well, the page always comes up as white/no content displayed. The selected encoding, w/o fail, is shift_jis. If I *manually* set the page to utf-8 encoding the page displays without issue. Firefox is fine and chooses the correct encoding.
I'm not sure what the problem is, but I don't think I have any tools available to diagnose it, and it's making me a very unhappy ninja. Has anyone seen this problem before? What am I doing wrong? Are there any tools out there that can help me figure out what is wrong?
And for the kicker: if I open the source of the page in UltraEdit, UltraEdit correctly detects the character encoding as UTF-8. Which means IE loads it fine if loading the same HTML locally.
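On the question of tools: one low-tech check is to take the raw bytes of the page and try decoding them under each candidate encoding. The sketch below is only an illustration in Python; `try_decode` is a hypothetical helper, and the sample text is just the Japanese part of the page title quoted above.

```python
from typing import Optional

# Japanese part of the page title, used here as sample content.
TITLE = "俺の日本でも大冒険談"


def try_decode(data: bytes, encoding: str) -> Optional[str]:
    """Decode data with the given encoding, or return None if the bytes are invalid for it."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    raw = TITLE.encode("utf-8")  # what a UTF-8 page actually sends over the wire
    for enc in ("utf-8", "shift_jis"):
        text = try_decode(raw, enc)
        if text is None:
            print(enc, "-> bytes are not valid in this encoding")
        else:
            # ascii() keeps the output printable on any console.
            print(enc, "->", ascii(text), "(round-trips)" if text == TITLE else "(garbled)")
```

If the bytes only round-trip as UTF-8, the file itself is fine and the problem is in how the browser picks the encoding.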
big thanks in advance,
all the way from Japan!
When serving XHTML as text/html, you should use both the lang attribute and the xml:lang attribute in the html element. The xml:lang attribute is the standard way to identify language information in XML. The following shows how you would mark up the previous example for XHTML 1.0 served as text/html.
<html lang="zh-CN" xml:lang="zh-CN" xmlns="http://www.w3.org/1999/xhtml">
Not sure it helps.
By declaring the language in the HTML tag there as English you may be causing the problems. I was making some kid's blogs in Japanese this weekend, and just declared the charset, but didn't set the language (as I normally do). When testing out the site it looked fine on both Japanese and English OS. This seems to work well with XHTML sites...but I need to do some more browser testing.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<HTML lang="ja-JP" xml:lang="ja-JP" xmlns="http://www.w3.org/1999/xhtml">
<HEAD>
<http-equiv="Content-Type" content="text/html; charset=utf-8">
<TITLE>Walking another path</TITLE>
<LINK REL="alternate" TITLE="Nick's RSS" HREF="http://www.XXXXREMOVEDXXX.com/rss.aspx?user=nicklange" TYPE="application/rss+xml"/>
<style type="text/CSS">
Is this a bug I should be reporting for IE? this is honestly the wackiest thing I've ever seen...
and sadly this is one of those "big company blogs", so no serving different versions based on the user_agent :(
Why the heck does it ignore
<http-equiv="Content-Type" content="text/html; charset=utf-8">
is there something else I should be putting for IE?
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
I think you need to specify the META tag there.
If that doesn't work, try adding the correct charset to the server instead of using none.
Or you could output the exact charset on each page using PHP or ASP. I've taken to doing this recently to force UTF-8 to work. When I'm ready I'll add a rule to make the whole site UTF-8 using the ".htaccess" file in the root instead.
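To make the "output the exact charset on each page" idea concrete, here is a minimal sketch. It uses Python's WSGI interface rather than the PHP or ASP mentioned above, purely as a stand-in; `PAGE`, `app`, and `serve` are hypothetical names, and the point is only that the Content-Type header carries an explicit charset matching the bytes sent.

```python
from wsgiref.simple_server import make_server

# Placeholder page content; any UTF-8 text would do.
PAGE = "<html><head><title>俺の日本でも大冒険談</title></head><body></body></html>"


def app(environ, start_response):
    """WSGI app that always declares UTF-8 in the HTTP Content-Type header."""
    body = PAGE.encode("utf-8")  # encode with the same charset we declare
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def serve(port: int = 8000) -> None:
    """Serve the app locally for manual testing in a browser (blocks until interrupted)."""
    with make_server("localhost", port, app) as httpd:
        httpd.serve_forever()
```

Calling `serve()` and loading http://localhost:8000/ in the problem browser is one way to see whether an explicit header charset changes the behaviour.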
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS" />
<meta http-equiv="Content-Language" content="ja" />
</head>
</html>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-2022-jp" />
<meta http-equiv="Content-Language" content="ja" />
</head>
</html>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=euc-jp" />
<meta http-equiv="Content-Language" content="ja" />
</head>
</html>
As it turns out, hester is correct: the actual META in <meta ...> is required. Go figure.
As I had done a direct copy/paste job from the W3C when I originally put the code in, I didn't even notice that the hosting provider was automatically erasing lowercase "meta", hence leaving an ignored tag.
I don't know why they do that, I'm not inclined to find out. Once I leave this country, I'll start my own server controlled blog again.
uppercase META is not filtered out, and IE happily complies.
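A quick way to catch this kind of silent mangling is to scan the page as actually served and see whether a meta charset declaration survives. The Python sketch below is only an approximation: `declared_charset` is a hypothetical helper built on a simple regular expression, not a real HTML parser, and the two sample strings are the stripped and uppercase tags from this thread.

```python
import re
from typing import Optional

# Matches "<meta ... charset=VALUE" in any letter case; deliberately simple.
_META_CHARSET = re.compile(
    r"<meta\b[^>]*?charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)",
    re.IGNORECASE,
)


def declared_charset(html: str) -> Optional[str]:
    """Return the charset named in a meta tag, lowercased, or None if no such tag is found."""
    match = _META_CHARSET.search(html)
    return match.group(1).lower() if match else None


# What the provider left behind after erasing lowercase "meta".
STRIPPED = '<http-equiv="Content-Type" content="text/html; charset=utf-8">'
# The uppercase form, which was not filtered.
INTACT = '<META http-equiv="Content-Type" content="text/html; charset=utf-8">'

if __name__ == "__main__":
    print("stripped tag:", declared_charset(STRIPPED))
    print("intact tag:  ", declared_charset(INTACT))
```

Running a check like this against the published page, rather than the text typed into the editor, shows what the browser really receives.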
This still doesn't explain why perfectly valid UTF-8 characters are being misinterpreted in IE, but I don't really care at this point either :D (if I ever get bored enough to go control-character hunting... maybe)
thank you everyone for all your replies!
cheers,
nick