Forum Moderators: coopster & jatar k

Japanese text outputs Question Marks

Don't understand what I'm missing

     
7:03 am on Sep 3, 2008 (gmt 0)

Preferred Member

10+ Year Member

joined:Aug 20, 2004
posts: 615
votes: 0


Hello All -

I've been fighting with this on and off for days - and I've been all over the web reading about this apparently typical issue - but I just can't see where I'm going wrong.

So far, here's what I've got:

mysql database properties:
character set: utf8 -- UTF-8 Unicode
collation: utf8_general_ci

table field properties:
character set: utf8 -- UTF-8 Unicode
collation: utf8_general_ci

Page DocType:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="ja" lang="ja">

Content Type Meta Tag:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

When I put Japanese text into a DB FIELD, it's all good: characters are Japanese (not garbled or "?")

If I simply echo a string of Japanese characters, those show up fine as well.

Gosh, what am I missing? Do I need to tweak my .ini settings?

Any help greatly appreciated

Neophyte

9:55 am on Sept 3, 2008 (gmt 0)

Senior Member

penders: WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month

joined:July 3, 2006
posts: 3127
votes: 1


When I put Japanese text into a DB FIELD, it's all good: characters are Japanese (not garbled or "?")

If I simply echo a string of Japanese characters, those show up fine as well.

So, is it the Japanese text in the HTML page itself that doesn't show up correctly (i.e. as question marks)? Is the page file itself actually saved as UTF-8?

11:03 am on Sept 3, 2008 (gmt 0)

Preferred Member

10+ Year Member

joined:Aug 20, 2004
posts: 615
votes: 0


Hi Penders... thanks for the reply.

It's the text (fed from the DB) that shows up all question marks. If I just "manually" echo a string of Japanese characters on a page, that shows up okay. But when database text is displayed, that's the text that shows up as question marks.

11:39 pm on Sept 3, 2008 (gmt 0)

Junior Member

10+ Year Member

joined:June 6, 2005
posts: 109
votes: 0


I reckon there's an issue with the way PHP is handling the strings.

Don't think posting the link is allowed, but if you search for "php utf-8 cheat sheet" you'll find an article that has a pretty good explanation of getting PHP working with UTF-8.

Edit: Forgot to say it's also worth trying mysql_query("SET NAMES 'utf8'") after you create the DB connection.
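A minimal sketch of the same idea on newer PHP versions, where the old mysql_* functions no longer exist (they were removed in PHP 7.0). It assumes the mysqli extension; the function name connect_utf8 and all connection parameters are hypothetical placeholders, not anything from this thread. mysqli's set_charset() sets the connection character set much as SET NAMES does, and it also tells the client library which encoding is in use.

```php
<?php
// Hypothetical helper: open a mysqli connection and switch it to UTF-8.
// Host, user, password and database names are placeholders supplied by the caller.
function connect_utf8(string $host, string $user, string $pass, string $db, string $charset = 'utf8'): mysqli
{
    $link = new mysqli($host, $user, $pass, $db);
    if ($link->connect_errno) {
        throw new RuntimeException('Connect failed: ' . $link->connect_error);
    }

    // Set the connection character set before running any query,
    // so text coming back from the server is not converted to "?".
    if (!$link->set_charset($charset)) {
        throw new RuntimeException('Could not set charset: ' . $link->error);
    }

    return $link;
}
```

The charset defaults to 'utf8' only because that is what the tables in this thread use; pass a different name if your tables differ.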

[edited by: MattAU at 11:47 pm (utc) on Sep. 3, 2008]

11:59 pm on Sept 4, 2008 (gmt 0)

Preferred Member

10+ Year Member

joined:Aug 20, 2004
posts: 615
votes: 0


Matt -

Thank you, thank you for the "SET NAMES" idea - implemented same and BAM! Japanese text! Wow. Went through the cheat sheet "link" that you mentioned - what a GREAT resource!

From reading that link I've implemented the following so far after my DB connection:

1. mysql_query("SET CHARACTER SET 'utf8'");
2. mysql_query("SET NAMES 'utf8' COLLATE 'utf8_general_ci'");
3. Uncommented extension=php_mbstring.dll in my .ini
4. mbstring.language = Japanese in my .ini

and, as mentioned, all is well now.
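For anyone checking a setup like this, here is a small diagnostic sketch. It assumes the mbstring extension is loaded; describe_text is a hypothetical helper name. It reports whether a string is valid UTF-8, how many bytes and characters it has, and whether it consists only of literal "?" characters. If text fetched from the DB arrives as literal question marks, the characters were most likely lost in the connection's character set conversion before PHP ever saw them.

```php
<?php
// Hypothetical diagnostic helper: summarise what a string actually contains.
function describe_text(string $text): array
{
    return [
        // True if the bytes form a valid UTF-8 sequence.
        'valid_utf8' => mb_check_encoding($text, 'UTF-8'),
        // Raw byte count (strlen is not multibyte-aware).
        'bytes' => strlen($text),
        // Character count when the bytes are read as UTF-8.
        'chars' => mb_strlen($text, 'UTF-8'),
        // True if the string is non-empty and made up only of "?" characters.
        'all_question_marks' => $text !== '' && trim($text, '?') === '',
    ];
}

print_r(describe_text('日本語'));
```

For the three-character string above this reports 9 bytes and 3 characters, with all_question_marks false. Running it on a value read from the database shows whether the "?" characters are already in the data PHP receives.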

BUT HERE'S A FOLLOW-ON to you or anyone else:

Even though everything APPEARS to be well now, I need to get this as bullet-proof as possible, as it's part of a framework I'm building for all projects. The link you mentioned had A LOT of instructions about uncommenting many more lines in the .ini (which I haven't done yet), so do you - or someone else here - think that uncommenting these other lines is really necessary?

I'm sure these other items ARE probably necessary under varying circumstances/languages, but I just wanted to ask.

Matt, thanks again!

Neophyte

12:52 am on Sept 5, 2008 (gmt 0)

Junior Member

10+ Year Member

joined:June 6, 2005
posts: 109
votes: 0


I'm glad you got it sorted neophyte :)

Not really being an expert in this area, take what I say with a grain of salt...

As far as I can see you have two options for getting UTF-8 working well with PHP. If you do neither of these, your UTF-8 support isn't fully implemented, so you might not always get the results you expect...

1 - Use mbstring and set the required variables in php.ini / .htaccess

2 - Use a third-party tool or library such as PHP UTF-8 [sourceforge.net].

Option 1 is easy and quick (in terms of processing speed), but not supported by all hosts (the mbstring extension has to be available, and the host has to let you set PHP values)

Option 2 is harder and slower, but doesn't require anything special from your host.

Bonus Option 3 is writing your own conversion tools which is much, much harder and has no real benefit over option 2... :)

I personally use .htaccess to set the mbstring options. If a host doesn't allow it then they're not the host for me. If the framework you're making is intended for wide-spread use you may need to look at option 2.
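As a rough illustration of option 1 for hosts that load mbstring but don't let you edit php.ini or .htaccess, some of the settings can be applied at runtime instead. This is only a sketch under that assumption: configure_utf8_runtime is a hypothetical name, and it covers just the default charset and the mbstring internal encoding, not every setting the cheat sheet lists.

```php
<?php
// Hypothetical bootstrap helper: apply UTF-8 defaults at runtime.
function configure_utf8_runtime(): array
{
    if (!extension_loaded('mbstring')) {
        throw new RuntimeException('mbstring extension is not loaded');
    }

    // Charset PHP uses by default, e.g. in the Content-Type header it sends.
    ini_set('default_charset', 'UTF-8');

    // Encoding the mb_* functions assume when none is passed explicitly.
    mb_internal_encoding('UTF-8');

    // Report the values now in effect so the caller can log or check them.
    return [
        'default_charset' => ini_get('default_charset'),
        'internal_encoding' => mb_internal_encoding(),
    ];
}

print_r(configure_utf8_runtime());
```

Calling this once at the top of a framework's bootstrap file would keep the setup in the code rather than in host-specific configuration.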