Forum Moderators: bill

Japanese Localization

quoting a website with Japanese audience

     
5:34 pm on Sep 13, 2007 (gmt 0)

Junior Member

10+ Year Member

joined:Feb 12, 2004
posts: 60
votes: 2


Hi,

I'm in the process of quoting a project where the primary audience will be in Japan. I know I can code a page using the Unicode character set and have it display in the standard set of browsers I test in. I'm not sure, though, how it's going to look in Japan. I haven't been able to find much information on operating system or browser usage there. And will WAP be more of an issue in Japan? The client will be supplying all content in Japanese.

Thanks in advance for any help.

1:41 am on Sept 15, 2007 (gmt 0)

Administrator from JP 

WebmasterWorld Administrator bill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Oct 12, 2000
posts:14487
votes: 49


Probably one of the easiest ways to deal with this is to buy a copy of VMware and run a virtual PC with the target OS. You can also get Virtual PC free from Microsoft.
6:05 am on Sept 24, 2007 (gmt 0)

Junior Member

10+ Year Member

joined:Feb 15, 2006
posts: 103
votes: 0


That is a great idea. The other thing you could do is buy a cheap used laptop with Japanese XP on it. You will find IE6 CSS bugs in the Japanese version that won't appear in other browsers, and it will drive you crazy to find them later. It will be worth the $300 to have the Japanese PC available.
3:55 am on Nov 5, 2007 (gmt 0)

Junior Member

10+ Year Member

joined:Feb 12, 2004
posts: 60
votes: 2


Thanks for the help. I'm not at the testing phase yet, but I have run into a problem. I was asked to develop a simple database app for the site. Here is what I have now:
-PHP version 4.4.7 with both multibyte and Japanese support enabled
-MySQL 5.0 database with all collations set to utf8_unicode_ci
-Web page charset=UTF-8

-The following after the database connection and prior to the select query:

mysql_query("SET NAMES 'utf-8'");
mysql_query("SET CHARACTER SET 'utf-8'");

I loaded the Japanese text into the database by cutting and pasting the characters from a Word document into phpMyAdmin. I can see the characters displayed correctly when browsing the table in phpMyAdmin.

When I try to display the data from the database on the web page, I just get question marks (?).

Is there something I'm missing?

8:59 am on Nov 5, 2007 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:10544
votes: 8


go to regional and language setting for windows and make sure you have code pages loaded for the target language(s).
check any other relevant settings there.

fyi you might want to look into iconv - a useful tool for converting encodings if necessary.

6:21 pm on Nov 5, 2007 (gmt 0)

Junior Member

10+ Year Member

joined:Feb 12, 2004
posts: 60
votes: 2


Thanks. I took a look at the regional settings in XP and confirmed that the East Asian language packs are installed.

I also took a look at iconv, but it made my brain hurt, so I downloaded an app called encoding master. It tells me the text in my source documents is already encoded as UTF-8, but I converted it anyway, then copied and pasted it into phpMyAdmin. All the Japanese characters still look good in phpMyAdmin and everywhere else, up to the point where I try to display them on the web page. I even copied some text and pasted it into the HTML portion of the document in Dreamweaver, then pasted the same clipboard contents into phpMyAdmin, where it appears correctly. When I view the page on the website, though, the text in the HTML portion displays fine, while the dynamic content from the database is all question marks (?).

It looks like MySQL or PHP is doing something to the characters in the process of retrieving them from the database, unless I'm misinterpreting my testing. The connection collation is set to utf8_unicode_ci, so I'm at a loss as to where else to look.

12:51 am on Nov 6, 2007 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:10544
votes: 8


Here is what I have now:
-PHP version 4.4.7 with both multibyte and Japanese support enabled
-MySQL 5.0 database with all collations set to utf8_unicode_ci
-Web page charset=UTF-8

have you verified that your document has the following or equivalent in the head?

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

1:03 am on Nov 6, 2007 (gmt 0)

Junior Member

10+ Year Member

joined:Feb 15, 2006
posts: 103
votes: 0


One thing I would add: avoid copying and pasting directly from a Word document. Add one more step and paste into a WordPad text document first, then copy from the WordPad document into your database, CMS, etc.

Word in Japanese has a lot of funky code that screws up a webpage.

1:38 am on Nov 6, 2007 (gmt 0)

Junior Member

10+ Year Member

joined:Feb 12, 2004
posts: 60
votes: 2


<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

Yes, sorry, I should have been more specific. That's the code I have on the web page to set the character set to UTF-8. I can display Japanese text on that page until the cows come home, just not from the MySQL database using PHP.

Here's the PHP code if it helps shed light.

Database connection function:


function db_connect()
{
    $connect = mysql_connect("dbhost", "user", "password");
    if (!$connect)
        return false;
    if (!mysql_select_db('database'))
        return false;
    return $connect;
}

Connect


if (!($conn = db_connect())) {
    echo 'database error';
    return false;
}

Query and display


mysql_query("SET NAMES 'utf-8'");
mysql_query("SET CHARACTER SET 'utf-8'");
$query = "select ID, name from lake";
$result = mysql_query($query);
if (!$result)
    echo("no data");
while ($myrow = mysql_fetch_array($result)) {
    echo "<option value=".$myrow['ID'].">".$myrow['name']."</option>";
}

It displays in a dropdown list within a form, but I didn't include the extraneous html code. I've tested display outside of the form with the same result - all question marks.

2:02 am on Nov 6, 2007 (gmt 0)

Junior Member

10+ Year Member

joined:Feb 15, 2006
posts: 103
votes: 0


Is this site at a Japanese hosting company? Sometimes they do not allow UTF-8, or they have their system set up with some other charset. Is this on your box or hosted in Japan?
2:05 am on Nov 6, 2007 (gmt 0)

Junior Member

10+ Year Member

joined:Feb 12, 2004
posts: 60
votes: 2


One thing I would add: avoid copying and pasting directly from a Word document. Add one more step and paste into a WordPad text document first, then copy from the WordPad document into your database, CMS, etc.

Thanks for the tip. I'll do that moving forward. Unfortunately, it didn't make a difference with this problem.

2:21 am on Nov 6, 2007 (gmt 0)

Junior Member

10+ Year Member

joined:Feb 12, 2004
posts: 60
votes: 2


Is this on your box or hosted in Japan?

I'm actually testing on a Westhost account, which I believe hosts WebmasterWorld, or did at one time. I also have some space on 1&1 and have tested there with the same results.

2:45 am on Nov 6, 2007 (gmt 0)

Junior Member

10+ Year Member

joined:Feb 15, 2006
posts: 103
votes: 0


The reason I ask is that I have actually had similar problems hosting in Japan (ironically) because the hosting firm had locked everything down to one charset.

One curious thing is that you changed servers and still have the same problem, so I wonder if it is a problem with the site and not the database server. How about a simple Japanese page in HTML--no database. Does it display ok?

3:55 am on Nov 6, 2007 (gmt 0)

Junior Member

10+ Year Member

joined:Feb 12, 2004
posts: 60
votes: 2


How about a simple Japanese page in HTML--no database. Does it display ok?

Yes. The site is actually almost finished. I have about 50 pages of text and a working form that uses PHP to collect data and email results in Japanese. No problems until this. Those pages all use Shift_JIS encoding. I can also display Japanese without a problem using UTF-8 encoding, and I have the results page I'm working with encoded as UTF-8, as described previously. I had read that there were problems using Shift_JIS with MySQL, so I went with UTF-8 for this part of the project. I tried Shift_JIS too when I started to have problems with UTF-8, but it wasn't any better. The only encoding that resulted in some Asian characters was gb2312 (I think), which is for Chinese. It wasn't right, but it wasn't just question marks.

4:22 am on Nov 6, 2007 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:10544
votes: 8


was the table created with CHARACTER SET utf8 and COLLATE utf8_general_ci?
do a "SHOW CREATE TABLE tablename;" to find out.
5:31 am on Nov 6, 2007 (gmt 0)

Junior Member

10+ Year Member

joined:Feb 12, 2004
posts: 60
votes: 2


was the table created with CHARACTER SET utf8 and COLLATE utf8_general_ci?

show create table says: CHARSET=utf8 COLLATE=utf8_unicode_ci

6:06 am on Nov 6, 2007 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:10544
votes: 8


i wish i could scare up a better suggestion, but you could try this (without the single quotes in your example):
mysql_query("SET NAMES utf-8");
mysql_query("SET CHARACTER SET utf-8");
7:50 am on Nov 6, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 25, 2004
posts:2156
votes: 0


As an experiment, try doing the non-ASCII characters with HTML entity codes, thus all the actual characters are 7-bit, but the end browser can decode and display the (Unicode) entities without anything in between mangling them.
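
A minimal sketch of that experiment, assuming the PHP mbstring extension is available and the source file itself is saved as UTF-8. The helper name to_numeric_entities is made up for illustration and does not come from anyone's code in this thread.

```php
<?php
// Hypothetical helper (not from the thread): turn every non-ASCII
// character into a decimal numeric character reference, so only
// 7-bit characters travel between the database and the browser.
function to_numeric_entities($utf8Text)
{
    // Convmap format: first code point, last code point, offset, mask.
    // This range covers everything above plain ASCII.
    $convmap = array(0x80, 0x10FFFF, 0, 0x1FFFFF);
    return mb_encode_numericentity($utf8Text, $convmap, 'UTF-8');
}

// ASCII passes through untouched; the two kanji become entities.
echo to_numeric_entities("日本 lake"), "\n";
```

The encoded string can then be stored or echoed like any ASCII text. It is a workaround rather than a fix: the data is no longer searchable or sortable as real Japanese text inside the database.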

Rgds

Damon

3:13 pm on Nov 6, 2007 (gmt 0)

Junior Member

10+ Year Member

joined:Feb 12, 2004
posts: 60
votes: 2


mysql_query("SET NAMES utf-8");
mysql_query("SET CHARACTER SET utf-8");

Tried without the single quotes as I had it, but unfortunately same result.

3:26 pm on Nov 6, 2007 (gmt 0)

Junior Member

10+ Year Member

joined:Feb 12, 2004
posts: 60
votes: 2


try doing the non-ASCII characters with HTML entity codes

Thanks Damon. HTML entities work, so does that tell me there is an issue with multi-byte character support in MySQL?

3:43 pm on Nov 6, 2007 (gmt 0)

Junior Member

10+ Year Member

joined:Feb 12, 2004
posts: 60
votes: 2


The column in question is VARCHAR at 200 characters, and the text is 10 - 15 characters max if that helps.
8:16 pm on Nov 6, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 25, 2004
posts:2156
votes: 0


There are so many places that can mess up non-7-bit characters on the path to the user's browser that I really can't answer your question... I just continue to finesse the whole issue, and it works for me!

Rgds

Damon

4:51 am on Nov 11, 2007 (gmt 0)

Junior Member

10+ Year Member

joined:Feb 12, 2004
posts: 60
votes: 2


Ok. Just lame coding on my part. Instead of this:
mysql_query("SET NAMES 'utf8'");

which is actually the line of code I had in the script, I needed this:
$var=mysql_query("SET NAMES 'utf8'");

Despite setting everything, including the connection, to utf-8 in phpMyAdmin, it was still coming over as latin1. I found out by running this:

$rs = mysql_query("SHOW VARIABLES LIKE 'character_set_%'");

Which resulted in this:


character_set_client latin1
character_set_connection latin1
character_set_database latin1
character_set_filesystem binary
character_set_results latin1
character_set_server latin1
character_set_system utf8
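
For anyone repeating this check, here is a rough sketch of how the rows could be screened. It assumes the rows have already been fetched into an array of name/value pairs; the function name non_utf8_connection_vars is hypothetical. It only looks at the three session variables that SET NAMES changes.

```php
<?php
// Hypothetical helper (not from the thread): given rows of
// (variable name, value) from SHOW VARIABLES LIKE 'character_set_%',
// return the connection-related variables that are not some form of utf8.
function non_utf8_connection_vars(array $rows)
{
    // SET NAMES sets exactly these three session variables.
    $watched = array(
        'character_set_client',
        'character_set_connection',
        'character_set_results',
    );
    $problems = array();
    foreach ($rows as $row) {
        list($name, $value) = $row;
        // Accept any value starting with "utf8" (assumption: this also
        // lets utf8mb4 through on newer servers).
        if (in_array($name, $watched, true) && strpos($value, 'utf8') !== 0) {
            $problems[$name] = $value;
        }
    }
    return $problems;
}

// Values copied from the output above.
$rows = array(
    array('character_set_client', 'latin1'),
    array('character_set_connection', 'latin1'),
    array('character_set_database', 'latin1'),
    array('character_set_filesystem', 'binary'),
    array('character_set_results', 'latin1'),
    array('character_set_server', 'latin1'),
    array('character_set_system', 'utf8'),
);
print_r(non_utf8_connection_vars($rows));
```

An empty result would suggest the connection itself is talking utf8, whatever the server and database defaults are.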

More research needed to wrap my brain around this stuff, but this was a good start. Thanks for all your help.

Troy

7:48 pm on Nov 19, 2007 (gmt 0)

New User

5+ Year Member

joined:Nov 16, 2007
posts:1
votes: 0


I was actually having the exact same problem with Japanese characters appearing as question marks. The following line fixed it though:
mysql_query("SET character_set_results = 'utf8'");
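
To see why a latin1 result set turns Japanese into question marks, the following sketch imitates the lossy step with PHP's mbstring extension. This is only an illustration: MySQL does its own conversion on the server, and the function name simulate_latin1_results is made up for this example. The source file is assumed to be saved as UTF-8.

```php
<?php
// Illustration only (not MySQL's actual code path): imitate a lossy
// UTF-8 to Latin-1 conversion, the same kind of data loss that occurs
// when utf8 column data is sent over a latin1 result channel.
function simulate_latin1_results($utf8Text)
{
    // Set the substitute character explicitly to "?" (0x3F) so the
    // result does not depend on php.ini.
    mb_substitute_character(0x3F);
    // Latin-1 has no kana or kanji, so each one becomes the substitute.
    return mb_convert_encoding($utf8Text, 'ISO-8859-1', 'UTF-8');
}

echo simulate_latin1_results("日本語"), "\n";
```

Once the characters have been replaced this way there is nothing left to repair on the PHP side, which is why changing the page charset or the meta tag made no difference.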
1:29 am on Nov 20, 2007 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:10544
votes: 8


welcome to WebmasterWorld [webmasterworld.com], Captain Goggles!

simplesimon sez:

$rs = mysql_query("SHOW VARIABLES LIKE 'character_set_%'");

Which resulted in this:
character_set_client latin1
character_set_connection latin1
character_set_database latin1
character_set_filesystem binary
character_set_results latin1
character_set_server latin1
character_set_system utf8

nice find!
i never knew about this...

4:27 am on Apr 16, 2008 (gmt 0)

New User

5+ Year Member

joined:Apr 16, 2008
posts: 5
votes: 0


Hi there,

I was having almost the same problem when trying the inverse: UTF-8 encoded strings sent with PHP to MySQL resulted in corrupted strings inside the database.

mysql_query("SET NAMES 'utf8'") did the trick and solved this.

Thank you very much!

Everyone note that 'utf8' has to be without the hyphen in order to be a valid character encoding ('utf-8' would be invalid).
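
The naming mismatch is easy to trip over because the HTML/HTTP label and the MySQL name differ. A small sketch of a lookup, with a hypothetical helper called to_mysql_charset and a deliberately short table covering only the encodings mentioned in this thread plus EUC-JP:

```php
<?php
// Hypothetical helper (not from the thread): translate an HTML/HTTP
// charset label into the name MySQL expects in SET NAMES.
// Returns null for labels this small table does not know.
function to_mysql_charset($label)
{
    $map = array(
        'utf-8'     => 'utf8',  // no hyphen on the MySQL side
        'shift_jis' => 'sjis',
        'euc-jp'    => 'ujis',
    );
    $key = strtolower(trim($label));
    return isset($map[$key]) ? $map[$key] : null;
}

// Build the statement from the label used in the page's meta tag.
echo "SET NAMES '" . to_mysql_charset('UTF-8') . "'", "\n";
```

Returning null for unknown labels is a design choice for the sketch, so that a typo surfaces in the calling code instead of being passed to the server.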