Forum Moderators: bill

Japanese Localization

quoting a website with Japanese audience

     

simplesimon

5:34 pm on Sep 13, 2007 (gmt 0)

10+ Year Member



Hi,

I'm in the process of quoting a project where the primary audience will be in Japan. I know I can code a page using the Unicode character set and have it display in the standard set of browsers I test in. I'm not sure, though, how it's going to look in Japan. I'm not able to find much information on operating system or browser usage there. And will WAP be more of an issue in Japan? The client will be supplying all content in Japanese.

Thanks in advance for any help.

bill

1:41 am on Sep 15, 2007 (gmt 0)

WebmasterWorld Administrator bill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Probably one of the easiest ways to deal with this is to buy a copy of VMware and run a virtual PC with the target OS. You can also get Virtual PC free from Microsoft.

jeffposaka

6:05 am on Sep 24, 2007 (gmt 0)

5+ Year Member



That is a great idea. The other thing you could do is buy a cheap used laptop with Japanese XP on it. You will find IE6 CSS bugs in the Japanese version that won't appear in other browsers, and it will drive you crazy to find them later. It will be worth the $300 to have the Japanese PC available.

simplesimon

3:55 am on Nov 5, 2007 (gmt 0)

10+ Year Member



Thanks for the help. I'm not at the testing phase yet, but I have run into a problem. I was asked to develop a simple database app for the site. Here is what I have now:
-PHP version 4.4.7 with both multibyte and Japanese support enabled
-MySQL 5.0 database with all collations set to utf8_unicode_ci
-Web page charset=UTF-8

-The following after the database connection and prior to the select query:

mysql_query("SET NAMES 'utf-8'");
mysql_query("SET CHARACTER SET 'utf-8'");

I loaded Japanese into the database by cutting and pasting the characters from a Word document into phpMyAdmin. I can see the characters displayed correctly when browsing the table in phpMyAdmin.

When I try to display the data from the database on the web page, I just get question marks (?).

Is there something I'm missing?

phranque

8:59 am on Nov 5, 2007 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



go to regional and language setting for windows and make sure you have code pages loaded for the target language(s).
check any other relevant settings there.

fyi you might want to look into iconv - a useful tool for converting encodings if necessary.

simplesimon

6:21 pm on Nov 5, 2007 (gmt 0)

10+ Year Member



Thanks. I took a look at regional settings in XP and confirmed that the East Asian language packs are installed.

I also took a look at iconv, but it made my brain hurt, so I downloaded an app called encoding master. It's telling me the encodings of the text in the source documents I'm using are already UTF-8, but I converted anyway, then copied and pasted into phpMyAdmin. All the Japanese characters still look good in phpMyAdmin and everywhere else, up to the point where I try to display them on the web page. I even copied some text and pasted it into Dreamweaver in the HTML portion of the document, then with the same clipboard pasted into phpMyAdmin, where it appears correctly. When I view it on the website, though, the text in the HTML portion displays fine, while the dynamic content from the database is all question marks (?).

It looks like MySQL or PHP is doing something to the characters in the process of retrieving them from the database, unless I'm misinterpreting my testing. The connection collation is set to utf8_unicode_ci, so I'm at a loss as to where else to look.

phranque

12:51 am on Nov 6, 2007 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Here is what I have now:
-PHP Version 4.4.7 with both multibyte and japanese support enabled
-MySQL5.0 database with all collations set to utf8_unicode_ci
-Web page charset=UTF-8

have you verified that your document has the following or equivalent in the head?

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

jeffposaka

1:03 am on Nov 6, 2007 (gmt 0)

5+ Year Member



One thing I would add is to avoid copying and pasting from a Word document. Add one more step: copy and paste into a WordPad text document, then from the WordPad text document into your database, CMS, etc.

Word in Japanese has a lot of funky code that screws up a webpage.

simplesimon

1:38 am on Nov 6, 2007 (gmt 0)

10+ Year Member



<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

Yes, sorry, I should have been more specific. That's the code I have on the web page to set the character set to UTF-8. I can display Japanese text on that page until the cows come home, just not from the MySQL database using PHP.

Here's the PHP code if it helps shed light.

Database connection function:


function db_connect()
{
    $connect = mysql_connect("dbhost", "user", "password");
    if (!$connect)
        return false;
    if (!mysql_select_db('database'))
        return false;
    return $connect;
}

Connect

if (!($conn = db_connect())) {
    echo 'database error';
    return false;
}

Query and display

mysql_query("SET NAMES 'utf-8'");
mysql_query("SET CHARACTER SET 'utf-8'");
$query = "select ID, name from lake";
$result = mysql_query($query);
if (!$result)
    echo("no data");
while ($myrow = mysql_fetch_array($result)) {
    echo "<option value=".$myrow['ID'].">".$myrow['name']."</option>";
}

It displays in a dropdown list within a form, but I didn't include the extraneous HTML code. I've tested display outside of the form with the same result: all question marks.

jeffposaka

2:02 am on Nov 6, 2007 (gmt 0)

5+ Year Member



Is this site at a Japanese hosting company? Sometimes they do not allow UTF-8, or they have their system set up with some other charset. Is this on your box or hosted in Japan?

simplesimon

2:05 am on Nov 6, 2007 (gmt 0)

10+ Year Member



One thing I would add is to avoid copying and pasting from a Word document. Add one more step: copy and paste into a WordPad text document, then from the WordPad text document into your database, CMS, etc.

Thanks for the tip. I'll do that moving forward. Unfortunately, it didn't make a difference with this problem.

simplesimon

2:21 am on Nov 6, 2007 (gmt 0)

10+ Year Member



Is this on your box or hosted in Japan?

I'm actually testing on a Westhost account, which I believe hosts WebmasterWorld, or did at one time. I also have some space on 1&1 and have tested there with the same results.

jeffposaka

2:45 am on Nov 6, 2007 (gmt 0)

5+ Year Member



The reason I ask is that I have actually had similar problems hosting in Japan (ironically) because the hosting firm has locked down everything with one charset.

One curious thing is that you changed servers and still have the same problem, so I wonder if it is a problem with the site and not the database server. How about a simple Japanese page in HTML, with no database? Does it display OK?

simplesimon

3:55 am on Nov 6, 2007 (gmt 0)

10+ Year Member



How about a simple Japanese page in HTML--no database. Does it display ok?

Yes. The site is actually almost finished. I have about 50 pages of text and a working form that uses PHP to collect data and email results in Japanese. No problems until this. Those pages all use Shift_JIS encoding. I can also display Japanese without a problem using UTF-8 encoding, and I have the results page I'm working with encoded as UTF-8, as described previously. I had read that there were problems using Shift_JIS with MySQL, so I went with UTF-8 for this part of the project. I tried Shift_JIS too when I started to have problems with UTF-8, but it wasn't any better. The only encoding that resulted in some Asian characters was GB2312 (I think), which is for Chinese. It wasn't right, but it wasn't just question marks.

phranque

4:22 am on Nov 6, 2007 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



was the table created with CHARACTER SET utf8 and COLLATE utf8_general_ci?
do a "SHOW CREATE TABLE tablename;" to find out.

simplesimon

5:31 am on Nov 6, 2007 (gmt 0)

10+ Year Member



was the table created with CHARACTER SET utf8 and COLLATE utf8_general_ci?

show create table says: CHARSET=utf8 COLLATE=utf8_unicode_ci

phranque

6:06 am on Nov 6, 2007 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



i wish i could scare up a better suggestion, but you could try this (without the single quotes in your example):
mysql_query("SET NAMES utf-8");
mysql_query("SET CHARACTER SET utf-8");

DamonHD

7:50 am on Nov 6, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



As an experiment, try doing the non-ASCII characters with HTML entity codes, thus all the actual characters are 7-bit, but the end browser can decode and display the (Unicode) entities without anything in between mangling them.

Rgds

Damon

simplesimon

3:13 pm on Nov 6, 2007 (gmt 0)

10+ Year Member



mysql_query("SET NAMES utf-8");
mysql_query("SET CHARACTER SET utf-8");

Tried without the single quotes as I had it, but unfortunately same result.

simplesimon

3:26 pm on Nov 6, 2007 (gmt 0)

10+ Year Member



try doing the non-ASCII characters with HTML entity codes

Thanks Damon. HTML entities work, so does that tell me there is an issue with multi-byte character support in MySQL?
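For anyone wanting to repeat Damon's experiment, here is a minimal sketch of generating the entities in PHP. It assumes the mbstring extension is loaded, a PHP version with scalar type hints (7.0+), and a source file saved as UTF-8; the function name to_numeric_entities is made up for illustration.

```php
<?php
// Sketch: turn every non-ASCII character into a decimal numeric HTML entity.
// Assumes the mbstring extension is loaded and the input is valid UTF-8.
function to_numeric_entities(string $utf8): string
{
    // Conversion map: [first code point, last code point, offset, mask].
    $convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($utf8, $convmap, 'UTF-8');
}

echo to_numeric_entities('Lake 湖'), "\n"; // prints: Lake &#28246;
```

Because the output is pure 7-bit ASCII, it should survive a connection that converts to latin1. That makes it a useful diagnostic, though it sidesteps the charset mismatch rather than fixing it.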

simplesimon

3:43 pm on Nov 6, 2007 (gmt 0)

10+ Year Member



The column in question is VARCHAR at 200 characters, and the text is 10 - 15 characters max if that helps.

DamonHD

8:16 pm on Nov 6, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There are so many places that can mess up non-7-bit characters on the path to the user's browser that I really can't answer your question... I just continue to finesse the whole issue, and it works for me!

Rgds

Damon

simplesimon

4:51 am on Nov 11, 2007 (gmt 0)

10+ Year Member



Ok. Just lame coding on my part. Instead of this:
mysql_query("SET NAMES 'utf8'");

which is the actual line of code I had in the script, I needed this:
$var=mysql_query("SET NAMES 'utf8'");

Despite setting everything, including the connection, to UTF-8 in phpMyAdmin, it was still coming over as latin1. I found out by running this:

$rs = mysql_query("SHOW VARIABLES LIKE 'character_set_%'");

Which resulted in this:


character_set_client latin1
character_set_connection latin1
character_set_database latin1
character_set_filesystem binary
character_set_results latin1
character_set_server latin1
character_set_system utf8

More research needed to wrap my brain around this stuff, but this was a good start. Thanks for all your help.

Troy
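The latin1 value for character_set_results would explain the question marks: MySQL converts result data to that character set before sending it, and Japanese characters have no latin1 equivalent. The effect can be imitated without a database. This is only a sketch; it assumes the mbstring extension, a source file saved as UTF-8, and uses ISO-8859-1 as a stand-in for MySQL's latin1 (which is closer to cp1252).

```php
<?php
// Sketch: what a latin1-style conversion does to Japanese text.
// Assumes mbstring is loaded and this file is saved as UTF-8.
mb_substitute_character(0x3F); // use "?" for unrepresentable characters

$name = '日本語';
$asLatin1 = mb_convert_encoding($name, 'ISO-8859-1', 'UTF-8');

echo $asLatin1, "\n";          // prints: ???
echo strlen($name), "\n";      // prints: 9 (three bytes per character in UTF-8)
echo strlen($asLatin1), "\n";  // prints: 3
```

Once the characters have been replaced, the original text cannot be recovered on the PHP side, which is why changing the page charset afterwards makes no difference.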

Captain Goggles

7:48 pm on Nov 19, 2007 (gmt 0)

5+ Year Member



I was actually having the exact same problem with Japanese characters appearing as question marks. The following line fixed it though:
mysql_query("SET character_set_results = 'utf8'");
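Both fixes touch the same setting: SET NAMES sets character_set_client, character_set_connection and character_set_results together, while the line above sets only the results charset, which is the one that matters when reading. Below is a rough sketch of setting the charset and then checking it, with the return values tested so that a rejected name does not fail silently. It assumes the mysqli extension (PHP 5+, so not the PHP 4.4.7 install discussed earlier); the function name connection_charsets is hypothetical.

```php
<?php
// Sketch only: requires the mysqli extension and a reachable MySQL server.

// Set the connection character set, then return the character_set_% variables.
function connection_charsets(mysqli $link, string $charset = 'utf8'): array
{
    // A name MySQL does not know (such as 'utf-8') makes this call fail:
    // it returns false or throws, depending on mysqli's error reporting mode.
    if (!mysqli_set_charset($link, $charset)) {
        throw new RuntimeException('Could not set charset: ' . mysqli_error($link));
    }
    $result = mysqli_query($link, "SHOW VARIABLES LIKE 'character_set_%'");
    if ($result === false) {
        throw new RuntimeException('Query failed: ' . mysqli_error($link));
    }
    $vars = [];
    while ($row = mysqli_fetch_assoc($result)) {
        $vars[$row['Variable_name']] = $row['Value'];
    }
    mysqli_free_result($result);
    return $vars;
}

// Usage (placeholder credentials, as in the earlier posts):
// $link = mysqli_connect('dbhost', 'user', 'password', 'database');
// print_r(connection_charsets($link));
```

If the client, connection and results entries still say latin1 after this runs, the charset was not applied to the connection the script is actually using.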

phranque

1:29 am on Nov 20, 2007 (gmt 0)

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



welcome to WebmasterWorld [webmasterworld.com], Captain Goggles!

simplesimon sez:

$rs = mysql_query("SHOW VARIABLES LIKE 'character_set_%'");

Which resulted in this:
character_set_client latin1
character_set_connection latin1
character_set_database latin1
character_set_filesystem binary
character_set_results latin1
character_set_server latin1
character_set_system utf8

nice find!
i never knew about this...

runonce

4:27 am on Apr 16, 2008 (gmt 0)

5+ Year Member



Hi there,

I was having almost the same problem when trying the inverse: UTF-8 encoded strings sent with PHP to MySQL resulted in corrupted strings inside the database.

mysql_query("SET NAMES 'utf8'") did the trick and solved this.

Thank you very much!

Everyone note that the MySQL character set name 'utf8' has to be written without the hyphen ('utf-8' is not a valid character set name in MySQL).
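The naming mismatch goes beyond UTF-8: the labels used in an HTML charset declaration are often not the names MySQL expects. A small lookup along these lines could help keep the two apart. It is a sketch with a made-up function name (db_charset_name), covers only the encodings mentioned in this thread plus EUC-JP, and assumes PHP 7.1+ for the nullable return type.

```php
<?php
// Sketch: map the charset label used in HTML to MySQL's name for it.
// The function name and the list are illustrative, not exhaustive.
function db_charset_name(string $htmlLabel): ?string
{
    $map = [
        'utf-8'     => 'utf8',
        'shift_jis' => 'sjis',
        'euc-jp'    => 'ujis',
    ];
    $key = strtolower(trim($htmlLabel));
    return $map[$key] ?? null; // null means "no known MySQL name"
}

echo db_charset_name('UTF-8'), "\n";      // prints: utf8
echo db_charset_name('Shift_JIS'), "\n";  // prints: sjis
var_dump(db_charset_name('klingon'));     // prints: NULL
```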

 
