
Asia and Pacific Region Forum

    
Japanese Localization
quoting a website with Japanese audience
simplesimon

10+ Year Member



 
Msg#: 3450164 posted 5:34 pm on Sep 13, 2007 (gmt 0)

Hi,

I'm in the process of quoting a project where the primary audience will be in Japan. I know I can code a page using the Unicode character set and have it display in the standard set of browsers I test in. I'm not sure, though, how it's going to look in Japan. I haven't been able to find much information on operating system or browser usage there. And will WAP be more of an issue in Japan? The client will be supplying all content in Japanese.

Thanks in advance for any help.

 

bill

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3450164 posted 1:41 am on Sep 15, 2007 (gmt 0)

Probably one of the easiest ways to deal with this is to buy a copy of VMware and run a virtual PC with the target OS. You can also get Virtual PC free from Microsoft.

jeffposaka

5+ Year Member



 
Msg#: 3450164 posted 6:05 am on Sep 24, 2007 (gmt 0)

That is a great idea. The other thing you could do is buy a cheap used laptop with Japanese XP on it. You will find IE6 CSS bugs in the Japanese version that won't appear in other browsers. It will drive you crazy to find them later. It will be worth the $300 to have the Japanese PC available.

simplesimon

10+ Year Member



 
Msg#: 3450164 posted 3:55 am on Nov 5, 2007 (gmt 0)

Thanks for the help. I'm not at the testing phase yet, but I have run into a problem. I was asked to develop a simple database app for the site. Here is what I have now:
-PHP Version 4.4.7 with both multibyte and japanese support enabled
-MySQL5.0 database with all collations set to utf8_unicode_ci
-Web page charset=UTF-8

-The following after the database connection and prior to the select query:

mysql_query("SET NAMES 'utf-8'");
mysql_query("SET CHARACTER SET 'utf-8'");

I loaded Japanese text into the database by cutting and pasting the characters from a Word document into phpMyAdmin. I can see the characters displayed correctly when browsing the table in phpMyAdmin.

When I try to display the data from the database on the web page, I just get question marks (?).

Is there something I'm missing?

phranque

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 3450164 posted 8:59 am on Nov 5, 2007 (gmt 0)

go to regional and language setting for windows and make sure you have code pages loaded for the target language(s).
check any other relevant settings there.

fyi you might want to look into iconv - a useful tool for converting encodings if necessary.

simplesimon

10+ Year Member



 
Msg#: 3450164 posted 6:21 pm on Nov 5, 2007 (gmt 0)

Thanks. I took a look at regional settings in xp and confirmed that the east asian language packs are installed.

I also took a look at iconv, but it made my brain hurt, so I downloaded an app called Encoding Master. It's telling me the text in the source documents I'm using is already UTF-8, but I converted it anyway, then copied and pasted it into phpMyAdmin. All the Japanese characters still look good in phpMyAdmin and everywhere else, up to the point where I try to display them on the web page. I even copied some text and pasted it into the HTML portion of the document in Dreamweaver, then pasted from the same clipboard into phpMyAdmin, where it appears correctly. When I view the page on the website, though, the text in the HTML portion displays fine, but the dynamic content from the database is all question marks (?).

It looks like MySQL or PHP is doing something to the characters in the process of retrieving them from the database, unless I'm misinterpreting my testing. The connection collation is set to utf8_unicode_ci, so I'm at a loss as to where else to look.

phranque

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 3450164 posted 12:51 am on Nov 6, 2007 (gmt 0)

Here is what I have now:
-PHP Version 4.4.7 with both multibyte and japanese support enabled
-MySQL5.0 database with all collations set to utf8_unicode_ci
-Web page charset=UTF-8

have you verified that your document has the following or equivalent in the head?

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

jeffposaka

5+ Year Member



 
Msg#: 3450164 posted 1:03 am on Nov 6, 2007 (gmt 0)

One thing I would add: avoid copying and pasting directly from a Word document. Add one more step and paste into a WordPad text document first, then copy from the WordPad document into your database, CMS, etc.

Word in Japanese has a lot of funky code that screws up a webpage.

simplesimon

10+ Year Member



 
Msg#: 3450164 posted 1:38 am on Nov 6, 2007 (gmt 0)

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

Yes, sorry, should have been more specific. That's the code I have on the web page to set the character set to utf-8. I can display japanese text on that page until the cows come home, just not from the MYSQL database using PHP.

Here's the php code if it helps shed light.

Database connection function:

function db_connect()
{
    $connect = mysql_connect("dbhost", "user", "password");
    if (!$connect)
        return false;
    if (!mysql_select_db('database'))
        return false;
    return $connect;
}

Connect

if (!($conn = db_connect())){
    echo 'database error';
    return false;
}

Query and display

mysql_query("SET NAMES 'utf-8'");
mysql_query("SET CHARACTER SET 'utf-8'");
$query="select ID, name from lake";
$result=mysql_query($query);
if (!$result)
    echo("no data");
while ($myrow=mysql_fetch_array($result)){
    echo "<option value=".$myrow['ID'].">".$myrow['name']."</option>";
}

It displays in a dropdown list within a form, but I didn't include the extraneous html code. I've tested display outside of the form with the same result - all question marks.

jeffposaka

5+ Year Member



 
Msg#: 3450164 posted 2:02 am on Nov 6, 2007 (gmt 0)

Is this site at a Japanese hosting company? Sometimes they do not allow UTF-8, or they have their system set up with some other charset. Is this on your box or hosted in Japan?

simplesimon

10+ Year Member



 
Msg#: 3450164 posted 2:05 am on Nov 6, 2007 (gmt 0)

One thing I would add is avoid copying and pasting from a word document, add one more step and copy and paste into a wordpad text document, then from the wordpad text document into your database, cms, etc...

Thanks for the tip. I'll do that moving forward. Unfortunately it didn't make a difference with this problem.

simplesimon

10+ Year Member



 
Msg#: 3450164 posted 2:21 am on Nov 6, 2007 (gmt 0)

Is this on your box or hosted in Japan?

I'm actually testing on a Westhost account, which I believe hosts Webmasterworld, or did at one time. Also have some space on 1&1 and have tested with same results.

jeffposaka

5+ Year Member



 
Msg#: 3450164 posted 2:45 am on Nov 6, 2007 (gmt 0)

The reason I ask is that I have actually had similar problems hosting in Japan (ironically) because the hosting firm has locked down everything with one charset.

One curious thing is that if you change servers and still have the same problem, I wonder if it is a problem with the site and not the database server. How about a simple Japanese page in HTML--no database. Does it display ok?

simplesimon

10+ Year Member



 
Msg#: 3450164 posted 3:55 am on Nov 6, 2007 (gmt 0)

How about a simple Japanese page in HTML--no database. Does it display ok?

Yes. The site is actually almost finished. I have about 50 pages of text and a working form that uses PHP to collect data and email results in Japanese. No problems until this. Those pages all use Shift_JIS encoding. I can also display Japanese without a problem using UTF-8 encoding, and I have the results page I'm working with encoded as UTF-8, as described previously. I had read that there were problems using Shift_JIS with MySQL, so I went with UTF-8 for this part of the project. I tried Shift_JIS too when I started to have problems with UTF-8, but it wasn't any better. The only encoding that resulted in some Asian characters was GB2312 (I think), which is for Chinese. It wasn't right, but it wasn't just question marks.

phranque

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 3450164 posted 4:22 am on Nov 6, 2007 (gmt 0)

was the table created with CHARACTER SET utf8 and COLLATE utf8_general_ci?
do a "SHOW CREATE TABLE tablename;" to find out.

simplesimon

10+ Year Member



 
Msg#: 3450164 posted 5:31 am on Nov 6, 2007 (gmt 0)

was the table created with CHARACTER SET utf8 and COLLATE utf8_general_ci?

show create table says: CHARSET=utf8 COLLATE=utf8_unicode_ci

phranque

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 3450164 posted 6:06 am on Nov 6, 2007 (gmt 0)

i wish i could scare up a better suggestion, but you could try this (without the single quotes in your example):
mysql_query("SET NAMES utf-8");
mysql_query("SET CHARACTER SET utf-8");

DamonHD

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3450164 posted 7:50 am on Nov 6, 2007 (gmt 0)

As an experiment, try doing the non-ASCII characters with HTML entity codes, thus all the actual characters are 7-bit, but the end browser can decode and display the (Unicode) entities without anything in between mangling them.

Rgds

Damon
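
The entity workaround suggested above can be sketched in PHP roughly as follows. This is a minimal illustration, not the poster's actual code: it assumes PHP 7 or later with the mbstring extension, a source file saved as UTF-8, and a hypothetical helper name `to_numeric_entities`. It rewrites every non-ASCII character as a decimal numeric character reference, so everything sent to the browser is 7-bit ASCII.

```php
<?php
// Hypothetical helper: encode every non-ASCII character of a UTF-8 string
// as a decimal numeric character reference such as "&#26085;".
function to_numeric_entities(string $utf8): string
{
    // Conversion map: [first code point, last code point, offset, mask].
    // 0x80..0x10FFFF covers everything outside 7-bit ASCII.
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($utf8, $map, 'UTF-8');
}

echo to_numeric_entities('Lake 日本'), "\n"; // Lake &#26085;&#26412;
```

The trade-off is that the stored or transmitted text is no longer readable as Japanese, and it sidesteps the charset mismatch instead of fixing it.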

simplesimon

10+ Year Member



 
Msg#: 3450164 posted 3:13 pm on Nov 6, 2007 (gmt 0)

mysql_query("SET NAMES utf-8");
mysql_query("SET CHARACTER SET utf-8");

Tried without the single quotes as I had it, but unfortunately same result.

simplesimon

10+ Year Member



 
Msg#: 3450164 posted 3:26 pm on Nov 6, 2007 (gmt 0)

try doing the non-ASCII characters with HTML entity codes

Thanks Damon. Htmlentities work, so does that tell me there is an issue with multi-byte character support in mysql?

simplesimon

10+ Year Member



 
Msg#: 3450164 posted 3:43 pm on Nov 6, 2007 (gmt 0)

The column in question is VARCHAR at 200 characters, and the text is 10 - 15 characters max if that helps.

DamonHD

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3450164 posted 8:16 pm on Nov 6, 2007 (gmt 0)

There are so many places that can mess up non-7-bit characters on the path to the user's browser that I really can't answer your question... I just continue to finesse the whole issue, and it works for me!

Rgds

Damon

simplesimon

10+ Year Member



 
Msg#: 3450164 posted 4:51 am on Nov 11, 2007 (gmt 0)

Ok. Just lame coding on my part. Instead of this:
mysql_query("SET NAMES 'utf8'");

which is the actual line of code I had in the script, I needed this:
$var=mysql_query("SET NAMES 'utf8'");

Despite setting everything, including the connection to utf-8 in phpmyadmin, it was still coming over as latin1. I found out by running this
$rs = mysql_query("SHOW VARIABLES LIKE 'character_set_%'");

Which resulted in this:

character_set_client latin1
character_set_connection latin1
character_set_database latin1
character_set_filesystem binary
character_set_results latin1
character_set_server latin1
character_set_system utf8

More research needed to wrap my brain around this stuff, but this was a good start. Thanks for all your help.

Troy
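
The check described in this post can be turned into a small reusable test. The sketch below is only an illustration under stated assumptions: `mismatched_charsets` is a hypothetical helper, it needs PHP 7 or later, and it expects the rows from SHOW VARIABLES LIKE 'character_set_%' to have already been collected into a name => value array. It looks only at the three session variables that SET NAMES changes (character_set_client, character_set_connection and character_set_results).

```php
<?php
// Hypothetical helper: given name => value pairs from
// SHOW VARIABLES LIKE 'character_set_%', return the session variables
// that are not set to the expected character set.
function mismatched_charsets(array $vars, string $expected = 'utf8'): array
{
    // The three variables that SET NAMES changes for the session.
    $sessionVars = [
        'character_set_client',
        'character_set_connection',
        'character_set_results',
    ];
    $bad = [];
    foreach ($sessionVars as $name) {
        if (!isset($vars[$name]) || $vars[$name] !== $expected) {
            $bad[$name] = $vars[$name] ?? null;
        }
    }
    return $bad;
}

// Values reported in the post above.
$reported = [
    'character_set_client'     => 'latin1',
    'character_set_connection' => 'latin1',
    'character_set_database'   => 'latin1',
    'character_set_filesystem' => 'binary',
    'character_set_results'    => 'latin1',
    'character_set_server'     => 'latin1',
    'character_set_system'     => 'utf8',
];

// Lists the three session variables still set to latin1.
print_r(mismatched_charsets($reported));
```

An empty result means the connection is using the expected character set. Newer MySQL versions may report utf8mb3 or utf8mb4 instead of utf8, so the expected value is a parameter.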

Captain Goggles

5+ Year Member



 
Msg#: 3450164 posted 7:48 pm on Nov 19, 2007 (gmt 0)

I was actually having the exact same problem with Japanese characters appearing as question marks. The following line fixed it though:
mysql_query("SET character_set_results = 'utf8'");

phranque

WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 3450164 posted 1:29 am on Nov 20, 2007 (gmt 0)

welcome to WebmasterWorld [webmasterworld.com], Captain Goggles!

simplesimon sez:
$rs = mysql_query("SHOW VARIABLES LIKE 'character_set_%'");

Which resulted in this:
character_set_client latin1
character_set_connection latin1
character_set_database latin1
character_set_filesystem binary
character_set_results latin1
character_set_server latin1
character_set_system utf8

nice find!
i never knew about this...
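
A likely reason the page showed question marks and not garbled bytes: when results are converted to latin1, any character with no latin1 equivalent is replaced with "?". The sketch below imitates that lossy conversion in PHP using mbstring. It is an analogy under stated assumptions (PHP 7 or later, a UTF-8 source file, and a hypothetical helper named `to_latin1`), not a reproduction of what MySQL does internally.

```php
<?php
// Imitate a lossy UTF-8 -> latin1 conversion. Anything latin1 cannot
// represent is replaced with the substitute character.
function to_latin1(string $utf8): string
{
    // Force "?" (code point 0x3F) as the substitute character so the
    // result does not depend on the local mbstring configuration.
    mb_substitute_character(0x3F);
    return mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
}

echo to_latin1('日本語'), "\n"; // ???
echo to_latin1('abc'), "\n";    // abc
```

Once a character has been replaced this way the original cannot be recovered, which is why the connection character set has to be right before the data is fetched.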

runonce

5+ Year Member



 
Msg#: 3450164 posted 4:27 am on Apr 16, 2008 (gmt 0)

Hi there,

I was having almost the same problem when trying the inverse: UTF-8 encoded strings sent with PHP to MySQL resulted in corrupted strings inside the database.

mysql_query("SET NAMES 'utf8'") did the trick and solved this.

Thank you very much!

Everyone note that 'utf8' has to be without the hyphen in order to be a valid character encoding ('utf-8' would be invalid).
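
To guard against the hyphen mistake, a page's charset label can be translated to the MySQL name before building the SET NAMES statement. The following is a minimal sketch: `mysql_charset_name` is a hypothetical helper, it needs PHP 7 or later, and it covers only a handful of labels.

```php
<?php
// Hypothetical helper: map common IANA-style charset labels to the names
// MySQL expects in SET NAMES. Only a few labels are covered here.
function mysql_charset_name(string $label): string
{
    $map = [
        'utf-8'      => 'utf8',
        'shift_jis'  => 'sjis',
        'euc-jp'     => 'ujis',
        'iso-8859-1' => 'latin1',
    ];
    $key = strtolower(trim($label));
    if (!isset($map[$key])) {
        throw new InvalidArgumentException("No MySQL name known for: $label");
    }
    return $map[$key];
}

echo "SET NAMES '" . mysql_charset_name('UTF-8') . "'", "\n"; // SET NAMES 'utf8'
```

In current MySQL versions utf8 is an alias for the three-byte utf8mb3; utf8mb4 is needed for full Unicode coverage, so the mapping may need adjusting for newer installations.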
