I'm in the process of quoting a project where the primary audience will be in Japan. I know I can code a page using the Unicode character set and have it display in the standard set of browsers I test in. I'm not sure, though, how it's going to look in Japan. I'm not able to find much information on their operating system or browser usage. And will WAP be more of an issue in Japan? The client will be supplying all content in Japanese.
Thanks in advance for any help.
-The following after the database connection and prior to the select query:
mysql_query("SET NAMES 'utf-8'");
mysql_query("SET CHARACTER SET 'utf-8'");
I loaded Japanese into the database by cutting and pasting the characters from a Word document into phpMyAdmin. I can see the characters displayed correctly when browsing the table in phpMyAdmin.
When I try to display the data from the database on the web page, I just get question marks (?).
Is there something I'm missing?
I also took a look at iconv, but it made my brain hurt, so I downloaded an app called encoding master. It's telling me the encodings of the text in the source documents I'm using are already UTF-8, but I converted anyway, then copied and pasted into phpMyAdmin. All the Japanese characters still look good in phpMyAdmin and everywhere else up to the point where I try to display them on the web page. I even copied some text and pasted it into Dreamweaver in the HTML portion of the document, then with the same clipboard pasted into phpMyAdmin, where it appears correctly. When I view it on the website though, the text in the HTML portion displays fine, but the dynamic content from the database is all question marks (?).
It looks like MySQL or PHP is doing something to the characters in the process of retrieving them from the database, unless I'm misinterpreting my testing. The connection collation is set to utf8_unicode_ci, so I'm at a loss as to where else to look.
Here is what I have now:
-PHP Version 4.4.7 with both multibyte and Japanese support enabled
-MySQL 5.0 database with all collations set to utf8_unicode_ci
-Web page charset=UTF-8
Have you verified that your document has the following or equivalent in the head?
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
Yes, sorry, I should have been more specific. That's the code I have on the web page to set the character set to UTF-8. I can display Japanese text on that page until the cows come home, just not from the MySQL database using PHP.
Here's the PHP code if it helps shed light.
Database connection function:
function db_connect()
{
    $connect = mysql_connect("dbhost", "user", "password");
    if (!$connect)
        return false;
    if (!mysql_select_db('database'))
        return false;
    return $connect;
}
Connect:
if (!($conn = db_connect())) {
    echo 'database error';
    return false;
}
Query and display:
mysql_query("SET NAMES 'utf-8'");
mysql_query("SET CHARACTER SET 'utf-8'");
$query = "select ID, name from lake";
$result = mysql_query($query);
if (!$result)
    echo("no data");
while ($myrow = mysql_fetch_array($result)) {
    echo "<option value=".$myrow['ID'].">".$myrow['name']."</option>";
}
It displays in a dropdown list within a form, but I didn't include the extraneous HTML code. I've tested display outside of the form with the same result: all question marks.
One thing I would add: avoid copying and pasting directly from a Word document. Add one more step and paste into a WordPad text document first, then copy from the WordPad document into your database, CMS, etc.
Thanks for the tip. I'll do that moving forward. Unfortunately it didn't make a difference with this problem.
One curious thing: if you change servers and still have the same problem, I wonder if it's a problem with the site and not the database server. How about a simple Japanese page in HTML--no database. Does it display OK?
Yes. The site is actually almost finished. I have about 50 pages of text and a working form that uses PHP to collect data and email results in Japanese. No problems until this. Those pages all use Shift_JIS encoding. I can also display Japanese without a problem using UTF-8 encoding, and I have the results page I'm working with encoded as UTF-8 as described previously. I had read that there were problems using Shift_JIS with MySQL, so I went with UTF-8 for this part of the project. I tried Shift_JIS too when I started to have problems with UTF-8, but it wasn't any better. The only encoding that resulted in some Asian characters was gb2312 (I think), which is for Chinese. It wasn't right, but it wasn't just question marks.
mysql_query("SET NAMES 'utf8'");
$var=mysql_query("SET NAMES 'utf8'");
Despite setting everything, including the connection, to UTF-8 in phpMyAdmin, the connection from PHP was still coming over as latin1. I found out by running this:
$rs = mysql_query("SHOW VARIABLES LIKE 'character_set_%'");
Which resulted in this:
character_set_client latin1
character_set_connection latin1
character_set_database latin1
character_set_filesystem binary
character_set_results latin1
character_set_server latin1
character_set_system utf8
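A small sketch of why a latin1 character_set_results would show up as question marks: when text is converted into a character set that has no slot for a character, the converter substitutes "?". The snippet below imitates that with PHP's mbstring, not with MySQL itself. Assumptions: ISO-8859-1 stands in for MySQL's latin1, toLatin1() is a made-up helper name, and the \u{...} escapes need PHP 7 or later.

```php
<?php
// Substitute character for unconvertible input. "?" (0x3F) is mbstring's
// default; it is set explicitly so the demo does not depend on php.ini.
mb_substitute_character(0x3F);

// Hypothetical helper: convert UTF-8 text to ISO-8859-1, which is used
// here as a stand-in for MySQL's latin1.
function toLatin1(string $utf8): string
{
    return mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
}

// "Japanese" written in Japanese (three characters), as UTF-8 bytes.
$japanese = "\u{65E5}\u{672C}\u{8A9E}";

// None of the three characters exists in latin1, so each becomes "?".
echo toLatin1($japanese), "\n";

// An accented Latin letter does exist in latin1 and survives as byte e9.
echo bin2hex(toLatin1("caf\u{E9}")), "\n";
```

This would also explain why the static HTML on the same page looked fine: that text never passed through the database connection, so nothing converted it.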
More research needed to wrap my brain around this stuff, but this was a good start. Thanks for all your help.
Troy
simplesimon sez:
$rs = mysql_query("SHOW VARIABLES LIKE 'character_set_%'");
Which resulted in this:
character_set_client latin1
character_set_connection latin1
character_set_database latin1
character_set_filesystem binary
character_set_results latin1
character_set_server latin1
character_set_system utf8
nice find!
i never knew about this...
I was having almost the same problem when trying the inverse: UTF-8 encoded strings sent with PHP to MySQL resulted in corrupted strings inside the database.
mysql_query("SET NAMES 'utf8'") did the trick and solved this.
Thank you very much!
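For the inverse case described above, one common way strings get corrupted on the way in is that UTF-8 bytes arrive over a connection the server believes is latin1, so every byte is treated as its own character and re-encoded. The sketch below only imitates that misreading with mbstring; it does not talk to MySQL. Assumptions: ISO-8859-1 stands in for MySQL's latin1, and misreadAsLatin1() is a made-up name.

```php
<?php
// Hypothetical helper: treat every input byte as one latin1 character and
// re-encode the result as UTF-8, which is roughly what happens to UTF-8
// bytes sent over a connection declared as latin1 into a utf8 column.
function misreadAsLatin1(string $utf8Bytes): string
{
    return mb_convert_encoding($utf8Bytes, 'UTF-8', 'ISO-8859-1');
}

$original = "\u{65E5}\u{672C}\u{8A9E}";   // three Japanese characters
$stored   = misreadAsLatin1($original);   // what would land in the table

echo strlen($original), " bytes sent\n";
echo strlen($stored), " bytes stored\n";

// The damage is reversible as long as nothing was truncated: undo the
// extra encoding step and the original bytes come back.
$recovered = mb_convert_encoding($stored, 'ISO-8859-1', 'UTF-8');
var_dump($recovered === $original);
```

Running SET NAMES 'utf8' right after connecting avoids the misreading in the first place, which matches the fix reported here.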
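Everyone note that in MySQL the character set name 'utf8' has to be written without the hyphen. 'utf-8' is not a valid MySQL character set name, so SET NAMES 'utf-8' fails and the connection stays on its default (latin1 in this thread). The hyphenated form is still the right one for the charset in the HTML meta tag.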
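The hyphen rule is easy to trip over because a failed SET NAMES goes unnoticed unless its return value is checked. Here is a minimal sketch of a guard, not a drop-in fix. Assumptions: setNames() and $fakeQuery are made-up names, the $runQuery callable stands in for whatever sends SQL to the server (mysql_query in this thread, or a mysqli query call in current PHP) and returns false on failure, and the name check assumes MySQL character set names consist only of letters, digits and underscores.

```php
<?php
// Hypothetical helper: refuse names that cannot be MySQL character set
// names, then make sure the server actually accepted the statement.
function setNames(callable $runQuery, string $charset): void
{
    if (!preg_match('/^[A-Za-z0-9_]+$/', $charset)) {
        throw new InvalidArgumentException("Not a MySQL character set name: $charset");
    }
    if ($runQuery("SET NAMES '$charset'") === false) {
        throw new RuntimeException("SET NAMES '$charset' was rejected by the server");
    }
}

// Stand-in for a real connection: records the SQL and reports success.
$sent = [];
$fakeQuery = function (string $sql) use (&$sent): bool {
    $sent[] = $sql;
    return true;
};

setNames($fakeQuery, 'utf8');
echo $sent[0], "\n";

try {
    setNames($fakeQuery, 'utf-8');   // hyphenated name is caught before sending
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), "\n";
}
```

With mysqli, the built-in mysqli::set_charset method does a similar job and also returns false on failure, so checking its return value would catch the same mistake.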