
PHP Server Side Scripting Forum

    
UTF-8 input problems with Adminer and phpmyadmin
ianevans




msg:4641812
 11:54 pm on Feb 1, 2014 (gmt 0)

I posted this over at Stackoverflow but since this forum is more conducive to conversation, I thought I'd ask for advice here as well. A real head scratcher.

I recently switched my MariaDB database to UTF-8 from Latin1. Read a bunch of checklists and carefully updated my character set, collation, my.cnf and php.ini. I have php forms for most of my data entry on the site, but sometimes for quick little changes, it's easier to go into a program like Adminer or phpmyadmin.

With UTF-8 in place, I wanted to change director Alfonso Cuaron's name to Cuarón. I went to his entry in Adminer, clicked Edit, and typed Cuar[alt+0243]n. It showed in the edit box as Cuarón, but when I saved the change, Adminer showed it as CuarÃ³n. Okay. I looked at the page info in Firefox, which says the character encoding of the page is UTF-8. So all should be well, right?

I went to one of my PHP data entry forms and created a Bob Cuarón. It showed up fine. I SSH'd into the server, fired up a mysql command line, and ran an UPDATE statement with Cuarón. That worked. But trying to change it in Adminer still kept giving me CuarÃ³n. I installed phpmyadmin (which was giving me some issues with my nginx config) and was able to edit his name and...sigh...it too gave me CuarÃ³n. I installed SQLbuddy and...success...I was able to make the changes, but the program is lacking some of the things I need, like the ability to edit search results.

I'm sure I've nailed everything down:

nginx.conf:

charset UTF-8;

my.cnf:

[client]
default-character-set=utf8
[mysqld]
character-set-server=utf8
collation-server=utf8_general_ci
init-connect='SET NAMES utf8'


/etc/php5/fpm/php.ini

mbstring.language = Neutral
mbstring.internal_encoding = UTF-8
mbstring.encoding_translation = On
mbstring.http_input = auto
mbstring.http_output = UTF-8
mbstring.detect_order = auto
mbstring.substitute_character = none
default_charset = UTF-8


SHOW VARIABLES LIKE "%character_set%";

+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+


I can't see what I could be missing. Both Adminer and phpmyadmin handle UTF-8 so I don't know why it's not working. It worked right out of the box with SQLBuddy, but as I said it's missing some features.

Any thoughts where I should look?

 

lucy24




msg:4641830
 4:10 am on Feb 2, 2014 (gmt 0)

I can't tell from your description whether you see where the problem is happening: text is getting entered as UTF-8 (C3B3, one multi-byte character) but is then reinterpreted as Latin-1 (C3 + B3, two single-byte characters). So the encoding is fine; something's going wrong at the disencoding end.

There's only one error. If it were happening more than once, you'd end up with more than two letters from your initial ó.
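A minimal Python sketch of that point, using nothing from the actual setup: the helper name `misread_as_latin1` is made up for illustration. It takes the UTF-8 bytes of ó and decodes them as Latin-1, once and then twice, so you can count the characters that come out.

```python
# Illustrative sketch: simulate UTF-8 bytes being misread as Latin-1.
# misread_as_latin1 is a hypothetical helper, not part of any tool in this thread.

def misread_as_latin1(text: str, times: int = 1) -> str:
    """Encode as UTF-8, then wrongly decode as Latin-1, `times` times over."""
    for _ in range(times):
        text = text.encode("utf-8").decode("latin-1")
    return text

once = misread_as_latin1("\u00f3")      # o-acute through one bad hop
twice = misread_as_latin1("\u00f3", 2)  # o-acute through two bad hops

print("\u00f3".encode("utf-8").hex())   # the two bytes of o-acute: c3b3
print(repr(once), len(once))            # A-tilde + superscript 3: 2 characters
print(repr(twice), len(twice))          # 4 characters after a second bad hop
```

One misreading gives exactly two characters; a second one gives four, which is why a two-character result points to a single bad hop.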

ianevans




msg:4641855
 8:55 am on Feb 2, 2014 (gmt 0)

Did you mean the unencoding end in adminer and phpmyadmin?

Here's where I'm confused. When I enter Cuarón into Adminer/phpmyadmin they both show CuarÃ³n, but the mysql command line shows CuarÃ³n too. So there's no encoding/unencoding going on in the display. It's passing it straight through.

If I enter Cuarón directly into mysql, the two programs both show Cuarón. Again passing it straight through.

So it appears both programs are encoding UTF-8 into Latin1 when you enter them but leaving it alone when you display. Except both programs are UTF-8 aware, so why are they encoding the entries?

The adminer author says he can't duplicate it on his end and I can't see where my config would be causing it. And that's why I have a head-shaped dent in my desk.

lucy24




msg:4641869
 10:08 am on Feb 2, 2014 (gmt 0)

"So there's no encoding/unencoding going on in the display."

Every time something goes from your keyboard to ... anywhere ... it gets encoded. And every time it goes from ... anywhere ... back to your monitor, it gets disencoded. Possibly several times in transit. But at least once each way.

"When I enter Cuarón into Adminer/phpmyadmin they both show CuarÃ³n but the mysql command line shows CuarÃ³n too. So there's no encoding/unencoding going on in the display. It's passing it straight through."

The mere fact that ó (C3B3) is coming through as Ã³ (C3 B3) means that something is getting re-encoded.

"So it appears both programs are encoding UTF-8 into Latin1 when you enter them but leaving it alone when you display."

I don't see how the conclusion follows from the premises. I'm also not clear what you mean by "encoding UTF-8 into Latin1". The form CuarÃ³n comes from UTF-8-encoded data being interpreted as Latin-1.

There are two other bits of possible experimental data. One: text that doesn't exist in Latin-1, such as a non-roman letter, or a less common diacritic such as a macron. The other: a character such as a curly quote or œ (oelig) that has two different realizations, one in UTF-8 and the other in Windows-Latin-1.

Try each of those, and see what comes out.
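To give a rough idea of what those two experiments could reveal, here is a small Python sketch. The probe characters (u-macron, oelig, a right curly quote) and the helper names `misread` and `fits` are my own picks for illustration, and `cp1252` is Python's name for Windows-Latin-1. It shows whether each probe exists in each single-byte encoding, and what its UTF-8 bytes turn into when read under each one.

```python
# Illustrative sketch: what each probe character looks like after its UTF-8
# bytes are misread under two candidate single-byte encodings.
# Probe characters and helper names are assumptions chosen for the example.

PROBES = {
    "u-macron": "\u016b",
    "oelig": "\u0153",
    "right curly quote": "\u2019",
}

def misread(text: str, wrong_encoding: str) -> str:
    """Decode the UTF-8 bytes of `text` with the wrong single-byte encoding."""
    # errors="replace" because Windows-1252 leaves a few byte values undefined.
    return text.encode("utf-8").decode(wrong_encoding, errors="replace")

def fits(text: str, encoding: str) -> bool:
    """True if `text` can be represented in `encoding` at all."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

for label, ch in PROBES.items():
    print(label)
    print("  utf-8 bytes:     ", ch.encode("utf-8").hex())
    print("  fits latin-1?    ", fits(ch, "latin-1"))
    print("  fits cp1252?     ", fits(ch, "cp1252"))
    print("  read as latin-1: ", repr(misread(ch, "latin-1")))
    print("  read as cp1252:  ", repr(misread(ch, "cp1252")))
```

If a probe comes back as a plain question mark, something converted it to an encoding that cannot hold it. If oelig comes back with a visible curly quote in it rather than an invisible control character, the misreading is Windows-1252 rather than true Latin-1.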

Edit: I have been assuming that what I see in this thread is what you see and what you typed. I've just remembered that these forums don't have a built-in charset, so that may not be a safe assumption.

"When I enter {o-acute} into Adminer/phpmyadmin they both show {A-tilde 3super} but the mysql command line shows {A-tilde 3super} too."

That's what I see.

ianevans




msg:4641935
 5:29 pm on Feb 2, 2014 (gmt 0)

Yeah it's hard to keep straight. I'll go through it with your format:

"When I enter {o-acute} into Adminer/phpmyadmin they both show {A-tilde 3super} but the mysql command line shows {A-tilde 3super} too."


Ok...

Adminer/phpmyadmin:
When I enter {o-acute} into Adminer/phpmyadmin, the web form text box I'm typing into shows o-acute.

When I hit enter and they show the result of what I entered they both show A-tilde 3super.

If I then login to mysql from a linux terminal and run a mysql select it shows A-tilde 3super.

Mysql command line:
If I do a terminal command line INSERT INTO or UPDATE with o-acute, both a command line SELECT and Adminer/phpmyadmin will display o-acute.

Self-written web forms:
With the forms I use to enter data into my site, entering o-acute will result in the command line and a web page displaying o-acute.

SQLBuddy:
If I enter o-acute, both SQLBuddy and the command line will display o-acute.

SQLBuddy would be the solution for me (I only need a pgm like this for quick-n-dirty changes) but unless I'm missing something I can only edit individual rows from a full table browse. I can't do a SELECT * FROM people WHERE first='Alfonso' and then click on the displayed result to edit.

Thanks for lending a fresh pair of eyes to this.

lucy24




msg:4641971
 10:19 pm on Feb 2, 2014 (gmt 0)

:: looking vaguely around for the people who understand databases, leading to realization that this is not actually the Databases subforum ::

Know what? I kinda think that everything is happening exactly as intended. There's just a glitch in one or more of the programs that displays the results back to you. In the database itself, things are correctly encoded. This may or may not be a serious problem, depending on what you use the data for.

ianevans




msg:4641980
 10:37 pm on Feb 2, 2014 (gmt 0)

I'd have to somewhat disagree on the "There's just a glitch in one or more of the programs that displays the results back to you."

If the database contained o-acute but the pgms displayed A-tilde 3super I'd say it's a display problem, but two UTF-8 compatible programs are taking o-acute and _storing_ it as A-tilde 3super and displaying it as A-tilde 3super.
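One way to settle stored-versus-displayed would be to look at the raw bytes, for example the output of something like SELECT HEX(...) on the name column, and check them outside any of these programs. The Python sketch below is only a heuristic under that assumption: `classify_stored_hex` is a made-up helper, and it would misjudge a value that legitimately contains A-tilde followed by 3super.

```python
# Illustrative sketch: decide from raw stored bytes whether a value is clean
# UTF-8, or UTF-8 that was misread as Latin-1 and then encoded as UTF-8 again.
# classify_stored_hex is a hypothetical helper; the hex input is assumed to
# come from a query such as SELECT HEX(column).

def classify_stored_hex(hex_string: str) -> str:
    raw = bytes.fromhex(hex_string)
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return "not valid UTF-8"
    try:
        # Undo one Latin-1 misreading; if that still yields valid UTF-8 and
        # changes the text, the stored value is probably double-encoded.
        repaired = text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return "clean UTF-8: " + text
    if repaired != text:
        return "double-encoded; likely meant: " + repaired
    return "clean UTF-8: " + text

print(classify_stored_hex("43756172C3B36E"))      # o-acute stored as C3 B3
print(classify_stored_hex("43756172C383C2B36E"))  # o-acute stored as C3 83 C2 B3
```

If the bytes are C3 B3, the database holds o-acute and the problem is on the display side. If they are C3 83 C2 B3, the A-tilde 3super pair really is what got stored.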

Yeah hard to know where to post this as it's not exactly just a PHP or MySQL/MariaDB or nginx issue.

Is it considered OK to go to the database subforum and say "Hey everyone, can you check out this thread here?"

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved