How is this string getting transformed (utf8)

Two strings, one of them from the database.

kila_m

9:52 pm on Apr 6, 2015 (gmt 0)

10+ Year Member



I have this string:

"一右雨円王音下火花貝学気九休玉金空月犬見五口校左三山子四糸字耳七車手十出女小上森人水正生青夕石"


and in the database it's stored as:

"一右雨円王音下火花貝学気九休玉金空月犬見五口校左三山子四糸字耳七車手十出女小上森人水正生青夕石"


Is this double encoded, triple encoded, or just normal UTF-8 encoded?

EDIT: I'm not sure why, but this website is converting the first string to HTML entities when I post it as normal kanji.

lucy24

5:57 am on Apr 7, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The immediate bad news is: this site does not-- and will not in the foreseeable future-- support anything outside the Latin-1 character set. So if you're asking about non-ASCII characters, it will take some work just arriving at a way to express the question.

Approaching from the opposite end:

The characters you're working with are in the range U+4E00 - U+9FFF, which in UTF-8 is the byte range E4 B8 80 - E9 BF BF. That means that if UTF-8 data is getting interpreted as Latin-1, then each character will come through as three characters, where the first of the three is always one of the group E4 through E9 (ä å æ ç è é). The other two fall between 80 and BF, and anything from 80 through 9F is a non-displaying control character in Latin-1, so there may be hiccups in any visual presentation ... but yup, that sure looks like what you've got there.

UTF-8 encoded data being interpreted as Latin-1. Could be a lot worse ;)
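To make the byte arithmetic above concrete, here is a minimal sketch in Python (chosen purely for illustration; the helper name as_latin1 is made up for this example and is not from the thread). It encodes a few of the characters as UTF-8 and then deliberately misreads those bytes as Latin-1:

```python
def as_latin1(text: str) -> str:
    """Encode text as UTF-8, then misread the resulting bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


# First three characters of the string in question.
sample = "一右雨"

for ch in sample:
    raw = ch.encode("utf-8")   # always three bytes for U+4E00 - U+9FFF
    garbled = as_latin1(ch)    # three Latin-1 characters, one per byte
    # repr() is used so the non-displaying bytes (80 - 9F) stay visible.
    print(f"U+{ord(ch):04X}  bytes={raw.hex(' ').upper()}  seen as {garbled!r}")
```

Each line of output should show a three-byte sequence whose first byte lies between E4 and E9, which is the pattern described above.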

kila_m

8:12 pm on Apr 8, 2015 (gmt 0)

10+ Year Member



Thanks, yeah, that's how that string is stored in the database. I guess the server config is set up to store in Latin-1, and I don't have root access to it.

lucy24

9:02 pm on Apr 8, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I guess the server config is set up to store in Latin-1.

Either that, or the application that's reading the database is interpreting it as Latin-1. This is a real possibility, since the encoding only shifts once in your example. You should be able to change all necessary charset declarations in .htaccess. Or via a control-panel setting, if the database lives in a different place than the site that uses it. (I think this is often the case in shared hosting.)

:: detour to look up own host's control panel ::

Darned if I can find it. You may need to ask, unless it's done through phpMyAdmin. (Couldn't confirm this because, uh, I've never opened it and therefore have no idea how to get in. In fact I managed to lock myself out thanks to guessing wrong once too often.)
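One rough way to check the "only shifts once" observation is to undo the misreading repeatedly and count how many rounds succeed. The Python sketch below is illustrative only: the function names are hypothetical, and it assumes the damage really is UTF-8 bytes read as strict Latin-1 (ISO-8859-1). MySQL's latin1 is closer to Windows-1252, so real data may need a different codec.

```python
from typing import Optional


def unwrap_once(text: str) -> Optional[str]:
    """Undo one round of 'UTF-8 bytes read as Latin-1', or return None."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside Latin-1 (so it was never
        # misread this way) or the bytes are not valid UTF-8.
        return None


def count_layers(text: str, limit: int = 5) -> int:
    """Count how many times the misreading can be reversed, up to limit."""
    layers = 0
    while layers < limit:
        repaired = unwrap_once(text)
        if repaired is None or repaired == text:
            break  # nothing left to undo (plain ASCII round-trips unchanged)
        text = repaired
        layers += 1
    return layers
```

A count of 0 suggests the text is stored correctly, 1 matches the single shift described here, and 2 or more would point to double or triple encoding. Genuine Latin-1 text that happens to form valid UTF-8 byte sequences would be miscounted, so treat the result as a hint rather than proof.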

Readie

1:57 pm on Apr 10, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You can also specify what charset you want to use when connecting to a database with PHP.

With PostgreSQL you add the options when defining your connection string, for example:
$conn = pg_connect("host='host' user='user' password='pass' dbname='db' port='5432' options='--client_encoding=UTF8'");


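With MySQL you can call a function after connecting (not sure if you can do it as part of the connection, never tried), so for example:
$conn = new mysqli("host", "user", "pass", "db");
// MySQL's 'utf8' stores at most 3 bytes per character, which covers these kanji; 'utf8mb4' covers all of Unicode.
$conn->set_charset('utf8');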