Encoding nightmare

         

humandesigner

2:19 am on Apr 21, 2006 (gmt 0)

10+ Year Member




Hi,

I've just discovered that I'm having quite a big problem.

For years, I've designed Japanese websites with the Shift_JIS encoding. Yesterday, on a new project, I noticed that certain Japanese characters were garbled. After a search, I found the explanation on PHP.net, which basically says: "The shift_jis character set includes a number of two-byte code characters that contain the hex-value 0x5c (backslash) which will get stripped by this function thus garbling those characters."
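To make the quoted explanation concrete, here is a small sketch in Python (used here only to inspect the bytes, not the PHP code in question). It shows that some common characters carry 0x5C as their second Shift_JIS byte, and what a byte-oriented backslash removal does to them. The helper `naive_strip_backslashes` is a hypothetical stand-in for a charset-unaware function, not a reimplementation of stripslashes().

```python
# Characters whose second Shift_JIS byte is 0x5C, the ASCII backslash.
samples = ["ソ", "表", "能"]

for ch in samples:
    raw = ch.encode("shift_jis")
    # Print the character, its bytes in hex, and whether 0x5C occurs in them.
    print(ch, raw.hex(), 0x5C in raw)


def naive_strip_backslashes(data: bytes) -> bytes:
    """Hypothetical charset-unaware stripper: removes every 0x5C byte.

    Simplified on purpose; it does not handle escaped pairs such as a
    doubled backslash.
    """
    return data.replace(b"\x5c", b"")


original = "表示".encode("shift_jis")
damaged = naive_strip_backslashes(original)

# One byte is lost, so the remaining bytes pair up differently when decoded.
print(original.hex(), "->", damaged.hex())
print(damaged.decode("shift_jis", errors="replace"))
```

Because the lost byte was half of a two-byte character, everything after it can be misread, which matches the garbling described above.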

Since I have to keep using stripslashes(), I have no choice but to use a different encoding. I then decided to use: content="text/html; charset=EUC-JP".

The problem is that I get a blank page on IE and fully garbled text on Firefox.

I must have forgotten to do something, but what?

Thanks for helping! :)

bill

2:49 am on Apr 21, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I've got one Japanese site that uses PHP, but it's using UTF-8 encoding. We had to make sure that our server's PHP had all the multibyte options compiled in. Could it be the way your PHP is set up?

I have never met anyone who recommended EUC-JP encoding. It was a problematic encoding back in the early web days and I've kept away from it myself. Is UTF-8 an option?

humandesigner

3:10 am on Apr 21, 2006 (gmt 0)

10+ Year Member



Hi Bill :)

From what I read everywhere, UTF-8 is problematic as well, and the majority of Japanese websites are either Shift_JIS or EUC-JP. That's why I chose EUC-JP.

Anyway, I've just posted on a Japanese website and I've been told that I simply have to convert the files to the new encoding with the help of a freeware tool, which I've now installed.
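As a rough illustration of what such a conversion involves (a sketch in Python, not the freeware tool mentioned above): changing the charset declaration alone does not change the bytes in the files, so Shift_JIS bytes read as EUC-JP come out garbled. The function name `reencode` and the sample string are assumptions made for the example.

```python
def reencode(data: bytes, source: str = "shift_jis", target: str = "euc_jp") -> bytes:
    """Decode bytes from the source encoding and re-encode them in the target."""
    return data.decode(source).encode(target)


sjis_bytes = "日本語の表示".encode("shift_jis")
eucjp_bytes = reencode(sjis_bytes)

# Same text, different bytes on disk.
print(eucjp_bytes.decode("euc_jp"))

# Shift_JIS bytes interpreted as EUC-JP: this is the "garbled page" case.
print(sjis_bytes.decode("euc_jp", errors="replace"))

# In EUC-JP the bytes of these two-byte characters are all 0xA1 or above,
# so none of them can collide with the ASCII backslash (0x5C).
print(b"\x5c" in sjis_bytes, b"\x5c" in eucjp_bytes)
```

The same idea applies to files: read the bytes, decode them as Shift_JIS, and write them back out as EUC-JP, keeping the declared charset in step with what is actually saved.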

I'll see how it goes.

[edited by: bill at 5:20 am (utc) on April 21, 2006]
[edit reason] snipped URLs [/edit]

bill

5:26 am on Apr 21, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes, UTF-8 comes with its own problems as well. Shift_JIS is the safest way to go usually.

Let us know how it goes. I'd still be interested to know whether you have multibyte support (the mbstring functions) in your PHP. That's often not part of a standard PHP install unless you did it yourself or have a Japanese host.

humandesigner

9:56 am on Apr 21, 2006 (gmt 0)

10+ Year Member




Wow ... things are getting worse.
Since the client is on a host that doesn't permit installation of PHP modules, I won't be able to use the MB functions. And I don't see why I should use those functions anyway.

Also, just out of curiosity, I had a look at the encodings used by the sites of major Japanese companies such as NTT and Sony, and they all use Shift_JIS ... which makes me think that perhaps I'm heading in the wrong direction.

Maybe I should continue with Shift_JIS and do something about that stripslashes() function that is currently bothering me.

I admit I've never fully understood the whole concept of addslashes() and stripslashes(). I know that slashes need to be added when inserting records containing special characters into the database. But because magic_quotes_gpc is on (as reported by get_magic_quotes_gpc()), slashes are applied to all posted data ... and I'm hesitating: should I turn it off and use a function that adds slashes only to specific targeted characters, and only during database insertions?

As you can see, I'm very confused ;(

I'd be grateful to anyone who could offer a helpful hint.

jatar_k

5:24 pm on Apr 25, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



I would stick with Shift_JIS and look at functions for adding and removing slashes.

I would think there would be some out there.

I worked some with a Chinese website that encountered similar problems. They created custom functions for stripping/adding slashes. I didn't really have much to do with it, so I can't really suggest any code.

Anyone else have any input?
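For illustration only, here is a sketch of what such custom functions might look like, written in Python rather than PHP and not taken from the site mentioned above. The idea is to walk the bytes and treat a Shift_JIS lead byte plus its following byte as one unit, so a 0x5C in second position is never mistaken for a backslash. The names `is_sjis_lead`, `sjis_add_slashes` and `sjis_strip_slashes` are hypothetical, the lead-byte ranges assumed are 0x81-0x9F and 0xE0-0xFC, and NUL handling is left out.

```python
ESCAPED = {0x27, 0x22, 0x5C}  # single quote, double quote, backslash


def is_sjis_lead(b: int) -> bool:
    """True if b starts a two-byte Shift_JIS character (assumed ranges)."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def sjis_add_slashes(data: bytes) -> bytes:
    """Escape quotes and real backslashes, leaving two-byte characters alone."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if is_sjis_lead(b) and i + 1 < len(data):
            out += data[i:i + 2]  # copy the two-byte character untouched
            i += 2
            continue
        if b in ESCAPED:
            out.append(0x5C)
        out.append(b)
        i += 1
    return bytes(out)


def sjis_strip_slashes(data: bytes) -> bytes:
    """Remove escaping backslashes without touching 0x5C trail bytes."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if is_sjis_lead(b) and i + 1 < len(data):
            out += data[i:i + 2]
            i += 2
        elif b == 0x5C:
            i += 1  # drop the escaping backslash
            if i < len(data) and data[i] == 0x5C:
                out.append(0x5C)  # an escaped backslash becomes one backslash
                i += 1
        else:
            out.append(b)
            i += 1
    return bytes(out)


text = "表示 it's ソース".encode("shift_jis")
escaped = sjis_add_slashes(text)
print(sjis_strip_slashes(escaped).decode("shift_jis"))
```

The same byte-walking approach could be ported to PHP with plain string indexing and ord(), which would not need the multibyte extension. Whether it fits depends on how the data was escaped in the first place, so treat it as a starting point rather than a drop-in fix.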

shri

2:48 am on Apr 26, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Is your HTTP_ACCEPT_CHARSET set up correctly?

Mine is --
HTTP_ACCEPT_CHARSET ISO-8859-1,utf-8;