
Asia and Pacific Region Forum

    
EUC-JP better than SHIFT_JIS?
sleidia




msg:3373675
 4:49 pm on Jun 20, 2007 (gmt 0)

Hello,

Can someone tell me if EUC-JP is better than SHIFT-JIS and why?
I can see that most Japanese sites use EUC-JP rather than SHIFT-JIS, and I'm wondering why.

Thanks :)

 

bill




msg:3374092
 12:21 am on Jun 21, 2007 (gmt 0)

I thought it was the other way around. We've had a few threads [google.com] on the topic and I was under the impression from Japanese programmers and site designers that SHIFT-JIS was a better encoding. I was told that it worked better with databases used to power the site's back end. However, that doesn't seem to be as much of an issue these days.

Back in the days of the Netscape browser I had some real problems with older versions handling EUC-JP encoding. As far as I've heard, none of the modern PC or mobile browsers have issues with EUC-JP.

sleidia




msg:3374772
 3:29 pm on Jun 21, 2007 (gmt 0)


That's weird: I've always been told the contrary.

And I even remember that, after I left a Japanese web agency, the new webmasters quickly changed the site's encoding to EUC-JP, arguing that Shift-JIS can be buggy with MySQL databases (is that true?) and that I was very incompetent for using Shift-JIS.

And it looks like the majority of Japanese sites use EUC-JP now.

David_M




msg:3374773
 3:29 pm on Jun 21, 2007 (gmt 0)

Actually, I've heard that UTF-8 is the best, but I still use shift-jis.
Shift-JIS can cause some display problems, with the wrong kanji being displayed. I've had it happen a couple of times.

sleidia




msg:3374781
 3:36 pm on Jun 21, 2007 (gmt 0)

Really?

I've had display issues with UTF-8 (e.g. with the PHP trim function) but absolutely never with Shift-JIS. Japanese clients never reported a single incident of mojibake with Shift-JIS.

LifeinAsia




msg:3374792
 3:48 pm on Jun 21, 2007 (gmt 0)

We've always used Shift-JIS for the Japanese versions of our sites. No complaints so far.

sleidia




msg:3374808
 3:59 pm on Jun 21, 2007 (gmt 0)


I just read that hankaku (half-width) kana aren't supported in EUC-JP.
Still, the majority of sites use EUC-JP instead of shift-JIS.
I don't get it :(
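For what it's worth, the EUC-JP encoding itself does define hankaku katakana: each one is written as the single-shift byte 0x8E (SS2) followed by its JIS X 0201 code, whereas Shift_JIS stores that code as a single byte. Whether a given browser, database or handset of the day handled those sequences well is a separate question. A quick check with Python's built-in `euc_jp` and `shift_jis` codecs:

```python
# Half-width (hankaku) katakana "ｱｲｳ", code points U+FF71 to U+FF73.
hankaku = "ｱｲｳ"

# EUC-JP: SS2 byte 0x8E followed by the JIS X 0201 code (two bytes each).
euc = hankaku.encode("euc_jp")
# Shift_JIS: the JIS X 0201 code on its own (one byte each).
sjis = hankaku.encode("shift_jis")

print(euc.hex(" "))   # 8e b1 8e b2 8e b3
print(sjis.hex(" "))  # b1 b2 b3
```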

DamonHD




msg:3375027
 7:32 pm on Jun 21, 2007 (gmt 0)

As a stupid gaijin/gweilo who has to produce output in several languages, primarily English and other Latin-script languages as well as Chinese and Japanese, I find UTF-8 the simplest solution, since it is unambiguous, reasonably standardised and covers everything. And even then I actually encode all the non-7-bit characters as HTML entity codes to avoid them getting mangled between my code and the browser...

As to 'better' or 'worse' I think this is going to be like whether Fuji-san is better or worse than Mt Everest: it all depends on what you mean! B^>

Rgds

Damon

bcolflesh




msg:3375031
 7:39 pm on Jun 21, 2007 (gmt 0)

mojibake

Cool - thanks for the new (to me) word:

[en.wikipedia.org...]

bill




msg:3375263
 12:50 am on Jun 22, 2007 (gmt 0)

Mojibake is now English? Cool. ;)

the majority of sites use EUC-JP instead of shift-JIS.

Do you have stats on this or is this just from your personal experience? I still see a lot of Shift-JIS sites out there.

Unfortunately from what I've heard in the industry UTF-8 is still more problematic than either EUC-JP or Shift-JIS. (That goes for Chinese encoding as well.) There are character display issues with PHP and MySQL for instance that are the bane of developers of Japanese sites. I'm still looking forward to the day when Unicode will truly be the best encoding solution. They're heading in the right direction.

DamonHD




msg:3375507
 8:02 am on Jun 22, 2007 (gmt 0)

The trick of encoding the non-7-bit characters as &#nnnnn; numeric character references got round a lot of (Java) server problems for me in the early days and still works well. It essentially prevents any broken text-handling component anywhere in the path from messing up the text.

Rgds

Damon
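The entity-code trick can be sketched in a few lines of Python, whose `"xmlcharrefreplace"` error handler does exactly this substitution when encoding to ASCII. The `to_ncr` helper name is made up for illustration; `html.unescape` is used only to show the round trip back to the original text.

```python
import html

def to_ncr(text: str) -> str:
    """Replace every non-ASCII character with a decimal &#nnnnn; reference."""
    # "xmlcharrefreplace" turns each character ASCII cannot encode into
    # &#<code point>; and leaves 7-bit characters untouched.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

encoded = to_ncr("Fuji 富士")
print(encoded)                 # Fuji &#23500;&#22763;
print(html.unescape(encoded))  # Fuji 富士
```

The trade-off is size: each kanji becomes roughly eight ASCII characters instead of two or three bytes, and the references are only decoded where HTML is parsed (not, say, in plain-text email).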

sleidia




msg:3375543
 8:57 am on Jun 22, 2007 (gmt 0)


Thanks again guys :)

From what I've read, I think I'll go with the following:
- use UTF-8 for Western languages
- use shift-JIS for Japanese

What do you think?

Also, I have another question:
I have developed my own multilingual CMS that uses the dedicated ISO encoding for each language. All the PHP files are saved in ANSI mode, the interface texts are taken from flat text files, and the website content is taken from MySQL databases. I've heard that it's necessary to convert the PHP files to UTF-8 for the encoding to work. Is that true, even if the PHP files contain only code? And I'd have to convert the flat text files and the MySQL tables too, right?

Sorry but I'm getting very confused with all those encoding issues :(
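On the PHP-files question: ASCII is a strict subset of UTF-8, so a file containing only 7-bit ASCII (as code-only files usually do) is already valid UTF-8 byte for byte; it is the files holding non-ASCII text (here the flat text files and the database contents) that would need converting. A small Python sketch for checking a file's bytes follows. The helper names are hypothetical, and the BOM check reflects a common pitfall with Windows editors that add a byte-order mark when saving as UTF-8.

```python
import codecs

def is_valid_utf8(raw: bytes) -> bool:
    """True if the bytes already decode as UTF-8 (pure 7-bit ASCII always does)."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

def has_utf8_bom(raw: bytes) -> bool:
    # Some editors prepend EF BB BF when saving as UTF-8. PHP treats those
    # bytes as page output, which can break later header() calls.
    return raw.startswith(codecs.BOM_UTF8)

code_only = b'<?php echo $title; ?>\n'
japanese_text = "日本語".encode("shift_jis")

print(is_valid_utf8(code_only))      # True: ASCII is a subset of UTF-8
print(is_valid_utf8(japanese_text))  # False: this file needs converting
```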

sleidia




msg:3375548
 9:05 am on Jun 22, 2007 (gmt 0)

Do you have stats on this or is this just from your personal experience?

You're right, it's from personal experience.
I went through several sites yesterday and they were all in EUC-JP. I can't find statistics on website encodings :(

bill




msg:3376486
 5:49 am on Jun 23, 2007 (gmt 0)

Here are some official stats. I looked at the source of the top 25 Japanese websites [alexa.com] according to Alexa. (I skipped the English language sites.)

  1. Yahoo Japan = EUC-JP
  2. Google Japan = UTF-8
  3. Mixi = EUC-JP
  4. FC2 = Shift_JIS
  5. Rakuten = x-euc-jp
  6. YouTube Japan = none
  7. Livedoor = UTF-8
  8. Goo = UTF-8
  9. MSN Japan = UTF-8
  10. Wikipedia = UTF-8
  11. Amazon Japan = UTF-8
  12. Infoseek Japan = EUC-JP
  13. Nifty = Shift_JIS
  14. 2ch.net = x-sjis
  15. Nicovideo.jp = UTF-8
  16. Hatena = UTF-8
  17. Geocities Japan = EUC-JP
  18. BIGLOBE = Shift_JIS
  19. Sakura Internet = Shift_JIS
  20. Ameba = UTF-8
  21. Seesaa = UTF-8
  22. OCN = Shift_JIS
  23. Mobile Space = none
  24. Excite Japan = Shift_JIS
  25. Microsoft Japan = UTF-16

And the winner is:

  • UTF-8 = 10
  • Shift_JIS = 6
  • EUC-JP = 4
  • others = 3
  • none = 2
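The tally can be double-checked mechanically. A short Python sketch that copies the declared charsets from the list above, in order, and counts them:

```python
from collections import Counter

# Declared charsets of the 25 sites, in the order listed above.
declared = [
    "EUC-JP", "UTF-8", "EUC-JP", "Shift_JIS", "x-euc-jp",
    "none", "UTF-8", "UTF-8", "UTF-8", "UTF-8",
    "UTF-8", "EUC-JP", "Shift_JIS", "x-sjis", "UTF-8",
    "UTF-8", "EUC-JP", "Shift_JIS", "Shift_JIS", "UTF-8",
    "UTF-8", "Shift_JIS", "none", "Shift_JIS", "UTF-16",
]

counts = Counter(declared)
main = ("UTF-8", "Shift_JIS", "EUC-JP")
# Anything that is neither a main charset nor missing counts as "others".
others = sum(n for label, n in counts.items() if label not in main + ("none",))

for label in main:
    print(label, "=", counts[label])
print("others =", others)
print("none =", counts["none"])
```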

sleidia




msg:3376524
 7:38 am on Jun 23, 2007 (gmt 0)

Thanks!

So, I think I'll have to learn how to handle UTF-8 properly before I can use it seamlessly.

David_M




msg:3379910
 5:56 am on Jun 27, 2007 (gmt 0)

I just remembered, email!
You gotta set the encoding properly for that too!
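Japanese email has traditionally been sent as ISO-2022-JP (described in RFC 1468), a 7-bit encoding chosen because many mail systems of the time only passed 7-bit data safely; UTF-8 is also an option if the recipients' clients support it. A minimal sketch with Python's standard `email` package, assuming ISO-2022-JP is what the recipients expect; the addresses are placeholders.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "shop@example.com"       # placeholder addresses
msg["To"] = "customer@example.jp"
msg["Subject"] = "Order confirmation"
# Encode the body as ISO-2022-JP and label it with a matching charset parameter.
msg.set_content("ご注文ありがとうございます。", charset="iso-2022-jp")

print(msg.get_content_charset())          # iso-2022-jp
print(msg["Content-Transfer-Encoding"])   # 7bit
```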

encyclo




msg:3383233
 1:34 am on Jul 1, 2007 (gmt 0)

From bill's list, the actual character encodings (compared to declared charsets) are slightly different in some cases. For the two sites marked "none" the encodings are UTF-8 for YouTube Japan and Shift_JIS for Mobile Space. Rakuten is actually EUC-JP, 2ch.net is Shift_JIS, and Microsoft Japan is UTF-8.

So the final tally is as follows:

  • UTF-8 = 12
  • Shift_JIS = 8
  • EUC-JP = 5

[edited by: encyclo at 7:25 pm (utc) on July 1, 2007]
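Comparing a page's declared charset with its actual bytes can be approximated with a trial-decode heuristic: try each candidate encoding strictly and keep the first that succeeds. This is only a rough sketch, and the candidate list and its order are assumptions. EUC-JP and Shift_JIS byte ranges overlap, so short or unusual text can decode "successfully" under the wrong one; real detectors use statistical models.

```python
from typing import Optional

# Candidate encodings, tried in order. UTF-8 goes first because its strict
# byte rules make accidental matches with legacy Japanese text unlikely.
CANDIDATES = ("utf-8", "euc_jp", "shift_jis")

def sniff_encoding(raw: bytes) -> Optional[str]:
    """Return the first candidate that decodes raw without errors, else None."""
    for encoding in CANDIDATES:
        try:
            raw.decode(encoding)
        except UnicodeDecodeError:
            continue
        return encoding
    return None

for enc in ("utf-8", "euc_jp", "shift_jis"):
    print(enc, "->", sniff_encoding("日本語".encode(enc)))
```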

sleidia




msg:3383391
 9:52 am on Jul 1, 2007 (gmt 0)

So, nobody knows how to safely transform shift-JIS/ISO sites into UTF-8?
I can hardly find any useful info on the internet :(

encyclo




msg:3383662
 7:30 pm on Jul 1, 2007 (gmt 0)

how to safely transform shift-JIS/ISO sites into UTF-8?

If you're running Linux, the best way is to use the iconv utility:

[gnu.org...]

Also available via PHP:

[php.net...]

I don't know the best way to convert documents under Windows, unfortunately, other than using the same library via PHP.
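For plain files, the same conversion is also straightforward with Python's standard library on any platform, Windows included. A minimal sketch, assuming the source files really are Shift_JIS; files created on Windows are often actually CP932 (Microsoft's Shift_JIS variant with extra characters such as circled digits), in which case pass `"cp932"` instead. The function name is hypothetical.

```python
from pathlib import Path

def convert_to_utf8(src: Path, dst: Path, source_encoding: str = "shift_jis") -> None:
    """Re-encode one text file as UTF-8.

    Works on raw bytes so line endings stay exactly as they were, and
    decodes strictly so a wrong source_encoding raises UnicodeDecodeError
    instead of silently writing mojibake.
    """
    text = src.read_bytes().decode(source_encoding)
    dst.write_bytes(text.encode("utf-8"))
```

Converting a whole site is then a loop over the files (for example with `Path.rglob("*.txt")`), writing to a separate output directory so the originals survive a wrong guess about the source encoding.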

bill




msg:3383857
 4:26 am on Jul 2, 2007 (gmt 0)

Thanks for the follow up on that one encyclo.

So the final tally is as follows:

  • UTF-8 = 12
  • Shift_JIS = 8
  • EUC-JP = 5

I got a little lazy there. Thanks for keeping me on my toes. ;) I always tell people to check out your excellent thread: Character encoding, entity references and UTF-8 [webmasterworld.com]

sleidia




msg:3384053
 11:02 am on Jul 2, 2007 (gmt 0)

Thanks for the link Bill :)
Too bad all my previous searches never pointed me to it :(

Thanks Encyclo :)
But it would save me more time if I could find a Windows-based UTF-8 converter.

I've found one called "Character Set Converter 1.3.7" but it can't convert from Asian character sets :(

sleidia




msg:3388495
 6:51 pm on Jul 7, 2007 (gmt 0)

From what I read here, UTF-8 and PHP don't work too well together: [phpwact.org...]

I think I'll wait for a stable PHP6 before moving toward UTF-8.

What do you think?

Olney




msg:3390092
 1:22 am on Jul 10, 2007 (gmt 0)

Shift_JIS is also i-mode compatible, whereas not all keitai (mobile phones) are UTF-8 friendly yet.
For PHP there is a module that can convert between encodings, since UTF-8 is usually used for RSS but Shift_JIS for i-mode. Many Japanese scripts that I've seen just use the same code to convert the encoding.

A long time ago I used to build Japanese websites using Shift_JIS, and English pages with the normal Western ISO encoding. Links from the English pages, which were more popular, had no effect on the Japanese pages. One day I changed everything to UTF-8, and the rankings for all the Japanese pages went up. This was about 3 years ago, but I think it might still apply.
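The "same code, different output encoding" approach can be sketched as keeping text as Unicode internally and encoding only at output time, per channel. The channel names and table below are hypothetical, and each response still has to declare the matching charset in its Content-Type.

```python
# Hypothetical channel -> charset table; adjust to what each client accepts.
CHANNEL_CHARSETS = {"imode": "shift_jis", "rss": "utf-8"}

def encode_for(channel: str, text: str) -> bytes:
    # Characters the target charset cannot represent become "?" instead of
    # aborting the whole response.
    return text.encode(CHANNEL_CHARSETS[channel], errors="replace")

page = "日本語"
imode_body = encode_for("imode", page)
rss_body = encode_for("rss", page)
```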

jeffposaka




msg:3390141
 2:54 am on Jul 10, 2007 (gmt 0)

Sleidia,

I've had mostly problem-free experiences with UTF-8 for Japanese sites.

The only problems I've had were when moving MySQL databases to different servers. Sometimes all hell broke loose, but I think that was from me not setting up the database import correctly for UTF-8 on the new server.

I have more problems with email encoding than website encoding....

sleidia




msg:3390446
 11:06 am on Jul 10, 2007 (gmt 0)

Jeff,

You're using mbstring, right?
All the problems I had came from using PHP functions that don't support multibyte encodings, trim() for example.

Also, a problem may show up only with certain kanji, which makes it very difficult to spot.

I don't understand why PHP packages don't come with mbstring already included. Too many shared hosting environments don't have mbstring.

Just out of curiosity: what makes it so difficult to use UTF-8 on emails?
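A small illustration of how a byte-oriented string function can damage multibyte text. Python's `bytes.rstrip` stands in here for a non-multibyte-aware function such as PHP's trim() called with a character list (an analogy, not PHP itself): the Shift_JIS encoding of 表 ends in 0x5C, the same byte as an ASCII backslash, so stripping trailing backslashes byte by byte eats half the character. The helper names are hypothetical.

```python
def naive_rstrip_backslash(raw: bytes) -> bytes:
    # Byte-level: cannot tell a real backslash from the second byte of a
    # double-byte Shift_JIS character that happens to be 0x5C.
    return raw.rstrip(b"\\")

def aware_rstrip_backslash(raw: bytes, encoding: str) -> bytes:
    # Character-level: decode first, strip whole characters, re-encode.
    return raw.decode(encoding).rstrip("\\").encode(encoding)

data = "表\\".encode("shift_jis")  # bytes 95 5C 5C: 表 followed by a backslash

print(naive_rstrip_backslash(data))               # b'\x95', a dangling lead byte
print(aware_rstrip_backslash(data, "shift_jis"))  # b'\x95\\', i.e. 表 intact
```

UTF-8 avoids this particular trap because every byte of a multibyte UTF-8 sequence is 0x80 or above, so it can never be mistaken for a backslash or any other ASCII character.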

jeffposaka




msg:3390487
 12:11 pm on Jul 10, 2007 (gmt 0)

I receive a lot of emails in different encodings, so it's a bit of a hassle to keep changing the encoding to read them. If I'm not careful when I forward an email, I end up sending mojibake to others. Browsers seem better at working out which charset is used, but email clients and webmail seem to have trouble detecting it automatically.

sleidia




msg:3391467
 1:01 pm on Jul 11, 2007 (gmt 0)

Can someone help me with ISO/JIS charsets in MySQL?

I want to know the best practice for storing both ISO and JIS strings in a single MySQL column, from a website that uses either ISO or JIS encodings.

Should the MySQL server be set to a specific charset?
Should I specify a charset when I store the data?

Thanks a lot for helping.
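One common approach (a suggestion, not something settled in this thread) is to convert every submission to Unicode as it arrives, using the charset of the page the form was served from, so the column only ever holds one encoding. The sketch uses SQLite from Python's standard library as a stand-in for MySQL; the `comments` table and `store_submission` helper are hypothetical. With MySQL, the column and the client connection would both need to be declared with a Unicode character set for the same idea to hold.

```python
import sqlite3

def store_submission(conn: sqlite3.Connection, raw: bytes, page_charset: str) -> None:
    # Decode with the charset of the page the form came from, so every row
    # is stored as Unicode text regardless of its source encoding.
    text = raw.decode(page_charset)
    conn.execute("INSERT INTO comments (body) VALUES (?)", (text,))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (body TEXT)")
store_submission(conn, "café".encode("iso-8859-1"), "iso-8859-1")
store_submission(conn, "日本語".encode("shift_jis"), "shift_jis")
rows = [r[0] for r in conn.execute("SELECT body FROM comments ORDER BY rowid")]
```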

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved