Asia and Pacific Region Forum

    
EUC-JP better than SHIFT_JIS?
sleidia

10+ Year Member



 
Msg#: 3373673 posted 4:49 pm on Jun 20, 2007 (gmt 0)

Hello,

Can someone tell me if EUC-JP is better than SHIFT-JIS and why?
I can see that the majority of Japanese sites are using EUC-JP rather than SHIFT-JIS, and I'm wondering why.

Thanks :)

 

bill

WebmasterWorld Administrator, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 3373673 posted 12:21 am on Jun 21, 2007 (gmt 0)

I thought it was the other way around. We've had a few threads [google.com] on the topic and I was under the impression from Japanese programmers and site designers that SHIFT-JIS was a better encoding. I was told that it worked better with databases used to power the site's back end. However, that doesn't seem to be as much of an issue these days.

Back in the days of the Netscape browser I had some real problems with older versions being able to handle EUC-JP encoding. None of the modern PC or mobile browsers seem to have issues with EUC-JP that I've heard of.

sleidia

10+ Year Member



 
Msg#: 3373673 posted 3:29 pm on Jun 21, 2007 (gmt 0)


That's weird: I've always been told the contrary.

And I even remember that, after I left a Japanese web agency, the new webmasters quickly changed the site's encoding to EUC-JP, arguing that Shift-JIS can be buggy with MySQL databases (is that true?) and that I was very incompetent for using Shift-JIS.
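
On the MySQL question, one commonly cited cause of Shift_JIS trouble is that the second byte of some characters is 0x5C, the ASCII backslash, so byte-oriented escaping code can misread it. EUC-JP only uses bytes of 0x80 and above inside a multibyte character. Whether that is what the new webmasters had in mind is only a guess. A minimal Python sketch (Python is used purely for illustration, and the sample characters are chosen for the demo):

```python
# Compare the raw bytes of the same characters in Shift_JIS and EUC-JP.
# The sample characters are illustrative; both have 0x5C as their
# second byte in Shift_JIS.
SAMPLES = ["ソ", "表"]

def has_ascii_backslash_byte(text: str, encoding: str) -> bool:
    """Return True if the encoded form contains the byte 0x5C (ASCII backslash)."""
    return 0x5C in text.encode(encoding)

for ch in SAMPLES:
    sjis = ch.encode("shift_jis")
    euc = ch.encode("euc_jp")
    print(ch, "Shift_JIS:", sjis.hex(), "EUC-JP:", euc.hex())
    print("  backslash byte in Shift_JIS:", has_ascii_backslash_byte(ch, "shift_jis"))
    print("  backslash byte in EUC-JP:   ", has_ascii_backslash_byte(ch, "euc_jp"))
```

Any code that escapes or unescapes backslashes byte by byte, without knowing the text is Shift_JIS, can corrupt such characters.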

And it looks like the majority of Japanese sites use EUC-JP now.

David_M

10+ Year Member



 
Msg#: 3373673 posted 3:29 pm on Jun 21, 2007 (gmt 0)

Actually, I've heard that UTF-8 is the best, but I still use Shift-JIS.
Shift-JIS can cause some display problems, with the wrong kanji being displayed. I've had it happen a couple of times.

sleidia

10+ Year Member



 
Msg#: 3373673 posted 3:36 pm on Jun 21, 2007 (gmt 0)

Really?

I've had display issues with UTF-8 (e.g. with the PHP trim function) but absolutely never with Shift-JIS. Japanese clients have never reported a single incident of mojibake with Shift-JIS.

LifeinAsia

WebmasterWorld Administrator, WebmasterWorld Top Contributor of All Time, 5+ Year Member



 
Msg#: 3373673 posted 3:48 pm on Jun 21, 2007 (gmt 0)

We've always used Shift-JIS for the Japanese versions of our sites. No complaints so far.

sleidia

10+ Year Member



 
Msg#: 3373673 posted 3:59 pm on Jun 21, 2007 (gmt 0)


I just read that half-width (hankaku) kana aren't supported in EUC-JP.
Still, the majority of sites use EUC-JP instead of Shift-JIS.
I don't get it :(
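
For what it's worth, EUC-JP as specified can represent half-width katakana: each one takes two bytes, a 0x8E prefix followed by the same byte that Shift_JIS uses on its own. Some older software handled these poorly, which may be where the claim comes from (that part is an assumption). A quick Python check, with the sample character picked for the demo:

```python
# Half-width katakana "ｱ" (U+FF71) encoded in three encodings.
HALFWIDTH_A = "\uff71"

def encoded_hex(text: str, encoding: str) -> str:
    """Return the encoded bytes as a lowercase hex string."""
    return text.encode(encoding).hex()

for enc in ("shift_jis", "euc_jp", "utf-8"):
    print(enc, encoded_hex(HALFWIDTH_A, enc))
```

So the character is one byte in Shift_JIS, two in EUC-JP, and three in UTF-8.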

DamonHD

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3373673 posted 7:32 pm on Jun 21, 2007 (gmt 0)

As a stupid gaijin/gawilo that has to produce output in several languages, primarily English and other Latin languages as well as Chinese and Japanese, UTF-8 is the simplest solution for me since it is unambiguous and reasonably standardised and covers everything. And even then I actually encode all the non-7-bit characters as HTML entity codes to avoid them getting mangled between my code and the browser...

As to 'better' or 'worse' I think this is going to be like whether Fuji-san is better or worse than Mt Everest: it all depends on what you mean! B^>

Rgds

Damon

bcolflesh

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3373673 posted 7:39 pm on Jun 21, 2007 (gmt 0)

mojibake

Cool - thanks for the new (to me) word:

[en.wikipedia.org...]

bill

WebmasterWorld Administrator, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 3373673 posted 12:50 am on Jun 22, 2007 (gmt 0)

Mojibake is now English? Cool. ;)

the majority of sites use EUC-JP instead of shift-JIS.

Do you have stats on this or is this just from your personal experience? I still see a lot of Shift-JIS sites out there.

Unfortunately, from what I've heard in the industry, UTF-8 is still more problematic than either EUC-JP or Shift-JIS. (That goes for Chinese encoding as well.) There are character display issues with PHP and MySQL, for instance, that are the bane of developers of Japanese sites. I'm still looking forward to the day when Unicode will truly be the best encoding solution. They're heading in the right direction.

DamonHD

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3373673 posted 8:02 am on Jun 22, 2007 (gmt 0)

The trick of encoding the non-7-bit characters as &nnnnn; entity codes got round a lot of (Java) server problems for me in the early days and still works well. It essentially avoids any broken text-handling component anywhere in the path messing up the text.
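
The entity trick can be sketched in a few lines of Python (the thread's servers were Java, so this is only an illustration of the idea, and the function name is made up for the example). The standard `xmlcharrefreplace` error handler turns every character that does not fit in ASCII into a decimal `&#NNNNN;` reference:

```python
import html

def to_ascii_entities(text: str) -> str:
    """Replace every non-ASCII character with a decimal &#NNNNN; reference.

    ASCII characters, including &, < and >, are left untouched, so normal
    HTML escaping still has to be done separately.
    """
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

encoded = to_ascii_entities("Fuji-san: 富士山")
print(encoded)                 # pure 7-bit ASCII, safe through any text path
print(html.unescape(encoded))  # a browser-style decode restores the original
```

The output is larger than the raw text, but it survives any component that only handles 7-bit ASCII correctly.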

Rgds

Damon

sleidia

10+ Year Member



 
Msg#: 3373673 posted 8:57 am on Jun 22, 2007 (gmt 0)


Thanks again guys :)

From what I've read, I think I'll choose the following option:
- use UTF-8 for western languages
- use shift-JIS for Japanese

What do you think?

Also, I have another question:
I have developed my own multilingual CMS that uses the dedicated ISO encoding for every language. All the PHP files are in ANSI mode, the interface text is taken from flat text files, and the website content is taken from MySQL databases. I've heard that it's necessary to convert the PHP files to UTF-8 for the encoding to work. Is that true? Even if the PHP files contain only code? And I have to convert the flat text files and the MySQL tables, right?

Sorry but I'm getting very confused with all those encoding issues :(

sleidia

10+ Year Member



 
Msg#: 3373673 posted 9:05 am on Jun 22, 2007 (gmt 0)

Do you have stats on this or is this just from your personal experience?

You're right, it's from personal experience.
I went through several sites yesterday and they were all in EUC-JP. I can't find statistics on website encodings :(

bill

WebmasterWorld Administrator, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 3373673 posted 5:49 am on Jun 23, 2007 (gmt 0)

Here are some official stats. I looked at the source of the top 25 Japanese websites [alexa.com] according to Alexa. (I skipped the English language sites.)

  1. Yahoo Japan = EUC-JP
  2. Google Japan = UTF-8
  3. Mixi = EUC-JP
  4. FC2 = Shift_JIS
  5. Rakuten = x-euc-jp
  6. YouTube Japan = none
  7. Livedoor = UTF-8
  8. Goo = UTF-8
  9. MSN Japan = UTF-8
  10. Wikipedia = UTF-8
  11. Amazon Japan = UTF-8
  12. Infoseek Japan = EUC-JP
  13. Nifty = Shift_JIS
  14. 2ch.net = x-sjis
  15. Nicovideo.jp = UTF-8
  16. Hatena = UTF-8
  17. Geocities Japan = EUC-JP
  18. BIGLOBE = Shift_JIS
  19. Sakura Internet = Shift_JIS
  20. Ameba = UTF-8
  21. Seesaa = UTF-8
  22. OCN = Shift_JIS
  23. Mobile Space = none
  24. Excite Japan = Shift_JIS
  25. Microsoft Japan = UTF-16

And the winner is:

  • UTF-8 = 10
  • Shift_JIS = 6
  • EUC-JP = 4
  • others = 3
  • none = 2
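
For anyone repeating this kind of survey, the declared charset can be pulled out of the page source with a short Python sketch. This is only an approximation of checking by eye: the regular expression and function name are assumptions made for the example, it only looks at `<meta>` tags (not the HTTP `Content-Type` header), and it assumes an ASCII-compatible page encoding.

```python
import re
from typing import Optional

# Matches charset=... inside a <meta> tag. Searching raw bytes works because
# the declaration itself is plain ASCII in ASCII-compatible encodings.
_META_CHARSET = re.compile(
    rb"<meta[^>]*charset\s*=\s*[\"']?([A-Za-z0-9_\-]+)", re.IGNORECASE
)

def declared_charset(page: bytes) -> Optional[str]:
    """Return the charset named in the first matching <meta> tag, or None."""
    match = _META_CHARSET.search(page)
    return match.group(1).decode("ascii") if match else None

sample = b'<meta http-equiv="Content-Type" content="text/html; charset=EUC-JP">'
print(declared_charset(sample))
print(declared_charset(b"<html><head><title>no declaration</title></head></html>"))
```

A page that returns `None` here corresponds to the "none" entries in the list.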

sleidia

10+ Year Member



 
Msg#: 3373673 posted 7:38 am on Jun 23, 2007 (gmt 0)

Thanks!

So, I think I'll have to learn how to handle UTF-8 properly before I can use it seamlessly.

David_M

10+ Year Member



 
Msg#: 3373673 posted 5:56 am on Jun 27, 2007 (gmt 0)

I just remembered, email!
You gotta set the encoding properly for that too!

encyclo

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 3373673 posted 1:34 am on Jul 1, 2007 (gmt 0)

From bill's list, the actual character encodings (compared to declared charsets) are slightly different in some cases. For the two sites marked "none" the encodings are UTF-8 for YouTube Japan and Shift_JIS for Mobile Space. Rakuten is actually EUC-JP, 2ch.net is Shift_JIS, and Microsoft Japan is UTF-8.

So the final tally is as follows:

  • UTF-8 = 12
  • Shift_JIS = 8
  • EUC-JP = 5
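
One way to check the actual encoding when the declaration is missing or suspect is to try strict decoding with each candidate. The Python sketch below is a heuristic only, not a description of how the figures above were obtained: the candidate order is an assumption, pure ASCII input will report the first candidate, and some byte sequences are valid in more than one encoding.

```python
from typing import Optional

# Tried in order; UTF-8 first because it is the strictest of the three.
CANDIDATES = ("utf-8", "euc_jp", "shift_jis")

def guess_encoding(data: bytes) -> Optional[str]:
    """Return the first candidate that decodes the data without errors."""
    for name in CANDIDATES:
        try:
            data.decode(name)  # strict mode: raises on any invalid byte
        except UnicodeDecodeError:
            continue
        return name
    return None

for enc in CANDIDATES:
    print(enc, "->", guess_encoding("日本語".encode(enc)))
```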

[edited by: encyclo at 7:25 pm (utc) on July 1, 2007]

sleidia

10+ Year Member



 
Msg#: 3373673 posted 9:52 am on Jul 1, 2007 (gmt 0)

So, nobody knows how to safely transform shift-JIS/ISO sites into UTF-8?
I can hardly find any useful info on the internet :(

encyclo

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 3373673 posted 7:30 pm on Jul 1, 2007 (gmt 0)

how to safely transform shift-JIS/ISO sites into UTF-8?

If you're running Linux, the best way is to use the iconv utility:

[gnu.org...]

Also available via PHP:

[php.net...]

I don't know the best way to convert documents under Windows, unfortunately, other than using the same library via PHP.
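
A platform-neutral alternative, offered only as a sketch, is a small Python script, since Python's built-in codecs include Shift_JIS and EUC-JP and run the same way on Windows and Linux. The function names and default source encoding below are assumptions for the example. After converting, the charset declared in the pages still has to be changed to UTF-8 by hand.

```python
from pathlib import Path

def transcode(data: bytes, src_encoding: str = "shift_jis") -> bytes:
    """Decode bytes from the source encoding and re-encode them as UTF-8.

    Decoding is strict, so a file that is not really in src_encoding
    raises UnicodeDecodeError instead of being silently corrupted.
    """
    return data.decode(src_encoding).encode("utf-8")

def convert_file(src: Path, dst: Path, src_encoding: str = "shift_jis") -> None:
    """Convert one file; bytes are read and written as-is so line endings are kept."""
    dst.write_bytes(transcode(src.read_bytes(), src_encoding))

sample = "日本語のページ".encode("shift_jis")
print(transcode(sample).decode("utf-8"))
```

Writing to a separate destination file keeps the original intact until the result has been checked.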

bill

WebmasterWorld Administrator, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 3373673 posted 4:26 am on Jul 2, 2007 (gmt 0)

Thanks for the follow up on that one encyclo.

So the final tally is as follows:

  • UTF-8 = 12
  • Shift_JIS = 8
  • EUC-JP = 5

I got a little lazy there. Thanks for keeping me on my toes. ;) I always tell people to check out your excellent thread: Character encoding, entity references and UTF-8 [webmasterworld.com]

sleidia

10+ Year Member



 
Msg#: 3373673 posted 11:02 am on Jul 2, 2007 (gmt 0)

Thanks for the link Bill :)
Too bad all my previous searches never pointed me to it :(

Thanks Encyclo :)
But it would save me more time if I could find a Windows-based UTF-8 converter.

I've found one called "Character Set Converter 1.3.7" but it can't convert from Asian character sets :(

sleidia

10+ Year Member



 
Msg#: 3373673 posted 6:51 pm on Jul 7, 2007 (gmt 0)

From what I read here, UTF-8 and PHP don't work too well together: [phpwact.org...]

I think I'll wait for a stable PHP6 before moving toward UTF-8.

What do you think?

Olney

5+ Year Member



 
Msg#: 3373673 posted 1:22 am on Jul 10, 2007 (gmt 0)

Shift_JIS is also iMode compatible, whereas not all keitai (mobile phones) are UTF friendly yet.
For PHP there is a mod that can convert encodings, because usually UTF-8 is for RSS but Shift_JIS is for iMode. Many Japanese scripts that I've seen just use the same code to convert the encoding.

A long time ago I used to create websites in Japanese using just Shift_JIS, and English pages with the normal Western ISO encoding. The backlinks from the English pages, which were more popular, had no effect on the Japanese pages. One day I changed everything to UTF-8 and the rankings for all the Japanese pages went up. This was about 3 years ago but I think it might still apply.

jeffposaka

5+ Year Member



 
Msg#: 3373673 posted 2:54 am on Jul 10, 2007 (gmt 0)

Sleidia,

I have had mostly problem-free experiences with UTF-8 for Japanese sites.

The only problems I have had were when moving MySQL databases to different servers. Sometimes all hell broke loose, but I think it was from me not setting up the database import correctly for UTF-8 on the new server.

I have more problems with email encoding than website encoding....

sleidia

10+ Year Member



 
Msg#: 3373673 posted 11:06 am on Jul 10, 2007 (gmt 0)

Jeff,

You're using MBstring, right?
All the problems I had were from using PHP functions that don't support multibyte encodings, like trim() for example.
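
The general failure mode is easy to show outside PHP. The Python sketch below is an analogue, not PHP's actual behaviour: it treats UTF-8 text as raw bytes, the way a byte-oriented function would, and cuts it at a fixed byte count. The helper name and sample text are invented for the example, and the helper assumes its input is valid in the given encoding.

```python
TEXT = "日本語のテキスト"
UTF8_BYTES = TEXT.encode("utf-8")

def truncate_bytes_safely(data: bytes, limit: int, encoding: str = "utf-8") -> bytes:
    """Cut to at most `limit` bytes without leaving half a character.

    errors="ignore" drops the incomplete sequence left at the cut point
    (it would also drop any other invalid bytes, hence the valid-input
    assumption).
    """
    return data[:limit].decode(encoding, errors="ignore").encode(encoding)

print(len(TEXT), "characters,", len(UTF8_BYTES), "bytes")

try:
    UTF8_BYTES[:4].decode("utf-8")  # naive byte cut lands inside a character
except UnicodeDecodeError as exc:
    print("naive cut breaks the text:", exc.reason)

print(truncate_bytes_safely(UTF8_BYTES, 4).decode("utf-8"))
```

A break like this only shows up when the cut happens to land inside a character, which fits the experience of a problem appearing on one particular kanji.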

Also, it can happen that a problem arises only on a certain Kanji which makes it very difficult to spot.

I don't get why all the PHP packages weren't provided with MBstring already included. There are too many shared environments that don't have MBstring.

Just out of curiosity: what makes it so difficult to use UTF-8 on emails?

jeffposaka

5+ Year Member



 
Msg#: 3373673 posted 12:11 pm on Jul 10, 2007 (gmt 0)

It seems like I get a lot of different emails with different encodings, so it is a bit of a hassle to always change the encoding to read them. If I am not careful when I forward an email, then I send mojibake email to others. Browsers seem better at sorting out what charset is used, but email clients and webmail seem to have trouble automatically understanding what to do.

sleidia

10+ Year Member



 
Msg#: 3373673 posted 1:01 pm on Jul 11, 2007 (gmt 0)

Can someone help me with ISO/JIS charsets on MySQL?

I want to know what the best practice is for storing both ISO and JIS strings in a single MySQL column, from a website that uses either ISO or JIS encodings.

Should the mysql server be set to a specific charset?
Should I specify a charset when I store the data?

Thanks a lot for helping.

