Need to make page in Japanese

static page, dynamic w/ php/mysql

         

louponne

11:07 am on Dec 1, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I have just read back through a year of WebmasterWorld on "Asia and Pacific Region" and although there have been a number of threads on this subject, I'm afraid I haven't found answers to my questions.

We have a client with a site in English who now wants us to:
- make one static page in Japanese, they'll provide the translated texts.
- develop a "news" page in Japanese - so we need to allow input into the site's mysql db in Japanese, and then produce the dynamic page (we have already done that in English)

How to do this? Does anyone have some url resources to read up on this?

Heeellllp!

davidpbrown

11:46 am on Dec 1, 2003 (gmt 0)

10+ Year Member



I know little about MySQL, but the documentation here suggests that it supports Unicode:
[mysql.com ]

Encode the page and any includes in UTF-8 (an encoding of Unicode). This is fairly straightforward, but make sure the text editor you use isn't saving in UTF-16: many editors default to UTF-16 when you pick "Unicode" and never mention UTF-8.
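One way to check what an editor actually saved is to look at the first bytes of the file. The following is a minimal sketch in Python, used here purely for illustration; the function name `guess_encoding` is hypothetical, and this is a rough check rather than a general-purpose detector.

```python
import codecs

def guess_encoding(raw: bytes) -> str:
    """Rough check of how a file was saved (hypothetical helper)."""
    # UTF-16 files normally start with a byte order mark (FF FE or FE FF).
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # UTF-8 saved with a byte order mark (EF BB BF).
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # No mark: see whether the bytes are at least valid UTF-8.
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "unknown"
```

Reading the file in binary mode and passing the bytes to this function would flag a page that was accidentally saved as UTF-16.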

The best resource for Unicode is [alanwood.net ]
and unicode org itself [unicode.org ]

Include the lang attribute on the html element:
<html lang="ja">

and ideally send the UTF-8 charset in your HTTP headers:
<?php
// header() must be called before any output is sent to the browser.
header("Content-Type: text/html; charset=utf-8");
?>

Hopefully that will be everything you need.

Good luck
dpb

louponne

1:28 pm on Dec 1, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"I know little about mySQL but here suggests support for Unicode.."

Yes, according to the manual, it looks as if the latest versions of MySQL can handle that. Hm, I'll have to check which version is on the server!

As for the static pages, if I understand what you're saying, I should be able to handle the html pages in a text editor if I check the way it "encodes".

How 'bout fonts? I mean we do
Verdana, Arial, Geneva, sans-serif
What's done in Japanese?

davidpbrown

2:42 pm on Dec 1, 2003 (gmt 0)

10+ Year Member



Yes, I was tempted to include a bit more..

The computer represents each character as a number.
The tables the computer uses to match each number to a character vary, from ASCII to iso-8859-1 to all sorts of others.

Fonts similarly match numbers to glyphs (roughly, the graphics for each character).

and therein lies confusion.

As ASCII is a subset of iso-8859-1, an English speaker would rarely notice problems. In other languages, however, switching font could well give you a whole mess of characters nothing like the original text. You might notice this with European languages such as Finnish, where accented characters can get confused with other similar characters if you switch fonts.

This is where Unicode comes in. Unicode is font independent: if the font a computer is using has no glyph for a character, an empty box is shown in place of that character.

Unicode has become the de facto standard for internationalised character representation (everywhere, that is, except China, which uses GB). Each number identifies a unique character.

Conveniently, iso-8859-1 is a subset of Unicode, so English speakers have an easy time, and copying, for instance, from HTML into text files doesn't change the appearance of the text. What is actually happening, however, is that the editor filters the text it receives through whichever table it is using, commonly ASCII or iso-8859-1. Any international characters therefore won't be properly represented or recorded; often they are replaced by ? marks.
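The "replaced by ? marks" effect is easy to reproduce. This is a small Python sketch (Python chosen only for illustration; the sample string is made up) that pushes Japanese text through the iso-8859-1 table and compares it with UTF-8.

```python
text = "日本語 page"

# iso-8859-1 has no entries for Japanese characters, so with
# errors="replace" each one is swapped for a "?" byte.
lossy = text.encode("iso-8859-1", errors="replace")

# UTF-8 can represent every Unicode character, so nothing is lost.
roundtrip = text.encode("utf-8").decode("utf-8")

print(lossy)
print(roundtrip == text)
```

The ASCII part of the string survives both routes, which is why English-only text hides the problem.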

(This is my experience on Windows 98; I believe XP and later have more support for Unicode throughout their software.)

Japanese is typically encoded in shift_jis, iso-2022-jp or euc-jp. These may be difficult to convert into Unicode; I couldn't tell you. So I would suggest you ask for the Japanese translation to be sent to you as UTF-8 (Unicode). This probably just involves looking in the "Save as" dialog: just below the filename there should be a list of types such as .rtf, .doc and .txt, and in there, if the editor can handle it, will be Unicode or UTF-8. When sending HTML emails there is typically an encoding option in, I think, the 'View' menu, where you can switch to different encodings.
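If the translation does arrive in one of the legacy encodings, conversion is usually mechanical as long as the source encoding is known. A minimal Python sketch follows (illustrative only; `to_utf8` is a hypothetical helper, and the codec names are Python's, not PHP's).

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes from a known legacy encoding as UTF-8."""
    # Decoding is strict by default, so a wrong guess about the source
    # encoding raises UnicodeDecodeError instead of silently corrupting text.
    return raw.decode(source_encoding).encode("utf-8")

sample = "日本語のニュース"
for enc in ("shift_jis", "euc_jp", "iso2022_jp"):
    legacy_bytes = sample.encode(enc)
    assert to_utf8(legacy_bytes, enc) == sample.encode("utf-8")
```

Files produced on Japanese Windows are often in Microsoft's Shift_JIS variant, for which Python offers the separate `cp932` codec; which one applies to a given file is something to confirm with the translator.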

There are numerous editors you can use if you've not got one already; [alanwood.net ] has a list. Personally I would recommend 'EmEditor', as it can remove byte order marks, which can sometimes cause problems.
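As background on why the byte order mark matters: in a PHP file, the three mark bytes sit before the opening `<?php` and are sent as output, which can stop a later header() call from working. If the editor can't remove the mark, it can be stripped afterwards. This is a minimal Python sketch, with `strip_utf8_bom` as a hypothetical helper name.

```python
import codecs

def strip_utf8_bom(raw: bytes) -> bytes:
    """Return the bytes without a leading UTF-8 byte order mark, if any."""
    if raw.startswith(codecs.BOM_UTF8):  # EF BB BF
        return raw[len(codecs.BOM_UTF8):]
    return raw
```

Files without a mark pass through unchanged, so it is safe to run over every page and include.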

I suggested above that if the font doesn't have a glyph for a character you will see an empty box. Typically, if a Japanese page is being viewed on a computer with the ability to show Japanese, it will already have fonts that it knows can present Japanese and will use these by default. You can of course suggest one you know to be commonly available. (A common Japanese system font is apparently "MSGothic"; I don't know of others. You may see mention of Code2000, but this is a very large font covering most of the Unicode table, i.e. almost every character in every one of the world's languages, and this can slow computers down. Better to find specifically 'Japanese Unicode fonts'.)

Hope this makes things a little clearer.

dpb

takagi

2:56 pm on Dec 1, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"How 'bout fonts? I mean we do
Verdana, Arial, Geneva, sans-serif
What's done in Japanese?"

On a Windows PC you have several, such as:
MS ゴシック (MS Gothic)
MS Pゴシック (MS P Gothic)
MS 明朝 (MS Mincho)
MS P明朝 (MS P Mincho)

On the bottom of this page [tohoho.wakusei.ne.jp] there is a picture where you can see the differences even if you don't have the right fonts on your PC. The picture also includes some Macintosh fonts (like Osaka, Heisei Mincho, etc.) See also this example [www2.inforyoma.or.jp] and this example [masaboo.cside.com].

louponne

3:19 pm on Dec 1, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



wow, many thanks, friends!

Looks like I have some reading to do!

bill

12:11 am on Dec 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Although Unicode sounds like a good idea, I have recently heard from Japanese programmers who build web databases that it is still too problematic to implement in their applications. They suggest that Shift_JIS still be used for encoding. I had this discussion just last week, and the word on the street is that it will be at least another year before you'd want to do something like that in Unicode. Supposedly there are still some browser issues that will make your page difficult to view for some people.

whats up skip

3:47 am on Dec 10, 2003 (gmt 0)

10+ Year Member



We have done some basic stuff with MySQL and PHP. We used Shift_JIS. The trick seemed to be getting the data into the database in Shift_JIS format.

We found that if you run phpMyAdmin in Japanese (Shift_JIS) and then do a text import (selecting the Shift_JIS encoding), it works fine.

The only problem I have not sorted out yet is sort order in Japanese. I gather you need to set up MySQL specifically for Japanese.

davidpbrown

11:34 am on Dec 10, 2003 (gmt 0)

10+ Year Member



This article has more on the difficulty of Japanese and other languages using Unicode.
The secret life of Unicode - A peek at Unicode's soft underbelly [www-106.ibm.com]

Knowing which charset to use may be less straightforward than I first thought. Certainly Shift_JIS still appears to be very common for Japanese.

bill

1:06 am on Dec 11, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"Given the large number of users in Japan and China, and the complexity of East Asian character sets, it's not surprising that complaints about CJKV would be voluble. The plain truth is that many Japanese and Chinese users do not trust Unicode. It's hard to believe when watching all the new operating systems, development tools, and technologies move to Unicode as the default character set, but resistance remains high in Japan, particularly among those conducting multilingual software research. There is a sentiment that decisions about Japanese issues are being made by people who do not have a native familiarity with them, despite the fact that the consortium includes a variety of Japanese and other East Asian members."

That's a very interesting article, thanks for posting it. I've seen the resistance to Unicode in Japan firsthand, but never heard the reasons expressed like this. That article is well worth your time if you're considering using Unicode with Asian languages... an interesting perspective.

kazonik

8:57 am on Jan 3, 2004 (gmt 0)

10+ Year Member



FYI:

If you're using PHP, Shift_JIS can cause you problems. You'd best use EUC-JP or UTF-8. I've been successfully using UTF-8 for about a year now with no problems whatsoever on a PHP/MySQL system.
All Japanese browsers released in the last year (at least) can handle UTF-8 without any problem.
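One commonly cited reason for Shift_JIS trouble (background added here, not something stated in the post) is that the second byte of some characters is 0x5C, the ASCII backslash, which byte-oriented escaping or quoting code can misread. A small Python sketch shows the byte values; `has_backslash_byte` is a hypothetical helper.

```python
BACKSLASH = 0x5C  # ASCII "\"

def has_backslash_byte(text: str, encoding: str) -> bool:
    """True if the encoded form of text contains a 0x5C byte."""
    return BACKSLASH in text.encode(encoding)

# Katakana "so" is a well-known example: 0x83 0x5C in Shift_JIS.
for enc in ("shift_jis", "euc_jp", "utf-8"):
    print(enc, "ソ".encode(enc).hex(), has_backslash_byte("ソ", enc))
```

In EUC-JP and UTF-8 every byte of a multi-byte character has its high bit set, so no byte of a Japanese character can be mistaken for an ASCII backslash or quote.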

Just my preference.

You may need a custom config of your PHP version.