We have a client with a site in English who now wants us to:
- make one static page in Japanese; they'll provide the translated text.
- develop a "news" page in Japanese, so we need to accept Japanese input into the site's MySQL database and then generate the dynamic page (we have already done this in English).
How do we do this? Does anyone have some URLs where we can read up on it?
Heeellllp!
Encode the page and any includes in UTF-8 (an encoding of Unicode). This is fairly straightforward, but make sure your text editor isn't saving as UTF-16: many editors offer a plain "Unicode" option that actually means UTF-16, with no mention of UTF-8.
The best resource for Unicode is [alanwood.net]
and the Unicode Consortium itself [unicode.org]
Include the lang attribute
<html lang="ja">
and ideally send the UTF-8 charset in your HTTP headers
<?php
header("Content-Type: text/html; charset=utf-8");
?>
Hopefully that will be everything you need.
Good luck
dpb
I know little about MySQL, but the page linked here suggests it supports Unicode.
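For the MySQL side, here is a minimal sketch of opening the connection so that Japanese text travels as UTF-8. It assumes PHP with the PDO MySQL driver and a MySQL version that has the utf8mb4 character set (5.5.3 or later). The helper names, host and database name are made up for illustration.

```php
<?php
// Hypothetical helper: build a PDO DSN that asks for UTF-8 on the connection.
// utf8mb4 is MySQL's name for full UTF-8.
function build_mysql_dsn(string $host, string $dbname): string
{
    return "mysql:host={$host};dbname={$dbname};charset=utf8mb4";
}

// Hypothetical helper: open the connection. Not called here because it
// needs a live MySQL server and real credentials.
function open_news_db(string $host, string $dbname, string $user, string $pass): PDO
{
    return new PDO(build_mysql_dsn($host, $dbname), $user, $pass, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION, // fail loudly on errors
    ]);
}

echo build_mysql_dsn('localhost', 'news'), "\n";
```

The tables and columns holding the news text would also need a UTF-8 character set; the connection setting alone only covers the transport.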
As for the static pages, if I understand what you're saying, I should be able to handle the HTML pages in a text editor as long as I check which encoding it saves in.
How 'bout fonts? I mean we do
Verdana, Arial, Geneva, sans-serif
What's done in Japanese?
Each character is represented by a number inside the computer.
The tables used to match each number to a character vary: ASCII, iso-8859-1 and all sorts of others.
Fonts similarly match numbers to glyphs (roughly, the graphic drawn for a character),
and therein lies the confusion.
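To make the "different tables, different numbers" point concrete, here is a small sketch that prints the bytes of one Japanese character under several encodings. It assumes the mbstring extension is available; the helper name bytes_in is made up for illustration.

```php
<?php
// Hypothetical helper: re-encode UTF-8 text into $encoding and show the bytes as hex.
function bytes_in(string $utf8Text, string $encoding): string
{
    return bin2hex(mb_convert_encoding($utf8Text, $encoding, 'UTF-8'));
}

// U+3042 is HIRAGANA LETTER A; written as an escape so this file's own
// encoding doesn't matter.
$char = "\u{3042}";

foreach (['UTF-8', 'SJIS', 'EUC-JP', 'UTF-16BE'] as $encoding) {
    echo str_pad($encoding, 9), bytes_in($char, $encoding), "\n";
}
// UTF-8    e38182
// SJIS     82a0
// EUC-JP   a4a2
// UTF-16BE 3042
```

Same character, four different byte sequences, which is why text read through the wrong table turns into nonsense.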
As ASCII is a subset of iso-8859-1, an English speaker would rarely notice problems. In other languages, however, switching font could leave you with a mess of characters nothing like the original text. You might notice this with European languages such as Finnish, where accented characters can be swapped for other, similar characters when you switch fonts.
This is where Unicode comes in. Unicode is font independent: if the font a computer is using has no glyph for a character, an empty box is shown in place of that character.
Unicode has become the de facto standard for internationalising character representation (everywhere, that is, except China, which uses GB). Each number identifies one unique character.
Conveniently, iso-8859-1 is a subset of Unicode, so English speakers have an easy time: copying, for instance, from HTML into text files doesn't change the appearance of the text. What is actually happening, though, is that the editor filters the text it receives through whichever table it is using, commonly ASCII or iso-8859-1. Any international characters outside that table won't be properly represented or recorded; often they are replaced by ? marks.
(This is my experience on Windows 98; I believe XP and later have more Unicode support throughout their software.)
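The question-mark effect is easy to reproduce. This is only a sketch, assuming the mbstring extension with its default substitute character; the helper name through_table is made up for illustration.

```php
<?php
// Hypothetical helper: squeeze UTF-8 text through a table that may not
// be able to represent it.
function through_table(string $utf8Text, string $table): string
{
    return mb_convert_encoding($utf8Text, $table, 'UTF-8');
}

$japanese = "\u{65E5}\u{672C}\u{8A9E}"; // the three characters of "Nihongo" (Japanese)
$french   = "caf\u{E9}";                // "cafe" with an accented e

// iso-8859-1 has no Japanese, so each character becomes mbstring's
// default substitute, "?".
echo through_table($japanese, 'ISO-8859-1'), "\n"; // ???

// The accented e exists in iso-8859-1 (byte 0xE9), so it survives.
echo bin2hex(through_table($french, 'ISO-8859-1')), "\n"; // 636166e9
```

Once the text has been through the narrower table, the original characters are gone; converting back does not restore them.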
Japanese is typically encoded in shift_jis, iso-2022-jp or euc-jp. These may be difficult to convert into Unicode; I couldn't tell you. So I would suggest asking for your Japanese translation to be sent to you as UTF-8 (Unicode). All this probably involves is looking at "Save As": just below the filename there should be a list like .rtf, .doc, .txt, and if the editor can handle it, Unicode or UTF-8 will be in there. When sending HTML emails there is typically an encoding option (in, I think, the 'View' menu) where you can switch to a different encoding.
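If the translation does arrive in one of the legacy encodings, converting it may be less painful than feared, provided you know which encoding it is. A minimal sketch, assuming the mbstring extension; the helper name to_utf8 is made up, and guessing the encoding automatically is left out because it is unreliable on short texts.

```php
<?php
// Hypothetical helper: convert text from a known legacy encoding to UTF-8.
// Refuses input that is not valid in the encoding it was claimed to be in.
function to_utf8(string $bytes, string $fromEncoding): string
{
    if (!mb_check_encoding($bytes, $fromEncoding)) {
        throw new InvalidArgumentException("Input is not valid {$fromEncoding}");
    }
    return mb_convert_encoding($bytes, 'UTF-8', $fromEncoding);
}

// The word "Nihongo" (Japanese) as raw Shift_JIS and EUC-JP bytes.
$fromSjis = to_utf8("\x93\xFA\x96\x7B\x8C\xEA", 'SJIS');
$fromEuc  = to_utf8("\xC6\xFC\xCB\xDC\xB8\xEC", 'EUC-JP');

var_dump($fromSjis === $fromEuc);  // both decode to the same UTF-8 text
echo bin2hex($fromSjis), "\n";
```

In practice you would read the supplied file with file_get_contents, pass it through a helper like this once, and save the UTF-8 result.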
There are numerous editors you can use if you haven't got one already; [alanwood.net] has a list. Personally I would recommend 'EmEditor', as it can remove byte order marks, which can sometimes cause problems.
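If your editor can't remove the byte order mark, it can be stripped in code. One common way a BOM causes trouble in PHP is that the three bytes in front of the opening tag get sent as output, after which header() calls fail. A minimal sketch using only core string functions; both helper names are made up for illustration.

```php
<?php
// Hypothetical helper: drop a leading UTF-8 byte order mark (EF BB BF) if present.
function strip_utf8_bom(string $bytes): string
{
    if (strncmp($bytes, "\xEF\xBB\xBF", 3) === 0) {
        return (string) substr($bytes, 3);
    }
    return $bytes;
}

// Hypothetical helper: a UTF-16 BOM (FF FE or FE FF) at the start is a strong
// hint that the file was saved as "Unicode" (UTF-16) rather than UTF-8.
function looks_like_utf16(string $bytes): bool
{
    $head = substr($bytes, 0, 2);
    return $head === "\xFF\xFE" || $head === "\xFE\xFF";
}

$withBom = "\xEF\xBB\xBF<html lang=\"ja\">";
echo strip_utf8_bom($withBom), "\n";          // <html lang="ja">
var_dump(looks_like_utf16("\xFF\xFEh\x00"));  // bool(true)
```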
I suggested above that if the font doesn't have a glyph for a character you will see an empty box. Typically, if a Japanese page is being viewed on a computer with the ability to show Japanese, it will already have fonts that it knows can display Japanese and will use these by default. You can of course suggest one you know to be commonly available. (A common Japanese system font is apparently "MS Gothic"; I don't know of others. You may see mention of Code2000, but this is a very large font covering most of the Unicode table, i.e. most of the world's characters in every language, and this can slow computers down. Better to look specifically for 'Japanese Unicode fonts'.)
Hope this makes things a little clearer.
dpb
How 'bout fonts? I mean we do
Verdana, Arial, Geneva, sans-serif
What's done in Japanese?
On a Windows PC you have several like:
MS ゴシック (MS Gothic)
MS Pゴシック (MS P Gothic)
MS 明朝 (MS Mincho)
MS P明朝 (MS P Mincho).
At the bottom of this page [tohoho.wakusei.ne.jp] there is a picture where you can see the differences even if you don't have the right fonts on your PC. The picture also includes some Macintosh fonts (like Osaka, Heisei Mincho, etc.). See also this example [www2.inforyoma.or.jp] and this example [masaboo.cside.com].
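Putting those names into the stylesheet works the same way as the Verdana/Arial list. A small sketch of generating the rule from PHP, assuming the file itself is saved as UTF-8; the helper name font_family is made up, the font names are copied from the list above, and the names actually installed on a visitor's machine may be spelled differently.

```php
<?php
// Hypothetical helper: build a CSS font-family value. Named fonts are quoted,
// generic families (sans-serif and friends) must stay unquoted.
function font_family(array $fonts): string
{
    $generic = ['serif', 'sans-serif', 'monospace'];
    $parts = [];
    foreach ($fonts as $font) {
        $parts[] = in_array($font, $generic, true) ? $font : '"' . $font . '"';
    }
    return implode(', ', $parts);
}

// Japanese name first, then the romanised name, a Macintosh font, and a
// generic fallback so the browser can pick its own Japanese font.
$stack = font_family(['MS Pゴシック', 'MS P Gothic', 'Osaka', 'sans-serif']);
echo "body { font-family: {$stack}; }\n";
```

Ending the list with a generic family matters most: a browser set up for Japanese will then fall back to whatever Japanese font it already has.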
We found that if you run phpMyAdmin in Japanese (Shift_JIS) and then do a text import, selecting the Shift_JIS encoding, it works fine.
The only problem I have not sorted out yet is sort order in Japanese. I gather you need to configure MySQL specifically for Japanese.
Knowing which charset to use may be less straightforward than I first thought. Certainly Shift_JIS still appears to be very common for Japanese.
Given the large number of users in Japan and China, and the complexity of East Asian character sets, it's not surprising that complaints about CJKV would be voluble. The plain truth is that many Japanese and Chinese users do not trust Unicode. It's hard to believe when watching all the new operating systems, development tools, and technologies move to Unicode as the default character set, but resistance remains high in Japan, particularly among those conducting multilingual software research. There is a sentiment that decisions about Japanese issues are being made by people who do not have a native familiarity with them, despite the fact that the consortium includes a variety of Japanese and other East Asian members.
That's a very interesting article, thanks for posting it. I've seen the resistance to Unicode in Japan firsthand, but never heard the reasons expressed like this. That article is well worth your time if you're considering using Unicode with Asian languages... an interesting perspective.
If you're using PHP, Shift_JIS can cause you problems. You'd be better off with EUC-JP or UTF-8. I've been using UTF-8 successfully for about a year now, with no problems whatsoever, on a PHP/MySQL system.
All Japanese browsers released in the last year (at least) can handle UTF-8 without any problem.
Just my preference.
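One well-known way Shift_JIS bites in PHP (possibly not the only one the poster had in mind) is that the second byte of some characters is 0x5C, the ASCII backslash, so byte-oriented escaping mangles them. A sketch assuming the mbstring extension; the helper name contains_backslash_byte is made up for illustration.

```php
<?php
// Escapes are used for the sample character so this file's own encoding
// doesn't matter. U+8868 is a common kanji meaning "table" or "surface".
$utf8 = "\u{8868}";
$sjis = mb_convert_encoding($utf8, 'SJIS', 'UTF-8');

// Hypothetical helper: true if any byte of $bytes is 0x5C, the ASCII backslash.
function contains_backslash_byte(string $bytes): bool
{
    return strpos($bytes, "\x5C") !== false;
}

echo bin2hex($sjis), "\n";                 // 955c: the second byte is a backslash
var_dump(contains_backslash_byte($sjis));  // bool(true)
var_dump(contains_backslash_byte($utf8));  // bool(false)

// A byte-oriented escaping function sees that backslash and "escapes" it,
// leaving a stray extra byte after the character.
echo bin2hex(addslashes($sjis)), "\n";     // 955c5c
```

In UTF-8 every byte of a multi-byte character is 0x80 or above, so this collision cannot happen there.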
You may need a custom configuration of your PHP build, for example one with multibyte string support enabled.
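The post doesn't say which setting is meant; assuming it is the mbstring extension (an optional part of PHP that has to be enabled when PHP is built or installed), a quick check looks like this. The helper name mbstring_ready is made up for illustration.

```php
<?php
// Hypothetical helper: report whether the multibyte string extension is loaded.
function mbstring_ready(): bool
{
    return extension_loaded('mbstring');
}

if (mbstring_ready()) {
    // Make the mb_* functions treat strings as UTF-8 by default.
    mb_internal_encoding('UTF-8');
    echo "mbstring available, internal encoding: ", mb_internal_encoding(), "\n";
} else {
    echo "mbstring is missing; ask your host or rebuild PHP with it enabled\n";
}
```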