Japanese encoding for Search Engines

What is the best character encoding?

         

Brownie

10:04 am on Aug 2, 2002 (gmt 0)

10+ Year Member



My previous version of our Japanese website performed well on the search engines. I have just updated the site using a new template, but it now appears that Google cannot read the text, although the actual site displays fine in web browsers.

I thought that I had originally used JIS encoding (charset=iso-2022-jp) and thus used this encoding in the updated site. I have just changed it to Shift JIS (charset=Shift_JIS). Is this the best? Will it solve my problems?

hotice_2002

4:17 am on Aug 3, 2002 (gmt 0)

10+ Year Member



I don't think that matters! Google is always updating, and they dropped my website 3 months ago. In fact, 3 months ago its PR was 3. I did nothing to it, but it disappeared from Google! I guarantee my website is online 24/7/365. I think you had better resubmit your website to Google every month. Maybe that will help more!

Brownie

1:58 pm on Aug 5, 2002 (gmt 0)

10+ Year Member



Maybe I wasn't that clear... Google has listed my site, yet it displays the Japanese characters as question marks and other garbage!

Woz

2:06 pm on Aug 5, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Brownie,

Are you looking at Google in English or Japanese? I only view in English and commonly see sites in other languages displayed as "?????????????". If you change your language preference to Japanese, it may display your pages correctly.

Try it out and let us know.

Onya
Woz

Brownie

2:36 pm on Aug 6, 2002 (gmt 0)

10+ Year Member



I have done a site search which shows that all my Japanese pages are listed. However, on Google Japan (see link below), the Japanese characters on my site are not displayed... yet everything else is OK when you view the site!

<snip>

<Policy Note - I'll leave the link for a few days to help solve this problem, after which it will be removed as per the TOS. - Woz>

<Snipped Google search as per note above. - Woz>

[edited by: Woz at 12:13 am (utc) on Aug. 14, 2002]

Gorufu

2:37 pm on Aug 6, 2002 (gmt 0)

10+ Year Member



> I thought that I had originally used JIS encoding (charset=iso-2022-jp) and thus used this encoding in the updated site.

charset=iso-2022-jp is causing the ??? problems with indexing in Google. I checked your site in google.co.jp using Japanese Win98 and Japanese IE 5.5 and the ????? are listed in the title.

> I have just changed it to Shift JIS (charset=Shift_JIS). Is this the best? Will it solve my problems?

charset=Shift_JIS or charset=x-sjis should solve the problem. It will still appear as ???? until the next update of Google's database.

I have always used charset=x-sjis with great success.

The ????? appear when viewed using my local Linux box and the same would probably apply with Google because they use Linux servers.
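As a rough illustration of where question marks and garbage can come from, here is a minimal Python sketch. It is not taken from this thread and says nothing certain about how Google's indexer works; the sample title is an assumption made up for the example. It shows two generic failure modes: a lossy conversion to a character set that has no Japanese characters, and bytes being read with the wrong charset label.

```python
# Hypothetical page title, chosen only for illustration.
title = "日本語のページ"

# Failure mode 1: converting to a charset that cannot represent the text.
# With errors="replace", each unencodable character becomes "?".
as_ascii = title.encode("ascii", errors="replace")
print(as_ascii)

# Failure mode 2: the bytes are fine, but the reader assumes the wrong charset.
# Here Shift_JIS bytes are (wrongly) decoded as EUC-JP.
sjis_bytes = title.encode("shift_jis")
misread = sjis_bytes.decode("euc_jp", errors="replace")
print(misread)

# Decoding with the charset that was actually used recovers the text.
print(sjis_bytes.decode("shift_jis"))
```

The first print gives one "?" per character, which matches the look of the broken titles described above. The second gives unrelated characters, which is the "other garbage" case.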

The following discussion has more info about Japanese text and compatibility problems:

[webmasterworld.com...]

Gorufu

3:04 pm on Aug 6, 2002 (gmt 0)

10+ Year Member



Hi Brownie,

I was posting around the same time and used the business name in Japanese as keywords and the ?????? appeared in the title.

Clicking on the link from your latest post, the results were really weird. I checked the source code for your homepage and noticed that charset=iso-2022-jp is there. That is definitely creating the problems, because there is more than one character set in iso-2022-jp.

Shift_JIS is best for overall results using Windows

JIS is very limited and not recommended

EUC is used on Unix/Linux boxes for indexing Japanese text in databases.
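To make the difference between the three encodings concrete, here is a small Python sketch (not from the thread). The sample string is an assumption, and `iso2022_jp`, `shift_jis`, and `euc_jp` are the names of Python's standard codecs for these charsets. It encodes the same text three ways and shows that ISO-2022-JP is a 7-bit encoding that switches character sets with escape sequences, while Shift_JIS and EUC-JP use 8-bit byte pairs.

```python
# Sample text, chosen only for illustration.
text = "日本語"

encodings = ["iso2022_jp", "shift_jis", "euc_jp"]
encoded = {name: text.encode(name) for name in encodings}

for name, data in encoded.items():
    # True when every byte fits in 7 bits (no byte >= 0x80).
    seven_bit = all(b < 0x80 for b in data)
    print(f"{name:12} {len(data):2d} bytes  7-bit only: {seven_bit}  {data!r}")

# ISO-2022-JP marks the switch into the kanji set with ESC $ B
# and switches back to ASCII with ESC ( B.
iso = encoded["iso2022_jp"]
print(iso.startswith(b"\x1b$B"), iso.endswith(b"\x1b(B"))
```

A program that ignores or strips those escape sequences is left with plain ASCII-looking bytes, which is one plausible way an ISO-2022-JP page could end up unreadable in an index. That is a guess, not something confirmed in this thread.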

bill

5:26 am on Aug 7, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



...a little late to the party...:(

Brownie, I checked your listings on a Japanese system and came up with pretty much the same conclusions as Gorufu. That iso-2022-jp character set you're using is not one of my favorites. (You should see what it does to Opera 6.x on an English OS...yeech.) Like Gorufu, I'm a big fan of x-sjis encoding for Japanese pages. It rarely gets mucked up in any of the major browsers.

A few other encoding suggestions for your HTML that aren't absolutely necessary, but won't hurt, are the following:

<html lang="ja">

...and for the head:
<meta http-equiv="content-language" content="ja">
<meta name="language" content="ja">
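For anyone who wants to double-check what a page actually declares after a template change, here is a minimal Python sketch using the standard library's `html.parser`. It is only an illustration: the `DeclarationFinder` class, the `find_declarations` helper, and the sample page are all hypothetical, and the Content-Type meta tag in the sample is an assumed example of how the charset discussed in this thread is typically declared.

```python
from html.parser import HTMLParser


class DeclarationFinder(HTMLParser):
    """Collects the <html lang> value and the declared charset, if any."""

    def __init__(self):
        super().__init__()
        self.lang = None
        self.charset = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute names arrive lowercased
        if tag == "html":
            self.lang = attrs.get("lang")
        elif tag == "meta":
            if attrs.get("charset"):
                # Short form: <meta charset="...">
                self.charset = attrs["charset"]
            elif (attrs.get("http-equiv") or "").lower() == "content-type":
                # Long form: content="text/html; charset=..."
                for part in (attrs.get("content") or "").split(";"):
                    key, _, value = part.strip().partition("=")
                    if key.lower() == "charset":
                        self.charset = value.strip()


def find_declarations(html_text):
    """Return (lang, charset) as declared in the markup."""
    finder = DeclarationFinder()
    finder.feed(html_text)
    return finder.lang, finder.charset


# Hypothetical page head for illustration.
sample = """<html lang="ja">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS">
<meta http-equiv="content-language" content="ja">
</head>
<body></body>
</html>"""

print(find_declarations(sample))
```

This only reports what the markup claims. The bytes of the file still have to be saved in that same encoding, and a charset sent in the server's HTTP headers may take precedence over the meta tag.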

[edited by: bill at 8:02 am (utc) on Nov. 25, 2004]

Brownie

11:03 am on Aug 9, 2002 (gmt 0)

10+ Year Member



Thanks for all the helpful advice. I have implemented it all and will be back on track after the next Google update! Fingers crossed...