To view Japanese source code

Difficulties

wolfy

11:51 am on Feb 19, 2003 (gmt 0)

I have this problem:

I found a website whose source code I would really like to view (the site is plain HTML). Unfortunately I can't read the title, meta tags, etc., because I see only numbers. The site is in Japanese, but I've tried viewing the source both on a normal PC and on a Japanese OS, with no luck. I installed all the available Windows updates, and I also downloaded the site and tried adding a charset declaration, but still no luck.

Any suggestion?

Damian

12:21 pm on Feb 19, 2003 (gmt 0)

Were you able to determine for sure that the problem is caused by the Japanese characters? Maybe the source is protected (encoded) by something like Weblock or Html Guard?

takagi

3:59 pm on Feb 19, 2003 (gmt 0)

Hello Wolfy,

Stickymail me the URL, and I will give it a try.

Takagi.

tedster

5:47 pm on Feb 19, 2003 (gmt 0)

Be sure to let us know if you find out what's going on - I know that I'm curious about this one. Thanks in advance.

bill

8:35 am on Feb 20, 2003 (gmt 0)

I'm a little late on this one wolfy but I'll take a look if you want...

wolfy

1:24 pm on Feb 20, 2003 (gmt 0)

Thanks to everyone,
I'll let you know whether or not I solve the problem.

takagi, bill: see your sticky mail.
I'll give a reward to whoever solves the problem fastest :)

takagi

2:50 pm on Feb 20, 2003 (gmt 0)

Hello Wolfy,

The numbers you saw are Unicode numbers.

Usually Japanese pages are in Shift-JIS (charset=Shift_JIS), where 2 bytes are used for each Japanese character. Pages written in Unicode (charset=utf-8) work in a similar way, with a multi-byte sequence per character (3 bytes for most Japanese characters), and Unicode also makes it possible to include symbols from other languages such as Korean, Thai, Hebrew, Arabic, Russian, etc.
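As a rough illustration of the byte counts discussed here, the following Python sketch encodes the two characters from the page title in both encodings. The variable names are only for illustration. Note that in UTF-8 these particular characters take 3 bytes each, not 2.

```python
# The two characters from the start of the page title.
text = "海の"

# Encode the same text in both encodings and compare the sizes.
shift_jis_bytes = text.encode("shift_jis")
utf8_bytes = text.encode("utf-8")

print(len(shift_jis_bytes))  # 4 (2 bytes per character)
print(len(utf8_bytes))       # 6 (3 bytes per character)

# Decoding with the matching charset gives the original text back.
round_trip = shift_jis_bytes.decode("shift_jis")
print(round_trip == text)    # True
```

Decoding the bytes with the wrong charset is what usually produces unreadable gibberish, which is a different symptom from the plain numbers described in this thread.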

The page you mentioned uses a special way to write a Japanese string. Each character is written as &#<5-digit-code>; (a numeric character reference), so each character needs 8 bytes. The advantage is that any plain ASCII editor can be used to enter the text (if you know the codes), and no unwanted conversion happens when copying and pasting. It is the same mechanism that lets the copyright symbol be coded as '&copy;' or as '&#169;'.

Take the title tag: its first 16 bytes are &#28023;&#12398;

On a PC that can display Japanese characters, the browser automatically converts this to 海の.

The first character in the title means 'sea' and has the code 28023. The second character is the hiragana 'no', and its Unicode number is 12398.
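A minimal Python sketch of this conversion, using the standard library's html.unescape. The reference string is built from the two codes given above; the helper name to_numeric_refs is hypothetical and only shows the reverse direction.

```python
import html

# The first 16 bytes of the title as they appear in the page source.
title_source = "&#28023;&#12398;"

# html.unescape turns character references into the characters themselves.
title_text = html.unescape(title_source)
print(title_text)  # 海の

# The numbers are simply the Unicode code points of the characters.
code_points = [ord(ch) for ch in title_text]
print(code_points)  # [28023, 12398]


def to_numeric_refs(text: str) -> str:
    """Write every character as a decimal numeric character reference."""
    return "".join(f"&#{ord(ch)};" for ch in text)


encoded = to_numeric_refs(title_text)
print(encoded)       # &#28023;&#12398;
print(len(encoded))  # 16 (8 ASCII bytes per character)
```

Because the encoded form contains only ASCII, it displays the same no matter which charset the page declares.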

I hope my explanation is good enough. But if you need more information, please let me know by a message in this thread or by stickymail.

bill

12:51 am on Feb 21, 2003 (gmt 0)

It is obvious that takagi sleeps a lot less than I do ;)

He is right that the page is encoded in UTF-8 Unicode. It can be really difficult to look up all the codes, so I suggest you cut and paste the source into the tool at the bottom half of this page [kanzaki.com]. That page is in Japanese but has some useful tools for when you get unreadable JIS and Unicode text in your e-mail or source viewer. It will convert the gibberish into Japanese for you.
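For anyone who prefers to do the same kind of conversion locally, here is a hedged Python sketch. It is not the implementation of the tool on that page, only an assumption about what such a converter does: it replaces numeric references in pasted source and leaves tags and named entities such as &lt; untouched. The function name and the sample markup are made up for illustration.

```python
import re

# Matches decimal (&#28023;) and hexadecimal (&#x6D77;) numeric references.
NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def decode_numeric_refs(source: str) -> str:
    """Replace numeric character references with the characters they name."""

    def replace(match: re.Match) -> str:
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        try:
            return chr(code_point)
        except (ValueError, OverflowError):
            # Not a valid Unicode code point: keep the original text.
            return match.group(0)

    return NUMERIC_REF.sub(replace, source)


# Hypothetical snippet of page source, not taken from the real site.
sample = "<title>&#28023;&#12398;</title><p>a &lt; b</p>"
converted = decode_numeric_refs(sample)
print(converted)
```

Only numeric references are converted here, so the markup stays valid HTML after the replacement.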

Hope this helps.

wolfy

8:58 am on Feb 21, 2003 (gmt 0)

Takagi and Bill:

thanks a lot!

I've used the tool on that page and it really works fine. Now I can see what I wanted to see!

wolfy

bill

9:03 am on Feb 21, 2003 (gmt 0)

...so who gets the prize? ;)

wolfy

9:10 am on Feb 21, 2003 (gmt 0)

And the winner is...

Bill and Takagi, joint first place: you will share a half pint of beer at the next PubConf in London :)

thanks guys!

takagi

9:28 am on Feb 21, 2003 (gmt 0)

I might not be at the next PubConf in London. Anyway, it is a first prize, and I'm only a 'New User' on this forum, so it looks like a nice start.

wolfy

11:46 am on Feb 21, 2003 (gmt 0)

You really should come.

< Anyway, it is a first prize. And I'm only 'New User' to this forum. So looks like a nice start.>

great start really!