I found a website and I would really like to view its source code (the site is plain HTML). Unfortunately, I can't read the title, meta tags, etc., because all I see are numbers. The site is in Japanese, but I've tried viewing the source both on a normal PC and on a Japanese OS, with no luck. I installed every available Windows update, and I also downloaded the site and tried adding a charset to it, but still no luck.
Any suggestion?
The numbers you saw are Unicode character codes.
Japanese pages are usually encoded in Shift_JIS (charset=Shift_JIS), which uses 2 bytes per Japanese character. Pages written in Unicode (charset=utf-8) use a similar multi-byte scheme (most Japanese characters take 3 bytes in UTF-8), and that encoding can also represent characters from other languages such as Korean, Thai, Hebrew, Arabic, Russian, etc.
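To make the byte counts concrete, here is a small Python sketch (Python is used purely for illustration) that encodes the same two characters, 海の, with Python's built-in `shift_jis` and `utf-8` codecs. It should show 2 bytes per character in Shift_JIS and, for these two characters, 3 bytes each in UTF-8.

```python
# Compare how many bytes the same Japanese text takes in Shift_JIS vs. UTF-8.
text = "海の"  # "sea" followed by the hiragana "no"

sjis_bytes = text.encode("shift_jis")
utf8_bytes = text.encode("utf-8")

print(len(text), "characters")
print(len(sjis_bytes), "bytes in Shift_JIS:", sjis_bytes.hex(" "))
print(len(utf8_bytes), "bytes in UTF-8:", utf8_bytes.hex(" "))
```

Both encodings store the same characters; they just lay out the bytes differently, which is why a viewer that guesses the wrong charset shows gibberish.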
The page you mentioned uses a special way to write Japanese text: each character is written as &#<5-digit-code>;, so each character takes 8 bytes. The advantage is that any plain ASCII editor can be used to enter the code (if you know the code), and copy/pasting causes no unwanted conversion. This is similar to the copyright symbol, which can be written either as '©' or as '&#169;'.
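The escaping scheme is easy to reproduce. This minimal Python sketch uses a hypothetical helper, `to_numeric_refs`, that writes every non-ASCII character as a decimal `&#<code>;` reference and leaves ASCII alone. It illustrates the idea only; we don't know what tool the site's author actually used.

```python
def to_numeric_refs(text: str) -> str:
    """Hypothetical helper: write each non-ASCII character as a decimal
    numeric character reference &#<code>;, leaving ASCII characters as-is."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


sea = to_numeric_refs("海")
print(sea, len(sea))               # &#28023; 8
print(to_numeric_refs("©"))        # &#169;
print(to_numeric_refs("<title>"))  # <title>
```

Because the result is pure ASCII, it survives any plain editor or copy/paste unchanged, at the cost of 8 bytes per 5-digit character.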
Take the title tag as an example: its first 16 bytes are &#28023;&#12398;.
On a PC that can display Japanese characters, the browser automatically converts this to 海の.
The first character in the title means "sea" and has the code 28023. The second character is the hiragana "no", whose Unicode number is 12398.
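You can check these numbers yourself. This Python sketch uses the standard library's `html.unescape` to turn the 16-byte title prefix back into characters, roughly what the browser does when rendering, and then prints each character's code in decimal and in the usual U+hex notation.

```python
import html

# The first 16 bytes of the title, as they appear in the page source.
title_source = "&#28023;&#12398;"

decoded = html.unescape(title_source)           # resolve the references
print(decoded)                                  # 海の
print([ord(ch) for ch in decoded])              # [28023, 12398]
print([f"U+{ord(ch):04X}" for ch in decoded])   # ['U+6D77', 'U+306E']
```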
I hope my explanation is good enough. But if you need more information, please let me know by a message in this thread or by stickymail.
He is right that these are Unicode character codes. Looking up every code by hand can be really tedious, so I suggest you cut and paste the source into the tool in the bottom half of this page [kanzaki.com]. The page is in Japanese, but it has some useful tools for when you get unreadable JIS or Unicode text in your e-mail or source viewer. It will convert the gibberish into Japanese for you.
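If you'd rather convert a downloaded copy of the page locally, here is a minimal Python sketch. It is our own illustration, not the kanzaki.com tool, and it only resolves numeric character references (decimal `&#28023;` or hex `&#x6D77;`); it does not repair raw Shift_JIS or JIS bytes. The script name `decode_refs.py` and the function `decode_numeric_refs` are hypothetical, and it assumes the saved file can be read as UTF-8, which holds for a page that uses only `&#...;` references since that is plain ASCII.

```python
import re
import sys

# Decimal (&#28023;) or hexadecimal (&#x6D77;) numeric character references.
NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def decode_numeric_refs(source: str) -> str:
    """Replace numeric character references with the characters they name.

    Tags and named entities such as &lt; or &amp; are left untouched, so the
    markup stays intact. References outside the valid Unicode range (or in
    the surrogate block) are kept as-is instead of raising an error.
    """
    def replace(match: re.Match) -> str:
        hex_digits, dec_digits = match.groups()
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if code > 0x10FFFF or 0xD800 <= code <= 0xDFFF:
            return match.group(0)
        return chr(code)

    return NUMERIC_REF.sub(replace, source)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python decode_refs.py saved_page.html readable_page.html
    with open(sys.argv[1], encoding="utf-8") as src:
        text = src.read()
    with open(sys.argv[2], "w", encoding="utf-8") as dst:
        dst.write(decode_numeric_refs(text))
```

Only numeric references are touched (rather than calling `html.unescape` on everything) so that escaped markup such as `&lt;` stays escaped. The output file is written as UTF-8, so if the page declares a different charset in a meta tag, change it to utf-8 before opening the result in a browser.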
Hope this helps.