

How is non-English alphabet created in web pages?

         

kapow

4:08 pm on Jul 6, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member




I need to convert about 20 web pages into:
Bengali
Hindi
Cantonese
Punjabi
Gujerati
Urdu

The text can be sent to me as a PDF, so I can select it to copy and paste. Obviously, when I try pasting directly into a DreamWeaver page I get a lot of mess. Am I correct that I need to add a meta tag to say which font/alphabet is being used? Can anyone tell me what tags etc. I need to make my copied and pasted text work?

For example, a Japanese site I worked on has this tag (is this the kind of thing I need?):
<META http-equiv=Content-Type content="text/html; charset=Shift_JIS">
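To see why that tag matters, here is a small Python sketch (Python is only a convenient way to show the bytes; nothing here is Dreamweaver-specific). The same Japanese word is stored as Shift_JIS bytes, then decoded once with the declared charset and once with a Western charset. cp1252 is my choice here, as an example of a wrong guess a browser might make.

```python
# The charset in the meta tag tells the browser how to turn the file's bytes
# back into characters. Same bytes + wrong charset = the "mess" you see.
text = "\u65e5\u672c\u8a9e"  # 日本語 ("Japanese language")

sjis_bytes = text.encode("shift_jis")   # what a Shift_JIS page stores on disk
right = sjis_bytes.decode("shift_jis")  # browser honours charset=Shift_JIS
# Browser assumes a Western charset instead; undefined bytes would show as U+FFFD.
wrong = sjis_bytes.decode("cp1252", errors="replace")

print(len(sjis_bytes))  # 6: two bytes per character in Shift_JIS
print(right == text)    # True
print(wrong == text)    # False: Western punctuation and accented letters
```

The bytes never change; only the charset used to read them does.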

DrDoc

5:51 pm on Jul 6, 2004 (gmt 0)




That's exactly what you need ;)
Also, it's preferable to use Unicode for characters above character code 255.
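A minimal Python sketch of one thing "using Unicode" can mean in practice: every character outside ASCII is written as a numeric character reference (&#NNNN;, where NNNN is the decimal Unicode code point). The references are plain ASCII, so they survive whatever charset the page declares. The helper name to_char_refs is made up for this example; xmlcharrefreplace is a standard Python codec error handler.

```python
def to_char_refs(text: str) -> str:
    """Replace every non-ASCII character with an HTML numeric character reference."""
    # "xmlcharrefreplace" turns anything that can't be encoded as ASCII
    # into &#NNNN; using the character's decimal code point.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

hindi = "\u0939\u093f\u0902\u0926\u0940"  # the word "Hindi" in Devanagari
print(to_char_refs(hindi))  # &#2361;&#2367;&#2306;&#2342;&#2368;
print(to_char_refs("abc"))  # abc (ASCII is left alone)
```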

kapow

5:58 pm on Jul 6, 2004 (gmt 0)




Thanks.
Can someone tell me the meta tags for:

Bengali
Hindi
Cantonese
Punjabi
Gujerati
Urdu

DrDoc

6:50 pm on Jul 6, 2004 (gmt 0)




Well, you can always use the UTF-8 character set ;) It will work for anything and everything (provided you use Unicode characters). Otherwise, you can always take a look at this page [iana.org] from IANA [iana.org]. You can also take a look at ISO [iso.org]'s web page.

Finally, here's what W3C [w3.org] have to say about character sets [w3.org] (very informative).
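The claim that UTF-8 "will work for anything" is easy to check. This Python sketch takes one sample letter from each script in the list (the letters are my own choice, for illustration only) and confirms each survives a UTF-8 round trip. It assumes Punjabi is written in Gurmukhi and Urdu in Arabic script, and represents Cantonese by a single Han character.

```python
# One sample letter per script; the choice of letters is illustrative only.
samples = {
    "Bengali": "\u0995",              # BENGALI LETTER KA
    "Hindi (Devanagari)": "\u0915",   # DEVANAGARI LETTER KA
    "Punjabi (Gurmukhi)": "\u0a15",   # GURMUKHI LETTER KA
    "Gujarati": "\u0a95",             # GUJARATI LETTER KA
    "Urdu (Arabic script)": "\u0627", # ARABIC LETTER ALEF
    "Cantonese (Han)": "\u4e2d",      # CJK ideograph 中
}

utf8_bytes = {}
for name, ch in samples.items():
    data = ch.encode("utf-8")
    utf8_bytes[name] = data
    # Decoding the bytes gives back exactly the same character.
    print(f"{name:22} U+{ord(ch):04X} -> {data.hex()} "
          f"round-trip ok: {data.decode('utf-8') == ch}")
```

All six scripts fit in one charset, which is why a single utf-8 meta tag can cover every page.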

kapow

1:12 pm on Jul 7, 2004 (gmt 0)




Wow! That's a lot of complicated stuff. Thank you!
It's a bit confusing; I've just spent a while trying to get to grips with it.

Does it mean that if I put
<meta http-equiv="content-type" content="text/html; charset=utf-8">
in the head and paste my text from a PDF, it will render in the browser?
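One thing the meta tag does not do is change how the file is saved: the declared charset has to match the bytes actually on disk, and the content value must read text/html (with a slash). A minimal Python sketch, assuming pages are generated from a script rather than in DreamWeaver; write_page, TEMPLATE and the file name are hypothetical.

```python
import html
import tempfile
from pathlib import Path

# Hypothetical template: the declared charset is utf-8 ...
TEMPLATE = """<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>{title}</title>
</head>
<body>
<p>{body}</p>
</body>
</html>
"""

def write_page(path: Path, title: str, body: str) -> None:
    """Write a page whose bytes really are UTF-8, matching the meta tag."""
    page = TEMPLATE.format(title=html.escape(title), body=html.escape(body))
    path.write_text(page, encoding="utf-8")  # ... so save it as UTF-8 too

out = Path(tempfile.gettempdir()) / "hindi_example.html"
write_page(out, "Hindi", "\u0939\u093f\u0902\u0926\u0940")  # the word "Hindi"
```

If the same page were saved in a legacy encoding while still declaring utf-8, the browser would show garbage even though the tag looks right.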

DrDoc

3:07 pm on Jul 7, 2004 (gmt 0)




Only if the text is Unicode... Like &#8456; for example...
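&#8456; is a decimal numeric character reference: 8456 is 0x2108, the Unicode code point of ℈ (SCRUPLE). A quick check with Python's standard html module, which decodes references much as a browser does (the browser still needs a font containing the glyph):

```python
import html

ref = "&#8456;"
ch = html.unescape(ref)                 # decimal reference -> character
print(f"U+{ord(ch):04X}")               # U+2108
print(html.unescape("&#x2108;") == ch)  # True: hex form, same character
print(ref.isascii())                    # True: the reference itself is plain ASCII
```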

kapow

5:15 pm on Jul 8, 2004 (gmt 0)




What kind of code comes out of a PDF? It doesn't look like your example of Unicode. Below are a couple of lines from the Hindi document (I have no idea what they say). In the PDF they look like Hindi characters, but when I copy and paste them (say, into NotePad or into this WebmasterWorld form) I get this:

‚¢ÃÈ‹Ÿ &#960;Ê&#63743; OE&#63743;ŸÊ •ÊÒ&#8260;U Áª&#8260;U ¬«&#8719;UŸÊ ¬ÊÁ&#8721; ¸§Ÿ‚ã‚ &#8721; §Ë •Ê&#9674;
Áfl‡Ê&#63743;·ÃÊ „UÊ&#63743; ‚&#8721; §ÃË „ÒU, ¡Ê&#63743; &#8721; §ß¸ &#8721; §Ê&#8260;UáÊÊ&#63743;¥ ‚&#63743; „UÊ&#63743;ÃÊ „ÒU–

I need a solution where I can paste this kind of thing into DreamWeaver and get the right characters.
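A useful first diagnosis, sketched in Python: check which Unicode blocks the pasted characters actually fall in. The sample above contains characters such as π, ∏, ∑, ◊ and &#63743; (U+F8FF, which is the Apple logo in the Mac character set), and no Devanagari at all. That suggests, though it can't be confirmed from here, that the PDF uses a legacy Hindi font whose glyphs are stored under Latin code points. If so, no charset meta tag will fix the paste; the text would need converting with that font's own mapping. count_by_script is a hypothetical helper, and its block ranges cover only the scripts in this thread.

```python
import html

def count_by_script(text: str) -> dict:
    """Rough count of non-space characters per Unicode block (sketch only)."""
    # Block ranges from the Unicode standard; this is not a full script detector.
    blocks = {
        "Devanagari": (0x0900, 0x097F),
        "Bengali": (0x0980, 0x09FF),
        "Gurmukhi": (0x0A00, 0x0A7F),
        "Gujarati": (0x0A80, 0x0AFF),
        "Arabic": (0x0600, 0x06FF),
        "CJK": (0x4E00, 0x9FFF),
    }
    counts = {name: 0 for name in blocks}
    counts["other"] = 0
    for ch in text:
        if ch.isspace():
            continue
        cp = ord(ch)
        for name, (lo, hi) in blocks.items():
            if lo <= cp <= hi:
                counts[name] += 1
                break
        else:  # no block matched
            counts["other"] += 1
    return counts

# Start of the pasted "Hindi" sample, with the forum's numeric references
# turned back into characters.
pasted = html.unescape("‚¢ÃÈ‹Ÿ &#960;Ê&#63743; OE&#63743;ŸÊ")
real = "\u0939\u093f\u0902\u0926\u0940"  # genuine Devanagari text

print(count_by_script(pasted))  # everything lands in "other"
print(count_by_script(real))    # all 5 characters are Devanagari
```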

Help! :(

kapow

5:30 pm on Jul 8, 2004 (gmt 0)




I might be getting somewhere :)
In DW I just changed 'Fonts/Encoding' (in Preferences) to one of the available options, i.e. 'Simplified Chinese'. One of my PDFs is in Chinese, so I then pasted some text into my new DW document AND IT WORKED!
I've got lots of pretty Chinese characters. I still don't know what they say, but it looks more hopeful than the other mess.

When I changed 'Fonts/Encoding' in DW to 'Simplified Chinese', DW put this tag in my head:

<meta http-equiv="Content-Type" content="text/html; charset=gb2312">
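This is why the Simplified Chinese setting worked: gb2312 is a Simplified Chinese encoding, and DW's tag told the browser to read the page's bytes that way. A small Python sketch comparing the same two characters stored as gb2312 and as UTF-8 (the sample text is my own choice):

```python
text = "\u4e2d\u6587"  # 中文, "Chinese (language)"

gb = text.encode("gb2312")  # bytes of a page saved with charset=gb2312
u8 = text.encode("utf-8")   # bytes of the same text saved as UTF-8

print(len(gb), len(u8))     # 4 6: two bytes per character vs three
print(gb == u8)             # False: different bytes on disk ...
print(gb.decode("gb2312") == u8.decode("utf-8"))  # True: ... same characters
```

So the tag and the saved bytes travel together: change one without the other and the characters break.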

So that must be the right tag for Chinese! Can anyone tell me the tags for these other languages, since they aren't in DW:

Bengali
Hindi
Cantonese
Punjabi
Gujerati
Urdu

Or tell me what this kind of tag is called so I can search for it in Google:
<meta http-equiv="Content-Type" content="text/html; charset=gb2312">
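For searching: this is usually called the Content-Type meta tag, or the charset declaration. Its content attribute holds the same value as the HTTP Content-Type header, a media type plus a charset parameter. A Python sketch that splits that value apart, using the standard email.message.Message class purely as a convenient header parser (browsers have their own parsing rules; parse_content_type is a hypothetical helper):

```python
from email.message import Message

def parse_content_type(value: str) -> tuple:
    """Split a Content-Type value into (media type, charset)."""
    msg = Message()
    msg["Content-Type"] = value
    # Both results come back lowercased; charset names are case-insensitive.
    return msg.get_content_type(), msg.get_content_charset()

print(parse_content_type("text/html; charset=gb2312"))     # ('text/html', 'gb2312')
print(parse_content_type("text/html; charset=Shift_JIS"))  # ('text/html', 'shift_jis')
```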

tedster

6:15 pm on Jul 8, 2004 (gmt 0)




I haven't found any one single resource so far, but searching on 'charset content type [insert language]' seems to turn up helpful results.

kapow

6:19 pm on Jul 8, 2004 (gmt 0)




Am I correct in thinking my characters are ASCII but I need UTF-8?

If so, is there such a thing as an ASCII to UTF-8 converter?
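ASCII covers only 128 characters (codes 0 to 127), and ASCII bytes are already valid UTF-8, so an "ASCII to UTF-8" conversion changes nothing. The pasted characters (Ã, ¢, Ÿ and so on) are outside ASCII anyway. A general converter has to know the source encoding. A minimal Python sketch; to_utf8 is a hypothetical helper, and for font-specific legacy encodings (if that is what the PDF uses) there is no standard codec to pass in.

```python
def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes from a known legacy encoding into UTF-8."""
    # The hard part is knowing source_encoding: decoding with the wrong
    # one either raises UnicodeDecodeError or silently produces mojibake.
    return data.decode(source_encoding).encode("utf-8")

# Plain ASCII is already valid UTF-8, so converting it changes nothing.
print(to_utf8(b"Hello", "ascii") == b"Hello")  # True

# A legacy Simplified Chinese page converted from gb2312 to UTF-8.
legacy = "\u4e2d\u6587".encode("gb2312")
print(to_utf8(legacy, "gb2312").decode("utf-8"))  # 中文
```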

DrDoc

6:56 pm on Jul 8, 2004 (gmt 0)




Urdu = ISO-8859-6
Bengali = ISO646--Bengali
Hindi = ISO646--Hindi
Punjabi = ISO646--Punjabi
Gujerati = ISO646--Gujerati
Cantonese = big5
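One quick sanity check on charset labels is to ask whether a real codec library recognises them. This Python sketch uses Python's codec registry as a stand-in for a browser, which is an assumption: browsers keep their own label tables, so the IANA list linked earlier is the authoritative reference. python_knows is a hypothetical helper.

```python
import codecs

def python_knows(label: str) -> bool:
    """True if Python's codec registry recognises this charset label."""
    try:
        codecs.lookup(label)
        return True
    except LookupError:
        return False

for label in ["ISO-8859-6", "big5", "ISO646--Bengali", "ISO646--Hindi", "utf-8"]:
    print(f"{label:16} {python_knows(label)}")
```

A False here doesn't prove a browser rejects the label, but it is a hint worth checking against the registry before relying on it.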

DrDoc

7:02 pm on Jul 8, 2004 (gmt 0)




Note that the Bengali, Hindi, Punjabi, and Gujerati charsets only cover the language itself, without Han (Chinese) characters.

[edited by: DrDoc at 7:06 pm (utc) on July 8, 2004]

kapow

7:05 pm on Jul 8, 2004 (gmt 0)




Thanks, DrDoc.

I just tried your Bengali example, but I still just get a mess (IE6 on PC). Here's what I did:

<html><head>
<title> </title>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=ISO646--Bengali">
</head>
<body bgcolor="#FFFFFF" text="#000000">

U¡¾U¡¾ fl¢®¡¾¡þ©¥¢­AI ¡þU¡¾¡þ©¥¢­©ªfl¢®U¢¬AU .¡þ¡¾¡þ©¥¢­¢¬¡¾.¡¦ ¢©¡î¡¾¡þ©¥¢­¡¾AU¡¾ . ¡þUAI¡þ¢§ ¢ª¡¾.¢ª¢­¡¾
¢´fl¢® ¡¯©ªO¡þ ¡þU©ª¡þ©¥¢­©ª¢¶¡þO¡þ O¡þ©¥©ª¢«¨ÏU¡¦¨¬

</body>
</html>

DrDoc

7:07 pm on Jul 8, 2004 (gmt 0)




Do you have a Bengali font?

DrDoc

7:38 pm on Jul 8, 2004 (gmt 0)




You can also try this:
ISO646-Bengali-Japanese
or
ISO646-Japanese-Bengali

kapow

11:02 am on Jul 9, 2004 (gmt 0)




I tried these with no success: ISO646-Bengali-Japanese and ISO646-Japanese-Bengali

Re your question, "Do you have the Bengali font?":

When I added this tag
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=x-iscii-be">
IE asked me if I wanted to install the font. I clicked OK and something downloaded, but nothing changed (I have since rebooted).