
HTML Forum

    
How is non-English alphabet created in web pages?
kapow

Msg#: 8233 posted 4:08 pm on Jul 6, 2004 (gmt 0)

How is non-English alphabet created in web pages?

I need to convert about 20 web pages into:
Bengali
Hindi
Cantonese
Punjabi
Gujerati
Urdu

The text can be sent to me as PDFs, so I can select the text to copy and paste. Obviously, when I try pasting directly into a Dreamweaver page I get a lot of mess. Am I correct that I need to add a meta tag to say which font/alphabet is being used? Can anyone tell me what tags etc. I need to make my copied and pasted text work?

e.g. A Japanese site I worked on has this tag: (is this the kind of thing I need?)
<META http-equiv=Content-Type content="text/html; charset=Shift_JIS">
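A tag like this only labels the bytes in the file; the page still has to be saved in the encoding the tag names. The short Python sketch below illustrates that idea. Python is not used anywhere in the thread, and the sample text and variable names are assumptions chosen for illustration.

```python
# Illustration only: the sample text and names are assumptions,
# not taken from the thread.
text = "日本語"  # sample Japanese text

# Bytes as an editor would save them when set to Shift_JIS.
sjis_bytes = text.encode("shift_jis")

# A reader that honours charset=Shift_JIS gets the original text back.
decoded_ok = sjis_bytes.decode("shift_jis")

# A reader that assumes a different charset turns the same bytes into mess.
decoded_wrong = sjis_bytes.decode("latin-1")

print(decoded_ok)
print(repr(decoded_wrong))
```

The second result is the kind of mess that appears when the declared charset and the saved bytes disagree.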

 

DrDoc

Msg#: 8233 posted 5:51 pm on Jul 6, 2004 (gmt 0)

That's exactly what you need ;)
Also, it's preferable to use Unicode for characters with code points above 255.

kapow

Msg#: 8233 posted 5:58 pm on Jul 6, 2004 (gmt 0)

Thanks.
Can someone tell me the meta tags for:

Bengali
Hindi
Cantonese
Punjabi
Gujerati
Urdu

DrDoc

Msg#: 8233 posted 6:50 pm on Jul 6, 2004 (gmt 0)

Well, you can always use the UTF-8 character set ;) It will work for anything and everything (provided you use Unicode characters). Otherwise, you can always take a look at this page [iana.org] from IANA [iana.org]. You can also take a look at ISO [iso.org]'s web page.

Finally, here's what W3C [w3.org] have to say about character sets [w3.org] (very informative).
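As a rough check of the claim that UTF-8 can carry all of the requested languages, here is a minimal Python sketch. The sample words (language names written in their own scripts) and the helper names are assumptions for illustration, not text from the thread.

```python
# Sample words are assumptions chosen for illustration.
samples = {
    "Bengali": "বাংলা",
    "Hindi": "हिन्दी",
    "Cantonese": "廣東話",
    "Punjabi": "ਪੰਜਾਬੀ",
    "Gujarati": "ગુજરાતી",
    "Urdu": "اردو",
}


def utf8_round_trips(s):
    """True if the text survives being encoded to UTF-8 and decoded again."""
    return s.encode("utf-8").decode("utf-8") == s


def fits_latin1(s):
    """True if every character exists in ISO-8859-1 (Western European)."""
    try:
        s.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


results = {lang: utf8_round_trips(s) for lang, s in samples.items()}
for lang, ok in results.items():
    print(lang, "UTF-8 ok:", ok, "| fits Latin-1:", fits_latin1(samples[lang]))
```

Every sample round-trips through UTF-8, while none fits a Western European charset, which is why one UTF-8 declaration can serve all six pages.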

kapow

Msg#: 8233 posted 1:12 pm on Jul 7, 2004 (gmt 0)

Wow! That's a lot of complicated stuff. Thank you!
It's a bit confusing; I've just spent a while trying to get to grips with it.

Does it mean if I put
<meta http-equiv="content-type" content="text/html; charset=utf-8">
in the head tag, and paste my text from a PDF it will render in the browser?

DrDoc

Msg#: 8233 posted 3:07 pm on Jul 7, 2004 (gmt 0)

Only if the text is Unicode... Like &#8456; for example...
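The `&#8456;` form is a numeric character reference: the decimal Unicode code point of one character, written in plain ASCII. The Python sketch below shows one way to convert real Unicode text to and from that form. The sample word and variable names are assumptions for illustration.

```python
import html

hindi = "हिन्दी"  # sample word, an assumption for illustration

# Replace every non-ASCII character with its &#NNNN; reference.
refs = hindi.encode("ascii", "xmlcharrefreplace").decode("ascii")

# Turn the references back into characters, as a browser would.
restored = html.unescape(refs)

# The reference quoted in the post above resolves to code point U+2108.
scruple = html.unescape("&#8456;")

print(refs)
print(restored == hindi)
print(hex(ord(scruple)))
```

Text written this way displays the same regardless of the page's declared charset, at the cost of unreadable source.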

kapow

Msg#: 8233 posted 5:15 pm on Jul 8, 2004 (gmt 0)

What kind of code comes from a PDF? It doesn't look like your example of Unicode. Below are a couple of lines from the Hindi document (I have no idea what they say). In the PDF it looks like Hindi characters, but when I copy and paste (say into Notepad or into this WebmasterWorld form) I get this:

ȋ &#960;&#63743; OE&#63743; &#8260;U &#8260;U &#8719;U &#8721; &#8721; &#9674;
fl&#63743; U&#63743; &#8721; U, &#63743; &#8721; ߸ &#8721; &#8260;U&#63743; &#63743; U&#63743; U

I need a solution where I can paste this kind of thing into DreamWeaver and get the right characters.

Help! :(
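One possible explanation, which the thread does not confirm, is that the PDF uses a legacy font that draws Hindi glyphs on top of Latin and symbol code points. If so, the copied text contains no Hindi characters at all, and no charset tag can repair it. The Python sketch below is one way to check which scripts a pasted string really contains. The function name is hypothetical, and `pasted` reproduces the first pasted line above with its numeric references decoded.

```python
import unicodedata


def scripts_used(text):
    """Collect the first word of the Unicode name of every letter in text."""
    found = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "UNKNOWN")
            found.add(name.split()[0])
    return found


# Part of the pasted line from the post, with &#NNNN; references decoded.
pasted = "\u03c0\uf8ff OE\uf8ff \u2044U \u2044U \u220fU \u2211 \u2211 \u25ca"

# Genuine Unicode Hindi for comparison (sample word, an assumption).
real_hindi = "हिन्दी"

print(scripts_used(pasted))
print(scripts_used(real_hindi))
```

If the pasted text reports only Latin, Greek, or symbol characters, the source text would need to be supplied as Unicode or converted with a mapping specific to that font.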

kapow

Msg#: 8233 posted 5:30 pm on Jul 8, 2004 (gmt 0)

I might be getting somewhere :)
In DW I just changed 'Fonts/Encoding' (in preferences) to one of the available options, i.e. 'Simplified Chinese'. One of my PDFs is in Chinese, so I then pasted some text into my new DW document AND IT WORKED!
I've got lots of pretty Chinese characters. I still don't know what they say, but it looks more hopeful than the other mess.

When I changed 'Fonts/Encoding' in DW to 'Simplified Chinese', DW put this tag in my head section:

<meta http-equiv="Content-Type" content="text/html; charset=gb2312">

SO! That must be the right tag for Chinese. Can anyone tell me the tags for these other languages, because they aren't in DW:

Bengali
Hindi
Cantonese
Punjabi
Gujerati
Urdu

Or tell me what this kind of tag is called so I can search for it in Google:
<meta http-equiv="Content-Type" content="text/html; charset=gb2312">
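This kind of tag is usually described as a Content-Type meta declaration, and the part that matters is the `charset` parameter. As a small illustration, the Python sketch below pulls that parameter out of a page's markup using the standard library HTML parser. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class CharsetFinder(HTMLParser):
    """Remember the charset named in the first Content-Type meta tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; values are unchanged.
        if tag != "meta" or self.charset is not None:
            return
        a = dict(attrs)
        if (a.get("http-equiv") or "").lower() != "content-type":
            return
        for part in (a.get("content") or "").split(";"):
            key, _, value = part.strip().partition("=")
            if key.lower() == "charset" and value:
                self.charset = value.strip()


def declared_charset(markup):
    finder = CharsetFinder()
    finder.feed(markup)
    return finder.charset


tag = '<meta http-equiv="Content-Type" content="text/html; charset=gb2312">'
print(declared_charset(tag))
```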

tedster

Msg#: 8233 posted 6:15 pm on Jul 8, 2004 (gmt 0)

I haven't found any one single resource so far, but searching on 'charset content type [insert language]' seems to turn up helpful results.

kapow

Msg#: 8233 posted 6:19 pm on Jul 8, 2004 (gmt 0)

Am I correct in thinking my characters are ASCII but I need UTF?

If so, is there such a thing as an ASCII to UTF converter?
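For context: plain ASCII text is already valid UTF-8, so it needs no conversion. Converting only makes sense when the bytes are in some other known encoding. A minimal Python sketch of such a converter follows. The function name and sample strings are assumptions, and it does not address text copied from a PDF whose real encoding is unknown.

```python
def transcode(data, source, target="utf-8"):
    """Decode bytes from a known source encoding and re-encode them."""
    return data.decode(source).encode(target)


# ASCII bytes come out unchanged, because ASCII is a subset of UTF-8.
ascii_bytes = "Hello".encode("ascii")
print(transcode(ascii_bytes, "ascii") == ascii_bytes)

# Chinese text saved as gb2312 becomes different bytes in UTF-8.
chinese_gb = "中文".encode("gb2312")
chinese_utf8 = transcode(chinese_gb, "gb2312")
print(chinese_gb != chinese_utf8)
print(chinese_utf8.decode("utf-8"))
```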

DrDoc

Msg#: 8233 posted 6:56 pm on Jul 8, 2004 (gmt 0)

Urdu = ISO-8859-6
Bengali = ISO646--Bengali
Hindi = ISO646--Hindi
Punjabi = ISO646--Punjabi
Gujerati = ISO646--Gujerati
Cantonese = big5
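Charset names like these are worth checking before relying on them. One quick sanity check is to ask a software codec registry whether it knows the name; the Python sketch below does that. Python's registry is not the same list as a browser's or the IANA registry linked earlier, so treat the result only as a hint. The helper name is hypothetical.

```python
import codecs


def is_known_codec(name):
    """True if Python's codec registry recognises this charset name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


candidates = ["ISO-8859-6", "ISO646--Bengali", "big5", "gb2312",
              "Shift_JIS", "utf-8"]
known = {name: is_known_codec(name) for name in candidates}

for name, ok in known.items():
    print(name, "->", "known" if ok else "not recognised")
```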

DrDoc

Msg#: 8233 posted 7:02 pm on Jul 8, 2004 (gmt 0)

Note that the Bengali, Hindi, Punjabi, and Gujerati charsets only apply to the language as such, without Han content.

[edited by: DrDoc at 7:06 pm (utc) on July 8, 2004]

kapow

Msg#: 8233 posted 7:05 pm on Jul 8, 2004 (gmt 0)

Thanks DrDoc

I just tried your Bengali example as follows - but I still just get mess (IE6 on PC). I did the following:

<html><head>
<title> </title>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=ISO646--Bengali">
</head>
<body bgcolor="#FFFFFF" text="#000000">

UU flAI UflUAU .. AU . UAI .
fl O UO OU

</body>
</html>
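For comparison, here is a sketch of the same test page built around UTF-8, following the earlier suggestion in this thread. It assumes the Bengali text is available as genuine Unicode; the sample word, title, and file name are assumptions for illustration. The point is that the file is written as UTF-8 bytes and the meta tag says the same thing.

```python
import tempfile
from pathlib import Path

bengali_text = "বাংলা"  # sample word, an assumption; not kapow's text

page = f"""<html><head>
<title>Bengali test</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body bgcolor="#FFFFFF" text="#000000">
{bengali_text}
</body>
</html>
"""

# Write the bytes explicitly as UTF-8 so they match the declared charset.
out_path = Path(tempfile.gettempdir()) / "bengali_test.html"
out_path.write_bytes(page.encode("utf-8"))

raw = out_path.read_bytes()
print(out_path)
```

Whether the characters then display still depends on the viewer having a font that covers Bengali.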

DrDoc

Msg#: 8233 posted 7:07 pm on Jul 8, 2004 (gmt 0)

Do you have a bengali font?

DrDoc

Msg#: 8233 posted 7:38 pm on Jul 8, 2004 (gmt 0)

You can also try this:
ISO646-Bengali-Japanese
or
ISO646-Japanese-Bengali

kapow

Msg#: 8233 posted 11:02 am on Jul 9, 2004 (gmt 0)

I tried these with no success: ISO646-Bengali-Japanese and ISO646-Japanese-Bengali

"Do you have a Bengali font?"

When I added this tag
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=x-iscii-be">
IE asked me if I wanted to install the font - I clicked ok, something downloaded - but nothing changed (I have since rebooted).
