I have a question regarding the charset.
I was wondering which is the better option between iso-8859-1 and utf-8. I did my homework (read the few threads about it and some other references on the web), and I have to say it is difficult to opt for one, as neither seems clearly better.
However, along that journey, I am still not clear about the priority among the character entity encoding, the charset in the meta tag, and the View>Character option set in the client browser.
Let me give an example to explain what I mean:
What will happen if I declare the charset as iso-8859-1 but further down in the code I encode a weird foreign-language character using the Unicode numbering system?
And what if the page is viewed in Japan, where the user's browser is set to another character set?
Thanks for your help, and I am sorry for my poor English; also, this is my first post on this forum.
vito
You might want to read this thread:
[webmasterworld.com]
If you switch to viewing a page in another encoding, there's a good chance many characters will get messed up. With English you don't notice this so much, as many encodings have an ASCII base and you continue to see A-Z and a-z correctly.
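To make that concrete, here is a minimal Python sketch of what "viewing in another encoding" does to the bytes. The helper name `view_as` and the sample words are just made up for illustration; a real browser does more than this.

```python
def view_as(text, saved_as, viewed_as):
    """Encode text with one charset, then decode the same bytes with another."""
    raw = text.encode(saved_as)
    # errors="replace" roughly mimics a browser showing a placeholder
    # character for bytes it cannot decode.
    return raw.decode(viewed_as, errors="replace")

# Pure ASCII text survives the switch unchanged.
print(view_as("hello", "utf-8", "iso-8859-1"))

# An accented letter saved as utf-8 but viewed as iso-8859-1 turns into two characters.
print(view_as("caffè", "utf-8", "iso-8859-1"))

# The reverse mismatch produces a replacement character instead.
print(view_as("caffè", "iso-8859-1", "utf-8"))
```

The ASCII letters come through in every case; only the accented letter is damaged, which is why English-only pages rarely show the problem.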
Essentially, if you have characters outside those in the iso-8859-1 table, you need to use another encoding; there are others for pages in Russian, for example. If you have mixed-language pages, beyond the normal run of simple accented characters that French, for example, uses, then utf-8 is best.
If you use utf-8, you need to actually save those files as utf-8, and not simply as plain text files as you might with English text.
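As a rough illustration of why the saved encoding has to match the declared one, the Python sketch below writes the same page twice, once as utf-8 and once as iso-8859-1, and then reads both back the way a browser trusting the utf-8 meta tag would. The page content, file names, and helper names are assumptions for the example only.

```python
import os
import tempfile

# Hypothetical page: the meta tag promises utf-8.
PAGE = (
    '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n'
    "<p>perché</p>\n"
)

def save_page(path, html, encoding):
    """Write the page with an explicit encoding instead of the editor/OS default."""
    with open(path, "w", encoding=encoding, newline="") as f:
        f.write(html)

def load_bytes(path):
    """Read the raw bytes exactly as a web server would send them."""
    with open(path, "rb") as f:
        return f.read()

with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.html")
    bad = os.path.join(tmp, "bad.html")
    save_page(good, PAGE, "utf-8")       # matches the meta tag
    save_page(bad, PAGE, "iso-8859-1")   # does NOT match the meta tag
    good_bytes = load_bytes(good)
    bad_bytes = load_bytes(bad)

# Decode both files as utf-8, as the meta tag asks.
shown_good = good_bytes.decode("utf-8", errors="replace")
shown_bad = bad_bytes.decode("utf-8", errors="replace")

print(shown_good == PAGE)      # the matching file round-trips
print("\ufffd" in shown_bad)   # the mismatched file contains a replacement character
```

The replacement character in the second file is the kind of thing that shows up as a "?" or a box on screen.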
Is iso-8859-1 a set of 256 encoded characters?
I found a few tables on the web, but they all refer to only 256 characters. With that said, if you use "&#8364;" (which is the character reference for the euro currency symbol), is that a utf-8 encoding?
And is there any conflict if I declare charset=iso-8859-1 in the meta tag content type?
Thanks, vito
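A small Python sketch may help with these questions. It checks that iso-8859-1 has exactly 256 code positions, that the euro symbol is not among them, and that the numeric reference for Unicode 20AC is written with plain ASCII characters, so by itself it should not clash with a charset=iso-8859-1 declaration. The helper names `in_latin1` and `numeric_reference` are invented for this example.

```python
import html

# Every byte value 0..255 maps to exactly one character in ISO-8859-1.
latin1_table = bytes(range(256)).decode("iso-8859-1")

def in_latin1(ch):
    """True if the character has a code position in ISO-8859-1."""
    return ch in latin1_table

def numeric_reference(ch, hexadecimal=False):
    """Build the HTML numeric character reference for a single character."""
    code_point = ord(ch)
    return f"&#x{code_point:X};" if hexadecimal else f"&#{code_point};"

euro = "\u20ac"  # the euro sign, 20AC in the Unicode chart

print(len(latin1_table))                    # number of code positions
print(in_latin1("è"), in_latin1(euro))      # accented e is there, the euro is not
print(numeric_reference(euro))              # decimal form of 20AC
print(numeric_reference(euro, hexadecimal=True))
print(html.unescape("&#8364;") == euro)     # the reference resolves to the euro sign
```

Note that the reference is a pointer to a Unicode code point, not a utf-8 byte sequence; the reference text itself can be stored in any ASCII-compatible encoding.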
a. The euro currency symbol is 20AC in the Unicode chart; how do I code/write it in the HTML file? (Of course it is not &#20AC;.)
b. If I declare the charset as, for instance, utf-8 and then further down the HTML page I write "&#8364;" (which is the character reference for the euro symbol), what will happen? Vice versa, if I have charset=iso-8859-1 and then further down the page I use a Unicode reference, how will the character be represented?
c. I understand that the way we save the HTML files is important. I often use different computers in different locations to edit my webpages. To avoid trouble, even if I have an editor available, I always copy and paste into Notepad (because I am sure I can find it on any computer, even in an internet cafe), then save it as HTML and upload it to the web server. What is Notepad's behaviour regarding saving as utf-8?
d. I am Italian but have been living in China for 2 years. I read newspapers online, and every time I get these square boxes or question marks in place of particular accents or symbols. To give you an example: reading an Italian newspaper, I viewed the source and found that its charset is set to iso-8859-1, but being in China and viewing international websites, I had set my browser to View>Character>utf-8 (because it supposedly supports international languages). The result is trouble, so many "?", and I have to switch the browser to iso.
e. As I said, I am living in China, but my web server and website are in Italy. I edit my files here and view them from a Chinese computer. I don't see the "?" on my webpages, but is there a chance of an encoding conflict when the same page is viewed from another computer in the West?
I am very sorry for being so long, and thanks very much to whoever has the time to help.
PS: davidpbrown > I read your post on the Japanese thing, very helpful and clear, thanks a lot!
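On point c, one thing worth checking: as far as I know, older versions of Windows Notepad put a byte order mark (the three bytes EF BB BF) at the start of a file saved with its "UTF-8" option, and save as the local ANSI code page otherwise. Treat that as an assumption to verify on your own machine. The Python sketch below only simulates such a file with the `utf-8-sig` codec and shows how to detect and strip the mark; `strip_utf8_bom` is a made-up helper name.

```python
import codecs

def strip_utf8_bom(data):
    """Return the bytes without a leading UTF-8 byte order mark, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

# Simulated file contents, as an editor that writes a BOM would save them.
saved_with_bom = "perché".encode("utf-8-sig")

print(saved_with_bom.startswith(codecs.BOM_UTF8))   # the mark is present
print(repr(saved_with_bom.decode("utf-8")))         # plain utf-8 keeps it as U+FEFF
print(saved_with_bom.decode("utf-8-sig"))           # utf-8-sig drops it
print(strip_utf8_bom(saved_with_bom) == "perché".encode("utf-8"))
```

The mark is harmless for the text itself, but it is useful to know it may be there when a page behaves differently after being saved on another computer.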
"a. the euro currency symbol is 20AC in the unicode chart, how to code/write it in the html file (of course it is not &#20AC;)"
See: Euro Symbol and HTML Validation [webmasterworld.com]
Pages written in Italian are usually read by people with the right default settings to read Italian. The same goes for pages written in Chinese. So if no charset is set in the header of the file or in the server header, the default character set for that PC/browser is used. However, just like you, I sometimes have problems reading pages written in West European languages when the browser encounters smart quotes, accented characters, etc., because I live in Japan and the encoding of my browser is set for Japanese pages. These problems would not occur if those pages declared the correct charset in the header of the file.
Of course, if you DO add the charset in the header, make sure the text is saved in the right encoding.
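The order of priority described above can be sketched roughly as follows. This is a simplified model, not what any particular browser actually runs: it assumes that a manual choice in the View menu wins, then the charset in the server header, then the meta tag, and only then the browser default. All function and parameter names here are hypothetical.

```python
import re

CHARSET_VALUE = r'charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)'

def charset_from_content_type(value):
    """Pull the charset parameter out of a Content-Type style string, if any."""
    if not value:
        return None
    match = re.search(CHARSET_VALUE, value, re.IGNORECASE)
    return match.group(1).lower() if match else None

def charset_from_meta(html_bytes):
    """Look for a charset declared in a meta tag near the top of the page."""
    # The declaration itself is ASCII, so a lenient ASCII decode is enough here.
    head = html_bytes[:1024].decode("ascii", errors="replace")
    match = re.search(r"<meta[^>]+" + CHARSET_VALUE, head, re.IGNORECASE)
    return match.group(1).lower() if match else None

def pick_charset(html_bytes, http_content_type=None, user_override=None,
                 browser_default="iso-8859-1"):
    """Assumed priority: user override, server header, meta tag, browser default."""
    return (user_override
            or charset_from_content_type(http_content_type)
            or charset_from_meta(html_bytes)
            or browser_default)

page = b'<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">'

print(pick_charset(page))                                                # meta tag is used
print(pick_charset(page, http_content_type="text/html; charset=UTF-8"))  # server header wins
print(pick_charset(b"<p>ciao</p>", browser_default="shift_jis"))         # nothing declared
print(pick_charset(page, user_override="utf-8"))                         # manual View setting
```

The last two calls correspond to the two situations in this thread: a page with no declaration read on a PC set up for Japanese, and an Italian page forced to utf-8 from the View menu.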
As for Unicode codes (like the &#8364; in your example), they should work OK for every charset if the right font is installed on the visitor's PC.
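A quick way to convince yourself of this is the Python sketch below. It turns every non-ASCII character of a sample sentence into a numeric reference and then stores the result in three different charsets; the bytes come out identical and resolve back to the same text. The sample sentence and the helper name `to_ascii_markup` are assumptions for the example, and the font question is outside what code can show.

```python
import html

def to_ascii_markup(text):
    """Replace every non-ASCII character with a numeric character reference."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

text = "Prezzo: 10 €, perché no?"
markup = to_ascii_markup(text)
print(markup)

results = {}
for charset in ("iso-8859-1", "utf-8", "shift_jis"):
    page_bytes = markup.encode(charset)       # save the markup in this charset
    decoded = page_bytes.decode(charset)      # read it back in the same charset
    results[charset] = (page_bytes, html.unescape(decoded))

# Same bytes and same resolved text, whichever charset was used.
print(all(b == markup.encode("ascii") for b, _ in results.values()))
print(all(resolved == text for _, resolved in results.values()))
```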