Forum Moderators: coopster

UTF-8 and scripting language

how do you implement

         

henry0

10:57 am on Oct 27, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



In light of a recent thread [webmasterworld.com],
I am wondering which encoding you have chosen
and how you implement it.

I only have sites in US English,
and I use
charset=iso-8859-1
From what I have read,
the iso-8859-1 charset is well standardized and well suited to HTML, so might it be the best choice for websites written in English?

Sekka

3:05 pm on Oct 27, 2008 (gmt 0)

10+ Year Member



Maybe I'm jumping on a bandwagon here, but as far as I am aware there are no detrimental effects to using UTF-8.

It is the standard encoding for a lot of languages, and come PHP 6 (maybe 5.3?), UTF-8 will be the default encoding for that too. This in itself should be a testament to its use. It simply expands character compatibility and makes your application a bit safer in case you do go multilingual.

I personally use UTF-8 and have for years. I have no problems with it either.
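
As a minimal sketch of why switching costs nothing for English text, assuming only core PHP: plain ASCII is byte-for-byte valid UTF-8, while an accented character is where the two encodings diverge. The helper name isValidUtf8() is made up for this illustration.

```php
<?php
// Plain ASCII text, typical of an English-only page.
$ascii = "Hello, world";

// "é" (U+00E9) spelled out byte by byte, so the result does not
// depend on how this file itself is saved.
$eUtf8   = "\xC3\xA9"; // UTF-8: two bytes
$eLatin1 = "\xE9";     // ISO-8859-1: one byte

// Hypothetical helper: preg_match() with the /u modifier returns 1 for
// valid UTF-8 input and false when the subject is not valid UTF-8.
function isValidUtf8($bytes) {
    return preg_match('//u', $bytes) === 1;
}

var_dump(isValidUtf8($ascii));   // ASCII is already valid UTF-8
var_dump(isValidUtf8($eUtf8));   // the two-byte sequence is valid
var_dump(isValidUtf8($eLatin1)); // a lone 0xE9 byte is not valid UTF-8
echo strlen($eUtf8), " ", strlen($eLatin1), "\n"; // byte lengths
```

So an English-only page is the same bytes either way; the difference only shows up once non-ASCII characters appear.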

henry0

4:46 pm on Oct 27, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You did it :)
Now I am confused
For English sites
using HTML, PHP and MySQL,
what is the correct choice?

Sekka

10:33 pm on Oct 27, 2008 (gmt 0)

10+ Year Member



Bottom line, I use UTF-8 encoding for HTML, PHP and MySQL.

My PHP files are all encoded in UTF-8. Use a text editor that lets you alter file encoding, e.g. Notepad++.

Their output is sent in UTF-8 by using a header,

header('Content-type: text/html; charset=UTF-8');

And using htmlentities() with UTF-8 encoding,

function htmlentitiesutf8($text) {
    return htmlentities($text, ENT_QUOTES, "UTF-8", false);
}
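
Put together, the header and the helper might be used roughly like this. It is only a sketch: the sample input string is invented for illustration, and the headers_sent() guard is an addition that is not part of the snippets above.

```php
<?php
// Announce UTF-8 before any output. The guard (an addition for this
// sketch) avoids a "headers already sent" warning if output has started.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

// Same helper as above: escape for HTML, treating the input as UTF-8.
// The final "false" tells htmlentities() not to re-encode existing entities.
function htmlentitiesutf8($text) {
    return htmlentities($text, ENT_QUOTES, "UTF-8", false);
}

// Hypothetical user input; "é" is written as its UTF-8 bytes so the
// example does not depend on how this file is saved.
$input = "caf\xC3\xA9 &amp; <b>O'Neil</b>";
$safe  = htmlentitiesutf8($input);

echo $safe, "\n";
```

The markup characters and the single quote come out escaped, the accented letter becomes a named entity, and the existing &amp; is left alone rather than double-encoded.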

As for MySQL, all my tables use the utf8 character set with the utf8_unicode_ci collation.

coopster

11:31 pm on Oct 27, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



UTF-8 will work fine for English sites and gives you the opportunity to expand in the future if you decide to go multilingual.

On the MySQL side, when creating tables and storing information, set the charset to utf8; your collation can be either utf8_general_ci or utf8_unicode_ci. There is a difference between the two collations, along with a tradeoff [dev.mysql.com]: utf8_general_ci is faster, utf8_unicode_ci is more accurate. For what it's worth, the open source package Moodle implements utf8_unicode_ci, if I remember correctly. I'm not certain about other packages like WordPress, Joomla, etc.; I would have to pull the code. That may give you an idea of what mass-implementation packages are using.
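
A rough sketch of what that looks like as a table definition, generated from PHP so the collation choice is explicit. The `articles` table, its columns, and the buildCreateTableSql() helper are all hypothetical; only the charset and collation names come from this thread. Newer MySQL versions also offer utf8mb4, which is not covered here.

```php
<?php
// Collations discussed in this thread; anything else is rejected.
const ALLOWED_COLLATIONS = ['utf8_general_ci', 'utf8_unicode_ci'];

// Hypothetical helper: build a CREATE TABLE statement for an
// illustrative `articles` table with the utf8 charset and the
// requested collation.
function buildCreateTableSql($collation) {
    if (!in_array($collation, ALLOWED_COLLATIONS, true)) {
        throw new InvalidArgumentException("Unsupported collation: $collation");
    }
    return "CREATE TABLE articles (\n"
         . "  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,\n"
         . "  title VARCHAR(255) NOT NULL\n"
         . ") DEFAULT CHARACTER SET utf8 COLLATE $collation";
}

// Print the statement for the "more accurate" collation.
echo buildCreateTableSql('utf8_unicode_ci'), ";\n";
```

Swapping in 'utf8_general_ci' gives the faster variant; the rest of the statement stays the same.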

The one thing to keep in mind when developing pages for utf8 is the entity encoding, as Sekka has mentioned.

Here is a link to a thread whose very last post has quite a few links on this topic: Unicode Support [webmasterworld.com]

henry0

11:09 am on Oct 28, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thank you for the details.
I use UltraEdit, and by default the UE status bar shows the file type as DOS.
I have always used it as is, and I must admit that I never paid attention to the encoding;
if this is so important, it should be made a priority when starting to learn about scripting and files.

But the real point (in my case) is that DOS now adds to my confusion.

Where do I go from here?
Needless to say, I have never, on any server, run into a problem linked to my encoding type.

I am bookmarking this thread. Please do not think that I am being rude by not responding; I am leaving the States right now for two weeks.