Is it time to move to UTF-8 now?

Forum Moderators: coopster & phranque

Any info about agent/server/editor/language support for UTF-8?

Xuefer

8:39 am on Nov 20, 2002 (gmt 0)

How well is UTF-8 currently supported by agents, servers, editors, and languages?

agents:
IE, from which version?
And how about Mozilla/Opera?

servers: how well does MySQL support UTF-8 now?

editors: your favorite HTML/code editors, both WYSIWYG and text editors; how well do they support UTF-8?

languages: PHP with mbstring? libiconv? mbregex?

Brett_Tabke

6:56 pm on Nov 21, 2002 (gmt 0)

The browsers are ready. I don't think the tools, the servers, or the world are ready, though. I've looked at it again and again in an attempt to move to it right here. There are just too many compatibility problems still.

Trisha

8:09 pm on Nov 21, 2002 (gmt 0)

Brett - in what ways specifically is "the tools, servers, or world" not ready yet?

I know my version of Homesite doesn't like it, but that doesn't stop me from using it. And so far I haven't seen any problems with MySQL, but I'm far from being an expert.

Xuefer

3:20 am on Nov 23, 2002 (gmt 0)

"not an expert"
That's the same reason I'm asking you guys :)
I searched Google and found some info about it, but nothing centralized or comprehensive.
There are many technical articles about "what is UTF-8", but few about "how well is UTF-8 supported now".

Trisha

8:05 pm on Nov 23, 2002 (gmt 0)

I had always just figured it was well supported, until I saw Brett's post. So how do we get him back here to expand on what he said?

Xuefer

3:09 pm on Nov 25, 2002 (gmt 0)

Detailed info, detailed info~
That's what I asked for~~ thx :)
If not all tools/servers are ready, then which of them are?
Maybe I can avoid using the ones that aren't.

chris_f

3:18 pm on Nov 25, 2002 (gmt 0)

I've been using it for over a year on a very large-scale e-commerce site. I've only ever had one problem, which was quickly resolved by the client changing their browser encoding.

Chris.

Brett_Tabke

6:02 am on Nov 29, 2002 (gmt 0)

Sorry for the belated reply here.

This Usenet Thread [groups.google.com] is an interesting discussion on the topic. That is a serious problem that becomes multiplied in an interactive environment.

What I have decided about the whole thing is that UTF-8 opens more problems than it fixes at this point. If you do not have an interactive site such as a chat room or some type of forum, I would be wary of using it at this point. Default language problems, character sets, forms, and browser support make it a mess.

In order to even begin to make sense of it, I feel it would take a month to even get close to feeling confident about using it on a site full time. The problem is that once you got confident that it was working the way you thought it should, there's a very high probability that you are wrong. What the random user was seeing in the browser would not be what you'd intended and you'd have no way of knowing. Why risk it?

I tried for several weeks to implement UTF-8 here. The problem is with older browsers and editors. The number of browsers returning high-ASCII characters (as single 8-bit bytes) when UTF-8 was requested was astounding. The problem is in deciding just what the user's browser was intending to send back.
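To make the "what did the browser intend to send" problem concrete, here is a minimal Python sketch of the usual server-side heuristic: try to decode the submitted bytes as UTF-8 and, if that fails, fall back to a legacy encoding. The function name `decode_form_bytes` is hypothetical, and the choice of ISO-8859-1 as the fallback is an assumption; a real site would have to pick the fallback that matches its audience.

```python
from typing import Tuple


def decode_form_bytes(raw: bytes) -> Tuple[str, str]:
    """Guess the encoding of raw form bytes (hypothetical helper).

    Returns the decoded text and the encoding that was assumed.
    """
    try:
        # Strict decoding: bytes that are not well-formed UTF-8 raise an error.
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Assumption: the client ignored the requested charset and sent
        # ISO-8859-1. Every byte value is valid there, so this never fails.
        return raw.decode("iso-8859-1"), "iso-8859-1"


# The same word, as sent by a UTF-8 browser and by a legacy browser.
print(decode_form_bytes(b"caf\xc3\xa9"))  # ('café', 'utf-8')
print(decode_form_bytes(b"caf\xe9"))      # ('café', 'iso-8859-1')
```

This is only a guess, which is exactly the difficulty: pure ASCII input decodes identically either way, and some legacy byte sequences happen to be valid UTF-8, so a successful decode does not prove what the user meant.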

Here is an excellent article on the complexities of dealing with UTF-8 forms:
[ppeph.gla.ac.uk...]

According to the HTML4.01 specification, the only characters that you are entitled to rely on in this situation are those of us-ascii, i.e the 7-bit repertoire.
Realistically, however, browsers and other client agents do not enforce this restriction, and will typically handle characters outside of that repertoire by applying the same %xx hex coding that they apply to unsafe characters of the us-ascii repertoire. But this is not unproblematical, as we will see. Nevertheless, as an author, this isn't under your control: readers can and will submit extended characters - there's nothing you can do to stop them - so your server-side scripts need to be able to do something with them.

That's for GET, but the same question remains even with POST: what do you do with them? What data did the user really intend to send?
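As a rough illustration of the %xx coding described in the quoted article, the following Python sketch uses the standard library's `urllib.parse` to show how one word ends up as different bytes on the wire depending on the encoding the browser chose. The sample word and variable names are made up for the example.

```python
from urllib.parse import quote, unquote

word = "caf\u00e9"  # "café", one character outside us-ascii

# The same form value, percent-encoded under two different charsets.
as_utf8 = quote(word, encoding="utf-8")
as_latin1 = quote(word, encoding="iso-8859-1")
print(as_utf8)    # caf%C3%A9
print(as_latin1)  # caf%E9

# A server that guesses the wrong charset silently gets different text.
wrong_guess = unquote(as_utf8, encoding="iso-8859-1")
print(wrong_guess)  # cafÃ©
```

Nothing in the query string itself says which of the two encodings was used, so the server-side script has to know or guess.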

Throw that question into an ecommerce equation where credit card or personal data is requested. Hello!?

Whether that stems from your editor interpreting a page as non-backward compatible (and forcing an encoding), an older browser balking at your charset choice, or forms that return data in a different encoding than the one you sent, it's still a mess.

So, until the fog clears a bit more and we have tools for working with Unicode that don't require a degree in languages, the safest thing to do is stick with pure ASCII and pages with no charset specified.

There comes a time to learn a technology and a time to wait. Remember in '98 when all the W3C guys were running around screaming their fool heads off that CSS would take over the web in a year's time? It's only marginally worth learning even today.

I think the same is true for Unicode. If we started on a long discussion and help thread about Unicode right now, it would be a running conversation for the next several years.

Trisha

11:26 pm on Nov 29, 2002 (gmt 0)

Thanks for stopping back by here again Brett! In my case, I'm using it for sites that have relatively little if any interactivity, at least so far. And no shopping carts or credit card submissions.

But I agree it is a mess! I spent quite a bit of time trying to understand the whole thing, and it's still pretty confusing.

Any idea when the major problems will be worked out?

Chris - What was that one problem that you had?