lucy24 - 1:21 pm on Nov 1, 2012 (gmt 0)
I didn't understand how the parts of this line fit together:
the various language encodings (and this is a very complex subject even for UTF8 let alone UTF16 and Non latin languages)
Are you talking about rendering (an action that happens within the browser, text editor or equivalent) as distinct from file encoding (a means of storing data)? I thought each Chinese character was a typographic island. A far cry from Semitic languages, where you have both position-based variant forms and diacritics combined on the fly. I don't have to deal with this much, as most scripts I use are precombined-- same as European languages. But I do remember when my browser learned how to write Devanagari. The Mac still can't do Bengali, though; maybe it's in some later cat. (I'm in 10.6 and really don't want to change.)
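That "precombined vs. combined on the fly" distinction shows up in Unicode itself as precomposed characters versus base-plus-combining-mark sequences. A small Python sketch (my example, not anything from this thread) shows the two forms of the same accented letter and how normalization converts between them:

```python
import unicodedata

# "e with acute": precomposed single code point vs. base letter + combining mark
precomposed = "\u00e9"      # é as one code point
decomposed = "e\u0301"      # e followed by COMBINING ACUTE ACCENT

# They render identically but are different code point sequences
assert precomposed != decomposed

# NFC normalization precombines; NFD decomposes
assert unicodedata.normalize("NFC", decomposed) == precomposed
assert unicodedata.normalize("NFD", precomposed) == decomposed
```

Scripts like Devanagari lean heavily on the decomposed/combining model, which is part of why renderers (and search engines) took longer to handle them well.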
Google seems to be adding one language at a time. I can remember when searching in polytonic Greek took a flying leap upward and suddenly became very good. And they must be able to do Arabic and Japanese, because I've met both in logs.* But it is still impossible to search in any language that uses Devanagari or UCAS. I can understand about UCAS because it's such a small linguistic community. But Devanagari is used by a huge chunk of the world's population, and I've never seen any evidence of an Indian search engine picking up the slack. In the meantime you'd think g### would at least ask someone if their present defaults are really the best approach.
* My log wrangling includes a few lines to decode percent-encoded scripts. Turns out you have to use different functions for ASCII and non-ASCII.
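The ASCII/non-ASCII split comes from how percent-encoding works: an ASCII character is one %XX escape, but a non-ASCII character is a run of several %XX bytes (its UTF-8 encoding) that must be decoded together. A hedged Python sketch (the original poster's tool and functions are unknown; this just illustrates the pitfall):

```python
from urllib.parse import unquote

# An ASCII escape is one byte, so byte-at-a-time decoding happens to work:
assert unquote("%2Fsearch") == "/search"

# A non-ASCII character spans several %XX bytes; decoding each byte
# separately as a character produces mojibake instead of the character.
encoded = "%E4%B8%AD%E6%96%87"   # "中文" percent-encoded as UTF-8
naive = "".join(chr(int(h, 16)) for h in encoded.lstrip("%").split("%"))
assert naive != "中文"            # each byte became its own Latin-1 character

# unquote collects the whole byte run and decodes it as UTF-8:
assert unquote(encoded) == "中文"
```

So a decoder written for ASCII escapes alone will silently garble any non-Latin query string in the logs, which is why two different decoding paths end up being necessary.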