Robert_Charlton - 2:37 am on Apr 9, 2012 (gmt 0)
souffle (or should it be soufflé?)... to touch on several of your questions, and probably do only a very incomplete job of it....
How does Google treat Unicode Languages in its Algo?
I remember posting some while back that Caffeine supported Unicode. The thread (from Nov 2010), which I found via site search here, only touches on what you're asking about, but it does suggest how difficult it may be to find information...
Matt Cutts' Answer about Special Characters: "I Don't Know"
To find more about Google and Unicode support, I tried this search (on Google)....
None of the articles returned by the search get into the nitty-gritty of how Unicode works in the Google algorithm and in mixed-language environments, but here are the two most recent, which do provide partial answers to some of your questions about what searches can find via Google....
Unicode nearing 50% of the web
Official Google Blog
January 28, 2010
...Unicode is growing both in usage and in character coverage. We recently upgraded to the latest version of Unicode, version 5.2 [unicode.org] ...We're constantly improving our handling of existing characters... after extensive testing, we just recently turned on support for these and thousands of other characters; your searches will now also find these documents....
Unicode over 60 percent of the web
Official Google Blog
February 3, 2012
...We’ve long used Unicode as the internal format for all the text Google searches and process: any other encoding is first converted to Unicode. Version 6.1 just released with over 110,000 characters; soon we’ll be updating to that version and to Unicode’s locale data from CLDR 21 [cldr.unicode.org]....
As you can see, just getting the standards for the infrastructure in place is a long process. The fact that Google uses the word "internal" in the above paragraph suggests that not everything has yet been carried through into the search algorithm or interface.
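To make the "any other encoding is first converted to Unicode" idea a bit more concrete, here's a minimal Python sketch. It only illustrates the general technique and is not Google's actual code. The function names to_unicode and fold_accents are my own, and the accent-folding step is purely an assumption about one way a system could treat souffle and soufflé as related; the blog posts don't describe it.

```python
import unicodedata


def to_unicode(raw: bytes, declared_encoding: str) -> str:
    """Decode bytes from a declared encoding, then normalize to NFC.

    NFC makes a precomposed e-acute and an 'e' followed by a combining
    acute accent compare as the same string.
    """
    text = raw.decode(declared_encoding, errors="replace")
    return unicodedata.normalize("NFC", text)


def fold_accents(text: str) -> str:
    """Hypothetical helper: strip combining marks so accented and
    unaccented spellings match."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# The same word arriving in three different byte forms.
from_latin1 = to_unicode(b"souffl\xe9", "latin-1")
from_utf8 = to_unicode("souffl\u00e9".encode("utf-8"), "utf-8")
from_decomposed = to_unicode("souffle\u0301".encode("utf-8"), "utf-8")

# All three end up as the identical Unicode string.
print(from_latin1 == from_utf8 == from_decomposed)

# Folding the accent gives the plain ASCII spelling.
print(fold_accents(from_utf8))
```

Whether a search engine should fold accents at all depends on the language: in some languages the accented and unaccented forms are different words, which is part of why this area is hard.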
You ask several questions about mixed languages, which is probably the most difficult area that Google encounters. Increasingly, the algorithm depends on context, and when the languages are mixed, the proper determination of context becomes far more complex.
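As a rough way to check how mixed a page actually is, here's a small Python sketch that counts letters by script. This is my own illustration and has nothing to do with how Google does it. It leans on the first word of each character's Unicode name as a stand-in for its script, which is a shortcut and an assumption, and script_counts is a made-up name. Note that script is not the same as language: English and French are both Latin script, so this only catches the more obvious mixtures.

```python
import unicodedata
from collections import Counter


def script_counts(text: str) -> Counter:
    """Count letters per script, guessing the script from the first
    word of the Unicode character name (e.g. 'LATIN', 'CYRILLIC')."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, whitespace
        name = unicodedata.name(ch, "")
        if name:
            counts[name.split()[0]] += 1
    return counts


# "Hello" in Latin script followed by a Cyrillic word.
sample = "Hello \u043c\u0438\u0440"
print(script_counts(sample))
```

A page where one script clearly dominates is probably easier for any language detector than one split evenly between two.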
When I last reviewed it, the suggestion was to avoid mixing languages on a page if possible. Here's a discussion on a mixed language problem that ultimately brought together lots of issues....
Translate problem in Google SERP - not always ranking right language
If you're a Supporter, I highly recommend checking out the link at the end of the above discussion to the "SEO for multi-language sites" thread, which is in the Supporters section. It's one of the more complete discussions on the topic we've had in WebmasterWorld. Site search should lead you to some other threads as well.
The treatment of language in search is a very broad topic. Location, hosting, linking, and TLDs related to language issues are another subset of the question.
If you can focus on an immediate issue, please advise, as your question will then become much easier to answer. That is the nature of search and of language. ;)