lucy24 - 10:07 pm on Apr 7, 2012 (gmt 0)
Ooh, my favorite question ;) Or, ahem, eight questions.
My own site includes material in a language written in a non-Roman script which google does not know. That is, the script is unique to the language, so it isn't a case of seeing Urdu and mistaking it for Arabic. Here's what I can say from direct personal experience. It's pretty basic, but it's a start.
google's keyword list includes words in languages google does not know, but only as exact matches. If you have variant forms like "cats, catty, cattish, cattery, cat's" etc., each one will be listed separately.* If you have the identical word in Roman and non-Roman script, those too will be listed separately. I don't know how this works if google does know the language.
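To make the "exact matches only" point concrete, here is a toy Python sketch of a keyword list that counts every token exactly as written, with no stemming. This is only an illustration of the behavior described above, not how google actually builds its list; the function name keyword_counts and the sample words (including the Greek/Latin pair "καλημέρα"/"kalimera") are my own hypothetical choices.

```python
import re
from collections import Counter


def keyword_counts(text: str) -> Counter:
    """Toy keyword list: count each distinct token exactly as written.

    Hypothetical illustration only. There is no stemming or lemmatization,
    so "cats", "catty" and "cat's" each become a separate key.
    """
    # [\w']+ keeps an apostrophe inside a token ("cat's"); in Python 3,
    # \w is Unicode-aware, so words in non-Roman scripts are tokenized too.
    tokens = re.findall(r"[\w']+", text.lower())
    return Counter(tokens)


variants = keyword_counts("cats catty cattish cattery cat's cats")
print(variants)       # five separate keys; "cats" is counted twice
print(len(variants))  # 5

# The "same" word in Roman and Greek script is two unrelated strings.
print(len(keyword_counts("kalimera καλημέρα")))  # 2
```

Any grouping of variants would have to come from an explicit stemming or transliteration step; without one, every spelling is its own keyword.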
google search will similarly only bring up exact matches. I don't simply mean that it won't offer synonyms. I mean that it won't offer fragments: if you search for the equivalent of cdefg, search will not include results for abcdefg or cdefghijkl. This is a serious problem if the language in question is inflected, compounding, or agglutinative (or some combination of these), because then you almost never get exact matches except for a handful of common words that stand on their own.
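The cdefg example can be sketched as the difference between whole-token matching and substring matching. This is a hypothetical toy in Python, not google's real matching logic; the function names and the three one-word "documents" are illustrative assumptions.

```python
import re


def tokens(text: str) -> list[str]:
    """Split text into lowercase word tokens (Unicode-aware \\w)."""
    return re.findall(r"\w+", text.lower())


def exact_match_search(query: str, docs: list[str]) -> list[int]:
    """Return indexes of docs containing the query as a whole token."""
    q = query.lower()
    return [i for i, doc in enumerate(docs) if q in tokens(doc)]


def substring_search(query: str, docs: list[str]) -> list[int]:
    """Return indexes of docs where the query appears inside any token."""
    q = query.lower()
    return [i for i, doc in enumerate(docs) if any(q in t for t in tokens(doc))]


docs = ["abcdefg", "cdefghijkl", "cdefg"]
print(exact_match_search("cdefg", docs))  # [2]: only the identical word
print(substring_search("cdefg", docs))    # [0, 1, 2]: fragments match too
```

In an agglutinative language, a stem plays the role of cdefg and most real occurrences look like abcdefg or cdefghijkl, so exact matching alone finds very little.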
I don't know how keyword lists work for languages in non-Roman scripts that google does know. I do know that Search can be excellent. I remember when it became possible to search in polytonic Greek, because it was a night-and-day change: from useless to really good.
* Granted, English isn't 100% either. I've got strings like "present, presented, presence, presents" or "states, state, stated, stately". But they're pretty darn close.