lucy24 - 8:15 pm on Feb 16, 2012 (gmt 0)
Uh-oh, that didn't quite work. Mouse-hovering on your examples shows four identical sets of "menü", so something is getting re-encoded in transit :)
The remainder of this thread will come through as garbage if your browser is not set to UTF-8. If it is set to UTF-8, the first three posts will say "men�" in place of the intended "menü". Ajurnaqtuq, c'est la vie, et cetera.
The first three links come through as %FC in the, er, menu bar, while the page title has "menü". The fourth link says "menü" in the menu bar while the page title says "menãœ". This is definitely not UTF-8 being reinterpreted as 8859-1. (I know this without looking it up because the œ character is not in Latin-1.)
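For anyone who wants to reproduce the %FC business, here is a minimal Python sketch using the standard library's urllib.parse. The variable names are mine, and it only illustrates how one word percent-encodes under two charsets. It says nothing about what the translation site actually runs.

```python
from urllib.parse import quote, unquote

word = "menü"

# The same word percent-encoded under two different charsets.
latin1_url = quote(word, encoding="latin-1")  # one byte:  FC
utf8_url = quote(word, encoding="utf-8")      # two bytes: C3 BC

print(latin1_url)  # men%FC
print(utf8_url)    # men%C3%BC

# unquote() assumes UTF-8 by default and substitutes U+FFFD for
# bytes it cannot decode, which is where the "men�" look comes from.
as_utf8 = unquote(latin1_url)
as_latin1 = unquote(latin1_url, encoding="latin-1")

print(as_utf8)
print(as_latin1)
```

A lone FC byte is never valid UTF-8, so a UTF-8 reader can only show a replacement character for it.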
Further investigation of page source, plus detour to IANA [iana.org], tells me that all four are encoded as ISO-8859-15 (described here [iana.org]) (!) alias Latin-9 (double !).
The key difference (I'm quoting) is:
BC CAPITAL LIGATURE OE U0152
BD SMALL LIGATURE OE U0153
Does that BC sound vaguely familiar? It should: it is the second byte of "ü" in UTF-8 (C3 BC).
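If you would rather check the table than take my word for it, this short Python sketch decodes the two bytes under both charsets. The codec names are Python's own; the mapping dictionary is just something I made up for the illustration.

```python
# Decode bytes BC and BD under Latin-1 and Latin-9 and compare.
mapping = {}
for byte in (0xBC, 0xBD):
    raw = bytes([byte])
    latin1 = raw.decode("iso-8859-1")
    latin9 = raw.decode("iso-8859-15")
    mapping[byte] = (latin1, latin9)
    print(f"{byte:02X}  Latin-1: U+{ord(latin1):04X}  Latin-9: U+{ord(latin9):04X}")

# The UTF-8 form of u-umlaut, for comparison: two bytes, C3 then BC.
u_umlaut_utf8 = "ü".encode("utf-8")
print(u_umlaut_utf8.hex(" ").upper())
```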
But wait! If you put the word "menü" into UTF-8 and reinterpret as Latin-9, you do not get "menãœ". You get "menÃŒ", with capital letters. In fact you can't get to lower-case "menãœ" from UTF-8 at all (in Latin-9 those two characters are the bytes E3 BD, which is not a complete UTF-8 sequence), so the page must be running a script to put everything into lower case, even if it is lower case garbage.
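The chain of reasoning above can be replayed in a few lines of Python. This is only a sketch of my guess at what happens on the site's end (decode as Latin-9, then lower-case). The variable names are hypothetical and I have no idea what their script really looks like.

```python
utf8_bytes = "menü".encode("utf-8")  # 6D 65 6E C3 BC

# The same bytes misread under each single-byte charset.
as_latin1 = utf8_bytes.decode("iso-8859-1")
as_latin9 = utf8_bytes.decode("iso-8859-15")

# Assumed extra step: the page lower-cases the already-garbled text.
lowered = as_latin9.lower()

print(as_latin1)
print(as_latin9)
print(lowered)

# Going backwards: the Latin-9 bytes of the lower-case garbage
# are not valid UTF-8, so no UTF-8 text could have produced it directly.
try:
    lowered.encode("iso-8859-15").decode("utf-8")
    reachable = True
except UnicodeDecodeError:
    reachable = False

print(reachable)  # False
```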
In my previous post I thought I was just being snippy when I said it's the translation site's problem. But if they are going to go around encoding themselves in 8859-15, it really is their problem. (Here I detoured to another browser to make sure it wasn't just reading an existing 8859-15 cookie.) If the site used 8859-1 you could probably deal with it, but 8859-15 is simply not going to work.
So we'd better backtrack to the original problem and find a different solution. Does your own site have features that require you to send non-ASCII text over the Internet to a translation source? You'll need to find a site that either uses UTF-8 encoding, or accepts information about the source file's encoding so it can do the conversion at its end.
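To make the second option concrete: "accepting information about the source encoding" boils down to something like the Python sketch below. The function name to_utf8 is made up for illustration and is not any particular site's API. It simply assumes the sender states the encoding truthfully.

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes from a declared source encoding into UTF-8."""
    # Raises UnicodeDecodeError if the declared encoding is wrong
    # for these bytes, rather than silently producing garbage.
    return raw.decode(source_encoding).encode("utf-8")


# Text as a Latin-9 page would send it: u-umlaut is the single byte FC.
latin9_bytes = "menü".encode("iso-8859-15")

converted = to_utf8(latin9_bytes, "iso-8859-15")
print(converted.hex(" ").upper())  # 6D 65 6E C3 BC
```

The catch is that single-byte charsets like Latin-1 and Latin-9 will decode any byte sequence without complaint, so a wrong declaration in that direction goes unnoticed. The declaration has to be right.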