Uh-oh, that didn't quite work. Mouse-hovering on your examples shows four identical sets of "menü", so something is getting re-encoded in transit :) Advance warning:
The remainder of this thread will come through as garbage if your browser is not set to UTF-8. If it is
set to UTF-8, the first three posts will say "menï¿½" in place of the intended "menĂ¼". Ajurnaqtuq, c'est la vie
, et cetera.
The first three links come through as %FC in the, er, menu bar, while the page title has "menü". The fourth link says "menü" in the menu bar while the page title says "menãœ". This is definitely not UTF-8 being reinterpreted as 8859-1. (I know this without looking it up because the œ character is not in Latin-1.)
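For what it's worth, the %FC is just the single byte that ü occupies in Latin-1 and Latin-9, percent-encoded. Here is a quick Python sketch of how the same word escapes under each encoding. The variable names are mine and nothing here comes from the site in question; it only illustrates the byte values.

```python
from urllib.parse import quote

word = "menü"

# ü is the single byte 0xFC in both ISO-8859-1 and ISO-8859-15.
latin1_escaped = quote(word, encoding="iso-8859-1")
latin9_escaped = quote(word, encoding="iso-8859-15")

# In UTF-8 the same letter takes two bytes, 0xC3 0xBC.
utf8_escaped = quote(word, encoding="utf-8")

print(latin1_escaped, latin9_escaped, utf8_escaped)
```

So a %FC in the address bar tells you the link was built from a single-byte encoding, but not which one, since Latin-1 and Latin-9 agree at that position.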
Further investigation of page source, plus a detour to IANA [iana.org], tells me that all four are encoded as ISO-8859-15 (described here [iana.org]) (!) alias Latin-9 (double !).
The key difference (I'm quoting) is:
BC CAPITAL LIGATURE OE U0152
BD SMALL LIGATURE OE U0153
Does that BC sound vaguely familiar? It should.
But wait! If you put the word "menü" into UTF-8 and reinterpret it as Latin-9, you do not get "menãœ". You get "menÃŒ", with capital letters. In fact you can't get to lower-case "menãœ" from UTF-8 at all, so the page must be running a script to put everything into lower case, even if it is lower-case garbage.
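If you want to check this at home, the round trip can be sketched in a few lines of Python. This is only my reconstruction of what the page appears to be doing (encode as UTF-8, misread as Latin-9, then lower-case); I have no idea what their script actually looks like, and the helper name is_valid_utf8 is made up for the example.

```python
word = "menü"
utf8_bytes = word.encode("utf-8")              # b'men\xc3\xbc'

# Misread the two UTF-8 bytes one at a time in each single-byte encoding.
as_latin1 = utf8_bytes.decode("iso-8859-1")    # byte 0xBC is the fraction 1/4 sign here
as_latin9 = utf8_bytes.decode("iso-8859-15")   # byte 0xBC is capital ligature OE here

# Assumed extra step: something on the page lower-cases the garbage.
lowered = as_latin9.lower()


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Going backwards: the lower-case garbage as Latin-9 bytes is 0xE3 0xBD,
# which is a truncated three-byte sequence, not valid UTF-8.
lowered_bytes = lowered.encode("iso-8859-15")

print(as_latin1, as_latin9, lowered, is_valid_utf8(lowered_bytes))
```

The last check is the point: the lower-case version cannot have come straight from any UTF-8 text, so the case change has to happen after the misreading.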
In my previous post I thought I was just being snippy when I said it's the translation site's problem. But if they are going to go around encoding themselves in 8859-15, it really is their problem. (Here I detoured to another browser to make sure it wasn't just reading an existing 8859-15 cookie.) If the site used 8859-1 you could probably deal with it, but 8859-15 is simply not going to work.
So we'd better backtrack to the original problem and find a different solution. Does your own site have features that require you to send non-ASCII text over the Internet to a translation source? You'll need to find a site that either
uses UTF-8 encoding, or
accepts information about the source file's encoding so it can do the conversion at its end.