lucy24 - 8:32 am on Oct 22, 2012 (gmt 0)
Some web editing programs create these attributes automatically, and therefore they aren't very reliable when trying to determine the language of a webpage.
Yeah: they put in lang="en". So if it says lang="something else", shouldn't that be taken as a pretty strong indicator that the page is in some other language?
All the more so when you've got lang attributes on small discrete sections of the content. I've grumbled elsewhere about g###'s translation of the single line "grazie a tutti" into Italian-- happily ignoring the lang="it" attribute and therefore making, let us not put too fine a point upon it, utter fools of themselves.
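For anyone who wants to see what those attributes look like to a program, here is a rough sketch in Python using only the standard library's html.parser. It walks a page and records which lang value is in effect for each run of text, letting inner elements override the page-level default. The class name LangCollector and the sample markup are made up for illustration; this is not a claim about how any search engine actually reads the attribute.

```python
from html.parser import HTMLParser


class LangCollector(HTMLParser):
    """Hypothetical helper: pair each run of text with the lang in effect."""

    def __init__(self):
        super().__init__()
        self.stack = []   # (tag, effective lang) for each open element
        self.found = []   # (lang, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        declared = dict(attrs).get("lang")
        # An element without its own lang inherits from its parent.
        inherited = self.stack[-1][1] if self.stack else None
        self.stack.append((tag, declared or inherited))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            lang = self.stack[-1][1] if self.stack else None
            self.found.append((lang, text))


# Made-up sample: an English page with one Italian line marked up as such.
sample = (
    '<html lang="en"><body>'
    '<p>Thanks to everyone.</p>'
    '<p lang="it">grazie a tutti</p>'
    '</body></html>'
)

collector = LangCollector()
collector.feed(sample)
collector.close()
runs = collector.found

for lang, text in runs:
    print(lang, "|", text)
```

The point of the sketch is only that the Italian line arrives already labelled "it", separately from the page default, so the information is there for any tool that chooses to look at it.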
Not long ago, I found a log entry telling me that Google had attempted to translate a particular page into Italian. Problem is, the page in question is already in Italian.
Sample. I assure you I am not making this up.
"Original English [sic] Text":*
Questa pagina ha sempre avuto un insolito numero di visitatori provenienti dall'Italia.
This Pagina Semper ha avuto delle Nazioni Unite insolito Numero di Visitatori provenienti DALL'ITALIA.
Clearly Google's definition of "obvious" is different from yours and mine.
* Created by a multi-stage process: Run the English text through :: cough-cough :: Google translate. Go over it myself and fix the blatant errors. Find a kind Italian to fix my fixes. (Being several thousand miles away, I could not hear her laughing hysterically.) Check some obscure technical terms and run it past the Italian again.