alt131 - 7:50 pm on Jun 19, 2012 (gmt 0)
Me too, and I found that still specified somewhere ... but I can't find it now :)
I thought you were supposed to use two-letter codes by preference.
This was such an interesting issue for me. Our language is spoken only within our national boundaries. There are definite iwi (tribal) variations, but they are so easily understood we've been able to create an accepted official "national" language. To the point that many of the names/spellings/pronunciations in the south are northern because original translators and map makers never bothered to ask anyone in the south for the correct word. Never irritate a southerner then ask for driving directions. They'll be 100% accurate, but you can guarantee they won't match the road signs ;) So in the end, few off-shore people care, too few people are directly affected and an official national language means we ignore variations.
So this was a great opportunity to "catch up" with the detail and developments. Best I can tell the Library of Congress is using part 2 of ISO 639 (ISO 639-2), which has since been extended by part 3 (ISO 639-3). Not forgetting that language tag syntax is now consolidated into BCP (Best Current Practice) 47, and the latest RFC (Request for Comments) is 5646. I think that reflects the recent explosion of languages seeking recognition, and accommodating the numbers seems to have required a move to 3 letters. I was reading examples of 40+ letters to represent the micro/macro/regional/country/source/etc components.
Anyways, although the IANA keeps the registry, the short way is through an officially recognised (private) lookup tool [rishida.net...] that alerted me to the change. Each entry is also linked to a helpful Ethnologue page that provides more information. In summary, my reading is to use the 3-letter codes where they exist, including the more specific regional variations, unless the older 2-letter macro-language code is required for compatibility with legacy applications.
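For anyone who wants to see how those components fit together, here is a rough Python sketch that splits a simple tag into language, script, region and variant subtags. It is only an illustration under some assumptions: split_tag and TAG_RE are names I made up, the pattern is a simplified subset of the RFC 5646 syntax, and it deliberately ignores extended language subtags, extensions, private-use and grandfathered tags. It also only checks the shape of a tag, not whether each subtag is actually in the IANA registry.

```python
import re

# Simplified shape of a language tag: language, then optional script,
# region and variants. Assumption: extlang, extension, private-use and
# grandfathered tags are out of scope for this sketch.
TAG_RE = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
    r"(?P<variants>(?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*)$"
)


def split_tag(tag):
    """Split a simple language tag into subtags (hypothetical helper).

    Returns None when the tag does not fit the simplified pattern.
    """
    match = TAG_RE.match(tag)
    if match is None:
        return None
    parts = match.groupdict()
    variants = [v for v in parts["variants"].split("-") if v]
    return {
        # Conventional casing: language lower, script Title, region UPPER.
        "language": parts["language"].lower(),
        "script": parts["script"].title() if parts["script"] else None,
        "region": parts["region"].upper() if parts["region"] else None,
        "variants": [v.lower() for v in variants],
    }
```

With that, split_tag("mi-NZ") gives language "mi" and region "NZ", split_tag("zh-Hant-TW") also fills in the script "Hant", and something like split_tag("english") returns None because it does not fit the pattern.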
Yeah, what I found super-intriguing is the background issue of the increasing number of recognised languages. On the one hand, coming from a country that almost lost its unique language during my lifetime I support this. On the other, in some parts of the world it is normal to be fluent in 3 or 4 languages because language is a tool for communication, not a definitive component of cultural identity.
All of this is only for transliterated content.
So I wonder what this means for coders of the future: Writing everything in a single language that has become the global tool for communication, or writing multiple versions to accommodate increasingly localised variations?