
Home / Forums Index / Code, Content, and Presentation / CSS
Forum Library, Charter, Moderators: not2easy

CSS Forum

languages and CSS

 10:23 pm on Jun 11, 2012 (gmt 0)

This is not a question about the :lang pseudo-class.

I want to associate a particular language with a particular class (which already exists) so I don't have to put in that ### "lang='xyz'" HTML declaration every single time. That is, ahem, I can do it all with one fell RegEx-- but I'd still end up with a file that's distinctly fatter.

Am I overlooking something embarrassingly obvious, or is this simply not possible?



 1:05 pm on Jun 15, 2012 (gmt 0)

Hi Luce', so far as I am aware it isn't possible to set language using CSS, and Internationalization Best Practices 4.1.6 CSS [w3.org] says the same thing. Makes sense really - language goes to the fundamental structure of the document, not the style, and is really useful for assistive technology that doesn't use CSS. The common eg being penchant, which reads as "pen-chunt" in English, "pon-chont" if properly marked up as French.

So that takes me back to the fat document issue. Can you approach this from the reverse? Do you really need a class plus a language? That is, rather than associating a language with a style, associate the style with the language. Would still take a search and replace, but that would put the lang into the HTML where it is most helpful, connect the style to the function it serves, and reduce mark-up. An eg in case that doesn't make sense:

<p>Sentence using <span class="myclass">german</span> and <span class="myclass2">french</span></p>
.myclass {color:red}
.myclass2 {color:green}

Replace with
<p>Sentence using <span lang="de">german</span> and <span lang="fr">french</span></p>
:lang(de) {color:red}
:lang(fr) {color:green}



 9:19 pm on Jun 15, 2012 (gmt 0)

Hm, yeah, that's an idea. It's the computer equivalent of Dr Who's all-purpose remedy: Reverse The Polarity.

Incidentally, while doing some mass-replacing-- I'm adding <lang> tags bit by bit as I edit pages-- I realized that

:: cough, cough ::

I really am using <i> for two different things. One's basically presentational ("the letter <i>p</i>") while the other is semantic ("as Goethe famously said, <i>Donnerwetter!</i>"). One goes with <lang> tags, the other doesn't. This tells me I need to attack some of those docs at a lower level. At a minimum, <em> vs. <i>. But for now I compromised by saying <i lang = "iu"> which looks droll but does the job.


 12:56 am on Jun 16, 2012 (gmt 0)

That would work provided you really did want the words <em>phasised when spoken.

I'm not sure you do, so this is starting to sound like structure being used to convey style. Ok, I know <i>, <em>, etc are presentational elements in HTML4, but that's a legacy implication best avoided for the future. If they are being used to convey the meaning "this word isn't English" to visual users, they are taking on a structural function, so I think it's useful to think about them in that way. That is reinforced in HTML5:
the i element represents a span of text in an alternate voice or mood, or otherwise offset from the normal prose in a manner indicating a different quality of text, such as a taxonomic designation, a technical term, an idiomatic phrase from another language, a thought, or a ship name in Western texts.
which is how I think it is being used here.

I'd suggest not attacking from the lower level but, as usual, from the highest possible. I'm missing information because you initially asked about classes, which suggests you have classed <i>'s serving a different purpose as well. However, so far <i>'s are being used as presentational <i>italics on elements with no other meaning, and as structural elements when the language changes, to convey <i>talics to visual users and language information to others. If your content looks like this:
<p>Sentence <i>u</i>sing <i lang="de">german</i> and <i lang="fr">french</i></p>
Then the css above still works because it only selects the <i>'s with the language attribute. The attribute selector would work just as well if you prefer. No need to add any other element to distinguish them.
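
As a rough sketch of that attribute-selector variant (the colours are carried over from the earlier example; nothing else here is taken from the actual pages), the |= form matches lang="de" as well as hyphenated values such as lang="de-AT":

```css
/* Attribute-selector variant: matches only <i> elements that carry
   the lang attribute themselves. */
i[lang|="de"] { color: red; }    /* lang="de", lang="de-AT", ... */
i[lang|="fr"] { color: green; }  /* lang="fr", lang="fr-CA", ... */

/* A plain <i> with no lang attribute (the presentational kind)
   matches neither rule and keeps its default styling. */
```

One difference worth keeping in mind: an attribute selector only matches elements that have the attribute written on them, while :lang() also matches elements that inherit their language from an ancestor.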

Then I'd ask how many presentational <i>'s are necessary. For eg, if used as a hook to create a fancy first letter:
<p><i>S</i>entence using <i lang="de">german</i> and <i lang="fr">french</i></p>
Then you can reduce the html by deleting <i> and using p:first-letter instead.
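
A minimal sketch of that swap, assuming the fancy first letter is just a larger italic (both declarations are placeholders, not the real styling):

```css
/* Before: <p><i>S</i>entence using ...</p>
   After:  <p>Sentence using ...</p>          */
p:first-letter {
  font-style: italic;  /* assumed: what the <i> hook was providing */
  font-size: 1.5em;    /* assumed: illustrative size only */
}
```

As written this hits the first letter of every paragraph, so in practice the selector would probably need narrowing to the paragraphs that actually want the effect.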

This is fun! Any chance of seeing enough content/html to figure the patterns?


 8:08 am on Jun 16, 2012 (gmt 0)

Heh. Well, in the particular page I was playing with today, anything italicized and longer than one letter would also get a <lang> attribute. Excluding the ones that, er, don't, due to the "foreign-italics" vs. "emphatic-italics" thing. And the times I got lazy, figuring there's only so many ways for a speech reader to mispronounce <i>-up</i> ;) Some of the <i>s happen to have a class "locked", defined as {white-space: nowrap} because there's a leading hyphen. That has to remain independent.

The obvious selling point of <i> and <em> -- and also <u> and <tt>, which I use, dammit, so where does html5 get off dumping it? -- is that they're just one or two letters. Go to <span class = "blahblah"> and you're in a whole different level of bulkiness.

Sadly in this file-- and a bunch of similar ones-- I can't take it the other way around, using only <lang> and then styling with a pseudo-class. The non-English stuff uses two different physical formats. Italic if it's within body text-- standard practice for foreign languages-- but sans-serif if it's a free-standing line. Mm, yeah, I might be able to do something like

:: detour to look up exact wording ::

p :lang(iu)

but ooh, that looks iffy. Besides, sometimes I have consecutive lines and one of them's English.

But the material that originally prompted the question would work well with a pseudo-class. Currently I've got <td class = "translit"> formatted as sans-serif. (Yes, a table is the most appropriate format, thank you for asking.) Those could all be replaced globally with <td lang = "iu"> and then the css would say

td:lang(iu) {and then all the style stuff that currently goes with the "translit" class}
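
A sketch of how that swap might look, assuming the "translit" class does nothing beyond switching to sans-serif (the real rule may well carry more declarations):

```css
/* Before: <td class="translit">...</td>
   After:  <td lang="iu">...</td>        */
td:lang(iu) {
  font-family: sans-serif;  /* the one property the thread mentions */
}

/* Inline Inuktitut in running text is unaffected by the rule above
   and keeps the browser's default italic for <i>. */
i:lang(iu) {
  font-style: italic;
}
```

Because language is inherited, td:lang(iu) would also match any cell sitting inside an ancestor marked lang="iu", not only cells carrying the attribute themselves.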

w3c says:
The pseudo-class ':lang(C)' matches if the element is in language C. Whether there is a match is based solely on the identifier C being either equal to, or a hyphen-separated substring of, the element's language value, in the same way as if performed by the '|=' operator.

I have tried reading this upside-down, sideways and backward, but still can't get that second sentence to be English.


 12:07 pm on Jun 16, 2012 (gmt 0)

ooooo.... this is fun.

I don't like spans because of the number of letters either, but lang is one scenario when it's hard to argue against them. Thankfully we italicise foreign words in the main text so I can use <i> with a clear conscience :)
p :lang(iu) but ooh, that looks iffy. Besides, sometimes I have consecutive lines and one of them's English.
Are you sure you have a paragraph then? A list, quotes, definitions .....?
Anyways, p:lang(iu) selects p's whose language is iu, p :lang(iu) catches descendants of p in that language, so there should be a way to make it work. Something like:

<p>Single <i lang="iu">Inuktitut</i> word<br>
<i lang="iu">Inuktitut line</i></p>
:lang(iu) {color:blue}
br + :lang(iu) {color:red}
... or whatever variation of nth-child/child will do it. If this is tables have you tried row/col grouping, and/or nth-child on the tr/td?

Whether there is a match is based solely on the identifier C being either equal to, or a hyphen-separated substring of, the element's language value, in the same way as if performed by the '|=' operator.
Drink more coffee and read the recs at 2am :)
:lang(C) works just like the |= operator, so C could exactly equal the language value (eg :lang(iu) exactly equals lang="iu")
Or, if you are using a hyphen-separated string (say fr-CA, for French spoken in Canada), then C will also match the language part (eg :lang(fr) matches lang="fr-CA" )
... couldn't use an Inuktitut example because it is the macrolanguage. Off topic, but ISO 639 is suggesting ike (Inuktitut) and ikt (Inuinnaqtun) for non-legacy uses ... and that iu is now iku ...
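
For what it's worth, the matching rule can be sketched with the French example (the colours are arbitrary placeholders). "Hyphen-separated substring" in practice means a prefix that ends at a hyphen:

```css
:lang(fr)    { color: green; }  /* matches lang="fr" and lang="fr-CA" */
:lang(fr-CA) { color: teal; }   /* matches lang="fr-CA", but not plain lang="fr" */

/* The attribute-selector spelling of the same prefix test: */
[lang|="fr"] { color: green; }  /* lang="fr" or any value starting "fr-" */
```

So the broader code in the selector catches the more specific tags in the markup, never the other way round.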
(Yes, a table is the most appropriate format, thank you for asking.)
Gosh ... as if anyone here would do that ;)

 8:32 pm on Jun 16, 2012 (gmt 0)

Off topic, but ISO 639 is suggesting ike (Inuktitut) and ikt (Inuinnaqtun) for non-legacy uses ... and that iu is now iku ...

Matter of fact I went looking for a two-letter code for Inuinnaqtun now that they've done a complete about-face and decided it's a language after all, but I must have the wrong list. Well, it's only the Library of Congress [loc.gov], what do they know. Good thing I didn't say ik, which turns out to be Iñupiat, off at the western end of the dialect continuum. They do say iku = iu. (Was "ike" a typo?)

I thought you were supposed to use two-letter codes by preference.

All of this is only for transliterated content. If a language has its own script, the reader can jolly well figure it out on its own. We will not talk about, say, the diaries of John Dee, where he used Greek script to record-- in English-- anything especially juicy.


 7:50 pm on Jun 19, 2012 (gmt 0)

I thought you were supposed to use two-letter codes by preference.
Me too, and I found that still specified somewhere ... but I can't find it now :)

This was such an interesting issue for me. Our language is spoken only within our national boundaries. There are definite iwi (tribal) variations, but they are so easily understood we've been able to create an accepted official "national" language. To the point that many of the names/spellings/pronunciations in the south are northern because original translators and map makers never bothered to ask anyone in the south for the correct word. Never irritate a southerner then ask for driving directions. They'll be 100% accurate, but you can guarantee they won't match the road signs ;) So in the end, few off-shore people care, too few people are directly affected and an official national language means we ignore variations.

So this was a great opportunity to "catch up" with the detail and developments. Best I can tell the Library of Congress is using version 2 of ISO 639 - which has been superseded by version 3. Not forgetting language tag syntax is now consolidated into BCP (Best Current Practice) 47, and the latest RFC (Request for Comments) is 5646. I think that reflects the recent explosion of languages seeking recognition, and accommodating the numbers seems to have required a move to 3 letters. I was reading examples of 40+ letters to represent the micro/macro/regional/country/source/etc components.

Anyways, although the IANA keeps the register, the short way is through an officially recognised (private) lookup tool [rishida.net...] that alerted me to the change. Each entry is also linked to a helpful ethnologue that provides more information. In summary, suggesting use of the 3-letter codes where they exist, including the more specific regional variations, unless the older 2-letter macro-language is required for compatibility with legacy applications.

All of this is only for transliterated content.
Yea, what I found super-intriguing is the background issue of the increasing number of recognised languages. On the one hand, coming from a country that almost lost its unique language during my life-time I support this. On the other, in some parts of the world it is normal to be fluent in 3 or 4 languages because language is a tool for communication, not a definitive component of cultural identity.

So I wonder what this means for coders of the future: Writing everything in a single language that has become the global tool for communication, or writing multiple versions to accommodate increasingly localised variations?


 4:10 pm on Jun 22, 2012 (gmt 0)

Quick follow-up: Oops, sorry Alt, I see it wasn't a typo after all. It's "ike" for the language, "iku" or "iu" for the Macrolanguage. (In this case the first is the one spoken in Iqaluit, the second is the whole dialect continuum from, I guess, just west of Iñupiat to just east of Kalaallisut.) I have now changed my bookmark to SIL [sil.org]:

This is the official site of the ISO 639-3 Registration Authority and thus is the only one authorized by ISO.

(and I only had to scroll past about ten google hits to find it ;))

And then I detoured to ScriptSource. See y'all next year.
