|Who or what are semantic tags for?|
For the past few months I have been participating in the WHATWG mailing list on HTML5, and more recently in an off-list discussion with one of the WHATWG members.
The reason I entered the discussion was that I felt semantics and accessibility were not fully attended to in the new spec. Sometimes in a discussion you reach a point where you think, "ah, I approached this from the wrong assumption/angle." In this case it took me a few months to conclude that *everybody* approaches HTML from the wrong angle, myself included. I'll explain.
The current state of affairs has us all thinking about separating style from content and about using semantics (even to the point of using semantic class-names for styles). It always made sense to me, and in a way it still does.
I entered the discussion from a designer's/author's point of view. As such, I have always seen semantics as a way of helping user agents make sense of the content so that they can deliver it to the user as intended. ("UA" and "user" should be read broadly: a UA is almost anything that can "read" HTML, and a user is anything that receives the information, so a user could also be a database, for example; it doesn't have to be a human.)
As such, I have always found HTML lacking in certain areas when it comes to semantics. For example, what if I want to mark up a dictionary website? I can define the definitions with the <dfn> tag, but what about defining the type of word? (verb, noun, etc.), or an example? Or a translation? Yes, we can do it with existing tags, but they are not semantically unambiguous.
Obviously we don't want a huge number of tags covering every semantic base, so I thought it would be best to come up with another approach. I expected that as UAs became smarter, artificial intelligence would eventually take over, but until then we should have some sort of construct that lets us mark up content as semantically as possible without the spec having to define every possible semantic need.
As the discussion went on, I slowly realized that the HTML spec is aimed at technical problem-solving: it tries to solve the problems we designers run into when creating web pages. Moreover, HTML5 used to be called "Web Apps". But what need do web apps have for semantics? None. At that point I wondered: "Who (or what) are semantic tags for in HTML?"
I asked around, and all the answers I got were illogical:
1) For users or authors? If that were the case, we would have tags for every possible type of text, from fiction to scientific, from blog to encyclopaedia. But we don't. And we never will, since neither the WHATWG nor the W3C thinks we should (they told me so). This leaves us with a small set of semantic tags whose meaning we have to stretch to fit our needs (so why define their meaning so strictly in the spec?).
2) For user agents? User agents only use HTML as a "hooking language" on which to hang their styles: visual, aural, etc. (think accessibility). As far as the user agent is concerned, the markup need not be semantic at all; the tags could be named A1, A2, A3 ... ZZ for all it cares.
3) For designers? That is a possibility, but designers usually think in terms of layout and style and have little need for semantics (they leave that to the user/author). Designers would be helped much more by non-semantic tags, and true, that is something the new HTML5 spec caters for. But it still does not explain who or what semantics are for.
4) For data extraction (or something equivalent)? Again, the current set of semantic tags is not adequate for this.
I still think we need to separate style from content, but I am at a loss to explain what semantics are for in HTML. And if they serve no clear purpose, why not leave the semantic tags as they are? Why create new ones like "article", "section", etc.? And if we do, why not "footnotes", "translation", etc.? (The WHATWG's argument here is that those would not solve a problem.)
In fact, I did a little test. The code below works in all of today's browsers except (guess?) IE.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<style type="text/css">fictionalinline { border: solid 1px #066; }</style>
</head>
<body>
<p>Hello, I am wrapped in a non-existent <fictionalinline>tag</fictionalinline>.</p>
<p>Hello, I am wrapped in a non-existent <fictionalinline fictarg="value">tag</fictionalinline>.</p>
<p>Hello, I am wrapped in a non-existent <fictionalinline rel="mystyle">tag</fictionalinline>.</p>
</body>
</html>
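For what it's worth, the IE exception has a well-known workaround, the same trick the HTML5 "shiv" scripts rely on: calling document.createElement with the unknown tag name before the parser reaches the element makes IE 6-8 apply CSS to it. A minimal sketch of the head of the test page, assuming the fictionalinline element from the test above:

```html
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<script type="text/javascript">
  // IE 6-8 only styles unknown elements whose names were registered like
  // this before they appear in the markup; other browsers just create a
  // detached element and throw it away.
  document.createElement("fictionalinline");
</script>
<style type="text/css">
  fictionalinline { border: solid 1px #066; }
</style>
</head>
```

This does not make the markup valid; it only shows that, apart from that one IE quirk, browsers will style whatever tags you hand them.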
Now, I am not saying we should all start making up our own tags, but it does give food for thought on two points:
a) We could leave out all semantic tags from the spec and only use divs and spans.
b) No matter how many (or few) new semantic tags are put into the spec, the markup would always stay compatible, as the example above shows.
I have always been very strict in applying specs, standards and best practices, but I am seriously beginning to question this because I cannot find a satisfying answer to a simple question:
Who or what are semantic tags for in HTML?
Anyone care to help me out here please?
[edited by: Bert36 at 12:57 pm (utc) on Mar. 22, 2009]
|Who or what are semantic tags for in HTML? |
There are no semantic tags in HTML.
Unfortunately the term "semantic (HTML) markup" has entered widespread use. It's misleading. It makes people think the semantic in semantic HTML is the same as the semantic in the Semantic Web. A better, more accurate term to use would be structural markup [diveintomark.org]. But it's too late [1997.webhistory.org] to try to change this now.
The semantic in semantic HTML refers to syntax: using the most appropriate presentational-structure markup for the content expressed. That's all.
The semantic in the Semantic Web refers to communicating meaning: encoding terms and defining relationships. Or at least that's what it originally meant. What the Semantic Web theoretically is has changed over time.
Understandably, people confuse the two meanings of the term semantic. Which is quite ironic, since this is all supposedly about meaning... :)
Here's a quote from an interview with Sir Tim Berners-Lee, 2005 [consortiuminfo.org]:
|CSB: Did that [2004 announcement about the emergence of the Semantic Web] mean that you expected people to start encoding Webpages semantically from that point forward? Have they? |
TBL: It's not about people encoding web pages; it's about applications generating machine-readable data on an entirely different scale. [...]
The Semantic Web is not about the meaning of English documents. It's not about marking up existing HTML documents to let a computer understand what they say. It's not about the artificial intelligence areas of machine learning* or natural language understanding -- they use the word semantics with a different meaning.[!]
It is about the data which currently is in relational databases, XML documents, spreadsheets, and proprietary format data files, and all of which would be useful to have access to as one huge database.
*Note: TBL was the person who said in 2001 [sciam.com] that the Semantic Web would use AI, knowledge representation and inference rules, and that's probably why he called it the Semantic Web in the first place. He has (rightly) been pulling back from that suggestion ever since, emphasising a Data Web instead.
|... but what about defining the type of word? (verb, noun, etc.), or an example? Or a translation? |
Short answer: this is what XHTML is for, in combination with a DTD that describes your tag elements.
I agree with you that the term "semantics" is tainted when it comes to the web. When I refer to "semantic tags" I mean things like <abbr>, <code>, <kbd>, <article>, <section>, etc.
We can call them whatever we want, but that still does not answer my question, and neither does using XHTML with a custom DTD. Both your arguments are valid, but neither *explains* why HTML (5 or any other version) has things like <code> but not things like <example>.
The only valid explanation, I think, would be if such tags were to be interpreted very loosely (so <code> could include <example>). But the spec(s) outright refuse such use. In other words, the specs define very specific tags for very specific markup and leave no room for flexibility. But why? Who or what benefits from that?
In other words: who are these tags for (whether we call them semantic or not)?
The only answer I can think of is accessibility. I mean, I can style a <p> to look like a <pre> and a <pre> to look like an <h1>, and only users of screen readers would hear/see the difference.
But if this is true, why not extend HTML with far better constructs for accessibility? And why have things like <var> and not <word> (to give but a silly example)?
I do not wish to discuss the validity of HTML or the spec; I want to discuss the validity of using semantic (or textual) tags in a strict way, because I fail to see the purpose of that. In fact, I think we all just assume it because we are told to, but there is no actual reason for doing it.
Please prove me wrong.
As you initially suggested, they're just hooks.
We have the tags we currently have due mainly to TBL's initial decisions when designing a markup language for physics papers. He did not imagine it would be used for forums, ecommerce, social networking etc. But it turns out that HTML is pretty flexible.
AIUI the new HTML5 tags came from analysis of commonly used class names. More could be added in future versions. The new tags just add a bit more potentially useful structure.
HTML is a primitive, quick and dirty markup language. It works because of this, not in spite of this IMHO. Use the appropriate tag if it exists, and just shoehorn in everything else ;)
Ok, I can live with that explanation. I just can't live with the fact that some things are not allowed to be "shoehorned" while others are. An example from the new HTML5 spec:
<p>To make George eat an apple, select <kbd><kbd><samp>File</samp></kbd>|<kbd><samp>Eat Apple...</samp></kbd></kbd></p>
I think this has nothing to do with clarity, structure or semantics; in fact,
<p>To make George eat an apple, select the <i>Eat Apple</i> option from the <i>File</i> menu.</p>
seems much more "semantic" and clear to me.
(I am using the i-element here as described in the spec (http://www.whatwg.org/specs/web-apps/current-work/#the-i-element))
The problem I am having with this is, why do we need(?!) to adhere to a "standard" regarding defined tags if semantics and accessibility are sometimes better served when we don't?
Compare that to, e.g., a grammar example:
George has eaten an apple.
Who benefits from following the spec to the letter and/or who gets handicapped when I don't?
For some reason (unknown to me) "code" gets described in the spec, while "example" does not. So even if there is a good reason for using <code>, shouldn't that equally apply to an <example>? Still, when it comes to examples we are left in the dark and are "allowed" to mark them up any which way we want (shoehorn them), while when it comes to code we must follow a strict rule. It simply isn't logical, and therefore I propose that semantic/textual tags (as described in the spec) are pretty much useless in a semantic sense. (As a hook for AJAX or style it's another story, but in those cases who cares if I use <span> instead of <small>?)
In other words, isn't the spec, with regard to *these tags*, simply bloated and a waste of time for spec writers, browser vendors and designers alike? Or am I oversimplifying matters?
You seem to be experiencing an epiphany: HTML markup is a means to an end, not an end in itself. :)
I do think things like <code>, <abbr>, <kbd>, <section>, <article> have valid use cases. That doesn't mean you must use them wherever possible, just use them where it's useful to do so - to you or your users.
HTML is a primitive, simplistic markup language. If you're looking for some deep unifying theory of why it is the way it is, rest assured there isn't one. Does seem to work quite well though, doesn't it? ;)
Oh, I totally agree with you. The thing is, that we are bombarded on the net in blogs, forums and discussions, as well as in education, that using HTML strictly (and I don't mean the dtd) is a good thing. It is coupled to best practice and doing it according to web standards. And it is that attitude that makes HTML into something more than it is.
As for your advice to use them where it is useful: I now think it is never useful to use things like <code> or <section>. (<abbr> is another story, as it is rendered slightly differently in screen readers and therefore has a valid, useful purpose.)
Perhaps it is an epiphany. I think I will adopt a "standard" of ignoring textual tags and start making use of semantic classes instead.
Thanks for sharing your views.
If anyone thinks I am making a huge mistake here, please explain.
|The thing is, that we are bombarded on the net in blogs, forums and discussions, as well as in education, that using HTML strictly (and I don't mean the dtd) is a good thing. |
People who only do HTML have a self-interest case for making HTML appear as complex as possible. Using HTML pragmatically is a good thing. Some parts of the "Web Standards" movement tend to favour dogma over intelligence.
|I think I will adopt a "standard" of ignoring textual tags and start making use of semantic classes instead. |
Don't throw the baby out with the bath water. Using <div class="code"> instead of <code> is as mad as using <span class="italic"> instead of <i>. When HTML5 is widely supported <section> could be used to generate a complex document's outline [blog.whatwg.org].
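A minimal sketch of what that could look like, assuming the outline algorithm in the current HTML5 draft (the headings and text are made up for illustration):

```html
<body>
  <h1>Apples</h1>
  <!-- Each <section> opens a new outline level, so its <h1>
       becomes a sub-heading of "Apples". -->
  <section>
    <h1>Taste</h1>
    <p>Sweet or sour, depending on the variety.</p>
  </section>
  <section>
    <h1>Colour</h1>
    <p>Red, green or yellow.</p>
  </section>
</body>
```

Under that algorithm the outline would be "Apples" with "Taste" and "Colour" beneath it, even though all three headings use the same tag. Whether a given UA actually exposes that outline depends on its HTML5 support.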
I wouldn't use <div class="code"> but <p class="code">, probably combined with a bunch of semantic classes like "function", "method", "class", etc. That way I can make my own outlines.
This is just what I mean: why can I make complex outlines with <section> for articles, but not for code, stories, encyclopedias, etc.?
Perhaps I am looking at it a bit black and white, but either give me full flexibility or nothing. At least this would eliminate the need for discussions, work-arounds, etc.
Sorry, I meant <span class="code"> is mad not <div...>. <pre> is the best tag for a code listing as it preserves whitespace and line breaks, and you could use syntaxhighlighter [code.google.com] to autoformat it.
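For example, a short listing could be marked up like this. The function and the class name are hypothetical; the class is just a hook for a stylesheet or highlighter (the WHATWG spec suggests a "language-" prefix for this), and is not something any particular highlighter script is guaranteed to look for:

```html
<!-- <pre> keeps the indentation and line breaks as typed;
     <code> marks the contents as computer code. -->
<pre><code class="language-javascript">function eatApple(george) {
    george.eat("apple");
}</code></pre>
```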
Not to keep this discussion going forever, nor to stray from the topic, but... I always thought <pre> was a bit "out of place". Technically it is a styling tag, isn't it? So if you want to be a real purist about separation, you shouldn't use <pre>?
(I am not saying you are a purist, just in general) ;-)
[edited by: Bert36 at 1:57 pm (utc) on Mar. 24, 2009]
IMHO it's pointless trying to be a purist about HTML, it's not a pure language.
It's impossible to completely separate style from content.
There isn't a hard boundary between a semantic markup language and a formatting markup language, there's a spectrum, from more abstract to more presentation-specific. HTML is somewhere in the middle.
I think this debate is as old as HTML itself (read: older than the addition of <font>, <center>, <blink>, <table>, ...).
I remember discussions about the planned (but never materialized) support for mathematical expressions (the idea was somewhat TeX-like, for those into that), so that formulas could be used without converting every one of them from a description into a GIF (at the time the only image format).
I think there have been reference implementations, but the globalization of the "www" and the browser wars seem to have wiped them out.
In the end: keep (presentational) markup out of it, get as much description in as you can, and you'll be fine for everyday use.
As new tags are introduced and browsers actually implement them widely (remember IE6 is still out there in significant numbers, even after the introduction of IE7 many years ago and the recent release of IE8), I think it's wise to use them for what they were designed for. But you'll always find certain areas where there is no specific tag to help you describe your content, and you'll be forced back to images (hopefully one day SVG) or <p class="stuff"> constructions.
Hmm, let's do a hypothetical.
Suppose one could create a web page that renders perfectly in all UAs (browsers, screen readers, iPhones, braille displays, etc.), BUT it would be one big tag soup, a complete semantic mess at the code level. It wouldn't validate or anything.
Hypothetically: would/could someone like that be considered a good web designer? (Let's leave maintainability out of the picture, just for argument's sake.)
[edited by: Bert36 at 3:56 pm (utc) on Mar. 24, 2009]
The customer only looks at the short term "view" (looks nice == good), so they'd probably agree that it is good.
- Till somebody looks at the code ...
- Till somebody brings out a popular new gadget that doesn't render the tag soup
- Till maintainability rears its head ... (this is the big one IMHO)
You can carry this over to other domains:
Suppose you have a civil engineer who builds bridges that are the "tag soup" equivalent, but somehow don't fall apart. Would anybody consider him a good engineer? Nope; in fact it's extremely unlikely such a bridge would ever get built, because bridge-building has a number of acceptance steps where these things are ruled out. Bridges must be well designed or they never get beyond the "idiot's" drawing pad.
Suppose you have an artist who makes a statue that's filled with crappy stuff inside but looks really nice. Will it be accepted? Sure! Looks are all that count.
So is web development more like engineering or more like artistry ?
[being an academically trained engineer myself, I'm biased]
If you rule out maintainability, it's easy to argue it's like artistry. If you want to score high on future-proofing and maintainability, it's relatively easy to argue for standards, validation, code review, acceptance criteria, ...
Speaking with my pride and self-esteem, I wholeheartedly agree with you. But if I try to look at it objectively (if one can do such a thing) I wonder.
Obviously I have never checked every big web design company's work, but the ones I did "right-click > view source" on make me sort of sad.
Seeing how their "perfect" tag-soup seems to withstand standards and best practices, and add to that a new version of HTML on the horizon that delivers us not much hope in terms of accessibility and maintainability, I really just want to throw in the towel.
Maybe I am just too anal, lol (can I say that on here?)
|a new version of HTML on the horizon that delivers us not much hope in terms of accessibility and maintainability |
HTML5 will offer improved accessibility: ARIA support, more structural elements that improve in-document navigation, strongly-typed input controls, <legend>, better fallback instructions etc. With the new structural elements, it's arguably more maintainable than previous versions.
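A small sketch of two of those features, typed input controls and an ARIA attribute (the form action, field names and text are invented for illustration):

```html
<form action="/order" method="post">
  <label for="qty">How many apples?</label>
  <!-- A UA that knows type="number" can offer a spinner and reject non-numbers. -->
  <input type="number" id="qty" name="qty" min="1" max="12" required>

  <label for="email">Email</label>
  <!-- aria-describedby ties the hint below to this field for screen readers. -->
  <input type="email" id="email" name="email" aria-describedby="email-hint">
  <p id="email-hint">Only used to confirm your order.</p>
</form>
```

Browsers that don't recognise these input types fall back to a plain text field, so older UAs still get a working form.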
|I started this thread in the hopes someone would give me a good reason to keep doing this job |
Seems like a good reason to me :)
Taking pride in your work isn't dependent on other people taking pride in theirs. Take pride in building successful websites - markup is arguably the smallest, easiest part of that.