This 51-message thread spans 2 pages.
|Semantic HTML: Does mark-up provide enough meaning in web documents?|
Msg#: 11300 posted 6:46 pm on Nov 6, 2005 (gmt 0)
You've got a div which contains a single piece of data: the serial number of a product, for example.
Options for mark-up include placing the data in a paragraph tag, or leaving it as just the raw text with the div acting as its element.
Which is, in your opinion, semantically correct?
Things to consider:
(a) Technically, it's not a paragraph, as a paragraph is by definition a collection of sentences. This is only a 14-digit number.
(b) Technically, you can't just put text (it's numbers, but that's still text) uncontained into a block-level element. In other words, it's not semantically correct to place it in the container without an appropriate block-level parent element.
What would you do? Which do you think is more in line with the ideals of semantic mark-up? Do you buy into the idea that a div, as a generic block-level container, is a good enough container for text, or does text need to be contained in an element with semantic value, even if the semantic value doesn't precisely match the purpose of the text? What about the idea of nesting the text in a span set to display:block? How semantic, or non-semantic, is that?
This question is similar to an earlier discussion about semantic markup in poetry [webmasterworld.com]. I know what I think, but I'm not entirely convinced that I'm right. I'd like to know what opinion others have.
Msg#: 11300 posted 5:45 am on Nov 10, 2005 (gmt 0)
lexipixel, the question in the title of this topic is "Does mark-up provide enough meaning in web documents?". You are right that it is a good idea to save bytes - just as long as there is still enough meaning. It's up to the author to decide on where to draw the line in the trade-off between file size and meaning. If we always went for the smallest possible file size, all our documents would look like .txt files with no tags at all!
Msg#: 11300 posted 6:02 am on Nov 10, 2005 (gmt 0)
The semantic web is a load of BS. Markup doesn't mean anything to a human unless he or she is an HTML author. In that case the meaning, i.e. the semantics of a tag, is purely technical in nature and has nothing to do with the meaning of the words that it contains.
By saying <div class="serial">123456789</div> you are telling
1) the computer to look up the CSS rules for a div with class .serial,
2) the web author that you want serial numbers to look a certain way and
3) the reader virtually nothing unless you provide some textual context as in <div class="serial">Serial: 123456789</div>.
So "meaning" has different meanings depending on who or what is interpreting a markup document, and at which level.
In that sense, a <p> tag carries no meaning to the reader. It is thus irrelevant whether you use <p> or <div> as far as semantics are concerned.
If the Google ranking algo were stupid enough to assign greater weight to a keyword occurring in a <p> than to one in a <div>, I would replace all my divs with p's, refactor my style sheets and get on with life.
Msg#: 11300 posted 6:16 am on Nov 10, 2005 (gmt 0)
|The semantic web is a load of BS. |
I'm not sure the makers of screen readers or small screen rendering software would agree with you there.
Semantic mark-up allows all manner of user-agents to present data appropriately in different formats (aural, small screen etc.).
And there's nothing semantic about the use of <div>s.
Semantic tags like <td>, <q>, <ul>, <h2> indicate to the user-agent the nature of the data.
Msg#: 11300 posted 6:51 am on Nov 10, 2005 (gmt 0)
|Is it a unit of typography, or a unit of meaning? It was alluded to above by bedlam that the two are the same, but I disagree. |
Read it again -- that's exactly the opposite of what I said. The closest I came to what you said was that markup and typography serve similar purposes, and that they accomplish said purpose in different ways ;-)
|The semantic web is a load of BS. Markup doesn't mean anything to a human unless he or she is an HTML author. In that case the meaning, i.e. the semantics of a tag, is purely technical in nature and has nothing to do with the meaning of the words that it contains. |
To me, this sounds like you've completely misunderstood this discussion, the purpose of HTML in the first place, or both. No one has said that markup is directly applicable to humans' use of marked-up content.
According to the W3C at least (and this is very apparent upon even a cursory reading of the specs), the point of marking up documents in html is so that they will be interoperable between different user agents and 'future-proof', in that modern versions of [x]html are actually xml...
|SGML is a system for defining markup languages. Authors mark up their documents by representing structural, presentational, and semantic information alongside content. HTML is one example of a markup language. |
|To publish information for global distribution, one needs a universally understood language, a kind of publishing mother tongue that all computers may potentially understand. The publishing language used by the World Wide Web is HTML (from HyperText Markup Language). |
Why Part 2
|HTML has been developed with the vision that all manner of devices should be able to use information on the Web: PCs with graphics displays of varying resolution and color depths, cellular telephones, hand held devices, devices for speech for output and input, computers with high or low bandwidth, and so on. |
What's more, as I have repeatedly tried to point out, html documents frequently need to be interactive; adding meaningful structure in the markup makes this possible, but HTML's base elements are inadequate to this task.
|rjohara, the idea you've described is fascinating. Make each controlled vocabulary publicly available, then link to it from each page using that controlled vocabulary. Unfortunately this system is unreliable: if I make a page that uses tags from a controlled vocabulary that is owned by someone else and is not tied down by a W3C standard, then my page is at risk of breaking whenever the owner of the controlled vocabulary changes something or, worse still, moves the vocabulary to a different location. |
You may be right that it's potentially unreliable, but this is precisely how xml is supposed to work (and how [x]html actually works):
|The function of the markup in an XML document is to describe its storage and logical structure and to associate attribute name-value pairs with its logical structures. XML provides a mechanism, the document type declaration [w3.org], to define constraints on the logical structure and to support the use of predefined storage units. |
Without some formal definition of its grammar, xml cannot be validated and is of rather limited use.
It's worth repeating that xml and xhtml can work very well together via xslt [w3.org] -- xslt can output html [w3.org] or xml [w3.org], and since xhtml is xml, xslt can also take it as input, so data can be transformed from xml to html or back again. This is itself a good reason for marking up any HTML document that might need to be converted to a full-fledged XML document with all the structure available -- including ids and classes.
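A rough sketch of the HTML-to-XML direction, assuming a hypothetical XHTML page whose data is tagged with class names such as "serial". Python's standard library has no XSLT processor, so this swaps in ElementTree instead of xslt; the only point is that well-formed XHTML can be read by an ordinary XML parser and its classes turned back into named elements. The page, the class names, and the catalog/product output elements are all made up for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical XHTML fragment: class names act as hooks for the data.
xhtml_source = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div id="nav">Home | Products</div>
    <div class="product">
      <h2 class="name">Widget</h2>
      <div class="serial">12345678901234</div>
    </div>
  </body>
</html>"""

def xhtml_to_xml(source: str) -> ET.Element:
    """Rebuild each class="product" div as a <product> element whose
    children are named after the class attributes of its child elements."""
    page = ET.fromstring(source)  # only possible because XHTML is well-formed XML
    catalog = ET.Element("catalog")
    for block in page.iter(f"{{{XHTML_NS}}}div"):
        if block.get("class") != "product":
            continue  # skip navigation and any other non-record divs
        product = ET.SubElement(catalog, "product")
        for child in block:
            cls = child.get("class")
            if cls:
                ET.SubElement(product, cls).text = child.text
    return catalog

print(ET.tostring(xhtml_to_xml(xhtml_source), encoding="unicode"))
# <catalog><product><name>Widget</name><serial>12345678901234</serial></product></catalog>
```

Without the "product" and "serial" classes there would be nothing to map back to XML element names, which is the argument for keeping ids and classes in the markup.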
Msg#: 11300 posted 10:28 am on Nov 10, 2005 (gmt 0)
|We all know the old standard about DIV and SPAN allowing authors to add their own structure to a document, but such structure holds no semantic meaning. |
DIV does offer some semantic meaning. The tag refers to a DIVISION on a page. It is clearly designed to separate groups of other elements, such as paragraphs. It is also flexible enough to be used on its own, such as placing the serial number in question on a page.
I see no problem with just having text inside a DIV. I use DIVs myself for columns and anything that needs to be positioned accurately. I tend to give each DIV an ID.
Let's not forget speech readers here. If they see content marked up in paragraphs, they might well pause between each one. Whereas if divisions were used, the text might flow continuously.
So one solution to the serial number problem would be to mark it up as a paragraph, without a DIV. Don't forget that paragraphs can also be positioned like DIVs! Try the following code if you don't believe me:
<p style="position:absolute; top:100px; left:100px; border:1px solid #f00; width:100px; height:100px">Hello</p>
Non-English use of HTML
What about languages like Japanese, which can be written vertically? Is the P tag any use at all in that language? Traditional 'sentences' may be written in flow, not like English with a gap between paragraphs. I wonder if they find various HTML elements useless or not.
XML is the most accurate solution. By creating a <serial> element we can clearly define the meaning of the content. But before dismissing this: as debated above, the idea is then to convert the XML to HTML before displaying it. Thus you can decide later what you want to do with the serial element: convert it to a DIV or a P or a list, whatever. Yet your original XML file retains the full meaning.
I might even link to your XML file and style it as a different element, depending on my needs.
The only problem is when someone who can't read English looks at the file. Then all meaning is lost. But then surely that also applies to HTML.
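A minimal sketch of "decide later", assuming a hypothetical XML record containing a <serial> element: the same source can be rendered as a div or a p just by changing one parameter of the conversion step. This again uses Python's ElementTree in place of an XSLT stylesheet, and render_serial is an invented helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical source record: the <serial> element carries the meaning.
xml_source = "<product><name>Widget</name><serial>12345678901234</serial></product>"

def render_serial(source: str, html_tag: str = "div") -> str:
    """Turn <serial> into an HTML element of the caller's choosing.
    The XML keeps its meaning; only the HTML output changes."""
    product = ET.fromstring(source)
    serial = product.find("serial")
    out = ET.Element(html_tag, {"class": "serial"})
    out.text = serial.text
    return ET.tostring(out, encoding="unicode")

print(render_serial(xml_source))       # <div class="serial">12345678901234</div>
print(render_serial(xml_source, "p"))  # <p class="serial">12345678901234</p>
```

Switching from DIV to P later means changing the conversion, not the stored data.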
People mention creating a special set of tags for their needs. In XHTML the doctype allows you to define your own list of custom elements. So you can have the usual HTML ones, but also a <serial> tag. This is why they put the X in front of HTML: to make it eXtensible.
The Literary Moose has defined such a system in order to mark up literature.
The browser can then link to the custom doctype you create each time the page loads, to check how to display the custom elements. So you would need to define how to display the <serial> element - inline? Block? etc.
The problem with this approach is obviously going to be lack of browser support. Also it can slow the page down as the doctype must be parsed first before the page. But it's definitely possible.
Semantics from Classes
This is a new idea to me, and on the face of it, slightly suspect. Consider the example given earlier:
<dd class="weight">1 ton</dd>
You're saying that each definition has a style applied to it! But what if it doesn't need one? Aren't you adding extra markup? If each definition is fine displayed in the default style, then you would normally leave out the classes altogether.
Yet this is a clear solution to the problem of semantics. I can read the markup and know what each definition is. However, the semantics are lost when a speech reader reads out the file. It ignores the classes!
Obviously HTML isn't specific enough to give us tags for each class used above. Only XML and custom doctypes will solve that one.
Interesting approach though.
Msg#: 11300 posted 11:22 am on Nov 10, 2005 (gmt 0)
|You're saying that each definition has a style applied to it! But what if it doesn't need one? Aren't you adding extra markup? If each definition is fine displayed in the default style, then you would normally leave out the classes altogether. |
Done mainly for clarity, and to illustrate that the serial number could be part of a hierarchy, because of course class names add no *actual* semantic weight. I'm just following good practice by naming classes after their purpose.
However, I *would* tend to add redundant classes if I thought I might need to style them in the future, much the same way as CSS Zen Garden has redundant 'hook' elements to hook graphics onto. It makes the XHTML a bit more future-proof and adaptable, at the expense of a few wasted bytes.
Msg#: 11300 posted 12:48 pm on Nov 10, 2005 (gmt 0)
Personally, until I'm ready to jump into XML, I just go for the least possible amount of markup in situations like this. A span or other tag may technically be "more correct," I'm not sure, but to me the answer isn't clear enough to justify the extra bytes of an additional start and end tag. So I'll choose to keep it simple every time.
It should be noted (I haven't seen anyone point this out yet, but I may have missed it <edit> Yep, Hester pointed it out already </edit>) that CSS provides a good "stepping stone" from HTML to XML. Although it won't make any difference to screenreaders or other environments where real semantic meaning is important, assigning a class name or ID to the serial-number-containing element (whatever you choose) at least helps keep the data types straight in the mind of the webmaster. So maybe just a <div class="serial"> would be enough. (After all, is there really a special way a serial number should be read anyway?)
Just my two cents, maybe it's only worth that much, take it or leave it! ;)
I was half-tempted to just jump into this discussion with a "My Dad can beat up your Dad" comment; that would seem to be about as good a way as any to help us reach a consensus on an issue like this! ;)
Msg#: 11300 posted 4:09 pm on Nov 10, 2005 (gmt 0)
I would go with using a paragraph for the serial number, driver's licence, SSN etc. Divs are often (mainly) used to separate formatting; a paragraph implies something to be read: content.
Msg#: 11300 posted 4:35 pm on Nov 10, 2005 (gmt 0)
What does "semantic" actually mean in this context? The W3C's Semantic Web imagines a web "...in which information is given well-defined meaning" to enable "...the ability to reason, query, [and] express logical relations" - in other words, a queryable, massively interlinked, distributed database.
By using consistent CSS class names we can treat a website's pages as a collection of database records. Unfortunately, these "semantic HTML" pages do not make particularly good database records. They have lots of unrelated information in them: global navigation, local navigation, small print, user help, branding, adverts etc: the human friendly stuff. To complicate things further, some of the pages aren't really data records at all: homepages, category pages, about pages, contact pages etc etc. Again, these are the human friendly bits that make the website work.
So, treating a "semantic HTML" website as a database of records immediately runs into problems: all the pages have to be downloaded/cached and parsed to extract the data. Somehow, the actual data record pages have to be separated from the non-data record pages. Then the actual data in each identified record page has to be separated from the human-friendly elements. All this is doable of course. But is it an effective way of providing/accessing the underlying data?
Imagine you want to publish all the product information on your ecommerce website in a structured format for use by other people. You want to encourage this. Which method would be easier: requiring everyone to download, filter and parse every page on your website OR providing a simple list of every product in a structured format (XML/CSV/Excel/whatever) at a single location? How do RSS feeds work?
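For comparison, a sketch of the second option, assuming the product data is already available as a list of records (the field names and values here are hypothetical). Python's csv module writes the whole catalogue as a single CSV document that could sit at one URL, so a consumer parses one file instead of downloading and filtering every product page.

```python
import csv
import io

# Hypothetical product records, standing in for the shop's database.
products = [
    {"name": "Widget", "serial": "12345678901234", "price": "9.99"},
    {"name": "Gadget", "serial": "98765432109876", "price": "24.50"},
]

def product_feed(records) -> str:
    """Serialise every product into one CSV document, instead of
    making consumers scrape each product page."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "serial", "price"])
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

print(product_feed(products))
```

A consumer can read it back with csv.DictReader and get one dictionary per product, with no navigation, branding or adverts to strip out first.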
Effective web pages give human-friendly views into the underlying data (they're presentational by definition), and using basic structural elements can improve human-friendliness (structural HTML). But the "parsing view" into the data can be delivered much more effectively in a separate, data-only format.
I'm still not sure whether the Semantic Web actually involves (X)HTML web pages at all, and for me this makes the whole "semantic HTML" concept largely pointless. Imagining the Semantic Web as a separate data layer on top of the current Web turns the Web Standards movement's accepted wisdom on its head: web pages are only for presentation! ;)
Msg#: 11300 posted 5:33 pm on Nov 10, 2005 (gmt 0)
|I'm still not sure whether the Semantic Web actually involves (X)HTML web pages at all, and for me this makes the whole "semantic HTML" concept largely pointless. |
Cart before the horse ;-)
My understanding of this -- subject always to correction -- has always been that your 'repositories' of information are actually xml, and that that information can be displayed as web pages (or, for that matter, other kinds of documents) by transforming that xml.
|Unfortunately, these "semantic HTML" pages do not make particularly good database records. They have lots of unrelated information in them: global navigation, local navigation, small print, user help, branding, adverts etc: the human friendly stuff. |
Lots of the 'human friendly' stuff is not necessarily unrelated information. Navigation, for instance, is a way of representing a document hierarchy -- which is a very proper use of xml.
Msg#: 11300 posted 5:49 pm on Nov 10, 2005 (gmt 0)
No, Jet, that was a misunderstanding. I was asking: if the choice came down to those two generic tags, which one would CEM need? (At that point it would be a presentation issue.)
lexipixel has a good point: WebmasterWorld, for example, is not optimized (2,500 a month to pay for bandwidth?). It could easily be optimized, though how far one wants to optimize is going to be another factor.
CEM, now that the discussion has turned to XML, I think this is a good example of XML's usefulness over HTML. I agree that there will probably be a multitude of issues once XML is used widely across the net. Perhaps an XML transitional doctype of some sort could help: all the XHTML 1.1 elements would be carried over into the doctype, and where a piece or two of information requires new elements/attributes, those could be added manually to the document and the DTD?
If Hester is correct about XHTML, I think this would be your best solution: go with XHTML 1.0 Strict (keeping the text/html MIME type, is it?), add your serial element, and declare that serial element in your custom DTD.
If that could be a solution, then a few interesting questions come to mind. Well, one question for now: what default display (block or inline) do custom elements get?
Msg#: 11300 posted 10:31 pm on Nov 10, 2005 (gmt 0)
|I'm still not sure whether the Semantic Web actually involves (X)HTML web pages at all, and for me this makes the whole "semantic HTML" concept largely pointless. |
|Cart before the horse ;-) |
|...your 'repositories' of information are actually xml, and that that information can be displayed as web pages (or, for that matter, other kinds of documents) by transforming that xml. |
With respect, I think a better analogy here would be horse before the *car* ;)
If we have a semantically rich XML format for the data as you suggest, then why force Semantic Web agents/parsers to go through the presentational HTML delivered to browsers to get to it? Just give them the XML!
Msg#: 11300 posted 12:36 am on Nov 11, 2005 (gmt 0)
|The semantic web is a load of BS. Markup doesn't mean anything to a human unless he or she is an HTML author. In that case the meaning, i.e. the semantics of a tag, is purely technical in nature and has nothing to do with the meaning of the words that it contains. |
Hanu - on the contrary, markup and the semantic web means a great deal to almost every web user, including those who can't even spell "HTML". Search engines use the semantics of tags to sort results in the most useful way (for example they differentiate between a heading and a paragraph). This means that every person who has ever used a search engine has benefitted from the semantics of tags. As has also already been pointed out in this thread, screen readers and other kinds of user agent also benefit greatly from semantics. If you have been ignoring semantics in your web pages, almost all of your intended audience are suffering in one way or another.
Msg#: 11300 posted 3:08 am on Nov 11, 2005 (gmt 0)
I've just found this fascinating June 2005 article: THE SEMANTIC WEB: AN INTERVIEW WITH TIM BERNERS-LEE [consortiuminfo.org]
|CSB: Did that [2004 announcement about the emergence of the Semantic Web] mean that you expected people to start encoding Webpages semantically from that point forward? Have they? |
|TBL: It's not about people encoding web pages; it's about applications generating machine-readable data on an entirely different scale. Were the Semantic Web to be enacted on a page-by-page basis in this era of fully functional databases and content management systems on the Web, we would never get there. |
|What is happening is that more applications - authoring tools, database technologies, and enterprise-level applications - are using the initial W3C Semantic Web standards for description (RDF) and ontologies (OWL). |
|TBL: The Semantic Web architecture does not involve HTML browsers as we know them. There is a new breed of generic Semantic Web browser, but they are more like unconstrained database viewing applications than hypertext browsers. |
|TBL: It's not as if every page on the Web will be retrofitted with Semantic information. What we are likely to see though, is the wrapping of existing data stores, such as data in relational databases. We could anticipate a "View Data" feature in much the same way some of us "View Source". It's also worth noting that the new work in XHTML 2 is looking to include RDF capabilities. |
There will always be on the web documents to be processed by people, and data to be processed mainly by machines. This is a feature, not a bug.
This certainly clarifies things, and yet, confuses me slightly more.
Msg#: 11300 posted 7:26 am on Nov 11, 2005 (gmt 0)
|rjohara, your post deals directly with this and brings the cryptic information in the Semantic Web stuff right into perspective. When you say that controlled vocabularies will be defined, I take this to mean within various areas of study? Is the intent that such vocabularies be stored in a central location, like the DOCTYPE definitions at the W3, or will fields of study establish their own local locations for these things? |
Most likely they will be hosted by individuals and/or organizations. If you and I are president and vice president of the International Widget Federation then we might host the standardized widget description vocabulary on our site, houseofwidgets.com, and anyone who follows the standards (members of a professional organization, say, or enthusiastic amateurs) would link to it there.
|rjohara, the idea you've described is fascinating. Make each controlled vocabulary publicly available, then link to it from each page using that controlled vocabulary. Unfortunately this system is unreliable: if I make a page that uses tags from a controlled vocabulary that is owned by someone else and is not tied down by a W3C standard, then my page is at risk of breaking whenever the owner of the controlled vocabulary changes something or, worse still, moves the vocabulary to a different location. I wouldn't want to link my page to someone else's CSS file, so there's no way I want to link my page to something that provides actual meaning to my page rather than just look'n'feel. It's too risky. |
But if this were true, then no one would be running AdSense, no one would be an Amazon affiliate, no one would run an RSS feed, and no one's sites would have any of the innumerable weather widgets, talkback widgets, counter widgets, or anything similar, all of which pull their content from a third-party URL.
And the way CSS is supposed to work (and does - just not too many people do it this way) is that you indeed can link your page to any CSS file on the web. The W3C Core Styles [w3.org] are meant to be used just that way: try linking your page to one and see how it looks. (This can actually point up bad page design, because anything that's too "far out" in terms of HTML will look pretty poor, but anything that uses basic HTML will look fine.)
The web is more and more approaching Ted Nelson's original vision of transclusion [en.wikipedia.org], with individual pages being assembled out of fragments drawn from many different locations. Yes, links can break, but that's one reason why cool URLs shouldn't change [w3.org].
Msg#: 11300 posted 9:11 pm on Nov 11, 2005 (gmt 0)
Going back to cEM's original question:
Semantic HTML: Does mark-up provide enough meaning in web documents?
I think the answer is "No".
Semantics in HTML can go about as far as distinguishing a HEADING from BODY text from SMALL (footnote / caption) type text.
There just aren't enough tags to properly, easily and universally label (or mark up) content to the extent that XML can.
CSS was not meant to replace XML.
HTML for structure
CSS for style
XML for classification of data
My mention of economy of data (saving a few bytes wherever you can) was a sideways answer to the original question --- why waste markup when it won't do what you want...
If you don't want to code XML, why bother trying to label your markup behind the scenes with CSS or HTML... you'll get more bang for the buck putting the label in the content;
For pure SEO value:
<div><b>Serial Number:</b> 1234567890</div>
<div class="serial number">1234567890</div>
(the first example puts the text "Serial Number" in proximity to the number, and the BOLD may get you a few brownie points).
... and that's my $0.03
Msg#: 11300 posted 4:46 pm on Nov 12, 2005 (gmt 0)
|I think knowing that intended use of the paragraph in the Semantic Web would go a long way toward settling this. |
Well, now we know: the W3C's Semantic Web does not use paragraphs or any other (X)HTML syntax. The Semantic Web uses RDF [w3.org] and OWL [w3.org].
|TBL: The Semantic Web is not about the meaning of English documents. It's not about marking up existing HTML documents to let a computer understand what they say. |
Agonising over whether to use a div or paragraph is mostly pointless. This trend in modern web design has only emerged due to confusion about what the Semantic Web actually is, fuelled by the myth that XHTML is a step towards the Semantic Web. The only connection is that both XHTML and RDF use XML-based syntax (OWL doesn't...?).
XHTML2 may add a way to embed Semantic markup (i.e. RDF) in web pages. Until the time when XHTML2 (or its replacement) becomes widespread, just use (X)HTML tags to define a basic document structure. Anything beyond the simple hooks required for accessibility, SEO and CSS styling is a waste of time.
Msg#: 11300 posted 10:12 am on Nov 13, 2005 (gmt 0)
Use xhtml and extend the doc type with custom elements.
Of the existing html elements, I'd say that the correct format would be:
Msg#: 11300 posted 7:30 pm on Nov 14, 2005 (gmt 0)
My English degree makes me wonder: what do you mean by "semantic"?
Who is reading the semantics? The browsers? They clearly don't care what you name a tag.
Are you, the developer, the one reading the semantics in the server page source code? Then if DIV offends your intellectual reasonableness, isn't that a personal matter?
Or are you interested in other developers being able to read your source page and understand the content based on the tag names?
And then attributes?
Here is a suggested semantic that works on the client side, as it has the expected DIV, and on the server developer side, because it is self-documenting:
The server developer recognizes that this is a structured element about a serial number.
The client web designer recognizes that this is a layout element with the expected behavior of a DIV.
Msg#: 11300 posted 12:26 am on Nov 15, 2005 (gmt 0)
Great thread ...
It's good to see the historical models being brought into the discussion, as they may provide the best clues for determining the "answer" to the original question.
SGML was/is not intended to be a presentation language at all, and will always be accompanied by a DTD which explains its syntax to the interpretive agent. It is the precursor to XML in that its syntax is used exclusively to describe data ... not layout.
HTML expanded and diminished the concept of markup by including a limited subset of the SGML markup syntax that any WEB browser manufacturer could include in their program as a default reference, minimizing or eliminating the need for a DTD. The result was limited, commonly-acknowledged presentation rules, and the birth of the visual "web".
CSS is strictly for presentation and is an adjunct to HTML, expanding its presentation capabilities. Like DTDs, an embedded or external definition of the property values must be included with each document in order for the interpreting client to render its instructions as the author intended. So we're back to DTD inclusion.
XML combined with XSL (or XHTML or any of the current flavor-of-the-month markup concepts) provides the data-organization facility of SGML with the presentation facility of CSS. And it requires a DTD and a stylesheet reference in order for the interpreting client to be able to render the document as the author intended.
Of course, if you're still using Erwise or Viola as your web browser then you can see very quickly that the markup makes little difference to a client that cannot understand the syntax in the absence of a DTD. Hence the importance of standards that may be used by client manufacturers to improve the interpretation of the documents.
It seems clear to me that the demands of this and a future age of document sharing necessitate the inclusion of a DTD-type set of instructions both for interpreting the data and its internal and external relationships and to allow for appropriate rendering of the information within the client which is capable of integrating the wildly variable potential instructions.
HTML simply does not allow for full truly appropriate application of any but a very limited semantic philosophy. That's not its intent nor does it have the capability for it.
I think it's a terrific exercise to get down to brass tacks with whether HTML is capable of providing semantic relevance, as it helps us all to understand why such things are important now and will be much more important to our work in the future.
Msg#: 11300 posted 2:36 am on Nov 15, 2005 (gmt 0)
Just want to say how fun this thread is.
bedlam, holding your own and doing a great job.
Purple Martin, greasing the skids ... nice.
jetboy, working hard.
Hester, always very interesting.
Hanu, you devil! ;)
createErrorMsg, thanks for the kickoff and continuing direction!