Draft HTML 5 spec released

W3C released the draft spec for the next version of HTML

swa66

12:02 am on Jan 23, 2008 (gmt 0)

For those interested in draft specifications.
I particularly like the video and audio tags, but dislike the tag soup that remains an option in HTML5. Note that the document describes HTML5, XHTML5 and the corresponding DOM5.

I've not yet read through it all.

[w3.org...]

mattur

1:23 am on Jan 23, 2008 (gmt 0)

Well spotted, swa66. But it's misleading to say tag soup remains an option; browsers are required to recover gracefully from errors in HTML5 but the fail-on-error option is still available with XHTML5. Note both syntaxes build the same DOM.

What HTML5 uniquely does is define error-handling. Like it or not (and remember most of the web depends on it), all browsers implement error-handling and yet there is/was *no* standard for error-handling, just IE's undocumented de facto standard.

For the first time, HTML5 *precisely* describes how to handle HTML - something no other HTML spec has done.

This debate tends to overshadow all the new functionality: enhanced form controls (finally!), the canvas element, offline client storage API, captions for images, context menus, contenteditable, drag and drop, getElementsByClassName etc. See HTML 5 differences from HTML 4 [w3.org]
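For anyone who hasn't used getElementsByClassName yet: the argument is a space-separated list of class names, and an element matches only if it carries all of them, in any order. Below is a rough Python model of that matching rule over a hypothetical list of element dicts (not a real DOM); the function name mirrors the DOM method, but case-sensitivity and quirks-mode details are ignored.

```python
def get_elements_by_class_name(elements, class_names):
    """Return the elements whose class attribute contains every name in class_names.

    class_names is a space-separated string, as in the DOM method; order and
    extra classes on the element don't matter. An empty string matches nothing.
    """
    wanted = set(class_names.split())
    if not wanted:
        return []
    return [el for el in elements if wanted <= set(el.get("class", "").split())]


# Hypothetical stand-in for a document: each element is a dict.
doc = [
    {"tag": "p", "class": "note warning"},
    {"tag": "div", "class": "warning"},
    {"tag": "span", "class": "note"},
]

print([el["tag"] for el in get_elements_by_class_name(doc, "warning note")])  # ['p']
print([el["tag"] for el in get_elements_by_class_name(doc, "warning")])       # ['p', 'div']
```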

tedster

4:09 am on Jan 23, 2008 (gmt 0)

On the downside, in my opinion, is this:

4.12.3.12. Link type "noreferrer"
The noreferrer keyword may be used with a and area elements.
If a user agent follows a link defined by an a or area element that has the noreferrer keyword, the user agent must not include a Referer HTTP header (or equivalent for other protocols) in the request.

I get the inclusion of nofollow, prefetch and many other innovations. But analytics and referrer information are already problematic enough. What is the upside of putting this in the spec?

swa66

11:46 am on Jan 23, 2008 (gmt 0)

But analytics and referrer information are already problematic enough. What is the upside of putting this in the spec?

I see it as very positive from a security perspective.

Suppose you have, e.g., a stock trading app. If you link to a company's website and the referrer gets sent, it tells others that the stock is about to get attention.
It gets worse when, e.g., session IDs get stored in URLs, when intranets reveal their URLs to the outside, etc.
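To make the rule concrete: the user agent checks the link's rel keywords and simply leaves the Referer header out when noreferrer is among them, so a session ID in the referring URL never leaves the building. A minimal Python sketch, assuming a hypothetical request_headers() helper and a made-up intranet URL; rel keywords are treated case-insensitively here.

```python
def request_headers(link_rel, referrer_url):
    """Build the headers a user agent would send when following a link.

    link_rel is the link's rel attribute (space-separated keywords) and
    referrer_url is the address of the page that contains the link.
    """
    headers = {}
    keywords = link_rel.lower().split()  # link keywords are case-insensitive
    if "noreferrer" not in keywords:
        # HTTP spells the header "Referer"; that misspelling is standard.
        headers["Referer"] = referrer_url
    return headers


page = "https://intranet.example.com/trading?session=12345"  # made-up URL
print(request_headers("nofollow", page))    # {'Referer': 'https://intranet.example.com/trading?session=12345'}
print(request_headers("noreferrer", page))  # {}
```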

swa66

11:49 am on Jan 23, 2008 (gmt 0)

But it's misleading to say tag soup remains an option

The examples show a lot of unclosed tags, mixed upper and lower case, etc. It made me sick.

mattur

2:10 pm on Jan 23, 2008 (gmt 0)

Tag soup is widely understood to mean non-conformant HTML (eg mismatched tags) and/or presentational HTML.

Leaving out optional closing tags in HTML does not make code non-conformant; some prefer lean and mean code, others prefer to add redundant closing tags for readability. The end result is exactly the same. HTML has different syntax rules to XML, but they're still rules. It shouldn't make anyone who understands this "sick" :)

Uppercase or lowercase element identifiers are allowed in HTML. Perhaps it would be better if the spec's examples used all lowercase tags. It may be an oversight in the current draft or maybe it's just to demonstrate it doesn't matter in HTML. It's really not that important. HTML allows you to use your own coding conventions, or you can use XHTML5 if you prefer to enforce XML syntax rules on your code.

IanKelley

11:50 pm on Jan 23, 2008 (gmt 0)

some prefer lean and mean code

Which makes a lot of sense if you're writing HTML for low bandwidth situations like cell phones. Closing tags and redundant use of /> might make the more tense <grin></grin> among us feel better but they also add a lot of overhead.

swa66

12:53 pm on Jan 24, 2008 (gmt 0)

Although it gets rather far from the subject, the implicit closing of tags leads to people not knowing where the tags get closed. And that leads to questions in, e.g., the CSS forum out here about why their CSS isn't applied to text that (unknowingly) ended up outside the element. A typical case is a <ul> implicitly closing a <p>.
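Here is a toy Python model of that <p> case. It applies only the rule from the draft's optional-tags section that certain block-level start tags (including <ul>) end an open <p>, and it writes every end tag out explicitly so you can see where the paragraph really stops. The normalize() function and its regex tokenizer are illustrative assumptions; a real HTML5 parser handles far more cases (li, tables, formatting elements and so on).

```python
import re

# Start tags that end an open <p>, per the draft's optional-tags rules.
P_CLOSERS = {"address", "blockquote", "dl", "fieldset", "form", "h1", "h2", "h3",
             "h4", "h5", "h6", "hr", "menu", "ol", "p", "pre", "table", "ul"}
# Elements that never get an end tag (a partial list, enough for the demo).
VOID = {"br", "hr", "img", "input", "link", "meta"}
# A tag, a run of text, or a lone "<" that doesn't start a tag.
TOKEN = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>|[^<]+|<")


def normalize(markup):
    """Rewrite markup with every implied </p> (and other end tag) made explicit."""
    out, stack = [], []
    for m in TOKEN.finditer(markup):
        is_end, name = m.group(1), (m.group(2) or "").lower()
        if not name:                      # plain text
            out.append(m.group(0))
        elif is_end:                      # end tag: close up to the matching element
            if name in stack:
                while True:
                    top = stack.pop()
                    out.append(f"</{top}>")
                    if top == name:
                        break
        else:                             # start tag
            if name in P_CLOSERS and stack and stack[-1] == "p":
                stack.pop()               # the new block element ends the paragraph
                out.append("</p>")
            out.append(m.group(0))
            if name not in VOID:
                stack.append(name)
    out.extend(f"</{open_name}>" for open_name in reversed(stack))  # close what's left
    return "".join(out)


print(normalize('<p class="intro">Intro<ul><li>One</ul>'))
# <p class="intro">Intro</p><ul><li>One</li></ul>
```

Because the <ul> ends up as a sibling of the paragraph rather than inside it, a rule such as p.intro li matches nothing, which is exactly the kind of "my CSS doesn't apply" question described above.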

As for bandwidth on phones ... I'm seeing offerings out here of 3.6Mbps downstream and even talk of 7.2Mbps down and 2Mbps upstream. I guess that's plenty of room for the one extra byte of a "/", or the 4 bytes of a </p>. If you care that much about the bandwidth, use gzip.
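The gzip point is easy to check. A rough Python sketch comparing a made-up 200-item list written with and without the optional </li> end tags; it measures only transfer size and says nothing about decompression time on a phone CPU.

```python
import gzip


def sizes(markup):
    """Return (raw bytes, gzip-compressed bytes) for a piece of markup."""
    raw = markup.encode("utf-8")
    return len(raw), len(gzip.compress(raw))


# Made-up test page: a 200-item list, with and without the optional </li>.
lean = "<ul>\n" + "".join(f"<li>Item {i}\n" for i in range(1, 201)) + "</ul>\n"
verbose = "<ul>\n" + "".join(f"<li>Item {i}</li>\n" for i in range(1, 201)) + "</ul>\n"

lean_raw, lean_gz = sizes(lean)
verbose_raw, verbose_gz = sizes(verbose)
print("raw difference: ", verbose_raw - lean_raw)  # 1000 (5 bytes of </li> times 200)
print("gzip difference:", verbose_gz - lean_gz)
```

The raw difference is exactly 1,000 bytes; after compression the repeated end tags should cost much less than that, since deflate encodes each repeat as a short back-reference.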

mattur

3:10 pm on Jan 24, 2008 (gmt 0)

There are always going to be beginners writing HTML who don't understand HTML. The syntax rules of the language can't fix that. I don't think requiring people to learn HTML in order to use it is particularly onerous; in fact, I think it's unavoidable ;) If you feel strongly about this issue you could make your case on the public-html list. However, Hixie has previously explained the rationale:

There are some members of the group... who want a limited allowed syntax that doesn't give much flexibility in how you express a DOM. I understand that. However, there are other members of the group who want to be able to use a different syntax.
Both of these desires are, IMHO, reasonable, and both have, IMHO, good reasons behind them. However, I can't make both groups happy, since the two syntaxes are mutually exclusive.
The compromise solution is to make the HTML language allow both, and allow Web authors to add further constraints on themselves in the production process... That way, you get to write documents how you want them, and the other group gets to write documents how _they_ want them.

IMHO maintaining backward-compatibility with existing HTML content outweighs any potential advantage of enforcing a particular code-style. A valid HTML4 document will not become non-conforming in HTML5 merely because it uses the leaner syntax. As above the XHTML5 option is available for people who need/prefer it.

Like other code-style debates (eg tabs vs spaces, curly brackets on same vs new line), leaving the decision to the coder appears to be the best option.

httpwebwitch

4:16 pm on Jan 24, 2008 (gmt 0)

Fascinating, but this is going to take me a few days to digest. A question for those who have already pored through it: is HTML5 a radical departure from 4? What will the average developer need to know, and how will HTML5 change the way they mark up content or manipulate the DOM?

My major concern is whether libraries like MooTools or jQuery will be affected by a DTD upgrade.

pageoneresults

5:07 pm on Jan 24, 2008 (gmt 0)

3.3. Documents and document fragments
3.3.1. Semantics
[w3.org...]

The above is where most SEOs would probably want to start, as it discusses semantics, structure and all the "content" you'll be working with.

These are the new buzzwords for HTML5 from an SEO viewpoint...

3.3.3. Kinds of content
3.3.3.1. Metadata content
3.3.3.2. Prose content
3.3.3.3. Sectioning content
3.3.3.4. Heading content
3.3.3.5. Phrasing content
3.3.3.6. Embedded content
3.3.3.7. Interactive content

You'll want to familiarize yourself with the new classifications of content as it all relates back to Semantics. There is a place for everything and everything should be in its place. :)

I'm not real fond of all the optional closing elements. I've never taken that path and will always "fully" close my elements. I prefer Strict Mode all the way.

tedster

5:31 pm on Jan 24, 2008 (gmt 0)

Some older browsers used to act up when "optional" end tags were omitted. The minute I discovered that, I started my strict habits too. That's not something I would give up lightly, either. Every time I need to work on code that is not written like this, it adds more time to the process. And in some rare cases, it still seems like the end tag can fix a browser bug.

mattur

5:41 pm on Jan 24, 2008 (gmt 0)

Is HTML5 a radical departure from 4?

It's an evolution rather than a revolution (see the HTML5 Design Principles [w3.org]). Theoretically a valid HTML4 document will also be conformant HTML5, though some presentational attributes (eg border on images) are removed.

My major concern is whether libraries like MooTools or jQuery will be affected by a DTD upgrade.

That's a good question! I think the DOM will be backwards compatible (ie it will generate the same standard DOM tree as in HTML4 so existing DOM manipulation scripts will not need to be re-written) but I suspect it will depend more on the way a certain browser implements HTML5...

The main thing HTML5 will mean for developers is a richer toolset (eg canvas) and fewer workarounds for basic functionality (eg an input type for email).

pageoneresults

5:29 am on Jan 25, 2008 (gmt 0)

I've been reading most of the day and HTML 5 is fascinating to say the least. From an SEO perspective, those in the know will have embraced HTML 5 and be light years ahead of their competition. It is all about Semantics and Structure. Actually, it always has been. This is just the next level.

I'm getting ready to print that document so I decided to run some of my own tests on it. Whew, you ready for this?

Type          Files   Size (bytes)
HTML          1       2,032,139
CSS           1       2,259
Images        3       57,828
CSS Images    1       1,059
Total         6       2,093,285

Text to HTML Ratio: 50.78%

To view the source of that document is to view a true HTML masterpiece.

220,426 Words
1,651,382 Characters (No Spaces)
1,985,222 Characters (With Spaces)
46,917 Lines
957 Longest Line

And...

This Page Is Valid HTML 4.01 Strict!

I was hoping to see an override and it might have read...

This Page Is Valid HTML 5.0 Strict!

pageoneresults

3:01 pm on Jan 25, 2008 (gmt 0)

I'm going to keep this one alive!

It's an evolution rather than a revolution.

It sure is. I'm still reading. Got about 2.0 miles to go and I should be done and fully conversant on HTML 5. ;)

From a Webmaster perspective, I'm referring to this as SEO 5.0. If you are an SEO and have not fully understood the markup you are working with, the future may be grim for you. As technology evolves, and it does at a rate that is phenomenal, semantics become an even greater part of the equation.

HTML 5 is almost like a "back to basics" approach but with a greater depth of refining the meaning and structure of documents. Get ready folks, your SEO landscape may be changing soon. You'll need to familiarize yourself with all those neat little buttons in your WYSIWYG interfaces. And, you're going to need to customize them to reflect the elements that you'll be working with out of the box.

Content writers will now need to be "strict" in their document structure. Writing an article will become an exact science. You won't just cut and paste something from Word and plop it on the web, that may not work anymore. You'll need to understand each and every element you have available to you and when to use them. CMS will take on a whole new meaning. Text Editors will now have buttons for all sorts of things. When finished with a document, you may have used over 50 different elements to describe the meaning of that document, the semantics of it all. :)

Note: Do not attempt to print the HTML 5 Working Draft unless you have a commercial grade printer and a couple of reams of paper. That document is an absolute monster just waiting to fill up your print queue!

Instead, bookmark the main document and then bookmark the various fragment identifiers (the sections) that are of interest to you. There is a right way to read W3C documents: the pages are constructed so that you can easily browse them using those fragment identifiers.

A Note to the Authors of the Working Draft

I'm not real jazzed that the authors would use examples that have no closing elements, and then use examples that do have them. That should confuse quite a few people who don't understand that those closing elements are optional. I would have been strict about using closing elements in the visual examples, always pointing out that they were optional. That said, I think optional closing elements open up a can of worms for many. One improperly closed element within a nest of elements that don't use closing tags is going to wreak havoc! ;)

For those of you wondering what the primary differences are between HTML 4 and HTML 5, go here...

HTML 5 differences from HTML 4
[w3.org...]

httpwebwitch

6:14 pm on Jan 25, 2008 (gmt 0)

And in some rare cases, it still seems like the end tag can fix a browser bug

case in point:

<script src="myscript.js"></script>

behaves differently than

<script src="myscript.js" />
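The reason is that in text/html the trailing slash on a non-void element such as script is ignored, so the element stays open until the next </script> and swallows whatever markup follows. A toy Python model of that behaviour; script_content() is a hypothetical helper that examines only the first <script>, and attribute values containing ">" would confuse its regex.

```python
import re


def script_content(markup):
    """Return the text a browser would put inside the first <script> element.

    In text/html the "/" in <script ... /> is ignored, so the content runs to
    the next </script>, or to the end of the input if there is none.
    """
    start = re.search(r"<script\b[^>]*>", markup, re.IGNORECASE)
    if start is None:
        return None
    rest = markup[start.end():]
    end = re.search(r"</script\s*>", rest, re.IGNORECASE)
    return rest if end is None else rest[:end.start()]


print(repr(script_content('<script src="myscript.js"></script><p>Hello</p>')))  # ''
print(repr(script_content('<script src="myscript.js" /><p>Hello</p>')))       # '<p>Hello</p>'
```

In the second case the paragraph is treated as script text, so it never renders as a paragraph at all.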

Anyone who didn't start closing all their tags a few years ago has some catching up to do. I won't accept work that wasn't developed using strict DOCTYPE, and if I see an unclosed <br> or <img> I send it back to the kitchen.

httpwebwitch

6:28 pm on Jan 25, 2008 (gmt 0)

Loving the new set of tags! <nav>, <aside>, <section>... all useful. I also got a chuckle out of the new "irrelevant" attribute which hides content.

However I'm puzzled by the usefulness of this:


<input list=browsers>
<datalist id=browsers>
 <option value="Safari">
 <option value="Internet Explorer">
 <option value="Opera">
 <option value="Firefox">
</datalist>

Is it different from <select> + <option>? I guess I'll wait to see how it gets used... perhaps I can bind the same <datalist> to more than one <input>!

Bidding a fond farewell to <center>, <font>, <strike>, and <u>... Why get rid of <u> and not <b> and <i>? I'll need to take some time off to grieve, then I'll change <strike> to <span class="strikeout">

And if they're getting rid of <table> attributes cellpadding and cellspacing, then browsers had better start applying CSS properly for those properties!

My favourite change of all: the classList accessor for HTMLElement, with methods has(), add(), remove(), and toggle().
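For those who haven't tried it: classList treats the class attribute as a set of space-separated tokens, so there's no more string munging on className. A Python sketch of that behaviour using the method names quoted above (has, add, remove, toggle); the ClassList class is hypothetical and not the DOM API, and error cases such as empty or whitespace-containing tokens are not modeled.

```python
class ClassList:
    """Toy model of classList over a class attribute string (not the DOM API).

    Method names follow the draft as quoted: has(), add(), remove(), toggle().
    """

    def __init__(self, class_attr=""):
        self._tokens = class_attr.split()

    def has(self, token):
        return token in self._tokens

    def add(self, token):
        if token not in self._tokens:  # no duplicates
            self._tokens.append(token)

    def remove(self, token):
        self._tokens = [t for t in self._tokens if t != token]

    def toggle(self, token):
        """Remove token if present, otherwise add it; return True if it is now present."""
        if self.has(token):
            self.remove(token)
            return False
        self.add(token)
        return True

    def __str__(self):
        return " ".join(self._tokens)


classes = ClassList("menu active")
classes.toggle("active")   # returns False: "active" was removed
classes.add("strikeout")
print(str(classes))        # menu strikeout
```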

JAB Creations

7:58 pm on Jan 25, 2008 (gmt 0)

<!DOCTYPE HTML>

Lacking a version of HTML implies it is the last version of HTML as it can no longer be identified: Rejected. I'll spend time looking at the rest of the specification after they fix this. Until then I'll use XHTML 1.1 and add XHTML 5 support via the correct media type if anything interesting turns up.

- John

pageoneresults

8:33 pm on Jan 25, 2008 (gmt 0)

8.1.2.4. Optional tags

Certain tags can be omitted.
An html element's start tag may be omitted if the first thing inside the html element is not a space character or a comment.
An html element's end tag may be omitted if the html element is not immediately followed by a space character or a comment.
A head element's start tag may be omitted if the first thing inside the head element is an element.
A head element's end tag may be omitted if the head element is not immediately followed by a space character or a comment.
A body element's start tag may be omitted if the first thing inside the body element is not a space character or a comment, except if the first thing inside the body element is a script or style element.
A body element's end tag may be omitted if the body element is not immediately followed by a space character or a comment.
A li element's end tag may be omitted if the li element is immediately followed by another li element or if there is no more content in the parent element.
A dt element's end tag may be omitted if the dt element is immediately followed by another dt element or a dd element.
A dd element's end tag may be omitted if the dd element is immediately followed by another dd element or a dt element, or if there is no more content in the parent element.
A p element's end tag may be omitted if the p element is immediately followed by an address, blockquote, dl, fieldset, form, h1, h2, h3, h4, h5, h6, hr, menu, ol, p, pre, table, or ul element, or if there is no more content in the parent element.
An optgroup element's end tag may be omitted if the optgroup element is immediately followed by another optgroup element, or if there is no more content in the parent element.
An option element's end tag may be omitted if the option element is immediately followed by another option element, or if there is no more content in the parent element.
A colgroup element's start tag may be omitted if the first thing inside the colgroup element is a col element, and if the element is not immediately preceded by another colgroup element whose end tag has been omitted.
A colgroup element's end tag may be omitted if the colgroup element is not immediately followed by a space character or a comment.
A thead element's end tag may be omitted if the thead element is immediately followed by a tbody or tfoot element.
A tbody element's start tag may be omitted if the first thing inside the tbody element is a tr element, and if the element is not immediately preceded by a tbody, thead, or tfoot element whose end tag has been omitted.
A tbody element's end tag may be omitted if the tbody element is immediately followed by a tbody or tfoot element, or if there is no more content in the parent element.
A tfoot element's end tag may be omitted if the tfoot element is immediately followed by a tbody element, or if there is no more content in the parent element.
A tr element's end tag may be omitted if the tr element is immediately followed by another tr element, or if there is no more content in the parent element.
A td element's end tag may be omitted if the td element is immediately followed by a td or th element, or if there is no more content in the parent element.
A th element's end tag may be omitted if the th element is immediately followed by a td or th element, or if there is no more content in the parent element.
However, a start tag must never be omitted if it has any attributes.

I had to paste the entire section as it really brings to light a very volatile editing environment if you decide to use optional end tags. Look at the list above, and then review when you can use an optional end tag. Imagine relaying those rules to your editing team! And then envision just one person misinterpreting the use of an end tag somewhere and bringing down the whole house of cards. :)
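One way to keep an editing team from tripping over these rules is to have a tool check them instead of relying on memory. A Python sketch that encodes a subset of the list above (li, dt, dd, p, option, tr, td, th) as data; the tables and the end_tag_may_be_omitted() helper are hypothetical, the html/head/body/optgroup/colgroup/thead/tbody/tfoot rules are left out, and "followed by" means followed by an element, not text.

```python
# Subset of the draft's end-tag rules: element -> start tags that may directly
# follow it when its end tag is left out.
OMIT_END_BEFORE = {
    "li": {"li"},
    "dt": {"dt", "dd"},
    "dd": {"dd", "dt"},
    "p": {"address", "blockquote", "dl", "fieldset", "form", "h1", "h2", "h3",
          "h4", "h5", "h6", "hr", "menu", "ol", "p", "pre", "table", "ul"},
    "option": {"option"},
    "tr": {"tr"},
    "td": {"td", "th"},
    "th": {"td", "th"},
}
# Elements whose end tag may also be dropped when nothing else follows in the parent.
OMIT_AT_PARENT_END = {"li", "dd", "p", "option", "tr", "td", "th"}


def end_tag_may_be_omitted(element, next_tag):
    """True if the draft allows leaving out element's end tag.

    next_tag is the element that immediately follows, or None when there is no
    more content in the parent. Anything not in the tables returns False, which
    is a conservative default rather than the spec's answer.
    """
    if next_tag is None:
        return element in OMIT_AT_PARENT_END
    return next_tag in OMIT_END_BEFORE.get(element, set())


print(end_tag_may_be_omitted("p", "ul"))   # True
print(end_tag_may_be_omitted("p", "div"))  # False: div isn't on the draft's list
print(end_tag_may_be_omitted("dt", None))  # False: a dt needs a following dt or dd
```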

IanKelley

9:06 pm on Jan 25, 2008 (gmt 0)

Although it gets rather far from the subject, the implicit closing of tags leads to people not knowing where the tags get closed.

Nevertheless, for those people who DO know how the language works it's a reasonable consideration.

As for bandwidth on phones ... I'm seeing offerings out here of 3.6Mbps downstream

This would be basically the same as ignoring dial-up connection speeds a few years ago when broadband was starting to take over. Just because higher connection speeds are available does not mean that everyone has those speeds.

gzip

Cell phones have tiny little CPUs ;-)

httpwebwitch

11:31 pm on Jan 25, 2008 (gmt 0)

<blink>
question: as they hash out these specs, do the W3C folks have a user agent in vitro they use to test whether their ideas are hot or not?
</blink>

mattur

5:58 pm on Jan 26, 2008 (gmt 0)

<!DOCTYPE HTML> Lacking a version of HTML implies it is the last version of HTML as it can no longer be identified: Rejected.

CSS lacks a version identifier. Have you "rejected" CSS? ;)

I'll spend time looking at the rest of the specification after they fix this.

The only reason there is a doctype at all in HTML5 is to trigger standards compliance mode in browsers. Browsers do not have separate layout engines for HTML 2.0, HTML 3.2, HTML 4.01, etc.; they have just one layout engine for "HTML". The HTML5 spec aims to define how this "HTML" layout engine works.
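A drastically simplified Python sketch of that doctype sniffing, modeling only two cases: no doctype at all gives quirks mode, and the bare <!DOCTYPE html> (in any case) gives standards mode. The rendering_mode() helper is hypothetical; real browsers also recognise a long list of legacy public identifiers, which this sketch just reports as "not modeled".

```python
import re


def rendering_mode(document_start):
    """Guess the rendering mode from the start of a document (drastically simplified)."""
    text = document_start.lstrip()
    if not text.lower().startswith("<!doctype"):
        return "quirks"       # no doctype at all
    if re.match(r"<!doctype\s+html\s*>", text, re.IGNORECASE):
        return "standards"    # the bare HTML5 doctype
    return "not modeled"      # legacy public identifiers etc.


print(rendering_mode("<!DOCTYPE HTML>\n<title>Test</title>"))  # standards
print(rendering_mode("<html><body>Hello"))                     # quirks
```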

pageoneresults

6:08 pm on Jan 26, 2008 (gmt 0)

They've eliminated the <acronym> element...

The following elements are not included because they have not been used often, created confusion or can be handled by other elements:
acronym is not included because it has created lots of confusion. Authors are to use abbr for abbreviations.

I think I am capable enough to determine what is an Acronym and what is an Abbreviation. But, I don't mind one less tag for this functionality.

And, it's nice to see that they've finally provided distinct separation between <b> vs <strong> and <i> vs <em>...

The b element now represents a span of text to be stylistically offset from the normal prose without conveying any extra importance, such as key words in a document abstract, product names in a review, or other spans of text whose typical typographic presentation is emboldened.

The strong element now represents importance rather than strong emphasis.

The i element now represents a span of text in an alternate voice or mood, or otherwise offset from the normal prose, such as a taxonomic designation, a technical term, an idiomatic phrase from another language, a thought, a ship name, or some other prose whose typical typographic presentation is italicized. Usage varies widely by language.

I've been using all four elements when appropriate. Many of the latest versions of WYSIWYG Editors have it wrong out of the box. Everything is getting wrapped in <strong> or <em> when pressing the B and I buttons. That's not right.

mattur

6:20 pm on Jan 26, 2008 (gmt 0)

<input list=browsers>
<datalist id=browsers>
<option value="Safari">
<option value="Internet Explorer">
<option value="Opera">
<option value="Firefox">
</datalist>
Is it different from <select> + <option>?

It's a combo box - select from the drop down list or enter text that isn't on the list.
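In other words, the datalist only supplies suggestions, while a <select> restricts the user to the listed values. A Python sketch of that difference using the browser list above; both helpers are hypothetical, and prefix matching for suggestions is an assumption (browsers pick their own matching rules).

```python
BROWSERS = ["Safari", "Internet Explorer", "Opera", "Firefox"]


def suggestions(typed, options):
    """Options offered as the user types (prefix match, case-insensitive; an assumption)."""
    prefix = typed.lower()
    return [o for o in options if o.lower().startswith(prefix)]


def submitted_value(typed, options, control):
    """What the form can submit: <select> only listed values, <input list=...> anything."""
    if control == "select":
        return typed if typed in options else None
    return typed  # the datalist only suggests; free text is still allowed


print(suggestions("o", BROWSERS))                        # ['Opera']
print(submitted_value("Konqueror", BROWSERS, "input"))   # Konqueror
print(submitted_value("Konqueror", BROWSERS, "select"))  # None
```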

do the W3C folks have a user agent in vitro...?

The driving force behind HTML5 remains the WHATWG which includes people from Mozilla, Opera and Apple, and all three have started implementing HTML5 features [en.wikipedia.org] - Opera has implemented all the new web controls and leads the pack.

Chris Wilson from Microsoft has said:

"Microsoft strongly supports the effort within W3C to develop HTML5 and we plan to continue to be deeply involved in this work. While we support the release of a working draft of HTML5 at this time, we note that the current draft covers a number of deliverables that are outside the scope of the current HTML WG charter..."

and this is the nearest we've come to an official statement from Microsoft on its approach to HTML5.

The plan is to develop javascript/CSS shims to provide (some) backwards compatibility for older browsers (i.e. IE) in a similar way to Dean Edwards' IE7 compatibility library. So theoretically we won't have to wait until IE7 dies out before using new features.

daveVk

11:51 pm on Jan 26, 2008 (gmt 0)

gzip

Cell phones have tiny little CPUs ;-)

They can handle compressed images, why not compressed text?

IanKelley

12:08 am on Jan 27, 2008 (gmt 0)

I can't speak for all cell phones everywhere, but my $500 smartphone doesn't have the extra processor power to unzip while rendering a webpage without taking more time than would be saved by the compression.</offtopic>