W3C Validation: Paragraphs, Lists, and Tables

Proper use of paragraph tags in the body of your document

synergy

5:17 am on Aug 11, 2004 (gmt 0)

I'm taking the step towards absolute validation of all of my HTML documents.

In my document, I use several lists within the content, along with a table of data.

When validating my page as XHTML 1.0 Transitional, the validator kicks back errors when I try to encompass lists or tables with paragraph tags.

For instance:

<p>On this site you can learn about:
<ul>
<li>This; and</li>
<li>That</li>
</ul>
</p>

Does not validate. It wants the paragraph to be closed before the list starts, when the list is obviously a part of the first paragraph of the document.

What is the proper use of the <p> tag? I guess a list is not part of the paragraph that describes/leads to it? This makes no sense to me.

Please enlighten me :)
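
For reference, one arrangement that should pass validation as XHTML 1.0 Transitional is to close the paragraph before the list begins, because a <p> may hold only inline content. This is just a sketch of the same content; the wrapping <div> and its class name are assumptions, added only to show how the lead-in and the list could still be grouped together if needed.

```html
<!-- Sketch: the paragraph ends before the list starts -->
<div class="intro"> <!-- hypothetical, optional wrapper -->
  <p>On this site you can learn about:</p>
  <ul>
    <li>This; and</li>
    <li>That</li>
  </ul>
</div>
```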

Hester

3:52 pm on Aug 18, 2004 (gmt 0)

This makes sense as I imagine the writers of the early W3C specs were imagining a lot of people converting documents from paper on to the web. Having said that it's a shame HTML is still so poor in relation to print. There's still a lot you can't do easily, if at all.

Gusgsm

4:14 pm on Aug 18, 2004 (gmt 0)

Yes, it makes sense as you explain it. I thought they were thinking more in a linguistic / meaning direction.

Thanks :)

ergophobe

4:37 pm on Aug 18, 2004 (gmt 0)

I'm not an SGML guy at all. I also don't know how much the html recs owe to pre-existing markup languages derived from SGML, which would have been designed for an ink-on-paper world in the first place.

SGML allows for SHORTTAG and OMITTAG to be set to yes (in XML and XHTML, this is set to no, but that is not true for all SGML-derived markup). That means that you can have

- unclosed start and end tags
- empty start and end tags

By SGML definitions, a block-level element with no end tag is closed by the next block-level element.
So that means that the parser has to either look ahead or risk leaving orphaned text. For example,

<p>text block 1
<ul><li>some list item
text block 2
</ul>
text block 3

Clearly, the "text block 2" belongs to the list item, but what does "text block 3" belong to? In XML/XHTML, it belongs to the <p> element whose end tag should come at some point, but in an SGML-derived language with SHORTTAG yes and OMITTAG yes, it is orphaned because the <p> element was closed by the <ul> tag.
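
As a rough illustration of the point above, this is how the fragment reads once the omitted end tags are written out. It assumes an HTML 4-style DTD in which the end tags of <p> and <li> are optional; the comments are annotations, not part of the original example.

```html
<!-- Sketch of the implied structure, with the omitted end tags made explicit -->
<p>text block 1</p>    <!-- implied </p>: a <p> cannot contain a <ul> -->
<ul>
  <li>some list item
  text block 2</li>    <!-- implied </li> just before </ul> -->
</ul>
text block 3           <!-- inside no <p> at all: the "orphaned" text -->
```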

So I think that basically what you have are style manuals, which record the practice of editors and compositors, then SGML which drew on the practice at a given time (1960s and 1970s) to make a language for document description and presentation, and then HTML is derived from SGML. So going from the CMS definition to the SGML implementation to the HTML Recs, the paragraphing rules make more sense, no?

Tom

Beau

4:56 pm on Aug 18, 2004 (gmt 0)

I'm very interested in this.

As someone who has previously only ever built sites in WYSIWYG mode this is my first attempt at coding (and using CSS for positioning).

I thought it was going ok(ish) until I went to validate a page. The CSS validated fine but the HTML - oops!

So I worked through until I only have 1 error coming up (numerous times but the same error):

<div id="centrebox">
<p></p>
<H2>Beds</H2>
<H3>
<span class="pictures">
<a href="my.htm" class="img" onClick="popitup(this.href,'console',360,680);return false;"target="_blank"><img src="my_thumb.jpg" name="my_thumb" width="110" height="110" border="0" alt="Click for more details"></a>
</span>
<span class="centrerightbox">
<a href="my.htm" onClick="popitup(this.href,'console',360,680);return false;"target="_blank">Description</a>
</span>
<div style="clear: both;"></div>
/*this is then repeated x times down the page for different products*/
</div>

The problem that the validator has is the:
<div style="clear: both;"></div>
It says "document type does not allow element "DIV" here" etc etc

So what should I replace it with?

Also after reading tedster's comments:

Excessive use of <span> tags is often a sign of "street code" - and yes, misuse of <p> or &nbsp; or <br> are often rampant, as well as nesting errors, completely redundant mark-up and so on.

have I gone from one lot of street code to another (even if it mainly validates)...

After all - the <span> tags in there were originally <div> tags but the validator didn't like them.

And lack of <p> mark-up. I had loads but the validator didn't like that either so I've replaced them with <br> - not exactly the same I know but it validates. example:
<H3>
<p></p>
<p>text<a href="link.htm" target="_top">link_text</a>text</p>
</H3>
<H5>
<p>Text</p>
<p>Text</p>
<p>Text</p>
</H5>
became:
<H3>
<br>
<br>text<a href="link.htm" target="_top">link_text</a>text</H3>
<H5>
<br>Text
<br>Text
<br>Text
</H5>

So is the second one correct or how should I be writing it?

My apologies if these seem like basic questions/errors. I am not a professional web designer - could you guess ;-) - and the pages all load / work fine but I would prefer to learn how to write correct code.

TIA
Beau

bedlam

6:15 pm on Aug 18, 2004 (gmt 0)

I find that slightly odd in that a div isn't declaring the text to be a specific thing like a paragraph or an address, which there are tags for. It just says it's a 'division' on the page.

Well, HTML is, in a way, a very limited markup language. It really can't hope to adequately describe every kind of content that an author or developer might need to mark up. The solution with HTML is to resort to the most generic relevant element available. A good example of this is the way CSS menus are being built these days; they're often marked up as lists and styled appropriately. Is a navigation menu a list? Arguably, yes. But if it is, it's an extremely specialized species of list - and it doesn't matter, because <ul> is the closest you can get.

So from there, it's simple to imagine an element on a page that - for example - needs to contain text and a list, or needs to contain text, but also can't work within the limitations of the <p> tag...

-B

tedster

6:57 pm on Aug 18, 2004 (gmt 0)

Beau, in the code you posted, the <H3> tag is still open - that's why a <div> is not allowed in that spot. Certain tags cannot contain certain other tags; for example, an Hn tag cannot contain a <div>.

One thing you probably want to study up on is the difference between a block-level tag, like <div> or <h3> or <p>, and an inline tag, like <span> or <a> or <strong>.

Here's a place to start:

[htmlhelp.com...]
[htmlhelp.com...]

[w3.org...]
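
To make the nesting rule concrete, here is one possible restructuring of a single product entry from the markup above. It is only a sketch: the heading text "Product name" is a placeholder, the onClick and target attributes are left out for brevity, and swapping the spans for divs is an assumption about what the layout needs. The point is that the <h3> is closed before any block-level element starts.

```html
<div id="centrebox">
  <h2>Beds</h2>
  <!-- One product entry; repeat this group for each product -->
  <h3>Product name</h3> <!-- placeholder text; heading closed before the divs -->
  <div class="pictures">
    <a href="my.htm" class="img"><img src="my_thumb.jpg" width="110" height="110" alt="Click for more details"></a>
  </div>
  <div class="centrerightbox">
    <a href="my.htm">Description</a>
  </div>
  <div style="clear: both;"></div> <!-- allowed here: its parent is a div, not a heading -->
</div>
```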

<added later>
Looking at that mark-up, I wonder whether that H3 and H5 are really headings at all - they look more like content, rather than headings.

H tags are the structural outline or framework of a document. In almost all cases, each H tag would be followed by pure content (back to your lists, paragraphs and tables topic). The H tags do not CONTAIN the content, they just "introduce" it with a short line of text - a heading.
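
Applied to the second example in the post above, that idea might look like the following. This is a sketch only: "Section heading" is placeholder text, and whether these lines belong under one heading is an assumption.

```html
<h3>Section heading</h3> <!-- short introductory line only -->
<!-- The content follows the heading instead of sitting inside it -->
<p>text<a href="link.htm" target="_top">link_text</a>text</p>
<p>Text</p>
<p>Text</p>
<p>Text</p>
```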

Beau

8:58 pm on Aug 18, 2004 (gmt 0)

Thanks Tedster:

I now understand what I was doing wrong (quite a lot!) and my pages are validating fine.

Just got to work through and apply all the changes now.

Cheers
Beau

HarryM

10:45 pm on Aug 18, 2004 (gmt 0)

H tags are the structural outline or framework of a document.

I agree with Tedster. H tags were originally a sensible method of indicating that the contained text was a heading, with H1 defining the main heading, H2 the next sub-heading, etc. Unfortunately HTML will validate if you include whole paragraphs of text within H tags which rather defeats the object. (Although to be fair, some headings in academic documents can be quite lengthy.)

From a visual point of view H tags are redundant because you can achieve the same visual effect with P or SPAN. However they have become an industry standard because some Search Engines use them to help define what the page is about. In fact the SEs probably apply stricter criteria than W3C by penalizing their misuse.

What would be good would be to see W3C pick up on this and exclude the use of Tags such as <br /> within H tags. But as they have frozen further HTML standards, this will probably never happen.

Josefu

7:21 am on Aug 19, 2004 (gmt 0)

A very interesting read, thanks guys. One note: would someone care to define "street code"? It's not in your glossary yet : )

... and while I'm here: How can the <span> tag be 'abused' as you say?

g1smd

10:12 am on Aug 19, 2004 (gmt 0)

>> I meant what about divs with no other tags inside them - pure text. I even gave a clear example later on, which I will repeat here: <<

>> <div>Some text here.</div> <<

I would call that tag soup. I would use <p>Text Here</p> instead. I would only use a <div> if it were to enclose several paragraphs and apply a style or positioning to them all together as one.

<div>
<p> first </p>
<p> second </p>
<p> third </p>
</div>

g1smd

10:21 am on Aug 19, 2004 (gmt 0)

>> How can the <span> tag be 'abused' as you say? <<

Just find any typical FrontPage authored document and see the typical markup like:

which I would code by hand simply as:

<p class="whatever">Fax: +1 111 315 0441</p>

See the difference?

[edited typo]

[edited by: g1smd at 10:33 am (utc) on Aug. 19, 2004]

Hester

10:25 am on Aug 19, 2004 (gmt 0)

g1smd: I would call that tag soup. I would use <p>Text Here</p> instead. I would only use a <div> if it were to enclose several paragraphs and apply a style or positioning to them all together as one.

What if you wanted to place a line of text somewhere using absolute or fixed positioning? I would just use a div tag around the text - no need for paragraph tags?

HarryM: What would be good would be to see W3C pick up on this and exclude the use of Tags such as <br /> within H tags. But as they have frozen further HTML standards, this will probably never happen.

Why would they exclude the break tag? It makes sense if you have a long header in a short space and want to make sure it breaks at the right place.

Also, HTML standards have not been frozen, they have been replaced by XHTML. The latest draft for XHTML 2 has some interesting new ideas, such as a generic <h> tag for headers (no longer <h1> to <h6>, though those tags are still able to be used). It has a new <l> tag to define lines - breaks are no longer used. Also a new navigation list tag, and so on.

g1smd

10:39 am on Aug 19, 2004 (gmt 0)

My documents consist of headings, paragraphs, lists, tables and forms. I use CSS to style them. I style the html and body element with the default style for the page and I only use classes for any elements that are going to have a different style to the rest of the page.

By using headings, paragraphs, lists, tables, and forms my document has a structure.

I would only use a div or span to apply styles over and above that semantic structure.
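
A minimal sketch of that approach, for illustration. The font, colours, class names and text are all made up; only the pattern matters: defaults on body, a class where something differs, and a div only to group elements for shared styling.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <title>Example page</title>
  <style type="text/css">
    /* Default style for the whole page */
    body { font-family: Verdana, sans-serif; font-size: 90%; color: #333; }
    /* A class only where an element differs from the default */
    p.note { font-size: 80%; color: #666; }
    /* A div used purely to style several paragraphs as one unit */
    div.aside { border: 1px solid #999; padding: 0.5em; }
  </style>
</head>
<body>
  <h1>Page heading</h1>
  <p>An ordinary paragraph, styled by the body rule.</p>
  <p class="note">A paragraph that needs a different style.</p>
  <div class="aside">
    <p>first</p>
    <p>second</p>
  </div>
</body>
</html>
```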

Josefu

10:43 am on Aug 19, 2004 (gmt 0)

LOL I understand, though normally I like my soup with bigger chunks in it : )

Gusgsm

10:57 am on Aug 19, 2004 (gmt 0)

g1smd,

I agree. As a matter of fact, I try to use the W3C validator with the option that stresses that structural approach (the "Show Outline" option).

I find it quite useful to see a (structural) outline of a page (mine and others'). Too many times I make 'structural' mistakes :(

g1smd

11:04 am on Aug 19, 2004 (gmt 0)

Yes, tick the box for show outline at [validator.w3.org...] to get a heading summary.

Very useful. I use that all the time too.

py9jmas

11:10 am on Aug 19, 2004 (gmt 0)

The "Semantic data extractor" at
[w3.org...]
is also good. (When it works - it's giving me Java errors at the moment)

HarryM

10:35 pm on Aug 19, 2004 (gmt 0)

Why would they exclude the break tag? It makes sense if you have a long header in a short space and want to make sure it breaks at the right place.

I use it too for the same purpose, although maybe I shouldn't. Maybe my example was wrong. What I was intending to show was that W3C places few restrictions on the H tag to prevent it being used as just another text markup such as P.

>> <div>Some text here.</div> <<
I would call that tag soup

Why? It validates and is perfectly logical. The great thing about divs is that all browsers implement the tag correctly - i.e., to partition a document at the pixel level. With a CSS style for the div specifying the font and font size any contained text is defined with the minimum of html.

Putting unnecessary P tags in Divs just because it is believed to be the correct thing to do - that is tag soup.

HTML standards have not been frozen, they have been replaced by XHTML

If you take the 'not' out of that sentence it would be correct.

I use XHTML 1.0 Transitional. But I doubt I would ever go to Strict, let alone XHTML 2. Why would I want to commit internet suicide? There are enough problems with elderly browsers out there without compounding it. Whether we like it or not HTML 4 is becoming the industry standard, and in the commercial world, if it ain't broke why fix it?

Sorry, guys. Feeling testy tonight. Probably because I have just spent all day wrestling with a blockage in my drains. Whatever our great ideas, reality is always there... Anyone know a cheap plumber?

PatrickDeese

10:54 pm on Aug 19, 2004 (gmt 0)

This is my personal theory/philosophy.

I use <p> tags etc simply because I want the SE's bots to understand that the content is for human consumption. I use <p>'s for the same reason that I use <h1> - I want to label particular sections of the HTML document in a manner that I believe that the bots are most likely to interpret correctly.

Just like keyword stuffing in comment tags used to work, and no longer does - I apply the standard, traditional mark-up in my web pages in the hopes that they will be interpreted in the manner that they are intended to be interpreted.

Just my 2 cents.

tedster

11:05 pm on Aug 19, 2004 (gmt 0)

...HTML 4 is becoming the industry standard, and in the commercial world, if it ain't broke why fix it?

Agreed - but why not learn to write HTML4 strict, instead of transitional? You'll be amazed at how that deepens your understanding of all html. And how clean your mark-up becomes, with a nice high content to file size ratio.

synergy

11:25 pm on Aug 19, 2004 (gmt 0)

but why not learn to write HTML4 strict, instead of transitional?

Tedster, would you consider HTML4 strict the closest thing to XHTML? I've been using transitional XHTML for 1 year now. I'm almost afraid to go back because I'm so used to coding XHTML style.

The "Semantic data extractor" at
[w3.org...]
I would LOVE to try this tool out, too bad it's broke. Any other alternatives to this?

tedster

2:43 am on Aug 20, 2004 (gmt 0)

would you consider HTML4 strict the closest thing to XHTML?

It's not really accurate to say it's the closest thing to xhtml -- because it's not xhtml. But strict mark-up, with or without the "x" is really where the cutting edge lies.

I consider learning strict mark-up to be the most important shift I went through as a web author. Transitional mark-up, whether html or xhtml, is only a sneeze away from the tag soup that we had with html 3.2

I'd say if you need xhtml, then fine...but write strict xhtml. There's no virtue in writing xhtml transitional and thinking you've got the latest and greatest. It's not that hot, it's still a major compromise in many, many ways.
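
To give a feel for the difference, here are two separate body fragments, shown one after the other for comparison. The first uses things the HTML 4.01 Transitional DTD accepts but Strict rejects; the second is one possible Strict-friendly equivalent. File names and class names are made up for the illustration.

```html
<!-- Fragment 1: accepted by HTML 4.01 Transitional, rejected by Strict -->
<body>
  Loose text directly inside body
  <center><font size="2">Presentational elements</font></center>
  <a href="page.htm" target="_blank"><img src="pic.jpg" border="0" alt="Picture"></a>
</body>

<!-- Fragment 2: a Strict-friendly version, with presentation left to CSS -->
<body>
  <p>Text wrapped in a block-level element</p>
  <p class="centred small">Presentation moved to a stylesheet</p>
  <p><a href="page.htm"><img src="pic.jpg" alt="Picture"></a></p>
</body>
```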
