One man's journey towards valid code

         

Mohamed_E

3:45 pm on Mar 4, 2004 (gmt 0)




A couple of days ago my entire site (minus one page) validated to 4.01 Strict. It is clear that many of us are interested in validation; I would like to share the steps that I took, over a couple of years, to get there. Given the long time span of this process I have obviously forgotten many of the details.

I realize that few people use the set of tools that I use, but I hope that the main ideas will apply in any environment.

Getting Started

Initially my site was very simple. Each page had a title, then a series of headers, with paragraphs, lists and tables (of tabular material). A few bold and italics tags, and the odd superscript on dates. I used emacs as an editor, with the old (no longer available on the web) html-mode.el, which puts out most tags in pairs, so that unclosed tags should not occur.

Early in my reading on HTML I found the Web Design Group (WDG) site. Not only does it have an online validator, it also has an excellent page of links to other validators, through which I discovered Tidy. Since I am very comfortable with command line programs I do most of my validation with it.

Even with my simple structure I found errors. After modifying headers with global replace commands the validator found several lines like:


<h3>Header</h4>

Browsers had no problem with them, but that was obviously "wrong".
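A check like this is easy to script outside a full validator. The sketch below is only an illustration in Python (not the tools used in this post), and the function name mismatched_headings is made up for the example. It assumes headings are not nested and that each one is closed by some heading tag.

```python
import re

# An opening heading tag, its content, and whichever heading tag closes it.
HEADING = re.compile(
    r"<h([1-6])\b[^>]*>(.*?)</h([1-6])>",
    re.IGNORECASE | re.DOTALL,
)

def mismatched_headings(html):
    """Return (open_level, close_level, text) for headings whose levels differ."""
    return [
        (int(m.group(1)), int(m.group(3)), m.group(2))
        for m in HEADING.finditer(html)
        if m.group(1) != m.group(3)
    ]

sample = "<h3>Header</h4>\n<h2>Fine</h2>"
print(mismatched_headings(sample))  # [(3, 4, 'Header')]
```

A real validator catches far more than this; the point is only that the damage left by a global replace is mechanical enough to search for.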

I also got lots of complaints that I did not (at the time) understand along the lines of:


Unknown entity whatever

which I soon learned were the result of unescaped ampersands. When these were fixed I soon found it easy to produce consistently valid Transitional code.
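For anyone facing the same messages, here is a rough Python sketch of the fix: replace every ampersand that does not already start an entity or character reference. It is a heuristic, not what was actually run on this site, and the name escape_bare_ampersands is hypothetical. It will leave alone anything that happens to look like an entity (for example a query parameter written as &copy;), so the result still needs a pass through the validator.

```python
import re

# An ampersand that does NOT begin a named (&amp;), decimal (&#38;)
# or hexadecimal (&#x26;) reference terminated by a semicolon.
BARE_AMP = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)"
)

def escape_bare_ampersands(text):
    """Replace bare ampersands with &amp;, leaving existing references alone."""
    return BARE_AMP.sub("&amp;", text)

before = '<a href="page.cgi?a=1&whatever=2">Q&A</a> &amp; &#169;'
print(escape_bare_ampersands(before))
```

Running it twice gives the same result as running it once, since the &amp; it writes is itself skipped on the next pass.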

Adding Some Complexity

Soon I started adding some minimal complexity to my site: margins and a background color. Obviously I did this directly with HTML code, since it did not justify learning about CSS. But soon I wanted to add a navigation column to the side of each page. I had read enough here to know that I did not want to use tables for layout, so I slowly and painfully learned enough CSS to produce a two column layout that validated to Transitional. Many thanks to those of you who helped me through that learning process!

Once I was using CSS my thoughts turned to validating to Strict. Changing the DOCTYPE to Strict on a couple of files produced a discouraging number of error messages, many of which I did not immediately see how to deal with. So instead of attempting to convert the site, I started reading and attempting to make all new constructs valid according to Strict.

Converting to Strict

After about one year I felt confident that it was doable. I had identified two major non-Strict constructs:

  • All tables were centered using align="center" in the <table> tag
  • All blockquotes had loose text in them; Strict requires it to be encapsulated in a block element (paragraph).

Fortunately both of these problems could be corrected with an editing script (I used sed). After that I had a couple of dozen minor things to clean up; all in all it took about half a day to make the transition. One problem remained: I wanted to use target="_blank", which is not valid Strict, and did not like any of the workarounds. So I exported the files to my server with a DOCTYPE of Transitional, but on my workstation I used Strict to be sure that no other non-Strict constructs would creep in.
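The sed scripts themselves are not shown here, so the following is a Python stand-in for the two mechanical fixes, not the original script. The function names are invented for the example. It assumes centering moves to the stylesheet once the attribute is gone, that blockquotes are not nested, and that a blockquote holds either all loose text or all block elements.

```python
import re

# align="center" (quoted or not) inside an opening <table> tag.
TABLE_ALIGN = re.compile(
    r'(<table\b[^>]*?)\s+align\s*=\s*"?center"?', re.IGNORECASE
)
BLOCKQUOTE = re.compile(
    r"(<blockquote\b[^>]*>)(.*?)(</blockquote>)", re.IGNORECASE | re.DOTALL
)
# Content that already starts with a block-level element is left alone.
BLOCK_START = re.compile(
    r"\s*<(p|div|ul|ol|dl|pre|table|h[1-6]|blockquote)\b", re.IGNORECASE
)

def strip_table_align(html):
    """Drop align="center" from <table> tags; centering is left to CSS."""
    return TABLE_ALIGN.sub(r"\1", html)

def wrap_blockquote_text(html):
    """Wrap loose blockquote text in a paragraph, as Strict requires."""
    def fix(m):
        open_tag, body, close_tag = m.groups()
        if not body.strip() or BLOCK_START.match(body):
            return m.group(0)
        return f"{open_tag}<p>{body.strip()}</p>{close_tag}"
    return BLOCKQUOTE.sub(fix, html)

print(strip_table_align('<table align="center" border="1">'))
print(wrap_blockquote_text("<blockquote>Quoted words.</blockquote>"))
```

Regular expressions are a blunt tool for HTML, which is why the leftover cases still needed cleaning up by hand.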

A few weeks ago I realized that the pages I linked to in a separate window were rarely used by my users, so I gradually removed the target="_blank", and now am entirely (except for one page) Strict.

Next Steps

There are at least three things that validate but still bother me that I am working on:

  • I have a lot of places where I use <br> for spacing; I will need to find better ways of achieving the layout I want.
  • 4.01 Strict allows loose text in the <body> tag, so I have lots of "paragraphs" that do not have a <p> tag (after headers, lists and tables, where the previous tag puts out the desired space). I feel that they should all be properly tagged.
  • I never close my <p> or <li> tags, since the spec allows them to be left unclosed. But tags should enclose their contents, and I want mine to do so.

Fortunately HTML-Tidy can do the last two things automatically. Unfortunately I use a pre-processor (GTML), so my source files cannot be fed to Tidy. Instead I will have to convert the html files, then edit them back into GTML format. I am pretty sure that I have figured out how to do that, with a mixture of editing scripts and, alas, some hand editing.

There is no mechanical solution to the first problem, but I should be able to work through most instances of it slowly, one at a time.
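Fixing the <br> spacing may have to be done by hand, but finding it does not. As a rough aid, the Python sketch below lists runs of two or more consecutive <br> tags with their line numbers. It assumes that spacing-by-<br> mostly shows up as such runs, which may not hold for every page, and the name find_br_runs is just for illustration.

```python
import re

# Two or more <br> tags in a row, allowing whitespace between them.
BR_RUN = re.compile(r"(?:<br\s*/?>\s*){2,}", re.IGNORECASE)

def find_br_runs(html):
    """Return (line_number, tag_count) for each run of consecutive <br> tags."""
    results = []
    for m in BR_RUN.finditer(html):
        line = html.count("\n", 0, m.start()) + 1
        count = len(re.findall(r"<br\b", m.group(0), re.IGNORECASE))
        results.append((line, count))
    return results

page = "<p>One<br>two\n<h2>Title</h2>\n<br><br>\n<br>text"
print(find_br_runs(page))  # [(3, 3)]
```

The single <br> on the first line is ignored, since one line break inside a paragraph is usually legitimate.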

tedster

7:58 pm on Mar 4, 2004 (gmt 0)




Thank you for that account, Mohamed_E. I sure hear lots of things in there that ring true for me. In fact, I wish I could go back and change to valid code on legacy pages, but for now the best I can do is clean up my act going forward.

I was kicked into using closing </p> and </li> when I saw that some builds of NN4 NEED those tags to render right alignment properly over multiple lines of text. I didn't do it to validate, I did it to handle one of many NN4 bugs.