Canonical tag - HTML forum at WebmasterWorld - WebmasterWorld

Canonical tag

Won't validate with W3 validation service.

Broadway

3:13 pm on Feb 2, 2010 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Top Contributors Of The Month

I'm trying to validate some new static html pages, doctype HTML 4.01 Transitional.

I'm using the W3 validator.

In my header I have a canonical tag.

<link rel="canonical" href="URL"/>

The W3 validator doesn't like the "/" mark contained in the tag.

It says:
The sequence <FOO /> can be interpreted in at least two different ways, depending on the DOCTYPE of the document. For HTML 4.01 Strict, the '/' terminates the tag <FOO (with an implied '>'). However, since many browsers don't interpret it this way, even in the presence of an HTML 4.01 Strict DOCTYPE, it is best to avoid it completely in pure HTML documents and reserve its use solely for those written in XHTML.

(Having the "/" mark there also causes some problem associated with validating the following "</head>" tag in my code.)

If I leave out the "/" the page does validate.

I assume the W3 folks are an authority but when I Google this I don't find a form of the canonical tag that doesn't have the "/" mark included. Google Webmaster Central only shows an example of the tag with the "/" included.

I want to be proper but I'm more interested in being interpreted properly by Google. What to do?

encyclo

3:21 pm on Feb 2, 2010 (gmt 0)

Google doesn't care about the trailing slash. However, I would agree that they've been sloppy if they only show an example for XHTML.

For XHTML 1.x, use:

<link rel="canonical" href="http://example.com/" />

For HTML 4 or 5, use:

<link rel="canonical" href="http://example.com/">
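If your pages come out of a script or template, the two forms above could be produced by one small helper. This is only a sketch: the function name `canonical_link` and its `xhtml` flag are made up for illustration, and it assumes you know which doctype the page declares.

```python
from html import escape


def canonical_link(url: str, xhtml: bool = False) -> str:
    """Return a canonical <link> element in HTML or XHTML syntax.

    `canonical_link` and `xhtml` are hypothetical names for this sketch.
    """
    # Encode &, <, > and quotes so the URL is safe inside an attribute value.
    href = escape(url, quote=True)
    # XHTML 1.x wants the self-closing form; HTML 4 and 5 want a plain ">".
    closing = " />" if xhtml else ">"
    return f'<link rel="canonical" href="{href}"{closing}'


print(canonical_link("http://example.com/"))
print(canonical_link("http://example.com/", xhtml=True))
```

The first call prints the HTML form and the second prints the XHTML form.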

Broadway

4:04 pm on Feb 2, 2010 (gmt 0)

Thank you. Appreciated.

rocknbil

7:35 pm on Feb 2, 2010 (gmt 0)

The /> for empty tags is specific to XHTML doctypes.
<br />
<img />

Remove them for 4.01 doctypes to validate. I'm reasonably sure Google's documentation shows XHTML-style empty tags because it seems everyone is on the XHTML doctype bandwagon, even though most pages using XHTML doctypes are not XHTML at all; they are vanilla HTML.
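If there are many static pages to clean up, the removal could be scripted. The following is a rough sketch, not a real HTML parser: the names `VOID_TAGS`, `SELF_CLOSING` and `to_html4_empty_tags` are invented here, the tag list covers only the common HTML 4.01 empty elements, and the regex assumes attribute values are quoted and contain no angle brackets.

```python
import re

# Common HTML 4.01 empty elements (illustrative list, not exhaustive).
VOID_TAGS = "area|base|br|col|hr|img|input|link|meta|param"

# Matches e.g. <br />, <img src="a.png"/> and captures the tag name and attributes.
SELF_CLOSING = re.compile(rf"<({VOID_TAGS})\b([^<>]*?)\s*/>", re.IGNORECASE)


def to_html4_empty_tags(markup: str) -> str:
    """Rewrite XHTML-style empty tags (<br />) as HTML 4.01 style (<br>)."""
    return SELF_CLOSING.sub(r"<\1\2>", markup)


print(to_html4_empty_tags('<link rel="canonical" href="http://example.com/" />'))
print(to_html4_empty_tags("<br />"))
```

Run it on a copy of the page and re-validate afterwards, since a regex can miss unusual markup.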

Broadway

4:04 pm on Feb 3, 2010 (gmt 0)

I'm at a loss about all of this. After many years of having websites I'm just now trying to validate pages.

All of my pages are static HTML. For no real reason, I have declared this:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

Is an XHTML declaration a better choice for any reason?

Fotiman

5:25 pm on Feb 3, 2010 (gmt 0)

No, XHTML is not a better choice for most people. If you're serving your documents as text/html, then HTML 4.01 is probably what they should be. Do you want your site to work in IE6? If so, you need to serve them as text/html (IE6 does not handle application/xhtml+xml), so stick with HTML 4.01. My only suggestion would be to use a strict doctype instead of transitional.

For more info:
Why most of us should NOT use XHTML [webmasterworld.com]

rocknbil

7:03 pm on Feb 3, 2010 (gmt 0)

Quoting Broadway: "For no real reason, I have declared this:"

There are some very good reasons to use a 4.01 or html 5 declaration over XHTML.

In "the beginning" XHTML was meant to be a cross between HTML and XML, which would allow you to create your own tags and define what those tags are. My favorite example:

<p><movie>The Titanic</movie> was a movie about a passenger ship named <shipname>Titanic</shipname> of <adverb>titanic</adverb> proportions.</p>

Combined with a custom DTD that defined the meaning of those tags, this was supposed to provide context to search engines and other machines, which ordinarily cannot tell apart the three different uses of "titanic" in that paragraph.

For whatever reason (my guess is in reaction to the possibility, or probability, of abuse), it never really took off. Yet many developers (blindly?) use XHTML doctypes, claiming it's the latest and greatest, often with derisive comments like "html 4.01 is so 1990s." One argument is that it makes for cleaner code, which is only partly true. You can be equally strict with your code in 4.01, as you're discovering with validation.

So if you're not doing any extending of the HTML set, allow your document declaration to accurately define the contents of the document: plain old HTML.

Many here can help you debug validation problems. Know that often, a single error can cascade to other areas, and fixing the first may make subsequent errors go away. One example is

scriptname.php?this=that&these=those

Ampersands, even in URLs, need to be encoded as entities:

scriptname.php?this=that&amp;these=those

(This forum parses entities, so to be explicit: that is the ampersand & immediately followed by amp; with no spaces between.)
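If the links are generated by a script, the encoding can be done in code instead of by hand. A minimal sketch using Python's standard `html.escape`; the wrapper name `encode_href` is made up, and it assumes the URL passed in is raw, because an already-encoded URL would be encoded a second time.

```python
from html import escape


def encode_href(url: str) -> str:
    """Encode a raw URL for use inside an HTML attribute value."""
    # escape() turns & into the ampersand entity; quote=True also encodes quotes.
    return escape(url, quote=True)


# The query-string separator comes out as the entity instead of a bare ampersand.
print(encode_href("scriptname.php?this=that&these=those"))
```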

Broadway

6:03 pm on Feb 4, 2010 (gmt 0)

Once again, thank you all.

swa66

7:11 pm on Feb 4, 2010 (gmt 0)

There are other good reasons to use xhtml over html4.
As discussed out here in the past:
[webmasterworld.com...]

xml has structure, parsers etc. that can help you out in the far future where html tags will let you down.

If you do use xhtml: make _very_ sure it is valid, as that's the one big issue: invalid xhtml should not be rendered at all, but browsers currently still render it anyway.
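For a quick first check before going to the validator, an XML parser will at least tell you whether the markup is well-formed. This is only a sketch: `is_well_formed` is a made-up name, and well-formed is a weaker test than valid, since nothing here checks the markup against the XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# An XHTML-style empty tag parses; an unclosed HTML-style <br> does not.
print(is_well_formed("<p>one<br />two</p>"))
print(is_well_formed("<p>one<br>two</p>"))
```

This shows the strictness an XML parser applies and a text/html parser does not.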

And html5 still has xhtml5 in there, so it's not a dead-end no matter what some would like you to think.