|HTML 4.01 vs XHTML|
Some older "build your business website" books advised readers to stay with HTML 4, since search engine robots used to have problems crawling through XHTML code.
Is this problem completely solved now?
Do the spiders like XHTML code?
I imagine they would, as it aims for a pure separation of style from content.
[edited by: BlobFisk at 1:23 pm (utc) on Aug. 19, 2005]
At one time, I was all over XHTML because it was the "latest." But looking back on it now, I ask just what the point of XHTML was to begin with. Unless there is a particular reason to use XHTML, it is better to stick with HTML 4.01 Strict and save a few bytes.
There is a strictly theoretical possibility of a problem with trailing slashes, in particular on meta elements in the head section. However, no such problems exist with any of the major search engine spiders, all of which parse XHTML without difficulty.
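To make the trailing-slash point concrete, here is a minimal sketch using the tolerant HTML parser from Python's standard library. It only shows that one forgiving parser reads the same meta data from both syntaxes; it says nothing about how any particular search engine spider is implemented. The class name, helper name, and sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collects name/content pairs from meta elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags: the default
        # handle_startendtag() forwards to handle_starttag().
        if tag == "meta":
            attributes = dict(attrs)
            if "name" in attributes:
                self.meta[attributes["name"]] = attributes.get("content", "")


def collect_meta(markup):
    parser = MetaCollector()
    parser.feed(markup)
    parser.close()
    return parser.meta


# Sample markup, invented for this example.
html_style = '<head><meta name="description" content="Widgets for sale"></head>'
xhtml_style = '<head><meta name="description" content="Widgets for sale" /></head>'

print(collect_meta(html_style))
print(collect_meta(xhtml_style))
```

Both calls yield the same dictionary, which is the behaviour you would hope for from any parser that is lenient about the trailing slash.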
One thing that the spiders can't do is parse XHTML when served with the MIME type application/xml. However, Internet Explorer can't read such files either, so content served this way is rare. If you are doing it, make sure you are serving text/html by default.
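A rough sketch of the "text/html by default" advice: pick the XML media type only when the client's Accept header lists it explicitly, and fall back to text/html for everything else (spiders and older browsers typically send */* or nothing). The function name is hypothetical, and the default of application/xhtml+xml is my assumption, since that is the media type registered for XHTML; pass "application/xml" instead if that is what you actually serve.

```python
def choose_content_type(accept_header, xml_type="application/xhtml+xml"):
    """Return xml_type only if the client explicitly accepts it, else text/html."""
    for item in (accept_header or "").split(","):
        parts = [part.strip() for part in item.split(";")]
        if parts[0].lower() != xml_type:
            continue  # wildcards such as */* deliberately do not count
        quality = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed q value as "not accepted"
        if quality > 0:
            return xml_type
    return "text/html"


# A browser that advertises XHTML support explicitly:
print(choose_content_type("text/html,application/xhtml+xml,*/*;q=0.8"))
# A client that sends only a wildcard:
print(choose_content_type("*/*"))
```

Ignoring the wildcard is a deliberate choice here: a client that says */* has not promised it can render XML, so it gets the safe default.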
For absolute maximum compatibility you can use HTML 4.01 Strict. However, there is no real reason why you can't use XHTML 1.0 syntax if you prefer it. As BlobFisk says, if you are using a strict XHTML DTD, the separation of style from content will make your XHTML page much easier to parse for a spider than a "tag soup" HTML page.
|For absolute maximum compatibility you can use HTML 4.01 Strict. However, there is no real reason why you can't use XHTML 1.0 syntax if you prefer it. As BlobFisk says, if you are using a strict XHTML DTD, the separation of style from content will make your XHTML page much easier to parse for a spider than a "tag soup" HTML page. |
I do not see how XHTML 1.0 is better, as it can suffer the same "tag soup" problems as HTML 4.01. Also, you can just as easily separate style from content in HTML 4.01 Strict. So what real and practical advantages does XHTML 1.0 have that do not exist in HTML 4.01?
In practical terms, if you're serving only to a web browser, HTML 4.01 Strict is just fine. I gather that there are some theoretical advantages in embedding advanced content, but that is typically very poorly supported.
If your pages might need to be parsed by something else, however, you should be with XHTML all the way; XML is far easier to parse than SGML.
A quick example: you might be catering for a customized browser (perhaps put out on your company intranet) that offers different views of the same data using built-in XSLT transformations.
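As a small illustration of the "easier to parse" point, well-formed XHTML can be read with any stock XML parser. The sketch below uses Python's standard library to pull the links out of a page; the sample page and the extract_links name are invented for the example. It assumes the document has no named entities such as &nbsp;, which a plain XML parser will not resolve without the DTD.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Sample page, invented for this example.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Price list</title></head>
  <body>
    <h1>Price list</h1>
    <p>See the <a href="/widgets">widgets</a> and <a href="/gadgets">gadgets</a>.</p>
  </body>
</html>"""


def extract_links(xhtml_text):
    """Return (href, link text) pairs from a well-formed XHTML document."""
    root = ET.fromstring(xhtml_text)
    namespaces = {"x": XHTML_NS}
    return [
        (anchor.get("href"), "".join(anchor.itertext()))
        for anchor in root.iterfind(".//x:a", namespaces)
    ]


links = extract_links(PAGE)
print(links)
```

The same tree could just as well be handed to an XSLT processor; the point is only that no HTML-specific error recovery is needed to get at the data.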
XHTML is the cornerstone of cross-platform scalability. Combined with CSS and other markup languages, it will drive the 'write once, use everywhere' objective.
|XHTML is the cornerstone of cross-platform scalability. Combined with CSS and other markup languages, it will drive the 'write once, use everywhere' objective. |
How so? This seems to be more hype than practical reality. And remember, we are talking about web pages being served to web browsers here.
There are some edge cases where XHTML 1.0 is easier to use. It has been through a revision (XHTML 1.0 Second Edition) which corrected a few anomalies, for example adding an id attribute on the html element or (one thing I came across today) allowing a percentage value for the cols attribute on a textarea. XHTML syntax, which enforces the closing of all elements, also has a distinct advantage with complex data (or layout!) tables where missing end tags (optional in HTML4 so not picked up by the validator) could cause browser layout bugs.
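The missing end tag case is easy to demonstrate with an XML parser: the omission that HTML 4 allows becomes a hard error under XML rules. This is only a well-formedness check, not DTD validation like the W3C validator does, and the function name and sample tables are made up for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if markup parses as XML, False on any well-formedness error."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Every cell closed, as XHTML requires.
closed_table = "<table><tr><td>1</td><td>2</td></tr></table>"
# End tags for td omitted, which HTML 4 permits.
unclosed_table = "<table><tr><td>1<td>2</tr></table>"

print(is_well_formed(closed_table))
print(is_well_formed(unclosed_table))
```

Running a check like this over a template with deeply nested tables is a cheap way to find the cell that was never closed.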
Mostly there is very little practical difference, so the choice between HTML or XHTML syntax is down to personal preference. I tend to use HTML 4.01, but my largest site is based on XHTML 1.0 Transitional.