Forum Moderators: open
One thing that spiders can't do is parse XHTML when served with the MIME type
application/xhtml+xml or application/xml; however, Internet Explorer can't read such files either, so content served this way is rare. If you are doing it, make sure you serve text/html by default. For absolute maximum compatibility you can use HTML 4.01 Strict. However, there is no real reason why you can't use XHTML 1.0 syntax if you prefer it. As BlobFisk says, if you are using a strict XHTML DTD, the separation of style from content will make your XHTML page much easier for a spider to parse than a "tag soup" HTML page.
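The "serve text/html by default" advice boils down to inspecting the client's Accept header. Here is a minimal sketch of that decision; the helper name and the simplistic header parsing are illustrative, not from any particular framework:

```python
def choose_content_type(accept_header):
    """Serve application/xhtml+xml only to clients that explicitly
    accept it; fall back to text/html for everything else (e.g. IE).

    Illustrative sketch: a real implementation should also honour
    q-values and wildcard media ranges.
    """
    accepted = [part.split(";")[0].strip() for part in accept_header.split(",")]
    if "application/xhtml+xml" in accepted:
        return "application/xhtml+xml"
    return "text/html"
```

For example, a browser sending `Accept: text/html,application/xhtml+xml,*/*;q=0.8` would get the XML MIME type, while one sending only `Accept: text/html,*/*` would get plain text/html.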
If your pages might need to be parsed by something else, however, you should go with XHTML all the way; XML is far easier to parse than SGML.
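To illustrate the point: any stock XML parser can walk a well-formed XHTML fragment with no HTML-specific error recovery. A small Python sketch using the standard library:

```python
import xml.etree.ElementTree as ET

# A well-formed XHTML fragment can be consumed by any generic XML
# parser -- no tag-soup heuristics needed.
xhtml = "<div><p>Well-formed <em>XHTML</em> parses cleanly.</p></div>"
root = ET.fromstring(xhtml)
emphasised = root.find("p").find("em").text
print(emphasised)  # -> XHTML
```

An SGML-based HTML parser, by contrast, must know the DTD to decide which end tags were legitimately omitted before it can build the same tree.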
A quick example: you might be catering for a customized browser (perhaps put out on your company intranet) that offers different views of the same data using built-in XSLT transformations.
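A hypothetical stylesheet for such a setup might pull one view out of the XHTML source; the class name and structure below are invented purely for illustration:

```xml
<?xml version="1.0"?>
<!-- Hypothetical XSLT 1.0 stylesheet: extracts every heading marked
     class="product-title" from an XHTML document into a plain list. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:x="http://www.w3.org/1999/xhtml">
  <xsl:template match="/">
    <ul>
      <xsl:for-each select="//x:h2[@class='product-title']">
        <li><xsl:value-of select="."/></li>
      </xsl:for-each>
    </ul>
  </xsl:template>
</xsl:stylesheet>
```

This only works because XHTML is well-formed XML; you can't point an XSLT processor at tag-soup HTML.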
There are only small differences between the DTDs, such as the id attribute on the html element or (one thing I came across today) allowing a percentage value for the cols attribute on a textarea. XHTML syntax, which enforces the closing of all elements, also has a distinct advantage with complex data (or layout!) tables, where missing end tags (optional in HTML 4 and so not picked up by the validator) can cause browser layout bugs. Mostly there is very little practical difference, so the choice between HTML and XHTML syntax comes down to personal preference. I tend to use HTML 4.01, but my largest site is based on XHTML 1.0 Transitional.
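That end-tag advantage is easy to demonstrate: an XML parser flags a missing </td> instantly, whereas an HTML 4 validator is obliged to accept it. A quick Python sketch:

```python
import xml.etree.ElementTree as ET

# A table row with a missing </td> after "cell one". HTML 4 treats the
# end tag as optional, so a validator lets this through -- but an XML
# parser rejects it immediately as not well-formed.
bad_row = "<tr><td>cell one<td>cell two</td></tr>"
try:
    ET.fromstring(bad_row)
    well_formed = True
except ET.ParseError:
    well_formed = False
print(well_formed)  # -> False
```

In XHTML that row simply won't validate until every cell is closed, so the layout bug never reaches the browser.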