Does Google Index XHTML?
Does Googlebot choke on higher versions?
What version of HTML/XHTML is recommended? I believe it can handle all versions of HTML, but what about XHTML?
Google doesn't understand HTML in the strict SGML/XML sense, so with XHTML 1 being so similar to HTML 4 you shouldn't encounter problems. HTML version problems such as the extra ">" in "<BR />" wouldn't affect Google's index as far as I can imagine.
I have a large number of XHTML pages, albeit served as text/html, that are indexed and perform quite well.
Google also indexes documents served as application/xhtml+xml. However, many browsers, including IE, do not render pages served as application/xhtml+xml. In fact, Google offers a "view as HTML" link for these pages so that the bulk of users can still access them.
You can see examples of application/xhtml+xml pages in Google SERPs by searching for "XHTML 1.0 Strict as application/xhtml+xml".
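For anyone who wants to send application/xhtml+xml only to clients that say they accept it, here is a rough sketch of the usual Accept-header check. The function name pick_content_type is made up for illustration, and the sketch assumes your server hands you the raw Accept header as a string; it is not how Google or any particular server does it.

```python
def pick_content_type(accept_header: str) -> str:
    """Return the MIME type to serve for an XHTML page.

    Falls back to text/html unless the client explicitly lists
    application/xhtml+xml with a non-zero q-value.
    (Hypothetical helper, for illustration only.)
    """
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != "application/xhtml+xml":
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        if q > 0:
            return "application/xhtml+xml"
    return "text/html"


# A client that lists the XHTML type explicitly
print(pick_content_type("text/html,application/xhtml+xml,*/*;q=0.8"))
# A client that only sends wildcards gets plain text/html
print(pick_content_type("image/gif, image/jpeg, */*"))
```

A bare */* is deliberately not treated as support for XHTML, since a client that sends only wildcards is exactly the kind that may not render the page.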
I have been staying away from XHTML. Not that I don't believe Google indexes it, but for the (admittedly minor) reason that I want the file size as small as possible, so I get rid of the quotes in my HTML as well. Getting rid of white space and scripts reduces file size tremendously, and dropping the quotes on top of that is an extra saving.
Do people still find that file size is an issue for Google at all? I understand that 100KB plus is an issue (which I would not attempt anyway).
However, will a difference of, let us say, 25KB vs. 15KB of HTML make a (gradual) difference as well, or does Google just stay away from the very big file-size offenders?
It's funny you are sticking with HTML for file size. A lot of people are moving to XHTML + CSS and thereby reducing file size quite a lot, by separation of presentation and content. (XHTML is not required for CSS, but browsers use doctype switching, so you need a complete doctype, XHTML or HTML 4.01 Strict, to get CSS rendered in strict/standards mode.)
Anyway, I have several pages that are XHTML 1.0 Transitional, XHTML 1.0 Strict and XHTML 1.1. All are ranking fine, and there has not been any problem with Google.
I agree with ruserious - doing the whole layout with CSS is a much more effective way of reducing file size than removing quotes and whitespace. If I calculated correctly, you need to remove 1024 quotes to reduce the file size by one K (one byte per quote, 1024 bytes per K). I'd say it's not worth the trouble!
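To put a number on the quote arithmetic, here is a quick sketch. The sample markup and the function name quote_savings_bytes are invented for illustration, and it assumes the page is in an encoding where a double quote takes one byte (ASCII, ISO-8859-1, UTF-8).

```python
def quote_savings_bytes(html: str) -> int:
    """Bytes saved by dropping every double quote, assuming the quote
    character is encoded as a single byte."""
    return html.count('"')


# Made-up sample line with three quoted attributes
sample = '<a href="page.html" class="nav" title="Home">Home</a>'
saved = quote_savings_bytes(sample)
total = len(sample.encode("utf-8"))
print(f"{saved} bytes saved out of {total}")

# At one byte per quote, saving one K (1024 bytes) takes 1024 quotes,
# i.e. 512 quoted attribute values.
quotes_per_kb = 1024
attributes_per_kb = quotes_per_kb // 2
print(attributes_per_kb, "quoted attributes per K saved")
```

So a page needs several hundred quoted attributes before unquoting shaves off even one K.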
Back to the original topic: Google doesn't seem to have any problems with XHTML. I have a site written entirely in XHTML 1.1 and it has several pages ranking extremely well on Google for a variety of keywords.
All my sites are XHTML 1.0 Strict and they're getting indexed just fine by Google. I do add the extra space for older browsers, as always suggested (i.e. <br /> instead of <br/>), and I don't include the <?xml ...> declaration because some older browsers display it.
Google doesn't validate your page, so it won't care about the document type (at least as long as you serve it as HTML and not XML).
I find the issue of file-size to be of absolutely minor importance. Whether or not you quote attribute-values won't make any real difference. You get actual differences when it comes to strict separation of layout and content, using CSS (not that CSS would only be possible with XHTML -- HTML4 works just the same).
Anyway, you have to quote some attribute values even in "normal" HTML, namely those containing certain special characters, so it's much easier to just quote everything than to think each time about whether this or that value needs quoting.
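For reference, the HTML 4.01 rule is that an unquoted attribute value may contain only letters, digits, hyphens, periods, underscores and colons. A small sketch of that check follows; the helper name needs_quotes is made up for illustration, and it tests only the HTML 4.01 rule, not what any given browser or crawler tolerates.

```python
import re

# HTML 4.01 (section 3.2.2): an unquoted attribute value may contain only
# letters, digits, hyphens, periods, underscores and colons.
_UNQUOTED_OK = re.compile(r"[A-Za-z0-9._:-]+")


def needs_quotes(value: str) -> bool:
    """True if the attribute value must be quoted under HTML 4.01.

    An empty value also needs quotes, since there is nothing to write
    without them.
    """
    return _UNQUOTED_OK.fullmatch(value) is None


for value in ["nav", "page.html", "/index.html", "two words", "100%"]:
    print(repr(value), "needs quotes:", needs_quotes(value))
```

Note that even an ordinary URL path such as /index.html fails the test because of the slash, which is a good argument for simply quoting everything.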
But you can use single quotes ' in XHTML, that should save some bytes... the double-quote " has twice as many pixels! Especially if you print the source your paper will weigh less, and you'll save on ink. OK OK...