Forum Moderators: open
I'm running an e-commerce site that is currently not indexed on Google, and the only problem I can see is that the client has used invalid HTML in a lot of his product descriptions and text. Would this prevent him from getting indexed?
Any parser which is designed to run on the entire Web must handle a huge array of possible errors. These range from typos in HTML tags to kilobytes of zeros in the middle of a tag, non-ASCII characters, HTML tags nested hundreds deep, and a great variety of other errors more creative than anyone could imagine.
Valid code is a good thing, and there are many reasons to validate code, but making Googlebot happy is not one of them.
Googleguy:
The only data point I'd add is Eric Brewer's '96 paper, which mentioned that 40% of pages contain actual HTML errors.
and: "Embarrassing that major sites don't validate":
[webmasterworld.com...]
Any parser which is designed to run on the entire Web must handle a huge array of possible errors
The key term is a "huge array" of errors, not "every possible" error.
If you write HTML that contains errors, you absolutely, positively need to test that your errors are not ones that trip up important parsers -- or make the site unviewable for the human editors at important directories.
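As a quick first pass before running a full validator, you can get a rough sense of how badly broken the markup is with a forgiving parser. This is only an illustrative sketch using Python's standard-library HTMLParser -- it is not how Googlebot or any directory editor's browser actually parses pages, and the TagChecker class and its messages are made up for this example. It flags the most common error in hand-written product descriptions: mismatched or unclosed tags.

```python
# Hypothetical sketch: a rough tag-matching check using Python's
# standard-library HTMLParser. Real validators (like the W3C's) check
# far more than this; this only surfaces mismatched/unclosed tags.
from html.parser import HTMLParser

# Void elements never get a closing tag, so don't push them on the stack.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input", "area",
             "base", "col", "embed", "source", "track", "wbr"}

class TagChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []      # open tags, innermost last
        self.problems = []   # human-readable complaints

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop anything the author forgot to close before this tag.
            while self.stack[-1] != tag:
                self.problems.append(
                    f"<{self.stack.pop()}> closed implicitly by </{tag}>")
            self.stack.pop()
        else:
            self.problems.append(f"stray </{tag}> with no matching open tag")

    def check(self, html):
        self.feed(html)
        self.close()
        for tag in self.stack:
            self.problems.append(f"unclosed <{tag}>")
        return self.problems

# A typical hand-written product description with a mismatched <b>:
print(TagChecker().check("<div><p>Product<b>name</p></div>"))
```

A clean snippet returns an empty list; the example above reports the <b> tag that was never closed before its parent </p>. Passing this check is no guarantee of valid HTML, but failing it means a stricter parser will definitely have something to complain about.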
Having said that, most of the guys on this site are hair-trigger on any issue that affects Google inclusion or ranking, and if there were a measurable effect for common HTML errors, I'm sure someone would have started a very long thread about it by now. So you could better prioritize your time by looking elsewhere first.
However, we don't know what changes happen month-to-month in the Google parser, so with bad HTML you risk being one of the sites whose parsing (and thus inclusion or ranking) is delayed or aborted -- low though that risk may be. Also, we don't have a definitive list of the huge array of errors that the Google parser will handle, so it remains possible that your errors are tripping one of its weak spots.