Forum Moderators: open
I'm running an e-commerce site that is currently not indexed on Google, and the only problem I can see is that the client has used invalid HTML in a lot of his product descriptions and text. Would this prevent him from getting indexed?
Any parser which is designed to run on the entire Web must handle a huge array of possible errors. These range from typos in HTML tags to kilobytes of zeros in the middle of a tag, non-ASCII characters, HTML tags nested hundreds deep, and a great variety of other errors more creative than anyone could imagine.
Valid code is a good thing, and there are many reasons to validate code, but making Googlebot happy is not one of them.
Googleguy:
The only data point I'd add is Eric Brewer's '96 paper, which mentioned that 40% of pages contain actual HTML errors.
and: "Embarrassing that major sites don't validate":
[webmasterworld.com...]
Any parser which is designed to run on the entire Web must handle a huge array of possible errors
The key term is a "huge array" of errors, not "every possible" error.
If you write HTML that contains errors, you absolutely, positively need to test that your errors are not ones that trip up important parsers -- or make the site unviewable for the human editors at important directories.
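As a quick first pass before running a full validator, you can get a rough sense of how badly broken the markup is with a forgiving parser. This is only an illustrative sketch using Python's standard-library HTMLParser -- it is not how Googlebot or any directory editor's browser actually parses pages, and the TagChecker class and its messages are made up for this example. It flags the most common error in hand-written product descriptions: mismatched or unclosed tags.

```python
# Hypothetical sketch: a rough tag-matching check using Python's
# standard-library HTMLParser. Real validators (like the W3C's) check
# far more than this; this only surfaces mismatched/unclosed tags.
from html.parser import HTMLParser

# Void elements never get a closing tag, so don't push them on the stack.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input", "area",
             "base", "col", "embed", "source", "track", "wbr"}

class TagChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []      # open tags, innermost last
        self.problems = []   # human-readable complaints

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop anything the author forgot to close before this tag.
            while self.stack[-1] != tag:
                self.problems.append(
                    f"<{self.stack.pop()}> closed implicitly by </{tag}>")
            self.stack.pop()
        else:
            self.problems.append(f"stray </{tag}> with no matching open tag")

    def check(self, html):
        self.feed(html)
        self.close()
        for tag in self.stack:
            self.problems.append(f"unclosed <{tag}>")
        return self.problems

# A typical hand-written product description with a mismatched <b>:
print(TagChecker().check("<div><p>Product<b>name</p></div>"))
```

A clean snippet returns an empty list; the example above reports the <b> tag that was never closed before its parent </p>. Passing this check is no guarantee of valid HTML, but failing it means a stricter parser will definitely have something to complain about.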
Having said that, most of the guys on this site are hair-trigger on any issue that affects Google inclusion or ranking, and if there were a measurable effect for common HTML errors, I'm sure someone would have started a very long thread about it by now. So you could better prioritize your time by looking elsewhere first.
However, we don't know what changes happen month-to-month in the Google parser, so with bad HTML you risk being one of the sites whose parsing (and thus inclusion or ranking) is delayed or aborted -- low though that risk may be. Also, we don't have a definitive list of the huge array of errors that the Google parser will handle, so it remains possible that your errors are tripping one of its weak spots.