html validator

paff3

1:22 pm on Jan 21, 2003 (gmt 0)

The HTML validator found some problems in my page that are hard to fix. Does it affect my position in search engines if I don't fix those problems?

pendanticist

2:29 pm on Jan 21, 2003 (gmt 0)

Html validator found some problems in my page that are hard to fix.

Could you briefly describe what those problems are?

Someone will surely help you with the validation errors and with your specific question.

It's my opinion that validation is perhaps the single most important item to have going for your site.

Pendanticist.

Mohamed_E

8:55 pm on Jan 21, 2003 (gmt 0)

Google (and, I assume, every search engine) recognizes that we live in an imperfect world, where more HTML is invalid than valid. So unless a page is completely garbled, Google will attempt to extract the information it contains. Sergey Brin and Lawrence Page stated this explicitly in their fundamental paper The Anatomy of a Large-Scale Hypertextual Web Search Engine [www7.scu.edu.au], in section 4.4, Indexing the Web:

Parsing -- Any parser which is designed to run on the entire Web must handle a huge array of possible errors. These range from typos in HTML tags to kilobytes of zeros in the middle of a tag, non-ASCII characters, HTML tags nested hundreds deep, and a great variety of other errors that challenge anyone's imagination to come up with equally creative ones. For maximum speed, instead of using YACC to generate a CFG parser, we use flex to generate a lexical analyzer which we outfit with its own stack. Developing this parser which runs at a reasonable speed and is very robust involved a fair amount of work.

Italics added by me.

There are many reasons for validating HTML; pleasing Google is not one of them.
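To make the quoted description a little more concrete, here is a minimal Python sketch of the same general idea: a hand-written lexer that splits markup into tags and text and keeps its own stack of open elements, so that mismatched, stray, unclosed, or absurdly deep tags produce warnings instead of failures. It is only an illustration under stated assumptions: it uses Python's `re` module rather than flex, and the tag pattern, the `tolerant_parse` name, the void-element list, and the `max_depth` limit are all hypothetical. It is not how Google's parser actually worked.

```python
import re
from typing import List, Tuple

# Hypothetical sketch of a tolerant tag/text lexer with its own stack.
# It uses Python's re module instead of a flex-generated scanner and is
# NOT Google's actual parser.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")
# Elements that never take a closing tag, so they are not pushed (assumed list).
VOID = {"br", "hr", "img", "input", "link", "meta"}


def tolerant_parse(html: str, max_depth: int = 100) -> Tuple[str, List[str]]:
    """Return (visible_text, warnings); never raises on malformed markup."""
    html = html.replace("\x00", "")  # drop stray NUL bytes instead of failing
    stack: List[str] = []
    text_parts: List[str] = []
    warnings: List[str] = []
    pos = 0
    for m in TAG_RE.finditer(html):
        text_parts.append(html[pos:m.start()])  # text before this tag
        pos = m.end()
        closing, name = m.group(1) == "/", m.group(2).lower()
        if not closing:
            if name in VOID:
                continue
            if len(stack) >= max_depth:
                warnings.append(f"nesting deeper than {max_depth}; <{name}> not tracked")
            else:
                stack.append(name)
        elif name in stack:
            # Pop back to the matching open tag, implicitly closing the rest.
            while stack:
                top = stack.pop()
                if top == name:
                    break
                warnings.append(f"implicitly closed <{top}>")
        else:
            warnings.append(f"stray </{name}> ignored")
    text_parts.append(html[pos:])  # trailing text after the last tag
    warnings.extend(f"unclosed <{name}> at end of input" for name in stack)
    # Treat each tag as a word boundary (a simplification), then collapse
    # whitespace so the result resembles indexable text.
    text = " ".join(" ".join(text_parts).split())
    return text, warnings


text, warnings = tolerant_parse("<p>Hello <b>world</p> again</i>")
print(text)      # Hello world again
print(warnings)  # ['implicitly closed <b>', 'stray </i> ignored']
```

The stack is what lets the lexer recover: a closing tag pops back to its matching opener, and anything it cannot match is reported and skipped rather than aborting the page.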

victor

9:41 pm on Jan 21, 2003 (gmt 0)

Google and other spiders/search engines are extremely tolerant of errors: at their simplest, if they can roughly separate the tags from the text, they have something to work on.

But being extremely error-tolerant is not the same as guaranteeing that they will correctly separate tag from text under every possible error condition.

As far as I know, no one has researched which errors trip up Google. Until someone does, there remains a risk that any given error may confuse it, or cause it to mistake text for a tag and so index only part of the page.
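The risk of text being mistaken for a tag is easy to demonstrate with a deliberately naive separator. The sketch below uses a hypothetical regex tag stripper (`naive_visible_text` is an illustrative name, and real search engine parsers are far more sophisticated than this); it shows how a single missing `>` or a stray `<` can make a simple tag/text split swallow visible text, leaving only part of the page to index.

```python
import re

# Hypothetical, deliberately naive tag stripper (not any search engine's code):
# anything from "<" up to the next ">" is treated as a tag.
NAIVE_TAG_RE = re.compile(r"<[^>]*>")


def naive_visible_text(html: str) -> str:
    """Replace every apparent tag with a space and collapse whitespace."""
    return " ".join(NAIVE_TAG_RE.sub(" ", html).split())


valid = '<p><a href="page.html">Cheap widgets</a> are here.</p>'
# Same markup, but the ">" closing the opening <a> tag is missing.
broken = '<p><a href="page.html"Cheap widgets</a> are here.</p>'
# A literal "<" in the text, not escaped as &lt;.
stray = '<p>Widgets cost <5 dollars. Buy now!</p>'

print(naive_visible_text(valid))   # Cheap widgets are here.
print(naive_visible_text(broken))  # are here.
print(naive_visible_text(stray))   # Widgets cost
```

With valid markup the whole sentence survives; with the missing `>` the link text is absorbed into the "tag", and with the unescaped `<` everything up to the next `>` disappears.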

I prefer not to take that risk with unvalidated HTML. And, as Mohamed_E says, there are many other reasons for validating, even if you have taken special care that your errors are only ones Google will handle.