
Valid HTML and Search Engine Signals of Quality

Does Valid HTML Belong in the SEO Toolkit?

4:37 pm on Sep 7, 2007 (gmt 0)

Moderator from US 

WebmasterWorld Administrator martinibuster is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Apr 13, 2002
votes: 492

Creating Valid HTML is a good practice because it future-proofs your website to display well in the standards-compliant browsers of the future, and neat, tidy code is easier to revise and correct.

However, there seems to be disagreement on whether Valid HTML is necessary for Search Engine Optimization, i.e. for promoting your site in the search engines.

On one side there is the belief that modern bots are engineered to wade through bad code to get to the content; otherwise vast amounts of quality content would be left unranked. On this view, because search engines do not validate websites, valid code neither sends a positive signal nor is generally necessary for properly indexing a site. Yes, it is within the realm of possibility for absolutely horrid code to throw off a bot, but that generally isn't happening with today's smarter bots.

On the other side, some people state that valid code leads to better and smoother indexing and as a consequence, higher rankings.

What do you say?

6:06 am on Sept 8, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 4, 2002
votes: 0

Two thoughts:

1. Some malformed HTML will cause some spiders to miss some content (or misinterpret content, skip links, etc.).

Unless you know precisely what effect those malformed tags have on the spiders you care about, it's best not to deploy them.
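To make point 1 concrete, here is a small Python sketch. It uses the standard library's lenient html.parser rather than any real spider's code, and the pages are made up, but it shows how one missing quote in an href can make a simple link extractor mangle one link and lose the next one entirely:

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects href values from <a> tags, the way a naive spider might."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

GOOD = '<a href="a.html">A</a> <a href="b.html">B</a> <a href="c.html">C</a>'
BAD  = '<a href="a.html">A</a> <a href="b.html>B</a> <a href="c.html">C</a>'
#                                              ^ missing closing quote

good = LinkCollector(); good.feed(GOOD)
bad = LinkCollector(); bad.feed(BAD)

print(good.links)  # ['a.html', 'b.html', 'c.html']
print(bad.links)   # the runaway attribute value swallows b's anchor text
                   # and the c.html link is never seen at all
```

A human visitor would read both pages just fine; the machine misreads one of them. A stricter parser would simply reject the bad page outright.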

2. Indexing the content a spider has retrieved is a huge, high-speed operation. Broadly and crudely speaking, the stages are:

a. parse it to find text and links
b1. pass the links and anchor text to the spider for further retrieval
b2. pass the text to the indexer for indexing.

The first bottleneck is the parse. So there is likely to be more than one parser: a simple, high-speed one that rips out as much content and as many links as it can, then a slower, more precise one that handles the cases the first-pass parser rejects.

That way, the spiders and indexers are being fed as fast as possible.
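A minimal sketch of that two-stage idea in Python — every name and heuristic here is hypothetical, invented only to illustrate a fast strict pass with rejects deferred to a slower, more tolerant one:

```python
import re

# Fast first-pass parser: only understands tidy, double-quoted hrefs.
FAST_LINK = re.compile(r'<a\s+href="([^">]+)"')

def fast_parse(html: str):
    """High-speed pass: returns links, or None if the markup trips it up."""
    links = FAST_LINK.findall(html)
    if html.count("<a") != len(links):
        return None  # something the simple pattern couldn't handle
    return links

def index_pages(pages):
    """Feed clean pages straight to the indexer; defer the rest."""
    indexed, deferred = {}, []
    for url, html in pages:
        links = fast_parse(html)
        if links is None:
            deferred.append(url)    # back burner: slow, tolerant parser later
        else:
            indexed[url] = links    # fed to the indexer right away
    return indexed, deferred

pages = [
    ("clean.html", '<a href="a.html">A</a>'),
    ("messy.html", '<a href=b.html>B</a>'),   # unquoted attribute value
]
indexed, deferred = index_pages(pages)
print(indexed)   # {'clean.html': ['a.html']}
print(deferred)  # ['messy.html']
```

The messy page still gets parsed eventually — just later, which is exactly the risk being argued here.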

But a minority of pages (those that trip the simple parse) get put on the back burner for later handling.

So, in effect, poor-quality markup leads to slower indexing.

Do I know that for sure? NO.

But if I were building the backend to a search engine's indexer, it would be the approach I'd take. Most other approaches would slow things down too much.

Would I want to take the risk that malformed tags lead to slower indexing? NO.

Would you?

6:26 pm on Sept 8, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:May 14, 2002
votes: 0

Victor's model of two parsers complicates answering the original question. There are "errors" that validators complain about that no sane browser or spider will have any difficulty dealing with. The most frequent one (in my case) is "invalid" characters in a URL. Any browser or spider must be able to understand unescaped ampersands and spaces in a URL.
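As a concrete illustration of the ampersand case, in a couple of lines of Python using the standard library's html module: in HTML source, a literal & inside an href value should be written &amp; to validate, and a lenient consumer recovers the same URL either way:

```python
from html import escape, unescape

# A URL with a query string, as often pasted straight into a page.
raw_url = "results.cgi?q=valid+html&start=10"

# What a validator wants inside an href attribute: & escaped as &amp;
valid_attr = escape(raw_url)
print(valid_attr)  # results.cgi?q=valid+html&amp;start=10

# Any browser or spider turns the escaped form back into the same URL.
assert unescape(valid_attr) == raw_url
```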

To those who view "validity" as a theological issue, any invalid construct is evil; for those whose approach is pragmatic, I think it is essential to discuss which invalid constructs need to be fixed and which can safely be ignored.

For what it's worth, I attempt to write valid XHTML 1.0 Strict (but advertise it on the server as HTML 4.01 Strict). However, when I find a lot of pasted URLs with multiple ampersands in them I tend to leave them as is.

[edited by: Mohamed_E at 6:28 pm (utc) on Sep. 8, 2007]

