However, there seems to be disagreement on whether valid HTML is necessary for search engine optimization, that is, for promoting your site to the search engines.
On one side there is the belief that modern bots are engineered to wade through bad code to get to the content, otherwise vast amounts of quality content would be left unranked. As a consequence, because search engines do not validate websites, valid code does not send a positive signal, nor is it generally necessary for properly indexing a site. Yes, it is within the realm of possibility for absolutely horrid code to throw off a bot, but that generally isn't happening with today's smarter bots.
On the other side, some people state that valid code leads to better and smoother indexing and as a consequence, higher rankings.
What do you say?
1. some malformed HTML will cause some spiders to miss some content (or misinterpret some content, skip some links, etc).
Unless you know precisely what effect those malformed tags have on the spiders you care about, best not to deploy them.
2. indexing the content a spider has retrieved is a huge, high-speed operation. Broadly and crudely speaking, the stages are:
a. parse it to find text and links
b1. pass the links and anchor text to the spider for further retrieval
b2. pass the text to the indexer for indexing.
The first bottleneck is the parse. So there is likely to be more than one parser: a simple, high-speed one that rips out as much content and as many links as it can, then a slower, more precise one that handles the cases the first-pass parser rejects.
That way, the spiders and indexers are being fed as fast as possible.
But a minority of pages (those that trip the simple parse) get put on the back burner for later handling.
So, in effect, some malformed pages lead to slower indexing.
Do I know that for sure? NO.
But if I were building the backend to a search engine's indexer, it would be the approach I'd take. Most other approaches would slow things down too much.
Would I want to take the risk that malformed tags lead to slower indexing? NO.
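The two-pass arrangement speculated about above can be sketched roughly as follows. This is only an illustration of the idea, not how any real search engine works: the function names (`fast_parse`, `slow_parse`, `process`) are hypothetical, Python's strict XML parser stands in for the "simple high speed" pass, and the tolerant `html.parser` module stands in for the slower fallback.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


def fast_parse(markup):
    """Strict first pass (assumption: only well-formed markup gets through)."""
    root = ET.fromstring(markup)  # raises ET.ParseError on malformed input
    text = " ".join(t.strip() for t in root.itertext() if t.strip())
    links = [(a.get("href"), "".join(a.itertext()).strip())
             for a in root.iter("a") if a.get("href")]
    return text, links


class TolerantParser(HTMLParser):
    """Second pass: collects text and links without requiring well-formedness."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []
        self._href = None
        self._anchor = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._anchor = []

    def handle_endtag(self, tag):
        if tag == "a" and self._href:
            self.links.append((self._href, "".join(self._anchor).strip()))
            self._href = None

    def handle_data(self, data):
        if data.strip():
            self.text_parts.append(data.strip())
        if self._href:
            self._anchor.append(data)


def slow_parse(markup):
    parser = TolerantParser()
    parser.feed(markup)
    parser.close()
    return " ".join(parser.text_parts), parser.links


def process(pages):
    """Return (indexed, deferred): first-pass results plus the back-burner queue."""
    indexed, deferred = {}, []
    for url, markup in pages.items():
        try:
            indexed[url] = fast_parse(markup)
        except ET.ParseError:
            deferred.append(url)  # left for slow_parse at some later time
    return indexed, deferred
```

In this sketch a page with a single unclosed `<br>` already fails the first pass and lands in `deferred`, even though `slow_parse` can still recover its text and links later. That is the sense in which malformed markup would mean slower, rather than failed, indexing.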
To those who view "validity" as a theological issue, any invalid construct is evil; for those whose approach is pragmatic, I think it is essential to discuss which invalid constructs need to be fixed and which can safely be ignored.
For what it's worth, I attempt to write valid XHTML 1.0 Strict (but advertise it on the server as HTML 4.01 Strict). However, when I find a lot of pasted URLs with multiple ampersands in them I tend to leave them as is.
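For anyone who does want to clean up pasted URLs, a validator complains because a bare `&` inside an attribute value should be written as `&amp;`. A minimal sketch of the fix, assuming the URLs are available as plain strings (the helper name `escape_bare_ampersands` is made up for illustration):

```python
import re

# Matches "&" unless it already starts a character reference
# such as &amp; or &#38; or &#x26;
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)")


def escape_bare_ampersands(url):
    """Escape ampersands for use in an HTML attribute, leaving existing references alone."""
    return BARE_AMP.sub("&amp;", url)
```

The pattern cannot tell a real entity from a query parameter that happens to look like one (a name followed directly by a semicolon), so treat it as a convenience for the common case rather than a complete solution.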
[edited by: Mohamed_E at 6:28 pm (utc) on Sep. 8, 2007]