
Home / Forums Index / Code, Content, and Presentation / HTML
Forum Library, Charter, Moderators: incrediBILL

HTML Forum

This 38 message thread spans 2 pages.
My pages are now W3C validated. Will this help on the search engines?

 12:46 pm on Sep 18, 2004 (gmt 0)

Hello all: It was a tough job, but I have W3C validated most of my pages now. The validator error messages in "verbose" are somewhere between worthless and misleading.

I'm glad it's nearly done, for several reasons. For one thing, I cleaned up several of my own markup messes.

Here's my question:

Might all this work have a positive impact on my rankings in the Google and/or Yahoo SERPs? Maybe some small advantage I would have missed by not validating?

Or, do G and Y frankly not care one way or the other?

Has anyone run any experiments along these lines?

Your comments are eagerly awaited. While I'm waiting, I will try to "validate" some junk pages ranking higher for my keyword(s). That should tell me something.

Best wishes - Larry

[edited by: tedster at 8:50 pm (utc) on Sep. 18, 2004]



 1:33 pm on Sep 18, 2004 (gmt 0)

No one knows if validation makes a direct difference in the SERPS, but it will help you indirectly. Valid code helps to make sure that users have a good experience on your site, which will improve your chances of getting links.

Case in point: I've been working with a partner to compile a directory for a certain industry. We've found hundreds of great sites, but there are a few that I'm flatly refusing to list because they broke so badly in my browser. They're missing out on a well-targeted, spider-friendly link that their competitors are getting. Over time that sort of thing will make a difference in the SERPs.


 3:33 pm on Sep 18, 2004 (gmt 0)

Hello Buckworks: Thanks for your response.

I think we are on the same wavelength, that valid code is the only way to go in the long term.

I have some results for my narrow niche, "ufo".

For now, I rank #21 out of a field of about 4.4 million which isn't bad. Some of those ahead of me definitely deserve to be there, others not, but that is a side issue.

Here is what I learned from Validating each and every one of the first 30 listings:

NONE of them validated at all.

The vast majority declared NO DTD, i.e. nothing like:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

The vast majority declared NO character set, i.e. nothing like:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

Most of the 30 did _neither_ of the above.

I lost track of the minor errors, things like an unopened <table> or an unclosed <font> and a jillion similar things. One site had no fewer than 152 "minor" errors like that.

My site did not validate either, due to a "NOSCRIPT error" I cannot fix. (It is part of my borrowed hit counter code.)

All of the above, and much else, seems to indicate a great forgiveness on the part of the all-seeing Google; a forgiveness which favors non-commercial sites like mine at the expense, perhaps, of tax-paying commercial enterprises?

Is it not written: If a swallow should fall from his/her nest, onto a computer keyboard, and thrash about a bit, a new home page is born?

Sorry for that.

Frankly, I think the G ignores picture perfect code because it still has something of an ethic that I value.

==> Cruddy code implies authentic grassroots origins <==

I'm going to keep on cleaning up my code because I can't stand the sight of digital mess.

Best wishes (and thanks for replying again) - Larry


 9:04 pm on Sep 18, 2004 (gmt 0)

Just like browsers, search engines use error recovery routines -- so that some common errors do not hurt you. But some definitely can.

Last year, there was one unique phrase (word1 word2) that I knew should rank - and it didn't. In fact, a quoted search on "word1 word2 word3 word4" brought NO results at all, even though Google showed the page on a site:example.com search.

I checked the page and a <p> was malformed, and written as <p with no closing bracket - probably a copy/paste error. Everything between that tag and the next <p> tag was evidently not in the index.

I fixed the mark-up and within 7 days, the two word phrase was #1 and bringing in traffic from the search engines.

Similar problems can definitely come from a missing close quote. Things like deprecated attributes and such have no real effect that I've ever seen. But when your markup is not well-formed -- there are real errors on the page -- then you can have sections of a page not indexed while the error recovery works to find something it knows how to work with farther down the page.
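That kind of section loss is easy to reproduce with a toy indexer. Here's a minimal sketch (invented markup; Python's stdlib HTMLParser, whose error recovery is its own and not any search engine's, but the failure mode is analogous):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect only the text content, the way a naive indexer might."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())

def extract(markup):
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return parser.chunks

good = '<p>alpha beta</p><p>gamma delta</p>'
bad = '<p alpha beta</p><p>gamma delta</p>'  # the ">" after the first <p> is missing

print(extract(good))  # both paragraphs survive as text
print(extract(bad))   # "alpha beta" gets swallowed into the malformed tag, at least with this parser
```

The text between the broken tag and the next recognizable markup never reaches handle_data at all, which is exactly the "missing section" behavior described above.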


 11:30 pm on Sep 18, 2004 (gmt 0)

As I was probably one of the first guys to post the ridiculous statement that "SEs and bots like valid HTML", I feel I should post a few thoughts here.

Ridiculous? Well, not really, but do note that I can build a 100% validating page anytime that ranks absolutely nowhere. That is: valid markup does not cut it by itself; it takes a wee bit more than that to get rankings.

So, what's the deal?

The SE benefit of valid HTML explained

1) From code to pages:
By validating your documents (pages; and you should validate your links too, btw) you will be forced to correct errors. Although validating is a #*$! when you're not used to it, it gets easier quite quickly - you simply learn as you go, and make fewer mistakes along the way.

So, validating is actually the easy part, and tedster already posted one example of the benefit of this.

The really interesting benefit comes not so much from the validating itself, but from the mindset you will find yourself in, after valid markup becomes routine. You'll start by learning about the markup codes, and their specific use of course - quite probably you'll be able to trim your pages in the process, giving you faster download times and eliminating unnecessary code.

Then you'll find that "the codes" are specific elements of "a page" that all serve specific purposes, and, as you start working with these elements, and learn more about their intended use, you will start making changes to the way you create pages in the first place. This will lead to changes in the way you create web sites, and this will, again, lead to changes in the way your particular web site integrates with the rest of the web. At the end of this long road lies the real benefit.

The following points will illustrate this:

2) From Pages To Documents:
No, you won't have a web page anymore. Instead you'll have a structured document. A structured document has characteristics such as a title, headline, paragraphs, sections, and perhaps even sub-sections with sub-headlines.

Working with document structure is essential - although it's not as easy as validating (ie. there's no one-size-fits-all formula) you'll be able to set up your own rules (templates, even) for what types of content goes where, and when.

This might sound basic, but it's not: a well-structured document contains all the necessary information, put in the right places, to easily determine what the document is really about. Determining what web pages are really about is the most important task for a search engine - do I need to say more?
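As a sketch, the bare bones of such a structured document (content invented) need be no more than:

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
  <title>UFO Widgets - History and Models</title>
</head>
<body>
  <h1>UFO Widgets</h1>
  <p>Introductory paragraph saying what the document covers.</p>
  <h2>History</h2>
  <p>First section's content.</p>
  <h2>Models</h2>
  <p>Second section's content.</p>
</body>
</html>
```

Title, headline, section headlines, paragraphs: everything a machine needs to work out the topic, each in its designated element.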

3) From Websites To Documents:
No, you won't have a web site anymore. Instead you'll have a structured collection of structured documents. Yes, your whole web site becomes one big document that contains the individual pages. Think "book" - one big document with separate parts, hosting separate chapters, with separate sections, and ...pages.

Working with site structure is essential - although it's not as easy as validating, or structuring individual documents (ie. there's no one-size-fits-all formula) you'll be able to set up your own rules for what types of content goes where, and when.

Your task when structuring a set of documents is to make sure that each document is grouped with related documents, and can be reached from the relevant other documents, both on higher, same, and lower levels (ie. your internal link policy). In particular, you should make sure that it is easy to locate the most relevant pages on the particular subjects that your users will find interesting (yes, your money terms, or keywords). You do that by creating the right sections and sub-sections, and creating the right linking between those and the relevant pages.

Why is this good? Because search engines seek clues that your individual page is really about widgets, not only from the widget page itself, but also from the pages surrounding it. These clues can be anchor text, section names, urls, in- and outbound links, and a variety of other stuff.

4) From "The Web" To Documents:
No, your web site will not be "your website" anymore. Instead, it will be a structured document in a bigger, more-or-less-structured set of structured sets of documents, called the web. Think "library" - a collection of topics and sub-topics containing "books" with sections, chapters, and so on...

Working with web structure is essential - as it's not easy, and you can't really do it totally on your own, you'll have to set up your own rules for what types of content goes where, and when.

The task here is to make your book stand out as the right one for the chosen subject. A web site ("book") about "horses, printers, and railroads" will most likely not be thought of as an authority on any of these issues by any potential reader. Also, it will be quite hard for the librarians to find the proper shelf to put it on, so how should anyone potentially interested in one of these subjects ever find it?

Find your shelf for that book (yes, your niche for the site) and stick to it. Group with (link to homepage of) other books on the shelf or quote sections (deeplink). Get links back, not only from related sites, but also from relevant niche directories. If necessary, split the site up in separate sites about horses, trains, and whatever.

5) But, that wasn't W3C, that was SEO 101?
Exactly, and that's the beauty of it. As you get accustomed to working with the structural elements of your page (the HTML markup), sooner or later, you will "automagically" start to see the bigger picture. The W3C is working towards a thing called "the semantic web", which (apart from being fancy words) is a ruleset for a flexible, yet standardized way to show the intrinsic meaning of documents.

Each one of those HTML tags is there for a reason; it's no longer to format the look of your text (as many people think), it is to help you give an even better presentation of what your page is really about. Get it? Study those tags and their meaning (ie. the intended use) and use them as intended.

Working "from the bottom up", so to speak, you will start by correcting errors, then you will get your individual pages more focused and on-topic, then you will do this to your site, and then your site will fit better into the relevant sections of that one great document known as the web, and hence, it will become easier to find you.

6) Yeah great, but that's what I do without valid HTML
You think you do. You will learn, eventually, that "doing the right things" will imply that you also "get things right". It's not about pixel-perfect design, CSS, tables, or about having one font or another; it's the mindset you will enter (with some routine) from the very beginning of putting a page together - you will be forced to think about the relation of one element on your page to another element, and the relation between this page and that page, and so on.

It's simply a framework that will enable you to build better pages, ie. pages that more easily convey what they're all about. You can do this without valid HTML, but in the long run, you can't take the valid HTML route without doing this.

So, it's not the valid html that does it - it's the choices that validating will force you to make, and the understanding of the semantics of the web that you will gain in the process.

Here's a quote from The W3C Semantic Web page [w3.org]. Read this, and think about the concept of a "Search Engine" for a moment... exactly what is an SE except a machine that tries to understand your data?

Facilities to put machine-understandable data on the Web are becoming a high priority for many communities. The Web can reach its full potential only if it becomes a place where data can be shared and processed by automated tools as well as by people. For the Web to scale, tomorrow's programs must be able to share and process data even when these programs have been designed totally independently. The Semantic Web is a vision: the idea of having data on the web defined and linked in a way that it can be used by machines not just for display purposes, but for automation, integration and reuse of data across various applications.

...valid html is only the first step.


 11:49 pm on Sep 18, 2004 (gmt 0)

My (limited) experience of it tells me that validating code makes no difference to your ranking unless it makes you close a " or a > as mentioned above.

Certainly in my keyword SERPs, of the top 10 sites only 2 validate, 2 (if I remember correctly) have no doctypes at all, and the rest just have lots of errors.


 12:08 am on Sep 19, 2004 (gmt 0)

>> validating code makes no difference to your ranking

Exactly. A valid page can be beaten by another page, just like an invalid page can.


 2:27 am on Sep 19, 2004 (gmt 0)

Sometimes I wonder whether non-validated pages are given a little bump for being homegrown in G ... e.g. eBay is impossible to load with the debugger on ...


 2:48 am on Sep 19, 2004 (gmt 0)

Wow! Great post, claus. Nicely "structured". :)

My (literal) two cents:

When a document or set of structured documents meets the established standard, there is a much greater likelihood that a program which relies on the standard will be able to do its job.

Proper code = easier for spiders to parse as expected.


 2:55 am on Sep 19, 2004 (gmt 0)

Hi Larryhat

Has anyone run any experiments along these lines?

I did the same thing, and completed it about 10 months ago. I can only encourage you to re-read the statements already made in this thread. My rankings have generally improved in that time, but not as a result of having validated pages - IMO. My pages were missing doctypes or had no closing html tag, etc., thus no real indexing problem like the situation that Tedster noted.

It just felt good to have valid pages! That was a significant milestone in my journey to become a webmaster.

I've probably tripled the size of the site in the last year, and have no great enthusiasm to go thru each page and validate it again. But it should be done, and is probably easier done when the page is first created. Lazy and lazier, that's me, so now I have a chore to do.

When it's done, I expect that good feeling will come back again.


 4:58 am on Sep 19, 2004 (gmt 0)

and have no great enthusiasm to go thru each page and validate it again. But it should be done, and is probably easier done when the page is first created. Lazy and lazier, that's me,
Naw, you're not nearly lazy enough!

If you were really lazy, you would write your web pages as simple XML (make up your own custom schema, aimed at just exactly your particular content), then use some free, standard tool (like xsltproc) to automatically generate the HTML pages from the XML. (play your cards right, and it'll even strip out all the comments and extraneous spaces so you get some HTML compression for free).

Your own custom XML is much easier to get valid (only need 5 different tags? then only use 5 different tags!), the XSLT processor nearly guarantees you 100% valid HTML is spit out, and if you set it up right, you get the added benefit of making it easy to change your entire website HTML (e.g., switching to CSS) by just changing a single template.
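A minimal sketch of that pipeline (the schema and file names are made up; the stylesheet is deliberately tiny):

```xml
<!-- article.xml : a made-up three-tag schema for this site's content -->
<article>
  <title>UFO Widgets</title>
  <section>
    <heading>History</heading>
    <para>Some content here.</para>
  </section>
</article>
```

```xml
<!-- style.xsl : turns the custom schema into plain HTML -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="/article">
    <html>
      <head><title><xsl:value-of select="title"/></title></head>
      <body>
        <h1><xsl:value-of select="title"/></h1>
        <xsl:apply-templates select="section"/>
      </body>
    </html>
  </xsl:template>
  <xsl:template match="section">
    <h2><xsl:value-of select="heading"/></h2>
    <xsl:apply-templates select="para"/>
  </xsl:template>
  <xsl:template match="para">
    <p><xsl:value-of select="."/></p>
  </xsl:template>
</xsl:stylesheet>
```

Then a one-liner does the conversion: xsltproc style.xsl article.xml > article.html. Change style.xsl once, regenerate, and every page on the site picks up the new markup.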

XSLT is, of course, a pain in the butt to learn. However, a truly lazy person is happy to spend 5 hours on learning something new just to avoid doing 1 hour of boring rote work (like re-debugging HTML pages). IMO :-)


 5:49 am on Sep 19, 2004 (gmt 0)

My (limited) experience of it tells me that validating code makes no difference to your ranking unless it makes you close a " or a > as mentioned above.

That's a big problem. Without a fully-tested model of what errors cause which ranking difference (plus or minus), I'd be very wary of anyone advising me to deliberately write buggy code.

Anyone recommending deliberately writing buggy code (and that's anyone who recommends releasing non-validating HTML or CSS) needs to present such a model.

Otherwise, it is a lot of dangerous guessing.

[edited by: tedster at 5:52 am (utc) on Sep. 19, 2004]
[edit reason] fix typo [/edit]


 5:51 am on Sep 19, 2004 (gmt 0)

For me, another hidden benefit of learning about validation and well structured documents has been finding natural and appropriate ways to include keywords in my pages.

I remember a few years ago how I would stare at a page and try to figure out ways to boost the frequency of some keyword or other. But a well-structured and accessible document seems to have plenty of natural spots to include the right vocabulary words - lots better than the free form hodge-podge I used to create.

Not only that, but maintenance is a dream - even with a static site. If there are no "oddities" from page to page, then a global search and replace tool is incredibly effective.

All these pluses are pretty hard to sell to someone who is struggling for the first time with validation. But the cumulative effect of learning about valid mark-up has made my sites and my search engine traffic much better - of this I am certain. I'd also hazard a guess that fewer browser/OS combinations have trouble rendering my pages. That's a direct plus for my users.


 6:29 am on Sep 19, 2004 (gmt 0)

I think part of the reason many find validating difficult is the results one gets from the W3C validator. Sometimes it is difficult to understand what the problem is and how to fix it.

I personally use software that is more or less accurate to validate my template (just do a search and you can find some). Then I run it through the W3C validator just to make sure. Finally, after my site is up, I use the software to check again, with the added bonus of link checking.

Any little thing I can do to help a spider digest my content, I gladly find time for. And as for all of this talk of the semantic web and documents and the like, I think you are dead on.
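On the link-checking side, even a stdlib-only Python sketch can pull the URLs out of a page for a checker to fetch afterwards (markup and URLs invented):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Gather absolute URLs from every <a href=...> on a page."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    # resolve relative links against the page's own URL
                    self.links.append(urljoin(self.base_url, value))

collector = LinkCollector('http://example.com/gallery/')
collector.feed('<a href="/about.html">About</a> <a href="photo1.html">Photo</a>')
print(collector.links)
# -> ['http://example.com/about.html', 'http://example.com/gallery/photo1.html']
```

Each collected URL could then be requested with urllib and any non-200 response reported as a broken link.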


 9:42 am on Sep 19, 2004 (gmt 0)

Stick with valid HTML 4.x or XHTML 1.0 served as the text/html MIME type, though. Today's spiders can't handle the application/xhtml+xml MIME type required for correctly served XHTML 1.1. I know to my cost, as my site effectively disappeared from the radar when using XHTML1.1/application/xhtml+xml.


 9:50 am on Sep 19, 2004 (gmt 0)

Thanks to all for your fine and relevant input on this thread.

I originally tried to W3C validate my pages about a year ago, and gave up due to the hopeless error messages.

One of the worst ones disallowed my placement of the <Hx> header </Hx>. The error message was just off the wall - I want to say word salad.

It turned out that I had declared a font type and size BEFORE the <Hx> header and closed the </font> AFTER the header. Font sizes etc., if any, go INSIDE the header, as in <H1><font size=3> UFO Widgets </font></H1>.

Other, totally different errors cause the SAME EXACT and seemingly meaningless W3C error message, as I learned from browsing! Maybe that's why the error msg is so obtuse .. it's trying to cover too wide a field.

I suppose it's too much to ask for messages like:
" <Hx> declaration cannot be inside a font field .."

Now I'm going to validate the remaining messy pages that I should go overhaul anyway, and yes, grandpa, it gives me a good feeling too, to know that this is set straight at last.

Best wishes - Larry


 11:13 am on Sep 19, 2004 (gmt 0)

I use CSE HTML Validator.

Besides letting me work offline it offers messages like:

(line 37) The "h1" tag is contained in a "font" tag (which was opened in line 34). This may be acceptable for some browsers (such as Internet Explorer and Netscape), but HTML 4.01 does not allow this. It may also cause problems or unexpected page rendering for more compliant browsers such as Opera. Possible solutions: 1. Close the "font" tag before using the "h1" tag. Depending on the correct usage of the closed tag, you may be able to reopen it after using the "h1" tag; 2. Eliminate the "h1" tag; or 3. Reorder the "h1" and "font" tags.


 12:27 pm on Sep 19, 2004 (gmt 0)

Oh, if my competitors started to concentrate on valid html ;)

Unfortunately they become better and better SEOs...

...back to work - R


 1:43 pm on Sep 19, 2004 (gmt 0)

I recently developed a fetch-and-refresh spider, in addition to an offline parsing app, for a local search engine, and their take was basically that, outside the title tag, valid HTML really didn't matter. All HTML was stripped and only text was staged for inclusion into the search index...


 4:01 pm on Sep 19, 2004 (gmt 0)

Great thread, Larry. A while back I discovered the link between validation and SEO 101 that tedster and claus mentioned, and saw immediate improvement.

claus, thanks for chiming in on this and putting the whole thing in perspective.

<request>is it possible to have a 'claus' section where these nuggets can be easily mined?</request>


 4:02 pm on Sep 19, 2004 (gmt 0)

All html was stripped and only text was staged for inclusion into the search index...

That's a basic text-match search -- like the earliest years of web search. The algorithms used today by G, Y!, Ask (and soon the new MSN) are more complex than that - storing, ranking and combining various on-page factors.

Text match works for a site search because a single site tries NOT to compete inappropriately within its own pages. Start building search for the entire web and you've got "search engine persuasion" as a major factor to deal with - and basic text match is just too easy to influence.

I use a third party site search that goes beyond text match and allows me to tweak the search algo for various factors until the results work the way I want - so even with site search, well-formed code then becomes a factor.

I'd say the practical issue +for now+ is well-formed mark-up, and not 100% validated documents of a specific DTD. Do you want to include a target attribute on an HTML 4.01 Strict page? That validation failure will not affect today's search engines one bit.


 5:43 pm on Sep 19, 2004 (gmt 0)

Official Klaus Section [google.com] ;)


I used to not worry about validation, relying on browser testing.

However, I have found producing valid code tends to produce consistent results (for the most part), which tends to reduce the testing time / trial and error when creating a template.

I create all my current templates by hand - no dreamweaver, no frontpage - just Ultraedit and IE/Mozilla for testing.

I have found that my code has been coming out cleaner and more streamlined than when I relied on a WYSIWYG editor.

Rankings have been correspondingly positive.


 6:55 pm on Sep 19, 2004 (gmt 0)

I'd say the practical issue +for now+ is well-formed mark-up, and not 100% validated documents of a specific DTD.

Absolutely - this is certainly true of search engine bots, and it is also true for your real visitors.

Neither bots nor browsers are validating user agents, and so the only standards they follow internally are their own rules. The only difference the doctype choice makes in the browser itself is the quirks mode/standards mode rendering switch.

Documents which are not well-formed can create problems for all user agents. This article by Ian Hickson [ln.hixie.ch] of Opera Software shows how three different browsers handle improperly-nested markup. What's fascinating is how each browser handles the situation in a completely different way. The same is true for the search engine bots: Googlebot won't handle errors in the same way as Slurp, and neither will be the same as the browsers listed. How can you be sure how a bot is going to handle your markup if it isn't well-formed?

That's why you have to validate - not to clean up stray attributes, but to remove one big possible stumbling block to proper parsing of your documents - either by bots or end-users.


 12:08 am on Sep 20, 2004 (gmt 0)

Claus, I've just copied your post and printed it to "hang above my head". That's a STERLING keeper. I'm not sure I can explain why it hit me as such a revelation, but it did.



 8:28 am on Sep 20, 2004 (gmt 0)

D3mon said:

"Stick with valid HTML4.x or XHTML1.0, served as text/html MIME type though.
Today's spiders can't handle the application/xhtml+xml MIME type required for correctly served XHTML1.1. I know to my cost, as my site effectively disappeared from the radar when using XHTML1.1/application/xhtml+xml."

Couldn't it be something more? Because even though I do that (not that I am recommending doing it in any way), some of my pages wrongly served due to that are well positioned in Google. (Months ago they started to disappear slowly due to a wrongly written robots.txt. After mending it, things started to resurface.)


 9:48 am on Sep 20, 2004 (gmt 0)

Send XHTML as application/xhtml+xml only when it is explicitly accepted. So IE will get text/html, whereas FF will get application/xhtml+xml.
Note that AdSense will not work in application/xhtml+xml.

RewriteCond %{HTTP_ACCEPT} application/xhtml\+xml
RewriteCond %{HTTP_ACCEPT} !application/xhtml\+xml\s*;\s*q=0
RewriteCond %{REQUEST_URI} \.html$
RewriteCond %{THE_REQUEST} HTTP/1\.1
RewriteRule .* - "[T=application/xhtml+xml; charset=iso-8859-1]"


 10:48 am on Sep 20, 2004 (gmt 0)

I just validated my whole site - couldn't believe some of my errors, lol.
I think validating helps a lot in:

1) Backward compatibility, or multi-browser compatibility.
2) DMOZ acceptance (should we start a new thread?).
3) Learning certain dos and don'ts of markup language - e.g. don't nest a <P> element within a <FONT> element like I did. IE may have no problem with this, but I imagine it can really mess up some others.
4) Just plain "let's make the web a bit more sane".
5) Googlebot may be able to handle errors, but some of the other less savvy bots may not be able to crawl your page over some stupid little thing like an improperly nested tag.

I have one question, though: if I am using table data cells of a predefined height, is it absolutely necessary to have the height and width declared on all my images? The reason I am asking is, do I need to figure out the height on all 300 images in my photo gallery? Is this important?
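If those gallery images happen to be PNGs, a short script can read the dimensions straight from the file header instead of measuring by hand - a stdlib-only Python sketch (JPEG and GIF headers differ, so a graphics library would be the easier route for those):

```python
import struct

PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'

def png_dimensions(data):
    """Return (width, height) from the IHDR chunk of a PNG byte string."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError('not a PNG file')
    # bytes 8-16 hold the IHDR chunk length and type;
    # bytes 16-24 hold width and height as big-endian unsigned ints
    return struct.unpack('>II', data[16:24])

# usage sketch: emit width/height attributes for every image in a directory
# for path in pathlib.Path('gallery').glob('*.png'):
#     w, h = png_dimensions(path.read_bytes())
#     print('<img src="%s" width="%d" height="%d" alt="">' % (path.name, w, h))
```

Declaring the dimensions also lets browsers lay the table out before the images finish downloading, so it is worth doing regardless of the validator.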

michael heraghty

 1:59 pm on Sep 20, 2004 (gmt 0)

I find it interesting that Google's pages don't use a doctype.

But don't get me wrong; I'm all for validation. (Nice post Claus.)


 2:05 pm on Sep 20, 2004 (gmt 0)

But I find it interesting that Google doesn't use a doctype.

True, but of course Google doesn't need to worry in the slightest about its own pages being indexed correctly.

The simple lack of a doctype won't make one iota of difference to the indexation of a page - but the document structure will, and validated pages are the safest route to ensure a coherent structure. So, you need the doctype to validate the page, and to switch rendering modes in the browsers to get it to display correctly to your users.


 6:07 pm on Sep 20, 2004 (gmt 0)

Yahoo does not validate.
Wired validates.
Google does not validate.
Microsoft does not validate.
eBay does not validate.
Amazon does not come close to validating.
WebmasterWorld does not validate.
Blogger does not validate.

All these sites are not only popular but rank well. They are the big dogs, but what applies to them applies even more so to you. Your products and services are what count and what make your customers happy. Google has been saying all along to design your pages for your users. Google itself does not validate, yet its website renders properly in perhaps more browsers than 99.9999% of all websites, because they designed their site for their users.

In a perfect world everything would go according to the W3C's specifications. In this world, though, Microsoft is obliged to think in the interests of its shareholders, and even Firefox does not follow all the specifications. Clearly validation at this point in the game is a silly rationale and purely idealism. If you have the time it's a fun game, but it isn't going to drive a profit, and I highly doubt the search engines are running validators through your pages.



All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved