Forum Moderators: Robert Charlton & goodroi


Does Google reward "valid" HTML

         

pauldmitri

10:33 pm on Jun 20, 2006 (gmt 0)

10+ Year Member



As many of you know, the W3C has established a set of standards for so-called valid HTML. They offer a free tool that analyzes your pages against those standards and flags elements of your HTML that depart from them -- in fact, it even calls them Errors.

Does Google give a hoot as to whether your site code is "valid"?

My first inclination is to say no, it is probably not a big deal to Google one way or the other. The evidence: there is a group of 4-5 sites I know of that by all accounts have knocked the ball out of the park on Google, SEO-wise. They rank in the top five (often #1) for every imaginable keyword in their vertical, and I'm talking about hyper-competitive verticals -- mortgage, auto, online pharmacy, etc. When I tested these sites using the validation tool, they were without exception LOADED with errors.

On the other hand, the site with the W3C validator tool is a Google PR10. I’ve honestly never seen any site other than Google that was a PR10. In fact I thought it was kind of like a cute little inside joke over there at Google that they were the only 10 on the entire net. Yahoo, MSN, etc. are all 9’s. The fact that these guys chalked up a 10 presumably means that Google loves them. The question is: do they love the site, or do they love so-called “validated” HTML?

tedster

2:08 am on Jun 21, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I would say no, Google does not bestow special blessings on validated code. However, error free code stands a much better chance of being fully indexed -- because there is no need for "error recovery" routines that may or may not recover all the content that the author intended.

Validating is a good discipline, and learning not to write "cowboy code" makes a much more spiderable and indexable site altogether. This does not mean that the occasional non-standard attribute in the mark-up is somehow a black mark. Clearly, it isn't. Just look at almost any SERP.

digitalghost

2:12 am on Jun 21, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>>Does Google give a hoot as to whether your site code is "valid"

No.

A random walk shows that less than one half of one percent of sites that rank 1-10 have valid code.

Tedster, I hate that 'cowboy' bit. All the cowboys I know work their a$$es off and strive to do their best. Can we replace 'cowboys' with 'suits'? ; )

abates

2:12 am on Jun 21, 2006 (gmt 0)

10+ Year Member



The validator site is probably PR10 because they encourage people who use the tool to put a wee badge on every validating page which links back to the validator tool. Result: millions of backlinks.

SuddenlySara

2:46 am on Jun 21, 2006 (gmt 0)



Validate google...
They fail.

sandpetra

6:13 am on Jun 21, 2006 (gmt 0)

10+ Year Member



Nope!

kaled

9:56 am on Jun 21, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It would require additional effort to reward valid code. All any search engine is likely to worry about is whether the code is spiderable. Nesting errors, etc. may cause spiders to spit the dummy and give up.

As for the validator being a PR10, that's hardly surprising since loads of webmasters add the little symbol as a link.

Kaled.

Quadrille

10:30 am on Jun 21, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



No.

But if the invalid code includes bad navigation, that can affect your site's spidering, and so deeper pages may get excluded.

For me, the point of validation is to get the site looking good in as many browsers as possible, and to impose discipline on [my] sloppy coding - an exercise well worth the effort.

bb_paul

11:48 am on Jun 21, 2006 (gmt 0)

10+ Year Member



---

The validator site is probably PR10 because they encourage people who use the tool to put a wee badge on every validating page which links back to the validator tool. Result: millions of backlinks.

---

That is probably exactly why it is PR10.

As for other PR10s: Macromedia is, Adobe is, etc.

I wouldn't think Google gives an advantage to validated code - their mission is to organise the world's information, so as long as a page is readable to the human eye (hence the rule against text the same colour as the background), it can be classed as information.

Your average internet user couldn't give a monkey's about valid code, or about a logo in the bottom corner declaring the fact - as long as they can read the information in a presentable way and get out of the site what they visited it for in the first place, I wouldn't say it was an issue at all.

But we never quite know with the big G, do we...

Phil_Payne

1:26 pm on Jun 21, 2006 (gmt 0)

10+ Year Member



> Does Google give a hoot as to whether your site code is "valid"?

In most cases, absolutely not. The exception is bad syntax rather than bad tags, etc.

I saw one case of an unclosed quoted string in the <head> that caused the bot to ignore the body. At that level, yes it does matter. But occasional unsupported tags or missing img alt tags - Google doesn't seem to care.

So validate to spot the gross stuff like straight syntax errors. But I know of a home page doing very well indeed with its chosen keywords (mail order baby clothing - a competitive field) and 492 errors on validation.
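Googlebot's error-recovery rules are unpublished, so the following is only an analogue: a strict XML parser illustrates the worst case described above, where a single unclosed attribute quote makes the entire document unreadable, body and all. The markup strings are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Well-formed: head plus body parses without complaint.
good = '<html><head><meta content="ok"/></head><body><p>Body text</p></body></html>'
tree = ET.fromstring(good)
assert tree.find("./body/p").text == "Body text"

# One unclosed attribute quote in the <head>: a strict parser
# abandons the whole document, so the body is never seen at all.
bad = '<html><head><meta content="oops/></head><body><p>Body text</p></body></html>'
try:
    ET.fromstring(bad)
    body_seen = True
except ET.ParseError:
    body_seen = False
assert body_seen is False
```

Real HTML parsers recover far more gracefully, but how much content survives recovery varies by parser - which is exactly the gamble.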

arnarn

3:45 pm on Jun 21, 2006 (gmt 0)

10+ Year Member



Maybe taking things a bit further on code validation:

Is it possible that, with a standard in place and possibly some additional extensions, we could cut down on the scrapers, spammers, etc., if webmasters were "required" to follow the standard?

If it were possible, I assume G would pay more attention to the standard, since it would help them too.

Just a thought

victor

4:22 pm on Jun 21, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



492 errors on validation.

Of course (as you say) they have to be the right 492 validation errors. Another site with just one validation error may be completely invisible to Google.

It seems silly to me for webmasters to try to guess which validation errors are neutral and which are dangerous... After all, such a list may vary by spider edition and search engine operator.

If you spend time adding bugs to a webpage then you have no guarantee that the spiders will forgive the mistakes in the way you intended.

So it seems stupid to either spend extra time when coding to add random bugs, or to have purchased/acquired HTML generating tools that do that automatically.

Use tools that produce valid code and you need never worry about the issue.

pageoneresults

4:26 pm on Jun 21, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



A random walk shows that less than one half of one percent of sites that rank 1-10 have valid code.

A random walk might well show that fewer than 1/100th of one percent of sites validate, period.

A more precise test would be to locate those sites that validate and see where they stand overall against competing sites that do not validate. I'd also be looking at depth of indexing, quality of indexing, etc.

And then, there are so many other factors that come into play that the statistics would be somewhat meaningless just from a "Valid HTML" perspective.

My own belief? Take two pages exactly identical, all things being equal except valid code: the valid page will win. But that's just my opinion, and I'm sure most know that I'm a strong supporter of valid code. ;)

F_Rose

4:51 pm on Jun 21, 2006 (gmt 0)

10+ Year Member



I have checked all our competitors that are well listed on all the search engines, and their code does not validate.

However, it most probably depends on what your errors are, since some errors may very well stop the bots from indexing.

Could someone provide some examples that may stop the bots from spidering?

jomaxx

5:02 pm on Jun 21, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Nobody's code validates completely. Those HTML checkers are insane. I'm somewhat of a correct-code nazi myself and have tried to use them to detect important errors that can have real consequences such as wrongly defined tables, duplicate tags, tags closed in the wrong order etc., but I always get bogged down in pages of messages which are for all practical purposes irrelevant.

[edited by: jomaxx at 5:06 pm (utc) on June 21, 2006]

trinorthlighting

5:03 pm on Jun 21, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It is a very big deal if you do not have your href attributes correct.

encyclo

5:11 pm on Jun 21, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google cannot reward valid HTML as it does not validate the markup it parses. However, as other comments have mentioned, specific markup errors, in particular unclosed or mismatched tags can cause misunderstandings, meaning that some keyword-rich parts of the text or links can get passed over or ignored.

Validation is a tool, a sanity check. You can make some assumptions as to which validation errors will be of no consequence (unknown attributes, for example), but for other errors you can't be sure whether there will be no influence or a detrimental effect. Validation helps avoid such pitfalls.

Formal validators are in general too strict for measuring such problems, as a page can fail for tiny errors and there is no leeway. A less formal syntax/well-formedness check would identify many more documents which are invalid in a technical sense but are structurally sound.
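As a sketch of that "less formal" check, here is a minimal well-formedness scanner built on Python's html.parser. It only verifies that non-void tags open and close in matching order, ignoring DTD-level validity entirely; the void-element list and the recovery logic are deliberate simplifications.

```python
from html.parser import HTMLParser

# Void elements take no end tag in HTML (simplified list).
VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}

class WellFormedChecker(HTMLParser):
    """Reports mismatched, stray, or unclosed tags -- nothing more."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open tags
        self.problems = []   # human-readable findings

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if not self.stack:
            self.problems.append(f"stray </{tag}>")
        elif self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.problems.append(f"</{tag}> closes <{self.stack[-1]}>")
            # Naive recovery: unwind to the matching open tag if one exists.
            if tag in self.stack:
                while self.stack.pop() != tag:
                    pass

    def report(self, html):
        self.feed(html)
        self.close()
        return self.problems + [f"unclosed <{t}>" for t in self.stack]

def check(html):
    return WellFormedChecker().report(html)
```

Unlike the formal validator, a page that fails only on unknown attributes or a missing DOCTYPE passes this check cleanly, while the structural breakage that actually trips up spiders gets flagged.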

cchooper

6:03 pm on Jun 21, 2006 (gmt 0)

10+ Year Member



Nobody's code validates completely.

I disagree :)

This Page Is Valid HTML 4.01 Strict!

crobb305

6:14 pm on Jun 21, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google.com doesn't even validate using the W3C validator. Everyone assumes perfect code is rewarded, yet the algorithms can't even accurately identify and remove spammy subdomains.

trinorthlighting

6:38 pm on Jun 21, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



They might not validate it, but if your site renders badly in a Mozilla browser, how do you think a Mozilla-based bot is going to see it?

jrs_66

7:57 pm on Jun 21, 2006 (gmt 0)

10+ Year Member



--- Nobody's code validates completely.

I hate to toot my own horn, but all mine validates... there's really no reason not to keep your code clean.

g1smd

8:55 pm on Jun 21, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I see lots of sites with code that has tags like one of these:

<title ....... </title>
<title> ....... </title
<title> ....... /title>
<title> ....... <title>

and so on. Many of those sites are NOT properly indexed.
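How much such breakage costs depends on the consuming parser, and Googlebot's is unpublished; but as a toy illustration, a naive regex-based title extractor (hypothetical, not any search engine's actual code) finds nothing for any of the mangled variants above:

```python
import re

def extract_title(html):
    """Naive <title> extraction, as a simple indexer might attempt it."""
    m = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return m.group(1).strip() if m else None

assert extract_title("<title>Blue Widgets</title>") == "Blue Widgets"
assert extract_title("<title Blue Widgets</title>") is None   # broken start tag
assert extract_title("<title>Blue Widgets</title") is None    # broken end tag
assert extract_title("<title>Blue Widgets /title>") is None   # missing '<'
assert extract_title("<title>Blue Widgets<title>") is None    # never closed
```

A real parser with error recovery might still salvage some of these - but which ones is anyone's guess, and that uncertainty is the argument for fixing them.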

trinorthlighting

9:49 pm on Jun 21, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes, and most sites with clean code are fully indexed. It does not take much work to clean up code.

The same people you see complaining about issues with their sites more than likely have code that is a mess.

All of mine validates as well. I learned my lesson the hard way and now I am reaping the benefits over my competition.

Also, if you look at the websites that are compliant via the backlink command, most of the sites have good internal page rank, something a lot of people struggle with.

inbound

10:24 pm on Jun 21, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here's a real example from the last 2 weeks.

We took a neglected old site that was poorly laid out and did not validate, re-designed it, added a little more content, validated it, gave it a high level of accessibility, and made sure everything was working well.

Result?

More traffic per day now than it used to get per month!

It took just a few days for Googlebot to go 4 levels deep and see all of the pages (as you will see from other posts, Yahoo is a different story).

I would also say that MSN takes notice of well formed pages.

As a final note to the sceptics: a page that is 100% correct in terms of mark-up has a much better chance of being correctly understood by a spider. Why take chances? We all know that correct coding can still mean nice layouts. I suppose the task of changing current sites may be too much for some, but let's all make sure that new projects are correct.

Halfdeck

2:00 am on Jun 22, 2006 (gmt 0)

10+ Year Member



Here's an example where lack of HTML Validation caused problems in Google. I have about 10 blogs on Blogspot. 9 of them are listed correctly. One of them isn't. The only difference between the blogs is that with this one, I replaced an XHTML declaration with an HTML Transitional. Google snippetized "notify objectionable content" on this blog instead of on-page content, sending the entire blog into the supplemental index.

F_Rose

5:47 pm on Jun 22, 2006 (gmt 0)

10+ Year Member



When I try to validate our site the following error is coming up:

The character encoding specified in the HTTP header (utf-8) is different from the value in the <meta> element (windows-1252). I will use the value from the HTTP header (utf-8) for this validation.

This page is not Valid HTML 4.01 Strict!

Would someone know what the Doctype would be for a ColdFusion site?

g1smd

6:00 pm on Jun 22, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Nothing to do with the DOCTYPE at all.

As it says in the error message, the character encoding declared as the server default is different from the one you have actually declared in the meta tag. Change the declaration in the meta tag to match the HTTP header, and make sure that you save the pages using the same encoding (i.e. UTF-8).
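The comparison the validator performs is easy to script for yourself. This is illustrative only: the regexes are deliberately naive, and the header and meta strings are assumed examples mirroring the error message above.

```python
import re

CHARSET_RE = re.compile(r"charset\s*=\s*([\w-]+)", re.IGNORECASE)

def declared_charsets(http_content_type, html):
    """Return (header charset, meta charset), lowercased, or None where absent."""
    header = CHARSET_RE.search(http_content_type)
    meta = CHARSET_RE.search(html)
    return (header.group(1).lower() if header else None,
            meta.group(1).lower() if meta else None)

header_cs, meta_cs = declared_charsets(
    "text/html; charset=utf-8",
    '<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">',
)
assert header_cs == "utf-8"
assert meta_cs == "windows-1252"
assert header_cs != meta_cs   # the exact mismatch the validator warned about
```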

F_Rose

6:10 pm on Jun 22, 2006 (gmt 0)

10+ Year Member



Thank you.

We have it set to: charset=windows-1252.

Who sets the charset?

Is it the server, or the webmaster?

F_Rose

6:31 pm on Jun 22, 2006 (gmt 0)

10+ Year Member



I actually removed charset=windows-1252 from the source.

Just wondering: could this be a possible cause of the Google bots not indexing our full site?

encyclo

1:42 am on Jun 23, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There is a short introduction to character encoding in this thread [webmasterworld.com] in the HTML and Browsers forum library - you might find it useful.

You choose the charset for your document, and you must save your documents in that charset. If your content is in English, there is little difference between UTF-8, ISO-8859-1 and windows-1252 within the range of characters you will mostly be using, but the problem you describe is a server misconfiguration. The server should not be setting a default charset unless all the documents on it use the same encoding, and once a charset is set in an HTTP header, you cannot override it with a meta element.

Having said all that, however, this problem is extremely unlikely to have any influence on the ranking or indexing of a site.
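A concrete illustration of why the mismatch is usually invisible for English content: over plain ASCII the three encodings produce byte-for-byte identical output, and they only diverge on characters like curly quotes.

```python
text = "plain English text"
# Identical bytes in all three encodings, so browsers and bots
# decode ASCII content the same way whichever charset is declared.
assert text.encode("utf-8") == text.encode("iso-8859-1") == text.encode("windows-1252")

# A right single quotation mark (U+2019) is where they part company:
curly = "\u2019"
assert curly.encode("windows-1252") == b"\x92"        # one byte
assert curly.encode("utf-8") == b"\xe2\x80\x99"       # three bytes
# iso-8859-1 cannot represent it at all:
try:
    curly.encode("iso-8859-1")
    representable = True
except UnicodeEncodeError:
    representable = False
assert representable is False
```

Declare the wrong charset, though, and those windows-1252 bytes come out as mojibake under a UTF-8 decode - which is why pages should be saved and declared consistently.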