Forum Moderators: Robert Charlton & goodroi
Does Google give a hoot as to whether your site code is "valid"?
My first inclination is to say no, it is probably not a big deal to Google one way or the other. The evidence: there is a group of 4-5 sites I know of that by all accounts have knocked the ball out of the park, SEO-wise, on Google. They rank in the top five (often #1) for every imaginable keyword that applies to their vertical, and I'm talking about hyper-competitive verticals - mortgage, auto, online pharmacy, etc. When I tested these sites using the validation tool, they were without exception LOADED with errors.
On the other hand, the site that hosts the W3C validator tool is a Google PR10. I've honestly never seen another site, other than Google itself, that was a PR10. In fact, I thought it was kind of a cute little inside joke over at Google that they were the only 10 on the entire net. Yahoo, MSN, etc. are all 9s. The fact that these guys chalked up a 10 presumably means that Google loves them. The question is: do they love the site, or do they love so-called "validated" HTML?
Validating is a good discipline, and learning not to write "cowboy code" makes a much more spiderable and indexable site altogether. This does not mean that the occasional non-standard attribute in the mark-up is somehow a black mark. Clearly, it isn't. Just look at almost any SERP.
No.
A random walk shows that less than one half of one percent of sites that rank 1-10 have valid code.
Tedster, I hate that 'cowboy' bit. All the cowboys I know work their a$$es off and strive to do their best. Can we replace 'cowboys' with 'suits'? ; )
As for the validator being a PR10, that's hardly surprising since loads of webmasters add the little symbol as a link.
Kaled.
But if the invalid code includes bad navigation, that can affect your site's spidering, and so deeper pages may get excluded.
For me, the point of validation is to get the site looking good in as many browsers as possible, and a discipline on [my] sloppy coding - an exercise well worth the effort.
The validator site is probably PR10 because they encourage people who use the tool to put a wee badge on every validating page which links back to the validator tool. Result: millions of backlinks.
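For reference, the badge markup the validator offers looks something like this (the exact icon URL varies with the doctype being validated):

    <p>
      <a href="http://validator.w3.org/check?uri=referer"><img
          src="http://www.w3.org/Icons/valid-html401"
          alt="Valid HTML 4.01!" height="31" width="88"></a>
    </p>

Every page carrying that badge is an ordinary followed link straight back to validator.w3.org.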
---
This is probably exactly why it is PR10.
As for other PR10s, Macromedia is, Adobe is, etc.
I wouldn't think Google gives an advantage to validated code - their mission is to organise the world's information, so as long as a page is readable to the human eye (hence the rule against text the same colour as the background), it can be classed as information.
Your average internet user couldn't give a monkey's about valid code, or about a logo in the bottom corner declaring the fact. As long as they can read the information in a presentable way and get out of the site what they visited it for in the first place, I wouldn't say it was an issue at all.
But we never quite know with the big G, do we...
In most cases, absolutely not. The exception is bad syntax rather than bad tags, etc.
I saw one case of an unclosed quoted string in the <head> that caused the bot to ignore the body. At that level, yes, it does matter. But occasional unsupported tags or missing img alt attributes - Google doesn't seem to care.
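For illustration only (a made-up snippet, not the actual page), that kind of breakage looks like this - note the missing closing quote on the content attribute:

    <head>
      <title>Acme Widgets</title>
      <meta name="description" content="Cheap widgets, fast shipping>
    </head>
    <body>
      <p>A strict parser may treat everything after the broken quote as part
      of the attribute value, so this content can effectively disappear.</p>
    </body>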
So validate to spot the gross stuff like straight syntax errors. But I know of a home page doing very well indeed with its chosen keywords (mail order baby clothing - a competitive field) and 492 errors on validation.
Is it possible that, with a standard in place and possibly some additional extensions, we could cut down on the scrapers, spammers, etc. if webmasters were "required" to follow the standard?
If it were possible, I assume G would pay more attention to the standard since it would help them, too.
Just a thought
492 errors on validation.
Of course (as you say) they have to be the right 492 validation errors. Another site with just one validation error may be completely invisible to Google.
It seems silly to me for webmasters to try to guess which validation errors are neutral and which are dangerous. After all, such a list may vary by spider version and by search engine operator.
If you spend time adding bugs to a web page, you have no guarantee that the spiders will forgive the mistakes in the way you intended.
So it seems stupid either to spend extra time when coding to add random bugs, or to have purchased/acquired HTML-generating tools that do that automatically.
Use tools that produce valid code and you need never worry about the issue.
A random walk shows that less than one half of one percent of sites that rank 1-10 have valid code.
A random walk may possibly show that less than 1/100th of one percent of sites validate, period.
A more precise test would be to locate those sites that validate and see where they stand overall against competing sites that do not validate. I'd also be looking at depth of indexing, quality of indexing, etc.
And then, there are so many other factors that come into play that the statistics would be somewhat meaningless just from a "Valid HTML" perspective.
My own belief? Two pages exactly identical with all things being equal except valid code. The valid site will win. But, that's just my opinion and I'm sure most know that I'm a strong supporter of valid code. ;)
However, it most probably depends on what your errors are, since some errors may very well stop the bots from indexing.
Could someone provide some examples of errors that may stop the bots from spidering?
Validation is a tool, a sanity check. You can make some assumptions as to which validation errors will be of no consequence (unknown attributes, for example), but for other errors you can't be sure whether there will be no influence or a detrimental effect. Validation helps avoid such pitfalls.
Formal validators are in general too strict for measuring such problems, as a page can fail for tiny errors and there is no leeway. A less formal syntax/well-formedness check would identify many more documents which are invalid in a technical sense but are structurally sound.
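As a rough illustration (made-up markup, not taken from any site mentioned here), the snippet below fails HTML 4.01 Strict validation - an unknown attribute and an img with no alt - yet it is structurally sound and poses no problem for a parser:

    <div tracking="abc123">
      <img src="logo.gif">
      <p>Every element is opened, closed and nested correctly, so the
      structure is still easy for a spider to follow.</p>
    </div>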
The same people you see complaining about issues with their sites more than likely have code that is a mess.
All of mine validates as well. I learned my lesson the hard way and now I am reaping the benefits over my competition.
Also, if you look at the websites that are compliant via the backlink command, most of the sites have good internal page rank, something a lot of people struggle with.
We took a neglected old site that was poorly laid out and did not validate well, redesigned it, added a little more content, validated it, gave it a high level of accessibility and made sure everything was working well.
Result?
More traffic per day now than it used to get per month!
It took just a few days for Googlebot to go 4 levels deep and see all of the pages (as you will see from other posts, Yahoo is a different story).
I would also say that MSN takes notice of well formed pages.
As a final note to the sceptics: a page that is 100% correct in terms of mark-up has a much better chance of being correctly understood by a spider. Why take chances? We all know that correct coding can still mean nice layouts. I suppose the task of changing current sites may be too much for some, but let's all make sure that new projects are correct.
The character encoding specified in the HTTP header (utf-8) is different from the value in the <meta> element (windows-1252). I will use the value from the HTTP header (utf-8) for this validation.
This page is not Valid HTML 4.01 Strict!
Would someone know what the Doctype would be for a coldfusion site?
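For what it's worth, ColdFusion just generates HTML, so the doctype depends on the markup you output rather than on the server technology. If the pages are meant to be HTML 4.01 Strict (as the validator message above suggests), the declaration would be:

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
        "http://www.w3.org/TR/html4/strict.dtd">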
As it says in the error message, the character encoding declared as the server default is different from the one you have actually declared in the meta tag. Change the declaration in the meta tag to match the HTTP header, and make sure that you save the pages using the same encoding (i.e. UTF-8).
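In other words, the HTTP header and the meta declaration should agree. Assuming the server really is sending UTF-8, that would look like this:

    HTTP response header sent by the server:
      Content-Type: text/html; charset=utf-8

    Matching declaration inside the document:
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">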
You choose the charset for your document, and you must save your documents in that charset. If your content is in English, there is little difference between UTF-8, ISO-8859-1 and windows-1252 within the range of characters you will mostly be using, but the problem you describe is a server misconfiguration. The server should not be setting a default charset unless all the documents on the server use the same character encoding, and once set in an HTTP header, you cannot override the default with a meta element.
Having said all that, however, this problem is extremely unlikely to have any influence on the ranking or indexing of a site.