Forum Moderators: Robert Charlton & goodroi
Do I think it matters in the rankings right now? No. Do I think it will in the future? Maybe.
I don't think it will ever be a big factor, but it could certainly be a tie breaker. You have two sites that are essentially equal: why wouldn't you take the one that validates? It usually means it was made by a designer, or someone who knows design, and more than likely the design will look much better.
I would still like to see a sector where the top 10 is predominantly validated sites.
It ain't gonna happen. Since the majority of pages are never validated, the majority of top ten results are not likely to be validated either.
Let's see, I'll just try a few big names at random:
nobelprize.org - 2 errors
ford.com - 1 error
fordfound.org - 12 errors
microsoft.com - passed
google.com - 47 errors
msn.com - 2 errors
charlesschwab.com - 2 errors
toyota.com - 76 errors
drupal.org - 8 errors
Now, all these sites are able to get good rankings, but I doubt it has anything to do with whether they validate. I suspect they don't validate because validation simply does not matter.
What matters is that you display properly in all the common browsers and that all the search engines can index you.
Think of it as the difference between using proper English and the sort of English you use in your day-to-day speech. Proper English is not necessary to be understood, and in fact does little to help you be understood. Use proper English in rural Kentucky and you will just confuse your listener.
I just wonder if they are seen as over-optimized because they do validate.
Well, you can certainly feel free to wonder all you want, but Google simply isn't going to bother validating every page out there. Have you noticed how long it takes to validate a web page? Multiply that by 20 billion.
There is simply no reason for them to validate your pages, and they certainly are not going to penalize you if your pages do validate. Hell, a lot of pages validate even though nobody ever ran them through a validator.
You have this one idea stuck in your mind, and you do not want to let go, because you will then have to figure out some other reason why you are not ranking as well as you want. This idea is simply wasting your time.
Wouldn't Google mention validation on their guide for webmasters if it were important to ranking? I'm sure they could quite easily provide a facility to check your pages on Google if that were the case, but they obviously haven't taken that path (yet?).
What I think there might be a case for, purely in terms of rankings, is pages with a high power-to-weight ratio, typically achieved with lightweight table-less construction and the placing of the various elements of content in the most advantageous positions in the page code. Add to that semantically correct markup - for example, logically sequenced heading tags, or links that are actually crawlable - and it perhaps begins to add up to a ranking advantage.
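Just to make that concrete, here is the sort of thing I have in mind (a made-up fragment, obviously): headings that descend in logical order, and a link a spider can actually follow rather than a script-only one.

    <h1>Widget guide</h1>
    <h2>Choosing a widget</h2>
    <h3>Budget widgets</h3>
    <p>See the <a href="/widgets/compare.html">widget comparison</a> for details.</p>

    <!-- compare with a "link" a crawler may never follow: -->
    <span onclick="window.location='compare.html'">widget comparison</span>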
It might also be that as time goes by the user begins to recognise a well-built page, adding credibility to the site as well as performance benefits. Also a validating and correctly structured page is likely to be more accessible to more people using different devices - to that extent I disagree with BigDave's comparison with proper English and localised English. At least proper English is universal.
It is just one possibility I am pursuing: comparing my sites to my competitors', asking what is different, why they rank better, and why mine isn't ranking well.
I use FrontPage 2000; it produces code that will not validate.
BUT
I can check all my links pointing off site in a couple minutes. (Which is more important, proper code that doesn't validate, or bad external links?)
FrontPage compiles pages on your server when you publish, drastically cutting FTP time (it allows for rapid typo correction, which Google seems to penalize!).
Using an HTML generator (Dreamweaver, FrontPage) is like using a C compiler: every once in a while you check the assembly code for correctness and efficiency, but the compiler will be more accurate in the long run.
If I choose to upgrade FrontPage, I'm pretty sure the new version will revamp my content and produce code that validates (which I deem very desirable).
BUT, I'm scared to death to touch the content of number one pages in any way! That's what will keep code that doesn't validate on the Internet for a long time, the unstable way search engines respond to change.
I do think a page that validates would get a very, very minor "attaboy" from Google - one of those tweaky hundred, or hundreds, of things determining SERP position.
In fact I'll probably use IFrames to add new content to these number one pages (silly, but it will validate!). Google appears to ignore IFrame content when it is unlinked (hurray). I take advantage of this to keep all non-topic-relevant content from affecting ranking and AdSense relevance - for example, the copyright notice, ads, and other extraneous page content (I used to even have the AdSense code in an IFrame, till Google broke it).
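Roughly like this (the file name is just made up for illustration): the boilerplate lives in its own document, and the main page merely frames it, so none of it sits in the page the spider scores.

    <!-- main page keeps only the topic-relevant text -->
    <iframe src="/boilerplate/copyright.html" width="100%" height="60" frameborder="0"></iframe>

    <!-- /boilerplate/copyright.html holds the notice, the ads, and the other extraneous bits -->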
I don't believe the standards actually mandate a DOCTYPE line either, but many validators stop if there is no DOCTYPE.
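For anyone wondering what the validators are looking for, it's a single declaration at the very top of the page, something like this (HTML 4.01 Transitional shown; use whichever DTD you actually code to):

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
      "http://www.w3.org/TR/html4/loose.dtd">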
At least proper English is universal.
Yes, it is universal, but it is not universally understood in areas where the local dialects are what is spoken.
The point of good communication is to be understood, not to be structurally correct. It only needs to be correct enough for your audience.
So, if you take any list of websites that don't validate according to w3c, the majority of them will still validate with the test that counts, the user's browser.
On the other hand, you can have a page that validates just fine, but does a lousy job of communicating. It can validate as good HTML, but it could be positioning things very badly.
There is value in validation tools for catching obvious mistakes. But changing things that work perfectly well in a browser, just so you validate, is like changing your choice of words because a spellchecker does not contain the word you are using.
Checking your code for unintended hidden links or hidden text - now that I can see as a worthwhile effort. It's amazing what I find when I actually look at the HTML. ;-)
I spotted one tag like <p>> which just makes an extra > show up in the text, and I also spotted a couple of typos and spelling errors in the text while looking at the source code in the validator - errors that I had missed when looking at the source in the text editor.
I had a couple of places with "non SGML character #146 found" which is just Smart Quotes (quotes that curl in) instead of the usual ' or " quotes. That was easily fixed too.
Finally, I had a <h3>heading here<h3> error which initially threw 25 cascading errors in the validator. This was also easily fixed - the second tag should be closing the heading not opening a new one. This error might have had an impact on ranking, as it effectively tried to make the rest of the page a heading... there was no closing tag at all.
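For anyone curious, the fixes were roughly along these lines (paraphrased from memory):

    <p>>Some text here      -->  <p>Some text here
    it’s (curly quote)      -->  it's  (or it&rsquo;s)
    <h3>heading here<h3>    -->  <h3>heading here</h3>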
These are all very minor errors, but now that the pages are sent I know that I have done the best possible job, the pages will display OK, and they should be easily spidered and indexed. I will be surprised if they don't rank reasonably well. If they don't do as well as expected, then I know that I will only have to tweak the title, description, and body text, not play around with any HTML code at all.
Changing things that work perfectly well in a browser, just so you validate...
It depends on what you mean by "work perfectly well in a browser"... anyway we're not going to agree on this. It's a mindset thing and depends on one's priorities.
I think the case for working to web standards and accessibility standards is a strong one irrespective of search performance, but I agree that for the foreseeable future Google is not going to take any notice of whether a page validates or not.
Incidentally BigDave, you might remember that a couple of years or more ago you were here berating Flash designers for using a hammer. I was building in Flash at the time, but I thought about the arguments you put forward about accessibility etc - Chris_D as well - and I can report that even though Flash could be said to work perfectly well in a browser (with the plug-in), nowadays I never use it, partly because Google doesn't read .swf files but also because of the wider question of web standards.
I think if one were able to compare the search performance of two pages with exactly the same content: Page A is built with complex tables and masses of font tags and spacer images, and has its heading tags all over the place... and Page B is built lightweight with DIVS, CSS, and intelligently organised semantic markup - Page B would rank higher than Page A, it would be accessible to more people, and it would be more efficient to maintain.
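To put a crude (and obviously invented) fragment behind that - the same heading built both ways:

    Page A:
    <table><tr><td><img src="spacer.gif" width="180" height="1" alt=""></td>
    <td><font face="Arial" size="5" color="#333333"><b>Blue Widgets</b></font></td></tr></table>

    Page B:
    <h1 class="title">Blue Widgets</h1>

    with the presentation moved to the stylesheet:
    h1.title { font: bold 1.6em Arial, sans-serif; color: #333; }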
To those who can't see the benefits, sail on...
BUT, I'm scared to death to touch the content of number one pages in any way! That's what will keep code that doesn't validate on the Internet for a long time, the unstable way search engines respond to change.
As I said earlier, I had the same problem with an established (almost four years old), high-ranking, but badly written site. I rewrote it using CSS and valid code and relaunched it at the beginning of September. Since then (11 weeks) it has maintained all its positions - rock solid through Jagger et al. It has actually crept up slightly further on some pages, so if my experience is anything to go by you would have no problem.
BUT, I'm scared to death to touch the content of number one pages in any way! That's what will keep code that doesn't validate on the Internet for a long time, the unstable way search engines respond to change.
When a spider indexes your page, it strips all html markup. It first has to traverse that markup and process it accordingly. If there are errors in the markup, it has to make a decision. Hopefully it is the right one. ;)
Taking a top performing page that does not validate and cleaning it up so it does validate usually does not harm the page's rankings. In fact, my experience has shown that rankings will improve. Why? Because you've stripped out all that junk that doesn't need to be there. You've provided a cleaner path for the spider and you've increased your text-to-HTML ratio, which has to be good, don't you think?
My current position is based on the same reasoning that I used against flash back then. I am not against validation of pages, in fact I think it is a great way to catch errors and to remind you of accessibility issues. But the vast majority of the errors out there have little to do with such things.
In fact, accessibility is a great example of how concentrating on validation does not help with communication. Let's just take a look at ALT attributes.
If you run validation on a page and it tells you that you need to add alt attributes, you will add them quickly just to get it to validate. It is the validation you are worrying about, not your end user.
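The difference is roughly this (chart.gif is just an invented example):

    <img src="chart.gif" alt="image">
    (keeps the validator quiet, tells the visitor nothing)

    <img src="chart.gif" alt="Bar chart of sales by quarter, Q3 roughly double Q1">
    (identical as far as the validator cares, far more useful to the person who can't see the image)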
On the other hand, if you care about making your site accessible, you can have great ALT attributes, titles and all the rest, but you might use <b> instead of <strong>. Do you want to know a secret? Every browser and add-on out there, including those for the disabled, can deal with <b> just fine. If they can't, it is the browser's fault, because the browser is not following the standard that counts: the real world.
But you might use <b> instead of <strong>. Do you want to know a secret? Every browser and add-on out there, including those for the disabled, can deal with <b> just fine.
They sure do. But <b> is a presentational element and will normally be treated that way, while <strong> has semantic meaning and may be treated another way. They both render the same visually. The same goes for <i> and <em>. I know you knew that; I just wanted to bring it to others' attention. ;)
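In other words, something like:

    <b>warning</b>            (presentational: just draw it bold)
    <strong>warning</strong>  (semantic: this is important; a speech browser may stress it)
    <i>Moby Dick</i>          (presentational italics)
    <em>really</em>           (semantic emphasis)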
I also made sure my CSS was helping me in the browsers. I was surprised at what I saw when I first started using Firefox and Opera... It's discouraging to write good content, then see it all centered instead of left-justified.
Or do you feel Google never produces fault indications of any kind on these pages once they validate?
I think Google is using a URL-only listing for a number one ranked page as a way to filter scrapers, but I'd like to hear other reasons for going URL-only when one knows, at least from the server logs, that there has been no problem crawling the page and the overall content has not changed. One thing that has changed is the byte count (a smaller byte count, even by one byte, always seems to delay re-including URL-only pages, IMHO). Byte count might change due to a common border used on all pages, common CSS, etc.
I'm certain this page will come back from URL-only, most likely still number one (actually it's still number one for its keywords as URL-only, which is weird by Google standards!), but a URL-only listing really cuts into traffic, and this is one reason for the fear of validation-related changes.
I'd say I've seen large ranking drops for typo corrections - not misspellings, but misuse of a word, herd versus heard. Invariably when you review pages for validation you want to correct obvious typo problems as well, and this can be a hazard for top rankings.
Make sure there is an absolute link to the old page path that Google can crawl at least three times. In some cases it appears this link must be on a different domain to assure Google that your redirect is permanent.
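That is, a fully qualified link rather than a relative one - something like this (example.com standing in for the real domain):

    <a href="/old-page.html">old page</a>                        (relative - not what I mean)
    <a href="http://www.example.com/old-page.html">old page</a>  (absolute - the form I make sure Google can crawl)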
Google must crawl the old address at least three times! That can literally take months. You must differentiate between the HTTP/1.1 bot crawl and the HTTP/1.0 bot crawl. In the past I'd have said make sure the old HTTP/1.0 bot saw the old path redirected three times; now it may require both the new and old bots to have crawled both uncompressed and gzip-compressed content before Google convinces itself the 301 is permanent! The new bot does not have to see gzipped content, but possibly both the old and new bots must see the redirect three times minimum.
now it may require both the new and old bots to have crawled both uncompressed and gzip-compressed content before Google convinces itself the 301 is permanent!
Sorry, you've lost me there. Why would I have gzipped content on my site?
So would it be wise to set up an absolute link to the old pages from a site-wide main menu? I can't believe G hasn't picked up the redirect 3 times. It's strange. The "stragglers" were part of a set of pages that were all redirected at the same time, and 2 of them are still hanging about, while the others were dealt with quite quickly. "That is illogical captain"!
Only two passed: nsa.gov (yes, the cryptographic agency, in 9th place) and my own site, which came up on page 2, in the #13 slot.
All others failed, some miserably. Number of errors ranged from 7 to 244.
I skipped ahead to page 18 of the results (#s 171 to 180) and picked a site at random.
That one specified no character set and no DOCTYPE... and just scads of errors, of course.
I still validate my pages, to avoid future problems and for bragging rights, but it looks to me like Google couldn't care less one way or the other. -Larry