Forum Moderators: Robert Charlton & goodroi
Do I think it matters in the rankings right now? No. Do I think it will in the future? Maybe.
I don't think it will ever be a big factor, but it could certainly be a tie breaker. You have two sites that are essentially equal: why wouldn't you take the one that validates? It usually means it was made by a designer, or someone who knows design, and more than likely the design will look much better.
I would still like to see a sector where the top 10 is predominantly validated sites.
It ain't gonna happen. Since the majority of pages are never validated, the majority of top ten results are not likely to be validated either.
Let's see, I'll just try a few big names at random:
nobelprize.org - 2 errors
ford.com - 1 error
fordfound.org - 12 errors
microsoft.com - passed
google.com - 47 errors
msn.com - 2 errors
charlesschwab.com - 2 errors
toyota.com - 76 errors
drupal.org - 8 errors
Now, all these sites are able to get good rankings, but I doubt it has anything to do with whether they validate. I suspect they don't validate because validation simply does not matter.
What matters is that you display properly in all the common browsers and that all the search engines can index you.
Think of it as the difference between using proper English and the sort of English you use in your day-to-day speech. Proper English is not necessary to be understood, and in fact does little to help you be understood. Use proper English in rural Kentucky and you will just confuse your listener.
I just wonder if they are seen as over-optimized because they do validate.
Well, you can certainly feel free to wonder all you want, but Google simply isn't going to bother validating every page out there. Have you noticed how long it takes to validate a web page? Multiply that by 20 billion.
There is simply no reason for them to validate your pages, and they certainly are not going to penalize you if your pages do validate. Hell, a lot of pages validate even though nobody ever ran them through a validator.
You have this one idea stuck in your mind, and you do not want to let go, because you will then have to figure out some other reason why you are not ranking as well as you want. This idea is simply wasting your time.
Wouldn't Google mention validation on their guide for webmasters if it were important to ranking? I'm sure they could quite easily provide a facility to check your pages on Google if that were the case, but they obviously haven't taken that path (yet?).
What I think there might be a case for, purely in terms of rankings, is pages with a high power-to-weight ratio, typically achieved with lightweight table-less construction and the placing of the various elements of content in the most advantageous positions in the page code. Add to that semantically correct markup - for example, logically sequenced heading tags, or links that are actually crawlable - and it perhaps begins to add up to a ranking advantage.
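Just to make that concrete, here is the sort of thing I have in mind (a made-up fragment, obviously): headings that descend in logical order, and a link a spider can actually follow rather than a script-only one.

    <h1>Widget guide</h1>
    <h2>Choosing a widget</h2>
    <h3>Budget widgets</h3>
    <p>See the <a href="/widgets/compare.html">widget comparison</a> for details.</p>

    <!-- compare with a "link" a crawler may never follow: -->
    <span onclick="window.location='compare.html'">widget comparison</span>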
It might also be that as time goes by the user begins to recognise a well-built page, adding credibility to the site as well as performance benefits. Also a validating and correctly structured page is likely to be more accessible to more people using different devices - to that extent I disagree with BigDave's comparison with proper English and localised English. At least proper English is universal.
It is just one possibility I am pursuing: comparing my sites to my competitors', asking what is different, why they rank better, and why mine isn't ranking well.
I use FrontPage 2000; it produces code that will not validate.
BUT
I can check all my links pointing off site in a couple minutes. (Which is more important, proper code that doesn't validate, or bad external links?)
FrontPage compiles pages on your server when you publish, drastically cutting FTP time (it allows for rapid typo correction, which Google seems to penalize!).
Using an HTML generator (Dreamweaver, FrontPage) is like using a C compiler: every once in a while you check the assembly code for correctness and efficiency, but the compiler will be more accurate in the long run.
If I choose to upgrade FrontPage, I'm pretty sure the new version will revamp my content and produce code that validates (which I deem very desirable).
BUT, I'm scared to death to touch the content of number one pages in any way! That's what will keep code that doesn't validate on the Internet for a long time, the unstable way search engines respond to change.
I do think a page that validates would get a very, very minor "attaboy" from Google - one of those tweaky hundred, or hundreds, of things determining SERP position.
In fact I'll probably use IFrames to add new content to these number one pages (silly, but it will validate!). Google appears to ignore IFrame content when it is unlinked (hurray). I take advantage of this to keep all non-topic-relevant content from affecting ranking and AdSense relevance - for example, the copyright notice, ads, and other extraneous page content (I used to even have the AdSense code in an IFrame, till Google broke it).
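Roughly like this (the file name is just made up for illustration): the boilerplate lives in its own document, and the main page merely frames it, so none of it sits in the page the spider scores.

    <!-- main page keeps only the topic-relevant text -->
    <iframe src="/boilerplate/copyright.html" width="100%" height="60" frameborder="0"></iframe>

    <!-- /boilerplate/copyright.html holds the notice, the ads, and the other extraneous bits -->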
I don't believe the standards actually mandate a DOCTYPE line either, but many validators stop if there is no DOCTYPE.
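For anyone wondering what the validators are looking for, it's a single declaration at the very top of the page, something like this (HTML 4.01 Transitional shown; use whichever DTD you actually code to):

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
      "http://www.w3.org/TR/html4/loose.dtd">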
At least proper English is universal.
Yes, it is universal, but it is not universally understood in areas where the local dialects are what is spoken.
The point of good communication is to be understood, not to be structurally correct. It only needs to be correct enough for your audience.
So, if you take any list of websites that don't validate according to w3c, the majority of them will still validate with the test that counts, the user's browser.
On the other hand, you can have a page that validates just fine, but does a lousy job of communicating. It can validate as good HTML, but it could be positioning things very badly.
There is value in validation tools for catching obvious mistakes. But changing things that work perfectly well in a browser, just so you validate, is like changing your choice of words because a spellchecker does not contain the word you are using.
Checking your code for unintended hidden links or hidden text - now that I can see as a worthwhile effort. It's amazing what I find when I actually look at the HTML. ;-)
I spotted one tag like <p>> which just makes an extra > show up in the text, and I also spotted a couple of typos and spelling errors in the text while looking at the source code in the validator - errors that I had missed when looking at the source in the text editor.
I had a couple of places with "non SGML character #146 found" which is just Smart Quotes (quotes that curl in) instead of the usual ' or " quotes. That was easily fixed too.
Finally, I had a <h3>heading here<h3> error which initially threw 25 cascading errors in the validator. This was also easily fixed - the second tag should be closing the heading not opening a new one. This error might have had an impact on ranking, as it effectively tried to make the rest of the page a heading... there was no closing tag at all.
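For anyone curious, the fixes were roughly along these lines (paraphrased from memory):

    <p>>Some text here      -->  <p>Some text here
    it’s (curly quote)      -->  it's  (or it&rsquo;s)
    <h3>heading here<h3>    -->  <h3>heading here</h3>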
These are all very minor errors, but now that the pages are sent I know that I have done the best possible job, the pages will display OK, and they should be easily spidered and indexed. I will be surprised if they don't rank reasonably well. If they don't do as well as expected, then I know that I will only have to tweak the title, description, and body text, not play around with any HTML code at all.
Changing things that work perfectly well in a browser, just so you validate...
It depends on what you mean by "work perfectly well in a browser"... anyway we're not going to agree on this. It's a mindset thing and depends on one's priorities.
I think the case for working to web standards and accessibility standards is a strong one irrespective of search performance, but I agree that for the foreseeable future Google is not going to take any notice of whether a page validates or not.
Incidentally BigDave, you might remember that a couple of years or more ago you were here berating Flash designers for using a hammer. I was building in Flash at the time, but I thought about the arguments you put forward about accessibility etc - Chris_D as well - and I can report that even though Flash could be said to work perfectly well in a browser (with the plug-in), nowadays I never use it, partly because Google doesn't read .swf files but also because of the wider question of web standards.
I think if one were able to compare the search performance of two pages with exactly the same content: Page A is built with complex tables and masses of font tags and spacer images, and has its heading tags all over the place... and Page B is built lightweight with DIVS, CSS, and intelligently organised semantic markup - Page B would rank higher than Page A, it would be accessible to more people, and it would be more efficient to maintain.
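To put a crude (and obviously invented) fragment behind that - the same heading built both ways:

    Page A:
    <table><tr><td><img src="spacer.gif" width="180" height="1" alt=""></td>
    <td><font face="Arial" size="5" color="#333333"><b>Blue Widgets</b></font></td></tr></table>

    Page B:
    <h1 class="title">Blue Widgets</h1>

    with the presentation moved to the stylesheet:
    h1.title { font: bold 1.6em Arial, sans-serif; color: #333; }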
To those who can't see the benefits, sail on...
BUT, I'm scared to death to touch the content of number one pages in any way! That's what will keep code that doesn't validate on the Internet for a long time, the unstable way search engines respond to change.
As I said earlier, I had the same problem with an established (almost four years old), high-ranking, but badly written site. I rewrote it using CSS and valid code and relaunched it at the beginning of September. Since then (11 weeks) it has maintained all its positions - rock solid through Jagger et al. It has actually crept up slightly further on some pages, so if my experience is anything to go by you would have no problem.
BUT, I'm scared to death to touch the content of number one pages in any way! That's what will keep code that doesn't validate on the Internet for a long time, the unstable way search engines respond to change.
When a spider indexes your page, it strips all html markup. It first has to traverse that markup and process it accordingly. If there are errors in the markup, it has to make a decision. Hopefully it is the right one. ;)
Taking a top performing page that does not validate and cleaning it up so it does validate usually does not harm the page's rankings. In fact, my experience has shown that rankings will improve. Why? Because you've stripped out all that junk that doesn't need to be there. You've provided a cleaner path for the spider and you've increased your text-to-HTML ratio, which has to be good, don't you think?
My current position is based on the same reasoning that I used against flash back then. I am not against validation of pages, in fact I think it is a great way to catch errors and to remind you of accessibility issues. But the vast majority of the errors out there have little to do with such things.
In fact, accessibility is a great example of how concentrating on validation does not help with communication. Let's just take a look at ALT attributes.
If you run validation on a page and it tells you that you need to add alt attributes, you will add them quickly just to get it to validate. It is the validation you are worrying about, not your end user.
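The difference is roughly this (chart.gif is just an invented example):

    <img src="chart.gif" alt="image">
    (keeps the validator quiet, tells the visitor nothing)

    <img src="chart.gif" alt="Bar chart of sales by quarter, Q3 roughly double Q1">
    (identical as far as the validator cares, far more useful to the person who can't see the image)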
On the other hand, if you care about making your site accessible, you can have great ALT attributes, titles and all the rest, but you might use <b> instead of <strong>. Do you want to know a secret? Every browser and add-on out there, including those for the disabled, can deal with <b> just fine. If they can't, it is the browser's fault, because the browser is not following the standard that counts: the real world.
But you might use <b> instead of <strong>. Do you want to know a secret? Every browser and add-on out there, including those for the disabled, can deal with <b> just fine.
They sure do. But <b> is a presentational element and will normally be treated that way, while <strong> has semantic meaning and may be treated another way. They both render the same visually. The same goes for <i> and <em>. I know you knew that; I just wanted to bring it to others' attention. ;)
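In other words, something like:

    <b>warning</b>            (presentational: just draw it bold)
    <strong>warning</strong>  (semantic: this is important; a speech browser may stress it)
    <i>Moby Dick</i>          (presentational italics)
    <em>really</em>           (semantic emphasis)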
I also made sure my CSS was helping me in the browsers. I was surprised at what I saw when I first started using Firefox and Opera... It's discouraging to write good content, then see it all centered instead of left-justified.
Or do you feel Google never produces fault indications of any kind on these pages once they validate?
I think Google is using a URL-only listing for a number one ranked page as a way to filter scrapers, but I'd like to hear other reasons for going URL-only when one knows, at least from the server logs, that there has been no problem crawling the page and the overall content has not changed. One thing that has changed is the byte count (a smaller byte count, even by one byte, always seems to delay re-including URL-only pages, IMHO). Byte count might change due to a common border used on all pages, common CSS, etc.
I'm certain this page will come back from URL-only, most likely still number one (actually it's still number one for its keywords as URL-only, which is weird by Google standards!), but a URL-only listing really cuts into traffic, and this is one reason for the fear of validation-related changes.
I'd say I've seen large ranking drops for typo corrections - not misspellings, but misuse of a word, herd versus heard. Invariably when you review pages for validation you want to correct obvious typo problems as well, and this can be a hazard for top rankings.
Make sure there is an absolute link to the old page path that Google can crawl at least three times. In some cases it appears this link must be on a different domain to assure Google that your redirect is permanent.
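That is, a fully qualified link rather than a relative one - something like this (example.com standing in for the real domain):

    <a href="/old-page.html">old page</a>                        (relative - not what I mean)
    <a href="http://www.example.com/old-page.html">old page</a>  (absolute - the form I make sure Google can crawl)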
Google must crawl the old address at least three times! That can literally take months. You must differentiate between the HTTP/1.1 bot crawl and the HTTP/1.0 bot crawl. In the past I'd have said make sure the old HTTP/1.0 bot saw the old path redirected three times; now it may require both the new and old bots to have crawled both uncompressed and gzip-compressed content before Google convinces itself the 301 is permanent! The new bot does not have to see gzipped content, but possibly both the old and new bots must see the redirect three times minimum.
now it may require both the new and old bots to have crawled both uncompressed and gzip-compressed content before Google convinces itself the 301 is permanent!
Sorry, you've lost me there. Why would I have gzipped content on my site?
So would it be wise to set up an absolute link to the old pages from a site-wide main menu? I can't believe G hasn't picked up the redirect 3 times. It's strange. The "stragglers" were part of a set of pages that were all redirected at the same time, and 2 of them are still hanging about, while the others were dealt with quite quickly. "That is illogical captain"!
Only two passed: nsa.gov (yes, the cryptographic agency, in 9th place) and my own site, which came up on page 2, in the #13 slot.
All others failed, some miserably. Number of errors ranged from 7 to 244.
I skipped ahead to page 18 of the results (#s 171 to 180) and picked a site at random.
That one specified no character set and no DOCTYPE... and just scads of errors, of course.
I still validate my pages, to avoid future problems and for bragging rights, but it looks to me like Google couldn't care less one way or the other. -Larry