Everyone should keep in mind that this video is about Why Google Doesn't Validate and not about Why You Shouldn't Validate!
I woke up this morning to 15+ emails pertaining to this video. Thanks, Google. I think you're going to assist me in proving some points in the very near future.
I wrote an extensive article on 2009 September 08 on why I thought Google didn't validate. I read my article, listened to Matt's video, and it's almost as if it were the other way around. I mean, my article mimics what Matt says almost word for word.
I've watched the video multiple times and have transcribed it word for word so those who use it as a crutch can't get away with what they are about to do. That video is being shared amongst developers like candy right now. It's going to be the one thing they present before being handed their pink slip for failure to follow protocol and breaking the site. Again, don't even think about using this video as an excuse not to validate your own documents.
|Typically we've been a little more willing to say things like "oh, we don't need that double quote or something like that" or we'll specify a color in a way that doesn't validate. |
I've been coding now for over 10 years. For the life of me, I cannot find any reasons to specify a color in a way that doesn't validate for the web. I'm very familiar with the old school HTML syntax required for email marketing and such. One of these days they'll catch up.
The article I wrote on the 8th takes Google's home page and dissects it byte for byte. Well, almost. Since I don't know the intricacies of the dynamics, I can only guess at the solutions to existing malformed and invalid syntax. I looked at every single error and warning. I documented those and provided the fix for each one that was visible to me.
The very first thing that Matt states in the video is this...
|Google looks at the number of bytes that we actually return to users and we want that to be as small as possible because every byte matters when you are serving up hundreds of millions of search requests to users. |
I'm on board with that. I can just imagine the volume of bytes being served by Google. Here's the part that confuses me. If you are saving bytes at this level, what exactly happens when the UA has to process that invalid code?
|Idiosyncratic browsers. Worried about compatibility vs validation. |
I'm also on board with this. We all work hard to make sure our websites display properly across a variety of platforms and devices. I still don't see how that justifies some of the 1990s code practices at play here. You'd have to peel apart Google's home page to fully understand the extent of the coding syntax and what they are doing.
For example, the first 7 errors reported for Google are from the <body> Element...
<body bgcolor=#ffffff text=#000000 link=#0000cc vlink=#551a8b alink=#ff0000 topmargin=3 marginheight=3>
I remember having to include those attributes way back in the IE vs Netscape days. IE had their margin attributes and Netscape had theirs. That was during the old school browser wars.
If you look at the CSS Google is using and then look at the invalid markup they are using, it makes you wonder WHY they can't clean up some of this stuff. I fully understand the bytes issue. But the above can easily be handled via CSS with fewer bytes.
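To make that concrete, here is a rough sketch of how those body attributes might map onto CSS. It assumes the rules go into the stylesheet the page already ships. The 8px side margin is my assumption (the usual browser default), not something taken from Google's page. The exact byte count depends on what the existing stylesheet already declares.

```css
/* Stands in for bgcolor, text, topmargin and marginheight on <body>.
   3px top/bottom mirrors topmargin=3 marginheight=3;
   8px left/right is an assumed default, not taken from the page. */
body{background:#fff;color:#000;margin:3px 8px}

/* Stands in for link, vlink and alink (same colors, shorthand where possible). */
a:link{color:#00c}
a:visited{color:#551a8b}
a:active{color:red}
```

With rules like these in place, the opening tag can shrink to a plain <body>, and the errors tied to that element should go away.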
|The vast majority of pages on the web don't validate. |
That statement might have been true 5 years ago. That is no longer the case in many industries. Things are starting to shape up. Out of the 520 website home pages we are now monitoring, 68 of them (13%) are valid. If I had performed this exercise 5 years ago, that number might have been more like 3%. So, the above statement may still be somewhat valid, but I don't think it is a viable excuse for not writing valid markup and following protocol.
|We have to crawl, index and return results on the web. Even if pages don't validate. |
That's the bottom line. And, we all know that the bottom line is usually the overriding factor. After building a crawler over the past few years, I fully understand the procedures involved with crawling and indexing. The number of error handling routines we have in place is pretty involved. And you know what? During our crawling and comparisons, sites with invalid code require a little more processing time on our end due to the error handling routines.
So, is Google stating that they don't care about valid code? That Webmasters can code however they wish and that the almighty Google/Googlebot will figure it out while crawling? Here's a question, and it's one that will probably never be answered.
If I have two identical sites (twins) and one is valid, the other is invalid, which site performs better overall? That question will never be answered because in real life, it doesn't happen. There is no way to perform testing at this level. I've thought about it for years and it just isn't going to happen.
That's okay. I have many individuals right now who are interested in cleaning up their markup errors. In fact, I've been providing a few free consultations to various folks and assisting them in getting things in order. We have one person who has documented everything and will be monitoring through the rest of the year. Not to mention many others who have cleaned up and learned quite a bit in the process. ;)
I'm happy that Matt released this video as it explains why Google doesn't validate. Nowhere, and I do mean nowhere in that video does Matt explain why your site doesn't or shouldn't validate!