
Google SEO News and Discussion Forum

Google Home Page has 67 Validation Errors!
What gives?
pageoneresults
 4:16 pm on Feb 6, 2007 (gmt 0)

How can a page like www.google.com, which contains a total of 13,805 bytes, have 67 HTML errors? I just don't understand that. I can see a few here and there, but not 67, especially given the minimal amount of code.

How can we expect Webmasters to take code quality seriously if the leader in search doesn't?

Hey Google, www.msn.com validates XHTML 1.0 Strict. And, www.msn.com weighs in at a whopping 231,558 bytes.

A challenge? Up for it? Wanna start some buzz and clean up a part of the web at the same time?

 

le_gber
 9:13 am on Feb 7, 2007 (gmt 0)

It must be some kind of corporate decision to save on bandwidth.

Judging by the figures you gave, Google's page is almost 17 times 'lighter' than MSN's. At their traffic levels, the extra bytes needed to make it valid code could mean big money.

Although, looking at their code a bit, they use font tags, inline CSS, and tables... which is bad, bad, bad practice, Big G!

As for the challenge, yes, I'm always up for making the web cleaner. What do we do? Start a website about getting them to validate? Something like www.validgoogle.com or www.googleisnotvalid.com?

We might have some trouble with the trademark, though.

M_Bison
 12:05 pm on Feb 7, 2007 (gmt 0)

Looking at the errors, most of them are unescaped ampersands. Putting &amp; instead of & when this error causes little to no problems would be silly, considering the amount of traffic that Google does.
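For example (hypothetical markup, not taken from Google's actual source), a raw ampersand in a query string is what the validator flags, and escaping it costs four extra bytes per occurrence:

Invalid: <a href="/search?q=widgets&hl=en">Search</a>
Valid: <a href="/search?q=widgets&amp;hl=en">Search</a>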

pageoneresults
 12:08 pm on Feb 7, 2007 (gmt 0)

Is it really that much of a performance issue to escape ampersands and quote attributes?

BillyS
 12:14 pm on Feb 7, 2007 (gmt 0)

Try My MSN instead and you'll see something more like:

Failed validation, 129 errors

along with a

No DOCTYPE found!

mattg3
 1:34 pm on Feb 7, 2007 (gmt 0)

It passes the "displays in a browser and is usable" validation, which, I guess, is all they care about.

Johan007
 2:37 pm on Feb 7, 2007 (gmt 0)

Because it has been tested. Most valid sites are worse and do not bother to test on Unix, Mac etc...

Johan007
 2:37 pm on Feb 7, 2007 (gmt 0)

Additionally, W3C is only a recommendation ;)

Johan007
 2:39 pm on Feb 7, 2007 (gmt 0)

If you really want to, you can use this:
[labs.google.com...]

...and then report how it improved your user experience?!?!?!?

CainIV
 8:10 pm on Feb 7, 2007 (gmt 0)

How can we expect Webmasters to take code quality seriously if the leader in search doesn't?

We can't, and Google doesn't either. I can't point to any statement where Google emphasizes W3C validation for websites.

The W3C validator is a very useful TOOL - especially when there are ranking issues with a website (debugging HTML, etc.). However, validation is not a prerequisite for high rankings from what I see (most of our competitors return way more serious errors than that).

pageoneresults
 8:36 pm on Feb 7, 2007 (gmt 0)

However, validation is not a prerequisite for high rankings.

I'll agree. But, it is a prerequisite in developing a solid foundation for a successful site, or at least from my perspective it is. A possible Signal of Quality, although, based on Google's lack of respect for HTML guidelines, I have to wonder if it even registers as a signal. ;)

I'm going to guess that the bulk of the errors reported are due to bandwidth savings; that is the only logical conclusion I can come to.

BigDave
 9:07 pm on Feb 7, 2007 (gmt 0)

But, it is a prerequisite in developing a solid foundation for a successful site, or at least from my perspective it is.

Not from mine. Give me solid testing every day of the week over meeting a spec produced by a committee. Validating can be part of testing, but too many people think that it is the same as testing.

If you put <b> into an html 4 or xhtml document, it will not validate. But it will work on every browser that anyone uses and it will work on any browser for the foreseeable future.

On the other hand, I used a popular theme with a CMS that is XHTML compliant, and all the pages validated perfectly. It looked great in Firefox, Mozilla and Opera. In IE it produced a page that was incredibly difficult to read.

Give me thoroughly tested over validated any day.

pageoneresults
 9:18 pm on Feb 7, 2007 (gmt 0)

If you put <b> into an html 4 or xhtml document, it will not validate.

Yes it will. :)

On the other hand, I used a popular theme with a CMS that is XHTML compliant, and all the pages validated perfectly. It looked great in Firefox, Mozilla and Opera. In IE it produced a page that was incredibly difficult to read.

Doesn't sound like an HTML/XHTML Validation issue. More of a CSS issue?

Give me thoroughly tested over validated any day.

Validate first, test second. Eliminate one part of the equation. If testing fails, at least you'll know it's not because of invalid code. ;)

gpmgroup
 9:30 pm on Feb 7, 2007 (gmt 0)

However, validation is not a prerequisite for high rankings.

But, it is a prerequisite in developing a solid foundation for a successful site...

Prerequisite?

Hmmmm. It would be interesting to see such a site, given that that non-validating home page has just generated $1.03bn in net income in the last three months.

pageoneresults
 9:36 pm on Feb 7, 2007 (gmt 0)

But, it is a prerequisite in developing a solid foundation for a successful site, or at least from my perspective it is.

Note the last part of my comment... ;)

pageoneresults
 10:38 pm on Feb 7, 2007 (gmt 0)

Validate first, test second. Eliminate one part of the equation. If testing fails, at least you'll know it's not because of invalid code.

I've lost count of the number of times I've been on the phone with a programmer who can't figure out why certain things are not displaying the way they should. A quick validation of the page and you can easily see where the problem is. Simple to fix, too.

I think it all comes down to awareness and due diligence. ;)

Swanson
 10:48 pm on Feb 7, 2007 (gmt 0)

Who cares? I mean, really, given that most of the people who create websites aren't XHTML vX.746474-compliant webmasters!

If it looks ok - then it probably is ok (pageoneresults, you mention fixing things when they don't look right - well, the Google home page looks fine to me).

If you give a #*$!X when it looks ok - then you probably need to get some help because you are suffering from OCD. (That's Obsessive Compulsive Disorder for the non-OCDs out there!)

[edited by: Swanson at 10:51 pm (utc) on Feb. 7, 2007]

Swanson
 10:49 pm on Feb 7, 2007 (gmt 0)

And stop checking home pages for compliance (could take up your life if you let it!)

OCD....

pageoneresults
 10:51 pm on Feb 7, 2007 (gmt 0)

If it looks ok - then it probably is ok.

Hehehe. That's a rather broad assumption considering that many of us don't know exactly what the User Agent is going to interpret from our malformed code.

Beauty is only skin deep.

And stop checking home pages for compliance.

lol! I get paid to do this stuff. It's a nice little niche market with little to no competition, yet. ;)

Swanson
 10:59 pm on Feb 7, 2007 (gmt 0)

Flippancy aside, I would be interested if anyone has any evidence where failing W3C validation has actually resulted in ranking problems.

I ask because I run 400+ websites - and they suck as regards good markup, but they do just fine in ALL search engines.

I don't even have a clue what XHTML is - I have no idea about CSS, or even good HTML. But the sites have good content, good links, good structure and render in all browsers.

Just thought that "cleaning up the web" might be a complete waste of time in that respect - I imagine many sites that "pass" the test look awful, are unusable and provide no benefit to anyone?

Swanson
 11:02 pm on Feb 7, 2007 (gmt 0)

Ha ha pageoneresults!

Good points about your experience - interested in what happens when there are validation problems (either real or perceived)!

Patrick Taylor
 11:26 pm on Feb 7, 2007 (gmt 0)

It's the mindset that counts. Either your pages validate, or they don't and you know and accept the reason why. Testing pages for valid code is just good working practice, even if it's currently only a minor factor with Google, and from time to time it helps throw up an issue that might make a difference. The more webmasters there are testing their pages for valid code, the stiffer the overall competition.

My guess is that Google is not living in ignorance and it knows and accepts the reason why its pages don't validate.

tedster
 12:01 am on Feb 8, 2007 (gmt 0)

evidence where failing W3C validation has actually resulted in ranking problems.

Yes, especially where the mark-up was not well-formed -- things like complex nesting errors or unclosed tags. Validation is the first place I turn to fix either display or ranking problems. It just takes Ctrl+Alt+V in Opera, or one click in Homesite, to check against the W3C validator. Why not do it? I do not understand the common resistance to this very simple step.

The first time you find a missing < or " and put it back, and then see traffic return, you'll get into the habit. These are only computers here. No matter how much error recovery is programmed into their routines, they still need something approaching a standard to work properly.
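As an illustration (made-up markup, not taken from any real page), a single missing closing quote can swallow the following text and tags into the attribute value, depending on the browser's error recovery:

Broken: <a href="http://www.example.com/widgets.html>Widget reviews</a> <p>Prices and specs</p>
Fixed: <a href="http://www.example.com/widgets.html">Widget reviews</a> <p>Prices and specs</p>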

pageoneresults
 12:19 am on Feb 8, 2007 (gmt 0)

I really hate to put Google's code slingers on the carpet here, but after reviewing their code closely, they are quoting some attributes and not others, so I don't think it was a bandwidth consideration.

Google's home page has to be one of the simplest pages, HTML-wise. Take a look; there really is nothing to that page, it is extremely minimalist.

There is no reason for them not to validate other than lack of concern? Interest? What? Why not take the few seconds to quote the attributes and escape the ampersands? And take a few more minutes and get rid of the 1990s coding practices, please.

You know what it is? That page hasn't changed much from when they first launched publicly. They're using the same code now that they were then. Just a little heavier. It looked cleaner back then. ;)

encyclo
 3:07 am on Feb 8, 2007 (gmt 0)

OK, let's do some markup analysis. The first comment is that you have to be careful about what the validator is actually validating, because Google is using content negotiation (for browser and charset) as well as IP delivery, so what you see is not necessarily what you validate. Be careful with page weight too, as Google uses gzip over the wire.

So I took for my example the plain google.com homepage when logged out, as viewed in Firefox 1.5 (because that's what I'm using). There are some differences between this version and the one "seen" by the validator, for example the character encoding.

The test version weighs in at 4617 bytes (uncompressed), and contains 63 validation errors. Because a large number of errors are repeats and doubles due to the unescaped ampersands, you can reduce this down to 30 actual errors. Break these down, and you get the following:

a) no DOCTYPE (1 occurrence)
b) missing type attributes for script/style (3)
c) unquoted attribute values (15)
d) topmargin/marginheight attributes (1 occurrence of each)
e) unescaped ampersands (7)
f) name attribute used on a span (1)
g) nowrap attribute used on a div (1)
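To make these classes concrete, here is a hypothetical before-and-after fragment (illustrative only, not Google's actual source) showing one possible minimal fix for each type of error other than the missing DOCTYPE:

<script>var f=1</script>          ->  <script type="text/javascript">var f=1</script>
<body bgcolor=#ffffff>            ->  <body bgcolor="#ffffff">
<a href=/search?q=flowers&hl=en>  ->  <a href="/search?q=flowers&amp;hl=en">
<body topmargin=3 marginheight=3> ->  <body style="margin-top:3px">  (no valid attribute equivalent, so inline CSS)
<div nowrap>                      ->  <div style="white-space:nowrap">
<span name=more>                  ->  (the script change discussed further down)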

I took the markup and made the least number of changes possible to make the page validate. I used the doctype:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

as this preserves quirks mode and was required to handle the transitional markup of the page. I couldn't use HTML 4.0 Transitional (which would have saved one more byte) as this would trigger another error, and no earlier HTML version was appropriate for the markup used.

The resulting valid version weighs in at 4911 bytes. However, it was not 100% valid - one error remains, as it would involve altering the Javascript used on the page:

<span name=more

which is referenced in the script as:

getElementsByName('more')

The difference between the original invalid version and my "almost valid" one is therefore 294 bytes.

If you remove the doctype (which was added for validation, but is not required in real use as the page is served in quirks mode anyway), you can save 63 bytes. It is also possible to make minor changes to get rid of unnecessary markup and some line breaks. The doctype-less and corrected version weighs in at 4767 bytes, leaving a meager difference of 150 bytes to (almost) validate the page - or a 3.1% increase in page weight. Over the wire (rather than viewing locally), gzip compression would reduce the 150 bytes to as little as one-fifth of that.

This admittedly very rapid analysis was done without actually looking at the design of the page itself. There remains a glut of font elements in the markup which could be replaced by CSS and would probably save enough to cover the difference. And no, it's not for older browsers that font is still used; there is plenty of more advanced CSS used in the page beyond the simple font-size declarations. The one remaining validation error could probably be fixed easily with a script tweak (the details are beyond this overview).
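One possible tweak (a sketch only, assuming the script needs just that one element and nothing else relies on the name attribute) would be to switch the span to an id, which is valid on span in HTML 4.01, and look it up with getElementById instead:

Invalid: <span name=more> ... getElementsByName('more')
Valid: <span id="more"> ... document.getElementById('more')

Since getElementsByName returns a collection while getElementById returns a single element, any code that indexes into the collection would also need adjusting.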

So to summarize, in my opinion you cannot justify the invalid markup on bandwidth savings - even when you take into account the huge number of pageviews that page gets. The page appears to have been built and modified over time, without a review of the whole page design, and with new parts treated differently from old ones - for example, the new feature at the top right uses CSS whereas the footer uses font tags. Google would do better to take the page as a whole and review and recode it; the result would probably undercut even the current version for size, and could validate to boot.

encyclo
 3:35 am on Feb 8, 2007 (gmt 0)

In terms of analysis, rather than comparing with msn.com we should really be comparing with [live.com...], as this is more the MS equivalent of the Google.com home page. The validator shows 56 errors:

[validator.w3.org...]

So MS aren't really any better than Google in this respect, although their markup is very different in style (XHTML strict).

g1smd
 1:03 pm on Feb 8, 2007 (gmt 0)

Nice analysis, encyclo. I can see that the page size could be cut by a few percent by using more CSS.

The amount of work for the change might be quite large. That root page is replicated across countless machines on 800+ IP addresses, and that page is available in dozens of languages.

I think that the "minor change" would probably keep someone occupied for several weeks; and I suppose that it would have to go through several review stages and a sign-off procedure before being propagated to the live servers.

My guess is that they have "more important things to do", and that this job is of minor importance compared to many others. However, maybe someone might like to tackle it in their "10% time"?

g1smd
 1:26 pm on Feb 8, 2007 (gmt 0)

>> I would be interested if anyone has any evidence where failing W3C validation has actually resulted in ranking problems. <<

To turn that around: every site that I have ever tidied up the code on has increased in ranking a few days or weeks after the changes.

I have seen a number of cases where a missing quote on a link, or a missing closing </head> or opening <body> tag (or multiples of those tags) has stopped a page being properly indexed.
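As a made-up illustration of the second case (not from any real site), when the closing </head> and opening <body> tags are missing, a parser has to guess where the head ends and the body begins - exactly the kind of ambiguity that can leave the title and meta description handled inconsistently:

Risky (head never closed, no body tag):
<html>
<head>
<title>Blue Widgets</title>
<meta name="description" content="Blue widget reviews">
<p>Welcome to our widget site.</p>
</html>

Explicit (unambiguous):
<html>
<head>
<title>Blue Widgets</title>
<meta name="description" content="Blue widget reviews">
</head>
<body>
<p>Welcome to our widget site.</p>
</body>
</html>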

Don't mention [google.com...] right now. Bearing in mind that the title and meta description are very important, do you think any of those sites are affected? I do.

pageoneresults
 4:06 pm on Feb 8, 2007 (gmt 0)

Bearing in mind that the title and meta description are very important, do you think any of those sites are affected? I do.

Nah, the general consensus 'round here is that it doesn't matter. As long as the visitor can see and use your website, that is all that matters. There is no need to write valid code. None whatsoever. Google is a fine example of that. If they can do it, why can't I?

I'm off to break a few pages just so I can feel good about myself and get the "new me" started.

BigDave
 5:08 pm on Feb 8, 2007 (gmt 0)

There is a difference between writing broken code, which doesn't validate, and working code that doesn't validate. I think that we can all agree that the first is bad. You should always close your quotes and your elements.

There is a difference between an error that causes parsing problems and one that simply violates some spec but doesn't cause any problems. Quoting attributes is one of these issues. Leaving "rowspan=2" unquoted is not going to cause any decent rendering engine or spider any real or imagined problems. Leaving a "style=" full of punctuation and spaces unquoted might be parseable with a ridiculous amount of work, but that is likely to introduce other errors.
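A rough illustration of that distinction (hypothetical markup; note that an unquoted rowspan is actually tolerated by the HTML 4.01 DTD and only becomes an error under XHTML):

Harmless in practice: <td rowspan=2>
Ambiguous to a parser: <td style=color: red; padding: 2px>
(the first space ends the unquoted value, so only "color:" is read as the style)
Safe either way: <td style="color: red; padding: 2px">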

I've seen broken code get fixed and had it lead to improved ranking. I have also seen things "cleaned up" so they improve in ranking. I have not seen working-but-doesn't-validate code get "fixed to validate" and improve in ranking.

I also hope that those of you discussing going with CSS aren't counting on external style sheets reducing bytes. You have to remember the bytes for the additional file request, not to mention the latency, the server processing time and the additional server load, for just a few bytes of gain.
