Search Engine Validation Discussion

brotherhood of LAN

3:24 pm on Jun 29, 2002 (gmt 0)

Discussion of Search Engine Validation Results Chart [webmasterworld.com]

wow....the table says it all.

Google better not be determining ranking with code validation [webmasterworld.com] after looking at that ;)

Fair enough SE's should have ethics, but hypocrisy will never be a good line to spin :)

victor

4:18 pm on Jun 29, 2002 (gmt 0)

Thanks for doing the work and producing the chart.

Just for interest and consistency, Brett: what service/application did you use to do the validations?

I checked a few with CSE HTML Validator, and got results consistent with yours, though not identical.

The HTML quality control on some of these sites is mind-numbingly stupid. And they can't blame all of the errors on paring down for speed: there are things like stray close tags with no preceding open. D'oh!

papabaer

7:26 pm on Jun 29, 2002 (gmt 0)

Great post Brett! Very enlightening - depressing too...

I'm not even going to comment on the number of errors returned by M$N - that would be just too easy!

On the whole, regarding your results, and the question of the possibility of search engines rewarding pages with validating code higher rankings [webmasterworld.com] over similar content pages fraught with errors: I wonder if it's kinda like havin' a 350lb., deep-fried-food eatin', three packs of cigarettes a day Cardiologist scolding patients for not taking better care of themselves?

Axacta

7:39 pm on Jun 29, 2002 (gmt 0)

I remember reading somewhere that DMOZ editors check sites for validation, and use it as part of their judging criteria. Any DMOZ editors care to comment on this?

tedster

7:47 pm on Jun 29, 2002 (gmt 0)

One of these big boys is so bad that Opera can't render the homepage - and it's been that way for weeks. Their code is full of non-breaking spaces with an extra ampersand. (&-&-nbsp; !)

Another has Javascript rollovers but the buggy code doesn't pre-cache the images.

I wouldn't accept pages like these from any team member I work with.

brotherhood of LAN

7:58 pm on Jun 29, 2002 (gmt 0)

You have to wonder how much work it would take these "big boys" to validate?

I mean...home page and results page....is that not just two pages or do they have to rip everything down to the foundations just to validate?

dcheney

8:12 pm on Jun 29, 2002 (gmt 0)

It wouldn't take any more work than the rest of us spend on our sites to make them validate!

Xoc

8:14 pm on Jun 29, 2002 (gmt 0)

The AOL home page doesn't even have the <title> tag within the <head> section!!! Duh. I have a hard time explaining that.

It does show how any idiot can do better html than the largest "media" company in the world.

papabaer

8:26 pm on Jun 29, 2002 (gmt 0)

Great examples to follow... NOT!

There has always been a segment of Web developers who insist on writing valid code; the numbers appear to be increasing. I suspect with the growing awareness of Web Standards, and accessibility issues finally coming into the forefront, that at some point, even the "big boys" will heed the call. That WOULD set a great example...

victor

8:41 pm on Jun 29, 2002 (gmt 0)

Axacta:
I remember reading somewhere that DMOZ editors check sites for validation, and use it as part of their judging criteria. Any DMOZ editors care to comment on this?

I'm a DMOZ editor, and it is not part of my brief.

If a site doesn't display in my usual working environment (Windows/Opera, all plug-ins and Java disabled), I will go back and take another look (I have Mozilla, IE, Amaya and Lynx to try).

But that other look may not be for a week or two. So at the very least, the submission gets delayed.

I've never excluded a site for rendering too poorly to be of use. But I wouldn't rule it out. If it can't be read, it can hardly be useful content.

Brett_Tabke

9:17 pm on Jun 29, 2002 (gmt 0)

I updated the tables with two new metrics: K size, and Errors Per K.

The search term used for the SERP reports was "mp3 rippers". The W3C validator was used.
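For anyone who wants to reproduce the "Errors Per K" column on their own pages, here is a minimal sketch of how such a metric could be computed. It assumes 1 K means 1024 bytes of page source; the function name and the sample numbers are made up for illustration and are not taken from the chart.

```python
def errors_per_k(error_count: int, size_bytes: int) -> float:
    """Validator errors per kilobyte of page source (1 K assumed to be 1024 bytes)."""
    if size_bytes <= 0:
        raise ValueError("size_bytes must be positive")
    return error_count / (size_bytes / 1024)


# Hypothetical page: 30 validator errors in 15 K (15360 bytes) of HTML.
print(errors_per_k(30, 15360))
```

Normalizing by size makes a small page with a handful of errors comparable to a large page with many.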

kujanomiko

10:34 pm on Jun 29, 2002 (gmt 0)

Axacta:

No, ODP editors do not take code validation into consideration in a site unless it hinders navigation to the point where it is impossible. I've run into sites so poorly coded that I couldn't find the navigational links, so I couldn't review the site and couldn't accept it. :(

Also, I rarely edit with Flash and Java turned on, simply because of my only internet connection choice, 24k dialup. In that case, I leave the site in unreviewed for another editor to look at, which may delay the inclusion, but it isn't rejected.

Axacta

12:47 am on Jun 30, 2002 (gmt 0)

Interesting. Did Teoma have two different teams designing their Home Page and their SERP's page? The home page is tied for smallest of all SE home pages, with the third largest errors per k, while their SERP's page is the largest of all by a large margin, with the third least errors per k of info.

The MSN team must have been all "designers" and no "coders". :)

(Actually, now that I take a closer look, maybe it was all management and no designers or coders.)

brotherhood of LAN

1:17 am on Jun 30, 2002 (gmt 0)

The ultimate question for me is....do they hand code? :) With those errors maybe they are "closet frontpage" fans...

It's worth noting that the chance of them making an error is between 1 and 0.05%, and that DMOZ has code that is around 20 TIMES more compliant.

Also for the SERP's table to the right...we might also want to take into account the number of SERP's displayed....due to some errors maybe getting repeated in the SERP?

Axacta

1:27 am on Jun 30, 2002 (gmt 0)

>With those errors maybe they are "closet frontpage" fans...<

That's a low blow - I use FrontPage, and my site validates. :)

Brett_Tabke

8:32 am on Jun 30, 2002 (gmt 0)

I believe MSN does that to exploit problems with other browsers.

As Tedster said, there is one in the first group that doesn't even work with some browsers. That's just sad.

The one big improvement and surprise was Altavista. A couple years ago, they had close to 1000 errors on their serps.

I know, many of them will argue that they don't go for strict w3c validation because of bandwidth concerns. However, if you take time to look at some of the serp code, you'll find a great deal of gratuitous code. They seem to be able to afford bandwidth for js rollovers, exits, and other bells - they can certainly afford the bandwidth for valid code.

So what is it? Why can't they take the time to validate?

If you present valid code, you get assured browser compatibility regardless of the browser specifics. All browser developers test their browsers with valid code from the w3c core tests. If you can produce validated code, then no matter what browser connects, you have a higher chance of putting something on their screen they can read.

If you don't validate the code, you could be writing off 1-5% of your visitors who use nonstandard browsers. Those 1-5% can pay for a great deal of bandwidth.

I recently listened to a long (75 min) presentation from a search engine engineer in charge of large data centers for a major search engine. During that discussion, he stressed over and over KISS: Keep It Simple Stupid, and redundancy redundancy redundancy. They want to keep the system as simple as possible and always have at least two paths for data for when something fails.

Furthermore, that same search engine operates on open source tools, has contributed open source code, and has made many remarks about the number of PhDs per square foot of office space.

I feel that presenting a page that wouldn't pass a 7th grade HTML class pop quiz severely damages their tech credentials.

tbear

11:34 am on Jun 30, 2002 (gmt 0)

Just out of interest I checked this: http://www.webmasterworld.com/forum21/2662.htm [validator.w3.org]
(don't know why I bothered :) )
Warning: No Character Encoding detected! To assure correct validation, processing, and display, it is important that the character encoding is properly labeled. Further explanations.
Below are the results of attempting to parse this document with an SGML parser.

No errors found!
This document would validate as the document type specified if you updated it to match the Options used.

Crazy_Fool

12:57 pm on Jun 30, 2002 (gmt 0)

>>I feel that, presenting a page that wouldn't
>>pass a 7th grade HTML class pop quiz, severely
>>damages their tech credentials.

well i just checked the html 4.01 validation on google homepage, and the errors are minor. although there are pieces of code that do not validate according to w3c specs, will these actually cause any problems to users?

i don't have time to go through all the errors displayed, so i'll pick just one - quotes around attribute values. for example, google uses <body bgcolor=#ffffff text=#000000 link=#0000cc vlink=#551a8b alink=#ff0000 onLoad=sf()>, yet w3c specs say they must use quotes around the attribute values, ie bgcolor="#ffffff".
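As a rough check of which values in that tag actually need quotes: HTML 4.01 only lets a value go unquoted if it consists entirely of letters, digits, hyphens, periods, underscores and colons. The sketch below applies that rule to the body tag quoted above. The regexes and the function name are illustrative assumptions, and the attribute matcher is deliberately naive (it only looks at unquoted values and can be fooled by an "=" inside a quoted value).

```python
import re

# HTML 4.01: an unquoted attribute value may contain only letters, digits,
# hyphens, periods, underscores, and colons.
UNQUOTED_OK = re.compile(r"[A-Za-z0-9._:-]+")

# Naive matcher for name=value pairs whose value is NOT double-quoted.
ATTR = re.compile(r'([A-Za-z][\w-]*)=([^\s">]+)')


def values_needing_quotes(tag: str) -> list[str]:
    """Return the names of attributes whose unquoted values break the rule."""
    return [name for name, value in ATTR.findall(tag)
            if not UNQUOTED_OK.fullmatch(value)]


tag = ("<body bgcolor=#ffffff text=#000000 link=#0000cc "
       "vlink=#551a8b alink=#ff0000 onLoad=sf()>")
print(values_needing_quotes(tag))
```

By this rule the five colour values fail because of the "#", and onLoad=sf() fails as well because of the parentheses, so six values in that tag would need quoting.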

how many browsers will reject this non-validated code just because of the lack of quotes? can anyone name just one browser that will choke on this code?

even if a browser cannot cope with something like bgcolor=#ffffff because of the lack of quotes, then it will simply default to its normal background color. because google has been consistent in its failure to use quotes, the same browsers will also use default colors for text, links and so on. therefore, users of poor browsers will still be able to view the page, just not in google's desired colors.

so, in order to provide validated content on just this one small piece of code, google would need to use quotes around not one, but 5 attributes. that makes 10 extra quotes (characters) per page view.

according to the google zeitgeist for 2001 (http://www.google.com/press/zeitgeist2001.html ), there were more than 150 million queries per day. you don't need a PhD to work out that for google to validate just that one small piece of code, they would need to deliver 1500 million extra characters of information per day. how many gigabytes is that? i've kinda run out of fingers ...
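For those who have also run out of fingers, a quick back-of-the-envelope sketch of that sum. It uses the figures from this post (150 million queries a day, 10 extra characters per page) and assumes one byte per character and one page view per query; the function name is made up for illustration.

```python
def extra_traffic(queries_per_day: int, extra_chars_per_page: int) -> dict:
    """Extra daily traffic from added characters, assuming 1 byte per character."""
    extra_bytes = queries_per_day * extra_chars_per_page
    return {
        "bytes_per_day": extra_bytes,
        "gigabytes_per_day": extra_bytes / 1e9,  # decimal gigabytes
        "average_kbit_per_s": extra_bytes * 8 / 86_400 / 1000,  # spread over 24 hours
    }


result = extra_traffic(queries_per_day=150_000_000, extra_chars_per_page=10)
print(result["gigabytes_per_day"], "GB per day")
print(round(result["average_kbit_per_s"], 1), "kbit/s on average")
```

That works out to 1.5 GB per day for the ten quote characters alone, before counting any other unquoted attributes on the results pages.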

bear in mind that this is just for the one piece of code. try a search that produces 10 results. how many missing quotes around attributes in that page? multiply that by 150 million, work out how many gigabytes of information that becomes and add it to the above.

if you can find out how much bandwidth costs in bulk, you can get a pretty good idea how much they save by stripping out a few irrelevant characters. don't forget that delivering less content also speeds up delivery of content - sure, a few quotes won't make much on one page, but on 150 million page views it's got to make one hell of a difference.

brett, i reckon some people deserve far more credit than you give them.

Brett_Tabke

1:10 pm on Jun 30, 2002 (gmt 0)

Thanks tbear, I kicked back on character encoding. I've been messing with unicode here the last week and had it off.

>non-validated code just because of the lack of quotes

No one knows. A problem could be caused by combined or accumulated minor errors (see: MSN.com).

Either way, that's not the point. The core of it is if some of the top tech centers on the internet can't follow the agreed upon standards, then they deserve no quarter and no respect for their technology. Their tech credentials are tainted and their credibility stretched.

> how many gigabytes of information that becomes and
> add it to the above.

They have several hundred bytes of trivial deletable code on their serps. If it were a bandwidth issue, they wouldn't be using that code in the first place. Gratuitous css and js for what appears in the browser to be html 3.2? Not to mention the "cached" page distribution. Obviously, bandwidth isn't the issue. The only thing I can conclude is that it is either flat out laziness or disrespect for the agreed upon standards.

papabaer

2:34 pm on Jun 30, 2002 (gmt 0)

Validation isn't like a game of "horseshoes;" close doesn't count. You can't split hairs and say that the only errors present are trivial and don't really matter. A document either validates or it does not.

As Brett notes, the presence of gratuitous code negates the "bandwidth excuse." Not offering valid code makes these sites look bad and definitely does call their tech credentials into question.

I wonder who among the group will be the first to offer valid code at some point? That would carry a lot of PR value - as in PUBLIC RELATIONS, if their marketing people were savvy enough to capitalize on the inferred credibility.

Crazy_Fool

3:28 pm on Jun 30, 2002 (gmt 0)

>>A document either validates or it does not.

validation for what purpose? validation for the purpose of validating? validation for the purpose of gaining credibility for creating validated code?

>>gratuitous css and js

sure, none of the css or javascript is absolutely necessary, but the css gives google its look and the javascript performs simple functions that are useful to many people, like putting the cursor in the search box. this isn't exactly gratuitous, it can (and does) help some people. remember, half the people you meet are below average intelligence. you might think google would be idiot proof without the javascript, but no matter how hard you try to idiot proof something, there is always someone that will go one better.

>>who among the group will be the first to offer
>>valid code at some point? That would carry a
>>lot of PR value - as in PUBLIC RELATIONS

i doubt it. how many people actually know that google code doesn't validate to some w3c standard? how many people know what w3c is? how many people care?

the answer is that outside the world of web experts, very few people know or understand things like this, and even fewer care. people go to google to search for something, to get relevant results quickly and efficiently. they get what they want then go away. that's it. that's all that matters.

victor

3:57 pm on Jun 30, 2002 (gmt 0)

Part of the ethos underpinning the Internet is:

"Be liberal in what you accept; be strict in what you send"

(It's quoted in several RFCs). This ethos is part of why it is so easy for widely differing hosts and servers and agents and browsers and everything else scrabbling for a living on the Internet to intercommunicate.

But it's taking unfair advantage to interpret it as:

"I can be as sloppy as I like because you've got to be as liberal as possible."

One analogy is grammar and spelling. We all communicate better if we attempt to write our best, while not fussing too much over others' lapses.

bBUT. THaht dontmean u shld rite anywitchwatys juss cos im not cared much usually n.e.ways. :)

The effort to understand does not belong in the browser, even though browsers do sterling work here. The effort to communicate clearly according to agreed standards belongs squarely with the sites' producers.
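To make the "liberal in, strict out" idea concrete, here is a small sketch that accepts attributes in whatever quoting style it is given and always writes them back out double-quoted. The regex and function name are illustrative assumptions only, not how any real browser or tool works, and the sketch assumes the incoming values do not already contain escaped entities (they would be escaped a second time).

```python
import re
from html import escape

# Liberal on input: accept double-quoted, single-quoted, or bare values.
ATTR = re.compile(r"""([A-Za-z][\w:-]*)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))""")


def normalize_attributes(tag_body: str) -> str:
    """Strict on output: lowercase names, double-quoted and escaped values."""
    parts = []
    for name, double_quoted, single_quoted, bare in ATTR.findall(tag_body):
        # Only one of the three groups can be non-empty for a given match.
        value = double_quoted or single_quoted or bare
        parts.append(f'{name.lower()}="{escape(value, quote=True)}"')
    return " ".join(parts)


print(normalize_attributes("bgcolor=#ffffff text='#000000' onLoad=sf()"))
```

The reading side puts up with sloppy input; the writing side never produces it.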

ijan

6:14 am on Jul 1, 2002 (gmt 0)

The core of it is if some of the top tech centers on the internet can't follow the agreed upon standards, then they deserve no quarter and no respect for their technology. Their tech credentials are tainted and their credibility stretched.

Brett, please correct me if I got it wrong. I suppose you are talking about Google, without doubt the finest search engine as confirmed many times in these forums, and you believe that they deserve no respect for their technology just because "they can't follow the agreed upon standards"?!

They apparently don't use any quotes around color codes and other HTML attributes. So what? They seem to do this knowingly, to reduce the page size.

If you don't validate the code, you could be writing off 1-5% of your visitors who use nonstandard browsers. Those 1-5% can pay for a great deal of bandwidth.

Well, I am sure any modern browser can render the Google web site right. And there is a very good chance that even archaic browsers like Netscape 1.x can do it right. Because Google's code is simple and elegant. Just take a closer look.

As for other web sites, it all depends on the type of your audience. If your core audience believes that code validation is the most important thing in a web site then by all means make sure that your code validates. Otherwise I don't see any good reason to check every single quote. If it looks good in IE, Netscape and Opera (99% of all users) then why bother? If the remaining 1% refuses to use a real browser, then it's their problem.

pageoneresults

6:42 am on Jul 1, 2002 (gmt 0)

My two pennies...

The web has been struggling for years to develop a set of standards that everyone should follow so that we can clean the mess up that has developed over the years, and there is a mess!

It would only be appropriate for the web's largest properties to validate their websites against the W3C standards and help educate the public on the benefits of validation.

What if Google decides to make validation part of the algo? I'll bet everyone's tune changes then. I figure I'd stay one step ahead of everyone else and validate now before it's too late!

It's all about writing clean html, xhtml, css, js, etc... If it weren't for IE, we'd all be writing valid code. I've said this before and I'll say it again, NN4.x is one of the best website validators out there. If you can get it to work in NN4.x, it will work everywhere else!

And you know what, it feels really good to be able to advertise that you've validated. So what if many don't know about it. They will after you validate and advertise it! Let the voice be heard. Man, I sound like some sort of W3C groupie, huh?

The Internet's major properties could change the web overnight if they started promoting validation! Think about it, if Google posted a W3C icon on their home page tomorrow, this community would go over the edge. There would be topics all over the place discussing validation. There would be a lot of sleepless nights for many while they worked round the clock to clean up their acts!

Either work towards validation now, or do it later when it may be too late. Just think, you'll be in the top 10% of developers/designers who are leading the way into the future. Someone's gotta lead and it might as well be us. I hate following, don't you?

pageoneresults

7:14 pm on Jul 1, 2002 (gmt 0)

In response to taking the lead, we've just validated our DRP's (Directory Results Pages). Thanks to mbauser2 and the tip on unescaped ampersands, we were able to validate 100%. Anyone care to join us? ;)

DMOZ.org ?
Looksmart.com ?
AllTheWeb.com ?
Altavista.com ?
Yahoo.com ?
Google.com ?
WiseNut.com ?
Lycos.com ?
Overture.com ?
Search.aol.com ?
Teoma.com ?
Hotbot.com ?
MSN.com ?

P.S. We know you are following the thread...
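For anyone hitting the same unescaped-ampersand errors (typically in query-string links), here is a rough sketch of one way to clean them up. This is not necessarily what mbauser2 suggested or how we did it; the regex and function name are assumptions for illustration. It treats anything shaped like &name; or &#123; as an existing entity and leaves it alone.

```python
import re

# An ampersand that does NOT start something shaped like an entity reference
# (&name;, &#123; or &#x1F;).
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)")


def escape_bare_ampersands(html: str) -> str:
    """Replace bare ampersands with &amp; without touching existing entities."""
    return BARE_AMP.sub("&amp;", html)


link = '<a href="/search?q=mp3&page=2">R&amp;B</a>'
print(escape_bare_ampersands(link))
```

Run it over generated links before they go out and the validator stops complaining about "&page" style references.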

Crazy_Fool

8:01 pm on Jul 1, 2002 (gmt 0)

>>Just think, you'll be in the top 10% of
>>developers/designers who are leading the way
>>into the future.

leading the way into the future, or desperately clinging to a utopian fantasy ?

w3c standards are developed way too late - technology developers and browser authors have moved on a long long way by the time w3c create a standard for any new technology.

standards are set not by w3c, but by the technology and browser authors. w3c then come along and set their own "official" standards. new browser versions are then modified to incorporate the new "official" standards, but support for the original "unofficial" standards remains. browsers must continue to support old, non-w3c standards compliant code as to reject it would render the vast majority of the web useless.

until w3c get ahead of the game and work with developers and browser authors to create the standards together, w3c standards are worthless.

pageoneresults

8:22 pm on Jul 1, 2002 (gmt 0)

> standards are set not by w3c, but by the technology and browser authors.

If I'm not mistaken, I thought the W3C was the authority on web standards, not the browser authors.

> w3c then come along and set their own "official" standards.

I thought the W3C set the standards first and then the browser authors and web developers decided that it was easier to code this way because it was quicker for them and less time consuming to try and validate against the standards.

> new browser versions are then modified to incorporate the new "official" standards, but support for the original "unofficial" standards remains.

Aren't they usually modified after the fact when they find out that what they produced is not functioning the way it should across multiple platforms?

> browsers must continue to support old, non-w3c standards compliant code as to reject it would render the vast majority of the web useless.

That's because the W3C could never gain a solid footing with the standards. Non compliant code is rampant on the web and to try and clean up now may be too late. But, if the major players in the industry decide that it is now time to follow standards, what do we do?

It's unfortunate but I think the web is too far gone to establish a solid set of standards that are followed by all. It's a hodge podge of invalid code out there and people will just continue to justify their invalid code by saying they are producing what works. If that keeps up, then we will just keep seeing browsers that display web pages differently based on the standards they are following. Then all of us here discussing this will continue to produce hacks that make up for the non-compliant code that is being spewed forth by those who justify it by saying the non-compliant code works 95% of the time! ;)

Crazy_Fool, no personal attack intended. I became a W3C groupie a couple of years ago and have since then, seen the light. Compliancy is going to become a major issue in the very near future. I just think we need to prepare ourselves for that shift and now is the best time!

mattur

8:27 pm on Jul 1, 2002 (gmt 0)

David Weinberger, in the latest Joho newsletter, points out that Linus Torvalds' homepage, amongst others, is invalid.

Should Linus' homepage be bumped down serps or should he drop all the other *unimportant* stuff he does and learn correct html?

Weinberger goes on to relate the following joke:

A man goes to a doctor. "Doc, it hurts when I go like this," he says, poking himself gently in the foot with his index finger. "It hurts when I go like this," he says, poking his knee. "It hurts when I go like this," he says as he pokes his thigh. He proceeds the same way up to the top of his head.

"I see," says the doctor. "You've got a broken finger." :)

[hyperorg.com...]

pageoneresults

8:34 pm on Jul 1, 2002 (gmt 0)

I have one question...

What is so difficult about writing valid html?

And on top of that, as the web shifts gears and moves towards xhtml, you won't have a choice if you plan on making the transition to xhtml.

mattur

9:44 pm on Jul 1, 2002 (gmt 0)

I thought the W3C set the standards first

Very funny. In html 3.2, the w3c added "widely deployed features such as tables, applets, text flow around images, superscripts and subscripts"

What is so difficult about writing valid html?

For those of us who understand valid html/know there is such a thing as valid html there is nothing difficult/clever about it.

For the other 99.9 percent of the world population... well that's another matter :)

I have no plans for moving to xhtml until there is a valid reason to do so. The only reason at the moment is to allow other folks to rip off your pages real easy... how often do you need to programmatically access the content of Google's homepage?
