Forum Moderators: open
We all know the resources Google has at its disposal, and the resources soon to be coming its way. So why don't they make the web a standards-compliant place, with strong steps towards accessibility?
Here is how I see it: first get their own house in order (valid code, at least some level of accessibility), then start to reward webmasters with valid and accessible websites. The vast majority would do it in a flash. Of course people would moan, but so what? They moan when there is an update, they moan when their site is not #1; it would just be a little extra moaning :).
Come on Google, make the web a nice place to play. The same goes for you people over at Yahoo and you lot at MSN.
Cheers
200 million queries a day. Worst-case scenario: the W3C badge is 5K (in practice about 200 bytes for a small image, 5-6 bytes if text) => 1000 GB every day.
For $8/month accounts one can get about 50 GB of monthly bandwidth (on top of 2 GB storage and other facilities) => 20 accounts' worth of monthly bandwidth burned per day, 20 * $8 = $160/day. (Since G won't be using storage, if negotiated it will more likely be about $20 per day.) => 366 * 160 < $60,000 a year.
In practice, about $1000 a year? ;)
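The arithmetic above can be sanity-checked in a few lines. This is a sketch using the poster's own assumed figures (200 million queries a day, 5 KB per badge, $8 for 50 GB of monthly bandwidth), not measured values:

```python
# Back-of-envelope check of the figures above. All inputs are the
# poster's assumptions, not measured values.
QUERIES_PER_DAY = 200_000_000
BYTES_PER_BADGE = 5_000                    # worst case: 5 KB per badge

daily_gb = QUERIES_PER_DAY * BYTES_PER_BADGE / 1_000_000_000

GB_PER_8_DOLLAR_ACCOUNT = 50               # 50 GB/month for $8
accounts_burned_per_day = daily_gb / GB_PER_8_DOLLAR_ACCOUNT
daily_cost = accounts_burned_per_day * 8   # dollars per day
yearly_cost = 366 * daily_cost

print(daily_gb, daily_cost, yearly_cost)   # 1000.0 160.0 58560.0
```

So the raw numbers do come out to 1000 GB/day and under $60,000 a year, before any negotiated discount.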
The ODP has recently started listing external RSS feeds (and Atom too) in some categories. The RSS feed MUST be valid code, or it gets rejected by the editing tools.
The whole of the ODP should now also be 100% valid HTML 4.01 Transitional, and the whole lot now uses UTF-8 encoding throughout the 4 million site listings' URLs, titles, and descriptions (caveat: there are some known encoding errors in a few category descriptions and @links, and those are already being worked on).
Not arguing with your math, but are you multiplying that by 10 listings per page, plus, say, a "greyed out" one if a site is not compliant?
Then there is the issue of increased server load per query, etc., and, as someone else pointed out, the increased cost to Google of running every page through a validator.
If it only took 1/100th of a second to run each page through a validator, that (according to my calc.exe) would take about 1.4 years to validate every page G claims to have in the index.
And this is supposed to cost $1000 per year?
Ya know, I think they could better spend their time on other projects. ;)
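For what it's worth, the 1.4-year figure checks out as single-machine time. This sketch assumes the roughly 4.28 billion pages Google claimed to index at the time; spread across Google's many machines it would of course go much faster, which is the previous poster's point about hardware:

```python
# Rough check of the "1.4 years" figure above, assuming the
# ~4.28 billion pages Google claimed to index at the time.
PAGES_IN_INDEX = 4_280_000_000
SECONDS_PER_PAGE = 0.01        # 1/100th of a second per validation

total_seconds = PAGES_IN_INDEX * SECONDS_PER_PAGE
years = total_seconds / (365 * 24 * 3600)
print(round(years, 2))         # about 1.36 years on one machine
```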
In its serps Google can add a small W3C image below each compliant page.
Bah. Google removed the dmoz categories and descriptions from its listings because people weren't using them. Adding symbols that people won't use does not improve Google.
I've probably been pro-standards longer than any of you, but I'm not budging on this one: standards compliance is not a search-engine issue. Anything that boosts format over content (or even implies that a certain format is better than others) will have a negative effect on the quality of search results. Google won't do that.
The point I was trying to make was that it is not a question of money but of philosophy. Regarding computing time, I don't know how many computers Google has got (100,000+?) and therefore can't tell. Worst comes to worst, some company like IBM, Dell, or HP must be dying to provide all the hardware and bandwidth just to get a brief mention somewhere on the Google site.
>Adding symbols that people won't use does not improve Google.
Personally I don't care much about the compliance of others' sites. I do it for my new sites because I don't want browsers to do something different from what I had intended. A W3C symbol might be quite useless for me, but symbols showing whether a site has any popups, its time to load (as in the Alexa display), whether it plays any weird music, and so on could be helpful to a surfer like me, whose Windows OS has crashed far too many times. For example, Google displays a PDF label next to pdf files, and that is very much welcome by me.
Providing certain information can help surfers.
a way for G to make some $ from this cleanup op
Here's one way.
Alter the FAQ for webmasters to say something like:
"One factor in influencing positioning in SERPS is the well-formedness of a page's HTML. For tools that can produce or validate HTML and are approved by Google click [here] and [here] or [here]"
Google could then:
Once webmasters believe that well-formed HTML can affect SERPS positioning, the stampede for tools or consultancy to fix HTML will start. And there's money to be made all round there.
Then they do the same with PDF, Flash and other formats too.
I wouldn't mind seeing something right next to the PageRank on the Google Toolbar that tells if the page in the browser window validates. It would make editing a lot easier as you could view the page and see if it validates at the same time.
Nice idea, but again it's not really something that should be Google's responsibility. If all of you standards-junkies want something like this, you should get the W3C to make you a toolbar, not Google. The fact that this was even suggested (not the Toolbar thing, I mean the issue of Google giving a boost to validated sites) shows a lack of understanding of what users want out of a search engine. People expect RESULTS - if you can see it in your browser and the content matches what you were looking for, the search engine did its job. It really doesn't freaking matter whether the site you're looking at validates or not as long as you're getting what you want out of it.
Any search engine that implemented what was suggested would be doing a disservice to its searchers.
-- Check for broken links and correct HTML
could be expanded with a couple of dozen words about code validation.
Rewording required in:
-- Make sure that your TITLE and ALT tags are descriptive and accurate
Ugh. It should be: Make sure that your TITLE tag and ALT attributes are descriptive and accurate.
I didn't see the meta description mentioned there either.
Fatal Error: No DOCTYPE specified! That's the message w3.org gave on a serp page. I think that part of the problem is that when running a database-driven site, the programming weighs in heavier than anything else. I've got a simple Perl site, and often my pages don't validate. I worry more about getting the content right and checking it against a few popular browsers to make sure it looks all right. If I ran a fast and complex search engine, I think I'd be preoccupied with things like trying to filter out million-page-plus spam hubs without knocking out the legitimate sites.
In case of PDF files Google warns us by putting a PDF label next to the title and adding a line about file format. Should Google remove this because it is not Google's job?
PDF files are a different type of content from HTML files, and they aren't supported by standard browsers, so it makes sense for Google to identify them as PDF files.
For Google, the question is likely to be "What does the user want to know?" It's reasonable for Google to assume that users would like to know if a link points to a PDF file that requires a separate reader; it's less reasonable for Google to assume that users care about whether a page's HTML code passes a validator check.
Interesting point. If Google itself isn't all that concerned whether their pages validate at w3.org, it would be hypocritical for them to consider that relevant for other sites.
On the other hand, when the standard just documents common usage, it tends to be a good standard.
The problem with the web standards is that they tend to deprecate what is still in common usage.
They can decide all they want that <font>, <b> and <i> are bad, but they are all easier to use than what replaced them. And every browser handles them properly, and they always will.
The standard that is defined by common usage is that these tags really are still a part of HTML 4.
Now, if google could tell if a page was really broken, then it would be a good thing to ding it.
Now, if google could tell if a page was really broken, then it would be a good thing to ding it.
The only bad part would be that the 12 people who spend time hand-fixing their FrontPage code would be penalized as well...
-- Rich (ducking back into the bushes)
OK then, let's not mark up pages that validate with a "validated" marker, but instead flag up those that have either a gross number of errors, or major HTML nesting errors, missing tags, or malformed tags instead.
And all the WYSIWYG alternatives are orders of magnitude better -- which doesn't make them all that good. I haven't seen any tool that provided good clean XHTML output (other than a text editor driven by a human brain).
I'm a fanatic about standards: they are the most powerful weapon users have against the IT monopoly. So, obviously, Microsoft hates and fears them: they have their standard "embrace and extinguish" technique -- which won't work if USERS demand actual compliance -- and their astroturfers trash them at every chance. Any proposal for enforcing web standards has to deal with the real problem.
Standards are a POTENTIAL good for users -- every time they have to upgrade their system, good standard compliance will save them 50-80% of upgrade costs. Standards are a POTENTIAL good for competitors -- they can compete for the business of the system-upgrading customers. Standards are BAD for the monopoly -- they represent a loss of control of locked-in customers (and so you see the Microsoft shills whining about "standards controlling programmers" -- no, but they take away SOME programmers' ability to control users: just like the Bill of Rights "controls" the U.S. government -- by preventing its assumption of certain powers over the people.)
The proposal, while no doubt heartfelt, was braindead on arrival. So far, Microsoft has won all the battles against web standards, and it only cost them between two and three billion dollars in penalties, and about the same in broken-software development costs -- pocket change, compared with the profit potential of the perpetual enslavement of humanity. Google does not have the power to fight that battle, and it would distract them from their real mission to fight spam.
I'm a fanatic about standards: they are the most powerful weapon users have against the IT monopoly.
Oh, I like real standards. I just don't like standards by fiat.
If you think about how most of the internet standards work, and compare it to w3c standards, you might notice some irony.
All the protocols are pretty much covered by the RFCs, or Requests for Comments. They are not a "standard"; they are usually something that has already been implemented, and the documentation by the author of what he did. Then you either meet RFC****x or you do not. There is no enforcement other than interoperability.
If you look at standards like SCSI and ATAPI, they really do work the same way as RFCs. They document the way that things already interoperate. Commands were only deprecated when they fell out of common usage, or created a real incompatibility. And these are groups where all the players with a say had seats on the committees.
On the other hand, the web "standards" deprecate things that are in common usage. In fact they are in more common usage than their replacements. And only an infinitesimal minority of the players are on the committee. HTML is not like SMTP or SCSI, where only a select few ever delve into those protocols. HTML is of interest to all producers of web pages.
Unless all those content producers sign on to the standards, they are simply not standards. They are simply an RFC that everyone ignores. If the code works in all the major browsers, it is standards compliant.
Like some of the others said, validating your code is mostly a good way to make sure that your page will work on the different browsers out there.
I'm a fanatic about standards
In that case, you're welcome to be fanatical in adhering to them, but why should Google let fanatics influence the quality of its SERPs?
I also think it is a bit elitist of people to enforce it when most pages show perfectly well on the great majority of people's browsers, but are not up to standard. And yes, I even have a couple of </html> instances in every page.
OK then, let's not mark up pages that validate with a "validated" marker, but instead flag up those that have either a gross number of errors, or major HTML nesting errors, missing tags, or malformed tags instead.
This is no better than the original suggestion ... the simple fact is that when you're looking for something on the web IT DOESN'T MATTER whether the code is "excellent" or "sloppy" by any standards at all, as long as the page you were taken to contains what you were looking for.
Google, or any search engine for that matter, shouldn't make any comments at all about the "condition" of the code behind a site. It's CONTENT we're looking for when we do a search, not the flavor of HTML the designer used.
Of all of the things out there that COULD make the web better, having a major search engine rank sites based on a standard less than 50% of web designers agree with is definitely not one of them.