Forum Moderators: open
We all know the resources Google has at its disposal, and the resources soon to be coming its way. So why don't they make the web a standards-compliant place, with strong steps towards accessibility?
Here is how I see it: first get their own house in order (valid code, at least some level of accessibility), then start to reward webmasters with valid and accessible websites. The vast majority would do it in a flash. Of course people would moan, but so what? They moan when there is an update, they moan when their site is not #1; it would just be a little extra moaning :).
Come on Google, make the web a nice place to play. The same goes for you people over at Yahoo and you lot at MSN.
Cheers
200 million queries a day. Worst-case scenario: the W3C badge is 5K (in practice about 200 bytes for a small image, 5-6 bytes if text) => 1000 GB every day.
For $8/month accounts one can get about 50 GB of monthly bandwidth (on top of 2 GB storage and other facilities) => 20 accounts' worth of monthly bandwidth burned per day, 20 * $8 = $160/day. (Since G won't be using storage, if negotiated it will more likely be about $20 per day.) => 366 * 160 < $60,000 a year.
In practice, about $1000 a year? ;)
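The arithmetic above can be sanity-checked in a few lines. This is a sketch using the poster's own assumed figures (200 million queries a day, 5 KB per badge, $8 for 50 GB of monthly bandwidth), not measured values:

```python
# Back-of-envelope check of the figures above. All inputs are the
# poster's assumptions, not measured values.
QUERIES_PER_DAY = 200_000_000
BYTES_PER_BADGE = 5_000                    # worst case: 5 KB per badge

daily_gb = QUERIES_PER_DAY * BYTES_PER_BADGE / 1_000_000_000

GB_PER_8_DOLLAR_ACCOUNT = 50               # 50 GB/month for $8
accounts_burned_per_day = daily_gb / GB_PER_8_DOLLAR_ACCOUNT
daily_cost = accounts_burned_per_day * 8   # dollars per day
yearly_cost = 366 * daily_cost

print(daily_gb, daily_cost, yearly_cost)   # 1000.0 160.0 58560.0
```

So the raw numbers do come out to 1000 GB/day and under $60,000 a year, before any negotiated discount.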
The ODP has recently started listing external RSS feeds (and Atom too) in some categories. The RSS feed MUST be valid code, or it gets rejected by the editing tools.
The whole of the ODP should now also be 100% valid HTML 4.01 Transitional, and the whole lot now uses UTF-8 encoding throughout the 4 million site listings' URLs, titles, and descriptions (caveat: there are some known encoding errors in a few category descriptions and @links, and those are already being worked on).
Not arguing with your math, but are you multiplying that by 10 listings per page, plus, say, a "greyed out" one if a site is not compliant?
Then there is the issue of increased server load per query, etc., and, as someone else pointed out, the increased cost to Google of running every page through a validator.
If it only took 1/100th of a second to run each page through a validator, that (according to my calc.exe) would take about 1.4 years to validate every page G claims to have in the index.
And this is supposed to cost $1000 per year?
Ya know, I think they could better spend their time on other projects. ;)
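For what it's worth, the 1.4-year figure checks out as single-machine time. This sketch assumes the roughly 4.28 billion pages Google claimed to index at the time; spread across Google's many machines it would of course go much faster, which is the previous poster's point about hardware:

```python
# Rough check of the "1.4 years" figure above, assuming the
# ~4.28 billion pages Google claimed to index at the time.
PAGES_IN_INDEX = 4_280_000_000
SECONDS_PER_PAGE = 0.01        # 1/100th of a second per validation

total_seconds = PAGES_IN_INDEX * SECONDS_PER_PAGE
years = total_seconds / (365 * 24 * 3600)
print(round(years, 2))         # about 1.36 years on one machine
```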
In its serps Google can add a small W3C image below each compliant page.
Bah. Google removed the dmoz categories and descriptions from its listings because people weren't using them. Adding symbols that people won't use does not improve Google.
I've probably been pro-standards longer than any of you, but I'm not budging on this one: standards compliance is not a search-engine issue. Anything that boosts format over content (or even implies that a certain format is better than others) will have a negative effect on the quality of search results. Google won't do that.
The point I was trying to make was that it is not a question of money but of philosophy. Regarding computing time, I don't know how many computers Google has got (100,000+?) and therefore can't tell. Worst comes to worst, some company like IBM, Dell, or HP must be dying to provide all the hardware and bandwidth just to get a brief mention somewhere on the Google site.
>Adding symbols that people won't use does not improve Google.
Personally I don't care much about the compliance of others' sites. I do it for my new sites because I don't want browsers to do something different from what I had intended. A W3C symbol might be quite useless for me, but symbols showing whether a site has any popups, its time to load (as in the Alexa display), whether it plays any weird music, and so on could be helpful to a surfer like me, whose Windows OS has crashed far too many times. For example, Google displays a PDF label next to pdf files, and that is very much welcome by me.
Providing certain information can help surfers.
a way for G to make some $ from this cleanup op
Here's one way.
Alter the FAQ for webmasters to say something like:
"One factor in influencing positioning in SERPS is the well-formedness of a page's HTML. For tools that can produce or validate HTML and are approved by Google click [here] and [here] or [here]"
Google could then:
Once webmasters believe that well-formed HTML can affect SERPS positioning, the stampede for tools or consultancy to fix HTML will start. And there's money to be made all round there.
Then they do the same with PDF, Flash and other formats too.
I wouldn't mind seeing something right next to the PageRank on the Google Toolbar that tells if the page in the browser window validates. It would make editing a lot easier as you could view the page and see if it validates at the same time.
Nice idea, but again it's not really something that should be Google's responsibility. If all of you standards-junkies want something like this, you should get the W3C to make you a toolbar, not Google. The fact that this was even suggested (not the Toolbar thing, I mean the issue of Google giving a boost to validated sites) shows a lack of understanding of what users want out of a search engine. People expect RESULTS - if you can see it in your browser and the content matches what you were looking for, the search engine did its job. It really doesn't freaking matter whether the site you're looking at validates or not as long as you're getting what you want out of it.
Any search engine that implemented what was suggested would be doing a disservice to its searchers.
-- Check for broken links and correct HTML
could be expanded with a couple of dozen words about code validation.
Rewording required in:
-- Make sure that your TITLE and ALT tags are descriptive and accurate
Ugh. It should be: Make sure that your TITLE tag and ALT attributes are descriptive and accurate.
I didn't see the meta description mentioned there either.
Fatal Error: No DOCTYPE specified! That's the message w3.org gave on a serp page. I think that part of the problem is that when running a database-driven site, the programming weighs in heavier than anything else. I've got a simple Perl site, and often my pages don't validate. I worry more about getting the content right and checking it against a few popular browsers to make sure it looks all right. If I ran a fast and complex search engine, I think I'd be preoccupied with things like trying to filter out million-page-plus spam hubs without knocking out the legitimate sites.
In case of PDF files Google warns us by putting a PDF label next to the title and adding a line about file format. Should Google remove this because it is not Google's job?
PDF files are a different type of content from HTML files, and they aren't supported by standard browsers, so it makes sense for Google to identify them as PDF files.
For Google, the question is likely to be "What does the user want to know?" It's reasonable for Google to assume that users would like to know if a link points to a PDF file that requires a separate reader; it's less reasonable for Google to assume that users care about whether a page's HTML code passes a validator check.
Interesting point. If Google itself isn't all that concerned whether their pages validate at w3.org, it would be hypocritical for them to consider that relevant for other sites.
On the other hand, when the standard just documents common usage, it tends to be a good standard.
The problem with the web standards is that they tend to deprecate what is still in common usage.
They can decide all they want that <font>, <b> and <i> are bad, but they are all easier to use than what replaced them. And every browser handles them properly, and they always will.
The standard that is defined by common usage is that these tags really are still a part of HTML 4.
Now, if google could tell if a page was really broken, then it would be a good thing to ding it.
Now, if google could tell if a page was really broken, then it would be a good thing to ding it.
The only bad part would be that the 12 people who spend time hand-fixing their FrontPage code would be penalized as well...
-- Rich (ducking back into the bushes)
OK then, let's not mark up pages that validate with a "validated" marker, but instead flag up those that have either a gross number of errors, or major HTML nesting errors, missing tags, or malformed tags instead.
And all the WYSIWYG alternatives are orders of magnitude better -- which doesn't make them all that good. I haven't seen any tool that provided good clean XHTML output (other than a text editor driven by a human brain).
I'm a fanatic about standards: they are the most powerful weapon users have against the IT monopoly. So, obviously, Microsoft hates and fears them: they have their standard "embrace and extinguish" technique -- which won't work if USERS demand actual compliance -- and their astroturfers trash them at every chance. Any proposal for enforcing web standards has to deal with the real problem.
Standards are a POTENTIAL good for users -- every time they have to upgrade their system, good standard compliance will save them 50-80% of upgrade costs. Standards are a POTENTIAL good for competitors -- they can compete for the business of the system-upgrading customers. Standards are BAD for the monopoly -- they represent a loss of control of locked-in customers (and so you see the Microsoft shills whining about "standards controlling programmers" -- no, but they take away SOME programmers' ability to control users: just like the Bill of Rights "controls" the U.S. government -- by preventing its assumption of certain powers over the people.)
The proposal, while no doubt heartfelt, was braindead on arrival. So far, Microsoft has won all the battles against web standards, and it only cost them between two and three billion dollars in penalties, and about the same in broken-software development costs -- pocket change, compared with the profit potential of the perpetual enslavement of humanity. Google does not have the power to fight that battle, and it would distract them from their real mission to fight spam.
I'm a fanatic about standards: they are the most powerful weapon users have against the IT monopoly.
Oh, I like real standards. I just don't like standards by fiat.
If you think about how most of the internet standards work, and compare it to w3c standards, you might notice some irony.
All the protocols are pretty much covered by the RFCs, or Requests for Comments. They are not a "standard"; they are usually something that has already been implemented, and the documentation by the author of what he did. Then you either meet RFC****x or you do not. There is no enforcement other than interoperability.
If you look at standards like SCSI and ATAPI, they really do work the same way as RFCs. They document the way that things already interoperate. Commands were only deprecated when they fell out of common usage, or created a real incompatibility. And these are groups where all the players with a say had seats on the committees.
On the other hand, the web "standards" deprecate things that are in common usage. In fact they are in more common usage than their replacements. And only an infinitesimal minority of the players are on the committee. HTML is not like SMTP or SCSI, where only a select few ever delve into those protocols. HTML is of interest to all producers of web pages.
Unless all those content producers sign on to the standards, they are simply not standards. They are simply an RFC that everyone ignores. If the code works in all the major browsers, it is standards compliant.
Like some of the others said, validating your code is mostly a good way to make sure that your page will work on the different browsers out there.
I'm a fanatic about standards
In that case, you're welcome to be fanatical in adhering to them, but why should Google let fanatics influence the quality of its SERPs?
I also think it is a bit elitist of people to enforce it when most pages show perfectly well on the great majority of people's browsers, but are not up to standard. And yes, I even have a couple of </html> instances in every page.
OK then, let's not mark up pages that validate with a "validated" marker, but instead flag up those that have either a gross number of errors, or major HTML nesting errors, missing tags, or malformed tags instead.
This is no better than the original suggestion ... the simple fact is that when you're looking for something on the web IT DOESN'T MATTER whether the code is "excellent" or "sloppy" by any standards at all, as long as the page you were taken to contains what you were looking for.
Google, or any search engine for that matter, shouldn't make any comments at all about the "condition" of the code behind a site. It's CONTENT we're looking for when we do a search, not the flavor of HTML the designer used.
Of all of the things out there that COULD make the web better, having a major search engine rank sites based on a standard less than 50% of web designers agree with is definitely not one of them.