...we used our standard evaluation system that we've developed, where we basically sent out documents to outside testers. Then we asked the raters questions like: "Would you be comfortable giving this site your credit card? Would you be comfortable giving medicine prescribed by this site to your kids?"
There was an engineer who came up with a rigorous set of questions, everything from "Do you consider this site to be authoritative? Would it be okay if this was in a magazine? Does this site have excessive ads?"
...we actually came up with a classifier to say, okay, IRS or Wikipedia or New York Times is over on this side, and the low-quality sites are over on this side. And you can really see mathematical reasons.
"...we actually came up with a classifier to say, okay, IRS or Wikipedia or New York Times is over on this side, and the low-quality sites are over on this side. And you can really see mathematical reasons..."
What would be the consequence if everyone knew Google's definition of quality?
That simply isn't true. It's post-Panda that the system is rewarding quantity over quality.
If too high an ad/content ratio is a problem, they could say so.
[edited by: TheMadScientist at 5:59 am (utc) on Mar 31, 2011]
One predictable consequence if Google were to tell us exactly how they are defining "quality" is that the incentive to crank out millions of pages to game Google's system would resume, with the minor proviso that everyone would make sure their pages are just barely over the minimum quality threshold.
Just my current operating theory of course...
Granting that's true, you seem to assume that this "minimum quality threshold" would be "just barely better than crappy." Why? Why couldn't Google release a threshold definition that's "pretty darn good"?
"If your message uses the word "v1agra" numerous times, it's probably going to get blocked."
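To make the spam-filter analogy concrete, here is a minimal sketch of a rule of that kind. The function names and the threshold of 3 are my own assumptions for illustration, not anything a real mail filter or Google has published.

```python
import re


def count_term(message: str, term: str) -> int:
    """Count case-insensitive whole-word occurrences of `term` in `message`."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return len(re.findall(pattern, message, flags=re.IGNORECASE))


def is_probably_blocked(message: str, term: str = "v1agra", max_hits: int = 3) -> bool:
    """Hypothetical rule: flag the message once `term` appears `max_hits` times or more."""
    return count_term(message, term) >= max_hits


print(is_probably_blocked("Buy v1agra now! V1AGRA, cheap v1agra."))
print(is_probably_blocked("Someone asked me what v1agra spam is."))
```

The point of the analogy survives in the code: once the rule and its threshold are public, a sender only has to stay one mention under `max_hits` to get through.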
If they wanted to, they could make that info available through Webmaster Tools.
Google says it is attempting to detect and downrank "low quality" pages/sites. They've said nothing about below-average or high quality.
...the key is, you also have your experience of the sorts of sites that are going to be adding value for users versus not adding value for users. And we actually came up with a classifier to say, okay, IRS or Wikipedia or New York Times is over on this side, and the low-quality sites are over on this side. And you can really see mathematical reasons...
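For anyone wondering what "over on this side" could mean mathematically, here is a toy sketch of a linear classifier (a plain perceptron) trained on rater-style answers. This is not Google's classifier: the three features, the sample numbers, and every name below are hypothetical, chosen only to show how a learned line can put trusted sites on one side and low-quality sites on the other.

```python
def train_perceptron(samples, max_epochs=100):
    """Learn weights w and bias b so that sign(w.x + b) matches each label."""
    n_features = len(samples[0][0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in samples:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # wrong side (or on the boundary): nudge toward y
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # every sample is on its correct side
            break
    return w, b


def side(w, b, x):
    """Return +1 for the 'high-quality' side of the boundary, -1 for the other."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Hypothetical features: share of raters who answered "yes" to
# (authoritative?, would give credit card?, excessive ads?)
SAMPLES = [
    ((0.9, 0.8, 0.1), 1),   # made-up "trusted" site
    ((0.8, 0.9, 0.2), 1),
    ((0.2, 0.1, 0.9), -1),  # made-up "low-quality" site
    ((0.3, 0.2, 0.8), -1),
]

W, B = train_perceptron(SAMPLES)
print(side(W, B, (0.85, 0.85, 0.15)))
```

With data like this, the learned weight on the "excessive ads" feature comes out negative and the other two positive, which is one way to read "you can really see mathematical reasons".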
Google has been using visual page simulation for a while - their "reasonable surfer" model leans on it, and it has modified the way PageRank is calculated.
Last year someone had a penalty reversed because an iframe generated a false positive for their "too much white space" metric. It was documented on Google's own forum, and JohnMu got involved to place a flag on the site, in case it ever triggered that penalty again.
It was on Webmaster Central - but all we know is that the iframe triggered a false positive.