|Insight into the leaked Google training manual|
| 5:26 pm on Nov 10, 2011 (gmt 0)|
Ladies and Gents,
Everybody read the leaked training manual, right? I mean, it's 125 pages of GOLDEN insight into the brain of how Google thinks... at least into how one department trains its employees. It is way more revealing than "In the Plex" (arguably an interesting read)... and the manual is FREE. It only costs a little moral equity... as it was apparently never supposed to have been leaked.
What is your insight? What did you think about the manual?
Here are a couple of my insights:
1. People always say Google $%&$s on the little guy... Yelp being one of those complainers... The manual talks about Yelp in a good light... many times over. Interesting. +1 for Google.
In another light:
1. Google uses the manual to train employees how to rate websites. The mere fact that they are training employees how to rate a website introduces bias into their data. They are essentially telling the raters which sites to rate highly and which sites to bury... meaning Google is "designing" the web itself. (This runs counter to the organic growth of the web that they have hawked in the past.)
More on the above point, with an example:
Google says there are three types of user intent:
1. Know (someone who wants to know something, think "SEO tips to game Google")
2. Do (someone who wants to do something, think "Buy SEO software")
3. Go (someone who wants to go somewhere, think "webmasterworld.com google seo news and discussion")
Then they show good examples of sites that answer each query according to its intent. The problem with this methodology for training raters and "looking" for better search results is this:
Google is training people NOT to find unique/inspiring results... they are training people to look for homogeneous results that SCREAM intent to the customer.
Think about it this way:
You are a trained rater from Google. You read a document on how to rate, with examples of good sites: Amazon, Yelp and various other arguably good sites. You are trained to analyze a query for the three types of user intent: go/do/know... then you are given query after query with the resulting pages. Over time your mind becomes numb to the sites... over time, with decision fatigue sinking in (incidentally, look up an article about decision fatigue... it's really interesting), you stop looking at what is inspiring about the website... you are looking for the AMAZON signals... you as a human rater are annoyed by design because it makes your job, figuring out what the site does, harder. You reward simplicity... you reward AMAZONIAN-style sites... because that is what Google is looking for... because you are trying to do a GOOD job and your mind (after hours of analyzing sites) cannot handle looking into the intricacies of otherwise artistically designed, interesting websites that offer the consumer an inspired experience.
You as a rater have just passed over the Mona Lisa and listed it as only "slightly relevant" or worse, "not relevant".... Because there isn't a bright neon sign above it pointing down saying: "Masterpiece"
And for all of you raters out there (I actually know a couple), I'm not trying to say that you don't know what a masterpiece looks like... I'm trying to say that Google has trained you NOT TO LOOK for them...
And to Google: I don't know that you could get around it... and you know what they say... indecision killed the possum... You've got to do what you have to do. It's just an interesting by-product of trying to scale, I guess. There should be another search engine (in fact, anybody who wants to start another search engine, PM me) that goes looking for the artistic sites that people actually DO want to spend some time EXPERIENCING... the kind that raters would not have the time, energy or training to look for!
What that means to me:
Start adding neon signs to the site... while keeping to our core of being different and interesting... for the consumers who care... the people who like our site design and functionality! Not for the people who are trying to figure out what we do, but for the people who, by virtue of our design, want to be a part of what we do!
| 5:55 pm on Nov 10, 2011 (gmt 0)|
|Google uses the manual to train employees how to rate websites. |
These clarifications came from Matt Cutts at the Monday night Pubcon networking event:
Webmasters tend to put a slightly skewed angle on this. The quality raters are actually rating a SERP (that is, a particular algo configuration) as a quality control measure for the algo team. Their ratings do not directly change rankings, but they help the algo team see whether or not the algo worked as planned.
Also, note that this document is not for the spam team. They also have a training document and use human quality raters - but that document has never been leaked.
| 6:53 pm on Nov 10, 2011 (gmt 0)|
So when they mark a site as spam this is ignored by the spam team? Yeah, right.
| 11:31 pm on Nov 10, 2011 (gmt 0)|
I think it's likely that Panda applies the raters' guidelines to the best of its limited intelligence. Remember all that BS about a site looking trustworthy, and whether you would take medical advice from it, etc., being part of Panda?
Explains a lot.
| 2:36 am on Nov 11, 2011 (gmt 0)|
Great observations, Lenny2 - pretty hard to refute.
However, we don't know exactly how they use the results. Hopefully they recognise the potential shortcomings of this method of evaluation before they put the data to use.
| 8:00 am on Nov 11, 2011 (gmt 0)|
Thanks for the post.
I always wonder why so little insider info leaks.
And why is it impossible to find (even on other engines!) posts from Google employees who have reasons to complain about their job or company?
| 5:39 pm on Nov 11, 2011 (gmt 0)|
superclown2: +1! Ignoring that data would be such a boneheaded waste of resources that even Google, awash in money, would not do it. In fact, I think that in any normal business the smaller spam team would feed right off the output of the much larger general raters' team. Plus, of course, they'd look at the high-profile cases and anything they come across on their own. But the bulk of their input has to come from the raters.
|So when they mark a site as spam this is ignored by the spam team? Yeah, right. |
MC's response is a typical corporate smokescreen. If he admitted that the raters' work is delivered to his team in an easily actionable form, he might lose some funding, headcount or both. Besides, if he said that they simply take the output of the rater sweatshops, his team would lose its aura of "knights in shining armor" guarding the Google castle from the existential threat of hordes of spammers, an aura MC is personally very fond of, as far as I can tell.
So he has every reason, personal and corporate, to keep saying that they fight this battle on their own. And we have every reason not to trust him on this one.
| 7:29 pm on Nov 11, 2011 (gmt 0)|
@Tedster, the argument that they only use the data to check their algorithmic results doesn't sit well with me. The reason is that no matter how you cut the cheese, they are using the data. The old quip "which came first, the chicken or the egg?" comes to mind.
Allow me to explain:
They have a preconceived vision of the web, which they train their staff in. The staff then go about their work qualifying websites according to this doctrine. Another set of engineers build an algorithm according to the same doctrine, or at least the same school of thought.
They then compare the results and, lo and behold, the machine matches up with what they trained their peeps to find.
So the question about the chicken and the egg comes up....
Personally, as a webmaster/site owner/business owner/taxpayer/employer/father/husband/son/friend/patron/whatever, it doesn't matter which came first. The point is that Google is not looking for the Mona Lisas; Google is looking for scalable solutions, and if I want to maintain any rankings in their system I have to conform. That's all.
BUT, actually I didn't want to necessarily espouse my own ideas... I was more interested in getting this conversation going about the manual... WHAT INSIGHTS DID YOU GUYS SEE IN THIS?
| 10:03 pm on Nov 11, 2011 (gmt 0)|
|They have a preconceived vision of the web, which they train their staff in. The staff then go about their work qualifying websites according to this doctrine. Another set of engineers build an algorithm according to the same doctrine, or at least the same school of thought. They then compare the results and, lo and behold, the machine matches up with what they trained their peeps to find. |
Here's my insight: The quote above is another way of saying that they are creating their own reality.
And it's working.
I've been saying here for years that Google is refashioning the web to suit themselves. One example is their attack on "trading links", which existed long before Google but is hands-off today because Google implied it could hurt you. Ditto for webrings ~ dead, in large part thanks to Google's displeasure with them. Ditto for directories, which are gone in part because the directory owners asked for a return text link or logo to be added to the recipient site. Now with Panda they are hitting affiliate sites. If you want to rank with the number 1 traffic-driving site in the world, it's their way or the highway, and with that control they are attempting to build the web they want ~ the one that suits their own interests ~ NOT the one that would otherwise come into being. There's no way to know whether that's a good thing or a bad thing because we don't know what we don't know ~ it just IS.
| 3:26 am on Nov 12, 2011 (gmt 0)|
Lenny2, this is how I felt too. Unfortunately, I am too far away from the pubcon world to raise these questions, if I were allowed to do so.
You already have all the insights into what they do, so what more are you looking for? :)
| 4:43 pm on Nov 12, 2011 (gmt 0)|
Superclown2... exactly. The EWOQ program has to be used for more than just ad quality ratings and testing new algos. EWOQ has to be VERY expensive, and there is no way Google's spam team disregards the data they collect.
EWOQ has classification systems for adult sites, hidden content, thin affiliates, vital sites (whether your site can 'own' a search term), scrapers, phony redirects, hidden text and more.
This stuff has to go on your permanent record. We know Google, for example, has an adult filter... if an EWOQ reviewer said a site about breast cancer wasn't #*$!, don't you think Google would exempt it from the adult filter?
Sure, Google will be using this to test new algos, but they are going to be killing more than just one bird with this stone.
My guess is that certain EWOQ findings are checked against each other... and if most reviewers agree... and the result is not a punishment (like a #*$! exemption or vital ranking), it goes on the site's record and factors into the SERPs. If it is a penalty, it probably gets forwarded upstream to a more advanced human review system run by the spam team, who probably confirm the findings and assign penalties based on what EWOQ found.
| 10:43 am on Nov 14, 2011 (gmt 0)|
I don’t buy this “Google are creating a preconceived vision of the web” by training staff to conform to a set of rules for evaluating sites.
An athletics coach, music teacher or university lecturer teaches their students by taking the best of what they have learned and passing it on. They help students avoid bad habits and promote evidence-based best practices. As a result, each generation of athletes, musicians and scientists is more capable than the one before, building on the knowledge and experience of previous generations.
So, of course there is some "decision engineering" going on, some gems will be missed and the spam team are bound to be involved somewhere.
However, given the significant developments in search engine technology over the past decade and Google’s continued dominance, I’d say that they are doing a ... reasonable... job of it so far (at least as far as most users are concerned), and they CAN’T afford to screw up by becoming a repressive regime, because the fickle user base will jump ship overnight (AltaVista, anyone?).
The inclusion of ranking factors such as social signals, freshness, the rate of growth of organic links, reduced advertising etc. gives me hope, as a user, that the web will continue to develop as a diverse, interesting and UN-censored source of information.
As a developer, however, who makes a comfortable income from the affiliate market space I’m not so happy. I think my days might be numbered. $#@&!
| 1:17 pm on Nov 14, 2011 (gmt 0)|
|So when they mark a site as spam this is ignored by the spam team? Yeah, right |
After a 30-minute (somewhat heated) conversation with Matt Cutts at Pubcon, I have a much better understanding. Google doesn't like extensive manual processes. These quality rater decisions are being used to evaluate the algo, not to generate manual spam penalties.
In many cases (according to Matt) they only see aggregate data and not the specific sites that were rated. So what the team is getting is more like "the algo let such a percentage of spam in for this kind of search".
Not only that, but this particular leaked document is NOT the training document for the spam team. It's the more general training document for overall SERP quality assessment. Matt says the spam team's training documents have never been leaked.