Forum Moderators: Robert Charlton & goodroi
According to some [slashdot.org] sites [searchbistro.com],
It's one of the best kept secrets of Google. It's a mystery on Webmasterworld. Also in Europe (France) they don't know what to expect from that odd URL [eval.google.com....] Click it and you get ...nothing. The site reveals itself only if you have the proper login and if you use a network known by Google. Residues of Eval.google are found on the web, but the full content of the mystery site has never been published before. Here it is: the real story about Eval.Google. They use... humans!
The site claims it is some kind of secret Google evaluation lab!
(My observations are copyright free and for good reason: they're just an opinion with no intellectual content.)
I sincerely appreciate it if you've stopped posting documents and taken out the employee's name. Let me tackle the last question you asked:
Please explain. I saw in Eval several duo-lists based on the same search terms. Most duo-lists show a different order of answers than the other list. The raters were asked to choose which answers were the best. If this is not filtering, what is it then? I have many other examples.
Think of it like a taste test. If a drink maker had an idea for how to tweak their formula, they might have one version with more vitamin C, or another version with more sugar. It's natural to ask testers for their feedback. But you wouldn't say that the taste testers were directly changing the formula that was sold in stores. It's even less directly tied with search results: if there's a slight preference for one type of scoring, but it takes 100x the computing power for that ranking, it may well be a better choice to use that 100x computing power for a different task that improves quality more.
I think it's absolutely a great idea to collect feedback/quality ratings about different types of algorithms. But hopefully the analogy of a taste test shows that we may collect feedback without it actively altering our search results.
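To make the taste-test analogy concrete, here is a minimal sketch of how side-by-side rater votes might be tallied and then weighed against computing cost. It is purely illustrative: the function names, the vote labels ("A", "B", "tie"), and both thresholds are assumptions made up for this example, not anything taken from Google's documents.

```python
from collections import Counter

def preference_share(votes):
    """Fraction of decided (non-tie) votes that prefer ranking variant 'B'."""
    counts = Counter(votes)
    decided = counts["A"] + counts["B"]
    if decided == 0:
        return 0.5  # no decided votes: treat as no preference either way
    return counts["B"] / decided

def worth_shipping(votes, cost_ratio, min_share=0.55, max_cost_ratio=2.0):
    """Hypothetical rule: adopt variant 'B' only if raters clearly prefer it
    AND it does not cost too much extra computing power.

    cost_ratio is the compute cost of 'B' relative to 'A' (1.0 = same cost).
    Both thresholds are invented for illustration.
    """
    return preference_share(votes) >= min_share and cost_ratio <= max_cost_ratio

# Raters compare two result lists for the same query and pick the better one.
votes = ["A", "B", "B", "tie", "B", "A"]
share = preference_share(votes)
cheap = worth_shipping(votes, cost_ratio=1.0)
expensive = worth_shipping(votes, cost_ratio=100.0)
print(share, cheap, expensive)
```

In this sketch the raters only supply a preference signal. Whether anything changes is a separate decision, and a variant that needs 100x the computing power is rejected even when raters slightly prefer it.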
Whatever row you are having with someone (which I don't understand, or indeed care about), this thread does suggest that Google uses human input.
It may not be direct human input, but it exists, and it has always been denied.
Has Google been untruthful about this in the past?
When I was a little kid we used to have to test drinks at our school sometimes :)
We had to say which one tasted nicer (thing is we always used to say that they all tasted nice - that way you got to finish the drink)
Wish we got paid $10-$20 though.
JulianDay
Did you really think that Google had reached the stage of no human input? We have not reached the stage of I, Robot yet, I think.
It certainly sounds wrong for someone to take Google's "secret" documents and put them where they are visible to others.
As it happens, I have found stuff in the Google cache that people realized too late was secret and removed from the web. I am sure it has happened at least ten times that someone had information that they decided to hide, and was either unaware of the Google cache or didn't know what to do about it, and I then got the info that I needed.
It seems that there is at least a bit of "wringing of hands" here that is undeserved until Google gets rid of the cache. (Of course, I would hate to see it go.)
No disrespect intended, it just seems like very similar things, with similar arguments for and against. (Most websites do say copyright on them.)
I appreciate that individual humans may not have altered particular serps.
But there is now no doubt that there has been human input into serps.
I am not saying this is a bad thing. It may well be a good thing.
But Google has, for as long as I can remember, always denied any human intervention in the serps.
I hope that sums it up.
[edited by: JulianDay at 9:20 pm (utc) on June 6, 2005]
Seriously, the whole point is that Google uses human input. Only last week I got a quote from Google that they can't help it that secret documents are spidered, since machines are doing the job.
[edited by: voelspriet at 9:24 pm (utc) on June 6, 2005]
But hopefully the analogy of a taste test shows that we may collect feedback without it actively altering our search results.
Analogies only work for the person that uses them.
If it was just a taste test you only need two buttons, "yucky" and "yummy", to rate serps. What's the deal with rating individual pages?
[edited by: TypicalSurfer at 9:31 pm (utc) on June 6, 2005]
If, as you imply, there is manual manipulation of the results (versus algorithmic manipulation) then I agree with you wholeheartedly. However, nothing in the provided documentation (or dialogue) shows conclusively that that is the case.
PS: (A)
No it isn't. The source code of Google's algo is, not a memo on what is spam or not. If Coca Cola has a memo on how to spot chipped bottles (using common sense info available elsewhere), that is not a trade secret, no matter how much they say it is. Their formula is though, unless...blah blah blah.
>> If there is no legitimate news reason, and the journalist had reason to believe that it was still trade secret, they can be held liable for damages. <<
Here's how I'd justify it: "Google claims that math solves all problems and that robots do it better; this memo shows different."
Of course it's not entirely true, but true enough to get First Amendment protection in the USA. To say this is not newsworthy is dishonest. Now if he had stolen it, or hacked Google to get it, it would be totally different.
However, if the results from testing are used to modify the algorithm to return better results
Sounds good, doesn't it? But what are the criteria for evaluation and what URLs are chosen? Why is Kelkoo on Google's whitelist and some of the competitors not? Why is children.pr-e-g-n-a-n-c-y-p-a-m-p-e-r-s.co.uk according to Google a 'sneaky redirect' and www.film.com is not? These kinds of questions interest me...
[edited by: Brett_Tabke at 10:16 pm (utc) on June 6, 2005]
[edit reason] obfuscated link [/edit]
Was there ever any doubt?
- Google has used DMOZ directory data for over 5 years - using the opinion of thousands of editors to back up their algorithmic ranking.
- Whenever Google has made an algo change, they've used JS to track visitors' clicks - aka human feedback.
- The patent filing that was released specifically mentions "TrustRank" and how it was to be achieved (through human feedback).
- Obviously, the search engineers check sets of search results to ensure that the end results of changes are appropriate.
--
What is very apparent is that these "hub raters" do not tank individual sites - they rate serps and google uses human feedback to tweak their algorithmic results.
This has been going on for a long time, and claiming that the hub raters caused a relatively new site to "disappear" from the serps is nothing more than an odd combination of paranoia and hubris.
If your site can't pass human inspection and is ranking, it is going to get removed from the Google serps eventually - either through an update, or human intervention instigated by a spam report, or a keyword spot check.
Lesson: make your sites unique and beneficial for the end user - nothing shocking about it.
[edited by: PatrickDeese at 9:35 pm (utc) on June 6, 2005]
Nevermind, I for one welcome... uhm... For a long time I've been thinking that a higher degree of human involvement in web page rating was indeed necessary. Not just "nice". Humans simply perceive things in another way than bots (if one can indeed speak about bot perception).
>> Why is Example (1) on Google's whitelist
I haven't seen said list, nor evidence (2) that it exists, but if it does, that would definitely not be the way to go. A "spammer" is a "spammer", regardless of site name or URL. There is more than enough confusion about what "spam" is and isn't without such a list confusing matters even more.
Webmasters routinely become confused, misinformed, and scared, even paralyzed for fear of doing the wrong things. This does not benefit the www, nor the webmasters, nor the search engines. Openness, clarity, and firm clear-cut principles that are valid for everyone (and are either enforced across the board or not at all) are the only way.
If it isn't "spam" when "Company X" does it, but "spam" when "Company Y" does it - then what is it, exactly? And with such an attitude, how are we (webmasters and advisors) going to take anything on this matter seriously, ever?
---
1) voelspriet, you might want to hold back on direct references to external sites due to board TOS
2) I haven't read the full thread, or the /. story, or the blog, sorry about that. Too little time...
[edited by: claus at 9:51 pm (utc) on June 6, 2005]
SE algos look for good search results (the input and output to this algo is complex. <snip>
The algo is created by humans <snip>.
[edited by: lawman at 10:46 pm (utc) on June 6, 2005]
[edited by: Dayo_UK at 10:42 pm (utc) on June 6, 2005]
>what I read at Henk van Ess's Search Bistro, I reckon, will bring a lot of winds of war in the marketing industry (i.e. nuking affiliate sites that have the codes and URLs of several well-known affiliate marketing companies). That will also do economic damage to the 90% of all the WebmasterWorld members that run affiliate programs. <
I don't think that Google is waging a general war against affiliate program marketing, and there are no indications that Google is doing so.
Google is just trying to clean up the serps of affiliate link farms, banner farms and affiliate pages with no value added.
Affiliate program marketing pages with good content are still ranking well on the serps.
So please go ahead and join serious affiliate programs and just remember to create valuable PRE-SELL pages. Good luck.
These are things that are years old - so how can a Rater saying this site has x amount of hidden text or this site has x number of affiliate links help to improve the algorithm?
It can't. If the algorithm could handle these things, or G wanted to base ranking factors on the use of them, then it would already - so the only purpose of these reviews is to highlight individual cases. Which to me means manual reviewing of the SERPS to remove the "Spam".
I can appreciate the irritation that GG has about this info being made public and I think the fact that he is, or appears, so irritated also speaks for itself.
As I read the guide, a rater can improve the algo by giving input that goes something like "even though this page is 99% aff links, it also has 1% OTHER content that I as a human find helpful, and therefore this page should not be flushed."
I mean, how could any group of SE geeks EVER succeed in writing an algo that took into consideration ALL potentially "useful content?" How would an algo know what was "useful" unless a human told it?
As a perfect example, I've been looking at those *otelguide.net pages given as a no-spam example since 1998 because they are my competitor, and I NEVER saw that "video" link the guide pointed to as an example of useful content.
So what is the motive - mainstream reporting of news? Nah, we all know it's not really news. Personally, I figured they used humans to evaluate results. That's quality control 101. So why would voelspriet post this information?
My guess - traffic. He realized that he could make a good buck by posting the information. But what about the risk of Google lawyers? Ignorance is a pretty shaky defense.
Voelspriet is probably betting that Google wants to keep this quiet. After all, the mainstream public has no idea how a search engine really works. Most people would come away shaking their heads, coming to the wrong conclusion - Google is really just a bunch of people sitting in a room looking at results.
I find it funnier that people don't read, or understand the entire post before picking a word or two and slamming it.
Too bad we weren't supposed to see it. That's not how it works, at least in the USA. We weren't supposed to know of the Watergate break-in, the Enron tape recordings, Pentagon papers, Nixon tapes, MCI e-mails, and thousands of other (relevant or not) internal memos from other companies.
It's even funnier how someone tries to compare illegal activities of companies with trade secrets. Maybe I just misunderstand your point with this post.