| This 201 message thread spans 7 pages: < < 201 ( 1 2  4 5 6 7 ) > > || |
|eval.google.com - Google's Secret Evaluation Lab..|
"Rater Hub Google" Rumours?
On Apr 19, 2003, some members had spotted referrals from
followed by a question number and a couple of different email addresses. You can read earlier threads here,
According to some [slashdot.org] sites [searchbistro.com],
|It's one of the best kept secrets of Google. It's a mystery on Webmasterworld. Also in Europe (France) they don't know what to expect from that odd URL [eval.google.com....] Click it and you get ...nothing. The site reveals itself only if you have the proper login and if you use a network known by Google. Residues of Eval.google are found on the web, but the full content of the mystery site has never been published before. Here it is: the real story about Eval.Google. They use... humans! |
The site claims it is some kind of secret Google evaluation lab!
|Liane, it's not mr Hess (ouch, that's a German nazi), nor mr. Ess, but Henk van Ess. |
My sincere apologies ... that really was a boner on my part!
|Google sure knew what to expect in advance. ... |
Regardless of what private communications you may or may not have had with Google employees (which by the way you are not supposed to share with us on WebmasterWorld - please see TOS), I am certain they did not tell you that it was just fine to go ahead and display their form on your blog. As stated before by Google Guy the form is reportedly marked "Google Proprietary and Confidential"
How can you justify this action by claiming you weren't aware of any restrictions or stating that Google knew what to expect? Your arguments don't make any sense. Did you tell them you planned to display a screen shot of their form? I don't see that in your communications!
Someone asks "how high" and you answer ... "yellow".
[edited by: Liane at 3:24 pm (utc) on June 6, 2005]
also - please notice Henk's join date...older than dirt. Thanks for sharing Henk.
I don't quite understand some people's outrage that it was confidential information and shouldn't have been revealed.
Say someone found the "secret ingredient" of ranking in Google (which was confidential information) and posted it on a forum. Readers of that forum could then use that information to rank highly. Would you close your eyes and not read it?
This is a forum for webmasters who want to rank in search engines. Any info is welcome to me. Google didn't want me to know about it? Too bad for them!
The way this thread is going we'll soon get to "unethical seo" and the thread will die off.
As for this being new stuff - a bullet point list of things I have to do on my site to be "good" is a little different from knowing "content is king".
If people want to get mad at someone for spilling the beans, they should get mad at the student who violated the NDA--not at Mr. Van Ess, who was just doing what a good reporter can be expected to do unless releasing a leaked document goes against the public interest or is likely to hurt someone.
I'm less interested in debating Mr. Van Ess's journalistic ethics than in the substance of the report.
I agree with the last post: focus on the content, not on me.
Just noticed a response of a webmaster from the US who wrote a freeware affiliate link protector which uses a 100% frame to render an affiliate site.
He developed the tool specifically for assisting people to prevent affiliate ID hijacking. It had nothing to do with manipulating Google. The spamguide of Google doesn't mention that there are legitimate uses for a 100% frame...
One comment and a couple of questions for the group:
First, it seems to me that it makes sense that Google would have some human editorial / QA process. No SE technology is perfect. It seems like most of the buzz behind this is in response to seeing the details – the UI, the SPAM guide, etc.
Secondly, speaking of the SPAM guide, I’m curious what other SEMs / SEO’s think of the SPAM guide posted on searchbistro, (presumably a Google doc.) For the sake of argument, let’s presume that it’s genuine and these guidelines represent Google’s view of SE SPAM. Most, if not all of it is not really news per se. I’m curious if anyone on SEW board thinks this may feed back into the SEO world in the form of a publication standard or ‘legitimized’ SEO technique?
Lastly, I thought it was interesting that they addressed domain squatters and secondary search results / PPC pages in this guide as well. It seems like these pages become more popular as companies start to gobble up expired domains. Does anyone think that the publication of this guideline may influence this industry in any significant way? From a QA standpoint, does anyone think that some of the criteria they outline in their “secondary search results” section could be used to identify certain blogger’s pages as “offensive” if they depend on too many RSS feeds (non-unique content) and/or PPC ads?
Thanks for the feedback.
I love the idea that Google is manually reviewing their engine. It's just one more way that Google is protecting the quality of their search results; the more filters on bad content/spam the better. They should open their whole engine up to a world of raters, like the netnose search engine did a few years back. I'm not sure if the engine is in use anymore, because they did not have many pages indexed, but essentially what they did was allow anyone to rate keywords against web sites. The sites would come up randomly, so the rater could not choose the site they were to evaluate. The rater was then given several keywords for the web site that came up and could rate how relevant each keyword was. Multiple different raters would have to rate the site before it was allowed into the index. Google should do this!
Too bad we weren't supposed to see it. That's not how it works, at least in the USA. We weren't supposed to know of the Watergate break-in, the Enron tape recordings, Pentagon papers, Nixon tapes, MCI e-mails, and thousands of other (relevant or not) internal memos from other companies.
Technically, it's Google's fault for not protecting the info better. I know the student screwed them, and they should take that up with the leaker. You can say you aren't angry or whatever, but the tone of your posts clearly indicates that.
If this was no news to you, it was to me and I guess many of the "dumber" WebmasterWorld members. I find this very interesting. Next time you know of these things in advance, feel free to post about it in detail; share it with us.
>> we aren't supposed to be privy to it. According to Google Guy, the page is protected and marked "Google Proprietary and Confidential". Further, the student who revealed this info has seemingly (according to Google Guy) broken a non-disclosure agreement he or she had with Google when they signed on. <<
I find it curious that no one is disputing the validity of that document. Correct me if I'm wrong but, I believe the going rate for a college student in China is closer to 25 cents an hour, not $10.00 to $20.00. I suppose the fact that GG didn't dispute it is enough, but still.......
even if it was false so what.
It doesn't shed any new light on the issue it's discussing.
all it says is "We don't like fraud or spam"
no news there.
Well, we've all read it now.
I wonder how many of the people at WebmasterWorld are in on this -(and rating their own sites?)
Now that *would* be a bombshell.
Just a thought.
Might be nonsense: but published below for reference: no doubt I'll be removed from these boards :-)
(My user profile says my interests are 'Honesty' - and that's about right)
[edited by: Brett_Tabke at 7:06 pm (utc) on June 6, 2005]
[edit reason] please reread your local copyright laws [/edit]
<<-(and rating their own sites?) >>
Just a guess here, but I don't think Google would be dumb enough to allow raters to pick their keyword searches themselves. I'd say raters are probably sent a list of random KW serps to rate.
And I'd think that each one of those KW searches is sent to many (100s?) raters simultaneously and their ratings compared for anomalies.
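To make that guess concrete, here is a minimal sketch of how ratings for one KW search might be compared for anomalies. It is purely an illustration: the function name, the 1-5 score scale, the rater IDs, and the tolerance are all assumptions, and nothing here reflects how Google actually processes rater input.

```python
from statistics import median


def flag_outlier_raters(ratings, tolerance=2.0):
    """Compare raters' scores for the same search result.

    ratings:   dict mapping a (hypothetical) rater id to a numeric score.
    tolerance: assumed maximum distance from the median before a rating
               is treated as an anomaly.
    Returns (consensus_score, sorted list of outlier rater ids).
    """
    if not ratings:
        raise ValueError("no ratings supplied")
    consensus = median(ratings.values())
    outliers = sorted(
        rater for rater, score in ratings.items()
        if abs(score - consensus) > tolerance
    )
    return consensus, outliers


# Five made-up raters score the same result on an assumed 1-5 scale.
scores = {"rater_a": 4, "rater_b": 5, "rater_c": 4, "rater_d": 1, "rater_e": 5}
consensus, outliers = flag_outlier_raters(scores)
print(consensus, outliers)
```

The median is used instead of the mean so that one rater pushing a site in "their own direction" cannot drag the consensus very far.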
|Correct me if I'm wrong but, I believe the going rate for a college student in China is closer to 25 cents an hour, not $10.00 to $20.00. I suppose the fact that GG didn't dispute it is enough, but still....... |
The existence of Eval.google.com was confirmed by Google. The student who worked for eval in 2004 claimed he got $20 for an hour - details on my blog. The validity of the documents I published is not confirmed by Google. I had three different sources, all international agents working for Eval, to do this for me. GG confirmed it only indirectly:
|The fact that Google does lots of testing and evaluation of our results in tons of different ways shouldn't be a surprise. That's part of the 70-30 breakdown where ~70% of our effort is on the core areas of search and advertising, but we usually don't talk about that 70% work to improve our results or validate their quality. So keep it quiet; I hear some other search engines read Slashdot too. ;) |
[edited by: voelspriet at 7:06 pm (utc) on June 6, 2005]
after that noise they must change now the subdomain from eval.google.com to
|But when Henk van Ess submitted his own blog to Slashdot, he asserted "Real people, from all over the world, are paid to finetune the index of Google," and that made it sound like people were reaching in via this console to tweak results directly, which just isn't true at all. |
and you replied
|Google Guy, do I read between the lines that you think my postings are irrelevant and misleading? That would be a shame. |
I don't believe they're irrelevant, but yes: I do believe that the assertions you've made are misleading. In my original post, I was replying to walkman, who asked "ok, so how do you know you've been manually hit by this?" which implies that walkman thought that eval.google.com was responsible for sites being hit. Likewise, I have a ton of respect for Tara Calishain at ResearchBuzz. But her post about your site says "Basically what Henk seems to have found is a part of Google that allows humans to tweak search results to ostensibly get rid of spam and let the most contextually-relevant search results rise to the top." Again, Tara wonders whether your posts said that results were being directly tweaked. Then there are assertions from your site like "The Google testers are paid $10 - $20 for each hour they filter the results of Google." "Filter" again makes it sound like an active process. And your self-submission to Slashdot ("Real people, from all over the world, are paid to finetune the index of Google"), which also gives the impression that people used eval.google.com to change our search results.
So yes, I looked at the wording from when you submitted your own site to Slashdot, plus the use of active verbs such as "filter" on your own site, plus the comments of smart people such as Tara and walkman and how they interpreted what you wrote, and in my opinion your posts have been misleading. Again, this was not a console in which people could directly fine-tune, tweak, filter, or otherwise modify our search results. eval.google.com was for "eval," i.e. passive evaluation.
Your follow-up question was "Why pay them for something if it has no effect on the index? Must be charity then." Why are you surprised that we would pay people to rate search results? The job posting has been public, after all. We do provide ways for people to volunteer to help Google (e.g. see our translation console at https://services.google.com/tc/Welcome.html ), but to rate search results consistently and well takes time and training. I think it's perfectly normal to pay people for their time.
When you quoted me on your site, you said "Google Guy: I've serious reservations about Henk van Ess" and in your post you said "Google's spokesmen Google Guy, who I love to read, has serious reservations about me." Just to be clear, that's not accurate: I don't have reservations about you personally, Henk. I think I stated clearly that I have serious reservations about two of your actions. I mentioned those two specific things in my first post, and I'll reiterate them: you took information from one of your students, and you posted information that (in my opinion) was clearly proprietary/confidential. Regarding the first, I believe you wrote in a comment on your own site that this information came from a student of yours? Regarding the second, I'm quite surprised that you assert "I'm not aware of restrictions." Besides the copyright symbol that you mentioned earlier, the very first picture you posted has a link "An NDA Reminder..." on the left in the Important Announcements section, where NDA stands for non-disclosure agreement. Are you honestly saying that if you had realized there were restrictions, you wouldn't have done five blog posts (so far), posted screenshots, posted employees' real names on the web without consulting them, and posted two training documents? In that case, I'll ask politely. Henk, this information was for ratings training. It's copyrighted, and I'm sure that the evaluation group considers it proprietary/confidential. I'd appreciate it if you would stop posting these documents.
By the way, I apologize in advance if this post comes across as strident. I hate he-said-she-said stuff, and normally I try not to post when I'm ruffled at all. But I do think that things like posting an innocent employee's name from internal training documents is rude and unnecessary. Henk, feel free to include this entry on your blog, but if you do, I'd appreciate if you'd quote the entire post.
|I'd say raters are probably sent a list of random KW serps to rate. |
Agreed - but it can't be that hard for the seriously savvy to manipulate the results in their own direction.
(I assume the human feedback feeds into the algo. Otherwise, what would be the point? There are far too many webpages out there to evaluate manually.)
More interesting urls:
try search for:
All sites are non-responsive.
I'm concerned. Despite the post above, I have also stridently argued *against* the concept of a 'sandbox' (which has never been properly explained.)
But now I feel a bit foolish. Because a legitimate site for a new medical product, with only 3 inbound links, came into the serps for a few days.
Now it has now not merely gone off the radar - it has been removed from the serps.
It was deliberately launched prior to coming to market. Now the only thing in its place is an adsense page. What happens if customers want it - do they search a different engine?
I think we need more transparency.
> If the material was copyrighted, they really would have told me
let's be honest here: you knew, or should've known it's copyrighted; it came from Google's site. I still think a journalist has the right (and sometimes the duty) to do it, but let's not pretend. :)
> I think we need more transparency.
Google is a for-profit corporation that is in the business of making money. It sucks for us, but if they think that a "sandbox" is needed to keep making money, that's what will happen. Google needs to make money and satisfy its shareholders who put $billions into Google. That's the reality and being "transparent" is not a smart move.
<<but it can't be that hard for the seriously savvy to manipulate the results in their own direction>>
Again, I'm just guessing, but I say that Google would NEVER accept anyone into the Rater program that:
Has a site
Had a site
Designed a site
Hosted a site
Is/was employed by a site
Contemplated developing a site
Registered a domain
Read Webmasterworld (yuk, yuk, yuk)
So, in other words, I'll bet Google goes to great lengths to make sure their raters don't have "their own direction."
|GG: Why are you surprised that we would pay people to rate search results? The job posting has been public, after all. |
I am surprised anyone is surprised. Monster has been plugging these jobs for a while. I was shocked to see some Manchester, UK jobs for Google (although not quality control) available too.
This whole exercise shows one thing - content will rule out... it's hard giving up spam, but building a set of daily updated QUALITY content sites is the ONLY way to go if you want to avoid being picked out either by the automatic algo or flagged by a team of quality results agents.
|That's the reality and being "transparent" is not a smart move. |
Agreed. But we shouldn't all be tarnished with the same brush.
The excuse in the past was that it was an algo; we can't alter it. Now it seems it is a human-tweaked algo.
(I always suspected this.)
I think Google do a fine job - but my site has been removed from the serps and when users can't find it - what do I say?
It's the rubbish Google algo?
It's the mystery people who tweak the algo (who, up until this point, Google has always denied existed!)
<<too many webpages out there to evaluate manually>>
I think they only do the first 20 results for each KW search don't they?
Random thought: What if Google somehow hooked this rater program to their spam report system, so that an excessive number of spam reports would drop that KW search into the queue to be evaluated by the rater force?
That would be what many people have been screaming for.
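As a rough illustration of that random thought, the sketch below counts spam reports per KW search and pushes a search into a review queue once it crosses a threshold. Everything in it is hypothetical: the class name, the threshold value, and the idea that reports are keyed by query string are assumptions made for the example, not a description of any real Google system.

```python
from collections import Counter, deque


class SpamReportQueue:
    """Hypothetical link between spam reports and a rater review queue."""

    def __init__(self, threshold=3):
        self.threshold = threshold    # assumed number of reports that triggers review
        self.counts = Counter()       # spam reports seen per KW search
        self.review_queue = deque()   # searches waiting for human raters
        self._queued = set()          # guards against queueing a search twice

    def report(self, query):
        """Record one spam report; return True if the query was just queued."""
        self.counts[query] += 1
        if self.counts[query] >= self.threshold and query not in self._queued:
            self._queued.add(query)
            self.review_queue.append(query)
            return True
        return False


queue = SpamReportQueue(threshold=2)
queue.report("cheap widgets")
queue.report("cheap widgets")   # second report crosses the threshold
print(list(queue.review_queue))
```

Each search is queued only once, so a flood of duplicate reports would not crowd out other searches waiting for review.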
|I think they only do the first 20 results for each KW search don't they? |
This could change small parts of the world economy couldn't it?
Seriously though : what is important about this is that GG has effectively confirmed that there is human input into the serps.
I don't think this is direct input - it is almost certainly indirect.
My gripe is this :
1) this human input has always been denied.
2) if there is a human input, then maybe more notice should be taken of individual webmaster queries:
e.g. "my site has disappeared" can no longer be put down to "it's our math algo, sorry pal."
The best google news in years and Henk takes a beating.
Let's get on to the important stuff like the cool google logo for St. Abercrombie day!
I guess we're fixin to find out how G$ feels about copyrights after all.
|More interesting urls: |
try search for:
:) - but those sites may not even exist - If Gbot found a link to one of the above it would show as a url only - Gbot is capable of listing sites based just on a link to it (even though it may not exist)
EG - it is possible that if I type
That the above url will appear in G search :) following a crawl of WebmasterWorld.
Although it probably won't because of the * - but you get the idea
Just found an example for you :) site:wwww.google.com
[edited by: Dayo_UK at 8:29 pm (utc) on June 6, 2005]
|Henk, feel free to include this entry on your blog, but if you do, I'd appreciate if you'd quote the entire post. |
Sure, as I did the last time.
|eval.google.com was for "eval," i.e. passive evaluation. |
Please explain. I saw in Eval several duo-lists based on the same search terms. Most duo-lists show a different order of answers than the other list. The raters were asked to choose which answers were the best. If this is not filtering, what is it then? I have many other examples.
|That's not accurate: I don't have reservations about you personally, Henk |
Good. Changed "I have serious reservations about Henk van Ess" to "I have serious reservations".
|I'd appreciate it if you would stop posting these documents. |
Got a load of unpublished documents with (C) in it. The one I published hadn't any (C) in it. Perhaps the agents removed them before they sent them to me. The NDA Reminder on the screenshot is just that: a reminder, not the actual disclosure. I wanted to show this to improve authenticity. Already now some SEOs say the story is fake. The same reason why I mentioned one of your employees: to improve credibility. Now you confirmed the story by this post, the name is not necessary anymore. I changed it to:
The document is written by an employee of Google, as confirmed by Google Guy.
Don't worry, told Search Engine Watch that this was the last post anyway. To avoid the thought that I do anything you say, I will publish one more secret document - the one that reveals your true identity. Sorry Larry, it had to come out. Just kidding :)
Google are between a rock and a hard place.
There is no doubt that they believe in information and want to free it up.
They also make a large amount of money - more than $5 per hour each ;-) but that is a by-product.
Problem is: they are in a position of huge dominance - perhaps greater than they expected - and I'm sure they are acutely aware of this.
So maybe they could let the small sites through?
(I feel like the mother of a missing son appealing to a dictator here :)
In short: don't f*ck around with world culture too much my billionaire friends :-)
|If the material was copyrighted, they really would have told me.. |
You are supposedly a reporter, yet you do not know even rudimentary copyright law?
Every work of sufficient length, that has not passed into the public domain, is copyrighted.
As a reporter, you would have some rights to use that under your own country's version of Fair Use. But you have to have a need to use it, and it still might count as infringement.
|who was just doing what a good reporter can be expected to do unless releasing a leaked document goes against the public interest or is likely to hurt someone. |
A good editor would make that good reporter run that information past corporate legal.
It *is* Trade Secret information. If there is no legitimate news reason, and the journalist had reason to believe that it was still trade secret, they can be held liable for damages.
|Technically, it's Google's fault for not protecting the info better. I know the student screwed them, and they should take that up with the leaker. |
Trade Secret status does not automatically disappear if something is published, if the company was taking reasonable steps to protect the secret. An NDA with the leaker counts as reasonable.
I could take an Intel Red Book to the NY Times. If it contained information about an important bug in a processor, they might use that information because it is "news". But if a reporter asked counsel about it, he would be told to only read the pertinent page, don't go fishing. (Of course, we all know that the reporter would read the whole thing, and not understand a bit of it.)
|if this was no news to you, it was to me and I guess many of the "dumber" WebmasterWorld members. I find this very interesting. Next time you know of these things in advance, feel free to post about it in detail; share it with us. |
I'm certain that I read about the multilingual human evaluators right here on WW at least a year ago.
It would also make a lot of sense if you go back and consider that google was working on a browser that was designed to highlight certain spam techniques such as hidden text.
These evaluators are not going to be doing a view source on every page and trying to figure it out.
I find it interesting that the people that aren't bothered by this all seem to use the term "quality control".
We all know that google engineers check a bunch of search terms when they change the algo, why would it seem strange for them to have a large QC department?
A bunch of college students plugging away at a bunch of search terms sounds like just about every software QC department that I have ever had checking my software.
They dig deep, doing the repetitive work, and when they find something wrong, they report it to someone that can do something about it.
As QC it is no big deal, and no one should be surprised that it exists.
On the other hand, if they are actually able to influence things without oversight, then there is a real problem with the model.
The reports on the blog count as interesting, but we don't all have an automatic right to that information just because we are interested in it.
Let's not get too silly about copyright! It starts to look like smoke and mirrors.
(My observations are copyright free and for good reason: they're just an opinion with no intellectual content.)