| This 201 message thread spans 7 pages: < < 201 ( 1 2 3 4  6 7 ) > > || |
|eval.google.com - Google's Secret Evaluation Lab..|
"Rater Hub Google" Rumours?
On Apr 19, 2003, some members had spotted referrals from
followed by a question number and a couple of different email addresses. You can read earlier threads here,
According to some [slashdot.org] sites [searchbistro.com],
|It's one of the best kept secrets of Google. It's a mystery on Webmasterworld. Also in Europe (France) they don't know what to expect from that odd URL [eval.google.com....] Click it and you get ...nothing. The site reveals itself only if you have the proper login and if you use a network known by Google. Residues of Eval.google are found on the web, but the full content of the mystery site has never been published before. Here it is: the real story about Eval.Google. They use... humans! |
The site claims it is some kind of secret Google evaluation lab!
GG's concerns are understandable given that this creates problems for Google and that the "sinister" nature of this information has been exaggerated.
What has NOT been exaggerated is that Google has almost certainly misled the community about how important human decision making is to the process of determining spam.
I think that human intervention makes a lot of sense, but I've been frustrated yet again after taking Google support notes at face value - only to find that they are misleading me about what's up with the ranking process. (I'm not saying GG has misled anybody - I'm talking the emails that are sent by Google support that clearly imply that humans are NOT evaluating sites).
|You exposed me! Guess you knew about the state of my first site. |
Interesting reaction to my post, makes me think I struck a chord.
|I can appreciate the irritation that GG has about this info being made public and I think the fact that he is, or appears, so irritated also speaks for itself. |
Should I apply this same rule to voelspriet's reactions? I'm actually surprised, what with all the teasing done to GoogleGuy.
If I'm so off-base then you might want to enlighten us all as to your motives. (And once again, for the record, I'm not taking GG's side for brownie points.) I'm just trying to figure out why a journalist would have posted a trade secret. You're not selling Google short are you? Perhaps you're just a fan of Y!
>> Maybe I just misunderstand your point with this post.
The point was: just because something is "secret" (incriminating, or just embarrassing) doesn't mean we shouldn't see it, or that it is illegal to print. All of the above things I mentioned were "secret".
I also included the "and thousands of other (relevant or not) internal memos from other companies" to make it clear that it doesn't have to be an illegal act. It can be a movie star's demands for the set, an embarrassing cafeteria policy for Acme Inc., and so on.
It was NOT meant to say that voelspriet is Bob Woodward, or that GoogleGuy is Nixon. ;)
[edited by: walkman at 2:08 am (utc) on June 7, 2005]
"I'm just trying to figure out why a journalist would have posted a trade secret"
For Pete's sake, it's not a trade secret. It's a simple white paper on how to detect spam. It has nothing to do with the Google algo. A journalist isn't bound by any non-disclosure agreement an employee may have signed with their former employer. And since he's not revealing his source, and there's not a court in the U.S. that would make him, there's really nothing Google can do about it. I find it amazing that, with the supposed thousands of starving college kids that have had access to this, this is the first time it has come out.
It seems to me that some of the most interesting threads always seem to degenerate into an "ethics" debate. If you would like to start a thread about the ethics of online journalism and how it relates to search, feel free to start another thread.
This thread is about the revelations of the workings of eval.google.com and what we as webmasters can learn from it. Period.
OK pmac -
GG if you are looking on _please_ answer MikeNoName's question concerning "reevaluation of sites" that have been left out or downgraded. You've talked about "reinclusion" but never about a "huge downrank" system which would seem to address much of the confusion in this and many other threads about sites that still appear but go from first to last for many queries.
Good question. Is the domain dead?
I just finished reading G's document re "Spam Recognition Guide for Raters".
Seems like the chickens have come home to roost. Our site ranking now depends on some college kids reviewing our pages (who may or may not click the right ranking button).
They are being provided with a long list detailing spam techniques that may or may not be spam at all.
The list is somewhat overwhelming to young people with limited SEO or web design knowledge/experience.
Clicking the wrong console ranking button is a very real issue and can send your site south.
Bye bye Google algo. Hello biased reviewers' opinions. That's what I read into it all.
I would just like to throw in a few more suspect pieces of circumstantial "evidence", presuming the document is indeed valid (still not entirely convinced of this) FOR the case that this is a MANUAL evaluation and degrade vs. someone else's claims that it is simply used as INPUT for the automated algorithm. I suspect that some basic level of the eval results probably ARE used to form an algorithm for clearing out the MOST obviously spammy sites "on sight", but then there are the rest...
1. When I started to suspect, early last week (you can search the back threads of the Bourbon 4 update and other threads for my timely remarks), that Adsense might have something to do with it, I took a cross-set of pages which had been dumped in the update and removed ALL references to Adsense (as well as all links external to the site - coincidentally, TWO of the points mentioned in the spamminess eval definition). These are the only significant changes made in the last 6 months to many of them other than updating static info. All these pages were being Googlebotted at least once a day. I even gave them an extra couple of days, making sure the caches reflected the newest changes. If it was indeed STRICTLY an algorithm decision, this would imply that the algorithm ONLY decided whether they were spammy and deserving of the degrade. Nothing happened! They stayed right where they had been dumped to. If the algo was deciding, it should have re-instated them in their old positions upon re-spidering and re-indexing (how many times have the serps been reordered over the last week?). Therefore, it would appear they ARE manually degraded and, once degraded, stay there until MANUALLY reversed. Likewise, I suspect the earlier Allegra degrades were a weaker version of the algo and an earlier batch of manual blacklisting, and since almost none of those tanked sites have come back, I suspect those tanked this time are in for the same long haul for the same reason.
2. There was a report of some certain site which was alleged by the document specifically as "whitelisted", first dropping in the ranks, and then, early on in the update discussion as being RE-INSTATED very close to their prior SERPs. My purely circumstantial guess is that this is when the algo-dumped (i.e. those dumped by the generic algorithm as being spammy by the definition internal to the base algorithm), but previously MANUALLY whitelisted sites were re-incorporated/re-authoritated into the database.
BTW, I concur with someone else's statement (and this is my reason for wondering about the validity of the document) that it is a surprise, with the number of students supposedly having worked on this system, that this has not come out much sooner. Any employer knows, other than the gov. of course (who only under the threat of federally treasonable charges extracts obedience to secrecy clauses), that students in general, of all people, have the least common sense and loyalty, and the least to lose and the most to gain by revealing these kinds of secrets. I'm curious, if it's indeed real, if it was revealed for extra school credit, because they were REALLY P'd off at being recently fired, or for having THEIR website tanked. :-)
Actually, it certainly does qualify as a Trade Secret under the definition in the Uniform Trade Secrets Act
|(4) "Trade secret" means information, including a formula, pattern, compilation, program, device, method, technique, or process, that: |
(i) derives independent economic value, actual or potential, from not being generally known to, and not being readily ascertainable by proper means by, other persons who can obtain economic value from its disclosure or use, and
(ii) is the subject of efforts that are reasonable under the circumstances to maintain its secrecy.
There is certainly economic value in knowing exactly what a search engine considers SPAM.
Even if there is a lot of accurate conjecture, that is not the same as knowing exactly what it is.
As for whether it is only the person that signed the NDA that is violating the law:
|(2) "Misappropriation" means: |
(i) acquisition of a trade secret of another by a person who knows or has reason to know that the trade secret was acquired by improper means; or
(ii) disclosure or use of a trade secret of another without express or implied consent by a person who
(A) used improper means to acquire knowledge of the trade secret; or
(B) at the time of disclosure or use, knew or had reason to know that his knowledge of the trade secret was
(I) derived from or through a person who had utilized improper means to acquire it;
(III) derived from or through a person who owed a duty to the person seeking relief to maintain its secrecy or limit its use;
And as far as freedom of the press trumping trade secret, the judge in the Apple case last month seemed to be of the opinion that freedom of the press was no protection from violating the law.
Of course, this is all US law, not Dutch, but it certainly shows the possibility that courts just may not agree with what people posting on WW may "know".
Whatever row you are having with someone (which I don't understand; or indeed care about.)
This thread does suggest that Google has a human input.
It may not be direct human input, but it exists, and
has always been denied.
Has Google been untruthful about this in the past?
Julian, ease up on this guy a bit.
Here is what I'm hearing (in sum):
Google uses humans to test its algorithm.
Hence the difference between active and passive implications to SERPS.
The bombshell would be that human editors could directly affect the placement of one website in the SERPS relative to another.
What we see here, however, is that Google is using glorified focus groups to figure out how they can improve their algorithm to improve SERPS.
How else could Google get this information? Math can do a lot of things, but it cannot provide subjective, qualitative feedback. No reasonable person could ever doubt that Google employs humans to test its algo!
Google Guy must be pulling his hair out because such a simple concept is being lost in a bunch of drama and controversy.
Julian, you also had a couple posts about one of your new sites being picked up by Google, and then dropped. You related that to the Sandbox theory. I think your problem has more to do with the fact that Google has 50+ data centers and thousands of database servers that must push content through to the SERPS, and sometimes results are inconsistent (at least in the short-term). That's randomness for you... why two sites who are similarly situated will be treated differently in the short-term.
...ok, so clearly there is a lot to digest with this thread. ;)
Any comments on whether this could develop into a publication standard? What about Google's take on secondary search results?
|One comment and a couple of questions for the group: |
First, it seems to me that it makes sense that Google would have some human editorial / QA process. No SE technology is perfect. It seems like most of the buzz behind this is in response to seeing the details – the UI, the SPAM guide, etc.
Secondly, speaking of the SPAM guide, I’m curious what other SEMs / SEO’s think of the SPAM guide posted on searchbistro, (presumably a Google doc.) For the sake of argument, let’s presume that it’s genuine and these guidelines represent Google’s view of SE SPAM. Most, if not all of it is not really news per se. I’m curious if anyone on SEW board thinks this may feed back into the SEO world in the form of a publication standard or ‘legitimized’ SEO technique?
Lastly, I thought it was interesting that they addressed domain squatters and secondary search results / PPC pages in this guide as well. It seems like these pages become more popular as companies start to gobble up expired domains. Does anyone think that the publication of this guideline may influence this industry in any significant way? From a QA standpoint, does anyone think that some of the criteria they outline in their “secondary search results” section could be used to identify certain blogger’s pages as “offensive” if they depend on too many RSS feeds (non-unique content) and/or PPC ads?
Thanks for the feedback.
Not to disagree with BigDave, whom I have the utmost respect for after reading his posts for the past several months, but it depends on your interpretation of the term "economic value".
Economic value generally refers to something a competitor could use for fiscal gain. Not a bunch of webmasters trying to make a buck on the internet. There's no value to Y or MSN in that paper that I can see. There was much more useful information in the papers that Google itself disclosed several months back.
The secret recipe to Kentucky Fried Chicken is a trade secret. An employee manual from KFC telling employees they should say good evening when someone walks through the door is not.
I'm not a lawyer and I don't "know" nearly as much as most of the webmasters on here, but I do know something about business.
Anyway, the mod asked not to go down this path so I'm done.
Let me understand this:
Symptom: Site/pages lose rankings during Bourbon update.
Hypothesis: Google is punishing sites for using AdSense.
Action: Adsense removed from pages that are no longer ranking.
Result: Pages continue to rank poorly.
Amended Hypothesis: Google is *manually* punishing sites for using AdSense.
I'm sorry, but it just doesn't make sense. Just going out on a limb, here - but maybe it's *not* AdSense that is making the pages rank poorly?
I think you're barking up the wrong tree - how could some sites in the same SERP get hit by Bourbon, while others remain unscathed, when both use AdSense?
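The objection above is about base rates, and a quick sketch may make it concrete. The counts below are entirely hypothetical (nobody in this thread has real numbers), and `adsense_rates` is just an illustrative helper name. It shows how "almost everyone hit had AdSense" can be true while AdSense sites are no more likely to be hit than anyone else.

```python
def adsense_rates(adsense_hit, adsense_ok, other_hit, other_ok):
    """Return (P(AdSense | hit), P(hit | AdSense), P(hit | no AdSense))."""
    p_adsense_given_hit = adsense_hit / (adsense_hit + other_hit)
    p_hit_given_adsense = adsense_hit / (adsense_hit + adsense_ok)
    p_hit_given_other = other_hit / (other_hit + other_ok)
    return p_adsense_given_hit, p_hit_given_adsense, p_hit_given_other


# HYPOTHETICAL counts, invented only for illustration:
# 1000 sites run AdSense (90 hit), 100 sites do not (10 hit).
share_of_hit_with_adsense, risk_with, risk_without = adsense_rates(90, 910, 10, 90)

print(f"Share of hit sites running AdSense: {share_of_hit_with_adsense:.0%}")
print(f"Chance of being hit with AdSense:    {risk_with:.0%}")
print(f"Chance of being hit without AdSense: {risk_without:.0%}")
```

With these made-up numbers, most hit sites run AdSense simply because most sites in the sample run AdSense; the risk of being hit is about the same either way. A poll of hit sites alone cannot tell the two situations apart.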
With the fake registrar business (not even attempting to uphold the duties of a legit registrar), gmail, site feed, forever cookies, etc... G$ knows all about you, so what if we know a little bit about them. They get their data by hook and crook, that's no secret either. I wonder how much NDA stuff is floating around the google cache with copyright notices.. wah wah. The bully morphs into victim and people go crybaby over it.
I'm done discussing this, because as the Moderator reminded us, this thread is not about the ethics. The news is how Google uses the network of raters. We've seen Google job postings, here and there, some speculations etc., but this is much more detailed info.
If the facts are not in dispute, you're right, but something tells me that if it ever went before a judge, everything would be disputed, from damages suffered, to free speech (since Google says results are automated--we all have gotten those template e-mails--and one can wonder what impact human reviewers play) etc. The main thing on the "trade secret" seems to be the value added features on pages to differentiate from spam. I remember reading last week about them, right here on WebmasterWorld. Some trade secret...
Also, on the Apple lawsuit (which is on appeal), you forgot to mention that Apple just wants the blogger to reveal his source; no damages or anything like that.
I'm out :)
> I think you're barking up the wrong tree - how could some sites in the same SERP get hit by Bourbon, while others remain unscathed, when both use AdSense?
Please go back and read, in particular, earlier in this thread (including the links on the first page which includes the posted Google document), ALL of the bourbon update 1-4 thread from the beginning, as well as the threads about reasons webmasters were hit, etc. (you could also go into my profile and see the places I posted recently for a road map) and do your due diligence in putting together the pieces. It's as clear as day in retrospect. I'll outline it EXTREMELY briefly for everyone to catch up.
1. The "webmasters hit" thread took a poll and determined almost everyone hit had adsense (not all). Nor was everyone with adsense, as you point out, hit. So there MUST be another factor. I wholeheartedly agree.
2. This other factor was pleasantly provided by the eval doc which says (read it closely) spamminess is determined by a number of factors, including (I hate to say I told you so, but I DID guess and post about most of these about a week ago in the bourbon thread) lots of external links accompanied by short descriptions "copied" on other sites, affiliate links with little additional original content AND, most conspicuously, on the border between pages 5 and 6 of the eval doc it says the mitigating factor, if all else is present, is the obvious presence of PPC's! Adsense is considered a PPC, no? If there is no PPC then likely it's NOT a spam site after all.
Do I need to make it any clearer? According to these rules you could have two sites with the EXACT SAME CONTENT WITH AND WITHOUT ADSENSE, and the adsense one would be considered spam while the NON-Adsense one would NOT BE. Even if the "copied" descriptions happened to be copied FROM YOUR site by scraper sites, (or if they happen to be impressive looking, handwritten ad copy and the rater happened to not bother checking if they were even copied as in our case I suspect)! Please go and read the doc to see I'm not making this up.
There is your missing link to why most sites with Adsense are fine, but nearly ALL the sites dumped (as determined in the webmasters hit thread) DO HAVE ADSENSE or another PPC. So if you're on the borderline, at least according to their faulty algorithm, eliminating Adsense will probably convince them you are NOT SPAM.
My guess is Allegra and Bourbon were the first algorithms intended to automate this eval process. Allegra failed a little. Bourbon failed more.
I'm not claiming the process is necessarily a bad one either. It's rather conscientious of them and a brute force attempt at improving their database. I'm only saying it probably should not have been automated (if in fact it WAS), their raters needed a lot more training and they DEFINITELY need a much faster and more efficient re-evaluation process.
|This other factor was pleasantly provided by the eval doc which says (read it closely) spamminess is determined by a number of factors, including... if all else is present is the obvious presence of PPC's! Adsense is considered a PPC, no? |
You are grossly misreading that document and twisting it to fit your personal theory. Here is why your theory is incorrect.
This is what the document actually says:
|Secondary Search Results / PPC |
We want to mark as Offensive the pages that are set up for the purposes of collecting pay-per-click revenue without providing much content of their own. You will see such cases most frequently in conjunction with “search results” feeds.
It's not the fact that the page is showing PPC, it's the fact that they are showing the PPC ads within the context of secondary search results with NO REAL CONTENT apart from those feeds.
The defining characteristic of spam, according to the document, is that it has no real content.
Clearly it is not the fact of having AdSense in itself that would trigger an evaluator to flag a site, but the fact that there is no real content beyond advertising.
This fits perfectly within Google's historical goal of showing sites with content that is useful and what that document shows represents no change in policy at Google. In no way does it demonstrate that a site showing AdSense will be demoted, penalized, tagged as spam, or whatever you want to call it.
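To make that reading concrete, here is a minimal sketch of the rule as this post interprets the quoted passage. It is not Google's actual logic; the function name `rate_page`, its two boolean inputs, and the "Not flagged by this rule" label are assumptions made up for illustration. Only the "Offensive" label comes from the quoted text.

```python
def rate_page(has_ppc_ads: bool, has_original_content: bool) -> str:
    """Toy version of the 'Secondary Search Results / PPC' rule as read above.

    Assumption: a page is flagged only when PPC ads appear WITHOUT
    real content of its own. PPC ads alone do not trigger the flag.
    """
    if has_ppc_ads and not has_original_content:
        return "Offensive"
    return "Not flagged by this rule"


# Content site that also runs ads: not flagged under this reading.
print(rate_page(has_ppc_ads=True, has_original_content=True))
# Feed-only page set up to collect clicks: flagged.
print(rate_page(has_ppc_ads=True, has_original_content=False))
# Thin page with no ads: this particular rule does not apply.
print(rate_page(has_ppc_ads=False, has_original_content=False))
```

Under this reading, removing ads only changes the outcome for pages that had no real content in the first place.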
I am getting more and more frustrated reading this thread, as everyone seems to be in agreement that this is a passive system when in fact it isn't passive, it's active. It's simple: these human editors are being used to "tank" or ban websites, not by page BUT BY ENTIRE SITES! I would have been more than happy to have had a few pages banned on personal websites, but to have them completely removed?
This never happens between PR updates unless a site is penalized. If a site has had no on-site "black hat" SEO done to it, then these "RATERS" are tanking sites, or causing them to be tanked, based on their personal ethics and opinions. When on-site black hat methods are not used, they can submit URLs of pages with spam pointing to these sites, and then someone down the line reviews these URLs and decides that only the owner of the soon-to-be-blacklisted site could have created it on a third party's website.
It's simply the power to delete sites outside their algorithm.
All it now takes for a competitor to get your site banned is a good google bowl spam of, say, your site map etc. to make a post look exceptionally spammy, and this human factor now comes into play.
This is even more destructive when egos are at play and you insult one of them directly. For me it's Miller time right now, but overall I am completely disgusted with google right now, seeing their worst side yet.
Unfortunately for me, the human face of google, which to me is very ugly.
- trust me peops, this is not passive, it's fully active
- it's not just putting sites in a whitelist, it's plonking sites completely.
- it's soon to be front page
No, it is YOU who are not reading FAR ENOUGH. Continue down that same page you just quoted (pg 5 - notice how the first paragraph ends with "PLEASE READ THE WHOLE SECTION"? I guess you did not) to the very end and ONTO THE NEXT PAGE (PG 6), EXACTLY WHERE I SPECIFIED. It says: "As you see, pages with the same content may be assigned vastly different ratings based on the absence or presence of a ppc program."
Need I say more?
Am _I_ the only one who actually READ this thing?
from what I'm gauging it IS on a page by page basis, but only within certain domains. In other words, once they find a domain to pick on, it seems they go through it on a page by page basis, as I think it mentions in the doc as well. I don't know, maybe if you get enough bad pages they eventually tank the whole domain, but not in our case. We have a couple pages on our tanked domain, which have retained #1 positions since they were full of content and did not have adsense ads. While ALL the pages on our entire other domains have been untouched even though they follow the same pattern as the tanked ones (except most don't have adsense) and are even interlinked with them. I'm guessing they either simply haven't gotten to eval them yet, or already decided the domain was harmless, or possibly because the home page is redirected to the tanked domain and they missed it.
[edited by: MikeNoLastName at 7:56 am (utc) on June 7, 2005]
|Am _I_ the only one who actually READ this thing? |
Guess you did your homework :)
Then why didn't they only tank the 3 or 4 pages in eval right now?
This of course is according to deep throat, who is actively blogging about this right now
Not sure what you mean DC?
3 or 4 pages? whose?
I also forgot to mention, on the pages which are still showing up in the #1 positions, they do NOT show up for a search on their full titles, leading me to think there may still be some penalty there, just not as popular a term, or they were so high above their surrounding SERPS that they still managed to rank.
I don't agree that this is not a newsworthy item as GG has tried to suggest. If it's something that everybody knows about why try and keep it secret?
The fact that Google has always made a big deal about its algo being clever enough to rank pages without human review has been proved wrong. They have admitted defeat to spam and brought in a bunch of students on $10 an hour to determine the results in the serps.
This is hugely significant.
I think you do need to read the whole of the available information on the eval system before forming an opinion on it as snippets and posts etc can really mislead.
All you can do is read it and then form your opinion - maybe we should have a simple vote - Do you think this system is PASSIVE or ACTIVE? (active doesn't mean these guys are actually editing SERPS - it means that their scoring could directly affect a site's ranking).
My vote = ACTIVE
My view on this is he was right to leak it. Google might have an NDA, but their beef is with the leaker not the person they leaked to.
I also think they should have that information openly available on their site. A clear statement of what they stand for.
It does seem to have several problems.
Search engines suck, there's no other way to put it. You visit a luxury goods site and see colours designed to stimulate memories, food pictures to water the mouth, cultural references, jazzy styles to set the mood. Poetic text or use of slang? Clean cut typography or antiquated fonts? Google sees table tags and strings of dictionary words and not much else.
Google is not god, it is a blind-deaf person with a poor grasp of language and no awareness of culture.
It needs a walking stick for many non text sites simply to be able to obtain a few words to even get a hint of what the site is about.
In the document they give several examples of hidden text, but nowhere do they make the distinction as to whether the text is there to aid the search engine or to deceive the searcher.
The Marantz example they gave seems to have been changed to remove the hidden text. So now Google thinks that page is about "Global home page. All Rights Reserved".
What keywords are you thinking from the paragraph I just wrote? I'm thinking "baby bathwater", yet I didn't use those words in the paragraph and Google is unaware that 'throw' and 'threw' fit with that keyphrase.
A meta comments tag is hidden, links to the page are hidden, but are they there to deceive or to explain in terms the search engine can understand? Are they the white stick for the blind search engine?
ODP+PPC, if the site adds value to ODP then how about putting a meaning on that value.
So for example, a medical site with some ODP page that ranks for k1 k2 k3, shouldn't rank for k1 k2 k3, it should rank for medical words + k1 k2 k3.
Likewise, if a site adds value to another site, what added value is it? The visitor goes to that site over the other site for what exactly?
They should have pre-determined sets of keywords for "purchase, sale, ...."
"reviews, studies, guide, research..."
"ideas concepts novel..."
The reviewer could define exactly what the added value is, either by specifying the words themselves, or by selecting from a pre-determined group*.
This affiliate site has value beyond the thing it affiliates to because it adds "...." related usage.
* disclaimer, I have an ODP+PPC site among my set. I'd like to tell the search engines to rank it for (set of words) + ODP, rather than for ODP keywords itself for which it would be spam. It is *not* spam, for "k1 k2 + ODP words", it is the only site on the net for that!
How do you give the blind man his white stick if he refuses to accept white sticks from anyone and prefers to stumble around.
Plenty of people have posted suggestions that Google should review results manually. It comes as no surprise to me that they do just that.
For those that think it's a possible source of corruption - I doubt it. Presumably, agents have to fill in questionnaires on how good the results are overall for certain searches. If they make specific spam reports, presumably the sites are reviewed before being delisted, or whatever.
The only story here is that Google got caught doing something sensible. Many organisations (e.g. governments) consider it wise to keep certain operations secret, but we all know that these operations go on even if we don't have any knowledge of the details.
Of course, if Google wanted to keep this secret, they should not have allowed eval.google.com to appear in logs.
I think that this is a great development, and given the mega HTML spam that has been brewing for years now, G would be nuts not to do this.
The only people that fear stuff like this are the folks who have something to hide. You guys need to stop freaking out about stuff like this unless of course you have a reason to worry:)
there are a lot of webmasters from your country that run superspam pages with hidden text and all black seo tec and rank top 10 in google, what is your comment about that?
No one screams louder than a SEO when Google optimizes its own site :-)
there are a lot of webmasters from your country that run superspam pages with hidden text and all black seo tec and rank top 10 in google, what is your comment about that?
Hate that. We expose them every month. Here is an <DELETED> of one of these deceiving firms. It's in Dutch, but you will get the picture.
[edited by: engine at 2:25 pm (utc) on June 7, 2005]
[edit reason] TOS [webmasterworld.com] 28 [/edit]