|Do they grab say the top 20 websites in the SERP for all say...search terms over 50k searches a month and let the EWOQ team classify these sites? |
I'm sure that your description is one kind of project that is created and assigned to human quality raters. It's very much the kind of thing that the 2006 patent described. Quality raters are assessing the SERP itself.
In that regard, I think they may often be evaluating a proposed algorithm tweak in a kind of experimental mode - before it ever gets even a low level live test. Somewhere recently I read that something like 5,000 changes get proposed every year, only 500 of which actually get pushed into production.
I think EWOQ is much more than just a way to test algorithms or to catch spammers - it's an extensive classification system that features prominently in the SERPs.
If this were just to catch spammers, then why have so many relevancy levels and this elite 'vital' rank they hand out? Why cross-reference these sites with specific queries?
If it were just to test the formula of the month, then why the extensive instructions on how to spot and label adult sites/scrapers/thin affiliates/hidden text/etc.? This sounds like stuff that goes on your permanent record. Plus it makes sense that some websites get this 'brand'/'vital'/'ownership' claim for certain keywords based on these manual reviews (such as the celebrity examples the guide gives).
Did some more searching on this program and found some interesting stuff... Apparently Google doesn't hire these raters directly but uses third-party companies (like Lionbridge, Leapforce and Butler Hill) to do recruitment (apparently mostly targeted toward stay-at-home moms). The raters apparently do ad quality ratings and web quality ratings using a Firefox plugin called 'EWOQ Mobile User Agent Switcher'. Apparently some websites have reported a telltale 'https://www.google.com/evaluation/search/rating/task-edit?task=#*$!#*$!' in their weblogs, which may indicate they've been reviewed.
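For anyone curious what spotting that footprint would look like in practice, here's a rough Python sketch for scanning combined-format access logs. Everything here is invented for illustration (the sample log lines, the helper name); only the referrer string comes from the reports above:

```python
import re

# Hypothetical sketch: scan Apache/Nginx combined-format access logs for the
# rater-tool referrer string reported in this thread.
RATER_REFERRER = "google.com/evaluation/search/rating/task-edit"

def find_rater_visits(log_lines):
    """Return the log lines whose referrer field contains the EWOQ tool URL."""
    hits = []
    for line in log_lines:
        # Combined log format: ... "request" status bytes "referrer" "user-agent"
        m = re.search(r'"[^"]*" \d+ \S+ "([^"]*)"', line)
        if m and RATER_REFERRER in m.group(1):
            hits.append(line)
    return hits

sample = [
    '1.2.3.4 - - [21/Oct/2011:10:00:00 +0000] "GET /page HTTP/1.1" 200 512 '
    '"https://www.google.com/evaluation/search/rating/task-edit?task=123" "Mozilla/5.0"',
    '5.6.7.8 - - [21/Oct/2011:10:01:00 +0000] "GET /other HTTP/1.1" 200 256 '
    '"https://www.bing.com/" "Mozilla/5.0"',
]
print(len(find_rater_visits(sample)))  # → 1
```

Obviously a real rater could browse with the referrer stripped, so an empty log proves nothing either way.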
Perhaps in Google's eyes, these stay-at-home moms are the 'Ewoks' that will take down the Imperial AT-ST spam walkers... Although their results are cross-checked with each other, I doubt they're doing a good job. Many of these reviewers don't seem very technical and frequently get fired after poor performance reviews. That could explain why Cutts got so mock-flustered during one of those Google round-table meetings as he described how difficult it was for reviewers to spot spam sites that were quite obvious to him.
I actually don't think the web is too big for a project like this. Yeah, there are probably upwards of 200 billion websites in the world, but the number of actually relevant, local websites will be a lot smaller, and Google would only really need to evaluate some of these sites at once. There are too many search queries to check (especially long-tail ones), but we know Google keeps track of how many times each phrase has been queried, so it would be easy for them to check only queries that breach 50k a month (which would eliminate the vast majority of queries and let Google find the types of queries most targeted by spammers). Google could also use an internal audit mechanism to flag suspect sites for review, but it wouldn't surprise me if 99% of sites at, say, PR3+ have been reviewed by the EWOQ program.
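If you want to picture that 50k cutoff, here's a tiny Python sketch of the filtering idea. The query phrases, counts and threshold are entirely made up for illustration; this is my guess at the shape of such a shortlist, not anything Google actually runs:

```python
# Illustrative sketch (not Google's actual pipeline): given monthly query
# counts, keep only the phrases that cross a review threshold, the way
# the post above imagines Google shortlisting queries for rater review.
MONTHLY_THRESHOLD = 50_000

query_volumes = {                      # hypothetical sample data
    "blue widgets": 120_000,
    "buy cheap widgets online": 64_000,
    "widget repair in smalltown": 900,  # long tail, below threshold
}

review_queue = [q for q, n in query_volumes.items() if n >= MONTHLY_THRESHOLD]
print(sorted(review_queue))  # → ['blue widgets', 'buy cheap widgets online']
```

The point is just that a volume threshold cheaply discards the long tail while keeping exactly the queries spammers chase.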
This is precisely how I feel about what's happening and the direction Google is going. And I mean EXACTLY. I see Panda as the manual review algo. You can't polish a turd unless you exist in the bowels of Google keyword searches. Important or trending searches? Forget it.
People are overlooking what's really at work now. There are three main manual review elements. First, people. Google is hiring right? Thousands of eyes with clip boards in hand. Second, Chrome browser. Telling me that isn't a jackpot of data? You bet it is. I can't confirm this however. And third? Google+. Add up all three, throw those into an algo and you have Panda.
You will, or at least I won't bother trying or think that I can gain or maintain organic traffic on any keywords that actually matter. Sure it could change, but why invest in this insane stock market.
I will tell you what qualifies a term. Importance. Trending. Big money. Medium money. The key in my opinion? If you start raking in the money, guess what? You better be good. Your site better stand up. If you are a high school team, you're about to get thrown up against a major league team. If it's too good to be true? Get ready, it's coming.
Those are my thoughts. If it's anything keyword wise that's hitting headlines etc, your review is coming. It's only been the past couple days of my enlightenment. It explains a lot to me on what I see happening. SEO a human review? You've got to be kidding me. Why do you think so many people are down in the dumps?
The lasting question I really have is whether it's site vs site or if it's page vs page. In my gut I feel like it's site vs site. It's the reason those tech blogs are now overrun with the "major league" CNET, Engadget, etc. Your specialty site? Yeah, sure you have a chance in that game...
I'm just putting this out there, but is it not possible that Google has decided to review each of its AdSense publishers? Pretty easy to do. How about taking an affiliate ID and simply finding all the sites with that ID and evaluating them? After all, it's safe to say Google doesn't like people running multiple sites, and Cutts has said you can't have 50 good-quality sites.
|After all, it's safe to say Google doesn't like people running multiple sites |
It's only "safe" if you think it is "safe" to make wildly speculative, unfounded statements..
Cutts is on record saying, to the effect, that no one person can run 50 (or thereabouts) quality websites. If that's his feeling, and he's the QA guy, I would say it's not something you would want to be open about.
Multiple.. means more than one ..like two ..or seven ..or 5 ..or 13 or 27 ..all of which are considerably less than 50..
Many here run multiple sites ..have done so for years ..have had no problems ..neither before nor during Panda runs..and will continue to have no problems just because they have multiple sites..
Beware of inaccurately extrapolating misunderstood commentary molehills into mountains of FUD..
Having "multiple sites" ( and all in different "niches" ) means you're safer from Panda or "whatever" ..because if one is hit ..you have the others ..providing you didn't interlink them to death and use all the same templates etc..if you did ( dumb !) then if one falls it will drag the others down to a greater or lesser degree..
@Leosghost, it's all speculation. Google doesn't hand out booklets to webmasters about what not to do or what the tipping point is for penalties. Sure, quality guidelines is what they offer.
The thread is about what qualifies to initiate a human review. Google didn't tell us, so it's speculation. People post here and speak with their own knowledge and experience regarding Google.
So regarding this thread, it makes sense to me that a flag might be sent up regarding many websites using the same account. If one is spam or thin, wouldn't it be within reason to assume that others from the same account might be?
But to anyone reading, understand one thing. I realize this is slightly off topic, but if you are dabbling with many multiple sites, be fully aware of the chemicals you're mixing. It's fragile. You can never assume that you are outsmarting Google by being sneaky about it. The fact is, if you push the envelope too hard, you will wake up with your sites, yes sites, wiped off the face of the internet. Ask yourself then, was it worth it? Take my word for it. Then moving forward you must ask yourself whether Google has a record of your history and looks at you differently from that point forward.
This thread is specifically about the EWOQ users - the hired hands that Google outsources to use for editorial review of various kinds. Those people are not involved in the process that finds connections between the many sites that one person might own. That's for the full time staff.
I recommend working through the patent I linked to for more insight into how the process was originally set up in 2006.
It's a shame the original source had to take it down three days ago. Does anyone have a still-working link to the doc?
I would love to learn a bit more about those "footprints". As I started reading about this on Jennifer's site (potpiegirl.com, the original source), I stumbled upon a post describing how having just one link in the footer of your template can get you banned (“modified for 100k blueprint”) which, if you think about it, sounds like a completely crazy idea. Funny thing though - the post was published two days before I lost about a dozen sites to a manual ban. I didn't have THAT footprint but apparently tripped some other hot wire and would love to learn more about it.
|Those people are not involved in the process that finds connections between the many sites that one person might own. |
They are being asked to find the connection, in certain situations, by using the whois tool.
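As a concrete illustration of that cross-referencing, here's a small Python sketch that groups domains by registrant the way a rater with a whois tool might. The domains, email addresses and records below are all invented; no whois lookup is actually performed:

```python
from collections import defaultdict

# Hypothetical pre-fetched whois data: domain -> registrant email.
whois_records = {
    "example-widgets.com": "owner@example.com",
    "widget-reviews.net":  "owner@example.com",
    "unrelated-site.org":  "someone@else.com",
}

# Group domains by registrant; a shared registrant suggests one
# person's network of sites, which is what the raters are asked to spot.
by_registrant = defaultdict(list)
for domain, email in whois_records.items():
    by_registrant[email].append(domain)

shared = {e: sorted(d) for e, d in by_registrant.items() if len(d) > 1}
print(shared)  # → {'owner@example.com': ['example-widgets.com', 'widget-reviews.net']}
```

Of course, private registration breaks this immediately, which is presumably why it's only used "in certain situations".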
Yes, I did mention that the fate of websites hit by Panda was determined by the likes of those working for Lionbridge (guess it was 6 months ago).
Everyone here might remember that people were complaining a lot about how the copycats were ranking above them for any search on complete sentences from their pages. This observation is valid to this date. There are lots of clues there on what triggered this post-Panda.
It is amazing that Google seems to have advised the EWOQ team to recognize copied content by searching for an exact sentence from the text on the page. If it appears on one or more sites, and if you find PPC ads surrounding the piece of copied content on the page (or site) being evaluated, then it would get automatically classified as "not useful" by this team. The presence of PPC ads seems to be a major hint in determining the "usefulness" classification. It is definitely a major factor in the new Panda world. Webmasters of eCommerce sites have to think twice before deciding to place PPC ads on their sites.
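To make that heuristic concrete, here's a toy Python sketch of my reading of it. This is my own model of the two-step rule (quoted-sentence search, then a PPC check), not code from the leaked document or Google; the function and field names are invented:

```python
# Toy model of the rater heuristic described above (my interpretation only):
# if an exact sentence from the page turns up on other sites AND the page
# carries PPC ads, the page is marked "not useful".
def classify(page, exact_match_count):
    """page: dict with 'has_ppc_ads' (bool).
    exact_match_count: how many OTHER sites a quoted-sentence search returned."""
    if exact_match_count > 0 and page["has_ppc_ads"]:
        return "not useful"
    return "needs further rating"

print(classify({"has_ppc_ads": True}, exact_match_count=3))   # → not useful
print(classify({"has_ppc_ads": False}, exact_match_count=3))  # → needs further rating
```

Note what the rule can't see: which copy is the original. That blind spot is exactly what later posts in this thread complain about.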
An excellent read and it becomes obvious why you see a lot of foreign junk traffic and why several websites got quality demotions. Your pages either need to be "vital" or "useful" and not just "relevant". But this is determined by the "work from home moms" and the Google machines learn their work. Ridiculous!
Thanks again for the link and yes, the overwhelming dominance of the phrase "PPC ads" (58 instances on 125 pages) in a quality-control document crafted by Google is simply ridiculous. Are the raters supposed to know the difference between a PPC ad, a CPM ad, a simple 300x250 graphic linking to another relevant site that may or may not have been paid for or maybe a form that looks like it fills in a square of that size? I don't even think they have time (or motivation) to analyze what this 300x250 square does - after a few hours sitting in front of that screen it must all be one big blur to them.
Anyhow, I do realize that Google frowns upon any site that's set up "to make money", and they say as much in their raters' guidelines (and we're supposed to assume Google themselves is a non-profit, ha?), but to make their raters respond negatively to any 300x250, 728x90 or 336x280 square in a page layout is a bit too much.
1sript, though they have used the term "PPC ads", they might have actually meant all types of ads. Even if they didn't mean that, the raters have no way of knowing otherwise.
In any case, PPC ads might be a big "no no" for eCommerce and affiliate business models, as the chances of getting qualified for a human review are high.
Would all those who kept telling me since February of 2011 that ads, and ad-to-content ratio, and PPC ads, and ad placement have nothing to do with, or are "definitely not / can't be a major factor" in, how Google raters, and therefore Panda, rate your pages.. care to have a rethink about their insistence that I was "wrong about ads and their importance in Panda" now ..?
[edited by: Leosghost at 4:31 pm (utc) on Oct 21, 2011]
@indyank: I totally understand that they actually meant "ads". Period. But whoever prepared this document had gone through the trouble of typing "PPC ads" 58 times instead. Sigmund Freud would have a field day with that!
You might have original and useful content but if it is copied by others and you have ads on your pages, you're toast. Your pages don't stand a chance of getting classified as "useful" but stand a great chance of being qualified as "spam".
Tedster, I know you being able to change my subject title is your prerogative as an admin, but in this case 'human review' is a bit broad and misleading as a topic subject. Could you by chance change this back to 'EWOQ Review'?
For those looking for this document: it's been officially pulled, and the original whistleblowers have removed their links at Google's request.
You can still download this using a third party download service. This one worked for me:
If you wait like a minute or so...you don't have to pay for this.
Is anyone paying for this? It is all over the web.
|Could you by chance change this back to 'EWOQ Review'? |
I do see the ambiguity that my title edit created. I was mostly concerned that not many people would know what EWOQ review means and hence your thread wouldn't get as much readership. Once people read your opening post, things seem better clarified.
I changed the title again to read "human EWOQ review."
For me, the fact that Google - a MULTI BILLION POUND BUSINESS - outsource the data collection their algorithm is in large part based on to untrained people working from home is just completely insane!
I can't work out if it's incredibly clever and cynical or just standard short-sighted huge corporation tight-fistedness. Either way, it sure does explain a lot.
Forget about the work from home moms. This guideline document and the advice given there to detect spam stinks. No wonder we are seeing so many hits.
If your page content is copied by others, it no longer enjoys "Original" status. PPC ads or affiliate links on such pages will automatically induce the "work from home moms" to mark them as "Spam". Reading 4.1.7 along with how they are advised to detect copied content makes this very clear.
The very first technique to recognize "Thin Affiliate" in 5.1.1 is where you are not able to make a purchase on the affiliate webpage. (How many affiliates offer purchasing functionality on their own domains?)
If you use CJ links, you are more likely to be classified as using sneaky redirects and hence "Spam".
I don't know for sure, as I don't do quality rating for search; however, I do quality rating for Google Local results. So my guess is that when enough raters say a site is not a suitable result, it gets flagged and reviewed by a rater from Mountain View.
The initial Google Local query that I get to rate seems a totally random pick (as some are completely rubbish)...
viggen, the guideline document has been framed such that any rater using it as a checklist will act according to what it says. I am sure that most of these raters follow it as-is.
To recognize copied content (4.1.7), the document guides the raters to search for exact text by putting quotation marks around it. If the raters find multiple sources, they will either assume that the first listed page is the originator (which might not always be true) or just go on to step 2 and confirm the page being rated as not useful or "Spam" if it has PPC ads (and most content-based sites will have PPC ads).
This also holds true for the guideline to recognize thin affiliates under 5.1.1
I have always thought that the most important thing Google can do, is to keep track of original content.
This thread is really scary.
Multiple sites with the same content?
And yours is running an ad top center? Exactly as AdSense wants you to do?
This could profoundly affect AdSense, as "originators" are weeded out, and replaced by scrapers.
Bye-bye quality. Hello cesspool.
Very scary indeed.
The better the original content, the more it is scraped, and the harder it is for Google, using their moms, to sort it all out. What if someone copies 100%, and places NO AD on the page. Do THEY get the top slot?
Schmidt's comment about the cesspool may have been correct. The question is, what is he doing about it, besides using flawed techniques to establish and MAINTAIN canonical content? Punting legitimate AdSense users who contribute strongly to Google's bottom line is hardly the way to go.
|I have always thought that the most important thing Google can do, is to keep track of original content. |
Google was in fact doing it much better until they started to deal with this through these new "Quality" algorithms.
From here on, I'm thinking aloud on their approach.
I feel that someone has to tell Amit (who I suppose is the Head of quality within Google), this approach of giving a guideline and asking moms working for third party vendors to rate results won't work. They can do far better than that.
1) If you want your machines to learn your users, model the machines on those users without giving them a guideline on what you think serves "quality".
2) To do it, make a fresh start and ask your users to rate results they click through for their queries.
3) A site-wide "block pages from this site" isn't useful either. Make it clear through your terms with webmasters that occasionally you will collect user-satisfaction information for pages on their sites by framing them.
4) Occasionally frame the results for random users during unexpected times and locations (countries). Have a link on top of the framed result that would lead them to a rating page where they can rate it as "vital", "useful", "Relevant", "Spam", etc. Don't give them examples or make crazy suggestions like those relating to PPC ads and affiliate sites in the form of a guideline. A detailed guideline would never give you unbiased user opinion.
5) To make it interesting for users, give them points that can be accumulated in their Google accounts and redeemed for USEFUL gifts. :)
6) Use this data from the real users to model your machines and improve your algorithms.
Google might have different reasons for following the current approach, and the above suggestions might or might not work. But if they really care about their users and are willing to listen to suggestions, this could be put to Amit at Pubcon. Let me know if someone attending Pubcon will raise it before him. Pubcon is a great chance to grill or quiz the Google quality HEAD.
It is entirely up to Google to make a choice, but this approach might be far more useful and transparent.
The Quality team might be fairly new within Google but they need better approaches and direction.
I am just thinking aloud here and I might even be completely wrong, but I guess this might work much better than what they do. Whoever released that confidential document into the public domain must have been very upset with what they are doing. Most of the guidelines in that document don't seem to make any sense for quality.
Interesting thread, as I wasn't aware that doc got leaked, and what an interesting read that is!
A human reviewing in such a manner could quite easily label websites based on their appearance. And to label websites that 'appear' to exist to make money is pretty broad, as pretty much all websites either directly or indirectly exist to make at least some money.
Trying to find the document has also raised something else I'm now certain of: Google has people actively trying to remove this document from the Internet. Do these people have a list of other things to remove? Does the government pay Google to remove stuff it doesn't want published? Does Google then have a department that decides what should be removed or which requests are denied? Mmmm..
To answer the original question Mr Smith, I personally would estimate it to be the first 5 websites for terms from around 5k per month.
|And to label websites that 'appear' to exist to make money is pretty broad, as pretty much all websites either directly or indirectly exist to make at least some money. |
I don't think it's that hard to differentiate between a web site created to serve a purpose which is also monetized in some ways and a web site created with the only goal of making money.
|I don't think it's that hard to differentiate between a web site created to serve a purpose which is also monetized in some ways and a web site created with the only goal of making money. |
I don't think it matters as long as the end user is happy. If its users are what matter to Google, let the users give an unbiased opinion, without any guideline, on whether a site existing for a purpose or for making money is USEFUL for them. There is even a probability that a site existing only to make money can satisfy users.
[edited by: indyank at 9:37 am (utc) on Oct 23, 2011]
indyank, I'm not saying we can tell the intentions behind the creation of a good site (it might just be for money also), but a good site serves a purpose and tries to satisfy users before it tries to make money, while a spammy site will frustrate users in the hope of making easy money. Just by the look of it, the layout of the ads, the quality of the content (if it's not stolen from a good site), it's usually easy to say what's spammy or not.