Forum Moderators: Robert Charlton & goodroi


Winners and Losers in New Google Algo - Analysis


martinibuster

8:18 pm on Mar 2, 2011 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



A new article in Silicon Valley's San Jose Mercury News [mercurynews.com] publishes a list of winners and losers in Google's new algorithm. A UC Berkeley researcher criticized Google for not going far enough. Demand Media is still chugging along and basically had no comment. Those behind the losers WiseGeek and HubPages had plenty to say, primarily that Google went too far. Do WiseGeek and HubPages deserve their demotion?

Here is The Mercury News' list of top losers from Google's latest algo change:

wisegeek.com -77%

hubpages.com -87%

yourdictionary.com -74%

associatedcontent.com -93%

shopwiki.com -91%

answerbag.com -91%

fixya.com -80%


Here is the list of winners:

popeater.com 24%

sears.com 20%

britannica.com 18%

ehow.com 15%

linkedin.com 15%

hgtv.com 14%

marthastewart.com 14%

loc.gov 12%

facebook.com 12%

martinibuster

7:01 pm on Mar 3, 2011 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Does a larger percentage of "How to" titles help in hoodwinking this algo?


I noted the how-to nature yesterday, too. I'm not sure that it's about tricking the algorithm so much as hitting the sweet spot of what a significant number of Google users are searching for (how do I...).

I'm not sure if the context of the link to a how-to article versus a non-how-to article played a role; I'm just throwing it out there. Perhaps more importantly, many Google users type "how to" questions into the search box, so the content follows Matt Cutts' recent encouragement to "chase the users," not the algorithm.

Originality of content
I like what swanson posted about content originality. If you haven't read those posts, I encourage you to scroll up and read that member's observations. Here's a quote:

1) Duplicate content (within a percentage uniqueness) is being demoted.

2) Thin content is being demoted.

3) Unique content is being promoted.

4) A different algo is being used to determine the attribution of the original content.


Swanson's points make sense. While content originality may not be the sole factor, swanson makes a good case for it being a strong one. Content written by quality writers, even those who are simply researching the topics, will trump content written by muppets for whom English isn't their first language, some of whom may rely more on copying content. Perhaps not coincidentally, stolen content posted by non-English-speaking muppets turns up at EzineArticles. While I can't praise EzineArticles enough for their zero-tolerance policy on plagiarism and how easy it is to get that content removed, plagiarism is still a problem at that site.
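To make the "percentage uniqueness" idea concrete, here's a rough sketch of one common way such a check could work - word shingles compared against previously seen documents. The shingle size and the 80% bar are made-up illustrations, not anything Google has confirmed.

```python
# Rough sketch of a "percentage uniqueness" check using word shingles.
# The shingle size (5 words) and the 80% bar are invented for illustration;
# this is not Google's actual method.

def shingles(text, n=5):
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def uniqueness(candidate, known_documents):
    """Fraction of the candidate's shingles that appear in no known document."""
    cand = shingles(candidate)
    if not cand:
        return 0.0  # nothing substantial to judge
    seen = set().union(*(shingles(doc) for doc in known_documents))
    return len(cand - seen) / len(cand)

article = "how to make toast with a standard household toaster step by step"  # page under review
corpus = ["how to make toast with a standard household toaster and butter"]   # previously crawled

if uniqueness(article, corpus) < 0.8:  # hypothetical uniqueness bar
    print("Looks substantially duplicated; demote per point 1")
else:
    print("Looks original; promote per point 3")
```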

[edited by: martinibuster at 7:09 pm (utc) on Mar 3, 2011]

tedster

7:06 pm on Mar 3, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The more I look at loser sites, the more I notice one pattern - not the whole story, but possibly a part. That pattern is whether there is useful content ABOVE THE FOLD.

Whether this is an accidental correlation or one of the direct causes of a loss I can't say. But it would make sense to me, because the algorithm is chasing user satisfaction. If no useful content is visible without scrolling, the user is much more likely to be displeased.

<added>The poster child for this observation is daniweb - which is reported to have lost 40% of its traffic. They have tons of unique content, both UGC in the discussion forums and articles with comments. But on my laptop, with over 700 pixels of browser window height, the headline for the unique content barely shows on the first screen load. It's all header and ads.

CultOfMac, originally hit by the algorithm but now recovered, has a similar pattern to daniweb.
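For anyone who wants to check a batch of pages rather than eyeball them, here's a rough sketch of that 700px test using Selenium. The URL and the `article` selector are placeholders; each site needs a selector that actually matches its main-content container, and this is obviously not how Google measures anything.

```python
# Rough sketch: does the main content start above a ~700px fold?
# The URL and the "article" selector are placeholders for whatever site
# is being checked; this is not how Google evaluates pages.
from selenium import webdriver
from selenium.webdriver.common.by import By

FOLD_PX = 700  # roughly the laptop viewport height mentioned above

driver = webdriver.Firefox()
driver.set_window_size(1280, FOLD_PX)
driver.get("https://www.example.com/some-article")  # hypothetical page

content = driver.find_element(By.CSS_SELECTOR, "article")
top = content.location["y"]
print(f"Main content starts {top}px down the page "
      f"({'above' if top < FOLD_PX else 'below'} the {FOLD_PX}px fold)")
driver.quit()
```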

[edited by: tedster at 7:24 pm (utc) on Mar 3, 2011]

limoshawn

7:11 pm on Mar 3, 2011 (gmt 0)

10+ Year Member



WikiHow is ranking No. 1 on Google with articles about “how to eat a banana,” “how to eat a sandwich,” and “how to make toast.”


does it really matter? who is searching for "how to eat a banana"? I think that the wikihow article is the perfect page to be displayed for that search.

JohnRoy

7:25 pm on Mar 3, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



So what does Ehow publish that looks like a positive signal to a machine?
>> Nothing. Plain and simple, it's favoritism through manual placement. No analysis required.

fact or fiction?
- Are you a Google or eHow insider?

JohnRoy

7:28 pm on Mar 3, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



> I think that the wikihow article is the perfect page to be displayed for that search.

How to go bananas while eating a sandwich.

JohnRoy

7:31 pm on Mar 3, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The more I look at loser sites, the more I notice one pattern - not the whole story, but possibly a part.

And maybe only part of the update?!
Isn't it due to be finalized?

tedster

7:37 pm on Mar 3, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



So what does Ehow publish that looks like a positive signal to a machine?
Nothing. Plain and simple, it's favoritism through manual placement. No analysis required

Unless you were in the room when a deal was made, I think that's a dangerous position to take. It's dangerous because eHow is one of the anomalies in this update, an obvious exception. Unexpected results have a lot to tell us when we put in the effort to do analysis. In fact, analyzing the exceptions that slip through has been one of my favorite ways to learn about Google for many years.

Here's one thing I've noticed about eHow pages. Their content appears above the fold and ads appear in a side column, leaving room for whatever content they have. Contrast that with many false positive sites, where the top of the page is completely dominated by ads and the template's masthead area.

elsewhen

8:34 pm on Mar 3, 2011 (gmt 0)

10+ Year Member



i do not subscribe to the view that ehow has some kind of behind-the-scenes agreement with google.

sure, ehow would love it, but think about it from google's perspective.

my understanding is that demand media represents less than 1% of google's revenue.

just imagine if someone leaked a hidden search deal between demand and google - the pr damage alone would render this type of deal unworkable from google's side. sure, google might trust its own people to keep tight lipped about it, but do you think they really trust all the people at demand who would have to know about the deal?

google makes the lion's share of its revenue by making the best search product it can. they are working with incredibly difficult problems, and i think they are trying their best... the unfortunate thing is that their best includes inordinate numbers of false positives/negatives.

in my view, ehow was spared because they did not pass whatever thresholds google has set for "quality."

Robert Charlton

8:45 pm on Mar 3, 2011 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



The more I look at loser sites, the more I notice one pattern - not the whole story, but possibly a part. That pattern is whether there is useful content ABOVE THE FOLD.

I concur and had made some similar comments in the MFA discussion... [webmasterworld.com...]

I didn't think that Google was directly going after sites running AdSense. It was going after sites that weren't keeping users around, and to that extent, I felt how AdSense (and other advertising) was used on the sites that had dropped was a likely factor....

I'm also thinking that AdSense and how people build AdSense sites can be an odd component of user engagement, which is an important factor for Google....

...For the queries I've run, I'm seeing the pages now at the top providing much more useful information. Many of the newly surfaced pages still run AdSense, but, on many of these higher quality pages, the styling of the ads is different. The pages which are designed to keep users around have the ads in clearly delineated boxes with tinted backgrounds. Some of the best have the ads physically separated from the discussion.

There are multiple reasons that I'm assuming that user behavior was a factor in this update, but this is one of them. I don't think that Google is directly looking at AdSense styling or placement... but it's likely that the intention of design and of the overall site, to the degree that page design may have affected user involvement, was measured and factored into the new algorithm.

tedster

8:54 pm on Mar 3, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I've had the opportunity now to see a number of one week traffic graphs for loser sites. Many of them are bowl-shaped. There's a dramatic drop when the algo first hit, followed by a more gradual climb back up. Several are even approaching their pre-update traffic levels.

netmeg

8:59 pm on Mar 3, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The more I look at loser sites, the more I notice one pattern - not the whole story, but possibly a part. That pattern is whether there is useful content ABOVE THE FOLD.


(Glad I moved almost all ads to sidebars this year!)

LostOne

10:13 pm on Mar 3, 2011 (gmt 0)

10+ Year Member



I don't agree with the content-above-the-fold idea, or it doesn't apply to me. I've always made a point of keeping content above the fold, but it does have an AdSense block in it. The AdSense rep thought it was fine (awesome)... when they had phone invites a few months ago.

nomis5

10:30 pm on Mar 3, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Robert Charlton - can you expand on "... much better structured and conceived overall than the losers are". Structured as in code, navigation structure or what? "Conceived overall" ?

Jane_Doe

10:48 pm on Mar 3, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The AdSense rep thought it was fine (awesome)... when they had phone invites a few months ago.


The problem with the AdSense reps, though they are all well-intentioned and try to be helpful, is that they only look at the short-term effects of placing ad blocks in certain positions. The fact that understated, more obscure ads may attract more links, likes, bookmarks, and return visits isn't their concern.

Robert Charlton

11:12 pm on Mar 3, 2011 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Robert Charlton - can you expand on "... much better structured and conceived overall than the losers are". Structured as in code, navigation structure or what? "Conceived overall" ?

They generally have well-conceived hierarchical nav structures. Targeting and internal linking are prioritized.

There's purposeful coordination among articles, product pages, etc.

Page design as well as nav structure is intended to direct visitors to useful content.

Fribble

11:29 pm on Mar 3, 2011 (gmt 0)

10+ Year Member



< moved from another location >

Ok guys, I just spent a couple of hours checking out some of the sites listed in the Webmaster Central post here: [google.com...] and have come to share my observations. If you're gonna check out those sites, now is the time, because those people are desperate and will probably change things up - one was being modified while I was crawling it.

I looked mostly at on-site things as these seem to be what is triggering the shift. Here's what I observed, and I am an absolute amateur, so please feel free to correct or add to the list here:

The majority of sites listed suffer from:

1. Code bloat.
Tables, inline styles, WYSIWYG code, embedded stylesheets - you name it, I found it.

2. Lots of fluff and light-content pages.
Most of them have a significantly higher number of pages indexed in G than they actually have unique pages on their site. Most of the sites with sitemap XML files had more than double the listed number of pages indexed. Further investigation revealed these to be fluff pages, archive pages with dupe content, navigation pages, tag pages, etc...

3. A high template-to-content ratio on many of the sites.
That is to say, the template content that is ever-present in their headers, sidebars, and footers is many times greater than the unique content on the page.

4. Lots of ads.
About 60% of the sites I looked at had more than 9 ads on deep pages, from more than 4 different advertisers.

5. Tons of dupe content.
Both internal duplication and syndication. I did see a few that had gotten scraped, but not many.

6. Lots of links.
About 50% had tons of links on each page - like 60+.


I looked at Daniweb as well, and it appears that many of their pages contain code examples posted by their users. I don't know, but if I were an algorithm and these code snippets were in use elsewhere on the web, I might think this site is a scraper, or at least full of dupe content. They also have lots of useless tag-search and tag-cloud pages indexed.

If anyone else cares to share their analysis of both the sites that tanked and the sites that flourished, please do so.

[edited by: tedster at 12:06 am (utc) on Mar 4, 2011]

dickbaker

12:42 am on Mar 4, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Most of them have a significantly higher number of pages indexed in G than they actually have unique pages on their site.


I don't understand what you mean by that. Could you please explain a bit further? How could there be more pages indexed with Google than a site actually has?

Also, when you're talking template-to-content, are you talking code-to-content ratio, or something else?

martinibuster

12:51 am on Mar 4, 2011 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I think he means more pages indexed than they actually have that are unique.

Fribble

1:00 am on Mar 4, 2011 (gmt 0)

10+ Year Member



From the second example in the G thread:

Google Site: search reveals 3,970 indexed pages

The site's sitemap.xml file contains only 1,104 entries.

Now if you poke around the site: search results you will find a ton of paginated category navigation pages, empty forum category pages, empty localized pages using a subdomain, and a bunch of other stuff that is not helpful or unique (or in the sitemap). These are the 'extra' pages I am referring to.
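For anyone who wants to repeat that comparison on other sites, the sitemap side is easy to script. A minimal sketch, assuming a standard sitemap.xml at a hypothetical URL; the indexed-page count still has to be read off the site: search results by hand.

```python
# Minimal sketch: count <loc> entries in a standard sitemap.xml and compare
# against the number of pages a site: search reports (entered by hand below).
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP_URL = "https://www.example.com/sitemap.xml"  # hypothetical site
INDEXED_PAGES = 3970                                 # read manually from the site: search

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

with urllib.request.urlopen(SITEMAP_URL) as resp:
    tree = ET.parse(resp)

sitemap_urls = [loc.text for loc in tree.findall(".//sm:loc", NS)]
print(f"{len(sitemap_urls)} sitemap entries vs {INDEXED_PAGES} indexed "
      f"({INDEXED_PAGES - len(sitemap_urls)} 'extra' pages in the index)")
```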

Re: template-to-content. I didn't go so far as to count words; I just eyeballed the rendered page and made a note whenever a site had a cluttered template that forced the minimum page length to be greater than 2000px tall AND where most of the content I sampled didn't even span half of that.

The sites with templates that are bloated with code I grouped under my "code bloat" sites.
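That eyeball test can be roughed out in code, too. This sketch simply compares the visible text inside a page's main content element against the whole page; the URL and both selectors are placeholders, and plenty of real templates won't match them.

```python
# Rough proxy for "template-to-content": what share of the page's visible text
# sits inside the main content element? URL and selectors are placeholders.
import urllib.request
from bs4 import BeautifulSoup

html = urllib.request.urlopen("https://www.example.com/some-article").read()
soup = BeautifulSoup(html, "html.parser")

total_chars = len(soup.get_text(" ", strip=True))
main = soup.find("article") or soup.find("div", id="content")  # hypothetical selectors
content_chars = len(main.get_text(" ", strip=True)) if main else 0

print(f"{content_chars} of {total_chars} visible characters are unique content "
      f"({content_chars / max(total_chars, 1):.0%}); the rest is template.")
```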

Content_ed

1:51 am on Mar 4, 2011 (gmt 0)

10+ Year Member



@Swanson

I agree with you 100% on duplicate content. It seems they've simply lost the ability to differentiate, so they are taking trust away from the original source as well as the copies.

@Tedster. Singhal said two interesting things. One was the bit about low quality somewhere on the site hurting the whole site. Since Google's only way to judge good quality content from bad is external links (it would require nonexistent artificial intelligence otherwise), he may be talking about the coding or the site structure. That plays into the other thing he said, about Google wanting to promote sites a person would be willing to give their credit card info to. That could explain why Amazon, Sears and the like are big winners. Google may be awarding quality for secure shopping carts, things like that.

It would also help explain why so many old mom-n-pop sites got killed, including quite a few owners who just joined WebmasterWorld in recent days to look for answers. Those of us who've been using the same site design software for ten years had no reason to change as long as our visitors were happy. But the new algo may be basing quality on programming metrics that speak to the time and money invested in building the site infrastructure, as opposed to the content.

Lapizuli

2:36 am on Mar 4, 2011 (gmt 0)

10+ Year Member Top Contributors Of The Month



I have an article on HubPages that is down about 60% and it has about 50-70 words of introduction above the fold, depending on browser window size.

An eHow article that has gained traffic slightly has about the same number of words above the fold.

A Suite101 article that has lost about 30% of its traffic has about double the number of words above the fold. (100-120 words)

The text for all articles showing above the fold is classic intro text - drawing in the reader and announcing what is in the article.

My most popular personal blog article has about 50 words above the fold and it's the same thing, intro text. It has shown no change in traffic.

tedster

3:58 am on Mar 4, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Content_ed, I think Google has other ways to decide quality besides links - in fact, I think links would be a VERY noisy signal for measuring quality.

The exercise that Matt and Amit described [webmasterworld.com] was all about finding other signals that would correlate strongly with quality as perceived by a human user. And I'm sure they ended up with a complex stew, not one or two ingredients.

I think the structure of this Farm algorithm may be a programming tour de force. Once Google does some triage on the innocent bystanders, it will show its nature more clearly.

As far as I can see, we can at least appreciate that they've moved on quite far from the notions we developed based on just a little bit of evidence over the years. Most of what we think of as algo factors come from the historical data revelations of 2005 and the phrase-based-indexing technologies of 2006. Even what we learned about Caffeine didn't reveal algorithm details, just computational architecture.

Richie0x

10:45 am on Mar 4, 2011 (gmt 0)

10+ Year Member



2011 was all going so well until the algo change.

January traffic up 13.7% on January 2010

February traffic up 2.4% on February 2010

March traffic down 26.4% on March 2010

Thanks Google. Thanks a lot.

Content_ed

1:22 pm on Mar 4, 2011 (gmt 0)

10+ Year Member



@Tedster

I'll admit I'm old fashioned, I've been using the same HTML editor since 1995. But the thing is, this is the first time we've ever been hit by an algo change. So whether they've been rethinking quality since 2006 doesn't really matter. The question is what they changed since last Thursday.

From what Singhal and Cutts said, it sounds like they tried to change the algorithm to agree with human testers' impressions of quality, based on aesthetics and factors like brand recognition (whether or not they realize that). That's a different definition of quality than anything I've ever heard of for search results - it has nothing to do with actually finding what you're looking for.

Google could have found all sorts of signals that correlate highly with human quality perceptions and mean nothing, like the Super Bowl indicator for the stock market.

In terms of traditional search quality, i.e., whether or not people find what they are looking for, I can't wrap my mind around an algo where the bottom line doesn't rest on quality incoming links. It simply isn't credible that Google can use computers to determine whether something written in a language that computers don't understand, or displayed in graphics computers can barely see, is quality or not. Those links were always the human judgement.

Are they using toolbar or Analytics data to look at bounce rate or time on page? Those measures mean different things for different types of sites and are easily manipulated by SEOs. All it takes to lower bounce rate is something that looks like the answer is just a click away.

By relying on human perceptions of pages displayed out of context, they are clearly tipping the balance to clever marketing. It means they have strayed from trying to be engineers into trying to be psychologists. What was the Asimov book series with social psychologists being the ruling force behind the galaxy, Foundation?

masterchief

1:37 pm on Mar 4, 2011 (gmt 0)

10+ Year Member



@tedster, involving human testers to look for answers to "Would you be comfortable giving medicine prescribed by this site to your kids?" is, IMO, not the way you improve results... Does this mean that a site that looks more appealing, even if the info in it has been scraped (or is inaccurate), will get a higher ranking than the original?

econman

3:36 pm on Mar 4, 2011 (gmt 0)

10+ Year Member




1) Duplicate content (within a percentage uniqueness) is being demoted.

2) Thin content is being demoted.

3) Unique content is being promoted.

4) A different algo is being used to determine the attribution of the original content.


Item 3 is not consistent with the stated intention of the update, which is focused on demoting "low quality" content. Absent evidence that Google is attempting to detect and promote "high quality" it's probably best to assume the update is purely focused on scoring and demoting "low quality."

Given this assumption, if certain content moves higher in the SERPs, this is a secondary effect.

For instance, if eHow's pages have moved up, perhaps that's because those pages are above the minimum quality threshold (perhaps just barely), and/or perhaps eHow tends to compete against even lower quality pages/sites.

"In the land of the blind, the one-eyed man is king."

So, it's not a question of whether a particular page is of high or low quality. Rather, I would conjecture that it's a question of

a) whether or not a page is scored below the "low quality" threshold,

b) whether or not the page is located on a site containing a high proportion of pages that are scored below the "low quality" threshold,

c) whether the page mostly competes against pages that are scored below the "low quality" threshold or,

d) whether the page mostly competes against pages located on sites with numerous "low quality" pages.

If this reasoning is valid, then pages can move up even if they are not of high quality, due to the ripple effects of changing the ranking schema to take into account the direct and indirect impact of scoring certain pages/sites as having "low quality".

For instance, mediocre quality documents/sites can benefit just as much as high quality ones -- upward movements are more a function of the competitive environment, rather than where a particular page/site stands along the continuum from high to low quality.
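To put that conjecture in concrete terms, here's a toy re-ranking sketch: only "low quality" is scored and pushed down, and everything else rises passively. The scores, threshold, and URLs are all invented - this is just the shape of the argument, not Google's algorithm.

```python
# Toy illustration of the conjecture above: pages below a hypothetical
# "low quality" threshold are demoted, and everything else wins by default.
# Scores, threshold, and URLs are invented; this is not Google's algorithm.

LOW_QUALITY_THRESHOLD = 0.4  # hypothetical cutoff

def rerank(results):
    """results: (url, relevance, quality) tuples already ordered by relevance."""
    keep = [r for r in results if r[2] >= LOW_QUALITY_THRESHOLD]
    demote = [r for r in results if r[2] < LOW_QUALITY_THRESHOLD]
    return keep + demote  # mediocre pages rise simply because rivals fall

serp = [
    ("thin-farm.example/how-to", 0.95, 0.2),   # most relevant, scored low quality
    ("mediocre.example/article", 0.90, 0.5),   # nothing special, but above the bar
    ("solid.example/guide",      0.85, 0.9),
]
print([url for url, _, _ in rerank(serp)])
# -> mediocre and solid move up; the thin page drops to the bottom
```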

they tried to change the algorithm to agree with human testers' impressions of quality


Exactly. Undoubtedly they haven't fully succeeded at this point (plenty of false negatives and positives).

The key takeaway is that Google has started to focus on an entirely new goal. In the past, "relevance" was the only aspect of "search quality" they cared about. Not surprisingly, this created an incentive structure that led to millions of low quality websites, containing billions of low quality pages.

This is a fundamental paradigm shift, in which Google is for the first time balancing relevance and quality, thereby creating an economic incentive for Webmasters to stop focusing entirely on quantity, and to start paying more attention to quality.

That's a different definition of quality than anything I've ever heard of for search results - it has nothing to do with actually finding what you're looking for.


Finding what you are looking for is mostly a matter of "relevance." But, if the human is looking for a well structured page, attractively presented, with solid, detailed, reliable information, then relevance alone is not an adequate measure of what they are "looking for." If they are seeking a high quality page/site, they will be disappointed if most of the listings are "low quality." Google's trying for the first time to deal with that aspect of search quality.

This was explained by the Google employee quoted in the Feb 5 thread [webmasterworld.com...]

The central issue is that it's very difficult to make changes that sacrifice "on-topic-ness" for "good-ness" that don't make the results in general worse. You can expect some big changes here very shortly though.


In other words, this update is different from all prior updates, in that Google is attempting to score the "quality" of the document or site, as opposed to just worrying about "relevance" (how closely each document matches the searcher's intent).

tedster

5:21 pm on Mar 4, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



this update is different from all prior updates, in that Google is attempting to score the "quality" of the document or site, as opposed to just worrying about "relevance"

Exactly - well said. And that means looking for a simple fix based on what we know was already in use for relevance scoring probably won't cut it.

The information we have so far indicates they've got a new document "classifier" in place. And Google has shared a rough idea of what that classifier tries to measure. As this evolves, we want our pages to line up with the intent, and not just how it works this week.

sabrebIade

9:29 pm on Mar 4, 2011 (gmt 0)

10+ Year Member



Lots of well-written and well-backlinked Hubpages are tanking as well now.

econman

9:56 pm on Mar 4, 2011 (gmt 0)

10+ Year Member



Lots of well-written and well-backlinked Hubpages are tanking as well now.


Another thread mentions a very high quality, popular page hosted on a site that accepts content from numerous contributors (it doesn't name the host site, but it could be HubPages) that has dropped in the SERPs even though the quality of that particular page is not low.

I'm guessing this drop is because it is located on a site that has numerous other pages (or a high proportion of other pages) that have been scored as poor quality.

If so, it isn't clear why this ripple effect is taking place. I see two basic possibilities:

Perhaps all pages on a "low quality" site are being demoted -- some sort of site-wide ranking change which is adversely impacting your page.

Or, numerous low quality pages on the hosting site are now being devalued by Google when it does all of its internal link juice calculations. In other words, perhaps your page was previously benefiting from being located on a huge site, and now it no longer gains that benefit. This wouldn't have to be a huge shift -- even a small shift in the ranking algorithms could be sufficient to push your page down several notches in the SERPs.

AlyssaS

10:12 pm on Mar 4, 2011 (gmt 0)

10+ Year Member



Lots of well-written and well-backlinked Hubpages are tanking as well now.


Yes, but the top of the page is crammed with spammy-looking ads...

Have you ever read Malcolm Gladwell's "Blink"? Users make split-second decisions about whether to trust a page and continue to read, or to backspace and try something else.

Google is trying to put itself in the shoes of the searcher (on the reasonable grounds that the only way the searcher is pleased is if you give them what they want).

It's no good having the best written text in the world if the user won't even pause to read it because they've been put off by the ads on top.