|Could Penguin be looking at patterns in a whole new way?|
It's my understanding that Penguin and Panda are very sophisticated algorithms that have to be run periodically in conjunction with the main always-running algorithm, and they may require seriously massive computing power. This suggests to me that they must both be doing something that is beyond the main algo's capabilities, or something that can't be worked into the main algo - otherwise, why have them? Panda, for example, was trying to weed out content that wasn't exactly dupe, but was still of little if any value - that's a pretty ambitious goal for an algorithm.
Therefore, it seems to me we should assume Penguin is similarly sophisticated - either evaluating something the main algo can't, or evaluating something much better or more profoundly than the main algo does. While Google's stated intentions with Panda made it pretty clear Panda was doing things the main algo had never even tried to do, their statements about Penguin were simply that it targeted spam and SEO tactics - tactics the main algo has always tried to detect and penalize. So what is Penguin doing that the main algo can't do, or can't do as well? What's Penguin doing that's worth the cost of its development and deployment?
Could it be looking not so much at the actions we take with our sites (which the main algo already looks at), but at the patterns those actions create? What if Google has identified some patterns of behavior on certain queries that (they feel) reveal spam on a whole new level? Let's say that nearly everyone trying to rank for "coarse ground widget" has a link to Wikipedia and green H1 headers. Some of these sites don't stuff keywords, manipulate their backlinks, or do any of the traditional spammy stuff the main algo has always dealt with. But what if Google has noticed this pattern and decided it indicates ALL these sites are deep into SEO and may need to be dropped in rankings, depending on other factors? (I'm assuming that even if you trigger Penguin, it gets weighed against other variables and might not take effect if you have enough strong quality signals.)
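To make the "pattern" idea a bit more concrete, here's a rough sketch of one way such a signal could be measured: compare how common a feature is among sites targeting one query with how common it is across all sites (a "lift" ratio). This is purely a hypothetical illustration. The feature names, the toy data, and the `feature_lift` function are all made up, and nothing here is known to be how Penguin actually works.

```python
def prevalence(sites, feature):
    """Fraction of sites (each a set of feature labels) that have the feature."""
    return sum(1 for site in sites if feature in site) / len(sites)


def feature_lift(target_sites, all_sites, feature):
    """Hypothetical lift: prevalence among sites targeting one query divided by
    prevalence across all sites. Values well above 1 mean the feature is
    unusually concentrated on that query. Returns 0.0 if the feature never
    appears at all (target_sites is assumed to be a subset of all_sites)."""
    base = prevalence(all_sites, feature)
    if base == 0:
        return 0.0
    return prevalence(target_sites, feature) / base


# Made-up example: three sites chasing "coarse ground widget", seven others.
target = [{"wiki_link", "green_h1"}, {"wiki_link", "green_h1"}, {"wiki_link"}]
others = [{"blue_h1"}] * 7
print(round(feature_lift(target, target + others, "wiki_link"), 2))  # 3.33
```

A high lift on its own wouldn't prove anything about spam, of course; the point is only that this kind of cross-site pattern is invisible when you look at a single site, but easy to see if you have everyone's data.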
What do you think? Too far-fetched, or is it possible? And if Penguin's not doing something like this, what do you think it's doing to justify the cost of development and deployment? (I really want to know, even if I'm wrong!)
|I'm assuming that even if you trigger Penguin, it gets weighed against other variables and might not take effect if you have enough strong quality signals). |
I have doubts about that assumption. Keep in mind that Penguin appears to include backlink devaluation as one of its effects, even if it does nothing else. That's apparently why some pages on many penguinized sites only fall slightly in the rankings, or can even hold onto a number 1 ranking for their main keyword in some cases. One result is that some penguinized sites aren't hurt nearly as much as others. For example, both of my penguinized sites still get far more traffic from Google than from Bing and Yahoo combined. But in my opinion the backlink devaluation weakens all the pages on a penguinized site at least slightly, even those that hold on to a number 1 ranking for their main keyword.
If Google is using machine learning behind the algorithms, then YES, the algorithm is finding weird correlations like you suggest and penalizing for them.
When Bayesian filtering for spam first became popular, I recall a similar example. The Bayesian filter picked out "FF0000" as a particularly spammy word. If somebody sent an HTML-formatted email with text styled red (with FF0000), it was almost certainly spam.
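For anyone who hasn't seen how that works, here's a minimal, simplified sketch of naive Bayes token scoring, loosely in the style of Paul Graham's "A Plan for Spam" (equal priors, independence assumption). The tiny training corpus, the 0.01/0.99 clamp, and the 0.4 default for unseen tokens are assumptions chosen for illustration, not the settings of any particular filter.

```python
import math
from collections import Counter


def token_counts(messages):
    """Count how many messages each (lowercased) token appears in."""
    counts = Counter()
    for msg in messages:
        counts.update(set(msg.lower().split()))
    return counts


def token_spam_probs(spam, ham):
    """Estimate P(spam | token) from relative frequencies, assuming equal priors."""
    spam_counts, ham_counts = token_counts(spam), token_counts(ham)
    probs = {}
    for tok in set(spam_counts) | set(ham_counts):
        s = spam_counts[tok] / len(spam)
        h = ham_counts[tok] / len(ham)
        # Clamp so a token seen in only one corpus can't force exactly 0 or 1
        probs[tok] = min(max(s / (s + h), 0.01), 0.99)
    return probs


def message_spam_prob(message, probs, unknown=0.4):
    """Combine token probabilities naively: prod(p) / (prod(p) + prod(1 - p)),
    computed in log space to avoid underflow on long messages."""
    tokens = set(message.lower().split())
    log_spam = sum(math.log(probs.get(t, unknown)) for t in tokens)
    log_ham = sum(math.log(1 - probs.get(t, unknown)) for t in tokens)
    return 1 / (1 + math.exp(log_ham - log_spam))


# Toy training data (made up): "FF0000" shows up in every spam, no ham.
spam = ["buy now FF0000 cheap", "FF0000 free offer now", "cheap FF0000 pills"]
ham = ["meeting notes attached", "lunch tomorrow at noon", "notes from the meeting now"]
probs = token_spam_probs(spam, ham)
print(round(message_spam_prob("FF0000 sale", probs), 3))  # 0.985
```

Nobody told the filter that red text means spam; it just noticed that the token almost never appeared in legitimate mail, which is exactly the kind of "weird correlation" being described.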
Google is discerning intent at the query level and delivering results accordingly. Their click revenue was up 44% post Panda/Penguin, and this in a near-dismal economy :0 doh! Oh, and don't forget that Google Shopping was being rolled out at the same time, so at some point you have to correlate the obvious.
If you are a threat to their revenue you will not be in the SERPS.
aristotle, fair point.
|When Bayesian filtering for spam first became popular, I recall a similar example. The Bayesian filter picked out "FF0000" as a particularly spammy word. If somebody sent an HTML-formatted email with text styled red (with FF0000), it was almost certainly spam. |
Yes. And I'm thinking it could be even more subtle than that - possibly stuff none of us can even detect just by looking at our own sites. After all, you may not realize other webmasters are doing the same thing you are - but Google could.
|If you are a threat to their revenue you will not be in the SERPS. |
I want to understand what you're saying here better. Are you suggesting that Google may be targeting patterns of sites that constitute a threat to their revenue?
|If you are a threat to their revenue you will not be in the SERPS. |
What kind of websites would be considered a threat to their revenue?
Just found something interesting today. Before Penguin, one of my sites had the #1 page for a phrase combining a specific word with a broad, popular keyword. Let's say "widget tutorials" (neither of these are the actual word). Then Penguin hit, and I fell a few notches, then a few more, down to page 3.
The page later got virtually de-indexed. It doesn't show up for "widget tutorials" anymore at all, not even in the omitted pages. I think this may be because for a while I had a link to a good page on a site I now recognize as heavily targeting a profitable keyphrase (d'oh!). I've removed that link, so hopefully that will get me back to at least page 3, but what's happening right now is really odd.
Today I got more creative in my searches for it and tried:
-- "my domain widget"
It came up #1.
-- "my domain widget tutorials"
-- "my domain widget tutorial"
Nada - disappeared from the index on all three of those.
-- "my domain tutorials"
The page in question ranks #2, behind one of my other popular "tutorial" pages.
So it's something like this:
--Page ranks nowhere, not even in the "omitted" sites for "widget" or "tutorial", but add "my domain" to either and suddenly I'm #1 or #2.
--But put all 4 words together, and I fall out of the index again.
Worth noting: the word "widget" doesn't appear on my other pages, but "tutorials" sure does. All the other "tutorial" pages are ranking just fine.
So what word or words are the issue here? Have any of you seen anything like this before? Any thoughts?
Great question, Diberry, but if Penguin is doing anything along the lines you suggest, it is probably trying to collate related pages from a site so that it can rank one domain 20 times in the top 25. This reduces the spam people find by pushing spots #2-10 off page 1, where most clicks happen.
Lately when I have to add a -youtube.com to a search I'm finding a different site has 30 pages indexed, and another, and another. It takes computing power to create all these groups!
Thank you for your "signal".
Just tried this with a particular page on my site (good traffic before Penguin)... it's the same!
"my domain widget"
result -> #1
"my domain widget tutorials"
"my domain widget tutorial"
result -> this page is "nowhere"
Our Penguin-affected site is very tightly focused, and until April 24th it led its specific niche in rankings for just about every informational query on the topic. (It's a consumer buying guides site with expert-level content; we sold the products face to face for 15 years, though they are rarely sold online.)
Since April, almost every page of our site has sat at the "end of results", aka the -950 penalty. (Some kind of change does seem to have been in the wind since Friday and through the weekend, though: no ranking changes, just placement changes.)
In our case, we feel it's got to be more than just links or too much seo. The thing we see over and over is that the first page of serps is now filled with a few specific big-brand sites. Almost as if Google feels they have found the best resource for informational queries about these "the-widget", and all else is a distraction.
We're obviously missing something on our end. Google has said the site in question is not manually penalized, but the same pages which are at end of results in Google.com, are in the top 10 at Foxstart. (which uses a Google feed, obviously pre-filtered) We have a few 1500+ word buying guides for 2012 that don't even rank for on-page content, yet our Facebook page (with only the title of the page) ranks in top 3.
We have thrown everything at this site to try to help... It's the only site we manage that was hit on April 24th 2012, and unfortunately, it was also the site we thought we were doing everything right on. It truly makes you question your efforts to follow all the guidelines. Also unfortunate is that we have simply abandoned it at this point and will not touch it again unless it shows signs of life. All our effort has done is cost us traffic from other engines while we tried to please Google. We give...
@Splugged, now that is VERY interesting, that we're both seeing exactly the same thing. I wonder what else we should compare notes on? How about the niches? My "widget tutorials" is a modestly popular search term. When my page was #1 for it, I got 2000 visitors a day for that term, which is not bad but certainly not a hot, competitive phrase. There are only a couple of MFA sites targeting the phrase, and they only rose to visibility after Penguin. How does your "widget tutorials" niche look?
@mhansen, I'm in the same boat. I manage several sites, and only one got Penguinized. I carefully followed the Google webmaster guidelines as I understood them. I feel we're missing something, too. It's got to be something more subtle than the usual signs of spam, or else it's more than just spam in the way we're used to thinking of it. I keep thinking, many "SEO" tactics overlap with good marketing tactics - for example, repeating key phrases as much as you can without sounding unnatural can be a good sales or presentation tactic. I can see Google concluding this is SEO when the webmaster actually was just following good marketing rules.
As for the handful of big brands, it's the same in some of the queries I fell the hardest on. Now, if we do assume Penguin is about spam/SEO, then what are the brands doing differently from the rest of us? I had a meeting with a web development company the other day, and their main approach to SEO is link building, and they do it mainly with press releases. Dupe content with inbound links. What triggers Google to give some sites a pass on that behavior? Probably the authority of the site publishing the press release... which is high because it's associated with an OFFLINE brand (or newspaper, or whatever). If so, we may not be able to beat that without starting our own publishing empires or something, LOL.
But I do believe there's always a way to work around these things. It looks like Penguin is a total game-changer, so it's vital we figure out what it's about, what it's looking for, and what we can do to make our sites viable in the post-Penguin era.
Okay, my pattern changed today. I'm now #1 for any variation on "my domain widget tutorials". But I'm still deindexed for "widget tutorials" itself. Remember that I'm making steady improvement to the site - it's likely that getting rid of that link I realized belatedly was spammy has helped.
I may be looking at this wrong, but I'm thinking it's like this:
--Still being de-indexed for "widget tutorials" means Google's not ready to trust that page again yet, for whatever reason.
--Now being #1 for "my domain widget tutorials", when I was totally de-indexed before, could suggest my domain itself is gaining strength with Google.
So, maybe (especially if we just had a Penguin update, as some have speculated) Google upgraded my Penguin score a little for the overall domain, but is still holding certain pages down.
I just found another page I can use as sort of a control query against "widget tutorials." Let's call the query for this page "green dog tutorials". This page used to be #2-3, but now it's in the 600s for both "green dog tutorials" and "green dog." Add the words from my non-EMD domain name in there, and it rises to the top.
It would appear that it's my "tutorials" pages which are suffering most. But why? It's not about quality of content - if it was, I'd have been hit by Panda. I got hit by Penguin, which means Google thinks I'm spamming. Do they think I'm over-optimized for "tutorials"? If so, how? I have very few inbound links that use that word in the anchor text, so it has to be on-page... but I don't use that word much (I have pages where I use it more that are still ranking well). Besides, as was my whole point with this thread, I think Penguin has to be looking at something more subtle than keyword stuffing, since the old algo could deal with that.
I just don't know. I do wonder if it's the whole "tutorials" niche being rearranged because of some broad pattern amongst hundreds of sites - in which case I'll never figure out what it is because I don't have access to all the data Google has.
We've recovered sites from Panda but Penguin is a nasty piece of work!
diberry, we're seeing similar results to you. It doesn't make a lot of sense.
What we are finding in particular is that if we search for our main term, say "green widgets", a page from our site comes up on page 4, but it's not our home page. If we then search for that page itself, it doesn't come up at all.
Penguin is certainly not just about links, in fact from what we can see links aren't a major part of it.
Tomorrow we're about to run different experiments on two Penguin-hit sites and see what happens. We're confident both experiments will pull the sites out of Penguin, but we think one of them may crash and burn again. I don't want to say on an open forum what we are doing (I've not seen anyone discuss this method on any forums or blogs) but I'll certainly report back if the results stick.
We've conquered Panda so now we're going after the Penguin.
Out of interest, how did you conquer Panda?
|I don't want to say on an open forum what we are doing (I've not seen anyone discuss this method on any forums or blogs) but I'll certainly report back if the results stick. |
I'm looking forward to that.
I've been analyzing my rankings. In all but a very few cases, Bing ranks me very well on all the phrases Google ranked me well for before Penguin, while Google has since dropped me anywhere from 6 to 100s of positions. I'm starting with any page I can find that both engines rank poorly. So far, those pages have had traits I can see the engines considering spammy (even though spamming was not my intent), so I've fixed them. I recommend this approach to anyone who feels they were a "false positive" for Penguin.
But is this getting at my Penguin problems at all? I'm not sure how much of my trouble is Penguin and how much may be other parts of the algo.