Forum Moderators: Robert Charlton & goodroi

Message Too Old, No Replies

What EXACTLY is the Penguin Algorithm?

         

martinibuster

4:02 am on Mar 17, 2016 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Read an article last month that asked a dozen "Internet Experts" what they thought Penguin was. Many of the responses were clearly about on-page Panda issues.

Funny thing. Nobody discusses what the algorithmic foundations of Penguin are. Have you noticed? Nobody says it's link analysis and points to a patent. In fact, speculation about what the Penguin algorithm actually is has been totally missing. So please, throw your two cents into this discussion. Three if you have it.

I have my ideas about what Penguin is. But I'm interested in yours.

(Note: Facts and speculation only. Jokes and complaints are Off Topic)


[edited by: Robert_Charlton at 7:51 pm (utc) on Mar 17, 2016]
[edit reason] Moved description line to body of post. [/edit]

Andy Langton

3:38 pm on Mar 24, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Just to touch on something else that has been mentioned regarding the recipient of "bad" links.

In the UK, many major retailers use pay-per-post, sponsored blog posts, advertorials, free products for links, etc. Such links can, as far as I'm concerned, definitely be involved in Penguin penalties. However, major retailers do not suffer negative effects (apart from the occasional high-profile manual penalty). To the contrary, you can readily demonstrate that such links are contributing to rankings. Such links are used with wild abandon to create backlinks to product categories and similar pages that people just don't link to naturally in any numbers.

It may well be that these links are tricky for Google to diagnose correctly - their recent 'warning' press release about free products for links heavily implies that they are not handling this particularly well algorithmically. But what is protecting the big sites? A few possibilities:

  • Their SEO companies are 'just' smart enough to steer clear of Penguin. While I'm sceptical, techniques have certainly improved (no more link networks, varied anchor text, etc.)
  • The strength of links to such sites means that any link is good. Above a certain 'authority' you can do what you like. There might be something in this, but I'm not convinced - I think the idea needs to be refined.
  • It's about relevance - if a site is already relevant (e.g. is already chosen as a ranking contender for the keyword at the indexing stage) then it gets a free pass. What counteracts this is people who lost rankings post-Penguin. The suggestion would have to be that they were totally reliant on bad links in order to rank at all, which does not fit many cases.
  • Some combination of factors?


So, if you accept my claim (which I believe is supported by data) that large sites can acquire links that would hurt smaller sites, and these links help them, what's the explanation?

Jez123

5:09 pm on Mar 24, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



So, if you accept my claim (which I believe is supported by data) that large sites can acquire links that would hurt smaller sites, and these links help them, what's the explanation?


I would completely agree with that.

If I get time I will share my own Penguin experiences and add something that I have only just come to realise - pure speculation of course but makes sense to me at least.

Spiekerooger

6:29 am on Mar 25, 2016 (gmt 0)

10+ Year Member Top Contributors Of The Month



Small poll:

Do you think that Google uses machine learning in Penguin?

a) No.

b) Could imagine it.

c) Yes, for sure.

I'm asking this as I'm a little bit confused after the Q&A yesterday w/ Andrey Lipattsev.

FranticFish

7:35 am on Mar 25, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



what's the explanation?

Reminds me of that old quote (was it Sugarrae's?) "Google doesn't want to make sites popular, Google wants to rank popular sites."

@Ebuzz - can you share more details about what you experienced?
What sort of niche are you in?
Are your commercial keywords varied (apples, walnuts, sausages) or closely related (apples, pears, oranges)? What about your non-commercial keywords?
And what about the relationship between the two sets of keywords?
What sort of ratio does the site have for commercial / non-commercial content?
How do you monetise your commercial content? And do commercial pages follow a different format for content / nav / OBLs etc from non-commercial?
Have you analysed your link profile to see if you can discover what it might be about it that was re-assessed? For instance..
- analyse anchor text and then look for patterns with pages hit
- analyse topic of linking websites and look for patterns with pages hit
- analyse ratio of commercial/non-commercial content on linking sites
- analyse linking pages and place in commercial/non-commercial buckets
- analyse IBLs of linking websites and look for patterns with pages hit
- analyse linking websites with Spiekerooger's 'quality indicator' list
... and then of course run the same sort of analysis on the pages and sites that replaced yours when they fell.
I'm sorry your site was hit. But one positive from the experience could be that you could gain some real insight into WHY.
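As an aside, the first couple of those analysis steps - looking for anchor-text patterns among the pages that were hit - can be sketched in a few lines. This is purely illustrative; the data and the `anchor_profile` helper are invented, not any real tool's API:

```python
from collections import Counter

def anchor_profile(backlinks):
    """Group backlinks by target page and summarise anchor-text diversity.

    backlinks: list of (anchor_text, target_page) tuples.
    Returns {page: (total_links, share_of_most_common_anchor)}.
    A high share of one repeated anchor is the kind of pattern to
    compare against the pages that were hit.
    """
    by_page = {}
    for anchor, page in backlinks:
        by_page.setdefault(page, []).append(anchor.lower().strip())
    profile = {}
    for page, anchors in by_page.items():
        top_count = Counter(anchors).most_common(1)[0][1]
        profile[page] = (len(anchors), top_count / len(anchors))
    return profile

# Invented example: one page with heavily repeated commercial anchors,
# one with varied, natural-looking anchors.
links = [
    ("cheap blue widgets", "/widgets"), ("cheap blue widgets", "/widgets"),
    ("cheap blue widgets", "/widgets"), ("widgets", "/widgets"),
    ("this article", "/guide"), ("great read", "/guide"), ("example.com", "/guide"),
]
profile = anchor_profile(links)
```

Running the same profile over the sites that replaced yours gives you the comparison baseline.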

Wilburforce

12:23 pm on Mar 25, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



@Spiekerooger

I would say yes, probably: I wouldn't go as far as yes, for sure, but I think it likely, which is a bit stronger than "could imagine it".

What was said in the Q&A?

Spiekerooger

1:16 pm on Mar 25, 2016 (gmt 0)

10+ Year Member Top Contributors Of The Month



@Wilburforce:

Sounds like the 80-85% "yes" I would choose as well.

Regarding the Q&A, what I heard is yesterday's news already (where Andrey Lipattsev said that Google is exploring machine learning for webspam, which sounded like "plans to use" rather than "uses"). And I would deem Penguin part of Google's webspam-fighting efforts, which is what got me confused.

@RustyBrick has a correction out today from another Googler Murat Yatagan here: [seroundtable.com...]

martinibuster

4:45 pm on Mar 25, 2016 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



"...Google wants to rank popular sites."


That PageRank was a measure of popularity was both a breakthrough and a flaw. Popularity and relevance are not the same thing. That's why subsequent algorithms and additions were created to fix the popularity bias that was inherent in the original PageRank algorithm. The focus turned away from web popularity and turned toward understanding user intent and generating SERPs that matched that.

Thus today in 2016, the aphorism that "Google wants to rank popular sites..." is irrelevant and no longer true. Google has changed. Your aphorisms and SEO best practices should also change [webmasterworld.com] to keep up with the Google of today.

FranticFish

5:00 pm on Mar 25, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



{off-topic}
Has Google really changed? I personally think, if anything, they've gone for more of the same - or at least, that's what the results of their efforts look like to me. Less relevant pages from more popular sites outranking niche sites and host crowding are two examples of preferring an answer (or multiple answers) from a bigger site over an answer from a smaller site. I don't study many SERPs but my overall impression as a user is that there's less diversity in the results than there was.

{on-topic}
The quote was pure speculation about Andy's observations that the same links might help or hinder a site depending on its size and footprint. I didn't intend 'popularity' to mean 'PageRank' - or any other single metric really.

martinibuster

5:28 pm on Mar 25, 2016 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Has Google really changed?


No offense intended but I'm of the opinion that if you have to ask the question (in a discussion about the Penguin Algorithm no less), then the search engines have already passed you by.

FranticFish

7:21 pm on Mar 25, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm of the opinion that you're being rather condescending. It was you that encouraged speculation in this thread, was it not? So I speculated that the reason for the same links having different outcomes depending on site might be popularity. You quoted me to imply a meaning that was not intended there.

Next I asked a rhetorical question, which I qualified: whether it's PageRank or something else, Google still seems to me to be looking for popularity. That makes perfect sense. They can't be expected to know the veracity of information, or the quality of service offered. But they can measure who is citing a piece of information, or an entity. If they then look at who is doing the citing they can work out how much they want to trust that citation. It might be a lot more sophisticated than PageRank, but it's still popularity. Niche popularity, academic popularity, expert popularity. So, my point was: what has really changed? The method? Of course. But that wasn't my point.

Shepherd

11:14 pm on Mar 25, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If you're a hammer everything looks like a nail, if you're a link builder...

martinibuster

11:34 pm on Mar 25, 2016 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Google IS measuring the quality of service. Research the phrase Sentiment Analysis. Google IS determining the veracity of information as well. Research the phrase Deep Learning.

No offense, but it's quite possible the search engines are beyond what you think they are doing. I am not being condescending but sharing what I know for your edification. I don't have to share; many do not and hoard their findings. So rather than chastise me for trying to shed some light, perhaps you should thank me for trying to bring folks up to date, because that is what I'm trying to do.

[edited by: martinibuster at 12:25 am (utc) on Mar 26, 2016]

Wilburforce

11:35 pm on Mar 25, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



@Spiekerooger

Thanks for the link. That probably puts it back into the 80-85% range.

A few years ago Google was buying up AI companies, and DeepMind's recent Go victory clearly demonstrates that they haven't been idle with it, so I would be astonished if they hadn't at least run some trials on search.

That doesn't connect it specifically with Penguin, or - looking at it from the other direction - qualify what they are doing in search as AI, but - loosely speaking - if it looks, smells and behaves like a pig...

Whitey

12:25 am on Mar 26, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Maybe I'm being simple-minded, but I see this as a concept that we can all acknowledge.

Penguin is associated with link building. Google targets sites that would not be popular without their links. Links from poor-quality sites, links with monetisation terms - anything that is human manipulation is fair game for Google to identify.

The methods of identification would seem to me to be quite easy.

Google has far more effective ways of identifying good signals that equal quality. The biggest of these, in my opinion, are branding signals that counteract the need for aggressive link building. So if you advertise on TV [even with diminished audiences] or have digital video media and encourage your audience to put a keyword search with your brand name in it, this is the type of signal that Google will pick up in its "probability score" of a domain being popular. Google Suggest has been around for a long time - making link building less important and brand presence more important.

If anyone asked themselves the question "would my site rank without unnatural links pointing to it?", and the answer is "no", then "what is Penguin" becomes less important for where you put your SEO emphasis.

Link building is even less relevant with mobile. With over 50% of searches (and climbing) on mobile, and 80% less screen real estate, the game has been accelerating to take in mobile interactions for quite some time.

Now if you receive an exceptional link, from an exceptional site, on an exceptional subject that nobody else talks about, that is found on your site, that might be the exception. And equally if you are a truly exceptional site, that everybody talks about, then links are irrelevant.

The new dawn has been with us for 5-6 years now. Mobile, social, instant, visual, minimal content, small screen sizes. There's little room for spam; it's fixed, which is probably why Matt Cutts chose his time to step aside.

You are not going to rank big time with links. Penguin.

You are not going to rank with non-exceptional content. Panda.

People send signals now. Not links.

BTW - I do not want to devalue the exceptional inputs here. Good reading and very thoughtful comments, IMO.

martinibuster

4:06 am on Mar 26, 2016 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Some good ideas Whitey, really good. :)

But I want to clarify one part. I love the rest of what you wrote, but this part I want to add to. You have some good ideas about popularity, and they are close to one of the most important things Google is doing, which is measuring users to see what sites (and also what kinds of sites) produce user satisfaction. It's user satisfaction that Google is measuring, not popularity. A ranked site might be popular, but the metric Google uses to determine the kinds of sites it should rank is user satisfaction.

If a site satisfies a user, is it because the site is popular or is it because that's what the user was looking for?

If a site satisfies a user and that user tells ten other users who seek out the site, is the site useful or popular? It's useful first and is popular because it's useful.

tangor

5:41 am on Mar 26, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



At some point the fine scale that G is seeking will backfire, OR, worse, lose credibility, as satisfaction is something people change as often as their underwear... even in these humongous numbers and data points. Trying to predict user satisfaction (i.e. what to display) will most likely hit a brick wall in the future, as what works today will not work tomorrow. Having said that, there is a sense of relevance attached to all this fine-fiddling and I'm all for that. I think we all are, because we create our sites (usually) to provide satisfaction and gain some loyalty (and, as a perk, perhaps income).

FranticFish

10:20 am on Mar 26, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



@martinibuster
is the site useful or popular? It's useful first and is popular because it's useful

Yes. Perhaps the best word for this would be 'trusted' then? 'Popular' can be misconstrued, and I don't think 'relevant' is a great word either - as far as measuring usefulness goes. By that I mean that for most terms it's no good to just be relevant, you have to be SEEN to be relevant to people (citations, user data).

Thanks for the research tips, I will get reading. Do you think that sentiment analysis is in play already with Penguin (I see it's available as part of the Prediction API)? As for 'Deep Learning', did you see this? [bbc.co.uk...]

Wilburforce

12:59 pm on Mar 26, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Trying to predict user satisfaction (i.e. what to display) will most likely hit a brick wall in the future, as what works today will not work tomorrow.


This applies to Google's competitors too: however difficult it gets they only need to satisfy users better than everyone else. So far, they seem to be more determined and more resourceful than the rest, but nothing lasts forever.

martinibuster

2:09 pm on Mar 26, 2016 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



FranticFish, yes, trust is very important. The Quality Rating Guidelines are concerned with sites being trusted, in many senses of the word, and I think in one sense it could be said that, by result if not by design, Penguin and Panda create a set of sites that can be relied upon to be in the SERPs.

And I agree, it's not good enough to just be relevant; one must be seen, and that's something I have tangible experience with. I enjoy several hobbies and continually find that the SERPs tend to show the same sites over and over. Yet if I follow links from the blogrolls I am almost certain to be delighted in discovering other sites and blogs that are just as good but don't make their way to the top of the SERPs, most likely because there is no promotion behind those sites. If you sit back and wait, the big things really aren't going to happen.

Whitey

11:58 pm on Mar 26, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If a site satisfies a user, is it because the site is popular or is it because that's what the user was looking for?

If a site satisfies a user and that user tells ten other users who seek out the site, is the site useful or popular? It's useful first and is popular because it's useful.

A site has to be found first. Which is why Google is hellbent on owning as much of the content layer as it can, so that you have to go to Google to find what you want as part of the brand experience.

With mobile devices reducing the size of the "real estate", Google's battle for supremacy has gotten more difficult - even with that content layer. And if it's difficult for their real estate to be found, then it's even more difficult for websites to be found in G's SERPs.

If there's nothing to rank for, then it comes back to brand recognition, where finding what you want on that brand search was/is easier via Google. Google is the home page of all good sites. Link building in e-commerce doesn't fit into that paradigm so much these days for finding a site. And link building was rarely a brand-building exercise anyway.

One big thing that occurs now with mobility is that people are not locked to their PCs and laptops like they used to be. People are impulsive, putting a query into their phone while walking and talking or watching something else. And to address that market, your technology needs to cater for that.

Link building is largely irrelevant to that need. And the returns for link building are a lot less than they were. Nobody's interested in participating.

The reason there has been no Penguin update, I think, is that it's not in the top priority of things in a fast-moving world of Google. For that matter, I wonder how relevant SERPs for e-commerce and publishing will be down the line.

So, to keep this post on topic, I'd say it's academic to debate "what is Penguin". Rather, "what was Penguin" and "has it lost its significance?" Same with Panda.

EditorialGuy

3:04 pm on Mar 27, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



One big thing that occurs now with mobility is that people are not locked to their PCs and laptops like they used to be. People are impulsive, putting a query into their phone while walking and talking or watching something else. And to address that market, your technology needs to cater for that.

Or maybe that's the wrong market to address, unless you're catering to local users in real time.

iamlost

6:37 pm on Mar 27, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Some points to keep in mind:
Generally for any algorithm:
* not every potentially appropriate research paper process is used.
---nor used as described.
* not every potentially appropriate patent (in-house) is used.
---nor used as described.
* not every potentially appropriate patent (elsewhere) is licensed...
---nor used as described.
* many things used, including those quite consequential, are not patentable.

* most site behaviours/qualities that search engines promote are NOT directly discernible.
---human quality raters are primarily confirming for algo testers those indirect machine usable inputs that best rate un/desired behaviours.


I believe that some in this thread are spreading their thought net much too wide, including things that while of importance are unlikely to be part of Penguin. I hold to my prior post view that Penguin is to links as Panda is to content.

Because search quality machine learning relies on indirect inputs for discovery, a critical component is comparison; comparison at (at least) two levels:
* against seed sites chosen as exemplary examples.
* against the body of sites within a site's groupings.
Which raises the question of what is being compared?

I believe it is the shapes of its link graph.
Just what inputs are associated with each node and edge are known unknowns. Given the time between released results, I suspect they are more inclusive and therefore complex than exclusive and simple. And so I consider all presumed link-value associations as possible, simply varying in degree of probability (self-assigned, there being nothing official).
Note: you are free to disagree. Given the current paucity of knowledge we are each in a position of arguing from ignorance.
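To make the comparison idea concrete, here is one way a seed comparison could look. This is pure speculation rendered as code; the feature vectors and the `nearest_seed_similarity` helper are invented for illustration, not anything Google has described:

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def nearest_seed_similarity(site_vec, seed_vecs):
    """How closely a site's link-graph features resemble the most
    similar exemplary seed site."""
    return max(cosine(site_vec, s) for s in seed_vecs)

# Invented features: (inlink count, distinct linking domains, anchor diversity)
seeds = [(120.0, 90.0, 0.8), (300.0, 250.0, 0.9)]  # exemplary sites
natural = (100.0, 80.0, 0.75)  # proportions resemble the seeds
spammy = (500.0, 12.0, 0.1)    # many links, few domains, one anchor
```

A site whose proportions diverge from every seed scores lower, whatever its raw link counts - which is the point of comparing shapes rather than sizes.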

Not that I care particularly outside of intellectual curiosity as my sites have not been negatively affected by Google anything anytime (knock my wooden head) and I'm not in the recovery business. However, I do have an abiding interest in Information Science both from the theoretical and the practical, including an ongoing (5+ years) value rating of non-search traffic referring back links primarily to optimise contextual content delivery.


Whitey's post above is quite thought provoking. I had to 'like' it on that account alone. Whether or to what degree I'll be in agreement after things settle and percolate and... is yet to be determined.


Note: I would like to make one clarification regarding part of my prior post:

I tend to believe that Penguin is targeting links (their component parts, velocity, neighbourhoods, position relative to niche/vertical graph, etc.) and what flows from those links.

martinibuster wrote:

I don't believe traditional statistical analysis of anchor text percentages, of link velocity, and other similar old school statistical analysis has anything to do with Penguin. Have a great weekend!

First: I also do not believe that many/any of the usual link analysis stats are being considered. Such are quite simply artifacts of what publicly available tools have to offer. However, there are link values that most/all tools do not consider that I do, not to say that Google generally, Penguin specifically do, although... why not? :)

Second, the clarification: I define 'link velocity' differently than convention and forgot that when I included it in the post. Rather than the rate of growth of back links, I mean the rate of growth of traffic through a given link. Apologies. Not that my version is necessarily any more likely than the conventional one to be a P input.

Whitey

7:01 am on Mar 28, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The reason there has been no Penguin update, I think, is that it's not in the top priority of things in a fast-moving world of Google.

I'd better moderate that above comment. It was my speculation, and I still hold to it, but it's worth checking out the following article and interview:

Recently Stone Temple's CEO Eric Enge interviewed Gary Illyes on:

What Is the Google Penguin Update?

When Is the Next Penguin Update?

How Real Time Will Real Time Penguin Be?

Article: [stonetemple.com...] Transcript: [stonetemple.com...]

It's something good to consume in the context of all the above points and discussions. Enjoy.

Whitey

9:00 am on Mar 29, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Top Two Ranking Factors
This was said by Andrey Lipattsev, a Search Quality Senior Strategist at Google, yesterday in a Google hangout :

Ammon Johns asked the question: "Would it be beneficial for us to know what the first two are (first two ranking signals are). Would webmasters build better sites..."

Andrey Lipattsev said:

"I can tell you what they are. It is content and links pointing to your site."

Ammon asked: "In that order or the other order?"

Andrey replied: "There is no order."
[seroundtable.com...]

Seriously? For brand / e-commerce sites ? What of Penguin then?

martinibuster

11:56 am on Mar 29, 2016 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Eric Enge interviewed Gary Illyes


That's a useful interview to read, even though there isn't anything particularly substantial that a competent SEO would not already know.

Whitey

2:17 am on Mar 30, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



What of Penguin then?

@martinibuster - I wondered if you had any thoughts on that, since you seem to be close to the question of the relevance of linking post-Penguin.

My sentiment, as I stated earlier, was that links are not going to rank a site big time anymore, especially in e-commerce. This I believe because, in my observation, there are bigger ranking signals than links [probability score / Google Suggest on the back of brand-related keywords], and to rank an e-commerce site using links for commercial purposes would be self-destructive [i.e. Penguin], because you would need a lot of them.

So I am at odds with the inside knowledge of this Google employee who is encouraging link building, which I think conflicts with Penguin in popular e-commerce/publishing verticals. Thoughts?

JS_Harris

6:44 am on Mar 30, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



BEFORE Penguin: My site ranked for both informational and commercial queries alike
AFTER Penguin: My informational site only returns for non-commercial keyword/phrases

I think Penguin is an usher at the front of Googleworld. If you want to buy something he sends you to Aisle 1, but if you're not buying he directs you towards the library. There are products that informational sites can't rank well for, and information that big brands can't rank for. I know this was not the declared goal of Penguin - in fact it was the declared goal of another algorithm - but wherever I diagnose Penguin issues I spot content that is on the wrong side of the informational/commercial barrier.

It's like a new search engine was created, we have Google Mobile, Google Image, Google Desktop and now Google Intent. If you have a product name in an informational page title you had better have content that supports the product, in a way people will actually search for, that doesn't cater to a 'buying' intent. Affiliates took a beatdown with this when Penguin came out.

Walt Hartwell

4:32 am on Mar 31, 2016 (gmt 0)

10+ Year Member Top Contributors Of The Month



BEFORE Penguin: My site ranked for both informational and commercial queries alike
AFTER Penguin: My informational site only returns for non-commercial keyword/phrases


Most people don't pursue links for strictly informational sites unless they have some desire for fame based on that information.

Most links to commercial phrases are probably artificial in one way or another.

Is it so totally inconceivable that a search engine could devalue links based on commercial phrases while still allowing ranking for informational terms?

I would guess that every search engine has access to browser histories so they are probably very well aware of what sites are visited, what links are clicked, and how long users stay on any given site.

The keyword phrase thing isn't what I would focus on.
SEO people have been saying to focus on the user. Makes sense to me.

martinibuster

12:23 pm on Apr 1, 2016 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



My sentiment, as I stated earlier, was that links are not going to rank a site big time anymore, especially in e-commerce. This I believe because, in my observation, there are bigger ranking signals than links...


Well Whitey, since you asked for my opinion, here goes...
As I stated in a previous post in this discussion, there are some queries for which the answer most people are searching for is not a commercial answer - particularly short two-word phrases. That's nothing to do with links. That's to do with another part of the ranking formula that comes after the links are tallied and sites are ranked. It's a three-step process that consists of indexing, ranking, and post-ranking modification. Wrap your mind around that and you will be miles ahead of the pack.

Penguin is a link algorithm and it affects all sites equally. What may appear to be a brand bias, where brands are not as affected, is generally a site ranking where it should rank based on the quantity and quality of its inlinks.

So I am at odds with the inside knowledge of this Google employee who is encouraging link building, which I think conflicts with Penguin in popular e-commerce/publishing verticals. Thoughts?


The Googler is right. Your observations need refinement.

This I believe because, in my observation, there are bigger ranking signals than links...

As I mentioned above, it's important to get all the factors into their right box. Information Retrieval is a 3-step process made up of:
  1. an Indexing Engine,
  2. a Ranking Engine,
  3. a Modification Engine.

Any number of things can happen in the modification engine, such as rearranging the SERPs in order to show a news snippet or rearranging the SERPs to show results that match the geographic area tied to your IP address. Those aren't ranking factors determining those SERPs. It's the modification engine. Remember, it's a 3-step process. Indexing, ranking, modification.

Some processes are part of the Modification Engine, but modification engine "factors" are not ranking factors. Mistaking a modification engine process for a ranking factor is a leading reason why SEO theories conflict with official Google statements.
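The three-step separation can be caricatured like this. Nothing here reflects Google's actual implementation; every function and field is invented purely to show where a "modification" differs from a "ranking factor":

```python
def index_stage(documents, query):
    """Indexing engine: select candidate documents matching the query terms."""
    terms = set(query.lower().split())
    return [d for d in documents if terms & set(d["text"].lower().split())]

def rank_stage(candidates):
    """Ranking engine: order candidates by a toy link-based score."""
    return sorted(candidates, key=lambda d: d["inlinks"], reverse=True)

def modify_stage(ranked, user_country):
    """Modification engine: rearrange the already-ranked list, e.g. to
    promote results local to the searcher. Not a ranking factor."""
    local = [d for d in ranked if d["country"] == user_country]
    rest = [d for d in ranked if d["country"] != user_country]
    return local + rest

docs = [
    {"url": "a.example", "text": "blue widgets", "inlinks": 50, "country": "US"},
    {"url": "b.example", "text": "blue widgets", "inlinks": 200, "country": "UK"},
    {"url": "c.example", "text": "red gadgets", "inlinks": 500, "country": "US"},
]
# b.example outranks a.example on links, yet a US searcher sees a.example
# first - a post-ranking modification, not a ranking factor.
serp = modify_stage(rank_stage(index_stage(docs, "blue widgets")), "US")
```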

Spiekerooger

10:49 pm on Apr 1, 2016 (gmt 0)

10+ Year Member Top Contributors Of The Month



@martinibuster:

While this sounds true regarding the bigger picture of how a search engine works, I thought that here we are focusing on Penguin. In my humble opinion, Penguin is not a modification engine (RankBrain and QDF are examples of that), and it's also not an indexing engine (it rather uses the index with all its information).

So we'll have it as a special ranking engine with a sole purpose: to automatically cap artificially high rankings gained through off-page spamming. But - and this is Penguin's major flaw - it looks as if a lot of processing (power or time) is needed to automatically answer who is spamming (a Penguin positive) and who is not.

For blatant spamming adventures they have their own filters that work pretty fast (looking like hard edges, e.g. regarding anchor text variations) and don't need much processing power or time. So Penguin rather works in the shallow grey area of "soft" spamming.

As I stated earlier, I imagine that Google is using some kind of machine learning in Penguin, but I'm rather unsure about the data they use. For sure they use off-page data like links and probably mentions. But do they use user behaviour data as well, and do they use time as a vector in their input matrix? For example: finding sites that gain huge amounts of links in a short timeframe without first having gained huge traffic from some sources (why should a site gain links without having visitors first who would link to it?).

In my eyes (and as I'm unsure about the input data for Penguin), the time needed for Penguin updates and the ongoing delays look like a machine learning process as described earlier here, and much better described by @iamlost above. And it looks like a machine learning process that is slow in producing acceptable results, maybe overfitting or producing too many false positives.
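The "links without visitors first" signal speculated about above could be as simple as this. The numbers and the threshold are invented, just to show the shape of the heuristic:

```python
def suspicious_link_growth(weekly_links, weekly_visits, ratio=5.0):
    """Flag weeks where new backlinks vastly outpace visitor numbers.

    A purely illustrative heuristic: a site gaining many links without
    first gaining the traffic that would produce them looks unnatural.
    """
    return [links > ratio * max(visits, 1)
            for links, visits in zip(weekly_links, weekly_visits)]

# Invented series: a sudden link spike with no matching traffic.
links = [2, 3, 4, 400, 5]
visits = [100, 120, 130, 40, 150]
flags = suspicious_link_growth(links, visits)  # only the spike week is flagged
```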
This 102-message thread spans 4 pages.