This 33 message thread spans 2 pages.
How I Make Sense of Google's Complex Algorithm
In the past, we all knew that search engines were just a bit more complicated than they appeared, but we created mental models that did a pretty good job explaining the SERPs and we let it go at that... for a long long time. I like to think of those old models as the "punch list" approach - here's all the factors we think Google measures and combines into their recipe - let's make sure we hit each one.
Then, slowly but surely, something shifted. Keywords plus backlinks could no longer explain the rankings we started to notice. What on earth is going on? Here's what I've been able to put together.
We all know Google loves data. I'd guess that they collect at least ten times the number of signals, compared to what they actively use in the algorithm at any time. And they never delete any of it ;) When Panda first crawled out of development, we started hearing a lot more about machine learning - but Google has preferred the machine learning approach from the beginning - and they let their machines run free on the BIG DATA pile just to see what correlates and what doesn't. There's a reason so many of their PhD hires are statisticians.
Today more than 200 signals are actively used - and I'm betting it's FAR more. They know when any particular signal (say backlink anchor text) is natural, or at least along the same lines as the rest of that market - and when it's been seriously manipulated. Lots of backlinks should correlate with some other mentions here and there. If it's too low (or maybe too high) then it might get devalued or even tossed out.
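To make the "natural vs. manipulated" idea concrete, here's a toy sketch of an anchor-text skew check - the numbers, smoothing, and thresholds are my own invention, not anything Google has published:

```python
from collections import Counter

def anchor_skew(site_anchors, market_anchors):
    """Largest over-representation ratio of any anchor phrase,
    relative to a market-wide baseline (with add-one smoothing).
    A hugely inflated ratio for one 'money' phrase is the kind of
    statistical outlier a correlation check could flag."""
    site = Counter(site_anchors)
    market = Counter(market_anchors)
    site_total = sum(site.values())
    market_total = sum(market.values())
    worst = 0.0
    for phrase, count in site.items():
        site_share = count / site_total
        market_share = (market.get(phrase, 0) + 1) / (market_total + len(market))
        worst = max(worst, site_share / market_share)
    return worst

# A natural profile: branded and navigational anchors dominate.
natural = ["acme", "acme.com", "click here", "acme", "here", "acme widgets"]
# A manipulated profile: one exact-match money phrase repeated.
spammy = ["buy cheap widgets"] * 9 + ["acme"]
market = natural * 50 + ["buy cheap widgets"] * 5

print(anchor_skew(natural, market))  # modest ratio
print(anchor_skew(spammy, market))   # a large outlier
```

The point isn't the exact math - it's that a phrase-level distribution check like this needs nothing but the big data pile to run.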
Read some of the Spam Detection patents - especially the one about Phrase Based Indexing. This statistics thing is really big.
TAXONOMIES - AUTOMATED!
Google has been automating taxonomy generation for a long time. Query terms are assigned taxonomies, websites are assigned taxonomies. When the statisticians play with their big data, I'm pretty sure that they look at statistical relevance with a given taxonomy - let's say within a market place. Clearly signals are used differently for a crafts website than for gambling, for example.
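As a crude illustration of automated taxonomy assignment - the seed terms and the overlap method here are purely my invention, nothing like Google's actual classifier:

```python
# Hypothetical taxonomy seed terms - invented for illustration only.
TAXONOMIES = {
    "crafts":   {"yarn", "knitting", "handmade", "pattern", "beads"},
    "gambling": {"casino", "poker", "odds", "jackpot", "betting"},
    "finance":  {"loan", "mortgage", "interest", "credit", "refinance"},
}

def assign_taxonomy(terms):
    """Assign a query or page to whichever taxonomy its terms overlap most."""
    words = set(terms)
    best, best_score = None, 0
    for name, seeds in TAXONOMIES.items():
        score = len(words & seeds)
        if score > best_score:
            best, best_score = name, score
    return best

print(assign_taxonomy(["free", "knitting", "pattern", "download"]))  # crafts
print(assign_taxonomy(["poker", "odds", "tonight"]))                 # gambling
```

Once every query and site carries a label like this, the statisticians can slice their correlations per market instead of globally.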
So when two URLs seem to have "the same" signals but one far outranks the other - it's more likely to be the way that signals correlate and interact - as well as signals you're not used to thinking about.
Historical signals are a big one. Remember that scary big patent full of possibilities? They've definitely been collecting and testing all those kind of data.
How about User Engagement signals of many kinds? All the search engines have been looking at that kind of data because it's so danged hard to fake. At the same time, when Matt Cutts says that bounce rate is "too noisy" a signal for them to use - he's not just flapping his gums. He knows, mathematically, exactly how useful or not these signals are in generating good quality rankings.
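Here's a toy illustration of what "mathematically knowing how useful a signal is" might look like - entirely simulated data, with dwell time as a clean signal and bounce rate drowned in noise:

```python
import random
import statistics

def signal_usefulness(signal, quality):
    """Pearson correlation between an engagement signal and known result
    quality. A noisy signal correlates weakly - roughly what 'too noisy
    to use' would mean in ranking terms."""
    n = len(signal)
    ms, mq = statistics.mean(signal), statistics.mean(quality)
    cov = sum((s - ms) * (q - mq) for s, q in zip(signal, quality)) / n
    return cov / (statistics.pstdev(signal) * statistics.pstdev(quality))

random.seed(42)
quality = [random.random() for _ in range(500)]
# Simulated: dwell time tracks quality closely; bounce rate is mostly noise.
dwell = [q + random.gauss(0, 0.1) for q in quality]
bounce = [1 - q + random.gauss(0, 2.0) for q in quality]

print(round(signal_usefulness(dwell, quality), 2))   # strongly positive
print(round(signal_usefulness(bounce, quality), 2))  # close to zero
```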
And there are many thousands of correlations to be measured and watched - thousands I tell you.
[edited by: tedster at 5:28 am (utc) on Mar 8, 2012]
Another piece of today's algorithm is feedback mechanisms - feedback that comes from measuring user behavior directly, and human quality rater feedback that looks at the results and rates the algorithm on how well it's doing or not.
And there's no doubt that in some areas Google is not doing so well at all. The further down the long tail you slide, the more chance you have to see some dreadfully wacky SERPs.
I believe the long range hope is that this kind of feedback may be automatically folded in, at least to a degree. That would be a real AI goal, and we know Google loves their AI. It's a kind of "self awareness" I suppose.
Certain kinds of ranking changes could be made on a small scale, "sampling" basis just to measure how well it's working. And when it isn't working well you just might see some "zombie traffic."
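A minimal sketch of that kind of sampled rollout, with made-up click-through rates standing in for whatever Google actually measures:

```python
import random

def sampled_rollout(n_queries, ctr_control, ctr_test, sample_rate=0.05):
    """Divert a small random slice of queries to a candidate ranking and
    measure a crude success rate per arm. If the test arm tanks, it gets
    rolled back - and the sampled users are the ones who saw the
    'zombie' results in the meantime."""
    clicks = {"control": 0, "test": 0}
    served = {"control": 0, "test": 0}
    for _ in range(n_queries):
        arm = "test" if random.random() < sample_rate else "control"
        served[arm] += 1
        rate = ctr_test if arm == "test" else ctr_control
        if random.random() < rate:
            clicks[arm] += 1
    return {arm: clicks[arm] / served[arm] for arm in served}

random.seed(7)
print(sampled_rollout(100_000, ctr_control=0.40, ctr_test=0.30))
```

From the webmaster's side, of course, you never know whether your traffic dip is the algorithm or just your turn in the sample.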
In the long run, we've got to let go of our old mental models and find some that are more functional. As several members have shared here recently, Google may be so complicated that no one anywhere understands the whole thing any more - not even Amit Singhal!
I guess I think of AI as fancy language for learning from your mistakes. In that case, google ought to be learning at a record rate right now.
I think the biggest problem with google right now is that they have departed too far from on-page signals. We see it most clearly in the long tails, as Tedster pointed out. Their obsession with spam has caused them to throw out the baby with the bathwater. I think it is time for google to get back to basics.
I'd love to fly out and meet you. I've been preaching these data points to my employer for a long time. A challenge I find is expressing this information in a language they understand.
Moving along to some relevant information... I have evidence, through a few sites, of how 'inbound links' have been devalued while other sources are weighted more heavily - simply through how Google classifies those links or the sources they come from.
Google's spam team, actively working around the clock to filter the link building mess on the internet, will naturally cause widespread traffic loss to those sites engaged in 'old school link building methods'. Naturally, brand signals, on-page optimization and other signals start to take hold as prominent ranking signals vs. 'old school linking methods'.
In addition, there is a whitehat route to building links: if you can verify your site with Google, your link classification changes.
Order of getting your outbound linking classification in check:
1. Verify the user
2. Verify their popularity
3. Verify their sites
4. Once this has been assessed, outbound link classification from their sites is changed and weighted heavier than non-verified sites.
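Just to be clear about what's being claimed, here is the pipeline above as a toy sketch - the thresholds and the 2x weight are invented, and this is speculation about Google's behavior, not a documented process:

```python
# Speculative sketch of the claimed verification pipeline.
def outbound_link_weight(author):
    if not author.get("identity_verified"):   # 1. verify the user
        return 1.0                            # baseline weight
    if author.get("followers", 0) < 100:      # 2. verify their popularity
        return 1.0
    if not author.get("sites_verified"):      # 3. verify their sites
        return 1.0
    return 2.0                                # 4. weight their links heavier

trusted = {"identity_verified": True, "followers": 5000, "sites_verified": True}
unknown = {}
print(outbound_link_weight(trusted))  # 2.0
print(outbound_link_weight(unknown))  # 1.0
```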
|Google may be so complicated that no one anywhere understands the whole thing any more - not even Amit Singhal! |
I would like to know about the corporate culture at Google. Is it anything like the ultra-secret culture at Apple? (There was an interview on NPR this last Monday, and they had a story about how some people's cubicles at Apple were covered by tents so that no other employees could figure out what they were working on.)
|Lots of backlinks should correlate with some other mentions here and there. |
Not necessarily. It depends on the vertical as well. Tons and tons of websites do perfectly fine in verticals where there's little to no social presence, or where "early" presence is just starting.
With all of the data Google does have they sure have a very difficult time with link relevancy and understanding of the quality of inbound links to sites.
|let go of our old mental models and find some that are more functional |
I've always had a mental image of Google as a steely-eyed librarian making judgements about which documents are worth shelf space and how they should be organized.
<cynicism>My image of Google in recent years has been of someone thinking, "How can I squeeze another few bucks out of this lot?"</cynicism>
|some peoples cubicles at apple were covered by tents so that no other employees could figure out what they were working on |
LOL. You would think that if it were that important to conceal those particular projects, Apple would have used its significant resources to give them properly separated work areas.
Actually, I think the majority of modern webmasters are quite unlike the pioneers who had the original "mental model" of Google. A lot of current members have learnt the ropes through second hand resources, like WebmasterWorld. That is where the checklist approach came from- a communication or teaching technique that is easily understood.
The fact is, a lot of people are making money online without even the slightest clue about Information Architecture. Infinite URL space is a problem Google can deal with. Crawl budget, if even known about, is an unfair system that keeps the big boys on top.
That's got too many long words in it to be worth reading.
|Google has been automating taxonomy generation for a long time. Query terms are assigned taxonomies, websites are assigned taxonomies. When the statisticians play with their big data, I'm pretty sure that they look at statistical relevance with a given taxonomy - let's say within a market place. Clearly signals are used differently for a crafts website than for gambling, for example |
The checklist approach worked for a long time, because the hard-won intelligence behind the list was sound. Enough people were doing actual testing, with actual dummy domains* to produce actual guidelines. That, and the fact the checklist produced (and still produces) a Good Site.
The problem with the checklist approach is that anyone can follow it. And many millions have. Thousands of sites now meet the checklist requirements, so now Google is having to differentiate between many Good Sites.
Back when the checklist was still working, plenty of people were still trying new things. They were generally called black-hatters, or Grey at best. Many grey-hatters did really well in bursts, before being wiped out and starting again. No one much cared, because they didn't employ people. And in that time, a new Orthodoxy evolved.
Google had spent a lot of effort suppressing algo exploits. A new dogma emerged: "Stop Chasing the Algorithm". The checklist stopped being a battle-tested guideline for beginners. Instead it became the Commandments for success. And it worked, enough for True Believers to start basing business models on it.
But like the financial markets, people had begun to think of the models as reality. Following the checklist does NOT entitle you to rankings - it just helps you avoid the major pitfalls from the "coloured hat" era of SEO.
The problem is that we've left the "coloured hat" era, and entered the Age of Engagement. The checklist still works- but only if you understand WHY it worked in the first place. The checklist was all about the dogma of "Ignoring Google" to "Focus on Users" and deploying "Optimised Site Structure". The problem is that too many people are using the same recipe, with an off-the-shelf CMS. Making a "Good Site" is no longer sufficient- and post-MayDay, post-Caffeine, Google has been focussing on differentiation like never before.
In the past, there was an arms race of exploit-fix-exploit. Google's tools of choice were FUD and an assorted arsenal of penalties. These days, penalties are exceptionally rare (and at the risk of exciting ire, Panda is NOT a penalty, any more than having low PageRank is a penalty) and they haven't been spreading FUD for ages.
There is a new paradigm for ranking sites. And yes, it's complicated, and yes, Tedster has unpicked some of those new factors. But to return to where I started, too few people are rigorously testing these factors.
Too few have truly unique sites, with unique techniques, unique development, unique footprints. As such, it's really very difficult to look around and see what works and what doesn't. There are too many counterexamples available - an artefact of the convergent nature of many sites, combined with the complex multi-dimensional scoring criteria of the current algo.
Now that the old recipe no longer works, I hope there is a new generation trying new things, free to experiment without the worries of maintaining a business. I'm not sure how they are going to test, or monetise, or rank. But I am sure that trying to create a new one-size-fits-all narrative of easy-access checklist SEO will not work.
As many have said before, I do not think Organic Search as the basis of a business model is dependable. In a more immediate sense than at any time in the past, it can all disappear, and it's unlikely that anyone will be able to give definite reasons why, or solutions to fix it.
*Very hard to maintain worthwhile test domains these days, due to the exponential proliferation of sites. In fact, I doubt it's possible to launch from scratch a test bed that ranks well enough to produce useful results.
One thing that stands out to me about tedster's analysis is that there is a VERY real problem of link OVER-DIVERSIFICATION.
We've always heard to get backlinks from a wide variety of sources. Unfortunately, that is NOT normal. Unless you are proactively getting links then, depending on the site, there is going to be a limited spread in the kinds of sites that link to yours. That is the way it is.
I am sure that there are some sites that will be linked to by news organizations, yet government or educational sites will NOT link to it, and vice versa.
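That "limited spread" argument can be sketched with Shannon entropy over link-source types - a perfectly even mix hits maximum entropy, which is exactly what an organic profile rarely looks like (toy data, invented categories):

```python
import math
from collections import Counter

def source_entropy(link_sources):
    """Shannon entropy (in bits) of a backlink profile's source types."""
    counts = Counter(link_sources)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Organic profiles cluster around a few source types...
organic = ["blog"] * 40 + ["forum"] * 8 + ["news"] * 2
# ...while a proactively "diversified" profile is suspiciously even.
engineered = ["blog", "forum", "news", "edu", "gov",
              "directory", "social", "wiki", "press", "comment"] * 5

print(round(source_entropy(organic), 2))     # low: clustered
print(round(source_entropy(engineered), 2))  # maximal for 10 types
```

If over-diversification is indeed a flag, "too even" is just as detectable as "too concentrated."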
That's my story and I'm sticking to it.
|And there are many thousands of correlations to be measured an watch - thousands I tell you. |
This is the killer factor that Tedster has identified. Because, if it's true, and I believe it is, Google has cracked SEO wide open. No-one can SEO a site with any reasonable certainty of success. The variables, and the interactions between them, are so massively huge that no mind can work out what is going on. Only a computer can do that.
Google's long-ago advice to publish sites that are useful for the end user has now come true. Gaming the system is now pointless because "the gaming" no longer has a guaranteed result.
The problem I see with the SERPS at the moment is the huge bias Google is giving to the "big" sites.
|Google has cracked SEO wide open. |
Well, there should still be some time devoted to thinking about what rote factors cause sites to rise in the rankings. However, I will say the amount of time you want to spend doing that should be drastically less than what it was years back; the return just isn't there.
Every hour you spend thinking about the algorithm and ways to tweak your site to rank better, is one less hour you put into just making the site better.
How to best allocate your time and money to achieve an increase in visitors has changed. I used to spend a lot more time on these very forums than I do now, I'm less fixated on the facets and changes of the Google algorithm than I was - I miss it, but I'm better for it.
There are rather a lot of algos which are not yet revealed by Google - obviously Google is sharp enough to keep its own integrity intact, otherwise Google would be owned not by Google but by the bloggers and SEO experts :) There are a lot more interesting things which people don't usually talk about much, such as author identification and the article verification tools in Google Webmaster Tools. I think if one could buy 10 domains, then in the space of one year he could implement all the Google-friendly techniques and get results out of them. This is not a one day or one month game.
Missed one thing in my previous post: we should keep in mind that Google needs us as badly as we need Google :)
Recently I was improving one section of a website to make it better. I expected an improvement in ranking for those areas, but to my surprise I got an improvement in another area where no fresh content or improvement had been added for a long time.
It prompted me (some other incidents did too) to think that Google has a "confuse" module which takes care of its secrets and never allows any webmaster to conclude anything, or to arrive at any cause-and-effect conclusion. Probably after every evaluation process or ranking change they execute this confuse module to protect their ranking factors from being exposed.
So I decided not to focus on one area for a long time, but to keep rotating and working on different areas of the same website, so that sooner or later one improvement will bring results.
It's that "correlation of many signals" factor that wipes out easy testing for cause-and-effect. You can have two sites that are apparently in the same situation (or at least very close.) A specific action seems to help one but has no effect for the other.
A big piece of this has causes in the area of taxonomies. Clearly, all kinds of signals will vary for different markets and different query terms.
Even something as simple as whether grammatically complete sentences matter, or as complex as the overall site history. The degree of variation that is allowed or expected is calculated regularly. If one slice of the web is a real gun fight, then all kinds of things may slip by that aren't going to pass muster in a quiet backwater.
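A toy illustration of how per-taxonomy weighting alone could make two sites with "the same" signals score very differently - the weights here are pure invention:

```python
# Invented per-taxonomy weights - the numbers are illustration only.
WEIGHTS = {
    "quiet_backwater": {"links": 0.6, "anchors": 0.3, "engagement": 0.1},
    "gun_fight":       {"links": 0.1, "anchors": 0.1, "engagement": 0.8},
}

def score(signals, taxonomy):
    """Weighted sum of the same signal values under different taxonomies."""
    weights = WEIGHTS[taxonomy]
    return sum(weights[name] * signals[name] for name in weights)

same_signals = {"links": 0.9, "anchors": 0.8, "engagement": 0.2}
print(round(score(same_signals, "quiet_backwater"), 2))  # 0.8
print(round(score(same_signals, "gun_fight"), 2))        # 0.33
```

Identical inputs, very different outcomes - which is exactly why an isolated "change X, watch rankings" test on one site tells you little about another.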
It does seem that our time is better spent "chasing our customers" instead of chasing the algorithm. And really, for me that's a kind of relief. Search results are only a middle man - a means to an end. Spending more time in direct work just feels right. But when something goes wrong with Google traffic, that's when the frustration kicks in.
|It does seem that our time is better spent "chasing our customers" instead of chasing the algorithm. And really, for me that's a kind of relief. |
I agree completely and for at least two years I have done no active 'link building' and have focussed instead on content and user experience (with mixed success).
But there are big problems with this 'user satisfaction' approach, due to the lack of a level playing field - one that I think cannot be explained by any 'normal' ranking factors:
- Google give a boost to big-brand sites out of all proportion to their content, returning almost empty or irrelevant pages or multiple pages from the same domain from a big brand rather than an informative page from a less known site
- many of our competitors are still doing widespread link building and freely interlinking existing sites, and getting away with it, and I am convinced they are benefitting from it (i.e. google is not just ignoring the links). We might have the moral high ground but they've got the income!
- even with excellent content it is now very difficult to gain traction for a new site without any kind of 'unauthorised' link building. If a page is never shown in the results it doesn't attract natural links!
So the challenge is, without a level playing field and with no realistic chance of employing SEO techniques any more, what should a site do to succeed?
The Human Factor
I just wanted to add this ever-important piece of Google's puzzle into the mix. Gaining an important top 10 keyword ranking is rumored to guarantee a human review, and even if your site is worthy it may not be allowed to hold that ranking. Why not? Because Google wants a mix of result types, and you might just have ranked for one they feel is better covered already. In other words you do not fight 10 sites to gain top 10; you fight perhaps 1 or 2 for the given flavor of your articles. Also, algos only remove malware and adult material from standard searches - humans dole out the penalties.
The above requires human eyes, always will.
@Sgt_Kickaxe: agreed, I've seen this and I've even seen affiliate sites move up to top positions whilst merchant sites they were delivering visitors to were devalued, because (IMO) the affiliates were delivering a better and, perhaps more important, very different visitor experience. This has happened for search terms where I know, from hard experience, that there is human inspection of the top sites. I believe that "me too" type sites will become progressively less valued in the future and unique ones will thrive, whether they are merchant or affiliate; I have no evidence for this, just a gut feeling. This doesn't of course apply to certain big brands which can still, at present, hold top spots as affiliates without offering anything of real value.
I notice that my own behavior in evaluating sites (whether my own, my clients', or sites I just visit or use) has become a lot more holistic; I've kind of been brought (or forced, ork ork) to look at the whole picture, including all the aspects of the site, the usability, the business model as one big whole, instead of focusing on one or two elements, like link profiles, on page, shopping cart, whatever. My poor brain.
Sounds like we all need SEO rehab.
Google's search department is in the business of identifying "good" websites, but if webmasters are creating websites that rank highly in Google, that creates a bit of a feedback loop.
"Create websites for you visitors, not for Google" isn't some kind of distraction they're throwing out. Sure, the vagueness of the statement can be frustrating, but I think that's because they honestly do not want to dictate to webmasters what makes a good, popular, relevant website. They don't truly know.
Basically, I get the sense that Google wants you to tell them what makes a good website by actually building a website that your target audience finds valuable. You're not supposed to make sense of the algorithm; the algorithm is supposed to make sense of the web. It can't do that effectively unless people ignore it and do what's right for their audience, not what's right for Google.
Goo's algorithms remind me of the girlfriend you didn't know was crazy until after you'd been with her too long. If you made her mad she wouldn't tell you why, and you'd find her out in the parking lot ramming her car into yours. Ignore Goo totally, do what you've got to do for your visitors, and do it well.
|So the challenge is, without a level playing field and with no realistic chance of employing SEO techniques any more, what should a site do to succeed? |
At the end of the process, the bathing beaches on the 'G search' lake are going to be closed for regular webmasters. IMO, we are now just living on the edge of a disappearing world (i.e. we, small web businesses).
|Actually, I think the majority of modern webmasters are quite unlike the pioneers who had the original "mental model" of Google. A lot of current members have learnt the ropes through second hand resources, like WebmasterWorld. That is where the checklist approach came from - a communication or teaching technique that is easily understood. |
I've been continuing to think about this thread - really pondering it. This area - making some sense of the algo - is the crux of what every webmaster who wants Google Search traffic needs to be focused on.
Gone are the days where we could build a checklist and trust the tradition of analysis that had been done for us by others. Why is that? The algo is so complex that we cannot effectively test the whole thing - reverse engineer it, in other words. How can we, when every market niche, every kind of site, even every type of keyword may be treated differently!
We don't even know what the individual factors are that Google measures. My guess is that they are measuring WELL over 1,000 items. Some of them are in use, some are not in use right now but are being watched and tested. Others may pop into the active algorithm tomorrow, and others may drop out or change their relative weight.
So what can we do - how can we conceive of this challenge? As others have noticed, Google is no longer a conventional "search engine". Instead, they seem to be building an artificial intelligence engine that measures how well each website does what it set out to do. This effort is in its infancy, and it certainly is subject to mis-steps. But I doubt that they are going to back off in any way.
It's not that the items on our conventional checklists no longer matter - they certainly do. It's just that all the technical tweaking in the world cannot do the whole job anymore when Google is actively trying to measure things like quality and user engagement!
What this means to me is that we cannot afford to cut corners anywhere. An online business used to look like a much easier way to make money than a conventional physical business. But it looked that way to a lot of people and the competitive floodgates opened up. No more working a ten hour work week and watching the bank account mushroom. That may have been the case early on, but not so much now - and probably not at all in the future.
So again, the bottom line for me is that we cannot afford to cut corners anywhere these days. We don't even know which corners are under surveillance!
|I've been continuing to think about this thread |
I have too. I think the issues you are raising are of great significance. I'm a little surprised the thread hasn't drawn more interest, but perhaps that's because there are no easy answers.
|The algo is so complex that we cannot effectively ...reverse engineer it |
Very true. Actually, I think that has increasingly been the case for years -- but for a while the illusion persisted that "beating" the algorithm was still possible -- because so many elements of the "checklist" continued to work, and neglecting the checklist was a sure path to failure.
As well, for at least the past 5 years, many of the most successful web businesses were successful because of their SEO skills while some of the least successful strategies seemed to be the ones that followed the philosophy of ignoring the search engines and trusting that if you build a good enough website, everything else would follow.
|we cannot afford to cut corners anywhere. |
Not sure exactly what you mean by "cutting corners." If you mean it is no longer feasible to build a successful long-term online business while putting very little effort into key aspects of the actual website and focusing entirely on trying to "beat the algorithm," I think you might be right -- or at least this will soon be the situation, if current trends continue for a bit longer.
But the problem is much deeper than simply not being able to "cut corners."
We are participating in a worldwide market with extremely low barriers to entry, and a "gold rush" mentality which continues to attract a lot of new entrants even when most of the earlier entrants failed to find any gold.
Already we've reached the point where there are millions of websites and billions of pages. Every nook and cranny is already filled with a half dozen (or hundreds!) of websites fighting for a small share of the same pie.
If you spare no expense building the "best" website, Google might put you a bit higher than the "average" competing website in some of the SERPs for some of the keywords. But there are so many competitors, the extra traffic you gain from building a great website won't necessarily be anywhere near enough to justify the much greater investment required to be the "best" in your niche.
|Google is actively trying to measure things like quality and user engagement |
Agreed. But, I've not yet seen any evidence that Google can consistently detect the difference between a "great" page or a "great" website and a mediocre one -- so the added investment is risky and may not be rewarded.
But, part of the problem is that they aren't yet consistently succeeding in this attempt. From what I can tell, Panda was successful at pushing down a lot of the lowest quality content, but it wasn't very successful at detecting the difference between "great" content and "mediocre" content.
Compounding the difficulty, some of the elements of Panda seem to be increasing the visibility of the biggest, most famous participants at the expense of everyone else.
In many of the SERPS we now see Wikipedia, "official" sites and "famous" sites filling most of the top 10 slots.
I suppose the solution is to become famous, but there isn't necessarily a clear path to this goal. In most markets it isn't feasible to spend millions marketing on TV, and most of us aren't fortunate enough to have an existing world famous brand name. Nor is there any predictable path to becoming a traffic lottery winner like Yelp or Pinterest (except by spending so much on lottery tickets that winning is losing).
Tedster: I agree with your comment in the WSJ thread:
|best thing any small publisher can do, IMO, is put renewed emphasis on producing top quality |
I don't think Google's doing a very good job distinguishing between great content and mediocre content, but it makes sense to anticipate that they will figure it out at some point -- and that they will increasingly supply superficial/simple content themselves, rather than sending users somewhere else to read basic facts.
All strategies have risks, but it seems to me if you focus on having high quality and greater depth, you have a chance of pulling away from the pack (move higher than the competition in the SERPs) if they ever figure out the difference, and you reduce the risk that Google will displace you by answering user's questions directly.
Just as a quick aside --
Even as Google (as econman points out) has a problem distinguishing between great and mediocre content, so do regular viewers.
In my "vertical," sloppy, inaccurate, tabloid reporting invariably gathers more hits and more discussion and link-sharing than accurate reporting. Indeed, one of the big players got that way by consistently making viewers so angry with their insanity, that they'd spread their indignation around the net, thereby guaranteeing backlinks and real viewers.
Unfortunately, favoring large sites is supported by viewers. Most want one or two big sites in any given field ... like Google for search, eBay for auctions, PayPal, etc... the hope is to be a big fish in a small pond, and then grow your pond, if you don't have a lot of time/money to invest.
In my arena, Wikipedia is and has long been a major threat... they invariably end up at the top of just about any search.
But I've lived through a LOT of these. Yahoo put a scare into a lot of us when they started charging $300 for REVIEWING a link (talk about greed!) -- and Yahoo died because they no longer had relevant links. Open Directory was a boon at first, then under mountains of spam they got increasingly weird, fired editors for no reason and with no recourse, made it nearly impossible to become an editor in the first place, and died for the same reason as Yahoo: they couldn't keep up. (ODP was also always extremely slow-loading.)
And of course there was goto.com, later Overture, which started the "bid for placement" game that Google latched onto. This spawned many imitators and it soon looked as though we'd be buying every non-repeat, non-referral visitor. (What made Google succeed was its LACK of greed: they didn't displace all the natural search hits, and they invited webmasters to display their ads, sharing revenue 50/50 or 60/40, while the BEST you could do with goto.com was 20/80 since they'd give you a penny per search, and they got 5 cents up to several dollars per click!).
As a side note, I've found my site has plummeted over the last few months in the rankings, though I've done no "black hat." I was told the problem was my ads are too big, though I've been following G's advice. I did discover a bug in Webkit that sometimes makes the ads REALLY BIG and that might have killed me.