Forum Moderators: Robert Charlton & goodroi
I believe a lot of this is PR - Google have not had any real innovation that has captured the imagination since PageRank, and the talk of "AI" and "self-learning" seems to be pointed in that direction.
That's the 15% of daily queries that have not been seen before - not 15% of unknown queries.
Google have stated a few times that it's the third most important signal, however you interpret that.
RankBrain is one of the “hundreds” of signals that go into an algorithm that determines what results appear on a Google search page and where they are ranked, Corrado said. In the few months it has been deployed, RankBrain has become the third-most important signal contributing to the result of a search query, he said.
attempt to translate Natural Language queries to something ye olde keyword/named entity driven search algo can understand.
In the general trend of Google ramping up query rewriting, and with RankBrain seemingly the query-rewriting method of choice, I don't think it's insignificant. If anything, I think it explains why keyword, keyword, keyword doesn't work like it's supposed to. Google didn't match keyword, keyword, keyword. Maybe RankBrain is the reason, maybe it's entities instead of keywords, maybe it's something else or a combination of factors.
Ranking factors used to be a matter of keywords, inbound links, anchor text and good web page construction. No longer. Many of the ranking factors of the past have been turned into signals of low quality. Scientific research on Adversarial Information Retrieval has created an entirely new set of criteria that search engines may be using to rank websites. If the phrase Adversarial Information Retrieval is new to you, then you need to attend this session.
Panda & Penguin are algorithms created to keep low quality sites out of the top ten of the search results. But they are not the algorithms that determine if a site will rank. There are more algorithms at work. This session examines cutting-edge research from the last few years that describes new approaches to ranking web pages according to factors such as user intent, analyzing user experience metrics to identify relevant sites, and how machines train themselves to rate sites for quality as well as create more accurate algorithms.
Deprecated:
- Keywords
- Focus on longtail phrases
- Focus on ranking for specific keyword phrases
- Lean code
To digress and offer a little opinion, I think Google's approach to query rewriting is fairly weak. It pretty much amounts to mapping unknown keywords to known results. This encourages mediocrity and is also one of the reasons "old school" SEO is rampant in results for major players, where it still carries more weight than many expect. I also think it's one of the reasons why smaller site owners with "old school" SEO expectations find their efforts thwarted.
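To make the "mapping unknown keywords to known results" idea concrete, here is a minimal toy sketch of that kind of rewriting: an unseen query is snapped onto the most similar previously-seen query by plain token overlap. All names and data are illustrative assumptions, not a description of Google's actual method.

```python
# Hypothetical sketch: map an unseen query onto the closest
# previously-seen ("known") query by token-set overlap (Jaccard).
# Purely illustrative - not Google's actual rewriting mechanism.

def jaccard(a, b):
    """Token-set overlap between two queries, in 0.0..1.0."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)

def rewrite_to_known(query, known_queries):
    """Return the known query most similar to the unseen one."""
    return max(known_queries, key=lambda k: jaccard(query, k))

known = ["best running shoes", "cheap flights to paris", "python tutorial"]
print(rewrite_to_known("good shoes for running", known))
# → "best running shoes"
```

The weakness described above falls out directly: whatever the searcher actually meant, the result set is limited to what the nearest known query already returns.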
(Gary Illyes) went on to say, "it does change ranking."
The meanings of "does", "change", and "ranking" get some discussion before it's settled ;) ...more precisely, that RankBrain changes Google's understanding of the query, and that "if the understanding of the query changes, {Google is} liable to show something different for the query"... but RankBrain itself is not adding any algo weightings to a page. This is essentially analogous to Hummingbird, but it's not easy to talk about because a short statement on the question can be so easily misinterpreted.
...Google's approach to query rewriting... pretty much amounts to mapping unknown keywords to known results.
Andy, as I read various descriptions of RankBrain I've been seeing, mapping to "known results" is perhaps a better description of Hummingbird.
Hummingbird - those are just libraries, I suppose to some extent static libraries...
While the rest of his answer is hard to transcribe, the implication is that RankBrain is more interesting than Hummingbird from an engineering perspective, and it's more versatile. As stated elsewhere, RankBrain also can learn over time. It sounds like it's a multi-faceted operation on top of Hummingbird, I'm thinking with more complex substitution rules that include, say, entities and term vector proximity to substitute query vocabulary and to do the mapping. This model would ultimately cover both keywords and concepts, which would greatly broaden its capabilities. I think it would make for a much more scalable algo over time as Google moves toward understanding natural language.
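"Term vector proximity" can be sketched with a toy example: each word gets a dense vector, and a query term is substituted with its nearest neighbour in the index vocabulary by cosine similarity. The vectors here are hand-made stand-ins for learned embeddings; everything is an illustrative assumption.

```python
import math

# Toy "term vector proximity" substitution. Real systems use learned
# embeddings over a huge vocabulary; these 3-d vectors are hand-made
# stand-ins for illustration only.

VECS = {
    "car":        [0.90, 0.10, 0.00],
    "automobile": [0.88, 0.12, 0.02],
    "banana":     [0.00, 0.20, 0.95],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def nearest_known(term_vec, vocab):
    """Vocabulary word whose vector is closest to the given term's."""
    return max(vocab, key=lambda w: cosine(term_vec, vocab[w]))

index_vocab = {w: VECS[w] for w in ("car", "banana")}
print(nearest_known(VECS["automobile"], index_vocab))  # → "car"
```

The appeal of this model is exactly what's said above: the same similarity machinery works whether the vectors represent keywords, entities, or whole concepts.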
how this affects SEO...
Right now, I can see where this might have a major effect in the identification of homonyms/homographs. This could enable the targeting of certain terms, eg, that I would have advised a client to steer clear of not too long ago.
And Google have stated a few times that it's the third most important signal, however you interpret that.
This interpretation also is fuzzy, but here's how they get there.... In the Q&A video, Andrey Lipattsev states that content and links are the first and second most important ranking signals (surprise ;) ) and that there's "no order" to these two, and that "number three is a hotly contested issue".
Didn't someone at google suggest that RankBrain is self-learning?
This point was made in the original Bloomberg story, that RankBrain could be trained, an important plus for Google over time.
Put simply it is Google's initial 'in the wild' attempt to translate Natural Language queries to something ye olde keyword/named entity driven search algo can understand. This includes stop words (discarded/discounted previously), differences implied by capitalisation, contextual text strings...
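The point about stop words is easy to demonstrate: under the old discard-stop-words approach, two queries with opposite intent collapse to the same keyword bag. A tiny illustrative sketch (the stop-word list is a made-up subset, not any real engine's list):

```python
# Why discarding stop words loses meaning: two queries with opposite
# intent reduce to the identical keyword bag. The stop list below is
# a tiny illustrative subset, not a real search engine's list.

STOP = {"the", "a", "to", "for", "with", "without", "of"}

def strip_stops(query):
    """Classic keyword-era preprocessing: drop stop words."""
    return [w for w in query.lower().split() if w not in STOP]

q1 = "flights with a stopover"
q2 = "flights without a stopover"
print(strip_stops(q1), strip_stops(q2))
# both reduce to ['flights', 'stopover'] - the distinction is gone
```

Keeping (or at least weighing) those previously discarded words is precisely what lets the newer query understanding tell the two intents apart.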
All learning that RankBrain does is offline, Google told us. It’s given batches of historical searches and learns to make predictions from these.
[searchengineland.com...]
Those predictions are tested and if proven good, then the latest version of RankBrain goes live. Then the learn-offline-and-test cycle is repeated.
So when Google talks about RankBrain as the third-most important signal, does it really mean as a ranking signal? Yes. Google reconfirmed to us that there is a component where RankBrain is directly contributing somehow to whether a page ranks.
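The offline learn-and-test cycle described in that quote can be sketched in a few lines: train a candidate on a batch of historical searches, score it on held-out queries, and promote it to "live" only when it beats the current version. Every name and the toy data here are assumptions for illustration, not Google's pipeline.

```python
# Hypothetical sketch of an offline learn-and-test cycle: train on a
# historical batch, evaluate on held-out queries, and only promote the
# candidate if it beats the live version. Illustrative names and data.

def get_historical_batches():
    """Toy stand-in for batches of historical (query, answer) pairs."""
    yield ([("q1", "a"), ("q2", "b")], [("q1", "a"), ("q3", "c")])
    yield ([("q1", "a"), ("q3", "c")], [("q1", "a"), ("q3", "c")])

def train(batch):
    """Stand-in for offline training; the 'model' is a simple lookup."""
    return dict(batch)

def evaluate(model, held_out):
    """Fraction of held-out queries the model answers correctly."""
    hits = sum(1 for q, expected in held_out if model.get(q) == expected)
    return hits / len(held_out)

live_model, live_score = {}, 0.0
for batch, held_out in get_historical_batches():
    candidate = train(batch)
    score = evaluate(candidate, held_out)
    if score > live_score:          # "proven good" → goes live
        live_model, live_score = candidate, score
```

The key property matches the quote: no learning happens in serving; only a vetted snapshot ever goes live, and the cycle then repeats on fresh batches.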
Put simply it is Google's initial 'in the wild' attempt to translate Natural Language queries to something ye olde keyword/named entity driven search algo can understand.
Syntax: we do part of speech tagging and parsing in 60+ languages. We have multiple taggers and parsers, some of which are application-specific, and which make different speed/quality tradeoffs. (You can imagine that parsing every sentence on the web takes a lot of time.)
Semantics: recognizing entities in text, matching those entities against our knowledge graph where it's possible, labeling the entities in a variety of ways, analyzing coreference to figure out what words or phrases refer to the same thing, and so on.
Knowledge extraction: learn relations between entities, recognize events, match entities between queries and documents.
Summarization: Figure out the topics of a page, and generate summaries of the page. Sentiment analysis, clustering on a variety of metrics, etc.
Question answering: When is a query really looking for a piece of specific information (as in your example), and how do we find and surface that piece of information?
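To make the first two stages in that list concrete - part-of-speech tagging and entity matching against a knowledge graph - here is a toy sketch using hand-made lookup tables. Real systems use learned taggers and parsers over a graph of millions of entities; everything below is an illustrative assumption.

```python
# Toy sketch of two pipeline stages from the list above: POS tagging
# and entity matching against a "knowledge graph". Both tables are
# hand-made illustrations, not real models or data.

POS_LEXICON = {"paris": "NOUN", "is": "VERB", "beautiful": "ADJ",
               "in": "ADP", "spring": "NOUN"}
KNOWLEDGE_GRAPH = {"paris": {"type": "City", "country": "France"}}

def tag(tokens):
    """Assign a part-of-speech tag to each token (unknown -> 'X')."""
    return [(t, POS_LEXICON.get(t, "X")) for t in tokens]

def link_entities(tokens):
    """Match tokens against the knowledge graph."""
    return {t: KNOWLEDGE_GRAPH[t] for t in tokens if t in KNOWLEDGE_GRAPH}

tokens = "paris is beautiful in spring".split()
print(tag(tokens))
print(link_entities(tokens))
# → {'paris': {'type': 'City', 'country': 'France'}}
```

Downstream stages (coreference, summarization, question answering) would consume exactly this kind of tagged-and-linked representation rather than raw keyword strings.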