Forum Moderators: Robert Charlton & goodroi
I believe a lot of this is PR - Google have not had any real innovation that has captured the imagination since PageRank, and the talk of "AI" and "self-learning" seems to be pointed in that direction.
That's the 15% of daily queries that have not been seen before - not 15% of unknown queries.
Google have stated a few times that it's the third most important signal, however you interpret that.
RankBrain is one of the “hundreds” of signals that go into an algorithm that determines what results appear on a Google search page and where they are ranked, Corrado said. In the few months it has been deployed, RankBrain has become the third-most important signal contributing to the result of a search query, he said.
attempt to translate Natural Language queries to something ye olde keyword/named entity driven search algo can understand.
In the general trend of Google ramping up query rewriting, and with RankBrain seemingly the query-rewriting method of choice, I don't think it's insignificant. If anything, I think it explains why keyword, keyword, keyword doesn't work like it's supposed to. Google didn't match keyword, keyword, keyword. Maybe RankBrain is the reason, maybe it's entities instead of keywords, maybe it's something else or a combination of factors.
Ranking factors used to be a matter of keywords, inbound links, anchor text and good web page construction. No longer. Many of the ranking factors of the past have been turned into signals of low quality. Scientific research on Adversarial Information Retrieval has created an entirely new set of criteria that search engines may be using to rank websites. If the phrase Adversarial Information Retrieval is new to you, then you need to attend this session.
Panda & Penguin are algorithms created to keep low quality sites out of the top ten of the search results. But they are not the algorithms that determine if a site will rank. There are more algorithms at work. This session examines cutting-edge research from the last few years that describes new approaches to ranking web pages according to factors such as user intent, analyzing user experience metrics to identify relevant sites, and how machines train themselves to rate sites for quality as well as create more accurate algorithms.
Deprecated:
- Keywords
- Focus on longtail phrases
- Focus on ranking for specific keyword phrases
- Lean code
To digress and offer a little opinion, I think Google's approach to query rewriting is fairly weak. It pretty much amounts to mapping unknown keywords to known results. This encourages mediocrity and is also one of the reasons "old school" SEO is rampant in results for major players, where it still carries more weight than many expect. I also think it's one of the reasons why smaller site owners with "old school" SEO expectations find their efforts thwarted.
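To make the "mapping unknown keywords to known results" idea concrete, here is a minimal toy sketch of that kind of rewriting: an unseen query is snapped onto the most similar previously-seen query by plain token overlap. All names and data are illustrative assumptions, not a description of Google's actual method.

```python
# Hypothetical sketch: map an unseen query onto the closest
# previously-seen ("known") query by token-set overlap (Jaccard).
# Purely illustrative - not Google's actual rewriting mechanism.

def jaccard(a, b):
    """Token-set overlap between two queries, in 0.0..1.0."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)

def rewrite_to_known(query, known_queries):
    """Return the known query most similar to the unseen one."""
    return max(known_queries, key=lambda k: jaccard(query, k))

known = ["best running shoes", "cheap flights to paris", "python tutorial"]
print(rewrite_to_known("good shoes for running", known))
# → "best running shoes"
```

The weakness described above falls out directly: whatever the searcher actually meant, the result set is limited to what the nearest known query already returns.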
(Gary Illyes) went on to say, "it does change ranking."
The meanings of "does", "change", and "ranking" get some discussion before it's settled ;) ...more precisely, that RankBrain changes Google's understanding of the query, and that "if the understanding of the query changes, {Google is} liable to show something different for the query"... but RankBrain itself is not adding any algo weightings to a page. This is essentially analogous to Hummingbird, but it's not easy to talk about because a short statement on the question can be so easily misinterpreted.
...Google's approach to query rewriting... pretty much amounts to mapping unknown keywords to known results.
Andy, as I read various descriptions of RankBrain I've been seeing, mapping to "known results" is perhaps a better description of Hummingbird.
Hummingbird - those are just libraries, I suppose to some extent static libraries...
While the rest of his answer is hard to transcribe, the implication is that RankBrain is more interesting than Hummingbird from an engineering perspective, and it's more versatile. As stated elsewhere, RankBrain also can learn over time. It sounds like it's a multi-faceted operation on top of Hummingbird, I'm thinking with more complex substitution rules that include, say, entities and term vector proximity to substitute query vocabulary and to do the mapping. This model would ultimately cover both keywords and concepts, which would greatly broaden its capabilities. I think it would make for a much more scalable algo over time as Google moves toward understanding natural language.
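"Term vector proximity" can be sketched with a toy example: each word gets a dense vector, and a query term is substituted with its nearest neighbour in the index vocabulary by cosine similarity. The vectors here are hand-made stand-ins for learned embeddings; everything is an illustrative assumption.

```python
import math

# Toy "term vector proximity" substitution. Real systems use learned
# embeddings over a huge vocabulary; these 3-d vectors are hand-made
# stand-ins for illustration only.

VECS = {
    "car":        [0.90, 0.10, 0.00],
    "automobile": [0.88, 0.12, 0.02],
    "banana":     [0.00, 0.20, 0.95],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def nearest_known(term_vec, vocab):
    """Vocabulary word whose vector is closest to the given term's."""
    return max(vocab, key=lambda w: cosine(term_vec, vocab[w]))

index_vocab = {w: VECS[w] for w in ("car", "banana")}
print(nearest_known(VECS["automobile"], index_vocab))  # → "car"
```

The appeal of this model is exactly what's said above: the same similarity machinery works whether the vectors represent keywords, entities, or whole concepts.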
how this affects SEO...
Right now, I can see where this might have a major effect in the identification of homonyms/homographs. This could enable the targeting of certain terms, eg, that I would have advised a client to steer clear of not too long ago.
And Google have stated a few times that it's the third most important signal, however you interpret that.
This interpretation also is fuzzy, but here's how they get there.... In the Q&A video, Andrey Lipattsev states that content and links are the first and second most important ranking signals (surprise ;) ) and that there's "no order" to these two, and that "number three is a hotly contested issue".
Didn't someone at google suggest that RankBrain is self-learning?
This point was made in the original Bloomberg story, that RankBrain could be trained, an important plus for Google over time.
Put simply it is Google's initial 'in the wild' attempt to translate Natural Language queries to something ye olde keyword/named entity driven search algo can understand. This includes stop words (discarded/discounted previously), differences implied by capitalisation, contextual text strings...
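The point about stop words is easy to demonstrate: under the old discard-stop-words approach, two queries with opposite intent collapse to the same keyword bag. A tiny illustrative sketch (the stop-word list is a made-up subset, not any real engine's list):

```python
# Why discarding stop words loses meaning: two queries with opposite
# intent reduce to the identical keyword bag. The stop list below is
# a tiny illustrative subset, not a real search engine's list.

STOP = {"the", "a", "to", "for", "with", "without", "of"}

def strip_stops(query):
    """Classic keyword-era preprocessing: drop stop words."""
    return [w for w in query.lower().split() if w not in STOP]

q1 = "flights with a stopover"
q2 = "flights without a stopover"
print(strip_stops(q1), strip_stops(q2))
# both reduce to ['flights', 'stopover'] - the distinction is gone
```

Keeping (or at least weighing) those previously discarded words is precisely what lets the newer query understanding tell the two intents apart.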
All learning that RankBrain does is offline, Google told us. It’s given batches of historical searches and learns to make predictions from these.
[searchengineland.com...]
Those predictions are tested and if proven good, then the latest version of RankBrain goes live. Then the learn-offline-and-test cycle is repeated.
So when Google talks about RankBrain as the third-most important signal, does it really mean as a ranking signal? Yes. Google reconfirmed to us that there is a component where RankBrain is directly contributing somehow to whether a page ranks.
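The offline learn-and-test cycle described in that quote can be sketched in a few lines: train a candidate on a batch of historical searches, score it on held-out queries, and promote it to "live" only when it beats the current version. Every name and the toy data here are assumptions for illustration, not Google's pipeline.

```python
# Hypothetical sketch of an offline learn-and-test cycle: train on a
# historical batch, evaluate on held-out queries, and only promote the
# candidate if it beats the live version. Illustrative names and data.

def get_historical_batches():
    """Toy stand-in for batches of historical (query, answer) pairs."""
    yield ([("q1", "a"), ("q2", "b")], [("q1", "a"), ("q3", "c")])
    yield ([("q1", "a"), ("q3", "c")], [("q1", "a"), ("q3", "c")])

def train(batch):
    """Stand-in for offline training; the 'model' is a simple lookup."""
    return dict(batch)

def evaluate(model, held_out):
    """Fraction of held-out queries the model answers correctly."""
    hits = sum(1 for q, expected in held_out if model.get(q) == expected)
    return hits / len(held_out)

live_model, live_score = {}, 0.0
for batch, held_out in get_historical_batches():
    candidate = train(batch)
    score = evaluate(candidate, held_out)
    if score > live_score:          # "proven good" → goes live
        live_model, live_score = candidate, score
```

The key property matches the quote: no learning happens in serving; only a vetted snapshot ever goes live, and the cycle then repeats on fresh batches.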
Put simply it is Google's initial 'in the wild' attempt to translate Natural Language queries to something ye olde keyword/named entity driven search algo can understand.
Syntax: we do part of speech tagging and parsing in 60+ languages. We have multiple taggers and parsers, some of which are application-specific, and which make different speed/quality tradeoffs. (You can imagine that parsing every sentence on the web takes a lot of time.)
Semantics: recognizing entities in text, matching those entities against our knowledge graph where it's possible, labeling the entities in a variety of ways, analyzing coreference to figure out what words or phrases refer to the same thing, and so on.
Knowledge extraction: learn relations between entities, recognize events, match entities between queries and documents.
Summarization: Figure out the topics of a page, and generate summaries of the page. Sentiment analysis, clustering on a variety of metrics, etc.
Question answering: When is a query really looking for a piece of specific information (as in your example), and how do we find and surface that piece of information?
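To make the first two stages in that list concrete - part-of-speech tagging and entity matching against a knowledge graph - here is a toy sketch using hand-made lookup tables. Real systems use learned taggers and parsers over a graph of millions of entities; everything below is an illustrative assumption.

```python
# Toy sketch of two pipeline stages from the list above: POS tagging
# and entity matching against a "knowledge graph". Both tables are
# hand-made illustrations, not real models or data.

POS_LEXICON = {"paris": "NOUN", "is": "VERB", "beautiful": "ADJ",
               "in": "ADP", "spring": "NOUN"}
KNOWLEDGE_GRAPH = {"paris": {"type": "City", "country": "France"}}

def tag(tokens):
    """Assign a part-of-speech tag to each token (unknown -> 'X')."""
    return [(t, POS_LEXICON.get(t, "X")) for t in tokens]

def link_entities(tokens):
    """Match tokens against the knowledge graph."""
    return {t: KNOWLEDGE_GRAPH[t] for t in tokens if t in KNOWLEDGE_GRAPH}

tokens = "paris is beautiful in spring".split()
print(tag(tokens))
print(link_entities(tokens))
# → {'paris': {'type': 'City', 'country': 'France'}}
```

Downstream stages (coreference, summarization, question answering) would consume exactly this kind of tagged-and-linked representation rather than raw keyword strings.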