|Google Hummingbird and Related Patents|
Since there appears to be a definite shift in SEO, and I haven't seen any threads or posts on these patents, I thought it would be a good time to share them.
Bill Slawski at seobythesea.com has some great summaries for those who aren't patent readers, so I quoted/linked those as well as the patents below.
Synonym Identification Based on Co-occurring Terms
[What appears to be the Hummingbird Patent]
|The announcement of the new algorithm told us that Google actually started using Hummingbird a number of weeks ago, and that it potentially impacts around 90% of all searches. |
It’s being presented as a query expansion or broadening approach which can better understand longer natural language queries, like the ones that people might speak instead of shorter keyword matching queries which someone might type into a search box.
Evaluation of Substitute Terms
|The process used to find substitute terms focuses upon the use of the co-occurrence of words found on pages returned in response to a query, and to a potential substitute query. These candidate substitute terms might originally show up in documents ranking for the first query term, or in meta data associated with those documents. |
For example, to find potential substitute query terms for “cats,” terms that appear in documents ranking for “cats” may be explored. One of those might be “feline.” If we perform a search for “cats”, and look through the top 10 (or top 20, or even top 100) results for words that tend to co-occur on those pages, we might see words such as “furry”, “domesticated”, “carnivorous” and “mammal” appear on a lot of the top pages returned for that query. If those are terms that tend to co-occur often in the results on a search for “cats,” they are considered co-occurring terms.
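To make the co-occurrence idea in that summary concrete, here is a minimal sketch in Python. It is only an illustration and not the patent's actual method: the function name `co_occurring_terms`, the `min_fraction` cutoff, the tiny stop-word list and the sample "top results" are all assumptions made up for this example.

```python
from collections import Counter

# Hypothetical stop-word list, just enough for this toy example.
STOP_WORDS = {"a", "are", "is", "the"}

def co_occurring_terms(documents, query_term, min_fraction=0.5):
    """Return terms found in at least min_fraction of the documents."""
    doc_count = Counter()
    for text in documents:
        # Count each term once per document (document frequency).
        terms = set(text.lower().split()) - STOP_WORDS
        terms.discard(query_term)
        doc_count.update(terms)
    threshold = min_fraction * len(documents)
    return sorted(t for t, n in doc_count.items() if n >= threshold)

# Made-up stand-ins for the text of the top results for "cats".
top_results = [
    "cats are furry domesticated mammal pets",
    "a feline is a carnivorous mammal",
    "cats are furry carnivorous hunters",
    "domesticated cats are furry",
]

print(co_occurring_terms(top_results, "cats"))
```

With this sample data, "furry", "domesticated", "carnivorous" and "mammal" each show up in at least half of the pages, so they come back as co-occurring terms, while "feline" appears only once and would have to be evaluated separately as a candidate substitute.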
Generalized Edit Distance for Queries
|This patent looks for co-occurring words within search sessions instead of on web pages or within search results for particular queries. |
Query terms that might be similar are selected in part on how closely they might be related semantically. For example, in a set of query sessions, it’s much more likely to see “become a dentist” followed by a query for “become a dental assistant” than by “become a doctor.” It’s likely that we’ll see people change their queries in such a manner when they are performing searches in a search session.
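The session-based idea can be sketched the same way. The snippet below simply counts which query follows which within a session; the function name `follow_up_counts` and the sample sessions are hypothetical, and the patent itself describes a more involved edit-distance measure than this raw count.

```python
from collections import Counter, defaultdict

def follow_up_counts(sessions):
    """Count how often each query is directly followed by another."""
    counts = defaultdict(Counter)
    for queries in sessions:
        # Pair every query with the one submitted right after it.
        for current, following in zip(queries, queries[1:]):
            counts[current][following] += 1
    return counts

# Made-up search sessions, each an ordered list of queries.
sessions = [
    ["become a dentist", "become a dental assistant"],
    ["become a dentist", "become a dental assistant", "dental assistant salary"],
    ["become a dentist", "become a doctor"],
]

counts = follow_up_counts(sessions)
print(counts["become a dentist"].most_common())
```

In this toy data "become a dental assistant" follows "become a dentist" twice and "become a doctor" only once, which is the kind of signal the summary describes for picking closely related query rewrites.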
Search Entity Transition Matrix and Applications of the Transition Matrix
|These search entities can include: |
- A query a searcher submits
- Documents responsive to the query
- The search session during which the searcher submits the query
- The time at which the query is submitted
- Advertisements presented in response to the query
- Anchor text in a link in a document
- The domain associated with a document
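For anyone wondering what a "transition matrix" over entities like these might look like in practice, here is a rough sketch. It assumes, purely for illustration, that we have observed pairs linking one entity to another (for example a query and a document returned for it) and turns the counts into row-normalized probabilities. The entity labels, the sample pairs and the function name `transition_matrix` are all hypothetical, and the patent may define its transitions quite differently.

```python
from collections import Counter, defaultdict

def transition_matrix(pairs):
    """Turn (source, target) observations into per-source probabilities."""
    counts = defaultdict(Counter)
    for source, target in pairs:
        counts[source][target] += 1
    matrix = {}
    for source, targets in counts.items():
        total = sum(targets.values())
        # Each row sums to 1, like a row of a transition matrix.
        matrix[source] = {t: n / total for t, n in targets.items()}
    return matrix

# Made-up observations linking query entities to document entities.
observed = [
    ("query:cats", "doc:example.com/cats"),
    ("query:cats", "doc:example.com/cats"),
    ("query:cats", "doc:example.org/felines"),
    ("query:felines", "doc:example.org/felines"),
]

matrix = transition_matrix(observed)
for source, row in matrix.items():
    print(source, row)
```

Because both queries lead to the same document in this sample, chaining such transitions (query to document and back to query) is one plausible way that related entities could be connected, though that reading is my assumption rather than something the list above states.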
When I sit and think about all the preceding concepts and how they could work together it seems to explain quite a bit of what appear to be inconsistencies and oddities in the results people are reporting lately.
JD_Toims - Thanks for doing all that research and putting these together in a thoughtful and coherent sequence. Thanks also to Bill Slawski, who over the years has kept on top of this material, and to whom we all owe a great debt.
I haven't had a chance to read through all of these by any means... but at quick glance the patents listed here focus on query rewriting and user intent. Hummingbird might be limited to these, but I'm thinking that it necessarily goes beyond them (and I'm guessing that you would agree with that.)
At the risk of listing what might already be included in the above, I'd say that semantic search (implicit perhaps in "search entities"), longer term personalization, and some measures of user satisfaction... building on what's come before... also play a large part in the overall algo. Hard to say with any certainty at this point in time. I also assume that it's a "learning" algorithm, which is going to evolve over time.
Thanks for the post and clearly the effort and research that has gone into it.
|Thanks also to Bill Slawski, who over the years has kept on top of this material, and to whom we all owe a great debt. |
+1 -- Was thinking the same thing earlier myself.
|Hummingbird might be limited to these, but I'm thinking that it necessarily goes beyond them (and I'm guessing that you would agree with that.) |
Absolutely -- The last 3 I added appear related to and incorporated within, but not encompassing of, the Hummingbird algorithm. The diagram/image on the "Hummingbird Patent Application" [ here: [pdfpiw.uspto.gov...] ] meshes the 3 together for the selection process, but there are still 200 other variables [including: PageRank, Panda, Penguin, etc.] that finish making up the whole of the Hummingbird algo.
@JD_Toims - +1 Great post, appreciate the time you spent putting this together, reviewing Bill Slawski's excellent work.
|longer term personalization, and some measures of user satisfaction... building on what's come before... also play a large part in the overall algo |
Years ago I would have suggested folks build their own semantic hierarchy through the site architecture e.g. using side navigation drill downs. Landing pages would need to be specific to each query with unique titles and inbound links based on the semantic hierarchy.
Fast forward 6 years or so, and Google has built up vast repositories of data on individual sites and query intent based on what users have been typing in, and how they've been responding, making the former somewhat redundant.
Bake that in alongside Panda, which focuses on user satisfaction, and Penguin, which removes manipulative link practices, and you are getting close to a search engine that needs to be told less about what the webmaster wants to be searched and more about what Google anticipates the user intends. ( Siteowners & SEOs need to adjust their thinking to be more aligned to this, to meet their clients' needs IMO ).
Overlay that with brand recognition and specific vertical differentiators [ e.g. real estate versus, say, medical ], along with social signals that cause Google to be a lot more responsive to input patterns and emerging trends.
[edited by: Robert_Charlton at 5:00 am (utc) on Oct 13, 2013]
[edit reason] added missing section, per poster's request [/edit]
|Siteowners & SEOs need to adjust their thinking to be more aligned to this, to meet their clients' needs IMO |
+1 for sure -- Hopefully you don't mind if I "interpret" what you said a bit so maybe more people will fully understand what I think you mean, because I think it's a really important point you're making, but I also think it might be a bit more "dire" than it sounds in the quote -- I'd say:
Siteowners & SEOs need to adjust their thinking to be more aligned to this, to be found at all in Google IMO.
BTW: I hope everyone who doesn't sit and read the full patents linked above will at least take the time to read the summaries by Bill Slawski.
I only included snippets of the summaries he shared, so there's quite a bit more to each full summary than my citations include -- I think it would definitely be time well spent by many for themselves, and also for the sake of discussion here, because people having an understanding of WTF is going on and the direction things appear to be heading usually makes for a more meaningful conversation about a given topic IME.
This is an extremely helpful thread, JD_Toims, and my thanks to Bill Slawski as well.
I think I'm just restating what others have said, but the implication of these patents is that Google's intention is to have the engine produce the sort of results another (extremely knowledgeable) human who "knows what you mean" would give you.
Unfortunately, the securing of keyword data makes it hard for me to tell how much or how little this is impacting my sites. I'm not noticing much difference in my own searches, either, so I'm just not sure what to think. How much evidence are any of you seeing that this works the way the patents imply it will?
Great thread. I experienced the results of this update this Saturday on one of my sites: I lost about 50% of traffic on a top branded, 15-year-old domain. Hundreds of queries that were placed on the first page are now way behind. I'm still trying to see what can be done, and I don't see any clue right now. What I'm seeing is a bunch of crappy sites on page 1 for all those queries, and a lot of good branded domains/sites pushed way down to lower positions. But looking at all the positions I lost, I'm even surprised I haven't lost more traffic. So this is probably another effect of Hummingbird, combined with the fact that we can no longer see what each keyword brings in terms of traffic. It's really hard to know which way to go to improve anything. The site already has good content and really good site stats.
|Fast forward 6 years or so, and Google has built up vast repositories of data on individual sites and query intent based on what users have been typing in, and how they've been responding, making the former somewhat redundant. |
Therefore, presumably, there is absolutely no necessity for anyone to go to the scraped site, since Google supplies the answer free of charge and no one queries whether it is correct or not?
Intention or not, Google is scraping sites for ITS own benefit, not for the webmaster/company who has provided the content. It is pure theft, nothing else!
Google's algo has not a clue whether ANY scraped information is correct or not, their current SERPs, and especially images, demonstrate this to perfection!
Just wanted to say thank you to JD_Toims for posting this information. It's straight to the point and provides some very good reading.