Mods note: Thanks, mihomes, for sharing this, and also for being sensitive to the non-promotional aspect of our posting guidelines. Your example is sufficiently generic, I feel, and very definitely worth discussing. Knowledge Graph carousels... I'm assuming that these are Knowledge Graph results, results based on "factual lists", as stated in one of the announcements where Google first described its Knowledge Graph carousels....
Explore with the Knowledge Graph carousel in English globally, Sept 5, 2012 [search.googleblog.com...]
Google's early carousels were effectively lists of things, places, "named entities", etc., classified in ways that put them in the context of easily-defined universes... e.g., actors who played James Bond, hotels in specific cities, entertainment groups or credits, events, music, and so on.
As the 2012 Google announcement described it...
...drawing on our Knowledge Graph and the collective intelligence of the Web
Where are they getting the data? How that Knowledge Graph and the "collective intelligence" of the web are assembled is a long discussion, and I'd be guessing at a lot of it. Google made it clear that its movie carousels, e.g., were not assembled by spidering IMDB or other sources on the web. My thoughts below are mainly about the algo that is providing structure to these new categories.
Freebase, an open source "collaborative knowledge base" [en.wikipedia.org...], is well known to be one source of that collective intelligence, though. A lot of volunteers spent a lot of time putting it together.
Schema.org markup is another source, but note that Google doesn't simply rely on schema markup by itself. As with business Knowledge Graph listings, e.g., where Google uses several channels for confirmation, schema markup helps Google with discovery, but Google relies on other confirming signals as well. Once Google has even a single data point, much less computing power is required for the rest.
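For anyone who hasn't looked at schema.org markup up close, here's a minimal sketch of the kind of machine-readable data point a crawler can pick up from a page. The HTML snippet and all names in it (ExampleEditor, SchemaExtractor) are hypothetical, invented for illustration; this is not how Google's own extraction works, just the sort of itemtype/itemprop structure the markup exposes.

```python
# Minimal sketch: pulling schema.org microdata out of a page snippet
# using only Python's standard library. The HTML below is hypothetical.
from html.parser import HTMLParser

PAGE = """
<div itemscope itemtype="http://schema.org/SoftwareApplication">
  <span itemprop="name">ExampleEditor</span>
  <span itemprop="applicationCategory">Text editor</span>
</div>
"""

class SchemaExtractor(HTMLParser):
    """Collect itemtype declarations and itemprop name/value pairs."""
    def __init__(self):
        super().__init__()
        self.itemtypes = []      # entity types declared on the page
        self.props = {}          # property name -> text value
        self._current_prop = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemtype" in attrs:
            self.itemtypes.append(attrs["itemtype"])
        if "itemprop" in attrs:
            self._current_prop = attrs["itemprop"]

    def handle_data(self, data):
        if self._current_prop and data.strip():
            self.props[self._current_prop] = data.strip()
            self._current_prop = None

parser = SchemaExtractor()
parser.feed(PAGE)
print(parser.itemtypes)  # ['http://schema.org/SoftwareApplication']
print(parser.props)      # {'name': 'ExampleEditor', 'applicationCategory': 'Text editor'}
```

The point of the sketch: one itemtype declaration is exactly the kind of "single data point" mentioned above... a declared entity type plus a name that other signals can then confirm.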
How much statistical phrase-based indexing data has been used in Knowledge Graph results isn't clear. Phrase-based indexing offered ways of using data clusters to help identify topics and entities, and... complete conjecture here... it's possible that something analogous might help provide "match points" for the Knowledge Graph, to then align with other entity information.
What's new about these results... The new "software" classification in these results is IMO a broader leap than what has come before, as "software" is less well defined than "actors who have played James Bond". Some of what I'm seeing might push the boundaries of what is commonly considered "software". In the "Software > websites" carousel, e.g., "AdSense" is one of the logos included... and it's not what I'd traditionally call software.
I'm assuming that "AdSense" was brought up by the same algo that brought up the rest. I mention this because I doubt that Google is trying to promote its own product here; rather, we might look for another reason for its inclusion. Clearly, in these software lists, there is a significant size or prominence factor. Without machine learning tools, I'm not sure we could readily spot other factors.
Also worth noting, in Knowledge Graph fashion, the query changes as the Knowledge Graph hierarchy changes. So, for the carousel display "Software > websites", the query is [website software].
If you click on "Software" above the carousel, you'll get a new, higher-level carousel display with more general types of software... operating systems, browsers, the .NET Framework, Adobe Acrobat, etc... and the query display changes to [list of software].
The query [list of browsers] also shows an expected list. If you remove "list of" from the query, Google doesn't return the carousel. It's fairly rough, clearly a trial balloon, and, as with all things Google, I'm guessing it's also a test to see how users deal with it.
With regard to naming specifics in this discussion, I assume that these carousels are going to continue to be general enough that we can mention the queries that bring them up.