Forum Moderators: Robert Charlton & goodroi

Message Too Old, No Replies

Google : Rethinking Search: Making Experts out of Dilettantes

         

engine

3:25 pm on May 18, 2021 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Google has published a paper in which it argues that search as it stands today may not be the search of the future. The current model requires ranking to play a part in providing an answer, which places "a rather significant cognitive burden on the user." Today's search services, including Google, want to provide answers instead of purely ranked results. Google goes on to say that this has had limited success.
In addition, it explains that there has been much progress in natural language understanding, including "word embeddings", "sequence modelling", and "large pre-trained language models which capture relationships between entities".
This paper envisions a unified model-based approach to building IR systems that eliminates the need for indexes as we know them today by encoding all of the knowledge for a given corpus in a model that can be used for a wide range of tasks. As the remainder of this paper shows, once everything is viewed through a model-centric lens instead of an index-centric one, many new and interesting opportunities emerge to significantly advance IR systems. If successful, IR models that synthesize elements of classical IR systems and modern large-scale NLP models have the potential to yield a transformational shift in thinking and a significant leap in capabilities across a wide range of IR tasks, such as document retrieval, question answering, summarization, classification, recommendation, etc.


When you read this paper, it's quite clearly proposing the idea of an alternative way of searching and answering. Instead of document retrieval from an index, it's discussing pre-trained language models, model-based information retrieval, and even beyond language models.
If all of these research ambitions were to come to fruition, the resulting system would be a very early version of the system that we envisioned in the introduction. That is, the resulting system would be able to provide expert answers to a wide range of information needs in a way that neither modern IR systems, question answering systems, or pre-trained LMs can do today. Some of the key benefits of the model-based IR paradigm described herein include:
•It abstracts away the long-lived, and possibly unnecessary, distinction between “retrieval” and “scoring”.
•It results in a unified model that encodes all of the knowledge contained in a corpus, eliminating the need for traditional indexes.
•It allows for dozens of new tasks to easily be handled by the model, either via multi-task learning or via few-shot learning, with minimal amounts of labelled training data.
•It allows seamless integration of multiple modalities and languages within a unified model.
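To make the index-centric vs. model-centric contrast concrete, here's a toy sketch in Python. This is entirely illustrative and not from the paper: the two-document "corpus" and the stub `model_answer` function are invented stand-ins, with a plain dict playing the role of a large pre-trained LM's learned parameters.

```python
from collections import defaultdict

corpus = {
    "doc1": "the eiffel tower is in paris",
    "doc2": "paris is the capital of france",
}

# Index-centric IR: build an inverted index, then retrieve and rank.
inverted_index = defaultdict(set)
for doc_id, text in corpus.items():
    for term in text.split():
        inverted_index[term].add(doc_id)

def retrieve(query):
    """Rank documents by how many query terms they contain."""
    terms = query.lower().split()
    scores = {d: sum(d in inverted_index[t] for t in terms) for d in corpus}
    return sorted((d for d in scores if scores[d]), key=scores.get, reverse=True)

# The user still has to read the ranked documents to extract the answer.
print(retrieve("where is the eiffel tower"))  # ['doc1', 'doc2']

# Model-centric IR: the corpus knowledge lives inside the model's
# parameters, and a query maps directly to an answer. The dict below
# stands in for what a trained language model would have learned.
def model_answer(query):
    learned = {"where is the eiffel tower": "The Eiffel Tower is in Paris."}
    return learned.get(query.lower(), "(model generates an answer)")

print(model_answer("Where is the Eiffel Tower"))  # The Eiffel Tower is in Paris.
```

The paper's point is that the second path skips "retrieval then scoring" entirely; the open question is whether a real model can encode a whole corpus as reliably as this toy dict encodes one fact.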

Here's the paper (PDF) [arxiv.org...]

NickMNS

4:10 pm on May 18, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



TL:DR Google doesn't need websites any more. Users can query Google and they will provide the ad.




Whoops... I meant, provide the answer!

lucy24

4:32 pm on May 18, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The other day I did a search in the form “when did suchandsuch event take place?”* The People Also Ask section of the results included one about “suchandsuch event in {year}”. This was almost too meta for me.


* Because I've got a fixed mental block on whether it happened in 1905 or 1907. By the time this post is visible, I will again have forgotten.

zeus

4:38 pm on May 18, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hmm, does this mean Google will stop with traditional search results? I won't have a problem with that, because I think many still want to make a classic search for sites, and then they'll do it on duck.com, ecosia.org, ...

saladtosser

5:00 pm on May 18, 2021 (gmt 0)

5+ Year Member Top Contributors Of The Month



When this happens I think WebmasterWorld (before shutting down) should arrange a whip-round for Danny Sullivan, as his job, like most of ours, will be irrelevant ;)

NickMNS

5:58 pm on May 18, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hmm, does this mean Google will stop with traditional search results?

It depends on what you consider "traditional search results". One could argue that it isn't "will stop..." but rather "has stopped...".

In my opinion the most concerning part of this paper is not that it points to some future state but rather that it confirms the current state. The progressive erosion of search results by things like the "knowledge graph" and "people also asked" is, as suspected, more than a convenience: it is the direction the company is headed.

This raises the question: how far can they push this? At some point copyright must kick in; maybe it already has, as we have seen with online news. But how far can Google, as a monopoly, push past that point? Pushed to the extreme, the system must break: content creators will stop creating, and Google will be left with stale and obsolete information. What will the impact be on society? On knowledge?

Will something else emerge? Peer to peer, decentralized?

Wilburforce

6:48 pm on May 18, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



IR systems that eliminates the need for indexes as we know them


I know them as indices, which is probably why Google doesn't understand what I am looking for.

JorgeV

9:21 pm on May 18, 2021 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



Hello,

I am sure that Google will propose a Wikipedia-like page as the answer to any query. Eventually, they'll put the sites used as sources in the fine print, or they can simply "Cohort" the sources... and as for copyright infringement, do not count on it; Google is smart enough to train its engine with factual information and mix it.

Average users won't mind; in fact, they might prefer this. Remember when Google image search started linking images to the page instead of the image itself, and users went mad...

superclown2

10:23 pm on May 18, 2021 (gmt 0)



So Google is to become the arbiter of what is true and what is false, what is right and what is wrong? If they are allowed to do this the human race deserves everything it gets. Where is James Bond when we need him?

Wilburforce

5:56 am on May 19, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



All you need to do is ask a question in any forum - including this one - to see that humans bring many perspectives and answers to the same question. Some answers explain it better for the layman, some better for the expert; some are well-informed, some not; some are "correct", some not.

Related to this is a point Timothy Leary once made: (paraphrasing) to know what it is like to be a reptile you have to leave your human consciousness behind.

The aspiration to speak as humans speak and answer questions as some kind of superhuman who provides all answers for all people is ridiculous. All the AI in the world won't give a machine the nose of a dog.

The move from returning highly relevant results from text queries to a dataset - in which Google excelled - to trying to answer questions as a superhuman would has led (in my view and experience) to a significant weakening in returning relevant results.

As for their ambitions, Google could try listening to Macbeth:

"I have no spur
To prick the sides of my intent, but only
Vaulting ambition, which o'erleaps itself,
And falls on th'other. . . ."

FranticFish

7:31 am on May 19, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The move from returning highly relevant results from text queries to a dataset - in which Google excelled - to trying to answer questions as a superhuman would has led (in my view and experience) to a significant weakening in returning relevant results.

Absolutely.

Since Google took keywords away from organic Analytics data, it's been impossible to measure how good their AI is. Without the search phrase you're locked out.

However, because you can still get relatively complete data via Google Ads, it's possible to see how good their AI is there - and with their new 'close variant' nonsense you can see what they consider relevant to your keywords - and they haven't a clue. The AI (at least in practice) has NO ability to distinguish between transactional and informational intent, nor has it any clue that certain modifiers (modifiers they now foist on you) COMPLETELY change the intent of a search phrase. They've gone from suggesting what they see as related keywords to just showing your ads for them.

Of course, this could just be corporate greed rather than a lack of ability to determine intent, and perhaps arrogance : they're clever, they know best. Or perhaps they've decided they'll let you help their AI learn while you pay them for the privilege via clicks you never wanted.

In practice, it makes no difference. People and organisations are NOT what they say they're doing (or trying to do), they're what they actually DO. And what they're doing is trashing one of their two core purposes: relevancy.

engine

9:52 am on May 19, 2021 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Remember, this is a paper about the future of search, and Google is an advertising company. Somehow, Google will have to monetize this. The questions I have are: where would the ads go, and what ad opportunities could this create? AdSense would not work with this.

It's more likely this will be developed alongside current search, and it may be offered as an alternative search service, not a replacement.

Google will have to join the dots before it becomes mainstream.

Achernar

2:43 pm on May 19, 2021 (gmt 0)

10+ Year Member Top Contributors Of The Month



From an "advanced" user's perspective (using groups of words, including/excluding terms/websites), I've found that their search results have become more and more filled with irrelevant data.
In the past (probably 10 years ago) I regularly had results with no entries at all - which was perfectly logical. But now Google systematically returns results for single words when I specifically asked for a group of words - and, even worse, for words that only slightly resemble one of them.

If their future system is based on that, it does not bode well for us.

samwest

2:57 pm on May 19, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



All your base are belong to us.

lucy24

3:35 pm on May 19, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



But now Google systematically returns results for single words when I specifically asked for a group of words - and, even worse, for words that only slightly resemble one of them.
When doing an exact-match search, it is reasonable to "put the whole thing into quotation marks". But who's going to put "each" "separate" "word" "into" "quotation" "marks" just to convince the search engine you really mean it?

I routinely get results that omit one of my search terms, each with the fine-print option of "must include"--which, yes, redoes the search with the previously omitted term in "quotes". Dammit, search engine, if I use a particular word in my query, it's because I want the search to include that word. It really seems as if this kind of thing should be a Search Preference, stored permanently just like your preferred language, or number of results. (And, if so, "I really mean it" ought to be the default assumption.)

Achernar

7:19 pm on May 19, 2021 (gmt 0)

10+ Year Member Top Contributors Of The Month



When doing an exact-match search, it is reasonable to "put the whole thing into quotation marks". But who's going to put "each" "separate" "word" "into" "quotation" "marks" just to convince the search engine you really mean it?

It used to work perfectly, but not anymore. I'm using exact match (between quotes), but even then I still receive results with an incomplete match (only one word of the phrase matches).
IIRC, forcing a word match was also possible by prefixing a word with a + sign. Not anymore.

aristotle

1:01 am on May 21, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



A lot of searchers are just looking for one little tidbit of specific information. There are also a lot of people that rarely read detailed articles unless perhaps they are very short. At best these people might skim through a longer article, mostly looking at headers and glancing at images. Most likely Google can frequently identify these types of searchers, both from their past behavior and from the characteristics of the search term, so it may often already tailor its search results and their presentation accordingly.

JorgeV

12:01 pm on May 21, 2021 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



Hello,

a lot of people that rarely read detailed articles

There is a lack of attention from people, and they mostly scan a page instead of really reading it, as you said... but publishers also share responsibility. The holy SEO rule about how many words an article needs to contain means that a lot of publishers are writing long but uninteresting, boring articles, stretching the text with tons of satellite information or rehashing.

Today, I wanted to know all the possible values for "Referrer-Policy", so besides the MDN page I checked other sites. One of them had an endless page about it, beginning by explaining what a referer is, what a web page is, what a server header is, what a meta tag is, then a history of "Referrer-Policy", and finally the list of possible values...
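For what it's worth, the tidbit being hunted fits in a dozen lines. A quick sketch: the value list is per the W3C Referrer Policy spec as documented on MDN, and the tiny stdlib handler is just one illustrative way to send the header (a `<meta name="referrer">` tag in the page works too).

```python
# The complete set of Referrer-Policy values: the entire "tidbit"
# that endless page buried at the bottom.
REFERRER_POLICIES = [
    "no-referrer",
    "no-referrer-when-downgrade",
    "origin",
    "origin-when-cross-origin",
    "same-origin",
    "strict-origin",
    "strict-origin-when-cross-origin",  # the default in modern browsers
    "unsafe-url",
]

# One way to set it: as a response header from a minimal stdlib server.
from http.server import BaseHTTPRequestHandler

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Referrer-Policy", "strict-origin-when-cross-origin")
        self.end_headers()
```

Eight strings and one header line, versus several thousand words of preamble: which rather proves the point about stretched articles.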

martinibuster

1:22 pm on May 21, 2021 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Wow, so many of you have no idea about what this is about and so this entire discussion, the whole thing, is off topic. Sad. :(

The Dilettantes paper was published this month, 5/21.

That paper is addressing the problem of Long Form Question Answering (LFQA), a shortcoming in Bing and Google and Facebook, all of whom are trying to figure it out.
[searchenginejournal.com...]

LFQA is a type of query that people don't make because search engines can't handle it. So they do up to eight search queries to learn what they want to learn.

These queries require multiple paragraphs to answer, not a featured snippet or ten blue links, and Google can't handle that because SEARCH offers superficial knowledge, i.e. it is a dilettante.

Google MUM, which was also just announced, is Google's attempt to deal with LFQA.

Explanation here:
[searchenginejournal.com...]


Good luck.

Roger Montti


Wilburforce

10:17 pm on May 21, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Wow, so many of you have no idea about what this is about


What this is about is the difference between querying a dataset and answering a question. As soon as Search tries to move from the former to the latter, it moves from trying to do what machines do easily and well to trying to supplant what people do easily and well. The amount of processing required for number-plate recognition - while substantial - is trifling compared to the amount required for facial recognition, but most people recognise someone they know instantly and effortlessly.

Similarly, people are pretty good at recognising what is being asked and what kind of answer will satisfy the question - whether (as in your linked examples) it is a "search for knowledge" question or an exam question - but to set out the parameters for a machine to do it is far more complex, and the results - insofar as we see evidence of Google and other SEs trying to do it - are woeful. It is obvious, without the need to submit a paper on it, that it is difficult to write a program that can answer the kind of question any third-form student would find easy.

For me, it is simple: the purpose of programming machines is to facilitate tasks that I don't find easy - like storing and retrieving individual items that comprise large bodies of data, or performing iterative computation. As soon as someone gets the idea that its purpose is to outperform me in tasks I already find effortless, they stop trying to make my life easier and start trying to make it redundant. When Google helped me do what I couldn't, Google was useful. If, instead, Google wants to compete with me for what I can already do, it is time to throw in my hand and go elsewhere.

Make no mistake: "What this is about" is the point at which - if you are able to throw enough processing resources at the problem - a machine can tell me this is my neighbour's face. I already knew that.

[edited by: Wilburforce at 10:57 pm (utc) on May 21, 2021]

Wilburforce

10:34 pm on May 21, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



...and to distill and clarify...

martinibuster says this is a technical question we don't understand.

I say it is an existential question he doesn't address.

martinibuster

10:55 pm on May 21, 2021 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



>>>The amount of processing required

Google solved that.

The details are in the second or third paragraph of the article about MUM that I linked to above. You might want to give it a close read so you can understand what this paper is really about.

It's a very easy-to-understand article; even non-search people will be able to pick it up.

MUM is 1,000 times more powerful than BERT, which means that the resource cost is substantially lower.

The dilettantes paper and the MoSE paper I discuss appear to be very much related to MUM. But so are a number of other new technologies that work together to speed things up and scale. It's not just those two.

Wilburforce

11:23 pm on May 21, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



@martinibuster

Yes, but it isn't about processing or technical issues, it is about what the processing and technical wizardry are trying to do. For example:

I have a Thing.

What does it do?

Well, for example, if you bring it your bag of strawberries, it will find in that bag of strawberries the sweetest and most succulent one.

What a wonderful Thing! I will buy into your Thing.

Yes, and now you have bought into it, I can Improve it.

How will you Improve it?

Well, for example, if you bring it your bag of mixed fruit, it will tell you which fruit you prefer.

Whether or not "non-search people" can pick it up, and regardless of the underlying engineering complexity, which is self-evidently beyond most of us, the fundamental question is whether - if all the processing power is available, and all the technical limitations are overcome - Google. Can. Do. Everything.

It really doesn't matter what the answer is. As soon as that aspiration is on the table, Search/SEO/whatever-this-forum-is-about becomes irrelevant. We're no longer creating content for SEs or our users, but competing with Google for a need to exist at all.

martinibuster

5:12 am on May 22, 2021 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



We're no longer creating content for SEs or our users, but competing with Google for a need to exist at all.


I understand what you mean and would share your concern and amplify it if there was something more material than fear to be concerned about.

We are not yet at the point where we can point to anything as an existential threat. There is literally nothing there to point to, to say, this thing is a threat. There is no thing there to be afraid of.

It's a non-existent threat. There is no basis for worry, nothing that will affect anyone, you're crossing a bridge that hasn't yet been reached.

We don't know what this system will look like, where it will be deployed and what the source of the data will be, so we can't even point to something that is on the horizon that is coming to eat our lunch because there is nothing on the horizon to squint at, point to and raise an alarm about.

There is literally nothing there (to worry about), much less an existential threat.

Wilburforce

12:44 pm on May 22, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



There is no basis for worry, nothing that will affect anyone


Yes. Preventing a 737 from stalling induces a rather significant cognitive burden on the pilot.

aristotle

2:00 pm on May 22, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



JorgeV wrote:
There is a lack of attention from people, and they mostly scan a page instead of really reading it, as you said... but publishers also share responsibility. The holy SEO rule about how many words an article needs to contain means that a lot of publishers are writing long but uninteresting, boring articles, stretching the text with tons of satellite information or rehashing.

I agree that many publishers write poor articles.

But my post was about people who generally won't make the mental effort to read any articles, good or bad. Many of them would rather watch videos, look at images, or wander aimlessly around the web.

Also, as I mentioned, many searchers are just looking for one little tidbit of information, and don't want to have to read an article to find it.

As for how all of this relates to the topic of this thread, the types of people I'm describing would have little use for LFQA and would likely be unable to adapt to it anyway.

FranticFish

2:25 am on May 23, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



We are not yet at the point where we can point to anything as an existential threat

There's probably - at best - a quarter of the number of people posting in the parts of this site that I frequent as there were when I joined. I would say we are well into the existential-threat stage.

There is no thing there to be afraid of

I see a monopoly that appears - to me - to want to change the meaning of the questions people ask. No doubt the organic algorithm and the Google Ads algorithm are very different, but the recent changes to Google Ads can - I think - be used as a weathervane for the mindset within Google. And what they are doing, as far as I can see, is changing the intent of the query by the way they group search phrases into topics.

For some time now, when I search (organic) and don't get what I want, I vary my query - but the results don't change. From Google's point of view I'm 'asking the wrong question' - but it won't let me see the answers to the wrong question I asked. Their 'we know best' attitude is starting to grate, because they do not. And now it appears to be happening even with ads.

Examples (N.B. exact match):
- I am bidding on 'topic service provider' and they show my ads for 'topic'
- I am bidding on a verb and they show my ads for the noun.

This is REALLY BASIC stuff. If we're to be charitable to Google and not regard this as a cynical move to grab advertisers' money, then they have drunk their own Kool Aid.

the whole thing, is off topic

You're right, it is. But if I could simplify the discussion I would do so as follows...
'Google have published a paper about the latest big clever stuff to do with search.'
People: 'Oh dear. They are starting to mess up the simpler stuff that they used to do very well.'

martinibuster

9:47 am on May 23, 2021 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



The sky was falling on WebmasterWorld in 2012 when Google announced their Knowledge Graph integration.

Why is it that so many people scream it's the Death of the Web every time Google announces a new technology?

It's 2021, the Internet is still here. I'm not trolling you. I'm ENCOURAGING you to consider facts and stop this repetitive nonsense.

From the 2012 WebmasterWorld discussion about the Knowledge Graph announcement:


Time proved this was wrong
The result will be that web masters who currently produce fascinating websites will never get visited, and they will just give up.

Time proved this was wrong
If they succeed, many sites will never need to be visited at all.

Time proved this was wrong
... dreaming if he thinks he can essentially steal content and claim the users of that content to be his. When the owners of that content receive no benefit from Google for having been scraped...

Time proved this was wrong
If Google keeps those visitors to itself, where then is the symbiotic relationship...?

Time proved this was wrong
Ignoring the web ecosystem in order to put forth a google version of "knowledge"?

Wilburforce

3:19 pm on May 23, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Why is it that so many people scream it's the Death of the Web every time Google announces a new technology?


Death by 1,000 cuts is still Death. The fact is that some webmasters have not survived, and many others have lost rank, traffic, trade, or all three.

I disagree with the view that we have nothing to worry about: "What was once a rich selection of blogs and websites has been compressed under the powerful weight of a few dominant platforms" (Sir Tim Berners-Lee, March 2018) and "It’s not that we need a 10-year plan for the web, we need to turn the web around now" (ibid, November 2019).

As for time, all it has proved, now that this paper is on the table, is that all those claims signposted Google's intended direction of travel. That hasn't changed, and - whether or not anyone is worried - I believe it will continue to affect all of us, in most cases adversely.

NickMNS

3:33 pm on May 23, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The problem with the statement is that it is framed in absolute terms, e.g.:
Time proved this was wrong
The result will be that web masters who currently produce fascinating websites will never get visited, and they will just give up.

"...will never get visited, and they will just give up".

While a site with fascinating content still gets traffic today, the quantity and quality of that traffic as compared to 2012 are lower. Now, I realize that this statement is difficult if not impossible to prove, and there lies the crux of the problem. We can take advertising revenue as a proxy: for the same traffic level x, the advertising revenue earned today has fallen considerably. This despite the fact that the value of online advertising as a whole has risen over the same period (~4x). If all else were equal, webmasters should also have seen a 4x increase in revenue, or at least a sizable increase. But we haven't.

...and they will just give up.

Many have and many more will. I myself have basically given up: my website still exists but I have barely put any work into it in the past few years, because I realize that depending on Google is not a sound business plan, so I focused my attention on other things that don't rely as heavily on Google. Side note: you will likely say "Ah! You have given up, that's why your view is so grim," but no, this year is looking like it will be my best year ever, and that still isn't enough.

Google is not stupid; they have very many smart people working for them, business strategists and economists. They know how far they can and can't push this, so traffic will not fully dry up; they will find the optimal point. But you can be sure that they will extract as much rent from webmasters as possible, and this new technology makes it possible to extract just that much more.

This isn't a sudden death scenario, it is more a frog in boiling water scenario, and it is getting hot in here.