Forum Moderators: Robert Charlton & goodroi

Message Too Old, No Replies

Google No Longer Indexes all The Web

General Public Notices Google Quality Decline

         

Brett_Tabke

12:27 pm on Apr 10, 2019 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



[lifehacker.com...]

Tim Bray and Marco Fioretti noted that Google seems to have stopped indexing the entirety of the internet for Google Search. As a result, certain old websites—those more than 10 years old—did not show up through Google search. DuckDuckGo and Bing both still seem to offer more complete records of the internet, specifically showing web pages that Google stopped indexing for search.

bingdude

8:22 pm on Apr 12, 2019 (gmt 0)

10+ Year Member



The size of the internet is effectively infinite. Every day we (Bing) discover more than 100B new URLs never seen before, even after ignoring useless URL parameters. That infinity makes it technically impossible to index the whole internet.
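The parameter-ignoring step bingdude mentions is essentially URL canonicalization before deduplication. A minimal sketch, assuming a hypothetical ignore-list (this is a toy illustration, not Bing's actual pipeline):

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

# Hypothetical set of parameters a crawler might treat as "useless"
# for dedup purposes (tracking/session noise that doesn't change content).
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid", "ref"}

def canonicalize(url: str) -> str:
    """Strip ignored query parameters and sort the rest, so URLs that
    differ only in tracking noise map to a single canonical form."""
    parts = urlparse(url)
    kept = sorted((k, v) for k, v in parse_qsl(parts.query)
                  if k not in IGNORED_PARAMS)
    return urlunparse(parts._replace(query=urlencode(kept), fragment=""))

a = canonicalize("https://example.com/page?id=7&utm_source=mail")
b = canonicalize("https://example.com/page?utm_campaign=x&id=7")
print(a)       # https://example.com/page?id=7
print(a == b)  # True: both collapse to the same canonical URL
```

Collapsing such variants is one reason "URLs discovered" and "pages worth indexing" are very different numbers.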

tangor

8:43 pm on Apr 12, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



@bingdude ... preaching to the choir of "paying-attention crowd." :)

Thanks for the confirmation. That daily discovery figure is mind-boggling (though many of us already suspected as much).

Brett_Tabke

3:20 pm on Apr 13, 2019 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



> old content

and the algos are currently encouraging the removal of old content.

...making data-driven decisions about whether you should improve (update, rewrite, or consolidate) or remove (deindex) old content from search engines.


[searchenginejournal.com...]
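That improve-or-remove decision can be sketched as a tiny audit script. This is a toy illustration with made-up thresholds and field names, not the article's actual methodology:

```python
from datetime import date

def audit(page: dict, today: date = date(2019, 4, 13)) -> str:
    """Bucket a page into keep / remove / improve using hypothetical rules:
    recent pages are left alone; old pages with no traffic and no links
    are candidates for deindexing; old pages that still earn get updated."""
    age_years = (today - page["published"]).days / 365
    if age_years < 2:
        return "keep"
    if page["monthly_visits"] == 0 and page["backlinks"] == 0:
        return "remove"
    return "improve"

pages = [
    {"url": "/guide-2009", "published": date(2009, 1, 5),
     "monthly_visits": 0, "backlinks": 0},
    {"url": "/evergreen", "published": date(2015, 6, 1),
     "monthly_visits": 900, "backlinks": 40},
]
for p in pages:
    print(p["url"], "->", audit(p))
# /guide-2009 -> remove
# /evergreen -> improve
```

The point is only that the decision should be driven by measured signals, not by page age alone.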

RedBar

3:55 pm on Apr 13, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You could start to call it the 'fortune 1000' index only.


And judging by the quality of its SERPs in MY global industry, it is blatantly promoting and advancing US companies, since it seemingly neither knows nor cares about "other" countries' sites.

I just wonder whether or not they could actually index all these other countries and languages?

tangor

10:01 pm on Apr 13, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The strain on g is beginning to show. The web is "that big"!

From here on out expect g to pick "winners and losers" and deal with it.

iamlost

12:25 am on Apr 14, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Google has always picked 'winners' and 'losers', the 'playing field' has always been tilted, the 'rules' have often been changed.

The question for those (1) who run afoul or (2) prefer to spread risk is simple: where and how best to develop and build (1) return visitor traffic and (2) new traffic streams.

The only other options are (1) find another business model or other employment, (2) retire, (3) keep gambling on G.

There is still lots of money on G's table, a great many sites are and will continue to do well... However,
You've got to know when to hold 'em
Know when to fold 'em
Know when to walk away
And know when to run…


Best wishes to one and all.

EditorialGuy

1:11 am on Apr 14, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



From here on out expect g to pick "winners and losers" and deal with it.

I think it's less about Google picking winners and losers than it is about site owners condemning themselves to failure.

Part of a search engine's job is to separate the wheat from the chaff, and too many Web pages are little more than empty husks. Why shouldn't a search engine discard pages that any rational searcher would consider worthless?

tangor

2:33 am on Apr 14, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Why shouldn't a search engine discard pages that any rational searcher would consider worthless?


True words, and sadly they will fall on deaf ears plugged by fingers while the owners whine "Great Content! Great Content!"

Mules and 2x4s come to mind to get their attention.

flatfile

6:02 am on Apr 14, 2019 (gmt 0)

10+ Year Member Top Contributors Of The Month



I don't understand why people constantly complain about something that Google set out to do more than a decade ago. It was in 2008 that Eric Schmidt, then CEO, said the internet was turning into a "cesspool". [adage.com...]

Speaking with an audience of magazine executives visiting the Google campus here as part of their annual industry conference, he said their brands were increasingly important signals that content can be trusted.

"Brands are the solution, not the problem," Mr. Schmidt said. "Brands are how you sort out the cesspool."

It looks like Google has worked out that only a small portion of the web is useful and trustworthy. That trustworthiness seems to be based a lot on links. I think Google is constantly upping the threshold of what it considers to be a credible link, at least that's the net effect.

On a side note, I always find it strange when people complain about ranking losses while "proudly" announcing that they don't build links. Just look around! Most sites that have done well without link building are old. Some of the owners of those sites think they're doing well simply because of their "great content", which is a huge blind spot in my opinion because they end up misdiagnosing the problem when a competitor eventually replaces them. This place is littered with such examples.

My comment started out short but ended up being a rant. It's not directed at any particular person; it's just stuff I've observed here over the years.

StoneSolid

11:10 am on Apr 14, 2019 (gmt 0)

5+ Year Member Top Contributors Of The Month



@flatfile

In a way, I agree with you, but that is also the problem with current Google SERPs:
there is no way for a small site to get "out there" anymore, not even with super-specialized niche content. The SERPs will still belong to the big players with big backlinks.

Rndm

1:29 pm on Apr 15, 2019 (gmt 0)

5+ Year Member Top Contributors Of The Month



I think I have the most unpopular opinion of all time. If I were Google, I would make people register for some type of cert to be indexed, and I would not index any business without one. Spam is the problem, and despite what many think, regulation is the answer. There need to be more hurdles to getting a site into a search engine.

Brett_Tabke

10:50 am on Apr 16, 2019 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Also, look at some of the related posts in the article:

Google Memory Loss: (Tim Bray)
[tbray.org...]
When I have a question I want answered, I’ll probably still go to Google. When I want to find a specific Web page and I think I know some of the words it contains, I won’t any more, I’ll pick Bing or DuckDuckGo.

Google's forgetting the early web: (Cory Doctorow):
[boingboing.net...]
Tim Bray went looking through Google for some posts he knew about from 2006 and 2008 and found that Google couldn't retrieve either of them, not even if he searched for lengthy strings that were exact matches for text from the articles.

And then this classic:
The good news is that Bing and Duckduckgo both maintain much more complete indices of old posts and publications, and so if you're looking for stuff that's more than a decade old, you can switch to one of Google's competitors to find it.

Indeed, it seems that Google IS forgetting the old Web: (Marco Fioretti)
[stop.zona-m.net...]
Google would only return links to mentions, or even to whole copies, but archived elsewhere. I asked Google to reindex this whole website, but nothing changed. Yesterday afternoon, through BoingBoing I discovered Bray’s post. As soon as I read it, I tried DuckDuckGo and got the same result: Google ignores my copy of my own post, DuckDuckGo correctly lists it as first result



Then there is this awesome thread of search engines and resources on Hacker News:
[news.ycombinator.com...]

While it's become impossible to browse the wider Web with Google, it's getting a bit easier elsewhere. A few helpful search engines:
* [millionshort.com...]
* [wiby.me...]
* [pinboard.in...]
A recent movement to build personal Yahoo!-style directories:
* [href.cool...] (my own project)
* [indieseek.xyz...]
* [districts.neocities.org...]
* [the.dailywebthing.com...]
The above resources are focused on general blogging and personal websites - for software and startups, I would refer to the appropriate 'awesome' directories (https://github.com/sindresorhus/awesome or [awesomelists.top...])
[ubu.com...]
Here's another big art repository:
[monoskop.org...]
And a very well-documented collection (a "wiki") of paintings, also non-profit:
[wikiart.org...]


> Every day we (Bing) discover more than 100B new URLs never seen before, even after ignoring useless URL parameters.
> That infinity makes it technically impossible to index the whole internet.

Agreed, but the size of the 'old internet' is finite, and essentially unlimited storage is available at a cost. (I have a quarter petabyte in my small-business office; I can only imagine what a search engine could do with a few exabytes.)

Brett_Tabke

1:22 pm on Apr 18, 2019 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Related:
Digital Dementia – Are Google Search and the Web Getting Alzheimer’s?
[blog.ouseful.info...]

jmccormac

4:02 pm on Apr 20, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Looks like the ultimate SERPs poxification of Panda and all the other poorly thought out and completely anti-Web "algorithms" and adjustments is being played out with Google's index. Rather than fix the underlying problems with Google, these tweaks and kludges were the equivalent of a new coat of paint on the rotting husk of a worm infested wooden boat. Google had the opportunity to stop a lot of the problems with link pollution but it got the whole Natural Links thing backwards. Then the whole domain name business shifted in the last ten years or so. What had been a largely stable business started seeing massive problems with Domain Tasting circa 2005-2008. Domain names were being kept from expiring and the whole "all the good domain names are registered already" idea started propagating. But that wasn't so. Then the new gTLDs launched by 2013. The problem was that a lot of the demand for these new gTLDs had been destroyed in 2009/2010 when that restocking fee for tasted domain names (domain names deleted by registrars within the five day Add Grace Period) was introduced. Domain Tasting dropped from the peak of 40 million or so domain names a month. Then there was that clueless rubbish from Google about what makes a "good" website.

The registration volume in the gTLDs was falling and the registries were forced to rely on discounting registration fees for domain names. But what was really happening was that the renewal rates on new registrations were falling. In 2004, the renewal rate for new registrations after their first year was over 73%. This is different from the blended renewal rate published by registries. The blended renewal rate is the renewal rate for all domain names up for renewal in a given month. The first-year renewal rate for .COM over the last year or so is around 58%. That means that about 42% of the domain names registered in .COM will not renew. It is worse for some of the other gTLDs. There were some claims that the new gTLDs had impacted .NET registrations. This is wrong. The .NET TLD has been in decline since 2009. Discounted registrations do not renew at the same rate as full-fee registrations. It is that simple. So while they keep the registration volume of a TLD inflated for a year, the bubble bursts at renewal time. A registry has to either keep using discounting to drive sales or risk its TLDs shrinking.

It is far worse with some of the new gTLDs. A few of the new gTLDs had made discounting their business model. Not good registrations. Just discounted registrations. These NGTs attracted all sorts of bad actors and there was very little active development of websites in these new gTLDs. One had about a thousand developed websites and over one million adult affiliate landing pages/sites. The renewal rate on heavily discounted registrations is typically down around 5%. The renewal rate for the unnamed example is actually below 1% and it has over 1.5 million domain names at the moment. The replacement management of these gTLDs decided to stop discounted registrations and increased the wholesale fee for registrations (the fee charged to the registrars). The number of new registrations per month in some of these gTLDs collapsed from tens of thousands (or hundreds of thousands) to under a hundred. In the next six months, the largest example above stands to lose approximately a million registrations. However, these discount-driven new gTLDs are the exception. Many new gTLDs are doing relatively well in terms of development and renewals, but they are small. They have ccTLD dynamics rather than gTLD dynamics.

Discounted registrations are not generally developed into working websites. They drop without ever having been developed, more often than not at the first renewal. To the casual web user, this is all invisible. But at the domain name and search engine level, this is a problem. For .COM 2017 registrations renewing in 2018, there were 34,366,832 new registrations. Of these, 14,097,961 were deleted and 20,268,871 were retained. For .NET the retention was 53.16%. The .ORG TLD was better than both at 60.64%. The .BIZ was 32.04% (uses discounting). The .INFO was 29.36% (pioneered the discounting model). The .MOBI was 53.22%. The .ASIA gTLD was at 45.44%. The five largest new gTLDs use discounting and low-priced registrations. The .TOP (a Chinese-market NGT) was 28.18%. The .LOAN was 0.58%. That's not a typo. The .XYZ is at 23.49%. The .CLUB is at 17.85%. The .ONLINE is at 25.52%.
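For what it's worth, the .COM arithmetic above is internally consistent:

```python
# First-year retention from the .COM figures quoted above
# (2017 registrations coming up for renewal in 2018).
new_regs = 34_366_832
deleted  = 14_097_961
retained = 20_268_871

assert deleted + retained == new_regs   # the counts add up exactly
rate = retained / new_regs
print(f"first-year retention: {rate:.2%}")  # about 59%, i.e. ~41% drop
```

That ~59% figure matches the "around 58%" first-year renewal rate cited earlier in the thread.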

The problem for search engines is that they have to work out what is an actively developed website and what is an inactive (holding/PPC/unavailable/sale) site. This has to be done on a scale of hundreds of millions of domain names. And tens of millions of domain names checked this month won't exist next year. The failure of Google to deal with its link problems while waffling about what makes a "good" website (other search engines suffer from these problems too) means that new low quality sites created for link purposes keep appearing. There is even some very impressive webspam generation software available in the Chinese market specifically to take existing website content, SERPs and blog posts and churn them out as "new" websites specifically for advertising or backlinks purposes. Unless you were familiar with the software signatures, these sites are, to the ordinary web user, indistinguishable from real websites.

Combine all this with that portalisation of Google with its greed in keeping users on Google properties and feeding them advertising and there's a good argument for Google being the creator of the content problems it faces.

Regards...jmcc

RedBar

12:59 pm on Apr 22, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The last couple of days when I have been searching for some medical information on G I have been surprised to see results from the search engine ecosia.org being listed ... a search engine with results from another search engine and listing it on the first page?

MrSavage

5:08 pm on Apr 22, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The web is whatever Google says the web is. If they want the web to be 180 million websites, they can make it so. They dictate what is and what isn't. Using the word "web"? Replace that with "Google" and conversations would be a bit more accurate in real terms. I tend to think of the web in the past tense: the "web" shrinks every day I use Google.

I noticed nobody responded to the point I made about YouTube in the SERPs. Wouldn't that processing mean some other aspects of the "web" get left behind? I'm sure Google indexes most of the words spoken in YouTube videos and uses that to push those YouTube results in the SERPs. To think that all that processing doesn't mean cutting corners (like, say, indexing fewer websites or dropping older content) is somehow irrelevant? Google can't do it all. YouTube is part of the SERPs now, along with answers and other content. How can they do all that, yet still do everything they did before, and in the blink of an eye?

People accept the idea of Google dropping information out of the index as if it's what they have always done. Well? They never populated their SERPs with YouTube and scraped answer-box content before either. I gather, as with most things, that so long as people are "rolling in it", they tend not to care about ethics, morals, or propriety.

sunjun

11:33 am on Apr 24, 2019 (gmt 0)

5+ Year Member Top Contributors Of The Month



What determines whether an old URL stays indexed? I'd say traffic, backlinks, DA, PA, etc. But even once a URL is removed from the Google index, I'd argue Google keeps its original ranking signals and is ready to take it back into the index any time people talk about it or link to it again.
This 47-message thread spans 2 pages.