Tim Bray and Marco Fioretti noted that Google seems to have stopped indexing the entirety of the internet for Google Search. As a result, certain old websites (those more than 10 years old) no longer show up in Google search results. DuckDuckGo and Bing both still seem to offer more complete records of the internet, surfacing web pages that Google has stopped indexing for search.
...Pinboard, a minimalist bookmarking service similar to Pocket, which has a key feature for archivists: If you sign up for its premium service—$11 per year—Pinboard will make a web archive of every page you save.
...I've heard the odd word here or there from Google saying it doesn't index everything.
No surprise there.
Whether it shows all the content it indexes is another question.
One thing to add here - we don't index all URLs on the web, so even once it's reprocessed here, it would be normal that not every URL on every site is indexed.
Google No Longer Indexes all The Web
"It's only April, and 2019 has already been an absolutely brutal year for Google's product portfolio. "
We love being your neighbor and being part of the community. As sure as breakfast tacos are awesome, Google Fiber is here for good.
Our network is built to last. And with Google Fiber, you get all the Internet we can give you, all the time.
"to organize the world's information"
For them to notice that Google is no longer indexing all the content is significant.
There's an enormous amount of information in old books, newspaper archives, non-English publications, orally transmitted traditions, etc., which isn't on the web and dwarfs what is on the web.
The study, published in the journal Science, calculates the amount of data stored in the world by 2007 as 295 exabytes.
That is the equivalent of 1.2 billion average hard drives.
The researchers calculated the figure by estimating the amount of data held on 60 technologies, from PCs and DVDs to paper adverts and books.
"If we were to take all that information and store it in books, we could cover the entire area of the US or China in 13 layers of books," Dr Martin Hilbert of the University of Southern California told the BBC's Science in Action.
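As a rough sanity check on the figures quoted above (assuming the conventional 10^18 bytes per exabyte), 295 exabytes spread over 1.2 billion drives implies an "average hard drive" of roughly 246 GB, which is plausible for 2007-era hardware:

```python
# Sanity check on the Science-study figures quoted above.
exabytes = 295
drives = 1.2e9  # "1.2 billion average hard drives"

bytes_total = exabytes * 10**18       # 295 EB in bytes
per_drive_gb = bytes_total / drives / 10**9

print(round(per_drive_gb))  # roughly 246 GB per drive
```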
They have also crammed so many in-site links onto the SERPs now that organic exposure is falling fast. Thus, they are killing off swaths of the web (starting with older content).
If G is removing websites/pages purely based on age, then they need to be very careful.
They're not, and the whole "10 years old" theory is nonsense. I have no trouble pulling up obscure pages from 20+ years ago.
I've just been reading that Google will allow users to specify date ranges that include "before than". This suggests that Google might be aware of this criticism about dropping old results, and may be allowing those who are looking for these vanished results to find them once again.
...suggests another approach to unearthing old pages, which is to allow the user to choose the time segments, rather than attempting a full historical view in a set of ten links. The segmented approach is particularly useful in finding gold in old pages, which are otherwise buried under ever-growing layers of new results. How this will eventually relate to the apparent size of the web that Google is indexing remains to be seen.
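For reference, the date-range operators Google announced in April 2019 are written before: and after:, each taking a year or a YYYY-MM-DD date. A minimal sketch of building such a query as a search URL (the query terms themselves are a made-up example):

```python
from urllib.parse import quote_plus

# Hypothetical query using the before:/after: date-range operators.
# "before:" and "after:" accept either a bare year or YYYY-MM-DD.
query = "webmaster forums before:2010-01-01 after:2000"
url = "https://www.google.com/search?q=" + quote_plus(query)

print(url)
```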
...I'm very curious where Google is going with this... how many results are simply going to be Supplemental, and how many others are likely to become Vintage, like old wines, brought out for special occasions.