
Forum Moderators: Robert Charlton & goodroi

Google Cache returning 404

     
5:20 pm on May 6, 2018 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:May 3, 2017
posts:111
votes: 6


I noticed my site has been completely dropped from Google's cache. It shows a 404. I tried indexing new content and it went into the index and the cache fine. Site is still cached. Header looks OK. Has anyone had this before? Site is 10 years old. No changes.
6:38 pm on May 6, 2018 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:4085
votes: 257


Are you seeing this in the new GSC? There seems to be a number of bugs to work out in there.
11:43 am on May 7, 2018 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:May 3, 2017
posts:111
votes: 6


Thanks for replying - it's in the cache of all my pages. So when I search site:mydomain all my pages appear (10k+), but when I click "cache" they all show a 404 in Google, e.g.

The requested URL /search?q=cache:BhZF5SW2_bFJ:https://mysiteurl/+&cd=1&hl=en&ct=clnk&gl=uk was not found on this server. That’s all we know.

When I go into render in webmaster tools it indexes new content fine with a cache. Traffic drop of 5% since seeing this.
3:49 pm on May 7, 2018 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:4085
votes: 257


The site: search has no protocol while the cache search does. Have you recently changed to https?

If so, have you added the "new" domain to GSC and ensured that they are de-indexing the old http version of your site while indexing the new https site? Make sure that the change uses a 301 (permanent) rather than the Apache default 302 (temporary) status. I ask because you should be able to visit both sites within GSC and see the new version's indexed pages increasing while the old version's indexed pages decrease. Sitemaps can help speed the changes.
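
To illustrate the 301-versus-302 check described above, here is a minimal Python sketch. It is only one possible way to look at the redirect, not a description of what GSC or any crawler does. The function names `fetch_redirect` and `classify_redirect` are made up for this example, and `www.example.com` in the usage note is a placeholder.

```python
import http.client
from urllib.parse import urlsplit


def fetch_redirect(url, timeout=10):
    """Request `url` once, without following redirects.

    Returns (status_code, Location header or None).
    http.client never follows redirects on its own, which is what we want here.
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("HEAD", target)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def classify_redirect(status, location):
    """Describe an http-to-https redirect from its status code and target."""
    to_https = bool(location) and location.lower().startswith("https://")
    if status in (301, 308) and to_https:
        return "permanent redirect to https"
    if status in (302, 303, 307) and to_https:
        return "temporary redirect to https"
    if status in (301, 302, 303, 307, 308):
        return "redirect, but not to https"
    return "no redirect"
```

Something like `classify_redirect(*fetch_redirect("http://www.example.com/"))` would then report whether the http URL answers with a permanent or a temporary redirect. The classification is kept separate from the network call so it can be checked without touching a live server.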
9:25 am on May 8, 2018 (gmt 0)

New User

joined:Nov 16, 2016
posts:10
votes: 0


This thread might help, answer from John Mueller: [reddit.com...]
9:32 am on May 8, 2018 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 891


We're currently in the Mobile-first Index Update. Site data is migrating to the new index little by little.

I would give it some time. Things like this happen all the time.
1:22 pm on May 8, 2018 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:May 3, 2017
posts:111
votes: 6


Hi guys, I went over to https in October 2017, but the Google cache seems to be indexing my site under http rather than https, e.g.

https://webcache.googleusercontent.com/search?q=cache:BhZF5SW2_tEJ:https://www.example.com/+&cd=1&hl=en&ct=clnk&gl=uk (Shows 404)

This link resolves the cache but notice the http?
https://webcache.googleusercontent.com/search?q=cache:BhZF5SW2_tEJ:http://www.example.com/+&cd=1&hl=en&ct=clnk&gl=uk (Shows the cache) but the site is in https

301 Redirects are all fine with screamingfrog and headers fine.

Any ideas?




[edited by: not2easy at 6:45 pm (utc) on May 8, 2018]
[edit reason] de-linked for readability [/edit]

6:25 pm on May 8, 2018 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 891


Again, give it time. If you have correctly installed the 301 to HTTPS and HTTP paths are no longer possible, indexing (including cache) will catch up eventually.
8:52 am on Sept 17, 2018 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Apr 3, 2002
posts: 2580
votes: 0


Samsam,

Did your missing-cache issue get resolved?

This problem began with one of my sites about 4 months ago and continues to this day. During that time period, rankings have continued to decline -- worsening with each update. Two other websites that I work on are no longer cached in Google and I fear they may suffer the same fate. It seems that the missing cache problem is affecting more and more websites. I've been monitoring this phenomenon for months, and I have been unable to determine a cause. Some have speculated that large, site-wide structural changes or a site's transition into the mobile index could be culprits. That explanation seemed reasonable initially. However, I've watched other websites migrate to https (large site-wide change), transition to the mobile index, and continue to be cached.

It's a frustrating problem. As it pertains to my websites, if I have something misconfigured on the server or if there are underlying issues inside Wordpress, I haven't been able to identify it. I just hope the long-duration/steady decline in rankings on one uncached site isn't a sign of things to come for newly-affected sites.
2:24 pm on Sept 17, 2018 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member redbar is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Oct 14, 2013
posts:3079
votes: 436


I've also been monitoring this for months to see if I can ascertain any common factor, and I can't. I have therefore come to the conclusion that it's completely broken ... for some of us.

The missing cache has not affected any of my rankings. I have had no site-wide structural changes, I've been mobile/responsive ready for years, and I do not use WordPress.

I would like to say that nothing is misconfigured on the server; however, I had one question/doubt about this. I use the Plesk hosting platform and I was fairly sure that their "Permanent SEO-safe 301 redirect from HTTP to HTTPs" was not 100% perfect.

I have just changed "is" to "was" in the last sentence because I have just tested a very small known problem I had but which now does not exist. This is new, I'll see what happens, otherwise I have no idea what could cause this missing-cache issue.
3:28 pm on Sept 17, 2018 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2004
posts:1064
votes: 8


We have been afflicted with this phenomenon rather dramatically. The increase seems to have coincided with a recent Plesk upgrade we underwent to Plesk Onyx, but we have no evidence of correlation with that.

We have sites with this affliction that are https, some that are http, some WordPress, some not.

"That can happen, it doesn't mean anything."

Doesn't mean anything for whom? Obviously it "can" happen, and there has to be a reason why.
8:10 pm on Sept 18, 2018 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:July 29, 2007
posts:1823
votes: 107


I applied a sitewide Google noarchive meta tag to keep my pages from even having a cache. I whitelist my domains in adsense and earn nothing from clicks in cache. This hasn't caused me any ranking issues and it stops a way to scrape my content without me seeing a hit in my logs. While it may help people when my site is down it's almost never, ever down and it's fast.

I guess you have to decide how important google cache is to your site before you worry about it too much.
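
For anyone who wants to confirm whether a page actually carries the noarchive directive mentioned above, here is a rough Python sketch. It assumes the directive is set through a `<meta name="robots">` or `<meta name="googlebot">` tag in the HTML and does not look at HTTP headers. The class and function names are invented for this example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None for bare attributes, so normalise them.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for item in attr.get("content", "").split(","):
                if item.strip():
                    self.directives.add(item.strip().lower())


def has_noarchive(html):
    """Return True if the HTML contains a robots/googlebot noarchive directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noarchive" in parser.directives
```

Feeding it the source of a page that has `<meta name="robots" content="noarchive">` in its head should return True, while a page without the tag returns False.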
5:44 pm on Sept 19, 2018 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2004
posts:1064
votes: 8


I guess you have to decide how important google cache is to your site before you worry about it too much.

"404. That’s an error. The requested URL /search?q=cache: (example site here) was not found on this server. That’s all we know."

Well, the site is on the server, it's not a 404, and "That's all we know" isn't true (they certainly know why it's happening). In the past, when you would get this occasionally, it wasn't hard to shrug your shoulders and "not worry about it" because inevitably it would go away. I think most of us, over many years, have gone about our business with the understanding that what Google displayed in their search results was based upon a cached copy of a site, a reference point that in most cases you could take a look at.

But when Google pukes up a 404 consistently for three weeks, for 50 sites, I wouldn't say it causes me worry, but more curiosity and the suspicion that comes from them trying to casually brush off the question.

There is no cause for concern because at this juncture there's nothing you can do about it, and there is no evidence it's causing anyone difficulty. But the question remains: why is this happening, why is it so widespread compared to the past, and why is it dramatically increasing? What is your feeling if it got to the point where being able to see a cached copy of a site was the exception and not the norm?
6:39 pm on Sept 19, 2018 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member redbar is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Oct 14, 2013
posts:3079
votes: 436


The thing is though a few months ago it was only my .coms which were affected and my .co.uks were fine, now even they have the error therefore it seems to have spread throughout their whole system.

Has anyone any tlds etc that are not affected, if so, which ones?
8:08 pm on Sept 19, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15314
votes: 708


Well, the site is on the server, it's not a 404
How do you know what's on Google's server? (We will set aside the issue of Google's spectacularly useless 404 page, which can serve as a bad example for lazy webmasters everywhere. But that's a different thread.)
9:53 pm on Sept 19, 2018 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:July 29, 2007
posts:1823
votes: 107


Are we talking about the little dropdown arrow next to your page title in serps that contains the "cache" option?

When you apply a Google noarchive meta tag, that cache option disappears in Google. Visitors are forced to see your live site. I really don't see a problem with Google showing a 404 for their cache copy when your site is just fine; it's not a problem on your end. Again, does it benefit you for visitors to see your content from Google's cache copy instead of from your site directly?

The only scenario in which it would is if your site goes down and someone wants to see it and so clicks on the cache link to see it there. I'm not suggesting you block your site from Google's cache, it's default behavior for it to be there, but it's not bringing you traffic. In my case I suspect it was bringing me scrapers who could access my site in the cache and never leave a trace in my server logs.
8:58 am on Sept 20, 2018 (gmt 0)

New User from EE 

joined:Dec 28, 2016
posts:19
votes: 3


I haven't checked all of my websites' cached versions, but the ones I have (that also have HTTPS), all show 404 when checking cache, but I don't see how it's a problem for me. AFAIK, it's a Google issue that doesn't hurt me whatsoever (since replacing HTTPS in the URL with HTTP shows the cached version correctly)
3:06 pm on Sept 20, 2018 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2004
posts:1064
votes: 8


How do you know what's on Google's server?

Every single solitary thing regarding web sites returned in their search results, and anything even remotely connected to any activity generated from that, is on their servers; that is a given. They're not real big on ignoring things, or deleting data.
4:42 pm on Sept 20, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15314
votes: 708


“In their database” != “on a public server”
4:48 pm on Sept 20, 2018 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2004
posts:1064
votes: 8


“In their database” != “on a public server”

I'm sorry, I don't follow you. I'm genuinely curious about your take on this; if you could elaborate I would be appreciative.
5:10 pm on Sept 20, 2018 (gmt 0)

Senior Member from FR 

WebmasterWorld Senior Member leosghost is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Feb 15, 2004
posts:7139
votes: 410


Hazarding a guess at what lucy24 may mean..and it is certainly "my take" on what they ( Google ) mean is..

Possibly ( highly likely , even certainly ) that they have it ( and much other data ) on "a server" in "a database" ( one of millions, servers and databases, that they have ) ..but certainly not on one that you can connect to ( public facing ) and certainly not on that particular server, hence the 404..which refers to that particular server..
7:29 pm on Sept 20, 2018 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2004
posts:1064
votes: 8


certainly not on that particular server, hence the 404..which refers to that particular server..

So they store a cached version of some pages, but not others, on that particular server.

Why some and not others?
8:13 pm on Sept 20, 2018 (gmt 0)

Senior Member from FR 

WebmasterWorld Senior Member leosghost is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Feb 15, 2004
posts:7139
votes: 410


Firstly because that is the way they store data, they use what could be described ( in a vastly simplified way* ) as "highly dispersed non linear storage" ( Google's storage and database architecture has been mentioned and described in detail many times here and elsewhere, much / many of their methods they have "open sourced" )..it is not like a windows box simply multiplied..

Secondly, for reasons of cost , they cannot store everything , even though what they store is "boggling", data is created faster on a publicly accessible ( "crawlable" by search engines ) world level than even Google can crawl , index and store..

When any search query is answered, providing that answer ( searching what matches that they have crawled and indexed ) is staggeringly complex..even more impressive when one sees how fast the answer is returned, and allows for the massive amounts of data created every minute ..

*How search engines work , crawl, store , index, and provide "matches" to queries , is a subject way beyond the scope of a thread in WebmasterWorld...even how massive data bases work is the subject of multiple specialities..I picked up just a smattering from a friend who was working in the field ( search engines, crawlers, massive multi user databases, and artificial intelligence ) when I launched my first website..we spent many hundreds of hours discussing what he and his colleagues were building / researching / implementing for many companies and institutions, he also gave me very many books and files .. interesting.. :)

There was at that time ( still is ) a great deal of philosophy involved too , not all by any means are blindly coding without asking, why, what for, what might the results be , why should ...

Anyone who knows more than I do on the subject ( which will be very very many people ) ...will be aghast at the simplifications I've made in this post...But you could fill multiple terabyte discs on how and why search engines do what they do..

Or.."it's automagically like that because the unicorns and the #*$!(i)? want it that way"
8:15 pm on Sept 20, 2018 (gmt 0)

Senior Member from FR 

WebmasterWorld Senior Member leosghost is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Feb 15, 2004
posts:7139
votes: 410


Interesting..the filters don't like "illuminated beings"..I wrote i ll u min ati(i )? ..
8:17 pm on Sept 20, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15314
votes: 708


Hazarding a guess
Yup, that's what I meant. I'm willing to concede that Google has all the world's knowledge in a database somewhere--but not all of it is in a public-facing server.
3:14 am on Sept 22, 2018 (gmt 0)

Senior Member

WebmasterWorld Senior Member Top Contributors Of The Month

joined:Apr 1, 2016
posts:2266
votes: 603


This is very strange indeed. I checked a couple of sites and they are all impacted. What is interesting is Samsam's post of May 8th, where he remarks that if you change the https to http in the Google cache URL the page shows correctly.

Now what is really strange is that I checked a newly created page that never existed in an http state on my server and its cached version still returns a 404, but if you change the https to http it shows up correctly. This appears to be a bug on Google's side.
3:59 am on Sept 22, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15314
votes: 708


This appears to be a bug on Google's side.
... Yeah. I would say so. Yikes. Is it even physically possible for http and https to point to different content (in the way that example.com and www.example.com could theoretically be different sites)?
4:11 am on Sept 22, 2018 (gmt 0)

Senior Member

WebmasterWorld Senior Member Top Contributors Of The Month

joined:Apr 1, 2016
posts:2266
votes: 603


This is the crux of the bug. This is not a true https vs http thing. The cache's URL just incorporates the http(s) portion of the initial URL as part of the query parameter. But I don't think that Google has configured its backend to process https as part of the parameter. Note in the URL below (copied from Samsam's post) that the problematic https is the second one, the one inside the q=cache: parameter. Change that to http and bingo! The cached page displays fine.

https://webcache.googleusercontent.com/search?q=cache:BhZF5SW2_tEJ:https://www.example.com/+&cd=1&hl=en&ct=clnk&gl=uk (Shows 404)
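
The manual edit described above can be sketched in a few lines of Python. This only rewrites the string in the way the thread describes. It says nothing about how Google's backend really treats the parameter, and the function name `swap_cached_scheme` is made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def swap_cached_scheme(cache_url):
    """Rewrite the page URL embedded in q=cache:<id>:<page-url> from https to http.

    The scheme of the cache host itself (the first https) is left untouched.
    """
    parts = urlsplit(cache_url)
    params = parts.query.split("&")
    for i, param in enumerate(params):
        if param.startswith("q=cache:"):
            # Only the first ":https://" inside the q parameter is replaced.
            params[i] = param.replace(":https://", ":http://", 1)
    return urlunsplit(parts._replace(query="&".join(params)))
```

Passing in the 404 URL above should give back the same URL with only the inner https changed to http, which is the form reported earlier in the thread as showing the cache.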
4:51 pm on Sept 22, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15314
votes: 708


:: continuing to think this through, at the risk of over-thinking ::

The cached content was retrieved by Google at some specific time using some specific protocol. (I'm pretty sure the cache isn't always the result of the single most recent crawl; it's a separate process.) At the specific time the to-be-cached URL was crawled, was the site still using http? If you postulate that not every crawl is cached, then it is possible for a page to be indexed as https, while the cached version dates back to an earlier time when it still used http.

Further thinking suggests that I was wrong before, and it is possible for a site to return different content depending on whether the protocol is https or not--in fact it would be quite easy to code--but only if both versions remain reachable concurrently. Otherwise it's no different from “Here’s what the site looked like two weeks ago, though we can’t guarantee it still looks the same”.

But honestly, you'd think someone at Google would notice the http:https discrepancy and code some more useful response than a 404. After all, cached pages aren't primarily intended for webmasters checking up on their indexing status; they're supposed to be for humans. Are they afraid someone will call them out on a “Soft 404” of their very own? If the requested URL isn’t available, offer this very similar alternative instead.
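
On the earlier point that serving different content per protocol "would be quite easy to code": a minimal sketch as a Python WSGI app, assuming the server sets `wsgi.url_scheme` correctly (behind a TLS-terminating proxy it may always read "http"). The body strings are placeholders, and this is only an illustration of the idea, not something any site in this thread is known to do.

```python
def app(environ, start_response):
    """Minimal WSGI app whose response body depends on the request scheme."""
    scheme = environ.get("wsgi.url_scheme", "http")
    if scheme == "https":
        body = b"secure version of the page\n"
    else:
        body = b"plain-http version of the page\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

As noted above, this only matters if both versions stay reachable at the same time. Once http is 301-redirected to https, the http branch is never served.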
5:02 pm on Sept 22, 2018 (gmt 0)

Moderator from US 

WebmasterWorld Administrator martinibuster is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Apr 13, 2002
posts:14845
votes: 473


This is just a bug related to the migration of sites to the new mobile first index.

There is no SEO factor or meaning attached to the cache being missing. It's just a bug that will be fixed at some point in the future.

If there is any meaning to be had from the missing cache, it could be said that it's a sign that the site has been migrated. That's really all that means.
This 51 message thread spans 2 pages.