Forum Moderators: Robert Charlton & goodroi

Google Cache returning 404

         

Samsam1978

5:20 pm on May 6, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



I noticed my site has been completely dropped from Google's cache. It shows 404. I tried indexing new content, and it made it into the index and the cache fine. Site is still cached. Header looks OK. Has anyone had this before? Site is 10 years old. No changes.

NickMNS

5:34 pm on Sep 22, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The cached content was retrieved by Google at some specific time using some specific protocol. (I'm pretty sure the cache isn't always the result of the single most recent crawl; it's a separate process.)

No. As I said in an earlier post, I tested a page that has only ever existed as https; if you type http://the-specific-page my server redirects the user/bot to https. So it is impossible for Google to have ever cached the page as http. Yet the page is impacted by the http vs https bug. This has absolutely nothing to do with protocols; this is a coding bug on Google's part. It is caused by the link displayed to the user including a value (the "s" after the http) that is not a valid parameter, so it returns 404. Change the parameter value to the valid value, http, and it works as expected. This is a simple bug that should be easily fixed. What is puzzling is why it has persisted for so long.

This has nothing to do with Mobile first index.
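
The workaround described above (swapping the "https" value in the cache link for "http") can be sketched as follows. This is only an illustration under stated assumptions: the host `cache.example`, the `q` parameter name, and the `cache:` prefix are placeholders, since the thread does not spell out the actual link layout, and `downgrade_cached_scheme` is a hypothetical helper name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def downgrade_cached_scheme(cache_link: str, param: str = "q") -> str:
    """Rewrite 'https://' to 'http://' inside one query parameter of a link.

    Assumption: the cached page's address is carried in the query
    parameter named by `param` (hypothetical layout, not confirmed
    by the thread).
    """
    parts = urlsplit(cache_link)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    fixed = [
        # Only the first 'https://' in the chosen parameter is changed;
        # every other parameter is passed through untouched.
        (key, value.replace("https://", "http://", 1) if key == param else value)
        for key, value in pairs
    ]
    # Keep ':' and '/' unescaped so the embedded address stays readable.
    return urlunsplit(parts._replace(query=urlencode(fixed, safe=":/")))


if __name__ == "__main__":
    link = "https://cache.example/search?q=cache:https://example.com/page&hl=en"
    print(downgrade_cached_scheme(link))
```

Only the value embedded in the parameter changes; the scheme of the cache link itself is left alone, which matches the description that the page was never served over http in the first place.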

[edited by: Robert_Charlton at 5:37 am (utc) on Oct 23, 2018]
[edit reason] Disabled auto-link so example url is visible. [/edit]

Leosghost

5:39 pm on Sep 22, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Simplifying somewhat..how could they show a cache of a 3rd party https resource..the https element prohibits another site from showing what is covered by the https..except as a screen cap..anything else would mean that the cert was not working..

Question is ..is what the http cache showing "current" ( contemporary to what is on the real https site ) in which case it would be Google "extrapolating" from what they crawl into a "snapshot"..or is it what they saw the last time that they crawled when the site was still http..

If the latter, then one would expect the cache to eventually disappear..as later crawls would be to the https pages..and so any cache update on Google's end would result in a database entry with no content..hence a 404..

IMO this would be the logical way for any search engine to proceed in relation to https and "cache"..

Anything else would be leaving themselves open to all sorts of problems..and , as I mentioned above, this way is far cheaper..every byte stored is an overhead cost..I don't think it can be called a bug ( it is working as the protocol is designed to ) more a case of this is "the future"..

Leosghost

5:45 pm on Sep 22, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Ah..NickMNS posted while I was typing, so I did not see that this also happens to pages that have only ever been https..
In which case Google is making "shadow copies" of some https pages..possible, but complex, and fraught with legal problems ( not that they ever worried too much about those ) ..and involves them in some re-writing of each page..behind the scenes, not very cost effective..

What does comparing all the source code of what they cache from an https page ( and present as http ) with what they cache of an http page and present as usual tell us..

NickMNS

5:58 pm on Sep 22, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



In which case Google is making "shadow copies" of some https pages..possible, but complex, and fraught with legal problems

How is this a surprise? When one says the page is indexed, what that means is that a copy of the page has been placed into their database. This is what Googlebot does. But basically Google has been copying the web into its server since the day it began. The cache lets you see what their copy looks like.

I assume that the NOARCHIVE meta simply tells them not to show the copy and NOINDEX tells them not to make a copy.
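
To check which of these directives a page actually declares, a minimal sketch using only the Python standard library might look like this. It only reads the `robots` (or `googlebot`) meta tag; what a search engine then does with `noarchive` or `noindex` is the assumption stated in the post above, not something this code verifies. The names `RobotsMetaParser` and `robots_directives` are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> / <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values may be None when the attribute has no value.
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            # Directives are comma separated and case insensitive.
            for token in attr.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def robots_directives(html: str) -> set:
    """Return the set of lower-cased robots directives found in `html`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="NOARCHIVE"></head></html>'
    print(robots_directives(page))
```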

Leosghost

6:13 pm on Sep 22, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Not a surprise at all, but an insight into the changes that they have made into how they do it..example at one time they moved to hotlinking the images directly from the originating site's server when serving a cache page..they stopped doing this after just a short time and began keeping the images themselves and integrating them directly into pages served from their cache..

Not talking about simple image search here..but about complete page caching methods..this meant that ( when they were doing this ) if someone looked at your page via Google cache ..you were serving the images to that "view" and thus incurring the bandwidth..for each view..They stopped doing this relatively quickly ( for them ) after objections..it was again an attempted cost-saving exercise for them..This is going back a few years now..

lucy24

6:30 pm on Sep 22, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I tested a page that has only ever existed as https
OK, I did miss that. Having eliminated all other possible explanations: Yup, it's a bug. A pretty inexcusable one, considering the vast resources they have at their disposal, but what can you do.

I don't see what the MFI has to do with it.

martinibuster

7:47 pm on Sep 22, 2018 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Yet the page is impacted by the http vs https bug. This has absolutely nothing to do with protocols; this is a coding bug on Google's part. It is caused by the link displayed to the user including a value...


No, no, no.
Read my post above.

It's not a coding bug. It's not caused by a link displayed to the user etc.

It's just a bug from the transition of sites to Google's new mobile first index.

It's a simple transition, cache likely comes from a different source now, that hasn't been implemented.

There is no meaning to be derived from the bug.

sunnydeval

3:13 am on Oct 23, 2018 (gmt 0)

5+ Year Member



Google has fixed the cache 404 error. There is no need to use cache 404 hacks now.

Source: <snip>



[edited by: not2easy at 3:39 am (utc) on Oct 23, 2018]
[edit reason] Please Read the ToS [/edit]

NickMNS

3:36 pm on Oct 23, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes it appears to be resolved.

RedBar

9:48 am on Oct 24, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Half a dozen sites tested so far and all appear "normal".

I wonder if they'll ever admit to what the problem was?

swetha

11:43 am on Oct 24, 2018 (gmt 0)

5+ Year Member



My client's website has also been showing the same cache 404 error for the past 2 months. Why so? I haven't found any solution for this. I recently changed the site from WordPress to a PHP site. Might that be why Google cache is showing 404?

RedBar

12:50 pm on Oct 24, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Welcome to WebmasterWorld swetha

I recently changed the site from WordPress to a PHP site. Might that be why Google cache is showing 404?


It's nothing you've done/changed, it has been like this for months, all this year it seems to me, maybe even longer, I had actually given up on it ever being repaired!

Robert Charlton

4:18 am on Oct 25, 2018 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Cache problems have been so frequent this year that at times we've had too many threads here on the topic to comment on them all. I've been assuming that they've related to broader infrastructure changes. It's likely that the cache display, necessarily being one of the last things to be updated, is also one of the least important aspects of Google's infrastructure.

John Mueller has been busy all year telling people that what they see in the cache doesn't matter with regard to ranking or spidering. Here's a short thread on the 404 issues that elaborated on these thoughts a little further....

Recent copy of cache error 404
May 21, 2018
https://www.webmasterworld.com/google/4902307.htm

martinibuster

6:01 am on Oct 25, 2018 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



The problem has been solved.

keyplyr

9:48 am on Oct 29, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Fix discussed here: [searchengineland.com...]

Robert Charlton

11:52 pm on Oct 29, 2018 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



From SEL...

Why did it matter? The truth is, it didn’t matter much. It caused concern amongst SEOs and site owners that the cache link would 404. But Google said it didn’t impact rankings or anything else; it was just an internal bug.

nettulf

7:35 am on Oct 30, 2018 (gmt 0)

10+ Year Member



But, I am still confused.

What is Google using for ranking? I seem to remember they said it was the cached version. But which?

I have changed a few things on my site.
The cache I see in Google (on several computers/IP) is the old version, about 2 weeks old.
The version I see in the new webmaster tools (view source) is my new version, about 1 week old.

So what is the "real" version for Google?

keyplyr

8:16 am on Oct 30, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



@nettulf - No need to be confused. Cached pages *do not* affect ranking. Someone lied to you.

I haven't allowed Google to cache any of my pages for years and it hasn't affected ranking a bit.

.

cr1m

8:35 am on Oct 30, 2018 (gmt 0)

5+ Year Member Top Contributors Of The Month



@nettulf
Even if Google used cached versions in some way, there was a problem with displaying cached versions, but Google still had them in their database, as the cached versions could be seen using several "hacks".

nettulf

8:41 am on Oct 30, 2018 (gmt 0)

10+ Year Member



Thanks @keyplyr and @cr1m

To rephrase the question, I wanted to know what Google's version of my site is. How much of my changes have reached their "core"? I'm not concerned about cache or not, but how do they see it? What version do they have in their database that they are using for ranking? How much of my changes have they digested?

I thought the cache used to show just that, but seems like I am wrong. Especially when I see two different versions of my site in two of their tools. :)

keyplyr

9:06 am on Oct 30, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Google *sees* your pages just how it shows in the Fetch As Google tool in GSC. Forget about the cached pages.

SERPs update every few weeks. That's when you may see changes in what you refer to as ranking.

But you need to change how you think about all this. There is no ranking per se because each person sees different search results. Not everyone sees what you see when you search for terms that show your site.

There are many factors that determine what you call ranking. Artificial Intelligence (AI) plays a more significant role in determining each user's intent and giving that user a set of search results that best fills that intent.