|Cache request for a recent post gives a "not found"|
I have a website that ranks well in Google. The problem I'm having is when I check the cache of a recent post (around 6 days old) I get the Google 404 error page with this text:
"The requested URL /search?q=cache: (URL String) was not found on this server. That's all we know."
Eventually the posts will get cached, but it's taking 10+ days. It's also taking a few days for my new posts to get indexed, which I don't think has anything to do with the cache problem.
Anyone had a similar issue? I'm afraid it may affect my rankings eventually.
I wouldn't worry about it. The cached pages are not stored in the same area as the data that Google uses to calculate rankings. After watching things like what you report for years, I'm convinced there's no connection.
I don't even allow cache in the first place. So the lack of result for a cache:url query was never a problem. If your cache eventually appears after 10 days, it just means that the site is not considered newsworthy by Google. Also not a big deal (unless news coverage is what you do, of course).
Is it really a 404 page from Google or your site? I mean, is the cache page showing your 404 page? If it's Google's page, there's something up on their end. I can't agree with 1script. It sounds like they are having issues with the pages you're creating: not that the site isn't newsworthy, but that Google wants to reach the newly created page and can't. I have seen CMS site pages get cached right after creation but the author was creating the page for a story that wasn't finished, the CMS doesn't actually have a rewrite assigned to the content and somehow crawlers see it, try to hit the page and get a 404. Later, after completing the article, the CMS allows the redirect and the page is reachable.
I'd assumed the OP meant Google's 404 page, with that characteristic "That's all we know". Matter of fact, he says so explicitly. Oops.
:: pause to contemplate the wild incongruity of google claiming ignorance on any point ::
You've been reading WebmasterWorld too long. You've grown used to material being cached and indexed before the page has time to refresh. For the rest of the world it takes longer ;)
But that still doesn't explain what google-- of all people!-- is doing, linking to a page that doesn't exist. Do they create the URL for the cache before creating the cache itself?
How do you check for cache? Do you mean you have clicked a "Cached" link in the Google SERPs, or are you using one of the toolbars or similar?
|The problem I'm having is when I check the cache of a recent post (around 6 days old) I get the Google 404 error page |
From what I could see, there are (at least) two different ways the Google URL for a cache request can be constructed. Both have q=cache: with the domain name, but the "Cached" link from the SERPs has additional parameters such as the SERP keywords, the position of the listing within the SERPs, language, Google geolocation, and an nnnnn string that probably tells Google something, although omitting it still returns the same result. This link highlights the search string on the cached page. URL q parameter example:
|But that still doesn't explain what google-- of all people!-- is doing, linking to a page that doesn't exist. Do they create the URL for the cache before creating the cache itself? |
The second example is a q parameter of just the domain name, which returns the cached page without keyword highlighting, e.g.
Therefore, using the syntax above, anybody can make up a cache request URL, which is what various SEO toolbars do for their "Cached" option. So unless imbckagn tells us how he got to the "Cached" link, we can't know whether Google itself created a "Cached" link to a URL that does not exist, or whether the URL was made up by some toolbar or similar using the rules above, in which case Google returned a 404 because the cached page does not (yet) exist.
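To illustrate how a toolbar might make up such a URL, here is a rough Python sketch of the simple form (just q=cache: plus the address). The function name `build_cache_url` and the default `base` host are my own assumptions for illustration, not something taken from Google's documentation. The extra SERP parameters are left out because I don't know their exact names.

```python
from urllib.parse import urlencode

def build_cache_url(page_url, base="https://www.google.com/search"):
    """Build a cache request URL of the form <base>?q=cache:<page_url>.

    `base` is an assumed endpoint used for illustration only; the thread
    only shows the path /search and the q=cache: parameter.
    """
    # Keep ":" and "/" unescaped so the result looks like the URL in the
    # error message quoted at the top of the thread.
    query = urlencode({"q": "cache:" + page_url}, safe=":/")
    return base + "?" + query

if __name__ == "__main__":
    print(build_cache_url("example.com"))
```

A URL built this way says nothing about whether a cached copy exists. It only asks for one, which is why a made-up URL can come back as a 404.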
From the limited inspection I have done, if the page has no cached copy, the "Cached" link is not shown in the Google SERPs.
|I have seen CMS site pages get cached right after creation but the author was creating the page for a story that wasn't finished, the CMS doesn't actually have a rewrite assigned to the content and somehow crawlers see it, try to hit the page and get a 404. |
This is not quite what is happening. If the page returned 404, Google would never cache its content as 404 means "Not found".
What has most likely happened is that the CMS returned the content of its 404 message, but the HTTP response was 200 OK. In that case, Google cached the page content because it was told that the page exists. There are millions of such cases all over the web. The cause is the CMS not handling the 404 response properly, and it is not Google's fault.
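If you want to check your own CMS for this, a rough sketch in Python follows. The helper names and the phrase list are my own assumptions, and the text match is only a heuristic, so treat a hit as a hint to look closer rather than proof. The idea is to request a URL that should not exist and see whether the server answers 200 with "not found" wording in the body.

```python
import urllib.error
import urllib.request

# Hypothetical phrase list; adjust it to whatever your CMS prints on its error page.
NOT_FOUND_PHRASES = ("page not found", "not found")

def looks_like_soft_404(status, body):
    """True when the status says the page exists but the body reads like an error page."""
    text = body.lower()
    return status == 200 and any(phrase in text for phrase in NOT_FOUND_PHRASES)

def fetch_status_and_body(url):
    """Return (status, body) for a URL, including for 4xx/5xx responses."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # urlopen raises on error statuses; the real status code is on the exception.
        return err.code, err.read().decode("utf-8", errors="replace")
```

Usage would be something like `looks_like_soft_404(*fetch_status_and_body(url))` with a URL on your own site that you know has no content. A correctly configured CMS should come back with status 404, so the check returns False.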