|404 errors from pages that do not exist|
I have a website with <sitebuilder and hosting provider>. I've been blogging twice a day for three to four years, and suddenly, about a month ago, every blog post URL started being seen by Google as a 404 error. I started off by deleting each post, then deleting the page entirely. Nothing. Google could still see it. I then requested that the main blog pages be removed using Webmaster Tools, which was accepted, but no good. They still kept coming back. I have since used URL link tools to make sure there is no sign of the old blog page on my site, and I then resubmitted a new sitemap to Google. The problem still persists. I spoke with a technician at <sitebuilder and hosting provider> yesterday who tells me there is nothing they can do at their end and that it's a Google issue. Has anyone any idea how I might go about getting rid of these errors and old pages that do not exist?
I had over 2600 errors at one stage. I marked all of them as fixed after trying to sort them, and each day the number rises again. I'm now seeing around 600. WMT is telling me that they are coming from my sitemap - example.com/sitemap.xml - but I've checked, and those URLs aren't in it. internal-page-blog.php was the original blog page.
Any help would be great.
[edited by: aakk9999 at 12:36 pm (utc) on Aug 10, 2013]
[edit reason] Exemplified, removed specifics. Please go and read ToS [/edit]
Try "Fetch as Googlebot" in WMT. Is the response also 404?
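If you'd rather confirm the response yourself from outside WMT, a small script can report the status code each URL actually returns. This is only a minimal sketch in Python; the helper names and the interpretation text are mine, not from any Google tool:

```python
# Sketch: check what HTTP status a removed blog URL actually returns.
from urllib.request import urlopen
from urllib.error import HTTPError

def fetch_status(url):
    """Return the HTTP status code a crawler would see for this URL."""
    try:
        return urlopen(url).status
    except HTTPError as e:
        # urllib raises for 4xx/5xx; the code is on the exception
        return e.code

def describe(code):
    """Map a status code to what it tells a crawler about the page."""
    if code == 404:
        return "not found (may be retried for a while)"
    if code == 410:
        return "gone (dropped from the index sooner)"
    if 200 <= code < 300:
        return "still live"
    return "other"

# Example (hypothetical URL):
# print(describe(fetch_status("http://www.example.com/internal-page-blog.php")))
```

If the script reports "still live" for a page you thought was deleted, the problem is on the server side, not with Google.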
Welcome to WebmasterWorld!
I am a bit confused with your post, especially this part:
|I've been blogging twice a day for three to four years now and suddenly about a month ago, every blog post URL was being seen by Google as a 404 error. I started off by deleting each post, then having to delete the page entirely. Nothing. Google could still see it. |
These two parts, written as they are, seem to be contradictory. Reading your post again, this is what I am getting from it:
It seems that the blog pages started to return 404 (I am not sure whether they suddenly started to return 404 or whether you did something on purpose to make them return 404).
Then you decided to physically remove the pages, but I am confused about what you mean by "Google can still see it", since you previously said Google is getting a 404 response. Do you mean the content is shown, but the response is 404? Or do you mean the URLs still appear in Google search results?
Reading on in your post, and assuming you want to remove these blog pages from your site, then:
You should ensure that pages return 404 (which they do, it seems). You should also ensure you are not internally linking to these pages (which you did, using the URL link tool). And you should remove these pages from the sitemap.xml (which you did). So far, fine.
Once pages are removed, Google will report them in WMT as "404 errors" - this is normal. You have checked the "linked from" information for the 404 errors and saw that the referring page is your sitemap.xml, but the sitemap was updated and resubmitted. Note that it may take some time for Google to process the sitemap AND to re-crawl the page after processing the sitemap to remove the reference from the sitemap to your page. So you need to give it a bit of time.
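For reference, removing a page from a sitemap just means deleting its `<url>` entry; there is no "removal" directive. A minimal, purely hypothetical sitemap.xml after the old blog URLs have been taken out might look like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- Only live pages are listed; removed blog URLs simply have no entry -->
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2013-08-10</lastmod>
  </url>
  <url>
    <loc>http://www.example.com/about.html</loc>
  </url>
</urlset>
```

Google compares what it crawls against this list, but as noted above it may keep probing URLs it learned about from older copies of the file.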
To me everything seems fine so far. Here is some additional info on 404:
When the page starts to return 404, it will still be shown in Google search results for a while, and then it will eventually drop from Google search results. It may take some time for Google to drop these pages from SERPs. To see if a page is still in Google SERPs, you can use the following search command:
site:example.com inurl:blog.php (replace example.com with your domain name)
If pages keep returning 404, then the above command will return fewer and fewer results as time passes.
Removed pages will show in WMT as 404 errors, at least for some time. If a page has external links, it may keep coming back in WMT's 404 errors report even after you acknowledge the errors. If there are no external or internal links, the page may still come back in the 404 report, but less and less often: if a URL keeps returning 404 for an extended time, Google will crawl it less and less frequently. Just keep acknowledging these "errors" in WMT.
You need to give it some time for 404 pages to disappear from Google SERPs as Google will need to re-crawl each page to see 404 response, and it may expect 404 a few times from the same page before it drops this page from its index (and SERPs).
If I haven't understood what you are trying to do, can you please try to explain in more detail what your problem and question are?
:: poring over OP and sharing others' puzzlement ::
My guess about meaning: Google has recently started asking for pages that don't exist, and is correctly getting a 404. But it continues to ask for these pages.
Oddly, there was another post just a few days ago where the poster similarly noted that g### claimed the nonexistent pages were in the sitemap although they weren't. Are we about to learn about a Google + sitemap.xml bug?
I can't really tell either if I'm reading the OP correctly or not, but if it is about getting Google to stop showing legitimate 404 errors for pages that do not exist, I can only say I wish I knew how to do that too.
I just recently started trying to help a site with this kind of problem. Google routinely reports 404 errors for pages that have not existed on the site or in its sitemap for over two years. They are not linked to anywhere except maybe in Google's memories. They are way down from six months ago, when hundreds were listed, but I can't see any evidence that Google replaces an old sitemap when there is a new one; it's more like they add the new one to whatever information they previously had for a domain. The new sitemap has been resubmitted many times. The 404 errors claim to be linked to from "sitemap.xml", which of course they are not in the current sitemaps.
I was not managing this site when it was changed, but the site had been built a page at a time, and at some point I guess logic or something kicked in and these pages were rewritten to a more sensible structure using directories for the various categories. I mark them all as corrected, then go back the next week and correct them all again. The 404'd pages were all replaced with a different site layout, so an equivalent page exists but not at the original URL. That was all done over 2 1/2 years ago. Google's 'remove a page' feature does not work when the pages really don't exist.
If google is putting in repeated requests for the same nonexistent page, it's got some reason to believe the page exists. It isn't like testing for "soft 404" where it will make up a one-time complete gibberish URL and then never ask for it again.
Unlike some search engines, google does seem to understand a 410. So if you've intentionally removed a page that used to exist, you're better off coding a 410 response. Be sure to include a line pointing humans to a custom 410 page. (It can be the same physical page as your 404 if you don't have the resources to make a separate one.)
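On Apache, for example, this can be done in a few lines of .htaccess using mod_alias. This is just a sketch under assumptions: the second path and the /gone.html page are made up for illustration (internal-page-blog.php is the blog page mentioned earlier in the thread):

```apache
# Tell crawlers these old URLs are gone for good (410), not just missing (404)
Redirect gone /internal-page-blog.php
Redirect gone /some-path-blog/old-post.html

# Show humans a friendly page for 410 responses
# (it can point at the same physical page as your 404 document)
ErrorDocument 410 /gone.html
```

The `gone` keyword makes Apache answer 410 without needing a target URL, which is exactly the "intentionally removed" signal described above.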
|The 404 errors claim to be linked to from "sitemap.xml" which of course, they are not in the current sitemaps. |
Eeuw, I'd forgotten that detail. "Linked from sitemap" doesn't necessarily mean the current sitemap. It means any sitemap that they've ever set eyes on -- even when the Sitemap section of GWT shows the correct number of pages in the sitemap. I think it just means "how we first found out about this URL".
I have the same problem: Google keeps looking for pages that used to exist because of a CMS problem about five years ago. When I first noticed the problem, I returned a 410 for about a month, then used WMT to remove the pages from Google's index, then blocked them in robots.txt. When the CMS finally fixed the problem, there was no more need for the block in robots.txt. Now Google has started asking for those pages again, and the site returns a 404.
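For the record, the robots.txt block described above would have looked something like this (the path is illustrative, not the actual one):

```
User-agent: *
Disallow: /old-cms-pages/
```

A Disallow line only stops crawling of matching paths; removing it, as above, lets Google see whatever status code the URLs now return.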
Google's like an elephant...
Yes, I've reluctantly realized that the right time to retire 301 or 410 responses is ... never. Just keep redirecting or "ain't here no more"-ing that sucker, and you can't go wrong.
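In Apache terms, "never retiring" a response just means leaving lines like these in the config indefinitely. A hypothetical sketch (both paths and the target URL are made up):

```apache
# Old URL permanently redirected to its replacement; leave this in place forever
Redirect permanent /old-category/widget.html http://www.example.com/widgets/widget.html

# Page gone for good keeps answering 410 forever
Redirect gone /discontinued-page.html
```

The cost of keeping these rules is negligible, and removing them is what lets the ghost URLs resurface in crawl error reports.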
Caution: Will not work with That Other Search Engine. They don't seem to understand that 410 means "Look, it's gone, OK? It's not coming back. They took it off the air. Live with it."
|Google routinely returns 404 errors for pages that have not existed on one site or in its sitemap for over two years. |
This is not an issue. If these pages should be on the site, fix the problem and restore the content. If these pages really are gone for good, be content that Google knows that too.
If you have a way to code "gone for good" such that it returns "410 Gone" you'll make life slightly easier, but it's not the highest priority to do that.
I share the same confusion about the original question.
If a live and valid site suddenly returned 404 errors for all or many of the existing URLs, is it possible that a CMS upgrade or a configuration change meant that every page of the site suddenly had a new URL and the old ones were not being redirected to the new?
I think there's more to the story than was explained in the original question. Perhaps they could explain it again, from the beginning and in more detail.
Certainly it makes no sense to read (paraphrased) "the pages of my site were returning 404 errors so I deleted those pages".
Just a note here about the information shown in GWT. I went in to sweep out the 404s yesterday for that domain, and I always check to see where Google says a URL is linked from. The "Linked From" tab often shows "sitemap.xml", although I know that those URLs are not in the current sitemap. So after seeing the sitemap reference there, I clicked on the "In Sitemaps" tab, and that page is blank. This tells me that Google is not claiming to have found the nonexistent pages in the current sitemap; it is telling me that a sitemap is where it originally found the URL. Confusing, but that's what I get from it.
Thanks for coming back to me with further questions.
I don't know why, but all of a sudden I was being emailed by Google asking me to sort the errors out. I had 2000-ish 404 errors.
The errors all stemmed from my blog page on my <blog provider> site. Each blog post had its own URL. Now, Google was telling me that these were 404 errors.
I tried deleting them individually.
I tried blocking the main blog page in webmaster tools.
I then completely deleted the main blog page.
I resubmitted a new sitemap without the blog page, but the errors still keep being found, even though those pages have been deleted entirely from my site.
In the crawl errors section of Webmaster Tools, if I click on one of the errors, it opens a box with three tabs:
Error details - In sitemaps - Linked from
In error details I'm being told - Googlebot couldn't crawl this URL because it points to a non-existent page. Generally, 404s don't harm your site's performance in search, but you can use them to help improve the user experience.
In sitemap I'm told - http://www.example.com/sitemap.xml
Which is impossible, as it's not in the sitemap.
In linked from I'm told - http://www.example.com/sitemap.xml
http://www.example.com/some-path-blog/category/Widget Test Info
These have all been totally deleted and are not in the sitemap.
<blog provider> told me that they had issues with their blog page and have since discontinued it. They couldn't help me and insisted it was a Google issue.
I'm being told by certain people that 404s shouldn't damage your site, but I'm not happy with 2000+ errors. I've also seen my structured data crash from around 50 pages to zero at roughly the same time. I've read that a few other people suffered this back in July. I managed to get it back up to 22 a month later, and this weekend it's dropped back to zero. At the same time, my 404s have gone beyond 2000.
I see a pattern.
[edited by: aakk9999 at 1:32 pm (utc) on Aug 27, 2013]
[edit reason] Exemplified, do not post your site URLs, please go and read ToS and Forum Charter [/edit]