|Webmaster Tools showing a bunch of 404 errors|
We just checked our Webmaster Tools account, and Google is reporting a huge spike in 404 errors. When we checked some of them, there appeared to be no error at all.
Google is reporting that a page like blue-widgets.html is returning a 404 error, but the page is actually blue-widgets.htm, and all the pages linking to it use the correct URL.
Is something broken here?
The WMT report is in fact correct: The page blue-widgets.html does not exist and therefore the response returned is (and should be) 404.
A different question is where, and why, Google is finding these .html URLs, since, as you said, you link internally to the .htm versions of your URLs.
- It is possible that some other website links to your pages with the incorrect URL extension.
- It is also possible that Google has decided of its own accord to try URLs with the .html extension; Google has been known, on occasion, to make up URLs to try on a site.
- Is it possible that at some point in the past, even very briefly, you did have .html versions of pages internally? Google never forgets a URL it has seen and will retry it from time to time.
Have you checked the "Linked from" tab for some of these URLs? What does it say there? Where did Google discover these URLs?
However, if such pages never existed on your site and you have never internally linked to this URL format, these 404 errors should not do you any harm.
I am not sure whether I should post this here or in another thread, but I will ask anyway. I have a similar but slightly different problem: the 404 errors in my WMT account are caused by pages that no longer exist but are still somehow being picked up by Googlebot. I have checked the "Linked from" tab, but even there I found pages that linked to them in the past but no longer exist. I have spent hours trying to clean up this mess, emailing the site owners I could reach and submitting requests to big G to remove non-existent pages from cache and search, yet those error pages keep appearing in my crawl error reports in WMT.
Any practical suggestions?
If you have deleted pages, they no longer exist, and your server returns 404 for them, then the WMT report is normal and nothing to worry about. Treat it as a warning that the page does not exist, in case the page *should* exist, i.e. if you did not actually delete it or deleted it by mistake.
If the page did exist and you deleted it, it would be slightly better to return 410 Gone rather than 404, as it sends Google a clearer message that the page was removed on purpose, and Google acts on this faster.
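To make the 404 vs 410 distinction concrete, here is a minimal Python sketch of how a server might choose the status code for a request. The sets `LIVE_PAGES` and `REMOVED_PAGES` and the function `status_for` are hypothetical names for illustration; in practice this is usually configured in your web server or CMS rather than written by hand.

```python
from http import HTTPStatus

# Hypothetical inventory of pages that currently exist on the site.
LIVE_PAGES = {"/blue-widgets.htm", "/red-widgets.htm"}
# Hypothetical list of pages that existed once and were deleted on purpose.
REMOVED_PAGES = {"/green-widgets.htm"}


def status_for(path: str) -> HTTPStatus:
    """Pick the status code to send for a requested path."""
    if path in LIVE_PAGES:
        return HTTPStatus.OK
    if path in REMOVED_PAGES:
        # Deleted deliberately: 410 says "gone for good".
        return HTTPStatus.GONE
    # Never existed (e.g. a made-up /blue-widgets.html): plain 404.
    return HTTPStatus.NOT_FOUND


if __name__ == "__main__":
    for p in ["/blue-widgets.htm", "/green-widgets.htm", "/blue-widgets.html"]:
        s = status_for(p)
        print(p, s.value, s.phrase)
```

Run as a script, this prints `200 OK` for the live page, `410 Gone` for the deliberately deleted page and `404 Not Found` for the made-up .html URL.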
If the "Linked from" tab still lists pages that you know no longer link to the removed URL, or that no longer exist themselves, this means Google has not yet recrawled (or attempted to recrawl) those pages to see their new content or status. The report relies on the data Google collected the last time it crawled them.
If these "Linked from" pages have a very low crawl priority, Google may crawl them only rarely, so you may see them listed as "Linked from" in WMT for a long time.
Another use of the WMT 404 report: if "Linked from" shows that a removed page has lots of external links, and these links may have been a source of traffic to your site, consider redirecting that URL to a similar page on your site (if you have one) to keep the traffic.
Alternatively, a very good custom 404 page that entices visitors to keep browsing your site may also recover this traffic.
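A minimal sketch of such a redirect, assuming you keep a hand-maintained map from removed URLs to their closest replacements. `REDIRECTS` and `redirect_for` are hypothetical names; the point it shows is that the response is a 301 Moved Permanently with a `Location` header pointing at the replacement page, rather than a 404.

```python
from http import HTTPStatus
from typing import Optional

# Hypothetical map from removed URLs (spotted via "Linked from")
# to the closest page that still exists on the site.
REDIRECTS = {
    "/old-blue-widgets.htm": "/blue-widgets.htm",
}


def redirect_for(path: str) -> Optional[tuple[HTTPStatus, dict[str, str]]]:
    """Return (status, headers) for a redirected path, or None if not mapped."""
    target = REDIRECTS.get(path)
    if target is None:
        # Not in the map: let the normal 404/410 handling take over.
        return None
    # 301 Moved Permanently tells browsers and crawlers the page has moved for good.
    return HTTPStatus.MOVED_PERMANENTLY, {"Location": target}
```

Keeping the map explicit means you only redirect URLs that actually have a sensible replacement; everything else still returns 404 or 410.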
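One way to build such a page, sketched in Python under the assumption that you have a list of live URLs to suggest from (`LIVE_PAGES` and `custom_not_found` are hypothetical). It uses the standard library's `difflib.get_close_matches` to offer similar pages and still returns status 404, so the friendly page is not mistaken for real content (a "soft 404").

```python
import difflib
import html
from http import HTTPStatus

# Hypothetical list of live pages to suggest from.
LIVE_PAGES = ["/blue-widgets.htm", "/red-widgets.htm", "/widget-guide.htm"]


def custom_not_found(path: str) -> tuple[HTTPStatus, str]:
    """Build a helpful 404 body that suggests similar live pages."""
    suggestions = difflib.get_close_matches(path, LIVE_PAGES, n=3, cutoff=0.6)
    items = "".join(
        f'<li><a href="{html.escape(s)}">{html.escape(s)}</a></li>'
        for s in suggestions
    )
    body = (
        "<h1>Sorry, we couldn't find that page</h1>"
        f"<p>{html.escape(path)} is not on this site.</p>"
        + (f"<p>Were you looking for:</p><ul>{items}</ul>" if items else "")
        + '<p><a href="/">Browse from the home page</a></p>'
    )
    # Keep the 404 status: the same page served with 200 would look like a real page.
    return HTTPStatus.NOT_FOUND, body
```

For a request to /blue-widgets.html, the suggestions include a link to /blue-widgets.htm, while the response status stays 404.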
So to summarise:
If you have removed a page, requests for it now return 404 (or 410), and you are NOT linking to it internally from within your own site, you should not worry about the WMT 404 report.
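To check the last condition, here is a minimal sketch that scans one page's HTML for internal links pointing at removed URLs, using the standard library's `html.parser`. `REMOVED_PATHS`, `LinkCollector` and `links_to_removed` are hypothetical names; fetching and crawling the pages of your site is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical set of paths that now return 404 or 410.
REMOVED_PATHS = {"/green-widgets.htm", "/blue-widgets.html"}


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_to_removed(page_url: str, page_html: str) -> list[str]:
    """Return internal links on one page that point at removed paths."""
    parser = LinkCollector()
    parser.feed(page_html)
    site = urlparse(page_url).netloc
    bad = []
    for href in parser.hrefs:
        # Resolve relative links against the page's own URL.
        absolute = urlparse(urljoin(page_url, href))
        # Only same-site links count as internal links.
        if absolute.netloc == site and absolute.path in REMOVED_PATHS:
            bad.append(absolute.geturl())
    return bad
```

Any URL this returns is an internal link you would want to fix or remove; external links to removed pages are skipped, since those are out of your control.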
|submitting requests to big G to remove non-existent pages from cache and search|
Are you saying that the pages you have removed are still appearing in Google SERPs? Did you remove them a long time ago or recently?
Removed pages that return 404 can stay in the SERPs for a while if they were removed only recently: Google may keep a page there for a little while in case the 404 Googlebot received was just a "server blip".
This is why returning 410 is better when a page is removed: it tells Google that the page was removed on purpose, and Google drops it from its index faster. It will still show up in the WMT report, though.