asheridan - 4:05 pm on Apr 27, 2012 (gmt 0)
Hope someone can shed some light on this...
A while ago, I noticed that pages on our site that should have been returning a 404 header response were actually 302-redirecting to a custom 404 page first. I've since fixed this with PHP: the script works out whether or not the requested URL should return a page and, if it shouldn't, sends a 404 response and includes the custom 404 text directly.
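To illustrate the logic, here is a rough sketch in Python rather than our actual PHP. The names KNOWN_PAGES, CUSTOM_404_TEXT and respond are made up for the example, and the real check for whether a URL should exist will differ:

```python
from http import HTTPStatus

# Hypothetical list of paths that should return a real page.
KNOWN_PAGES = {"/", "/about.html", "/contact.html"}

# Hypothetical custom 404 text, served in place rather than via a redirect.
CUSTOM_404_TEXT = "Sorry, that page doesn't exist."


def respond(path):
    """Return (status_code, body) for a requested path, with no redirect involved."""
    if path in KNOWN_PAGES:
        return HTTPStatus.OK.value, f"Content for {path}"
    # Unknown URL: answer with a 404 status and the custom text in one response,
    # instead of 302-redirecting to a separate 404 page.
    return HTTPStatus.NOT_FOUND.value, CUSTOM_404_TEXT
```

The point is that the 404 status and the custom text go out in the same response for the URL that was requested.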
Since doing this, reports of soft 404s in GWT (Google Webmaster Tools) have dropped to 0 but, naturally, 404 errors have skyrocketed.
This doesn't bother me, because the pages shouldn't have existed in the first place - and as long as we're not LINKING to any of those internally, Google should eventually give up on them and play nice.
However, I'm now concerned because Google is reporting that the pages returning a 404 header response are in fact linking to themselves.
e.g. mywebsite.com/page_that_doesnt_exist.html returns a 404 but - according to GWT - is being linked to from mywebsite.com/page_that_doesnt_exist.html
The only link I can see on the resulting page is the rel=canonical.
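For anyone wanting to reproduce the check, something like the following would show whether the HTML served with the 404 carries a canonical link pointing back at the requested URL. It is only a sketch using Python's standard html.parser. CanonicalFinder and canonical_points_to_self are hypothetical names, and the comparison is a simple string match that ignores a trailing slash:

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> found in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens, or be missing entirely.
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical = attr.get("href")


def canonical_points_to_self(requested_url, html):
    """Return True if the page's canonical href equals the requested URL."""
    finder = CanonicalFinder()
    finder.feed(html)
    if finder.canonical is None:
        return False
    return finder.canonical.rstrip("/") == requested_url.rstrip("/")
```

Feeding it the body of one of the 404 responses along with the URL that was requested would confirm whether the canonical is the self-referencing link being reported.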
So my question is: is Google ever going to give up on this page? Or does the fact that I'm generating the 404 response AFTER pointing Google at the 'appropriate' canonical URL force Google to treat the canonical as an internal link and attempt to index the page again?
Hope that makes sense to someone!