Forum Moderators: Robert Charlton & goodroi


GWT & 404's That Never Die


austtr

11:25 pm on Jan 31, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Can anyone explain why GWT keeps showing 404s for links on pages that ceased to exist years ago, in some cases over 12 years ago?

Not only do the links no longer exist, but the pages that showed the links no longer exist... they are no longer on the server, they are gone, kaput, finito.

But GWT keeps telling me my site(s) have all these broken links associated with these non-existent pages. Is it not able to distinguish between a current broken link and a historical record from the far distant past?

This is a bit of a rant. I know there is supposedly no ill effect from this, but boy, is it ever annoying! For the life of me I can see no logical reason why this happens.

netmeg

12:39 am on Feb 1, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Cause somebody somewhere probably is still linking to them.

Robert Charlton

8:17 am on Feb 1, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



The 404s probably indicate that Google has recently done a deep crawl, and, as netmeg suggests, Google has found links to these urls somewhere on the web.

This thread goes over a bunch of possibilities.

17 May 2013 - GWT Sudden Surge in Crawl Errors for Pages Removed 2 Years Ago?
http://www.webmasterworld.com/google/4575982.htm [webmasterworld.com]

I recommend reading the 2006 interview with the Google Sitemaps Team that I link to in the last post in the thread, even though the interview is an old one. As I understand it, there are going to be periodic times of deep crawl when Google might resurrect old crawl lists. My interpretation is that Google feels it's almost the "responsibility" of a search engine to occasionally recheck all links and report errors (and yes, it can be a PITA).

Note in the thread that best practice, per Google's John Mueller, would be to serve 410s where you absolutely intend the urls to be gone. This at least lets Google know that the error is on purpose, and they're likely to recheck less often than they would otherwise.
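To illustrate the 404-versus-410 distinction John Mueller describes, here is a minimal sketch in Python's standard library. The paths in `GONE_PATHS` are hypothetical placeholders, and this assumes you can run custom handler logic; on Apache you would more typically use `Redirect gone /old-page.html` or a `RewriteRule` with the `[G]` flag instead.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical paths you have intentionally and permanently removed.
GONE_PATHS = {"/old-page.html", "/retired/section/index.html"}

def status_for(path):
    """410 for deliberately removed URLs, 404 for everything else.

    410 Gone signals to crawlers that the removal is intentional,
    so they may recheck the URL less often than a plain 404.
    """
    return 410 if path in GONE_PATHS else 404

class GoneAwareHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        code = status_for(self.path)
        self.send_response(code)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"410 Gone" if code == 410 else b"404 Not Found")

if __name__ == "__main__":
    HTTPServer(("", 8000), GoneAwareHandler).serve_forever()
```

The point is simply that the server makes a deliberate choice per URL rather than falling through to a generic 404 for everything.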