Msg#: 4562321 posted 10:54 am on Apr 7, 2013 (gmt 0)
Several months ago I worked over my site's .htaccess file to eliminate the possibility of returning a 200 response for malformed URLs. If a page doesn't exist, the result is now a 404 response; the site no longer makes a best-guess redirect. Over the years this site had picked up countless links containing malformed URLs, so it was due for a cleanup.
As expected, the number of 404 errors reported in GWT shot up, and I diligently went through the list regularly and cleared it. After a couple of months of doing this daily, I stopped logging into GWT on March 15th. Monitoring these 404 errors in GWT had become a chore without benefit. On March 23rd I received a 404 error warning email, and every day since then the crawl rate has been 3x normal.
All of the previously cleared 404 errors are back. My concern is that while I was clearing the errors the crawl rate was normal, but now that I've stopped it is 3x normal. Should I go back to GWT and clear the errors when 404 is the right response? I'm inclined to let Google get a good dose of the 404s and, eventually, stop crawling these URLs, as they were never intended to exist in the first place.
Clearing 404 errors in GWT seems to return the crawl rate to normal, but is it a good idea? Traffic and rankings remain unchanged.
I'd be inclined to "let them figure it out" for a couple of months. The increased crawl rate means they are going to discover a lot of these dead URLs and clean up their database.
One thing I have found useful on occasion is to add a chunk of PHP to the CMS that makes a note of every 404 served to Google. The next time they or anyone else asks for the same URL, it returns 410, and does so every time thereafter.
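For anyone wanting to try the "404 once, 410 thereafter" idea, here is a rough sketch of the logic. It is written in Python rather than PHP purely to keep it short, and it is not the code from any actual CMS. The class name `GoneTracker`, the JSON file store, and the simple "Googlebot" substring check on the user agent are all assumptions made for illustration; a user agent string can be spoofed, so a real setup would probably want a stronger check.

```python
import json
from pathlib import Path


class GoneTracker:
    """Hypothetical helper: remember dead URLs Googlebot has already been
    served a 404 for, so that later requests for them get a 410."""

    def __init__(self, store_path):
        self.store_path = Path(store_path)
        self.seen = set()
        # Reload previously noted URLs so the list survives restarts.
        if self.store_path.exists():
            self.seen = set(json.loads(self.store_path.read_text()))

    def status_for_missing(self, url, user_agent=""):
        """Return the status code to send for a URL the CMS cannot find."""
        if url in self.seen:
            # Already noted: 410 for Google and everyone else from now on.
            return 410
        # Assumption: a substring match is enough to spot Googlebot.
        if "Googlebot" in user_agent:
            self.seen.add(url)
            self.store_path.write_text(json.dumps(sorted(self.seen)))
        # First sighting (or a non-Google visitor): plain 404.
        return 404
```

The CMS would call `status_for_missing()` only after deciding that the requested page does not exist, and would send whatever status code comes back. URLs that only ordinary visitors have asked for keep getting a 404 until Googlebot has seen that 404 once.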
That is: the kind of requests you get when the search engine suspects you've been throwing soft 404s (which I've never done, and I haven't done an unusual amount of redirecting). Either that, or the googlebot has a cat.
Huh what? I've never had image files flopping around loose in the top level-- and I don't even own a png with this name! (I just checked. Not one on my entire HD.)
In the last few days they've also asked for several pages that have been 410 since January of 2012. You'd expect that of Bing, who like 410s almost as much as they like robots.txt, so they keep coming back for more. Google tends to get bored and go away after two or three doses.
It makes me nervous when g### does something different. And I still haven't figured out what that blasted snippetbot wants, let alone why it wants it so many times.