|GWT swimming in 404 errors making it almost unusable|
I decided to remove a small section of my site that had continually updating affiliate offers along the lines of "today's top 10 widgets". The links all redirected via 301 through a single redirect page that was blocked in robots.txt, the standard installation for this particular program. Additionally all links pointing to the redirect page carried a nofollow tag.
After removing the offers from the handful of pages they were on, my GWT has become inundated with 404 errors, to the tune of 30,000, most of them for offers that expired some time ago, years ago in some cases.
Google never forgets a link, is there anything I can do besides just leave the 404 errors in place to speed their removal from GWT? Marking them fixed just gets them tagged as 404 again and they return.
How long does it take for 30,000 404 errors to stop being reported these days?
If the "small section" was a particular folder, you can go through the steps to remove that folder. I've done it in the past but don't have a checklist of all the steps; it is not hard to do. The folder cannot be blocked in robots.txt if you want to apply to remove it in GWT. If these were all individual pages, then each would need to be removed. I have no idea how long it takes to accomplish, but it does stop the 404s.
It will take years to clear that out. You should serve 410 instead of 404, in that case they will disappear in a matter of weeks/months.
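To serve 410 for the removed URLs in practice, something like this minimal WSGI sketch would do. The `/offers/` prefix is a hypothetical stand-in for wherever the expired pages lived; plug the app into whatever WSGI server you already run:

```python
# Hypothetical: the URL prefix the removed "top 10 widgets" pages lived under.
REMOVED_PREFIX = "/offers/"

def app(environ, start_response):
    """Answer 410 Gone for removed offer URLs instead of the default 404."""
    path = environ.get("PATH_INFO", "/")
    if path.startswith(REMOVED_PREFIX):
        body = b"This offer has been removed permanently."
        start_response("410 Gone", [("Content-Type", "text/plain"),
                                    ("Content-Length", str(len(body)))])
        return [body]
    # Everything else is handled normally (placeholder response here).
    body = b"Hello"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

On Apache you'd get the same effect with a `RewriteRule ... [G]` flag, but the idea is identical: a deliberate "gone", not a generic "not found".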
Do 410's really get treated more swiftly than 404's when you remove content? Google stated they see them both the same. Can't hurt to try, I'm not going to clear 1000 of these a day for a month manually when it ultimately changes nothing.
410 is treated slightly differently to 404; in this case just enough to be the right option for you to choose.
To expand on what he said about the 'slight difference', basically, a 404 will eventually be handled as Google handles a 410 initially.
Technically, a 410 is 'gone' and should not be re-requested by the user-agent in the future. But since webmasters change their minds and Google doesn't want to 'throw out' something good, they will re-request a 410 at a relatively infrequent rate, and that infrequent re-requesting begins as soon as the 410 is discovered. And since it requires a deliberate action to generate a 410, the page generating the error will be removed from the results nearly immediately.
A 404 initially is re-requested more frequently and will remain in the results for a longer period of time, because it's 'not found' and there are a number of reasons a page is 'not found' such as server error, or even some crazy 'update timing' where a site owner has decided to upload a newer version of all the pages in a directory on the server, which causes (or can cause) all the current pages in it to be deleted prior to writing the new version.
In a case such as the preceding, where there are 100 pages in the directory it can take a bit for the upload and writing to be completed, so if anyone (including Google) happens to request one (or more) of the pages after it's been deleted but before the new version is uploaded/saved, a 404 'not found' will be generated.
The page generating the error due to 'odd timing' should (correctly) be re-requested by the user-agent (including GoogleBot), and fortunately for webmasters with 'ugly upload timing', it will remain in the results for a longer period than a 410 will. A 404 could be deliberate, but it could also be a short-term error which will be corrected, and there's no way of knowing for sure which it is from the error code alone. So Google 'errs on the side of caution' for 404 handling and keeps 'checking back' to see if the error is corrected, at a relatively high frequency initially. The longer a 404 goes uncorrected, the less frequent the requests become, and over time a 404 will be re-requested at (or about) the same frequency as a 410.
Yes, as jdMorgan used to say:
"404 - the server can't find it, doesn't know why it is missing, doesn't know if it is ever coming back, and doesn't actually know if it ever existed in the first place."
"410 - it was here at some time in the past but now it has gone away forever. Don't bother asking for it again as you'll likely get the same response every time."
Once Google sees a URL starting to return 404, they check it up to twice more in the next 24 to 48 hours and then don't come back again for many weeks.
Once Google sees 410 Gone, they don't come back for many months to check the status again.
Google does check every URL they have ever seen at least once or twice per year, because a large number of URLs returning 410 and other such codes do eventually come back to life (new content, new site owners, change of CMS, and many other reasons).
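The cadence described above can be sketched as a status-aware recheck scheduler. The intervals below are illustrative guesses drawn from the observations in this thread (404 rechecked within a day or two, 410 not for months, everything revisited at least yearly), not Google's actual numbers:

```python
import datetime

# Illustrative recheck intervals in days; the real crawler's numbers are unknown.
INITIAL_RECHECK = {404: 1, 410: 120}  # 404: within ~24-48h; 410: many months
MAX_INTERVAL = 365                    # everything gets revisited roughly yearly

def next_recheck(status, consecutive_errors):
    """Back off geometrically: the longer an error persists, the rarer the recheck."""
    base = INITIAL_RECHECK.get(status, 7)
    interval = min(base * (2 ** consecutive_errors), MAX_INTERVAL)
    return datetime.timedelta(days=interval)
```

The point of the sketch: a fresh 404 starts near the top of the schedule and decays toward the 410 rate, while a 410 starts near the bottom, which is why it clears out of the report so much faster.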
Google does sort the 404s by "priority". Do actual problems bubble up to the top for you?
I'd like to see you be able to sort the 404s by "number of current internal links", and "number of current external links" in that report. You'd think that Google's priority metric would already have the urls sorted by number of links to the page somehow.
I have deleted pages returning 404 and they still appear in the SERPs. An option in WMT to have reported 404 pages "removed permanently and immediately" would be useful.
I don't know whether the existing "Mark as fixed" option removes them from the SERPs too.
They will drop from the SERPs soon enough. There's nothing you need to do except be sure that the human readable 404 error message is helpful to the user and contains links to relevant content.
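A helpful human-readable 404 body along those lines might look like this sketch; the suggested links are hypothetical and would come from whatever "related content" logic your site already has:

```python
from html import escape

def not_found_page(requested_path, suggestions):
    """Render a 404 body pointing the visitor at related pages.

    The status code tells crawlers the page is missing; the body is for humans,
    so it should explain what happened and offer somewhere useful to go next.
    """
    links = "".join(
        f'<li><a href="{escape(url)}">{escape(title)}</a></li>'
        for url, title in suggestions
    )
    return (
        "<h1>Page not found</h1>"
        f"<p>We couldn't find <code>{escape(requested_path)}</code>. "
        "You might be looking for:</p>"
        f"<ul>{links}</ul>"
    )
```

Serve this body with the 404 (or 410) status line itself, not via a redirect to an error page, or crawlers will see a 200/301 instead of the error.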
|Once Google sees 410 Gone, they don't come back for many months to check the status again. |
Necessary disclaimer: this applies specifically to Google. Other search engines (assuming, for the sake of discussion, that such a thing exists) may behave differently. This is based on direct personal observation. There are probably additional factors involved.