Spiders are very forgiving when it comes to receiving a 404 error. All that means is the page cannot be found.
If a document has been permanently removed, the server should probably return a 410 Gone status as opposed to a 404 Not Found.
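To make the difference concrete, here is a minimal sketch of a server that answers 410 Gone for pages it knows were removed on purpose and 404 Not Found for everything else it doesn't recognize. It uses only Python's standard library. The paths, the `REMOVED_PATHS` and `LIVE_PAGES` names, and the `status_for` helper are all made up for illustration; a real site would more likely do this in its web server configuration.

```python
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical data, for illustration only.
REMOVED_PATHS = {"/old-page.html"}       # pages deleted on purpose
LIVE_PAGES = {"/": b"<h1>Home</h1>"}     # pages that still exist


def status_for(path: str) -> HTTPStatus:
    """Pick the status code for a requested path."""
    if path in LIVE_PAGES:
        return HTTPStatus.OK
    if path in REMOVED_PATHS:
        # Permanently removed: say so explicitly with 410.
        return HTTPStatus.GONE
    # Unknown path: we can only say it was not found on this request.
    return HTTPStatus.NOT_FOUND


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status = status_for(self.path)
        # Serve the page body if it exists, otherwise the reason phrase.
        body = LIVE_PAGES.get(self.path, status.phrase.encode())
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Assumed address and port; adjust as needed.
    HTTPServer(("127.0.0.1", 8000), Handler).serve_forever()
```

With this running, a request for /old-page.html gets a 410 while a request for a path that never existed gets a 404, so a client can tell the two cases apart.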
There are specific issues to contend with when removing a page from the web. If there are inbound links to that page (which there most likely are), those links need to be removed completely if you want the page to drop out of the indices. If those links are still being followed (which they most likely will be), the spider will keep getting a 404. The spider does not know that the page is gone forever. It only knows that the page didn't exist on that visit. A 404 Not Found can be returned for a variety of reasons: server being down, page moved, page removed, etc.
Here's the big problem. Many people just don't know the intricacies (I didn't) of dealing with pages that have been moved and/or removed. It could take years to clear out a page that has been permanently removed if it is returning a 404 instead of a 410.
By indirectly I mean I was having problems with the Yahoo Search API that I used on my website, and one of the techies was helping me solve it. He saw the broken URLs being returned and fixed them, though he never admitted it :) But they had been in the index for a year, and disappeared the same moment he was helping me. Too much of a coincidence, don't you think?