My gut reaction is to never use a 404 for anything outside of error reporting. But trying to put myself in your shoes, I think I'd take a hybrid approach in order to preserve your site's ability to tell you that you've got a bad link on one of your pages, as well as to avoid losing incoming traffic. My purist approach works well for me on my small sites, but on a large site, I can see the benefit of recapturing the incoming traffic.
What I'd propose is something like this:
If a page is not found:
.If the referrer is non-blank AND indicates an internal link
..Then report a 404-Not Found with a custom 404 page, offering a link to the site index.
..Else use a 302-Moved Temporarily redirect to your home page (or to a "Sorry, page has been moved" page with a link to the site index).
However, once a popular missing page is identified, I'd add a custom 301-Moved Permanently to point to its replacement. Handle the top-ten most-popular missing pages in this way, or as many as you can justify.
The premise here is to separate out internal linking errors (which you can control) from externally-caused errors.
The above can be implemented easily with mod_rewrite on Apache, using the -U or -F RewriteCond tests (which check, via an internal subrequest, whether the requested URL or file is valid) to detect missing files before deciding whether to respond with a 404 or a 302.
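For illustration only, here is a minimal Python sketch of the decision logic described above. It is not the mod_rewrite rule set, just the same branching written out. The host names, the redirect table, and the function names are all hypothetical, and the sketch assumes the custom 301s are checked first, before the referrer test.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Hypothetical values for illustration; none of these come from the thread.
SITE_HOSTS = {"example.com", "www.example.com"}
PERMANENT_REDIRECTS = {"/old-widgets.html": "/widgets/"}  # popular missing pages


def is_internal(referrer: str) -> bool:
    """True when the referrer is non-blank and its host is one of ours."""
    if not referrer:
        return False
    host = urlsplit(referrer).hostname
    return host is not None and host in SITE_HOSTS


def respond_to_missing(path: str, referrer: str) -> Tuple[int, Optional[str]]:
    """Return (status, redirect target) for a URL that was not found."""
    # Assumption: a known replacement page wins over everything else.
    if path in PERMANENT_REDIRECTS:
        return 301, PERMANENT_REDIRECTS[path]
    # Internal bad link: report it as a real 404 so it shows up as an error.
    if is_internal(referrer):
        return 404, None
    # Blank or external referrer: recapture the visitor with a temporary redirect.
    return 302, "/"
```

With these sample values, a request for /gone.html arriving from one of your own pages gets a 404, while the same request with a blank or external referrer is sent to the home page with a 302.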
Jim
Well, you have to decide where to redirect each page... A big part of it is whether the "user experience" is good or not. As far as losing links, no, you won't, unless the webmaster of the linking site decides to remove the link because he/she feels it is no longer relevant. But as far as basic link-checkers go, they will think the linked page exists in some form if you feed them a 301 or 302. It's only when you respond with a 404 or a 410 that they're likely to drop the link.
One thing I routinely recommend after implementing redirects and custom error handlers is to use Brett's Server Header Checker [webmasterworld.com] to make sure both the HTTP response code and the destination are correct for all the important old page URLs. This catches a common problem: a 302 being returned when you really intend to return a 403 or 404.
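If you would rather script that check, something along these lines might do as a starting point. It is a rough sketch, not a replacement for the header checker: the function names are made up for this example, and it assumes the server answers HEAD requests the same way it answers GET. Python's http.client does not follow redirects, so you see the raw status code and Location header.

```python
import http.client
from urllib.parse import urlsplit


def fetch_status(url):
    """Request url without following redirects; return (status, Location header)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def find_mismatches(expected, observed):
    """List (url, wanted, got) for URLs whose observed result differs from the expected one."""
    return [
        (url, want, observed.get(url))
        for url, want in expected.items()
        if observed.get(url) != want
    ]
```

Build a dict of the old URLs mapped to the (status, destination) pairs you intend, call fetch_status on each one to collect what the server actually returns, and pass both dicts to find_mismatches; an empty list means every URL responded as planned.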
HTH,
Jim