jdMorgan - 3:24 pm on Oct 26, 2010 (gmt 0)
One reason it's not trivial to do this is that it's such a bad idea.
For purposes of keeping your site's search ranking and not confusing visitors, you should resist the urge to "correct" all incorrect URLs, and simply improve your 404 page to be informative and useful.
A good 404 page explains (in a somewhat apologetic tone) that the requested resource could not be found for an unknown reason, and then presents helpful resources for the visitor to find what he/she was looking for.
These typically include a link to the home page, links to major "categories" or "sections" of the site, a link to your HTML site map, and a link to your site's search facility, as applicable.
You can, if you like, include a meta-refresh on this page to forward the user to your home page after sufficient time has been allowed for the user to read and completely understand the page and select one of the links provided. But don't be in a rush here. Allow 15 to 30 seconds -- enough time for a new, non-technical reader to read and fully understand what they're reading and to make an informed choice.
If you set this meta-refresh time too short, then some search engines will treat it as a redirect, and you will get the same problems as described here for explicit redirects.
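As a rough illustration of the 404 page described above, here is a minimal Python sketch. It assumes your error page is generated by a script; the function name build_404_page, the link list, and the paths "/sitemap.html" and "/search" are made up for the example, so substitute whatever your site really uses. Note that the status stays 404 and the meta-refresh delay defaults to 30 seconds.

```python
from html import escape

# Hypothetical helper links; replace these with your site's real sections.
HELP_LINKS = [
    ("/", "Home page"),
    ("/sitemap.html", "Site map"),
    ("/search", "Search this site"),
]


def build_404_page(requested_path, refresh_seconds=30):
    """Return (status_line, headers, body) for an informative 404 response."""
    links = "\n".join(
        f'<li><a href="{escape(href)}">{escape(label)}</a></li>'
        for href, label in HELP_LINKS
    )
    body = f"""<!DOCTYPE html>
<html>
<head>
<title>Page not found</title>
<meta http-equiv="refresh" content="{int(refresh_seconds)}; url=/">
</head>
<body>
<h1>Sorry, we could not find that page</h1>
<p>The address <code>{escape(requested_path)}</code> could not be found,
for a reason we do not know. One of these links may help:</p>
<ul>
{links}
</ul>
</body>
</html>"""
    headers = [("Content-Type", "text/html; charset=utf-8")]
    # The status line must remain 404; only the page content is "friendly".
    return "404 Not Found", headers, body
```

The requested path is HTML-escaped before being echoed back, since it comes straight from the client.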
If you insist on redirecting requests for all missing resources to your home page, you will create an essentially 'infinite' URL-space, where requests for *any* URL that resolves to your server will be served the home page. The result will be duplicate content, and the search engine spiders will arbitrarily limit the depth to which they are willing to crawl your site, since they will see that it has an infinite number of URLs on it. Neither of these is good.
It is true that most major search engines have methods to avoid these problems, but I have never been one to rely on their algorithms to be 'perfect and fault-free' 100% of the time. If you set up your server correctly and in compliance with the requirements and intent of the HTTP protocol [w3.org], then you simply don't have to worry about whether each search engine can compensate for the problems in your server configuration and "figure it out."
Note that for resources which are intentionally removed, a 410-Gone response (and error page) is the correct approach. The 410-Gone error page can be mostly identical to the 404 error page, except that it should state that the requested resource has been intentionally removed, rather than being "missing for an unknown reason."
Implementing 410-Gone requires that you keep a list of intentionally-removed resources (for example, as part of your .htaccess file or in your "main script"). Then, when you get a request for one of them, your code "knows" that a 410 response should be served, either by using a RewriteRule or by generating a 410 page and response header in your script(s). This presumes that your site is stable and well-administered, so that you have (and allow) only a very few URLs which must be removed over the life of the site.
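If you handle this in your "main script" rather than in .htaccess, the lookup can be very small. The following Python sketch is only an illustration: the name status_for_missing and the two paths in GONE_PATHS are invented for the example, and it assumes the function is called only after your script has already determined that the requested path has no resource behind it.

```python
# Hypothetical list of intentionally removed URL paths; keep it short
# and maintain it by hand as resources are retired.
GONE_PATHS = frozenset({
    "/old-promo.html",
    "/discontinued/widget.html",
})


def status_for_missing(path):
    """Choose the status line for a path that maps to no existing resource."""
    if path in GONE_PATHS:
        # Removed on purpose: tell clients not to expect it back.
        return "410 Gone"
    # Missing for an unknown reason.
    return "404 Not Found"
```

Anything not on the list still gets a plain 404, so forgetting to add an entry does no harm beyond a slightly less precise response.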
Nothing in the above should be construed to mean that you cannot "correct" minor common typos in requested URLs and redirect them to the correct page. Useful corrections include fixing ".htm" when ".html" is needed, removing "punctuation" from the end of requested URLs such as "my-page.php." or "my-page.php," when this occurs due to poorly-coded link-posting software in forums, blogs, etc., and redirecting requests for resources for which you have knowingly changed the URL. The point is that if a requested URL is completely unresolvable, then you should not just redirect it to your home page. Reserve redirects for replacing only those URLs for which you know the exact, correct, relevant, and unique replacement URL, as intended by the HTTP protocol.
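To make the distinction concrete, here is a small Python sketch of that kind of narrow correction. The names redirect_target, KNOWN_PAGES, and MOVED, and all the paths in them, are assumptions for the example. The function returns a redirect target only when the corrected URL is one you know exists; otherwise it returns None, and the request should fall through to your 404 (or 410) handling rather than being sent to the home page.

```python
# Hypothetical set of paths that really exist on the site.
KNOWN_PAGES = frozenset({"/my-page.php", "/about.html"})

# Hypothetical map of URLs you have knowingly changed: old path -> new path.
MOVED = {"/old-about.html": "/about.html"}

# Characters that link-posting software tends to glue onto the end of a URL.
TRAILING_JUNK = ".,;:!?)"


def redirect_target(path):
    """Return the exact replacement path to redirect to, or None if unknown."""
    if path in KNOWN_PAGES:
        return None  # already correct, nothing to fix
    if path in MOVED:
        return MOVED[path]

    candidates = []
    stripped = path.rstrip(TRAILING_JUNK)
    if stripped != path:
        candidates.append(stripped)        # "my-page.php." -> "my-page.php"
    if stripped.endswith(".htm"):
        candidates.append(stripped + "l")  # ".htm" -> ".html"

    for candidate in candidates:
        if candidate in KNOWN_PAGES:
            return candidate
    # No exact, known replacement: do not guess, and do not use the home page.
    return None
```

For example, redirect_target("/my-page.php,") gives "/my-page.php" and redirect_target("/about.htm") gives "/about.html", while a completely unresolvable path such as "/nonsense" gives None.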