vlexo and born2run
Properly configured, Drupal should return a proper 404 unless it is very old (like Drupal 5).
If you are getting thousands of hard or soft 404s, you have some kind of configuration problem, though I can't tell what it is from here. Out of the box, Drupal should return a standard 404 for any page that isn't found. Check it with LiveHTTPHeaders or a similar header-checking tool.
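If a browser add-on isn't handy, the same header check can be scripted. This is only a rough sketch using the Python standard library; the function names and the "page not found" phrase used to guess at soft 404s are my own assumptions, not anything Drupal defines.

```python
import urllib.error
import urllib.request

# Phrases that suggest an error page served with a 200 status.
# This list is an assumption for illustration; adjust it to your theme.
NOT_FOUND_HINTS = ("page not found",)


def fetch(url):
    """Return (status_code, body_text) for a URL, including error responses."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx, but the status and body are still readable.
        return err.code, err.read().decode("utf-8", errors="replace")


def classify(status, body):
    """Label a response as 'hard 404', 'soft 404' or 'other'."""
    if status == 404:
        return "hard 404"
    if status == 200 and any(hint in body.lower() for hint in NOT_FOUND_HINTS):
        return "soft 404"
    return "other"
```

Calling `classify(*fetch(url))` on a few of the reported URLs should show quickly whether the server is sending real 404s or error pages with a 200 status.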
First question: What are these URLs? Do they resemble valid URLs? Are they related to pagination, date stamps, search parameters or anything like that?
A note about URLs. If you have something that generates valid "native" URLs (like "node/15") but then appends a parameter, you will get the same page as the URL without the parameter. For example
[
drupal.org...]
[
drupal.org...]
This is functionally equivalent to adding a GET query string to any URL, such as
[
webmasterworld.com...]
Now if you are using a Drupal URL alias (set manually or via pathauto or what have you), then you cannot append random stuff and still have a valid URL, because in that case Drupal does a DB lookup for the entire path and will not find it.
So if I have
[
example.com...]
then
[
example.com...]
will return a 404 unless such a page exists.
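The difference between the two cases can be illustrated with a toy model. This is a deliberately simplified sketch, not Drupal's actual routing code; the alias "about-us", the `ALIASES` and `ROUTES` tables and the `resolve` function are all hypothetical.

```python
# Toy data: one alias and one router pattern (both hypothetical).
ALIASES = {"about-us": "node/15"}
ROUTES = ["node/%"]


def resolve(path):
    """Return the internal path that would be served, or None for a 404."""
    # Alias lookup is an exact match on the whole requested path.
    internal = ALIASES.get(path, path)
    parts = internal.split("/")
    # Try the longest prefix first; trailing parts act as extra arguments.
    for length in range(len(parts), 0, -1):
        candidate = parts[:length]
        for route in ROUTES:
            pattern = route.split("/")
            if len(pattern) == length and all(
                p == "%" or p == c for p, c in zip(pattern, candidate)
            ):
                return "/".join(candidate)
    return None
```

In this model "node/15/foo" still resolves to "node/15", while "about-us/foo" matches neither an alias nor a route and comes back as `None`, which stands in for the 404.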
If you are set up with pages using valid URLs with pagers as GET query strings, then you can also end up with thousands of 404s. Again
[
webmasterworld.com...]
[
webmasterworld.com...]
[
webmasterworld.com...]
Are all valid, ad infinitum. That's not Drupal-specific. In other words, it's probably not the handling of 404s that is the problem, but the generation of bogus URLs.
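One way to see what is generating the URLs is to group the reported ones by path and by query parameter name. A minimal sketch, assuming you can export the URLs from your crawl or error report as a plain list; `summarize` is a name made up for this example.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def summarize(urls):
    """Count URLs per path and per query parameter name."""
    by_path = Counter()
    by_param = Counter()
    for url in urls:
        parts = urlsplit(url)
        by_path[parts.path] += 1
        # keep_blank_values so parameters like "?page=" are still counted.
        for name, _value in parse_qsl(parts.query, keep_blank_values=True):
            by_param[name] += 1
    return by_path, by_param
```

If one path or one parameter name accounts for most of the list, that usually points at the pager, filter or module producing the extra URLs.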
Special note about Views
Views behaves like the rest of Drupal. If you are getting Views pages that are causing 404s, however, you can set your Contextual Filters to serve a 404 if argument validation fails. Under "More" you can also set a filter to return a 404 if there are *more* arguments than required.
If you don't have a contextual filter, you can use the Global:Null filter which does pretty much nothing except let you set these options.
That said, you may simply be masking a problem (the problem being that you are generating URLs with extra parameters).
-----
As a side note, there are things you can do to improve 404 handling in Drupal. None of these are oriented toward fixing your problem, because that shouldn't be happening, period.
- Make sure you have created and set custom 403 and 404 pages in your site settings. In D7 this is Configuration -> System -> Site Information (at admin/config/system/site-information).
- Fast 404: serve a 404 without bootstrapping the whole system
[
drupal.org...]
- Search 404: Attempt to search based on url keywords
[
drupal.org...]
- Global Redirect: not so much for handling 404s as for a number of other things, like 301 redirects for URLs that use the page ID rather than the friendly alias. Should be on all Drupal sites.
[
drupal.org...]