I find it very strange that a RewriteRule still takes effect even when the requested page should be a 404.
For example:
RewriteRule ^archives/([0-9]{4})/([0-9]{1,2})/([0-9]{1,2})/?$ /index.php?year=$1&monthnum=$2&day=$3 [QSA,L]
Say you browse to:
http://www.example.com/archives/2006/06/55/
Your browser will send a 404 header response but you are still taken to index.php. :( Instead of my 404 page. Why?
Of course the code above works just fine for valid pages. Very strange; you would think it would not work this way.
Any thoughts,
Will
The normal server response is a 404 if the URL does not resolve to an existing file.
By rewriting all of your "archives" URLs to your script, you are in effect saying, "all URLs of this format resolve to the existing file index.php."
If you rewrite URLs to a script, then it is up to that script to look in your database and determine whether the content associated with the requested URL exists or not. If it does not, then the script can and should output a 404-Not Found HTTP response header, a 410-Gone response, or any other response you choose.
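As a rough illustration of that responsibility, here is a minimal sketch in Python rather than PHP, just to show the logic. The `POSTS` dictionary stands in for the database and `handle_archive` is a made-up name; neither comes from the original setup.

```python
from datetime import date

# Hypothetical stand-in for the database: dates that actually have posts.
POSTS = {date(2006, 6, 5): "Post for 5 June 2006"}


def handle_archive(year, monthnum, day, posts=POSTS):
    """Return (status_line, body) for a rewritten archive request."""
    try:
        requested = date(int(year), int(monthnum), int(day))
    except ValueError:
        # Not a real calendar date (for example day 55), so no content can exist.
        return "404 Not Found", "No such archive page."

    body = posts.get(requested)
    if body is None:
        # A valid date, but nothing is stored for it.
        return "404 Not Found", "No such archive page."
    return "200 OK", body
```

With this sketch, `handle_archive("2006", "06", "55")` yields a 404 status, because the script itself rejects the date that the rewrite pattern let through.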
...browser will send a 404 header response but you are still taken to index.php.
Jim
If you rewrite URLs to Wordpress, then it is up to Wordpress to look in your database and determine whether the content associated with the requested URL exists or not. If it does not, then Wordpress can and should output a 404-Not Found HTTP response header, a 410-Gone response, or any other response you choose.
Once Apache enters the content-handler API phase (to run a script such as Wordpress), then the default Apache error-handling is no longer available, and the script becomes responsible for all error-handling.
You might want to look for a Wordpress plugin that detects missing content and returns an error response code; having done so, it could then "include" your custom 404 error page as an in-line file.
Do not use redirection to accomplish this. If you do, the client will see the redirect response code instead of the 404 response code. Search engines will therefore not recognize that the originally-requested URL resolves to non-existent content. Check your work using the "Live HTTP Headers" extension to Firefox, or any other accurate server header checker.
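To make the "no redirect" point concrete, here is a small sketch written as a plain Python WSGI callable. This is not how Wordpress or any plugin does it; `ERROR_PAGE`, `KNOWN_PATHS`, `app` and `fetch` are all invented for the illustration. The idea is only that the 404 status and the error page body go out in the same response, with no Location header.

```python
# Hypothetical custom error page; in practice this would be read from a file.
ERROR_PAGE = b"<h1>Page not found</h1>"

# Hypothetical stand-in for content that really exists.
KNOWN_PATHS = {"/archives/2006/06/05/": b"<h1>5 June 2006</h1>"}


def app(environ, start_response):
    """WSGI application that serves the error page in-line with a 404 status."""
    body = KNOWN_PATHS.get(environ.get("PATH_INFO", ""))
    if body is None:
        # Same response carries both the 404 status and the error page body.
        # No Location header is sent, so the client never sees a redirect.
        status, body = "404 Not Found", ERROR_PAGE
    else:
        status = "200 OK"
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]


def fetch(path):
    """Call the app directly and return (status, headers dict, body)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app({"PATH_INFO": path}, start_response))
    return captured["status"], captured["headers"], body
```

Calling `fetch("/archives/2006/06/55/")` gives back the 404 status together with the error page, which is what a header checker should show for the originally requested URL.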
Jim
That's actually an excellent way of describing the misunderstanding, and probably easier to understand than my approach of trying to explain the various Apache API phases. So thanks for posting that; I'll try to remember that phrasing the next time this question comes up.
To further clarify, the reason you wouldn't get a 404 is that as far as the Apache 'exists' checking is concerned, the URL does exist, because the URL is rewritten to "/index.php" and "/index.php" does exist.
The "year=y&monthnum=m&day=d" query string parameters (the "GET data") attached to that URL are not part of the URL path, and are not meaningful to Apache in any way; they only have meaning to the script, and do not affect whether the script exists or not. Apache has no way to find out that a database lookup inside the script has failed because one or more of those values does not return a valid record, and so the script itself must handle all such error conditions.
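The split between the path and the query string can be seen in a rough Python model of the rule from the first post. This only approximates what mod_rewrite does (stripping the leading slash imitates per-directory matching), and `EXISTING_FILES`, `rewrite` and `file_exists` are hypothetical names used for the sketch.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Same pattern as the RewriteRule in the first post.
PATTERN = re.compile(r"^archives/([0-9]{4})/([0-9]{1,2})/([0-9]{1,2})/?$")

# Hypothetical document root: the only file that exists is the script.
EXISTING_FILES = {"/index.php"}


def rewrite(request_path):
    """Roughly mimic the rule: matching paths are mapped to /index.php."""
    match = PATTERN.match(request_path.lstrip("/"))
    if match is None:
        return request_path
    return "/index.php?year={}&monthnum={}&day={}".format(*match.groups())


def file_exists(target):
    """The existence check looks at the path only; the query string is ignored."""
    return urlsplit(target).path in EXISTING_FILES
```

For `/archives/2006/06/55/` the pattern matches, because `[0-9]{1,2}` accepts any one or two digits, and the rewritten target's path is `/index.php`, which exists. The `day=55` value is only visible to the script, for example via `parse_qs(urlsplit(target).query)`.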
Be sure to check the response code headers returned by your script as described above.
Jim