mod_rewrite / 404 handler problem

         

willamowius

3:39 pm on Nov 10, 2004 (gmt 0)

10+ Year Member



I generate content dynamically and also serve my error pages via a script.

Apache httpd.conf looks basically like this:
ErrorDocument 404 /perl/error.pl?code=404
RewriteEngine on
RewriteRule ^/dir/(.*)$ /perl/content.pl?para=$1 [PT]

Works fine for 404 errors outside /dir/ and when the content script has HTML to print.

When the content script detects that it has nothing to send, it generates a 404 error.

print "Content-Type: text/html\nStatus: 404\n\n";
print "<html>...</html>\n";

The strange thing is that when content.pl generates a 404 error, the browser also gets the output of the error script.
Stranger still, the content script is called _without_ the parameter.

Can anybody enlighten me why this is so?

PS: I tried to replace [PT] in the RewriteRule by [L], but then Apache doesn't find the script any more.

jdMorgan

5:39 pm on Nov 10, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> When the content script detects that it has nothing to send, it generates a 404 error.

*How* does it detect that it has nothing to send? This is a critical factor.

Jim

willamowius

5:46 pm on Nov 10, 2004 (gmt 0)

10+ Year Member



The content script gets the filename that was requested in the URL as CGI parameter. It looks the file up in the database (CMS).

It fails when somebody types in the wrong URL or in the rare case an article is removed from the CMS. Unfortunately I can't detect that situation in a RewriteCond, so the content script has to produce the 404.

The problem is not that a 404 gets thrown, but that the output of _both_ scripts gets sent to the user.

jdMorgan

6:39 pm on Nov 10, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This is probably going to take some troubleshooting, but it's likely that the ErrorDocument handler is being invoked by a failed (missing) file request within the script, rather than the error being handled completely within the script as you intended.

You can detect missing files (but not CMS-deleted entries) in mod_rewrite using
RewriteCond %{REQUEST_FILENAME} !-f

and similarly, missing directories
RewriteCond %{REQUEST_FILENAME} !-d

I'm not sure if that will help, but it's one possibility.
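
As a rough sketch of how those two conditions could sit in front of the rule from the first post: this assumes the rules stay in httpd.conf (per-server context), where REQUEST_FILENAME has not yet been mapped to a filesystem path, so the path to test is built from DOCUMENT_ROOT instead. Whether anything under /dir/ exists on disk at all is also an assumption.

```apache
RewriteEngine on

# Assumption: per-server context, so the filesystem path is assembled by hand
# from DOCUMENT_ROOT rather than taken from REQUEST_FILENAME.
# Rewrite only if the request is not an existing file...
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-f
# ...and not an existing directory.
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-d
# Rule as posted: hand everything under /dir/ to the content script.
RewriteRule ^/dir/(.*)$ /perl/content.pl?para=$1 [PT]
```

Both conditions have to be true for the rule to fire, so a request that matches a real file or directory is left alone. This cannot catch entries that are missing only from the CMS database.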

Also, make sure that your ErrorDocument URL does not match the pattern in the rewrite rule. It does not appear to do so in the code you posted, as "perl" won't match "dir", but there's the possibility that you may have obscured or shortened your URLs for posting.

If you can tell, please post the order in which the scripts appear to be outputting their headers. It would be helpful to know which handler is running first or "inside" the other, as my "guesses" above cover both possibilities.
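
If the real URLs are broader than the posted ones, one way to rule this out is an explicit exclusion. This is only a sketch and assumes the scripts live under /perl/ as posted:

```apache
RewriteEngine on

# Never rewrite requests that already point at the script directory,
# which includes the ErrorDocument target /perl/error.pl.
RewriteCond %{REQUEST_URI} !^/perl/
RewriteRule ^/dir/(.*)$ /perl/content.pl?para=$1 [PT]
```

With the pattern exactly as posted the condition is redundant, since ^/dir/ cannot match /perl/error.pl. It only matters if the actual pattern is looser.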

Jim

willamowius

7:04 pm on Nov 10, 2004 (gmt 0)

10+ Year Member



The content script runs first, and after its output the output of the error script appears.

Do you think the content script is triggering the 404 handler by looking for something nonexistent? Or is this a case of pass-through where Apache sees that a script generated a 404 status and then calls the 404 handler?

jdMorgan

7:26 pm on Nov 10, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Apache's "visibility" is limited to the URL- and file-request API phases only. It has no way to "watch" the internals of a script, unless that script requests a file using Apache's mechanisms.

Another way to put this is to say that ErrorDocument will be invoked asynchronously if a non-existent URL is requested, without Apache being aware of anything other than that the document was requested and does not exist.

Hopefully, someone who's seen something like this before will drop by and post more useful comments, as I'm stumped.

Jim

willamowius

7:57 pm on Nov 10, 2004 (gmt 0)

10+ Year Member



I found one more detail to add:
This somehow interacts with mod_perl.

When I switch off mod_perl for the content script the problem doesn't happen.

Unfortunately the load is too high to leave it this way, and switching only the error script to CGI doesn't help.
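
For reference, this is roughly what the switch looks like. It is a sketch, not my exact config: it assumes mod_perl 1.x with Apache::Registry handling everything under /perl/, and that /perl/ is already mapped to the script directory.

```apache
# Assumed baseline: all scripts under /perl/ run through
# mod_perl 1.x and Apache::Registry.
<Location /perl/>
    SetHandler perl-script
    PerlHandler Apache::Registry
    Options +ExecCGI
</Location>

# Override for one script: run content.pl as plain CGI via mod_cgi.
# This section comes later in the file, so it wins for this URL.
<Location /perl/content.pl>
    SetHandler cgi-script
    Options +ExecCGI
</Location>
```

With the second section in place the double output goes away. Applying the same override to error.pl instead makes no difference.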

Does that ring a bell for anyone?