i have the following setup on our site:
in the htaccess:
ErrorDocument 404 /errors.php?code=404
ErrorDocument 403 /errors.php?code=403
RewriteEngine on
RewriteCond %{REQUEST_FILENAME}!-s
RewriteCond %{REQUEST_FILENAME}/index.html!-s
RewriteRule ^(.*)$ html.php/$1 [L]
this just means that if the requested page is not found in the site, the URI should be passed to html.php which generates the page in question.
in html.php, if i get an error, or the page can not be found in the database, then i exit with
header ('HTTP/1.0 404 Not Found');
this then calls /errors.php?code=404 which returns a correct 404 header to the browser.
the problem is that none of these 404s show up in my apache access.log. all i get is a 200 code.
i am sure this has something to do with my rewriting all requests to html.php but exactly how or why...?
much appreciate any help!
This line has a problem: there is no space between the test string and the !-s pattern.
RewriteCond %{REQUEST_FILENAME}/index.html!-s
Even with the space restored, that line may need to be modified, or it may need to be broken into two lines.
If you can re-post with a more detailed explanation of your intent, we can probably get this sorted.
Jim
thanks for the reply...
here is the setup hopefully with the correct spaces:
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-s
RewriteCond %{REQUEST_FILENAME}/index.html !-s
RewriteRule ^(.*)$ html.php/$1 [L]
i don't actually have any html pages on the server except...
lots of clients in subfolders. e.g. mydomain.com/client-website/
so i needed to test first of all if the file exists and is more than zero length (e.g. client-website/subpage.html) then to check whether an index.html file exists if just the directory is called.
if none of those criteria are met, then it must be a dynamic page, so html.php does its thang and queries the database and generates the page according to the $1 parameter (which is the SCRIPT_URL)
it works perfectly in practice, it's a bit like a 404 caching mechanism, except that i don't actually write the pages to disk, i just generate them.
thanks
OK, so this
RewriteCond %{REQUEST_FILENAME}/index.html !-s
(it seems to me) would only work if the client requested "domain.com" or "domain.com/subdir", without a trailing slash in either case. But it may be OK -- I haven't ever played with that construct.
If so, then the problem is likely one of expectations versus reality, in that (I think that) once your script returns a 404 header, it must also return the content of the 404 page itself. In the API phase where content-generation has begun, I believe it is too late for Apache to take any further internal error-handling steps.
I would break this problem into two pieces, the call-the-script piece and the scripted-404-response piece. If your script is actually writing the 404 response header, I see no reason why you should see a 200 response. So, I assume that that part of the script is not being invoked or that it's failing. Make sure you're declaring a Content-Type before outputting the 404 header... That's about all I can think of. If you don't, though, you should be getting that annoying "Premature end of script headers" message in your error log(?).
I'd set up a super-simple rule to invoke your script -- rewrite a "silly" non-existent URL to the script unconditionally. This will help you break the problem into the two pieces, invocation versus script function, to simplify troubleshooting.
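A minimal sketch of such a test rule, assuming the script sits at html.php in the same directory as the .htaccess file, as in the original post. The URL silly-test-url is a made-up name used only for this test; any path that does not exist on disk would do.

```apache
RewriteEngine on
# Hypothetical test URL: send exactly one non-existent path to the script,
# with no RewriteCond in front of the rule.
RewriteRule ^silly-test-url$ html.php/silly-test-url [L]
```

Requesting /silly-test-url and then checking both the response header and the access log should show whether the 200 comes from the rewrite step or from the script itself.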
You can use the server header checker [webmasterworld.com] to see if the response header is the one you're expecting and whether it's properly-formed.
Jim
the mystery deepens...
if i try to access a URL which doesn't exist, the script returns a correct 404 header.
i can do this in two ways.
1* header ('HTTP/1.0 404 Not Found'); or
2* $num = 404;
include ($_SERVER['DOCUMENT_ROOT'] . '/errors.php');
both ways return the correct 404 header, but both ways still only write a 200 to my logfiles:
69.36.190.175 - - ... "GET /nopage.htm HTTP/1.1" 200 5 "" ""
when i manually call my error script (2*) then the custom 404 page is loaded and displayed. however, when i send the header (1*) a bog standard 404 page not found is displayed. i am sure this is what you were saying about expectation versus reality: once the script has been called, sending a 404 header does not trigger the ErrorDocument, so my /errors.php?code=404 is never called. apache is out of the equation, so to speak.
at the end of the day, it doesn't matter if i call the error page manually, as the server headers are correct for all 404s. but this still doesn't solve the problem of the log files. it is something i can live with, but does make tracking down errors tricky.
even when i try to access a .gif which i know doesn't exist, i get a 200 in my logfiles...
i am off to try some of things you suggest :-)
cheers
back again... :-)
when i add the first line to my .htaccess so it looks like this:
RewriteCond %{REQUEST_FILENAME} !\.(jpg|gif|css|js)$
RewriteCond %{REQUEST_FILENAME} !-s
RewriteCond %{REQUEST_FILENAME}/index.html !-s
RewriteRule ^(.*)$ html.php/$1 [L]
and i try to access a gif which is not on my server, because it is now no longer rewritten to html.php it is now seen correctly by the server, and the correct 404 header is written to the logs.
however, when an html file is accessed which doesn't exist, it is rewritten to html.php (bypassing apache) and the script itself sends the 404 header. the client sees the correct header, but the logfiles are not written - because i guess, the script html.php is valid - i am sure this accounts for the 200 message in the logs.
i think the problem with html files is unavoidable. i suppose i could manually include the subfolders which are dynamic - there's about 10 of them - but the RewriteCond gets larger and larger as we expand the site
RewriteCond %{REQUEST_FILENAME} thisfolder|thatfolder|anotherfolder|etc,etc,etc
i have had a search but i couldn't find a way to include an external file with RewriteCond - is it possible - this would probably be the best solution - that way i just add folders to this external file which have to be excluded?
thanks for input! this has been a good learning experience!
cheers
No, unfortunately, it is not.
A possible solution to your case is to use a folder-naming convention that uses a common foldername element, such as: "/rewrite-excluded_folder1", "/rewrite-excluded_folder2" etc. Of course, you'd want to use something shorter, but the idea is that the RewriteCond would only test for the "/rewrite-excluded_" part of the path, not the "folder1" or "folder2" part, and then exclude any requests that matched from being redirected. As long as the foldername contained that common element, it would not be redirected.
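As a rough sketch of that naming convention, assuming the rules from earlier in the thread are otherwise unchanged, the exclusion could be a single extra condition. The prefix rewrite-excluded_ is just the placeholder from the paragraph above.

```apache
RewriteEngine on
# Skip any request whose path starts with the shared folder-name prefix.
# "rewrite-excluded_" is a placeholder; a real site would pick something shorter.
RewriteCond %{REQUEST_URI} !^/rewrite-excluded_
RewriteCond %{REQUEST_FILENAME} !-s
RewriteCond %{REQUEST_FILENAME}/index.html !-s
RewriteRule ^(.*)$ html.php/$1 [L]
```

New folders that follow the convention are then excluded without touching the .htaccess file again.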
Otherwise, if you have access to RewriteMap in httpd.conf, you could use an external rewritemap or script in order to simplify adding new folders.
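A sketch of the RewriteMap idea, under several assumptions: the map name excludedfolders and the file path are hypothetical, and the map is a plain-text file with one "foldername yes" pair per line. The map has to be declared in httpd.conf, but once declared it can be referenced from .htaccess.

```apache
# In httpd.conf (server or virtual-host context). RewriteMap cannot be
# declared in .htaccess. Map name and file path are hypothetical.
# The text file holds one "foldername yes" pair per line, for example:
#   client-website yes
RewriteMap excludedfolders txt:/path/to/excluded-folders.txt

# In .htaccess, in front of the existing rule:
RewriteEngine on
# Capture the first path segment of the request as %1 (may be empty for "/").
RewriteCond %{REQUEST_URI} ^/([^/]*)
# Skip the rewrite when that segment is listed in the map;
# unlisted segments fall back to the default value "no".
RewriteCond ${excludedfolders:%1|no} !=yes
RewriteCond %{REQUEST_FILENAME} !-s
RewriteRule ^(.*)$ html.php/$1 [L]
```

With this layout, excluding another folder means adding one line to the text file rather than lengthening the RewriteCond.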
This line is not going to do what you expect:
RewriteCond %{REQUEST_FILENAME} /index.html !-s
One more note: When dealing with rewrites, it is always a good idea to exclude the destination itself from being rewritten. This can save considerable heartache during development, and the "insurance" RewriteCond can then be removed later if you're absolutely sure it's not needed:
RewriteCond %{REQUEST_URI} !html\.php
RewriteRule ^(.*)$ html.php/$1 [L]
thanks for the tip with the destination file rewrite. i'll remember that.
regarding the {REQUEST_FILENAME}/index.html...
that (as you supposed earlier) actually concatenates the /index.html onto the {REQUEST_FILENAME}
it checks if mydomain.com/subfolder/ (with or without trailing slash) actually refers to /subfolder/index.html
without this rule, the index.html page of a subfolder is not found, as the !-s only checks the /subfolder/ not whether there is an index.html file sitting there.
but i've just had another look and it appears to me to be more logical to check for -d. that way apache automatically appends the trailing slash if left off (the {REQUEST_FILENAME}/index.html !-s didn't do this for some reason).
both ways work, but my gut feeling says the -d is better - nothing like a methodical approach eh ;-)
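For reference, a sketch of what the -d variant might look like. This is a reading of the description above, not the poster's actual file, and it includes the destination-exclusion condition suggested earlier in the thread.

```apache
RewriteEngine on
# Never rewrite the destination script itself.
RewriteCond %{REQUEST_URI} !html\.php
# Leave existing non-empty files alone.
RewriteCond %{REQUEST_FILENAME} !-s
# Leave existing directories alone.
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ html.php/$1 [L]
```

One difference worth keeping in mind: -d skips every existing directory, whether or not it contains an index.html, so a directory without an index file is handled by Apache rather than by html.php.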
<added> jim i'm going to look at the rewritemap (i can access httpd.conf), or failing that write a bit of code which saves the 404s - at least that way i can see which pages are missing regardless of the log files.
many thanks for help!