We have a website with thousands of "virtual" web pages. The pages themselves do not exist, but instead, are created on the fly using a custom CGI program. The way it works is this...
1) A page request is made to Apache.
2) Since the page doesn't exist, it goes to a 404 error page.
3) Our CGI is set up to automatically handle all errors, so it looks at the page being requested, creates the virtual HTML for it, and sends this back to Apache to serve.
4) Apache then serves the correct HTML information.
This seemed to be working just fine. All of our pages, over 50,000 of them, have been working great for months. However, we learned today that while they look just fine in a web browser, every response carries an "HTTP/1.1 404 Not found" status line in its HTTP headers.
This doesn't matter for web browsers, but it recently caused Google to stop indexing all of our web pages because of these 404 codes.
What can we do to "force" Apache to send back a "200 OK" header with a page, even if Apache is handling it as an error page?
I am hoping this is something I can specify in one of Apache's config files.
Here is an example of one of our many "virtual" generated pages:
<snip>
Sincerely,
Jeff Gold
[edited by: eelixduppy at 4:38 pm (utc) on Mar. 27, 2009]
[edit reason] no personal URLs, please [/edit]
There should be a simple rewrite that captures all relevant incoming URL requests and feeds them to the script for processing, without invoking any error handling at this point. This is exactly what rewrites are designed for.
That CGI script should then send either:
- 200 OK, and the page content, or
- 404 Not Found, and the error page content, depending on whether there is real content to be sent or not.
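As a rough illustration of the two outcomes above, here is a minimal sketch in Python. The language of the original CGI program isn't stated, so Python is only a stand-in, and `build_response` and the placeholder error page are hypothetical names. It relies on the standard CGI `Status:` response header, which the web server turns into the real HTTP status line.

```python
# Hypothetical error page body; a real site would use its own template.
NOT_FOUND_HTML = "<html><body><h1>Page not found</h1></body></html>"


def build_response(page_html):
    """Build a complete CGI response: headers, a blank line, then the body.

    page_html is the generated HTML for the requested URL, or None when
    the script has no content for that URL.
    """
    if page_html is None:
        status = "404 Not Found"
        body = NOT_FOUND_HTML
    else:
        status = "200 OK"
        body = page_html
    headers = [
        "Status: " + status,  # CGI header the server maps to the HTTP status
        "Content-Type: text/html; charset=utf-8",
    ]
    return "\r\n".join(headers) + "\r\n\r\n" + body
```

The script would write the returned string to standard output. The point is that the status is chosen by the script itself, once, before anything is sent.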
That's a slightly different way of doing things.
Once you've sent '404 Not Found' it's too late to send something else instead - and a horrible bodge to try and force it to do that.
The magic is in how the control is passed to the script. It is the rewrite that makes it all possible.
FastCgiExternalServer /Library/Tenon/WebServer/WebSites/.*(?<!\.(?:txt|ico|jpg|gif|htm))$ -re -host localhost:9008 -pass-header Authorization
I'm not sure what you mean by a "simple rewrite", so please explain: what can we change, and in which configuration file, so that every single request coming to the Apache server (other than image or txt files) gets treated as 200 OK and gets passed to our CGI?
There are NO pages at all we want to treat as errors. Everything should be passed as 200 OK to our CGI, regardless of what page is requested. Thank you.
Best wishes,
- Jeff
At the point of the rewrite, no status code is sent out. You're simply passing control to the script without returning anything to the browser.
Only after the script has decided whether there is content to be returned for this request or not, should the 200 OK or 404 Not Found status be actually sent - by the script.
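To make that ordering concrete, here is a small Python sketch of the script side, again only as a stand-in for whatever language the real CGI uses. It assumes the server passes the originally requested URL in the `REQUEST_URI` environment variable (Apache sets this for CGI scripts, but check what your own setup provides). `requested_path`, `choose_status` and the set of known paths are hypothetical.

```python
from urllib.parse import unquote


def requested_path(environ):
    """Return the decoded URL path the visitor asked for, without the query string."""
    uri = environ.get("REQUEST_URI", "/")  # assumption: set by the server
    path = uri.split("?", 1)[0]
    return unquote(path) or "/"


def choose_status(path, known_paths):
    """Decide the status only after checking whether content exists for the path."""
    return "200 OK" if path in known_paths else "404 Not Found"
```

In a real script, `environ` would be `os.environ`, and the membership check would be replaced by whatever lookup decides whether a virtual page exists.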
I don't use CGI, so I'm not 100% sure how you hook the two things together. There are certainly plenty of examples in this forum of how you do it with PHP. Perhaps jd can say whether it is similar to that.
Given this mindset, is there a simple way to...
Replace a string within an Apache CONFIG file somewhere, or within some sort of data fork of the Apache program, so that the string "404 Not Found" never comes up, even for error pages, and instead, only the string "200 OK" comes up?
The '404' status code needs to be forced only when the script has no valid content to return for the URL that was requested.
This is different to your current arrangement, in one key way. Your current version returns the '404' status code to the browser before the script even gets to run, and that is not at all good.