Forum Moderators: phranque


Editing HTTP Status Code Numbers with Apache

Need to change "404 Not found" to "200 OK" in HTTP Header.


rescueme

8:38 pm on Mar 26, 2009 (gmt 0)

10+ Year Member



Hello,

We have a website with thousands of "virtual" web pages. The pages do not exist as files; instead, they are created on the fly by a custom CGI program. It works like this:

1) A page request is made to Apache.
2) Since the page doesn't exist, it goes to a 404 error page.
3) Our CGI is set up to handle all errors automatically, so it looks at the page being requested, creates the virtual HTML for it, and sends this back to Apache to serve.
4) Apache then serves the correct HTML information.
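For context, a catch-all error handler like the one in steps 2 and 3 is usually wired up with an ErrorDocument directive. A sketch, assuming that is how it was done here (the thread does not show the actual config, and /cgi-bin/pages.cgi is a hypothetical path):

```apache
# Sketch only: the real config is not shown in this thread.
# A request that maps to no file is first classified as 404 Not Found;
# only then does Apache hand it to the error handler below.
ErrorDocument 404 /cgi-bin/pages.cgi
```

The key point is the ordering: by the time the handler runs, Apache has already decided that the request is a 404.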

This seemed to be working just fine: all of our pages, over 50,000 of them, have been working great for months. However, we learned today that while they look fine in a web browser, the pages are being sent with an "HTTP/1.1 404 Not Found" status line in the response header.

This doesn't matter to web browsers; however, it recently caused Google to stop indexing all of our web pages because of these 404 codes.

What can we do to "force" Apache to send back a "200 OK" header with a page, even if Apache is handling it as an error page?

I am hoping this is something I can specify in one of Apache's config files.

Here is an example of one of our many "virtual" generated pages:

<snip>
Sincerely,

Jeff Gold

[edited by: eelixduppy at 4:38 pm (utc) on Mar. 27, 2009]
[edit reason] no personal URLs, please [/edit]

g1smd

8:51 pm on Mar 26, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Doing that as an error handling routine is completely flawed, as you have found out.

There should be a simple rewrite that captures all relevant incoming URL requests, and simply feeds them to the script for processing - and not by invoking any error handling at this point. This is exactly what rewrites are designed for.
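A minimal sketch of such a rewrite, assuming Apache 2.x with mod_rewrite loaded, rules placed in the main server or <VirtualHost> config (not .htaccess), a hypothetical script at /cgi-bin/pages.cgi, and static files that can be recognised by their extension (the extension list is an assumption):

```apache
RewriteEngine On

# Leave requests for static files alone (assumed extension list).
RewriteCond %{REQUEST_URI} !\.(txt|ico|jpg|gif|htm)$ [NC]
# Internal rewrite of everything else to the (hypothetical) script.
# [PT] lets ScriptAlias map /cgi-bin/ to the script directory;
# no status code is sent at this point, the script decides that.
RewriteRule .* /cgi-bin/pages.cgi [PT,L]
```

Because the rewrite is internal, the browser keeps the URL it asked for, and the script can read that original URL from the REQUEST_URI CGI variable.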

That cgi script should then send either:
- 200 OK, and the page content, or
- 404 Not Found, and the error page content, depending on whether there is real content to be sent or not.

That's a slightly different way of doing things.

Once you've sent '404 Not Found' it's too late to send something else instead, and trying to force it to do so would be a horrible bodge.

The magic is in how the control is passed to the script. It is the rewrite that makes it all possible.

rescueme

9:43 pm on Mar 26, 2009 (gmt 0)

10+ Year Member



Thanks for the suggestion. This is what we are using in our CONFIG file now to send all Apache requests to our script (other than image or txt files):

FastCgiExternalServer /Library/Tenon/WebServer/WebSites/.*(?<!\.(?:txt|ico|jpg|gif|htm))$ -re -host localhost:9008 -pass-header Authorization

I'm not sure what you mean by a "simple rewrite", so please explain: what can we change, and in which configuration file, so that every single request coming to the Apache server (other than image or txt files) gets treated as 200 OK and gets passed to our CGI?

There are NO pages at all we want to treat as errors. Everything should be passed as 200 OK to our CGI, regardless of what page is requested. Thank you.

Best wishes,

- Jeff

g1smd

9:57 pm on Mar 26, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



No. Everything should NOT be treated as a 200 OK.

At the point of the rewrite, no status code is sent out. You're simply passing control to the script without returning anything to the browser.
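To illustrate the difference, here is a sketch contrasting an internal rewrite with an external redirect. Both URL patterns and the script path are hypothetical, and the rules are assumed to live in the server or virtual host config:

```apache
RewriteEngine On

# External redirect, for comparison: Apache itself answers with
# "301 Moved Permanently" plus a Location header; no script runs.
RewriteRule ^/old-widgets/(.*)$ /widgets/$1 [R=301,L]

# Internal rewrite: Apache silently hands the request to the script.
# Nothing is sent to the browser until the script produces its output.
RewriteRule ^/widgets/(.*)$ /cgi-bin/pages.cgi?item=$1 [PT,L]
```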

Only after the script has decided whether there is content to be returned for this request or not, should the 200 OK or 404 Not Found status be actually sent - by the script.

I don't use CGI, so I'm not 100% sure how you hook the two things together. There are certainly plenty of examples in this forum of how to do it with PHP. Perhaps jd can say whether it is similar to that.

rescueme

1:20 am on Mar 27, 2009 (gmt 0)

10+ Year Member



I know this isn't the "right" way to do this, but for our application it doesn't have to be the right way, it just needs to work, and be the easiest way.

Given this mindset, is there a simple way to...

Replace a string within an Apache CONFIG file somewhere, or within some sort of data fork of the Apache program, so that the string "404 Not Found" never comes up, even for error pages, and instead, only the string "200 OK" comes up?

g1smd

1:39 am on Mar 27, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



No. And if there was, your site would eat itself in the SERPs with infinite duplicate content - so that's yet another wrong way.

phranque

1:04 pm on Mar 28, 2009 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



you can use mod_rewrite to check for existing resources and serve them as usual and then rewrite the other requests to your cgi script.
your cgi script can then examine the request and serve the appropriate response.
if the requested url could serve as the canonical url for the content you are serving then you could respond with a 200 OK http header and then some content.
otherwise you could respond with a 301 Moved Permanently http header and the canonical url for the content you want to serve.
or you might respond with a 404 Not Found if you don't have appropriate content for the requested url.
you don't want to serve a 200 response and the same content to thousands of random, meaningless urls.
there are many other options:
HTTP/1.1: Status Code Definitions [w3.org]
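A sketch of that approach, assuming the rules sit in the main server or virtual host config. In that context %{REQUEST_FILENAME} has not yet been mapped to a file on disk, which is why the path is built from DOCUMENT_ROOT and REQUEST_URI. The script path is again hypothetical:

```apache
RewriteEngine On

# Files and directories that really exist are served as usual.
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-f
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-d
# Everything else goes to the (hypothetical) page generator, which
# then chooses 200, 301 or 404 for the requested url.
RewriteRule .* /cgi-bin/pages.cgi [PT,L]
```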

rescueme

4:16 pm on Mar 28, 2009 (gmt 0)

10+ Year Member



Thanks for the suggestion to use mod_rewrite, phranque.

I read a bit on that, and am not sure I understand quite how to do that. Is there a way for mod_rewrite to just pass along all requested URLs as-is to the CGI, but in doing so, change the status to 200 OK?

g1smd

5:36 pm on Mar 28, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It's the CGI that needs to send the header out if it is anything other than 200 OK for the requested URL.

I suggested using a rewrite to connect the URL request to the script. Such a rewrite would be coded using a RewriteRule (found under Mod_Rewrite in the manual).

rescueme

5:53 pm on Mar 28, 2009 (gmt 0)

10+ Year Member



I guess the idea is that if I use mod_rewrite to pass the URL along to the CGI, even perhaps with the URL unchanged, the fact that it is passing it along will cause Apache to treat it as a 200 OK when passing it along to the CGI...?

g1smd

6:09 pm on Mar 28, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes, the HTML and content that is output by the script will be prefixed with the '200 OK' header unless the script forces the '404 Not Found' status code into the header as its first action.

The '404' status code needs to be forced when the script has no valid content to return for the requested URL.


This is different from your current arrangement in one key way. Your current version has Apache settle on the '404' status code before the script even gets to run, and that is not at all good.