Path/URL sent by Apache when error pages are handled by CGI.

Need help! Incorrect Path & URL are being passed from Apache to a CGI.

         

rescueme

1:41 am on Jan 22, 2009 (gmt 0)

10+ Year Member



Hello,

I just transferred a large website to an Apache server, and am relatively new to using Apache. We have all of our error pages (and regular pages, for that matter) handled by a custom CGI program called NetCloak. This CGI worked great on our old web server, but it seems to be running into trouble with error pages sent to it by Apache. I think something must be configured wrong within that part of Apache which sends out the header and URL for error pages sent to a CGI.

When a page requested doesn't exist on our server, thus generating an error, it seems Apache is sending conflicting URL and PATH arguments to the CGI which we have set to handle error pages. For example, this web page doesn't exist at all on our server, so it generates an error:

http://www.example.com/testfolder1/testpage.html

If you visit the page above, I have the CGI set to display on that page the URL which is being passed to it by Apache, as well as the complete HEADER which is being passed to it by Apache.

You will notice that the URL is shown as just "/testfolder1"; however, I would expect the URL to be "/testfolder1/testpage.html" instead.

Further, if you look within the header being passed to the CGI, you will notice that the PATH_INFO also appears incorrectly, showing only "/testpage.html" when I think it should probably be showing "testfolder1/testpage.html" as well.

Because these items don't match up, and each only contains just part of the URL/PATH, our CGI is handling these error pages erratically, even crashing at times.

Is there a certain file or set of files within Apache that controls how data is passed to a CGI? If so, where are these files, and how might I fix the configuration so that our CGI receives the proper information and can handle error pages properly?

THANK YOU!

- Jeff

[edited by: jdMorgan at 2:24 am (utc) on Jan. 22, 2009]
[edit reason] Please use example.com. See TOS. [/edit]

Caterham

2:25 am on Jan 22, 2009 (gmt 0)

10+ Year Member



If the path segment /testfolder1/ does not exist, r->filename would be /physical/path/to/testfolder1 and path_info is the unconsumed part, which is /testpage.html

notice that the PATH_INFO also appears incorrectly, showing only "/testpage.html" when I think it should probably be showing "testfolder1/testpage.html" as well.
No. The first non-existent segment of a physical path is treated as a file; there is no path_info for a folder, which is what you'd be suggesting with a path_info of testfolder1/testpage.html, where r->filename would be /physical/path/to/

You are evaluating the wrong variable. As you can see, REQUEST_URI: /testfolder1/testpage.html contains the desired path. According to your description, you're not looking for SCRIPT_NAME: /testfolder1, which is the web-space view of the physical path, but it looks like that is the one you're using.
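The split described above can be sketched roughly as follows. This is a simplified model, not Apache's actual directory walk: the function name, the docroot argument, and the `exists` callback are all hypothetical, and the case where an existing regular file is followed by extra path segments is left out.

```python
from typing import Callable, Tuple


def split_filename_and_path_info(
    docroot: str, uri: str, exists: Callable[[str], bool]
) -> Tuple[str, str]:
    """Model of the split: the first segment that does not exist on disk
    is treated as the file, and whatever follows it becomes path_info."""
    segments = [s for s in uri.split("/") if s]
    filename = docroot.rstrip("/")
    for i, segment in enumerate(segments):
        filename = filename + "/" + segment
        if not exists(filename):
            # Everything after the first missing segment is "unconsumed".
            rest = segments[i + 1:]
            path_info = "/" + "/".join(rest) if rest else ""
            return filename, path_info
    # Every segment exists: nothing is left over for path_info.
    return filename, ""


# Hypothetical filesystem in which only the document root exists.
filename, path_info = split_filename_and_path_info(
    "/physical/path/to",
    "/testfolder1/testpage.html",
    exists=lambda p: p == "/physical/path/to",
)
print(filename)   # the missing folder is treated as the file
print(path_info)  # the remainder of the URL
```

With a missing /testfolder1, the model yields the same pair as in the request above: a filename ending in /testfolder1 and a path_info of /testpage.html.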

[edited by: Caterham at 2:27 am (utc) on Jan. 22, 2009]

jdMorgan

2:32 am on Jan 22, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi rescueme, and welcome to WebmasterWorld!

Please provide all necessary information inside your post, rather than referring members to a page which may change or go missing in a few months or years. That will keep this thread useful to others who will read it in the future.

I would guess that you're struggling with differences in the definitions of server variables between your old and new servers, since your description of the "wrong information" in PATH_INFO actually sounds like the "right information" to me (as an Apache user).

So the solution apparently calls for a review of the CGI script to modify how it uses server variables, and which ones it uses. Short of re-coding a custom version of Apache and compiling it to re-define the server variables to appear as your script expects them, I can't think of a better solution.
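As a rough illustration of what "reviewing which variables the script uses" could mean, here is a minimal sketch of how a CGI might recover the full requested path from its environment. The function name `requested_path` is hypothetical, and this is not how NetCloak works internally; it only assumes the variable names Apache exposes to CGI programs (REQUEST_URI, SCRIPT_NAME, PATH_INFO).

```python
from typing import Mapping


def requested_path(env: Mapping[str, str]) -> str:
    """Return the path the client asked for.

    Prefers REQUEST_URI (minus any query string); falls back to
    SCRIPT_NAME + PATH_INFO when REQUEST_URI is not set.
    """
    request_uri = env.get("REQUEST_URI", "")
    if request_uri:
        # REQUEST_URI carries the query string too, so cut it off.
        return request_uri.split("?", 1)[0]
    return env.get("SCRIPT_NAME", "") + env.get("PATH_INFO", "")


# Values as reported for the missing page in this thread.
env = {
    "REQUEST_URI": "/testfolder1/testpage.html",
    "SCRIPT_NAME": "/testfolder1",
    "PATH_INFO": "/testpage.html",
}
print(requested_path(env))
```

In a real CGI the mapping would be `os.environ`. Note that for this request SCRIPT_NAME and PATH_INFO concatenate to the same path that REQUEST_URI holds, so neither value is wrong; each is just one part of the URL.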

Jim

rescueme

3:17 am on Jan 22, 2009 (gmt 0)

10+ Year Member



Thanks for the replies. For the archive, I've pasted to the end of this post what comes up on our server for the non-existent folder and non-existent page requested, mentioned earlier, substituting "example.com" for our actual domain name. What is shown at the end of this post is what comes up when I ask our CGI to display the URL passed to it by Apache as well as the HEADER passed to it by Apache.

The CGI we are using is discontinued, but we are stuck with it for a while since we have tens of thousands of pages programmed with it already. The CGI works fine in every other respect, except for this issue of properly handling these error pages.

Since I can't reprogram the CGI, and it is automatically taking in these requests, I need to find some way to slightly alter what is being sent to the CGI when a page which does not exist is requested.

Due to the way this CGI handles these requests, certain requests can even cause the CGI to crash, which is why I need to find some way to intercept and slightly change what is being sent to the CGI from Apache.

I imagine there is probably some statement or set of statements I can put into the config files to alter exactly what Apache sends to the CGI for the URL and header info when a missing page is requested; I just don't know enough about this yet to know what to do.

Any suggestions where to look, and what parameters to experiment with, would be very appreciated.

----------

Here is what is coming to our CGI from Apache, when the non-existent page is requested from this non-existent folder:

http://www.example.com/testfolder1/testpage.html

According to Apache, THISURL is: /testfolder1

According to Apache, the HEADER is:
UNIQUE_ID: SXff3X8AAAEAAAViu3IAAAAB
HTTP_USER_AGENT: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_6; en-us) AppleWebKit/525.27.1 (KHTML, like Gecko) Version/3.2.1 Safari/525.27.1
HTTP_ACCEPT: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
HTTP_ACCEPT_LANGUAGE: en-us
HTTP_ACCEPT_ENCODING: gzip, deflate
HTTP_CONNECTION: keep-alive
HTTP_HOST: www.example.com
PATH: /bin:/sbin:/usr/bin:/usr/sbin:/usr/libexec:/System/Library/CoreServices
SERVER_SIGNATURE:
SERVER_SOFTWARE: Apache/2.2.9 (iTools 9.0.2)/Mac OS X) mod_ssl/2.2.9 OpenSSL/0.9.7l DAV/2 mod_fastcgi/2.4.2
SERVER_NAME: www.example.com
SERVER_ADDR: 69.**.175.242
SERVER_PORT: 80
REMOTE_ADDR: 69.55.175.243
DOCUMENT_ROOT: /Library/Tenon/WebServer/WebSites/example.com
SERVER_ADMIN: default@server.admin
SCRIPT_FILENAME: /Library/Tenon/WebServer/WebSites/example.com/testfolder1
REMOTE_PORT: 48862
GATEWAY_INTERFACE: CGI/1.1
SERVER_PROTOCOL: HTTP/1.1
REQUEST_METHOD: GET
QUERY_STRING:
REQUEST_URI: /testfolder1/testpage.html
SCRIPT_NAME: /testfolder1
PATH_INFO: /testpage.html
PATH_TRANSLATED: /Library/Tenon/WebServer/WebSites/example.com/testpage.html

[edited by: jdMorgan at 3:54 am (utc) on Jan. 22, 2009]
[edit reason] Obscured IP address in headers [/edit]

jdMorgan

3:59 am on Jan 22, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Apache doesn't "send" anything. It simply sets the values of the variables based on the current request and server context. Apache modules and 'user' scripts can then access these globally-available variables.

If these variable definitions don't suit your needs, you'll need to either modify your script, modify and re-compile Apache, or move back to whatever server you were on when your script worked. That's not the answer you want, but I'm afraid it describes the situation accurately.

Jim

Caterham

11:35 am on Jan 22, 2009 (gmt 0)

10+ Year Member



Modifying the code looks quite simple here:

Index: server/util_script.c
===================================================================
--- server/util_script.c (revision 723997)
+++ server/util_script.c (working copy)
@@ -355,7 +355,7 @@
             apr_table_setn(e, "PATH_INFO", r->path_info);
         }
     }
-    else if (!r->path_info || !*r->path_info) {
+    else if (r->status != HTTP_OK || !r->path_info || !*r->path_info) {
         apr_table_setn(e, "SCRIPT_NAME", r->uri);
     }
     else {

Since the forum software may modify the pipe character and eat leading spaces, applying the patch with the Unix tool

patch -u <patch.patch

might not work if you copy it from this page. That's why I posted the patch at [rafb.net ] too. Testing is up to you, of course. The behavior hasn't changed since Apache 1.2b5, Jan 1997.
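To make the effect of the changed condition easier to see, here is a small Python model of that branch. It is only a sketch: the function name and the `patched` flag are hypothetical, and the body of the final else branch is not shown in the diff, so the model assumes it strips path_info from the end of the URI, which is consistent with the values reported in this thread.

```python
from typing import Dict

HTTP_OK = 200


def cgi_script_vars(
    uri: str, path_info: str, status: int, patched: bool = False
) -> Dict[str, str]:
    """Model of how SCRIPT_NAME/PATH_INFO get set for a normal request."""
    no_path_info = not path_info
    if (patched and status != HTTP_OK) or no_path_info:
        # Patched behavior on a non-200 status: expose the whole URI
        # as SCRIPT_NAME and set no PATH_INFO.
        return {"SCRIPT_NAME": uri}
    # Assumed else branch: SCRIPT_NAME is the URI minus the trailing path_info.
    if uri.endswith(path_info):
        script_name = uri[: len(uri) - len(path_info)]
    else:
        script_name = uri
    return {"SCRIPT_NAME": script_name, "PATH_INFO": path_info}


uri, path_info = "/testfolder1/testpage.html", "/testpage.html"
print(cgi_script_vars(uri, path_info, status=404))                # stock behavior
print(cgi_script_vars(uri, path_info, status=404, patched=True))  # with the patch
print(cgi_script_vars(uri, path_info, status=200, patched=True))  # patch has no effect
```

The last call is worth noticing: the extra test only matters if r->status is something other than 200 at the point where the CGI variables are built.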

Caterham

1:52 pm on Jan 22, 2009 (gmt 0)

10+ Year Member



BTW: I doubt that we're talking about a 404 from the request processing prior to the content handler, because otherwise your handler couldn't be invoked with that ENV output. The status code is likely HTTP_OK, so you may want to change != into == in the patch.

Or are you using a hook prior to the content handler, which sets r->status but returns OK instead of HTTP_NOT_FOUND (i.e. your handler is invoked and not ap_die)?