

cgi-bin script-alias dir, and mod rewrite

It's all gassed up and I'm holding a match.

         

Brak

3:50 pm on Mar 10, 2007 (gmt 0)

10+ Year Member



OK, so... I've been banging my head against the wall for about 11 hours on one RewriteRule. I've been reading this forum on this specific topic for about 9. I'm having trouble with a specific case of rewriting. I'm running Apache 2.0.55, but I'll be upgrading to the 2.0.59 or 2.2.3 I read about on here ASAP. For now, this is what I've got.

I am trying to (for lack of a better word) "cloak" a file browser Perl script I wrote. It lives at /cgi-bin/fm.pl (a ScriptAlias directory). I'd love for requests to it to be redirected to /browser/ (a nonexistent location), with part of the query string rewritten into the /browser%1 URL, so that it looks like you are just browsing inside a folder and never see "/cgi-bin/fm.pl?go=/whateverurl/". What I need help with (aside from possibly a whole rewrite of my rules) is the redirection from /cgi-bin/fm.pl to /browser.

Here's the entire relevant section in my .htaccess file in my Document Root.


RewriteEngine On
RewriteBase /
RewriteCond %{QUERY_STRING} ^go\=(.*)
RewriteRule ^cgi-bin/fm\.pl$ /browser%1 [R]
RewriteRule ^browser(.*)$ /cgi-bin/fm.pl?go=$1 [L,NE]

Here's what I've learned:
----
#1. The last line works perfectly as expected (except it appends the query string to the end of the line, as would be expected with QSA on, but QSA isn't on at all...) :(

#2. Line 4 is the one giving me my headache, and I can't for the life of me figure out what the problem could be. It seems that the regex section of the rule never matches anything. I've even tried making it the only active rule and disabling the condition, but still no luck. Here's the weird part: if I take the directory slash out of the pattern (so the URL looks like "/cgi-binfm.pl", which doesn't really exist) and request that URL in the browser, it works exactly as expected. I read on here that these rules sometimes work only on a per-directory basis. I tested this theory by rewriting a URL in an actual directory, like /web/dynamic/links.shtml, and sending it to /wow/, and that worked perfectly even though it's several directory levels deep into the server. The only thing left I can think of is that it's not working because /cgi-bin/ is a ScriptAlias directory and isn't matched like other real directories. I tried putting a modified .htaccess file with updated URLs in my cgi-bin folder, but no luck there either.

The ultimate goal is to get the following to work.
[servername.com...] -> [servername.com...] [With URL/Browser Redirect]
[servername.com...] -> [servername.com...] [Do not redirect, and "cloak" the CGI's true URL]

This way, OLD urls to the script will be automatically forwarded to the new url, while NEW urls are secretly passed to the script.

Notes:
- My hours of testing haven't been in vain, because I have found lots of things that don't work, and lots of things that are close but no cigar and do work.
- I have also learned, from reading the first line of all the responses on this forum, that query strings aren't matched by the RewriteRule.
- I run my own server, and RewriteLog is enabled (presently).
- The Perl file *must* stay in the cgi-bin directory.
- The Perl script will always have a query string.
- The query string will only consist of 1 name and 1 value. Name->"go", Value->"/some/url/"

Thanks so very much in advance. I really appreciate the help.

jdMorgan

4:38 pm on Mar 10, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The Rewrite target URL is the "real" URL-path or filepath, while the pattern must match the URL-path that was requested by the client -- the link that the visitor clicked on, and what he sees in his browser address bar.

So, it appears that the main problem is conceptual -- Simply put, it appears that you are trying to rewrite in "the wrong direction."

This is the correct format for a RewriteRule used to do an internal URL-path rewrite:

RewriteRule ^pattern-matching-requested-URL-path$ /local-server-path-to-real-file-object [L]

So, in this case, the pattern should match "browser/<something>" and the substitution URL should point to your local script (alias) path. ScriptAlias will then detect that local script path, and deliver the request to your actual cgi-bin directory.

Jim

jdMorgan

4:52 pm on Mar 10, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sorry, I got this thread and another one crossed in my head, so this may be more on-target:

RewriteEngine on
RewriteBase /
#
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /cgi-bin/fm\.pl\?go=([^&]+)\ HTTP/
RewriteRule ^cgi-bin/fm\.pl$ http://www.example.com/browser%1? [R=301,L]
#
RewriteRule ^browser((/[^/]+)+)/?$ /cgi-bin/fm.pl?go=$1 [L]

You need to look at THE_REQUEST to get the original browser request. This is needed to prevent these two rules from interacting, resulting in an 'endless' rewrite-redirect-rewrite-redirect loop. Using THE_REQUEST prevents the first rule from redirecting requests that have already been rewritten by the second rule, since this rewriting will not (and can not) change the value of THE_REQUEST.

As written, this code also omits the slashes on "go=/dynamic/" since there is no need for them and they are ugly.

Jim

Brak

5:10 am on Mar 12, 2007 (gmt 0)

10+ Year Member



I really appreciate the response Jim.

I just tested out the code you posted, replacing example.com with my domain, and it still didn't catch the redirect to /browser/. It loaded the URL just as if the rule hadn't been there at all. Is my Apache possessed?

Rule 2 (/browser/whatever/ -> /cgi-bin/fm.pl) now also behaves funny. It seems to only want to work when some extra "[^/]+/" exists. For example, if I were loading the URL "/browser/games/Quake/", it would only send "?go=/games/" to the script because of the trailing + and / in ((/[^/]+)+)/, so to actually get the Quake/ sub-dir of games/ you need to add some extra stuff to satisfy the regex, like "/browser/games/Quake/x/".

I was thinking in my original rule writing that since the first conditional rule was lacking the L switch, it would simply continue rewriting the URL until it got to an L or reached the end of the file. Though I remember reading on here that only 1 rule is ever applied to a single request. If so, what's the point of the L switch?

If it's not too much trouble, I'd love to understand your solution a little more. For the sake of learning, it looks like the condition text has similar syntax to a rule, except with escaped white-spaces. If the explanation is already written out somewhere, you can just let me know so you don't have to waste time typing it out again.

jdMorgan

4:05 pm on Mar 12, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I cannot explain the results you are seeing in the context of *only* the code above. Perhaps you have some other directives that are interfering with this code.

The rule, as written, requires only one occurrence of </one-or-more-characters-not-a-slash> in the requested URL-path "/browser</one-or-more-characters-not-a-slash><optional-trailing-slash>", as specified by the "+" quantifier on the outside of "(/[^/]+)+" -- It will accept one or more sequences of "</one-or-more-characters-not-a-slash>", but it requires only a minimum of one. By enclosing that in an outer set of parentheses, we take any/all of those matched sequences and store them in the variable "$1".

So, there's something else going on with rule #2, and I cannot spot it, so it is likely outside the context of the code we're discussing.
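One way to sanity-check the pattern outside Apache is to run it through another regex engine. The sketch below uses Python's re module, which is not the engine Apache uses, so it only shows how the pattern itself behaves. The helper name go_value is made up for illustration, and the test paths have no leading slash because a DocumentRoot .htaccess matches against the path with that prefix stripped.

```python
import re

# Rule 2's pattern, copied from the thread. The test paths below have no
# leading slash, mirroring how a DocumentRoot .htaccess sees the URL-path.
RULE2 = re.compile(r'^browser((/[^/]+)+)/?$')

def go_value(url_path):
    """Return what $1 would hold for this path, or None if the pattern fails.

    go_value is a hypothetical helper, not part of Apache or mod_rewrite.
    """
    m = RULE2.match(url_path)
    return m.group(1) if m else None

print(go_value('browser/games/Quake/'))  # two path segments plus trailing slash
print(go_value('browser/games'))         # one segment, no trailing slash
print(go_value('browser/'))              # no segment at all
```

Under Python's engine the first call captures "/games/Quake", the second "/games", and the third does not match. That is consistent with the description above: the pattern alone should not truncate the path to "/games/".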

The most likely question about the RewriteCond (I assume you've read the mod_rewrite documentation many times) is the form of the variable THE_REQUEST. This is the full request line sent by the client (e.g. browser), that is, the first line of the HTTP request, and it looks something like this:

GET /somepage.php?var1=foo&var2=bar HTTP/1.1

So, the given pattern matches the HTTP method (GET, HEAD, POST, etc.) followed by a space, then the local URL-path, then the query string (if present), a space, and then the HTTP protocol, HTTP/1.1, HTTP/1.0, etc. We use this variable because it is unaffected by any internal rewrites that may have taken place -- It is always the original request line sent by the browser.
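As a rough illustration, the same RewriteCond pattern can be tried against sample request lines in Python. This is only a sketch: the sample request lines are invented, the helper name old_url_go is hypothetical, and Python's re module stands in for Apache's regex engine.

```python
import re

# The RewriteCond pattern from the thread, unchanged. The escaped spaces are
# needed in Apache config, where a bare space separates directive arguments;
# Python accepts them as literal spaces too.
THE_REQUEST_PATTERN = re.compile(
    r'^[A-Z]{3,9}\ /cgi-bin/fm\.pl\?go=([^&]+)\ HTTP/'
)

def old_url_go(request_line):
    """Return the go= value (%1 in the rule) if the client asked for the
    script URL directly, else None. Hypothetical helper for illustration."""
    m = THE_REQUEST_PATTERN.match(request_line)
    return m.group(1) if m else None

# A client that followed an old link to the script:
print(old_url_go('GET /cgi-bin/fm.pl?go=/games/Quake/ HTTP/1.1'))
# A client that asked for the new-style URL; an internal rewrite to the
# script later on does not change this string:
print(old_url_go('GET /browser/games/Quake/ HTTP/1.1'))
```

The first call yields "/games/Quake/" and the second yields None, which is why the redirect rule fires only for requests the client actually made to /cgi-bin/fm.pl.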

[L] stops processing for *this pass* through the mod_rewrite code within the current HTTP request. However, if a rewrite is invoked, then the server will re-process all the mod_rewrite code, and [L] does not stop that. If an external redirect is invoked, then that ends the current HTTP transaction, and the client will (usually) begin another one. So in either case, [L] only ends processing for the current pass through the code. For the sake of efficiency, however, I use [L] on every rule where it is not implicit unless I have a reason not to use it. Some functions, such as [G] and [F], imply [L] as well, so including it with them is redundant.

Because of the limited-scope function of [L], it is necessary to explicitly prevent rewrite/redirect loops in mod_rewrite code in .htaccess files.
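The loop can be seen in a toy model of the two rules. The Python sketch below is not how Apache is implemented. It assumes both rules are actually reached for /cgi-bin/ URLs, that an internal rewrite causes the rule set to be run again, and that the names one_pass and handle are hypothetical. It compares a QUERY_STRING condition, as in the original rules, with the THE_REQUEST condition.

```python
import re

def one_pass(path, query, request_line, use_the_request):
    """Run the two rules once. Returns (action, path, query)."""
    # Rule 1: externally redirect old script URLs to /browser...
    if re.match(r'^cgi-bin/fm\.pl$', path):
        if use_the_request:
            # Condition on what the client originally sent.
            m = re.match(r'^[A-Z]{3,9}\ /cgi-bin/fm\.pl\?go=([^&]+)\ HTTP/',
                         request_line)
        else:
            # Condition on the current (possibly rewritten) query string.
            m = re.match(r'^go=(.*)', query)
        if m:
            return ('redirect', 'browser' + m.group(1), '')
    # Rule 2: internally rewrite /browser/... to the script.
    m = re.match(r'^browser((/[^/]+)+)/?$', path)
    if m:
        return ('rewrite', 'cgi-bin/fm.pl', 'go=' + m.group(1))
    return ('none', path, query)

def handle(request_line, use_the_request, max_passes=10):
    """Follow one client request until a pass redirects or changes nothing."""
    target = request_line.split(' ')[1].lstrip('/')
    path, _, query = target.partition('?')
    for _ in range(max_passes):
        action, path, query = one_pass(path, query, request_line,
                                       use_the_request)
        if action != 'rewrite':  # [L] only ended that one pass
            return action, path, query
    return 'too many passes', path, query

new_url = 'GET /browser/games/Quake/ HTTP/1.1'
print(handle(new_url, use_the_request=False))
print(handle(new_url, use_the_request=True))
```

In this model the QUERY_STRING version ends in a redirect back to a /browser URL, which the client would request again, giving the rewrite-redirect cycle. The THE_REQUEST version settles on cgi-bin/fm.pl with go=/games/Quake and no redirect.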

Jim

Brak

8:12 pm on Mar 12, 2007 (gmt 0)

10+ Year Member



Those 5 lines are the only lines in my .htaccess file. Enable rewrite, Base, Condition, Rule 1, Rule 2.

I cannot explain why it doesn't work either. That's what's got me so distressed. There are no rewrite rules in my Apache conf either. The only thing I can assume is that it's because of some ScriptAlias directory oversight in the server code (for my version). This .htaccess file living at DocumentRoot is the only .htaccess file in the entire DocumentRoot directory tree, and there are none in any alias directory.

It appears that literally no rule that refers to anything inside /cgi-bin/ will work as expected so far. I notice nothing in the Apache docs CHANGES_2.0 file that addresses Rewrite, except the vulnerability corrected in 2.0.59.

Do you have any other ideas of what I could try?

Again, I really appreciate your wisdom and assistance, Jim. Thanks

jdMorgan

8:29 pm on Mar 12, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If your cgi-bin directory is Aliased -- located outside the HTTP-accessible server space, then .htaccess can have no effect on it -- This is an intentional security-related restriction, and is also implied by the very name of the .HTaccess file -- HTTP-access.

If you wish to pursue this further in that light (security), then you can make a cgi directory (called, say, cgi-local) and either copy your scripts into that directory or symlink it to cgi-bin. Then refer to your scripts as if they reside in cgi-local, and mod_rewrite will work on them. Alternatively, create a 'virtual' directory, refer to that when accessing scripts, and then rewrite that to cgi-bin. Because this directory doesn't actually exist, it won't be 'aliased-away' before mod_rewrite gets ahold of it. The downside to this approach is that it works only on HTTP accesses; server-side includes of those scripts will still need to use the 'real' path.

Jim