Manipulating a URL request

         

Nautilus

11:02 pm on Apr 22, 2004 (gmt 0)

10+ Year Member



Details would take too much space here, so in brief:
I want to send all requests for my webpages (.html, .htm, maybe .cgi?) first to a Perl script that allows or denies access by IP, host, and net block range, as defined in a text db I update manually. I want more than the simple blocking I have been able to get from .htaccess: logging to a custom security log, email notices when a visitor is denied by a filter definition ... the gears are churning. Maybe even having the script write denied accesses to .htaccess to add further blocking levels in certain situations. I have written a good filter script in Perl to perform those functions, but I can't get the interaction I need between the request and the delivery of a webpage. From reading other posts here, it seems mod_rewrite might be the trickery needed to pull this off! I'd appreciate any thoughts! My setup is virtual hosted with CGI, shell access, a main domain name with a static IP, and lots of domains URL-forwarded to my account directories. I control DNS through a third party.
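
As a rough sketch of the allow/deny check described above, the following Perl matches a visitor against a list of filter rules. The rule formats (a single IP, a CIDR-style net block, or a host suffix beginning with a dot) and the sub names are assumptions made for illustration; they are not taken from the actual filter script, and only IPv4 is handled.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert a dotted-quad IPv4 address to an integer (empty return if malformed).
sub ip_to_int {
    my ($ip) = @_;
    return unless defined $ip;
    my @octets = $ip =~ /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\z/
        or return;
    return if grep { $_ > 255 } @octets;
    return ($octets[0] << 24) | ($octets[1] << 16) | ($octets[2] << 8) | $octets[3];
}

# Return 1 if the visitor's IP (or optional host name) matches one rule.
# Hypothetical rule formats: "203.0.113.7", "198.51.100.0/24", ".example.net"
sub rule_matches {
    my ($rule, $ip, $host) = @_;
    if ($rule =~ m!^(\d+\.\d+\.\d+\.\d+)/(\d{1,2})\z!) {    # net block
        my ($net_str, $bits) = ($1, $2);
        my $net  = ip_to_int($net_str);
        my $addr = ip_to_int($ip);
        return 0 unless defined $net && defined $addr && $bits <= 32;
        my $mask = $bits == 0 ? 0 : (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF;
        return (($addr & $mask) == ($net & $mask)) ? 1 : 0;
    }
    if ($rule =~ /^\./) {                                   # host suffix
        return (defined $host && $host =~ /\Q$rule\E\z/i) ? 1 : 0;
    }
    return (defined $ip && $rule eq $ip) ? 1 : 0;           # single IP
}

# Return the first rule that matches, or nothing if the visitor is allowed.
sub first_matching_rule {
    my ($rules, $ip, $host) = @_;
    for my $rule (@$rules) {
        return $rule if rule_matches($rule, $ip, $host);
    }
    return;
}

# Example with documentation-range addresses (not real visitors).
my @rules = ('203.0.113.7', '198.51.100.0/24', '.example.net');
my $hit = first_matching_rule(\@rules, '198.51.100.42');
print defined $hit ? "denied by rule $hit\n" : "allowed\n";
```

Returning the matching rule rather than a plain yes/no makes it easy to write the triggering definition into a security log or an email notice.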

My knowledge of Apache modules is beginner level. I can script Perl and work with .htaccess at this time.
I get the feeling from other posts that, in addition to mod_rewrite, making a module out of this routine might be the better way to go?

I have been able to make this whole scenario work using not-so-great methods such as:

1) A Meta-Refresh from the HTML page to the filter CGI script, with page URLs hard-coded as queries. The script can then open the raw webpage, strip the Meta-Refresh, and redeliver the page to the visitor. This does not seem practical, given the load increase I would expect from delivering all pages this way.

2) Using Location and a Script SRC JavaScript routine.
I haven't tried this, but I am pretty sure I can get it to work. I really don't want to rely on anything JavaScript for actually denying access to visitors! It doesn't seem secure or reliable enough!

Any thoughts, pointers, or code would be greatly appreciated! Thanks!

anchordesk

12:14 am on Apr 23, 2004 (gmt 0)

10+ Year Member



One way is to set up your web site with few or no static pages and send all visitors to a single script. Then you can check or inspect whatever you like about the visitor and their page request before including the proper file for the user to see.

In your .htaccess file, make sure the following is included:

RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^[0-9a-zA-Z_/.-]+\.html /control.pl

Then, if the requested file does not actually exist, the request is routed to the script, where you can determine which file the user wants and feed it to them, dynamically or otherwise ... after you have sniffed them out. As written, it only routes requests for nonexistent .html files, but you could expand that to whatever you want. And the user will never know the file does not exist: whatever your server returns appears to be a static file.

[edited by: jdMorgan at 1:09 am (utc) on April 23, 2004]
[edit reason] Fixed missing space in code [/edit]

Nautilus

7:24 pm on Apr 23, 2004 (gmt 0)

10+ Year Member



Thanks for the reply JD! I have jotted down the code you provided as an alternative way to make this work. I am assuming your code would grab all incoming .html requests and redirect them to the REQUESTED_FILE. How is the requested file determined, though? Would a query still need to be sent to the script to determine which page should be returned? One concern is how this would affect search engine listings if everything references just a script.

The flow I would like to have is URL REQUEST FOR WEBPAGE to the CGI FILTER SCRIPT to the HTML PAGE. That way the CGI filter script is never the front-end but the "middleman". Right now, the filter script I wrote is essentially a pass-through: a visitor either passes through the script or gets trapped by a filter definition, denied access, and returned a custom error message. I can also choose to send an email notice to myself on trigger. Honestly, the best way to do this, and my original intent, would have been to run the filter from a pagelogger script I already call from an IMG SRC tag on each page. But of course, being inside an IMG tag, I cannot escape it so that the Location command can return an HTML error page. I just get a missing image icon on the HTML page. Tough stuff, I'll tell you!
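
As a rough illustration of that request-to-filter-to-page flow, here is a minimal Perl CGI sketch. It assumes a rewrite rule routes page requests to the script and that the original path is still available in REQUEST_URI, which is how Apache normally behaves after an internal rewrite. The sub names are hypothetical, and the always-allow callback is only a placeholder for the real filter.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Map a request URI to a file under $docroot. Returns nothing unless the
# path is a plain .html/.htm path with no parent-directory hops.
sub resolve_request {
    my ($docroot, $uri) = @_;
    return unless defined $docroot && defined $uri;
    (my $path = $uri) =~ s/[?#].*//s;                     # drop query string
    return unless $path =~ m{^/[0-9A-Za-z_/.-]+\.html?\z};
    return if $path =~ m{(?:^|/)\.\.(?:/|\z)};            # reject ".." segments
    return "$docroot$path";
}

# Build the whole CGI response: 403 when the filter trips, 404 when the
# page is missing or unreadable, otherwise the page itself.
sub build_response {
    my ($env, $is_denied) = @_;
    my $head = "Content-Type: text/html\r\n\r\n";
    if ($is_denied->($env->{REMOTE_ADDR})) {
        return "Status: 403 Forbidden\r\n$head<h1>Access denied</h1>\n";
    }
    my $file = resolve_request($env->{DOCUMENT_ROOT}, $env->{REQUEST_URI});
    my $fh;
    unless (defined $file && -f $file && open($fh, '<', $file)) {
        return "Status: 404 Not Found\r\n$head<h1>Not found</h1>\n";
    }
    local $/;                                             # slurp the whole file
    my $html = <$fh>;
    close $fh;
    return $head . $html;
}

# Only act when run by the web server; "sub { 0 }" allows everyone and
# stands in for the real IP/host/net block filter.
print build_response(\%ENV, sub { 0 }) if $ENV{GATEWAY_INTERFACE};
```

Because the script reads the page from disk itself, the visitor and search engines only ever see the original .html URL, and the script stays in the middle rather than becoming the front-end.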

In your experience with modules, mod_rewrite, etc., do you think it is possible to create something that will function in the flow I described above? A simple yes or no will suffice; I am game to do some Apache reading and figure out how! Thanks.

jdMorgan

8:17 pm on Apr 23, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Nautilus,

Welcome to WebmasterWorld [webmasterworld.com]!

I edited anchordesk's reply, I didn't write it!

mod_rewrite gets the requested filename in the same way that a server-side script does -- by examining the {REQUEST_FILENAME} server variable. This variable is populated for each HTTP request with the name of the file requested by the client. This information is sent by the client (browser, spider, etc.) in the HTTP request it sends to the server.

If you need to put the requested filename into a query string, that's easy enough. The tough part here is defining precisely what you want or need to do; after that, coding it is usually fairly easy.
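
One possible way to put the requested filename into a query string is sketched below for .htaccess. The script name /control.pl is borrowed from the earlier reply, and the parameter name "page" is an assumption chosen for illustration.

```apache
RewriteEngine On
# Send every .html/.htm request to the script, passing the requested
# path as a "page" query parameter (hypothetical name). QSA keeps any
# query string the visitor supplied; L stops further rule processing.
RewriteRule ^([0-9A-Za-z_/.-]+\.html?)$ /control.pl?page=$1 [QSA,L]
```

Since /control.pl itself does not end in .html, the rule does not match the rewritten request, so it should not loop.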

Jim

Nautilus

1:32 am on Apr 24, 2004 (gmt 0)

10+ Year Member



The {REQUEST_FILENAME} server variable, huh? Out of curiosity, are there any other methods (Perl, JS, PHP, etc.) of accessing this {REQUEST_FILENAME} server variable other than through .shtml or mod_rewrite? I have heard that using all .shtml pages puts a strain on the web server. Any comment on the difference in load between using .shtml and using mod_rewrite? Thanks!

jdMorgan

2:16 am on Apr 24, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



REQUEST_FILENAME (along with a lot of other variables) is available automatically, in one form or another, to Apache, SSI, Perl, PHP, and any other server-side scripting language I can think of. Support is built-in.

For example, in Perl:


$reqfile = $ENV{'REQUEST_FILENAME'};

Jim

Nautilus

6:38 am on Apr 24, 2004 (gmt 0)

10+ Year Member



I am beginning to think my web host is running an old version of Perl or has features turned off?

I cannot get my web server to return any data for the:

$reqfile = $ENV{'REQUEST_FILENAME'};

I tried clicking a link to the script and calling it by direct URL, but no luck. Is there a certain Perl version in which this ENV variable became available?
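
One thing worth checking here: REQUEST_FILENAME is a variable that mod_rewrite uses internally, and it is not one of the standard CGI environment variables, so a plain CGI script will usually find it empty whatever the Perl version. The sketch below lists what the server actually passes to the script and picks the first request-path variable that is set. The sub names are hypothetical, and which of these variables appear depends on the server configuration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the first request-path variable that is set, checking names a
# CGI script commonly receives under Apache. REDIRECT_URL is checked
# first because it holds the original path after an internal redirect.
sub requested_path {
    my ($env) = @_;
    for my $name (qw(REDIRECT_URL REQUEST_URI SCRIPT_NAME)) {
        return $env->{$name}
            if defined $env->{$name} && length $env->{$name};
    }
    return;
}

# Render every environment variable as "NAME=value" lines, sorted by name.
sub dump_env {
    my ($env) = @_;
    return join '', map { "$_=$env->{$_}\n" } sort keys %$env;
}

# Only print when run by the web server (GATEWAY_INTERFACE is set by CGI).
if ($ENV{GATEWAY_INTERFACE}) {
    my $path = requested_path(\%ENV);
    print "Content-Type: text/plain\r\n\r\n";
    print "Requested path: ", (defined $path ? $path : '(not set)'), "\n\n";
    print dump_env(\%ENV);
}
```

A dump like this exposes server details, so it is best removed once the variable names have been confirmed.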