
Forum Moderators: Ocean10000 & phranque


mod_rewrite for printable page not working

rule matches, but then gets 404

     
9:28 pm on Jun 26, 2006 (gmt 0)

Junior Member

10+ Year Member

joined:Sept 19, 2005
posts:44
votes: 0


I have a CGI script at /cgi-bin/print.pl that I use to create a printable version of any page on my site. It works fine with a URL like: .../cgi-bin/print.pl?p=/somedir/thispage.html

Now, I want to make the URL much simpler: .../p/somedir/thispage.html. In other words, I want my users to know that if they simply insert a "/p" before any URI, it will give them the printable version. I do not want it redirected; the address bar must remain "/p/...".

I've already got a bunch of rewrites working just fine, so I know the module's working. I think I'm just missing something on this one. I've tried this:


RewriteCond %{REQUEST_URI} ^/p/.*$
RewriteRule ^.*$ http://%{SERVER_NAME}/cgi-bin/print.pl?p=%{REQUEST_URI} [L]

I've read through Apache's mod_rewrite stuff that I've had bookmarked for years. I also read through jdMorgan's post from Apr 20. It was quite informative, and I tried this based on some syntax I found there:

RewriteRule ^p(/.+)$ /cgi-bin/print.pl?p=$1 [L]

Nothing I'm trying works. When trying my first example above, I get the following in my rewrite_log (level 9):

(NOTE: I've removed the first part of each line and substituted "MYHOSTNAME" where appropriate so as not to identify my web site or IP address. I've also changed a couple directory names accordingly to protect the innocent.)


(4) RewriteCond: input='/p/fye/schedule.html' pattern='^/p/.*$' => matched
(2) rewrite /p/fye/schedule.html -> http://MYHOSTNAME/cgi-bin/print.pl?p=/p/fye/schedule.html
(3) split uri=http://MYHOSTNAME/cgi-bin/print.pl?p=/p/fye/schedule.html -> uri=http://MYHOSTNAME/cgi-bin/print.pl, args=p=/p/fye/schedule.html
(3) reduce http://MYHOSTNAME/cgi-bin/print.pl -> /cgi-bin/print.pl
(2) local path result: /cgi-bin/print.pl
(2) prefixed with document_root to /usr9/website/htdocs/cgi-bin/print.pl
(1) go-ahead with /usr9/website/htdocs/cgi-bin/print.pl [OK]
(2) init rewrite engine with requested uri /cgi-bin/error.pl

The good thing is that it's definitely matching, and it's rewriting the URL. For testing, I copied and pasted the rewritten URL just to make sure it's correct with no typos. Works fine.

But something else is happening and I'm not sure what or why. Somehow it got rewritten to my "error.pl" script, which is my ErrorDocument for 401, 403 and 404.

My error_log says this:


File does not exist: /usr9/website/htdocs/cgi-bin/print.pl

So it seems that it's now looking for an actual file; and the path is actually the real path to my script. What am I missing here?

[edited by: jdMorgan at 9:38 pm (utc) on June 26, 2006]
[edit reason] Repaired formatting [/edit]

9:46 pm on June 26, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


RewriteRule ^p(/.+)$ /cgi-bin/print.pl?p=$1 [L]

That should work. Your original code invokes an external redirect, which would 'expose' the direct path to your cgi script -- not generally desirable.

File does not exist: /usr9/website/htdocs/cgi-bin/print.pl

So it seems that it's now looking for an actual file; and the path is actually the real path to my script. What am I missing here?


Yes, it's looking to invoke the print.pl file, if that's what you mean. However, if that is the real path to your script, then why does the server respond with a 404? Take a look at your error log to see if there's any more info in there. Because of Alias or ScriptAlias directives, that may not be the 'actual' path to the cgi-bin files; it may only be an HTTP alias path. You'll need to investigate, find the actual filepath, and use it, perhaps with "RewriteBase", because an internal rewrite comes after mod_alias processing and so may not be subject to the (necessary) path changes that it makes.

This problem is probably not a 'how' problem -- your code looks right. This is more likely a 'where' problem, in that files are not where the mod_rewrite code thinks they are.

Jim
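One detail worth checking with a rule like this is where it is placed, since mod_rewrite matches the pattern against a different string in each context. The sketch below shows the two forms side by side. The placement comments are assumptions about a typical setup, not something stated by the original poster, and only one of the two rules would be used.

```apache
# In httpd.conf or a <VirtualHost> block, the pattern is matched against the
# full URL-path, which starts with a slash:
RewriteRule ^/p(/.+)$ /cgi-bin/print.pl?p=$1 [L]

# In a .htaccess file at the document root, the leading slash is stripped
# before matching, so the pattern starts at "p":
RewriteRule ^p(/.+)$ /cgi-bin/print.pl?p=$1 [L]
```

In both forms $1 holds everything after the "p", so a request for /p/fye/schedule.html would pass p=/fye/schedule.html to the script.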

10:21 pm on June 26, 2006 (gmt 0)

Junior Member

10+ Year Member

joined:Sept 19, 2005
posts:44
votes: 0


Thanks Jim. I'm getting close. You're right about the undesirable external redirect. I'm back to using sample syntax from your 4/20 document:

RewriteRule ^/p(/.+)$ /cgi-bin/print.pl?p=$1 [L]

The URL I'm requesting is http://www.example.com/p/fye/schedule.html

I'm expecting it to run http://www.example.com/cgi-bin/print.pl?p=/fye/schedule.html

I'm still getting a 404 error, but I see now what it's doing. Here's my rewrite_log (with paths changed to protect the innocent):


(2) init rewrite engine with requested uri /p/fye/schedule.html
(3) applying pattern '^/p(/.+)$' to uri '/p/fye/schedule.html'
(2) rewrite /p/fye/schedule.html -> /cgi-bin/print.pl?p=/fye/schedule.html
(3) split uri=/cgi-bin/print.pl?p=/fye/schedule.html -> uri=/cgi-bin/print.pl, args=p=/fye/schedule.html
(2) local path result: /cgi-bin/print.pl
(2) prefixed with document_root to /usr9/website/htdocs/cgi-bin/print.pl
(1) go-ahead with /usr9/website/htdocs/cgi-bin/print.pl [OK]

It's prepending my DOCUMENT_ROOT, but I don't want that. The URI that it created (see line 3 of the above log) is absolutely correct. But my print.pl script is not on my filesystem under DOCUMENT_ROOT. It's located elsewhere per my ScriptAlias directive.

I'm getting close eh?
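For a situation like the one in this log, the mod_rewrite documentation describes a PT (passthrough) flag that hands the rewritten URI back to URL mapping, so that Alias and ScriptAlias can still be applied to it. What follows is a minimal sketch, assuming the rule lives in the main server or virtual host configuration (which the "prefixed with document_root" log line suggests) and using a made-up ScriptAlias target. It has not been tested against this poster's server.

```apache
# Hypothetical ScriptAlias; the real target directory is not given in the thread.
ScriptAlias /cgi-bin/ "/path/to/real/cgi-bin/"

RewriteEngine On

# PT passes the rewritten URI on to mod_alias instead of treating it as a
# path under DocumentRoot, so the ScriptAlias above can map it to the script.
RewriteRule ^/p(/.+)$ /cgi-bin/print.pl?p=$1 [PT,L]
```

If this applies, the rewrite stays internal, so the address bar would keep showing /p/fye/schedule.html while print.pl receives p=/fye/schedule.html.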

10:45 pm on June 26, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


Well, as stated, you may be able to use RewriteBase to get by this. Or perhaps a SymLink. Or remove the ScriptAlias. I'm sorry I can't give you 'the one true answer' because "All servers are different"(TM) and you can only muddle about until you find the magic combination that works on your server.

With mod_rewrite doing an internal rewrite, you're basically stuck with DocumentRoot, so using a SymLink may be the best solution.

Jim

5:46 am on June 27, 2006 (gmt 0)

Junior Member

10+ Year Member

joined:Sept 19, 2005
posts:44
votes: 0


Thanks Jim. I was hoping that perhaps you'd know of a way to rewrite it to a URI request and not specifically to a full file path.

It seems that I'm just trying to make mod_rewrite do something it can't; and it's not necessarily a limitation so much as it is the way my server is set up and my filesystem is laid out.

I'll mess around with it some more. Maybe I'll just actually create a '/p/' directory at the top level of my DOCUMENT_ROOT and be done with it. :) Hmmm.

6:18 am on June 27, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


Yes, that's the way it works. With mod_rewrite you can internally rewrite to a filepath, externally redirect to a URL, force a proxy through-put, and do a few other things. But the rewrite-filepath and redirect-URL relationships are fixed by definition: a redirect ends the current HTTP request and tells the client to start a new one with the new URL given in the redirect response, while a rewrite just 'maps' a requested URL to a different filepath within the context of the current HTTP request.

And that's why your alias worked with the redirect syntax -- the redirect response causes the client to start a new HTTP request using the new URL, and that new request enters your server 'at the top' and invokes the ScriptAlias, whereas this won't happen with an internal rewrite. Unfortunately, a redirect exposes your script by changing the URL in the browser address bar, which is undesirable.

Please post if you find a good solution!

Jim
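The three outcomes Jim lists map onto RewriteRule flags roughly as follows. This is a sketch of alternatives rather than a configuration to paste as-is: only one rule would be active at a time, the host name is the placeholder used earlier in the thread, the 302 status is an arbitrary choice for illustration, and the P flag assumes mod_proxy is loaded.

```apache
# 1. Internal rewrite: same HTTP request, the address bar keeps /p/...
RewriteRule ^/p(/.+)$ /cgi-bin/print.pl?p=$1 [L]

# 2. External redirect: the client is told to request the new URL,
#    so the script path becomes visible in the address bar.
RewriteRule ^/p(/.+)$ http://www.example.com/cgi-bin/print.pl?p=$1 [R=302,L]

# 3. Proxy through-put: the server fetches the target URL itself via
#    mod_proxy and returns the response under the original /p/... URL.
RewriteRule ^/p(/.+)$ http://www.example.com/cgi-bin/print.pl?p=$1 [P]
```

Options 2 and 3 both start a fresh request for /cgi-bin/print.pl, which is why the ScriptAlias applies to them.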