Forum Moderators: phranque

Hiding mod_rewrite destination URLs.

Question about using mod_rewrite with .htaccess on Apache.


WitchKing

1:54 pm on Nov 25, 2004 (gmt 0)

10+ Year Member



Hi there,

First time poster, long-time lurker. I have a question for the experts. I would like my page to use canonical URLs; I accomplish this using mod_rewrite in the following manner (simplified example):


RewriteRule subject[/](.*)[/]? /subject.php?var=$1 [NC,L]

So as it is,

www.example.com/subject/whatever

becomes
www.example.com/subject.php?var=whatever

My question is, how can I (thru Apache magic, or perhaps otherwise) block direct access to the non-canonical URL? In other words, can I make Apache serve, say, a 404 whenever users request

www.example.com/subject.php
directly?

I think the answer lies with a RewriteCond using %{REQUEST_URI} and/or %{IS_SUBREQ}, but I haven't succeeded so far.

Thanks for your time!
Cheers, WK.

jdMorgan

4:45 pm on Nov 25, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



WK,

Welcome to WebmasterWorld!
(as a poster)

There's no good direct way to do this. The problem is that in an .htaccess context, the URL-to-filename translation process is restarted as soon as a URL is rewritten. That is, processing re-starts "at the top" in httpd.conf whenever a rewrite is done. Therefore, the internal subrequest test doesn't work as expected.

As a result, any attempt to rewrite A to B, but to disallow or redirect direct access to B will fail, and result in an "infinite loop" of rewriting.

The solution is to start by renaming B -- say we call it 'C' for now. Then rewrite A to C, and let direct requests for the old B URL fail with a 404.

So a solution would be:

1) Rename subject.php to topic.php
2) Rewrite the static local URL-path /subject/<anything> to the dynamic local URL-path /topic.php?var=<anything>
3) Let direct requests for the old /subject.php path go 404

With a few tweaks to remove unnecessary stuff and improve performance, that gives:


RewriteRule ^subject/([^/]*)/?$ /topic.php?var=$1 [L]

You should add the [NC] flag back in if it really is possible for your script to output links with case variations in the URL, but otherwise it's not needed. The regex pattern ".*" should be avoided whenever possible, because it is the most ambiguous and slowest pattern to process. Your patterns should be start- and end-anchored whenever possible -- again to improve performance, but also to reduce the possibility of unexpected operation by reducing ambiguity. For more information on anchoring, see the Regular Expressions tutorial cited in our forum charter (link in upper left area of this page).
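RewriteRule patterns are ordinary regular expressions, so the effect of anchoring can be illustrated outside Apache. A quick sketch in Python (the URL paths here are invented for illustration; Python's re.search mirrors Apache's "match anywhere unless anchored" behavior):

```python
import re

# Anchored, specific pattern (as in the rule above): matches only a path
# that *is* "subject/<one-segment>", with an optional trailing slash.
anchored = re.compile(r"^subject/([^/]*)/?$")

# Unanchored, greedy pattern: matches any path merely *containing*
# "subject/", and ".*" happily swallows further slashes.
loose = re.compile(r"subject/(.*)")

print(anchored.search("subject/whatever").group(1))   # whatever
print(anchored.search("other/subject/whatever"))      # None -- anchoring rejects it
print(loose.search("other/subject/a/b/c").group(1))   # a/b/c -- too greedy
```

The anchored version both runs faster and refuses the unintended matches that the loose version silently accepts.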

Jim

WitchKing

8:45 pm on Nov 25, 2004 (gmt 0)




Hi there,

Thanks for the reply!

There's no good direct way to do this. The problem is that in an .htaccess context, the URL-to-filename translation process is restarted as soon as a URL is rewritten. That is, processing re-starts "at the top" in httpd.conf whenever a rewrite is done. Therefore, the internal subrequest test doesn't work as expected.

Yes, this is really depressing. Is there ANY difference at all between the original pass and the rewrite pass (other than the rewritten URL)?

Now, I tried the solution, but if I understood it correctly, it doesn't entirely solve my problem. Given what you said, topic.php would still be accessible to all. Right? If so, it's not exactly what I meant.

What I mean is: is there a way to block any given page from being accessed directly via its proper URL, but allow it to be served via an (alternate) rewritten URL?

So, using the same example (assuming the REAL page is still topic.php):

www.example.com/subject/whatever
=> works.
www.example.com/subject.php
=> real 404.
www.example.com/topic.php
=> faked 404.

That's why I asked if there's any difference between the original request and the rewritten request. Like you mentioned, %{IS_SUBREQ} doesn't work, for some strange reason; AFAIK, the rewritten request should be a sub-request.

I have also tried passing an environment variable via [E=var:value], but to no avail. It seems the variable is not made available to the next request (the rewritten one). Or am I missing something?

I have also considered adding a special string to the URL, and then checking against it, but that means a hacker could still insert that string and access the page. Unless the string is really complicated (meaning, a password almost), in which case it's equivalent to the following: I could name the real pages with some really complicated string (e.g. a GUID), so that it's virtually impossible to "guess" it. But I'd rather not use either of those methods.

Finally, I'm thinking perhaps those "backreference" variables (%1, $1, \1, and so forth) could be of help. However, I don't yet fully understand them. If anyone has any ideas, it'd be great.

So I think the question comes down to being able to differentiate between the original .htaccess pass and the rewritten one. There's just gotta be a way -- either thru environment variables, or backreferences, or... something. Can anyone think of anything?

Alright, I hope I remained coherent thru my explanation. Once again, thanks a lot for the help.
Cheers, WK.

WitchKing

3:52 am on Nov 27, 2004 (gmt 0)




Hi there,

I just wanted to mention I found an answer to this particular little dilemma. To reiterate the problem:

1) You have a real page called www.example.com/secret.php.
2) You want that page to load when the user requests www.example.com/secret/.
3) You DON'T want that page to load when the user requests www.example.com/secret.php.

What does this accomplish? It hides the very existence of the real page: the page can only be accessed thru the specific canonical URL you desire. It's a "virtual" page of sorts.

The solution:
One of Apache's variables made available thru mod_rewrite is %{THE_REQUEST}. This is the ONLY variable (AFAIK) that preserves the very original request line; it does NOT get updated whenever a URL gets rewritten. So we extract our request information from there. The result is the following:

RewriteCond %{THE_REQUEST} "^(GET|HEAD) (.*)[/](.*) HTTP[/][0-9][.][0-9]$"
RewriteCond %3 "^secret[.]php$"
RewriteRule ^secret[.]php$ /nonexistent.php [L]
RewriteRule ^secret[/]?$ /secret.php [L]

The first line matches the true original request line and captures its last path segment into %3.

The second and third lines rewrite a direct page request for /secret.php to /nonexistent.php, which would presumably result in a 404 (what we want).

The fourth line lets an "indirect" request coming thru /secret/ (your desired canonical URL) go thru to /secret.php.

(Of course, any other HTTP methods you use, such as POST, should be added to the pattern.)
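The key property can be sketched outside Apache: %{THE_REQUEST} holds the raw request line sent by the client, so it still names secret.php only when the client asked for it directly, regardless of internal rewriting. A rough Python model of the first two conditions above (the request lines and the helper name are hypothetical):

```python
import re

# Same pattern as the RewriteCond above, applied to a raw HTTP request line.
request_line = re.compile(r"^(GET|HEAD) (.*)[/](.*) HTTP[/][0-9][.][0-9]$")

def directly_requested(the_request, page="secret.php"):
    """True if the client's original request line names the page itself."""
    m = request_line.match(the_request)
    # %3 in the rule set corresponds to m.group(3) here: the last path segment.
    return bool(m) and m.group(3) == page

# Direct hit on the real file: caught (would be rewritten to /nonexistent.php).
print(directly_requested("GET /secret.php HTTP/1.1"))   # True

# Canonical URL: the request line never mentions secret.php, even though
# Apache later rewrites /secret/ to /secret.php internally.
print(directly_requested("GET /secret/ HTTP/1.1"))      # False
```

This is a model of the condition logic only, not of mod_rewrite's processing; it just shows why testing THE_REQUEST distinguishes the two cases when no per-pass variable survives the rewrite restart.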

Isn't that neat? Is that cool, or what? You can hide all your pages and allow them to be accessed only in the way YOU want. They are separated from their filenames!

Any comments from the experts? Does anyone see any fatal errors resulting from this method? ;)

Cheers! :)
--WK.

jdMorgan

4:13 am on Nov 27, 2004 (gmt 0)




No fatal problems that I see... Very good.

You can shorten it up quite a bit, though:


RewriteCond %{THE_REQUEST} ^(GET|HEAD)\ /secret\.php\ HTTP
RewriteRule ^secret\.php$ /nonexistent.php [L]
RewriteRule ^public/?$ /secret.php [L]

(I changed the local URL-path in the pattern to "public" to make it clearer what's going on here.)

This could be further modified to "hide" all php files, too:


RewriteCond %{THE_REQUEST} ^(GET|HEAD)\ /.+\.php\ HTTP
RewriteRule \.php$ - [F]
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^(.+)/?$ /$1.php [L]

(The "file exists" check is used to prevent directory requests from being rewritten. This also shows the method used to generate a 403-Forbidden response, instead of rewriting to "nonexistent.php".)
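To see why the generalized version doesn't loop: on the first pass the URL has no ".php", so the [F] rule is skipped and the URL is rewritten; processing then restarts, and although the rewritten URL now ends in ".php", THE_REQUEST still doesn't, so the [F] rule is skipped again and the file-exists rewrite no longer applies. A simplified Python model of that two-pass behavior (the docroot contents are invented; this approximates mod_rewrite's per-directory restart, it doesn't reproduce it):

```python
import re

EXISTING_FILES = {"/secret.php"}  # hypothetical files under the docroot

def handle(the_request):
    """Model one .htaccess rule-set run, restarted after each rewrite,
    as mod_rewrite does in a per-directory context."""
    url = the_request.split()[1]   # "GET /secret HTTP/1.1" -> "/secret"
    while True:
        # RewriteCond %{THE_REQUEST} ... \.php ...  /  RewriteRule \.php$ - [F]
        if re.search(r"^(GET|HEAD) /.+\.php HTTP", the_request) and url.endswith(".php"):
            return "403 Forbidden"
        # RewriteCond %{REQUEST_FILENAME}.php -f  /  RewriteRule ^(.+)/?$ /$1.php [L]
        if url + ".php" in EXISTING_FILES:
            url = url + ".php"
            continue               # [L]: restart processing with the new URL
        return "serve " + url

print(handle("GET /secret HTTP/1.1"))      # serve /secret.php (extensionless URL works)
print(handle("GET /secret.php HTTP/1.1"))  # 403 Forbidden (direct .php request blocked)
```

Because the [F] condition tests the unchanging request line rather than the current URL, the internally rewritten request sails past it on the second pass.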

I'll have to jot this one down in my notebook... Very good "thinking out of the box" here, WK.

Jim

WitchKing

4:34 am on Nov 27, 2004 (gmt 0)




That's awesome! The generalized rule is even better; I hadn't thought of that!

Anyway, just giving back a bit to WebmasterWorld. I've gotten a ton of good info from here, I hope somebody finds this useful too.

Cheers :),
--WK.