First time poster, long-time lurker. I have a question for the experts. I would like my page to use canonical URLs; I accomplish this using mod_rewrite in the following manner (simplified example):
RewriteRule subject[/](.*)[/]? /subject.php?var=$1 [NC,L]
So as it is, www.example.com/subject/whatever is served by www.example.com/subject.php?var=whatever.
My question is, how can I (thru Apache magic, or perhaps otherwise) get rid of direct access to the non-canonical URL version? In other words, can I make Apache serve, say, a 404 whenever users go for www.example.com/subject.php directly? I think the answer lies with a RewriteCond using %{REQUEST_URI} and/or %{IS_SUBREQ}, but I haven't succeeded so far. Thanks for your time!
Cheers, WK.
Welcome to WebmasterWorld!
(as a poster)
There's no good direct way to do this. The problem is that in an .htaccess context, the URL-to-filename translation process is restarted as soon as a URL is rewritten. That is, processing re-starts "at the top" in httpd.conf whenever a rewrite is done. Therefore, the internal subrequest test doesn't work as expected.
As a result, any attempt to rewrite A to B, but to disallow or redirect direct access to B will fail, and result in an "infinite loop" of rewriting.
The solution is to start by renaming B -- say we call it 'C' for now. Then rewrite A to C, and let direct requests for the old B URL fail with a 404.
So a solution would be:
1) Rename subject.php to topic.php
2) Rewrite the static local URL-path /subject/<anything> to the dynamic local URL-path /topic.php?var=<anything>
3) Let direct requests for the old /subject.php path go 404
With a few tweaks to remove unnecessary stuff and improve performance, that gives:
RewriteRule ^subject/([^/]*)/?$ /topic.php?var=$1 [L]
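For anyone wondering what the tweaks change, here is a rough Python sketch comparing the original pattern with the revised one. Python's re module stands in for Apache's regex engine, re.IGNORECASE stands in for the [NC] flag, and the helper name captured and the sample paths are made up for illustration.

```python
import re

# Original pattern from the first post: unanchored, greedy, case-insensitive ([NC]).
ORIGINAL = re.compile(r"subject[/](.*)[/]?", re.IGNORECASE)
# Revised pattern: anchored at both ends, exactly one path segment.
REVISED = re.compile(r"^subject/([^/]*)/?$")

def captured(pattern, url_path):
    """Return what $1 would hold for this URL-path, or None if the rule does not match."""
    # search() is used because an unanchored RewriteRule pattern may match anywhere.
    match = pattern.search(url_path)
    return match.group(1) if match else None

for path in ("subject/whatever", "subject/whatever/", "mysubject/x", "subject/a/b"):
    print(path, "->", captured(ORIGINAL, path), "|", captured(REVISED, path))
```

Under these assumptions, the original pattern also matches paths such as mysubject/x and keeps the trailing slash in $1, while the revised one accepts only subject/<one segment> with an optional trailing slash.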
Jim
Thanks for the reply!
Jim wrote: "There's no good direct way to do this. The problem is that in an .htaccess context, the URL-to-filename translation process is restarted as soon as a URL is rewritten. That is, processing re-starts 'at the top' in httpd.conf whenever a rewrite is done. Therefore, the internal subrequest test doesn't work as expected."
Yes, this is really depressing. Is there ANY difference at all between the original pass and the rewritten pass (other than the rewritten URL)?
Now, I tried the solution but, if I understood it correctly, it doesn't solve my problem entirely. Given what you said, topic.php would still be accessible to all. Right? If this is the case, then it's not exactly what I meant. What I mean is: is there a way to block a given page from being accessed directly via its proper URL, but allow it to be served when using an (alternate) rewritten URL?
So, using the same example (assuming the REAL page is still topic.php):
www.example.com/subject/whatever => works.
www.example.com/subject.php => real 404.
www.example.com/topic.php => faked 404.
That's why I asked if there's any difference between the original request and the rewritten request. Like you mentioned, %{IS_SUBREQ} doesn't work, for some strange reason. AFAIK, it should be a sub-request. I have also tried passing an environment variable via [E=var:value], but to no avail. It seems this variable is not made available to the next request (the rewritten one). Or am I missing something?
I have also considered adding a special string to the URL and then checking for it, but that means a hacker could still insert that string and access the page. Unless the string is really complicated (almost a password, that is), in which case it's equivalent to the following: I could name the real pages with some really complicated string (e.g. a GUID), so that it's virtually impossible to "guess" it. But I'd rather not use either of those methods.
Finally, I'm thinking perhaps those "backreference" variables (%1, $1, \1, and so forth) could be of help. However, I don't yet fully understand them. If anyone has any ideas, it'd be great.
So I think the question comes down to being able to differentiate between the original .htaccess pass and the rewritten one. There's just gotta be a way. Either thru env. variables, or backreferences, or... something. Can anyone think of anything?
Alright, I hope I remained coherent thru my explanation. Once again, thanks a lot for the help.
Cheers, WK.
I just wanted to mention I found an answer to this particular little dilemma. To reiterate the problem:
1) You have a real page called www.example.com/secret.php.
2) You want it to be served only thru the canonical URL www.example.com/secret/.
3) You want direct requests for www.example.com/secret.php to get a 404.
What does this accomplish? It hides the very existence of any page in the first place. The page can only be accessed thru the specific canonical URL you desire. It's a "virtual" page of sorts.
The solution:
One of Apache's variables made available thru mod_rewrite is %{THE_REQUEST}. This is the ONLY variable (AFAIK) that remains the same as the very original request. It does NOT get updated whenever a URL gets rewritten. So we extract our request information from there. The result is the following:
RewriteCond %{THE_REQUEST} "^(GET|HEAD) (.*)[/](.*) HTTP[/][0-9][.][0-9]$"
RewriteCond %3 "^secret[.]php$"
RewriteRule ^secret[.]php$ /nonexistent.php [L]
RewriteRule ^secret[/]?$ /secret.php [L]
The first line matches the original request line and captures the requested file name (everything after the last slash of the path) into %3. The second and third lines rewrite a direct request for /secret.php to /nonexistent.php, which would presumably result in a 404 (what we want). The fourth line lets an "indirect" request coming thru /secret/ (your desired canonical URL) go thru to /secret.php. (Of course, any other HTTP method you use, like POST, should be added to the first line.)
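To see why this works, the two passes can be traced by hand. Below is a rough Python sketch of the four lines, not Apache itself: re stands in for Apache's regex engine, a plain | is used for alternation, the function name resolve is hypothetical, and the loop models .htaccess processing starting over after each rewrite.

```python
import re

# Pattern from the first RewriteCond; it is matched against the ORIGINAL request line.
THE_REQUEST_RE = re.compile(r"^(GET|HEAD) (.*)[/](.*) HTTP[/][0-9][.][0-9]$")

def resolve(request_line, max_passes=5):
    """Trace a request line through the four rules and return the file finally served."""
    # In .htaccess the URL-path seen by RewriteRule has no leading slash.
    path = request_line.split(" ")[1].lstrip("/")
    match = THE_REQUEST_RE.match(request_line)
    requested_file = match.group(3) if match else ""  # plays the role of %3
    for _ in range(max_passes):
        if requested_file == "secret.php" and path == "secret.php":
            new_path = "nonexistent.php"   # lines 1-3: direct request for the real file
        elif re.match(r"^secret[/]?$", path):
            new_path = "secret.php"        # line 4: canonical URL
        else:
            return path                    # nothing matched, so this file is served
        path = new_path                    # a rewrite happened, so processing starts over
    raise RuntimeError("rewrite loop")

for line in ("GET /secret.php HTTP/1.1", "GET /secret/ HTTP/1.1", "POST /secret.php HTTP/1.1"):
    print(line, "->", resolve(line))
```

In this sketch a direct GET for /secret.php ends at nonexistent.php, while /secret/ ends at secret.php because the request line still reads /secret/ on the second pass. A POST to /secret.php is not caught, which is the caveat about other methods mentioned above.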
Isn't that neat? Is that cool, or what? You can hide all your pages and allow them to be accessed only in the way YOU want. They are separated from their filenames!
Any comments from the experts? Does anyone see any fatal errors resulting from this method? ;)
Cheers! :)
--WK.
You can shorten it up quite a bit, though:
RewriteCond %{THE_REQUEST} ^(GET|HEAD)\ /secret\.php\ HTTP
RewriteRule ^secret\.php$ /nonexistent.php [L]
RewriteRule ^public/?$ /secret.php [L]
This could be further modified to "hide" all php files, too:
RewriteCond %{THE_REQUEST} ^(GET|HEAD)\ /.+\.php\ HTTP
RewriteRule \.php$ - [F]
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^(.+)/?$ /$1.php [L]
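A quick way to check which request lines the first condition of this "hide all php files" version catches is to run the pattern outside Apache. This is only a rough Python sketch: re stands in for Apache's regex engine, a plain | is used for alternation, and the helper name is_blocked and the sample requests are made up.

```python
import re

# The RewriteCond pattern; the "\ " escapes in the Apache directive are plain spaces here.
DIRECT_PHP_RE = re.compile(r"^(GET|HEAD) /.+\.php HTTP")

def is_blocked(request_line):
    """True if the original request line would trigger the [F] (403 Forbidden) rule."""
    return DIRECT_PHP_RE.search(request_line) is not None

samples = (
    "GET /secret.php HTTP/1.1",
    "GET /dir/page.php HTTP/1.1",
    "GET /secret HTTP/1.1",
    "GET /secret.php?var=1 HTTP/1.1",
    "POST /secret.php HTTP/1.1",
)
for line in samples:
    print(line, "->", "blocked" if is_blocked(line) else "allowed")
```

In this sketch a direct request that carries a query string, such as /secret.php?var=1, does not match, because the pattern expects a space right after .php, and POST is not covered either. Whether that matters depends on the site; the pattern could be loosened if it does.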
I'll have to jot this one down in my notebook... Very good "thinking out of the box" here, WK.
Jim