
Combining .htaccess functionality (chaining?)

magically convert domain.com to www.domain.com using .htaccess

         

kbthomas

1:47 am on Jan 6, 2009 (gmt 0)

10+ Year Member



I need to combine the following functionality in my .htaccess file:

RewriteCond %{HTTP_HOST}!^www\.domain\.com [NC]
RewriteRule ^(.*) [domain.com...] [R]

...and...

# Rewrite /index.php?uri=method/action to /method/action
RewriteRule (.*)$ /index.php?uri=$1 [L]

All the combinations I have tried have led to 500 errors and infinite redirect loops. To those mod-redirect gurus out there, thanks in advance for the lesson.

jdMorgan

6:53 am on Jan 6, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Your second rule doesn't exactly do what the comment says it does, and will loop recursively with or without the first rule's functionality.

And besides which, you certainly should not "combine" them anyway, since a redirect is required for the first function, while an internal rewrite is required for the second. Doing otherwise would 'expose' your internal script path.

It isn't clear what the meaning or scope of "method/action" is, so it is difficult to suggest much. If you mean to accept *any* "subdirectory path" as a "method" and *any* "filename" as an "action" as in "example.com/<any-method-string-value>/<any-action-string-value>", then the second rule would be:

 RewriteRule ^([^/]+/[^/]+)$ /index.php?uri=$1 [L] 

but the first time a browser comes looking for the customary /w3c/p3p.xml file, you'd better be ready to serve it a proper privacy policy from your script as "method=w3c&action=p3p.xml" ...

If you can be more specific about the acceptable values for these two parameters, that would be good. For example, if they always follow some letter/number format, or if they never contain certain characters, then you can be more specific about URLs to be rewritten vs. those that should not. For example, as I coded the pattern, there must be a slash between the method and the action, but neither the method string nor the action string can contain a slash. You should "tighten" that pattern more, say by disallowing anything but letters, numbers or hyphens in these strings (or some other restriction of the character-set so you don't have collisions with 'real' files).
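
As a rough sketch of that tightening, assuming both the method and the action consist only of letters, digits and hyphens (an assumption; the actual format has not been stated in this thread), the rule might look like this:

```apache
# Assumption: method and action contain only letters, digits or hyphens.
# Matches "method/action"; does not match "index.php" (no slash)
# or "w3c/p3p.xml" (the period is not in the allowed character set).
RewriteRule ^([A-Za-z0-9-]+/[A-Za-z0-9-]+)$ /index.php?uri=$1 [L]
```

Because a period is not in the allowed set, requests for real files with an extension fall through untouched. A real two-level directory path made up only of these characters would still match, so the character set may need further adjustment for a given site.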

The fall-back (if that approach won't work) is to check for "file exists" and only do the rewrite if no file exists when the requested URL is resolved to a filepath, but that's not a very efficient solution.

BTW, the likely reason that your second rule failed is that it included nothing to stop /index.php from being rewritten to itself in an "infinite" loop, which is probably what happened (check your server error log). The new rule I propose won't loop, simply because it requires the slash between the method and the action, and "index.php" does not contain a slash. So it won't match, and you won't get a loop.

The RewriteCond in your first rule needs a space between "}" and "!" and should *not* include the [NC] flag. The rule itself should use [R=301,L] if you want to get any benefit from using it. If you don't specify the 301, then it's a 302 by default, and search engines will keep the "wrong" domain in their database. If you don't specify the [L], then the second rule will be processed before the external redirect occurs, exposing your script path to the client (check this with the Live HTTP Headers add-on for Firefox/Mozilla). Neither of these are good things...
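
Putting those corrections together, the first rule would look roughly like the following. The example.com hostname is a placeholder, and the end-anchors on the two patterns are an addition not mentioned above:

```apache
# Externally redirect any non-canonical hostname to www.example.com.
# Note the space before "!" and the absence of the [NC] flag.
RewriteCond %{HTTP_HOST} !^www\.example\.com$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```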

Jim

kbthomas

8:13 pm on Jan 6, 2009 (gmt 0)

10+ Year Member



Jim, thanks for the detailed response.

Here is my updated .htaccess file after reviewing your comments:


# Follow all symbolic links
Options +FollowSymlinks
# Mod Rewrite
RewriteEngine On
RewriteBase /
# Do not enable rewriting for files that exist
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Rewrite /index.php?uri=method/action to /method/action
RewriteRule (.*)$ /index.php?uri=$1
RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteRule ^(.*) http://www.example.com/$1 [R=301,L]

When I pull up 'domain.com' in Firefox I am automatically redirected to www.domain.com - so that's perfect! Just wanted to ensure that I am not at risk for any security breaches using this new .htaccess file. Thanks again for the mod-rewrite lesson.

Kyle

[edited by: jdMorgan at 4:11 am (utc) on Jan. 8, 2009]
[edit reason] example.com [/edit]

jdMorgan

10:37 pm on Jan 6, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



OK, I assume you don't care about your future server performance, and that there is absolutely no restriction on the valid values of the "uri=" parameter that you could use to reduce the two calls to the filesystem that check whether a file or directory exists *for every single page or object requested from your server*. If you really cannot narrow down the size of the URL-set to eliminate those wasteful calls, then I'd suggest the following, with redundancies removed, a few corrections, and the rules properly documented and in the proper order. I've added a RewriteCond to at least eliminate the wasteful filesystem check if we've already rewritten the request to the script:

# Enable SymLinks to allow mod_rewrite execution
Options +FollowSymlinks
#
# Enable the rewriting engine
RewriteEngine on
#
# Externally redirect to canonical hostname
RewriteCond %{HTTP_HOST} !^www\.example\.com$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
#
# Rewrite URLs which do not resolve to existing files or directories to index.php script
RewriteCond $1 !^index\.php$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# internally rewrite *anything at all* to index.php?uri=
RewriteRule (.*) /index.php?uri=$1 [L]

Jim

[edited by: jdMorgan at 11:04 pm (utc) on Jan. 6, 2009]

kbthomas

3:52 am on Jan 8, 2009 (gmt 0)

10+ Year Member



Jim,

You are truly a mod-rewrite guru and I have learned a lot from you in my brief two days here. Thanks a ton for that. Here is my final question (for now; hopefully ;)):

How would I easily go about making ?uri=controller/method disappear from the {QUERY_STRING}? It hasn't been an issue until now, but cross-domain development (i.e. third-party requests containing a query string) is giving me issues because my server still "sees" ?uri as the first parameter even though it has been mod-rewritten.

jdMorgan

4:10 am on Jan 8, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm sorry, maybe I don't understand your question.

We've been discussing a rule that rewrites the URL example.com/controller/method to the internal server filepath /index.php?uri=controller/method, and now you're asking about making it (or something) disappear?

Backing up a step, you should modify the script(s) that produces your HTML pages, so that all links 'seen' by the client (browser) are output in the example.com/controller/method format. When one of those links is clicked and the browser requests that URL from your server, mod_rewrite will deliver that request to your index.php script with a "GET string" of uri=controller/method.

Understand that mod_rewrite works after a request for a URL arrives at your server, and before any content-handlers are invoked. It then changes the URL-to-filepath mapping from the default URL-to-filepath association. It cannot "change a link" on a Web page, if that's what you were expecting. Links on Web pages define URLs; servers translate URLs to filepaths and respond with the requested content -- that is their fundamental job. So the HTML on Web pages defines URLs, and servers define filepaths.

As I said, maybe I'm not understanding your question; Posting more details and examples might help.

Jim

kbthomas

6:06 am on Jan 8, 2009 (gmt 0)

10+ Year Member



Jim,

I understand that all links should have the GET string of uri=controller/method (using the current scheme that is) and that .htaccess is invoked before PHP runs...

An example is utilizing the YouTube API- when authenticating a user, the user is directed to Google where he/she verifies that it is OK for our site to do some action on behalf of the associated YouTube account. I pass a 'next' parameter in the http request so Google can direct the user back to my site upon completion, appending a 'token' as the only query parameter such as:

[mysite.com...]

The problem is that Google expects my web server to handle the query parameters normally but even with a mod-rewritten URL, ?uri=controller/method is still "seen" as the first query parameter, rendering 'token' useless. If Google sent the token in [mysite.com...] it would work.

I hope that helps you understand my situation.

Kyle

jdMorgan

3:22 pm on Jan 8, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



No, I'm sorry but it doesn't. Please post in the following format:

"When a user types or clicks on an on-page link containing (or a search engine spider requests) the URL <insert full example.com URL and any query strings here>, I want the content to be delivered by the script <insert server file path here> with the query string <insert query string here>."

If you are having a problem, then follow that up with: "But currently, when a user types or clicks on an on-page link containing the URL <insert full URL and any query strings here>, the content is being delivered by the script <Insert file path here> with the query string <insert query string here>, or I'm getting a server error <describe error here>."

-and-

"So the problem is that <insert analysis of actual versus desired behavior>"

This whole process of using SEO-friendly URLs in on-page links and rewriting requests for those URLs to some other internal server filepath only makes sense if you start at the on-page URL and follow the process through from a click on that link to the delivery of the "next page" that results from that click. So solid examples of clicked URLs and actual server filepaths are the best way to go here.

I think perhaps you might be saying that any query string appended to one of your "friendly" URLs is now being dropped by the rewrite rule (which is true by design, since an appended query string disqualifies the URL as being "SEO friendly") and that you don't want that to happen. But if so, it's not clear that that is the case...
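
If that is indeed the requirement, one possible approach (a sketch only, not something confirmed in this thread) is mod_rewrite's [QSA] flag, which appends the client's original query string to the one the rule adds instead of discarding it:

```apache
# Same internal rewrite as before, with QSA added.
# Assumption: the script should receive both uri= and whatever
# query string the client appended to the friendly URL.
RewriteCond $1 !^index\.php$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule (.*) /index.php?uri=$1 [QSA,L]
```

With this in place, a request for /controller/method?token=abc (a made-up token value) would reach the script as /index.php?uri=controller/method&token=abc. Note that uri is still the first parameter, with token following it.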

This quote is problematic:

"I understand that all links should have the GET string of uri=controller/method (using the current scheme that is)"

If you mean that right now, your on-page links have a query string appended to them, then this is wrong. Your on-page links must now be in the "example.com/controller/action" format, with no query string ("GET string") at all. The whole purpose of the internal rewrite we've been discussing is to invisibly convert a request for the "example.com/controller/action" URL into a script request in the "/index.php?uri=controller/action" format.

The code will not "change" the on-page URLs -- If they have not been updated to the "example.com/controller/action" format, then that must be done now before any of this will work. The link on your HTML page defines the URL, and having done that, it's too late to change it.

Not intending to pick at nits here, but it's best practice to clearly and fully define requirements before attempting to code a solution. Also, when I don't understand something, I say so: It saves time and prevents errors.

Jim

kbthomas

6:21 pm on Jan 8, 2009 (gmt 0)

10+ Year Member



Jim, can we chat via skype? My skype username is 'kb.thomas' if that is alright! Thanks for your time.

kbthomas

7:28 pm on Jan 8, 2009 (gmt 0)

10+ Year Member



"When a user types or clicks on an on-page link containing (or a search engine spider requests) the URL http://www.example.com?token=<string>, I want the content to be delivered by the script /var/www/framework/main_html/index.php with the query string = 'token=<string>'."

"But currently, when a user types or clicks on an on-page link containing the URL http://www.example.com/controller/method?token=<string>, the content is being delivered by the script /var/www/framework/main_html/index.php with the query string 'uri=controller/method'."

-and-

"So the problem is that my hidden ?uri=controller/method is still "seen" by my webserver as the first query parameter."

g1smd

7:52 pm on Jan 8, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



So, you want the user to no longer see /controller/method/ appearing in the URL, and you want to go back to using a parameter in the URL; in this case, token=<something> too?

The confusion comes from you describing "/controller/method" (written exactly like that) as being a "parameter". In that format it isn't a URL parameter, even if it might be treated as one once the request gets right into the internals of your server (after the rewrite).

[edited by: g1smd at 7:56 pm (utc) on Jan. 8, 2009]

kbthomas

7:53 pm on Jan 8, 2009 (gmt 0)

10+ Year Member



yes!

kbthomas

8:00 pm on Jan 8, 2009 (gmt 0)

10+ Year Member



Actually, controller/method SHOULD be seen in the URL.

But there should be NO QUERY STRING in this new URL.

http://www.example.com?uri=controller/method => http://www.example.com/controller/method (with no hidden ?uri= query parameter so that the first query parameter can be defined as http://www.example.com/controller/method?myFirstNewQueryParam=value)...

g1smd

8:22 pm on Jan 8, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If you persist in putting the domain name on everything here, we can't tell which are URLs (which DO have a domain name) and which are internal server filepaths (which do NOT have a domain name).

Please clarify, as per jd's question, as above.

That is example.com/some-path is a URL, as requested by a browser.

Once the request is being processed, it will be passed to the internal filepath /some-path to get the content.

Don't think about stuff stored inside your server as being a URL. URLs are things out on the web. Once inside the server, it is only a path, and it does not have a domain name.

kbthomas

9:06 pm on Jan 8, 2009 (gmt 0)

10+ Year Member



OK,

I have managed to get closer to where I need to be. Here is my new .htaccess:

# Enable SymLinks to allow mod_rewrite execution
Options +FollowSymlinks

# Enable the rewriting engine
RewriteEngine on

# Externally redirect to canonical hostname
RewriteCond %{HTTP_HOST} !^www\.domain\.com$
RewriteRule (.*) [domain.com...] [R=301,L]

# Rewrite URLs which do not resolve to existing files or directories to index.php script
RewriteCond $1 !^index\.php$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d

# internally rewrite *anything at all* to index.php
RewriteRule (.*) /index.php [L]

-----

The problem now is that any query string appended to my URL (e.g. domain.com?foo=bar) throws a 404 file not found error.

:)