
Rewrite/redirect conflicts for multiple domains

Getting stuck in an infinite loop

         

jenstechs

6:27 pm on Aug 28, 2009 (gmt 0)

10+ Year Member



Hi,

I am trying to set up multiple domains on my host, each with its files in its own subfolder. But I don't want direct access to those subfolders to work. For now, the .htaccess file at the root works, but I can still reach the files through the subfolder path, and I can't block that without running into the "too many redirects" problem.

The Setup
www.site1.com has its files located at www.site1.com/site1/. URLs pointing to www.site1.com/page.php are successfully rewritten to point to www.site1.com/site1/page.php with the URL NOT showing the site1 folder - this is correct.

Problem
Files located at www.site1.com/site1/page.php can still be accessed at that URL. I would like www.site1.com/site1/page.php to be redirected (the URL to change) to www.site1.com/page.php, but the code I've tried puts me in an infinite loop of redirects.

The code I tried was the one Jim posted a few posts below, using the 'example.com' site. It works in my root to point requests to the subfolder, but I don't know how to point requests away from the subfolder.

I am a relative n00b at htaccess/rewrite - I've read a LOT about it but I don't fully understand it yet. Thanks for your help, everybody!

jdMorgan

12:49 am on Aug 29, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The simplest solution is to rename the "site" subdirectories so that they all start with a common name, for example /sites/site1/ or /s_site1/. This greatly simplifies the site-to-subdirectory rewriting and makes it quite efficient. It also avoids problems should you decide to go with extensionless pages in the future, by avoiding 'collisions' between extensionless URLs and subdirectory names. For example, an extensionless URL example.com/site1, meant to map to a page file site1.php, would collide with the /site1/ subdirectory.

It also makes fixing your current problem easy, in that any request for this common name can be detected and externally redirected back to the correct domain root.

So the whole "package" is two pieces: an internal rewrite from each domain to its /sites/ subdirectory path, and an external redirect that sends client requests for a /sites/ subdirectory URL-path back to the matching domain-root URL-path.

The only real trick is preventing an infinite rewrite-redirect loop. This can be done in several ways, but one method is to examine the server variable %{THE_REQUEST}, which holds the request line exactly as the client sent it and is not changed by internal rewrites. Only when this variable shows that the request for /sites/x came directly from the client, and is not the result of your internal rewrite having already run, do you want to redirect.

You will probably also want to remove FQDN-format trailing periods and port numbers. Otherwise, requests for hostnames such as "example.com." or "example.com:80" will be rewritten to non-existent subdirectories. I also strongly suggest that you standardize on www or non-www hostnames, unless you want to support two different subdirectories per domain, one for www and another for non-www. If you do standardize, then enforce that standardization by 301-redirecting non-canonical hostname requests to the canonical domains.
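One way to enforce that standardization, as a minimal sketch assuming you settle on the www form and that every hosted domain is a two-label name such as example.com (it would mishandle names like example.co.uk):

```apache
# Hypothetical canonical-hostname redirect: example.com -> www.example.com
# Assumes every hosted domain has exactly two labels (example.com, not example.co.uk).
# The pattern cannot match "www.example.com" (three labels), so canonical
# requests pass through untouched.
RewriteCond %{HTTP_HOST} ^([^.:]+\.[^.:]+)$
RewriteRule ^(.*)$ http://www.%1/$1 [R=301,L]
```

Put it above the internal rewrite to /sites/. If it came after that rule, the per-directory re-run of the ruleset would see the already-rewritten sites/... path and redirect the client to it. With this in place, only the www-named subdirectories are needed.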

The following code maps arbitrary hostnames to same-named subdirectories of the /sites/ subdirectory, and redirects direct client requests for that subdirectory back to the appropriate domain.


RewriteEngine on
#
# Redirect direct client requests for /sites/ subdirectories back to domain
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /sites/[^/\ ]+/[^\ ]*\ HTTP/
RewriteRule ^sites/([^/]+/.*)$ http://$1 [R=301,L]
#
# Redirect to remove trailing period from FQDN-format hostnames and remove port numbers if present
RewriteCond %{HTTP_HOST} ^([^.:]+(\.[^.:]+)+)(\.|\.?:[0-9]+)$
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]
#
# Rewrite hostname requests to appropriate /sites/ subdirectory, unless already done
RewriteCond $1 !^sites/
RewriteCond %{HTTP_HOST} ^([^.:]+(\.[^.:]+)+)$
RewriteRule ^(.*)$ /sites/%1/$1 [L]

Jim

[edited by: jdMorgan at 1:52 am (utc) on Aug. 29, 2009]

jenstechs

1:27 am on Aug 29, 2009 (gmt 0)

10+ Year Member



Jim, thank you so much. I am only going to host 2 or 3 sites on it, so I don't mind keeping different sets of rules for each site - in fact, doing that may make it easier on me, so I can understand it better. But I do see the benefit of keeping everything in a sites/domain.com/ subfolder, and I would like to go with optionally-extension-less sites, if only I knew how!

Can you help me understand what the rules do? I know a little bit about regular expressions but these are a little confusing.

1. If THE_REQUEST is an actual http request from the client (and not part of the loop), prefixed by anything alpha plus sites plus (things I assume are part of the full request path), then rewrite it to the sites/[domain name] and treat the domain name as the redirect.

2. I understand the concept of this one, I don't get the execution. If the HTTP_HOST contains dots and port numbers.. then do what? Why %1 and then $1?

3. If (what is $1 here?) does not start with sites and if the host has periods and ports.... then... send the request to sites/[domain name without the ports]/fileinput

Once I set that up in the root, do I need anything in the subfolders to block access? May I set up other rules in the subfolders, say, when needing to redirect a certain old filename to new filename, and set error documents?

jdMorgan

1:53 am on Aug 29, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



In all cases, $1 through $9 refer to the parts of the request matching the first through ninth parenthesized sub-patterns in the RewriteRule pattern, and %1 through %9 refer to the parts of the request matching the first through ninth parenthesized sub-patterns in the last-matched RewriteCond.
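To make the numbering concrete, here is the third rule from the code above again, annotated with what each back-reference would hold for a hypothetical request for http://www.example.com/dir/page.php handled by the root .htaccess:

```apache
# Hypothetical request: GET /dir/page.php with Host: www.example.com
# The RewriteRule pattern ^(.*)$ is tried first, so $1 = "dir/page.php"
# (in the root .htaccess the leading slash has already been stripped).
RewriteCond $1 !^sites/
# This is the last matched RewriteCond, so %1 = "www.example.com"
RewriteCond %{HTTP_HOST} ^([^.:]+(\.[^.:]+)+)$
# Substitution becomes /sites/www.example.com/dir/page.php
RewriteRule ^(.*)$ /sites/%1/$1 [L]
```

The first RewriteCond is negated and captures nothing, so %1 comes from the hostname condition.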

Because the RewriteRule pattern must match before any RewriteConds are executed, $1 in the third rule above is the localized URL-path examined by the RewriteRule, that is, the URL-path with the current directory's prefix (here the leading slash) stripped off.

THE_REQUEST is the entire request line as received from the client, and is exactly what you see logged in your raw server access log, e.g.

GET /sites/site1/page.html HTTP/1.1

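Any trailing period or port number appended to the HTTP_HOST gets dropped: %1 is the hostname captured by the RewriteCond without that suffix, and $1 is the requested URL-path, so the redirect goes to http://<hostname>/<path>. There was a bug in that rule (now fixed to prevent propagation of bad code).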

.htaccess is a per-directory config file, and you may use as many as you like.

Parentheses in regular-expression patterns can be nested. To determine back-reference numbers (i.e. $1-$9 or %1-%9), count left parentheses.
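Applied to the hostname-cleanup rule, counting left parentheses gives three groups. A sketch annotated for a hypothetical Host header of "www.example.com:8080":

```apache
# Hypothetical Host header: "www.example.com:8080"
# Count left parentheses to number the groups:
#   %1 = ([^.:]+(\.[^.:]+)+)  -> "www.example.com"
#   %2 = (\.[^.:]+), nested inside %1; a repeated group keeps its last match -> ".com"
#   %3 = (\.|\.?:[0-9]+)      -> ":8080"
RewriteCond %{HTTP_HOST} ^([^.:]+(\.[^.:]+)+)(\.|\.?:[0-9]+)$
# Only %1 is used, so the client is sent to http://www.example.com/<path>
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]
```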

Jim

jenstechs

1:45 am on Sep 8, 2009 (gmt 0)

10+ Year Member



Following up on this (had to focus on another project last week)...

I am still confused. :/

I don't particularly want to do the "sites/" directory right now. At least, not until I'm comfortable with it. I don't mind, for now, writing two sets of these rules while I learn it. But I don't know what to change.

I have two problems.

1. /site1/ access still works
I can still get to the site while typing in site1.com/site1/(*). I want it to redirect any user-generated requests back up to site1.com/$1, but when I do that, I get an infinite loop.

2. local site redirects don't redirect properly
If I go to site1.com/page.html, it redirects me to site1.com/site1/page.php. Obviously, I do not want this to happen. :/ A similar thing is happening with my 404 page - it points to site1.com/site1/missing.php instead of just site1.com/missing.php

This is what I have for the htaccess file in ROOT.


Options +FollowSymLinks
RewriteEngine on
RewriteBase /
##try with something from webmasterworld forums##
# Externally redirect direct client requests (only) for URL
# <any-domain>/site1/<anything> to URL www.site1.org/<anything>
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /site1/[^\ ]+\ HTTP/
RewriteRule ^site1/(.*)$ http://www.site1.org/$1 [R=301,L]
#
# Externally redirect any requested hostname which contains "site1.org" but is
# not *exactly* "www.site1.org" to URL www.site1.org/<anything>
RewriteCond %{HTTP_HOST} site1\.org [NC]
RewriteCond %{HTTP_HOST} !^www\.site1\.org$
RewriteRule ^(.*)$ http://www.site1.org/$1 [R=301,L]
#
# Internally rewrite add-on domain requests to subdirectories
RewriteCond %{HTTP_HOST} ^www\.site1\.org$
RewriteCond %{REQUEST_URI} !^/site1/
RewriteRule ^(.*)$ /site1/$1 [L]

This is what I have for the htaccess file in SITE1 subdirectory.


Options +FollowSymLinks
RewriteEngine on
RewriteBase /site1/
RewriteRule ^duedates\.html$ calendar.php [R=301,L]
RewriteRule (.*)\.html $1.php [R=301,L]
ErrorDocument 404 /missing.php

[edited by: jdMorgan at 4:20 pm (utc) on Sep. 8, 2009]
[edit reason] de-linked domains [/edit]

jdMorgan

3:55 pm on Sep 8, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well, get rid of the redirects to simplify testing, and then just use an internal rewrite:

# Internally rewrite requests for the site1 domain to the /site1 subdirectory unless already done
RewriteCond %{HTTP_HOST} ^www\.site1\.org$
RewriteCond $1 !^site1/
RewriteRule ^(.*)$ /site1/$1 [L]

Once you get that working, then you can work on adding the redirect that prevents clients from requesting main-site.com/site1/<anything> directly. Divide and conquer.

Jim

jenstechs

6:55 pm on Sep 8, 2009 (gmt 0)

10+ Year Member



Right.. that part works. :) I've got that in my code already, in the root. Before that, I have code to force it to go to www, too. If I DIDN'T want to force www, would this work?


RewriteCond %{HTTP_HOST} ^(www\.)?site1.org$
....
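Making the www. prefix optional like this does let both hostnames reach the same subfolder. A sketch for the root .htaccess, with the dots escaped and [NC] added so mixed-case hostnames also match. Note that without a www redirect the same pages are served under two hostnames, which is what standardizing on one form avoids:

```apache
# Map both site1.org and www.site1.org to the /site1/ subfolder
# (no www canonicalization: each hostname stays in the address bar as typed)
RewriteCond %{HTTP_HOST} ^(www\.)?site1\.org$ [NC]
RewriteCond $1 !^site1/
RewriteRule ^(.*)$ /site1/$1 [L]
```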

So that problem is done. Now I want to conquer the direct access problem of site1.org/site1/(.*) - I had this working at one point and I forget what code I used.

The question I also have is - to prevent direct access, do I put it in the ROOT htaccess file or in the "site1" folder?

jenstechs

7:25 pm on Sep 8, 2009 (gmt 0)

10+ Year Member



Update - here's what I have gotten to work.

The host-based rewrite of site1.org requests to /site1/ works fine. The code to force www works fine.

I added on a second domain. In THAT subfolder, I have the following code.


RewriteEngine On
RewriteCond %{HTTP_HOST} ^(www\.)?site1\.org$ [NC]
RewriteCond %{REQUEST_URI} ^/site2/(.*)$
RewriteRule (.*) / [R=301,L]

This sends anything of site1.org/site2/ back to site1.org, no questions asked. :)

However, if I put the SAME CODE in the /site1/ folder, it puts me in an infinite loop and crashes.

This is what crashes:


RewriteCond %{HTTP_HOST} ^(www\.)?site1\.org$ [NC]
RewriteCond %{REQUEST_URI} ^/site1/(.*)$
RewriteRule ^(.*)$ http://www.site1.org/$1 [R=301,L]

I think it crashes because, in that local site1/.htaccess file, I set the RewriteBase to /site1/. This is so the local redirects (/missing.php, *.html to *.php, etc.) will work. But could that be why the rewrite gets into an infinite loop, even though the rule is a complete external redirect with [R=301,L]?

jdMorgan

4:08 am on Sep 9, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



To prevent the loop (which is the result of your new external redirect countermanding the previously-working internal rewrite), use the format shown/suggested above, examining THE_REQUEST to confirm that the /site1 subdirectory path is being requested directly by the client, and not as a result of the internal rewrite.
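A minimal sketch of that approach for the /site1/.htaccess file, where the looping rule was placed, assuming www.site1.org as the canonical hostname. THE_REQUEST still holds the client's original request line (e.g. GET /page.php HTTP/1.1) after the root file's internal rewrite, so the condition only matches when the client itself asked for /site1/...:

```apache
# In /site1/.htaccess: redirect only client requests whose request line
# literally contains /site1/; requests that arrived here through the root
# .htaccess internal rewrite do not match, so there is no loop.
RewriteEngine On
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /site1/[^\ ]*\ HTTP/
# In this directory's .htaccess the "site1/" prefix is already stripped,
# so $1 is just the file path, e.g. "page.php"
RewriteRule ^(.*)$ http://www.site1.org/$1 [R=301,L]
```

Put it above the file's other rules. Because this directory's .htaccess defines its own rewrite rules, Apache does not by default also run the root file's rules for requests inside /site1/ (see RewriteOptions Inherit), which may be why a redirect placed only in the root file does not catch them.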

When using [R=301,L], I strongly suggest that you provide a full URL.
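Applied to the old-name and .html redirects in the /site1/.htaccess file, that might look like the sketch below, again assuming www.site1.org as the canonical host. With a relative target and RewriteBase /site1/, the redirect exposes the subfolder (site1.org/site1/page.php); a full URL keeps it out of the address bar:

```apache
# In /site1/.htaccess: external redirects use full URLs so the
# /site1/ subfolder never appears in the redirected address.
RewriteEngine On
RewriteRule ^duedates\.html$ http://www.site1.org/calendar.php [R=301,L]
# Any other .html request goes to the same name with .php
RewriteRule ^(.+)\.html$ http://www.site1.org/$1.php [R=301,L]
```

With full-URL targets, RewriteBase no longer affects these two rules.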

Jim