Forum Moderators: phranque


Rewrite conflict

removing file extension on domains re-directed to sub-directories

         

mrbinary

3:16 am on Jun 21, 2007 (gmt 0)

10+ Year Member



I'm trying to accomplish the following with rewrites:
1. Aim many top-level domains to associated sub-directories on one server
2. Remove the need for the .php extension in URLs

I've been trying to use some variation of the following three code segments, but when the trailing slash rule is in place it always cancels the php extension rule regardless of where it is in the code or how it's been implemented. Without the trailing slash rule, URLs are automatically redirected (due to mod_dir I assume, which I have no control over on my server) from http://example.com/directory to http://example.com/example.com/directory/. I've tried dozens of different variations of these rules, and I'm out of ideas on how to get them working together.


## Add trailing slash on directories
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^/*(.+/)?([^.]*[^/])$ http://%{HTTP_HOST}/$1$2/ [L,R=301]


## Remove the need for .php extension
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME}\.php -f
RewriteCond %{REQUEST_URI} !/$
RewriteRule ^(.*)$ $1.php [L]


# Aim all domains from /htdocs to /htdocs/example.com/
RewriteCond %{REQUEST_URI} !example1.com/
RewriteCond %{REQUEST_URI} !example2.com/
RewriteCond %{REQUEST_URI} !example3.com/
RewriteRule ^(.*)$ /%{HTTP_HOST}/$1 [L]

jdMorgan

4:09 am on Jun 21, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Other than the "\.php" text-string in the second rule's RewriteCond, I don't see exactly what the problem might be, but the code is overly-complex and has too many unconditional filesystem checks in it to be 'good'. So, I'd suggest trying a simpler, optimized version -- maybe it will help, and if so, should be much faster:

## Add missing trailing slash to directory URLs
RewriteCond %{REQUEST_URI} !/$
RewriteCond %{REQUEST_FILENAME}/ -d
RewriteRule (.*) http://%{HTTP_HOST}/$1/ [R=301,L]

# Map all domains into /htdocs/<domain>/ subdirectories
# (user variable SDrDone, "subdirectory rewrite done", prevents a loop & eliminates
# the need to check for each possible domain subdirectory)
RewriteCond %{ENV:SDrDone} !^Yes$ [NC]
RewriteRule (.*) %{HTTP_HOST}/$1 [E=SDrDone:Yes]

## Map extensionless URL requests to .php files
RewriteCond %{REQUEST_URI} !/$
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule (.*) $1.php [L]


Note that the domain-to-subdirectory logic has been changed. It creates its own 'lockout' flag to prevent looping. This method eliminates the need to check to make sure you haven't already rewritten to each domain subdirectory -- an approach which works, but is hard to maintain and inefficient if you have a lot of domains or add new ones.

I also changed the extensionless-URL-to-php-file rewrite. It now gives priority to existing php files instead of existing extensionless files, because a 'real' extensionless file isn't likely to be useful when served via HTTP (it can't be assigned a valid MIME-type, for example). Checking for a trailing slash also eliminates the need to test extensionless requests as existing directories: if they existed, the first rule would already have added a slash and redirected. Therefore, there's no need to check 'directory exists' a second time.

The purpose of these changes is to eliminate as many 'file-exists' and 'directory-exists' checks as possible, or to put cheaper conditions in front of them. 'Exists' calls to the operating system's filesystem manager are relatively slow, especially if the actual disk has to be accessed, and should be avoided or optimized whenever possible. 'Exists' checks should always come last in a 'stack' of RewriteConds, so that they are skipped whenever a preceding condition fails.

I changed the rule order as well. The order shown assumes that each domain has its own php scripts, located in that domain's subdirectory -- If that's the case, I believe you'll want to do both rewrites sequentially before ending mod_rewrite processing.

If that is not the case, then reverse the order of the last two rules, and put the [L] flag back on the domain-to-subdirectory rewrite rule.

I also suggest you look into canonicalizing your domains, so that you don't have www.example.com and example.com mapped to two different subdirectories...
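A minimal sketch of that canonicalization, assuming the bare domain (no "www.") is the form you want to keep, and that these lines go at the top of the /htdocs .htaccess file, before the trailing-slash rule:

```apache
# Send www.<domain> requests to <domain> with a permanent redirect,
# so each site only ever maps to a single /htdocs/<domain>/ subdirectory.
# Assumption: the non-www hostname is the canonical one.
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
RewriteRule (.*) http://%1/$1 [R=301,L]
```

Because this redirect fires before any internal rewriting has happened, $1 is still the path exactly as the visitor requested it.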

Jim

mrbinary

5:31 am on Jun 21, 2007 (gmt 0)

10+ Year Member



Thanks for the great breakdown, I understand it much better now! Unfortunately the new domain mapping rule produces an internal server error. My host is running Apache 1.3 if this is relevant. I've never worked with environment variables before, so is there anything more I need to do to get it working?

One other problem - the trailing slash rule still exposes the sub-directory that the domain is mapped to (http://example.com/example.com/directory/).
Strangely, if I comment out

RewriteCond %{REQUEST_FILENAME}/ -d
then the sub-directory is properly hidden. Why would this be?

Also, canonicalization of my domains is next on my list. Thanks for the reminder.

jdMorgan

5:55 am on Jun 21, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



On the trailing slash issue -- Go with what works! I just dashed this off on the keyboard from memory, and didn't test it...

[added]
Try omitting the trailing slash on the RewriteCond test-string:

 RewriteCond %{REQUEST_FILENAME} -d 

You do need to test whether the directory exists, so that's a likely fix.
[/added]

On the internal server error, please post the relevant error info from your server error log. I use an almost-identical piece of code on several of my servers to map subdomains to subdirectories, so it would be helpful to see what error you're getting.
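One possible cause worth ruling out (an assumption, not a diagnosis): when rules in an .htaccess file rewrite a request, Apache re-runs it as an internal redirect, and environment variables set on the earlier pass are normally only visible afterwards under a REDIRECT_-prefixed name. If that is what happens here, the test on %{ENV:SDrDone} never sees "Yes" on the second pass, the domain rule fires again, and the resulting loop could surface as a server error. A variant that tests both names:

```apache
# Map all domains into /htdocs/<domain>/ subdirectories.
# Assumption: after mod_rewrite's internal redirect the lockout flag may only
# be visible as REDIRECT_SDrDone, so skip the rewrite if either name is "Yes".
RewriteCond %{ENV:SDrDone} !^Yes$ [NC]
RewriteCond %{ENV:REDIRECT_SDrDone} !^Yes$ [NC]
RewriteRule (.*) %{HTTP_HOST}/$1 [E=SDrDone:Yes]
```

The other rules stay as posted; only the lockout test changes.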

Another approach is to put all your domains' files into second-level subdirectories, such as "/sites/<domain>/<files>". Then the RewriteCond can simply check for "sites/" in the URL-path to prevent a rewrite loop. However, the lockout flag method is better because it allows for any domain or subdomain name to be used without a false match, *and* allows for first-level domain-subdirectories.
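A rough sketch of that layout, assuming the sites live in hypothetical /htdocs/sites/<hostname>/ directories and the rules sit in /htdocs/.htaccess:

```apache
# Map every hostname into /sites/<hostname>/.
# The "sites/" test on the URL-path stops the rewritten request
# from being mapped a second time (this replaces the lockout flag).
RewriteCond %{REQUEST_URI} !^/sites/
RewriteRule (.*) /sites/%{HTTP_HOST}/$1 [L]
```

With this layout, any request whose path already begins with /sites/ passes through unmapped, which is one reason the lockout-flag method is more general.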

Anyway, post the error log entry for that 500 error if possible.

Jim

[edited by: jdMorgan at 5:58 am (utc) on June 21, 2007]

sc112

8:08 pm on Jun 21, 2007 (gmt 0)

10+ Year Member



Need L flag on this rule?

RewriteRule (.*) %{HTTP_HOST}/$1 [E=SDrDone:Yes]

mrbinary

2:05 am on Jun 22, 2007 (gmt 0)

10+ Year Member



Oh boy... I think the server is somehow misconfigured, or perhaps intentionally crippled. Environment variables work only some of the time, and I have no access to the error log. Hopefully my host will respond to my support request, but they've made it clear that with htaccess issues you're on your own.

I've discovered that as soon as a RewriteCond starts checking for directories then all of my redirects stop working as expected. The redirects still occur, but once it has finished going through all of the rules it automatically shoehorns in the subdirectory at the beginning of the URL.

Luckily I've found a partial solution:

RewriteCond %{REQUEST_URI} !/$
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^example.com/?(.*) http://%{HTTP_HOST}/$1/ [R=301,L]

This hides the sub-directory while allowing my php extension rule to work, but unfortunately the rule fails if I use %{HTTP_HOST} in place of the hard-coded domain. Is there a solution I'm not seeing?

jdMorgan

4:06 am on Jun 22, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> Need L flag on this rule?
>
> RewriteRule (.*) %{HTTP_HOST}/$1 [E=SDrDone:Yes]

No, not necessarily. Please see the last two paragraphs of my initial code post.

Jim

jdMorgan

4:16 am on Jun 22, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> shoehorns in the subdirectory...

Please provide an example request URL and the resulting "shoehorned" rewritten URL-path, so we can be crystal-clear on this point.

For the HTTP_HOST problem, another way to code that would be:


RewriteCond %{REQUEST_URI} !/$
RewriteCond %{REQUEST_FILENAME} -d
RewriteCond %{HTTP_HOST} (.+)
RewriteRule ^example.com/?(.*) http://%1/$1/ [R=301,L]

but there's no reason the original would fail and this modified code would work on a normal server.

Jim

mrbinary

7:26 am on Jun 22, 2007 (gmt 0)

10+ Year Member



To make certain that the redirect is actually happening, I've run some of my tests with gibberish at the end of the substitution string. Here's the result of one of these tests:

Code:

RewriteCond %{REQUEST_URI} !/$
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule (.*) http://%{HTTP_HOST}/$1/xyz [R=301,L]

URL:
http://example.com/test
Result:
http://example.com/example.com/test/xyz

Code:

RewriteCond %{REQUEST_URI} !/$
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^example.com/?(.*) http://%{HTTP_HOST}/$1/xyz [R=301,L]

URL:
http://example.com/test
Result:
http://example.com/test/xyz

mrbinary

7:45 am on Jun 22, 2007 (gmt 0)

10+ Year Member



> For the HTTP_HOST problem, another way to code that would be:
> ...
> RewriteRule ^example.com/?(.*) http://%1/$1/ [R=301,L]

Oops, my post was completely ambiguous. I was referring to the pattern rather than the substitution. This is what I first tried:

RewriteRule ^%{HTTP_HOST}/?(.*) http://%{HTTP_HOST}/$1/ [R=301,L]

This only seemed to work when I created an explicit rule for each of my domains, like this:

RewriteRule ^example.com/?(.*) http://%{HTTP_HOST}/$1/ [R=301,L]

I'm probably missing something simple (variables can't be used in the pattern?)

jdMorgan

2:52 pm on Jun 22, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



No, variables definitely cannot be used in a pattern.
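A workaround is to put the variable in a RewriteCond test string and compare it with the path the rule captured, using a backreference inside the condition's pattern. A sketch, assuming the server's regex engine supports in-pattern backreferences such as \1 (PCRE in Apache 2.x does; whether the Apache 1.3 regex library does is untested here), and relying on the earlier tests, which show that the path matched by the rule starts with the domain subdirectory:

```apache
## Add missing trailing slash to directory URLs, dropping the
## <hostname>/ subdirectory prefix from the redirect target
RewriteCond %{REQUEST_URI} !/$
RewriteCond %{REQUEST_FILENAME} -d
# Test string is "<host>#<path captured by the rule>"; \1 requires the
# captured path to begin with the same host name followed by a slash
RewriteCond %{HTTP_HOST}#$1 ^([^#]+)#\1/(.+)$
# %1 = host name, %2 = path inside the domain subdirectory
RewriteRule (.*) http://%1/%2/ [R=301,L]
```

For a request to http://example.com/test, once the path has become example.com/test the test string is example.com#example.com/test, so the redirect goes to http://example.com/test/ without the subdirectory.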

Unfortunately, I still can't spot any clue as to what might be wrong with this setup, and need to ponder it more when I get more time. Have a look at the RewriteBase directive, and see if that might be applicable to your situation. Also, it may be possible to move some of this code into .htaccess files in subdirectories, if that might simplify it.

Jim

mrbinary

5:54 pm on Jun 22, 2007 (gmt 0)

10+ Year Member



Thanks for all the help! I'm glad to know I'm not crazy at least. I'll try my best to get support from my host, but I doubt it will happen. In the meantime I'm going to place the following .htaccess in all of my sub-directories:

RewriteEngine On
RewriteBase /
## Add missing trailing slash to directory URLs
RewriteCond %{REQUEST_URI} !/$
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule (.*) http://%{HTTP_HOST}/$1/ [R=301,L]

Does this seem appropriate, barring a more elegant solution?

jdMorgan

10:27 pm on Jun 23, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I don't know -- If it works, it works.

There's apparently something odd about your configuration, and I can't make any sense of it.

Jim