Forum Moderators: phranque
I have multiple numeric sub domains that are listed in a rewrite map file like this:
me 100
you 101
...
him 297
I have vhost rewrite in httpd.conf like this:
UseCanonicalName off
RewriteEngine on
RewriteMap lowercase int:tolower
RewriteMap sub txt:/path/to/map.txt
RewriteCond %{HTTP_HOST} !^www\.demo\.com [NC]
RewriteCond %{HTTP_HOST} ^(www\.)?([a-z.-]+)\.demo\.com [NC]
RewriteRule ^/(.*) /doc_root/sites/${sub:%2|100}/${lowercase:$1} [L]
The above works. There are some files, of the same name, that may exist in two places such as /doc_root/bogus.html and /doc_root/sites/101/bogus.html.
What I wish to do is check for the file first in the ../sites/101 directory. If not there, I would like to get the file from /doc_root.
So, users will enter you.demo.com/bogus.html which will need to be mapped to /doc_root/sites/101/bogus.html. If not there, return /doc_root/bogus.html.
There are many numeric sub domains, so I do not want to write rules for them all.
I need to rewrite first. I thought maybe:
RewriteCond %{HTTP_HOST} !^www\.demo\.com [NC]
RewriteCond %{HTTP_HOST} ^(www\.)?([a-z.-]+)\.demo\.com [NC]
RewriteRule ^/(.*) /doc_root/sites/${sub:%2|100}/${lowercase:$1}
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ...
And that is where I am stumped. I am not sure how to rewrite here.
Any thoughts?
# Lowercase the Requested URL-path
RewriteRule ^/(.*) /${lowercase:$1}
#
# Rewrite if URL-path resolves to existing file in non-www subdomain filespace
RewriteCond %{HTTP_HOST} !^www\.demo\.com [NC]
RewriteCond %{HTTP_HOST} ^(www\.)?([a-z.-]+)\.demo\.com [NC]
RewriteCond /doc_root/sites/${sub:%2|100}/$1 -f
RewriteRule ^/(.*) /doc_root/sites/${sub:%2|100}/$1 [L]
#
# Else rewrite if URL path resolves in default filespace
RewriteCond %{HTTP_HOST} !^www\.demo\.com [NC]
RewriteCond /doc_root/$1 -f
RewriteRule ^/(.*) /doc_root/$1 [L]
# Save off the lowercased requested URL-path
RewriteRule ^/(.*) - [E=SavLcURL:${lowercase:$1}]
#
# Rewrite non-www subdomain requests to subdomain's filespace if URL resolves to existing file there
# If not www subdomain
RewriteCond %{HTTP_HOST} !^www\.demo\.com [NC]
# get subdomain to %2
RewriteCond %{HTTP_HOST} ^(www\.)?([a-z.-]+)\.demo\.com [NC]
# Rewrite using "sub" map, chain to next rule
RewriteRule . /doc_root/sites/${sub:%2|100}/%{ENV:SavLcURL} [C]
#
# **Chained Rule** If previous rule invoked, check for file exists
RewriteCond /$1 -f
# Complete the rewrite & quit if file exists in subdomain-subdirectory
RewriteRule ^/(.*) - [L]
#
# Else restore lowercased requested URL and continue
RewriteRule . /%{ENV:SavLcURL}
#
# Rewrite non-www subdomain request to default filespace if URL resolves to existing file there.
# If not "www" subdomain
RewriteCond %{HTTP_HOST} !^www\.demo\.com [NC]
# If file exists in default path
RewriteCond /doc_root/$1 -f
# Rewrite using default path
RewriteRule ^/(.*) /doc_root/$1 [L]
#
You may not need the "exists check" in the second ruleset if you want to rewrite to the default filespace regardless of the file's existence in that default path.
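As a sketch of that simpler variant, assuming the same /doc_root placeholder as above, the second ruleset would reduce to the host check and the rewrite:

```apache
# Rewrite every non-www request that reaches this point to the default filespace,
# without testing whether the file exists there
RewriteCond %{HTTP_HOST} !^www\.demo\.com [NC]
RewriteRule ^/(.*) /doc_root/$1 [L]
```

With this form, a request for a file that exists in neither place should simply produce Apache's usual 404 for the default path.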
There may be a slicker way to do this, but this post represents the first time I've considered this problem. It also made my head hurt a little, so beware of typos -- Hopefully, this will give you some ideas, though.
Jim
I have not used environment variables before, so I decided to build this up step by step until nothing failed. If only Apache could talk about all the restarts!
bogus.php reports its current directory and echoes the PHP $_SERVER variable. I move it around and rename it for testing.
Here are the responses:
www.demo.com/ 200
www.demo.com/bogus.php 200
www.demo.com/mybogus.php 404
me.demo.com/ 200
me.demo.com/mybogus.php 200
me.demo.com/bogus.php 200 (does not exist in this directory)
me.demo.com/yourbogus.php 404 (exists in you, not here or root)
Here are the settings:
Rewrite map:
me 1001
you 1002
..
him 2999
Relevant vhost container in httpd.conf:
UseCanonicalName off
RewriteEngine on
RewriteMap lowercase int:tolower
RewriteMap sub txt:/path/to/map
RewriteRule ^/(.*) - [E=SavLcURL:${lowercase:$1}]
RewriteCond %{HTTP_HOST} !^www\.demo\.com [NC]
RewriteCond %{HTTP_HOST} ^(www\.)?([a-z.-]+)\.demo\.com [NC]
RewriteRule . /DOCUMENT_ROOT/websites/${sub:%2|1000}/%{ENV:SavLcURL} [C]
RewriteCond /$1 -f
RewriteRule ^/(.*) - [L]
RewriteRule . /%{ENV:SavLcURL}
RewriteCond %{HTTP_HOST} !^www\.demo\.com [NC]
RewriteCond /DOCUMENT_ROOT/$1 -f
RewriteRule ^/(.*) /DOCUMENT_ROOT/$1 [L]
Works great. This is also very fast compared to the other methods I tried. I have not tried your other proposal yet, but I will; it would be interesting to see how it compares.
Thanks again!
Here is the failed response:
www.demo.com/some_directory/real.file 404
This one still works:
me.demo.com/some_directory/real.file (non existing directory or file) 200
I made two changes that corrected this problem. The tests from my previous post still work as intended.
Here are the settings:
Rewrite map:
me 1001
you 1002
..
him 2999
Relevant vhost container in httpd.conf:
UseCanonicalName off
RewriteEngine on
RewriteMap lowercase int:tolower
RewriteMap sub txt:/path/to/map
RewriteRule ^/(.*) - [E=SavLcURL:${lowercase:$1}]
RewriteCond %{HTTP_HOST} !^www\.demo\.com [NC]
RewriteCond %{HTTP_HOST} ^(www\.)?([a-z.-]+)\.demo\.com [NC]
RewriteRule . /DOCUMENT_ROOT/websites/${sub:%2|1000}/%{ENV:SavLcURL} [C]
RewriteCond /$1 -f
RewriteRule ^/(.*) - [L]
RewriteCond %{HTTP_HOST} !^www\.demo\.com [NC] <--- Change 1
RewriteRule . /%{ENV:SavLcURL} [C] <--- Change 2
RewriteCond %{HTTP_HOST} !^www\.demo\.com [NC]
RewriteCond /DOCUMENT_ROOT/$1 -f
RewriteRule ^/(.*) /DOCUMENT_ROOT/$1 [L]
Change 1 fixed the problem.
Change 2 was probably voodoo. However, the reasoning behind the first [C] above, as I understood it, seemed to apply here, too.
Thanks again.
RewriteCond %{HTTP_HOST} !^www\.demo\.com [NC] <--- Change 1
RewriteRule . /%{ENV:SavLcURL} [C] <--- Change 2
RewriteCond /DOCUMENT_ROOT/$1 -f <--- Change 3: Redundant domain-check RewriteCond preceding this one has been removed.
RewriteRule ^/(.*) /DOCUMENT_ROOT/$1 [L]
Jim
[edited by: jdMorgan at 11:57 pm (utc) on Mar. 5, 2008]
I have found some more failures in my testing. However, these are very puzzling.
These directories exist only in /DocumentRoot and /DocumentRoot/administrator. Each directory has a copy of bogus.php in it, and I can walk the path up and down with no failures from either www or me.
[demo.com...] OK
[me.demo.com...] OK
[demo.com...] OK
[me.demo.com...] OK
Results are mixed for this path. The administrator directory exists as both /DocumentRoot/administrator and /DocumentRoot/websites/number/administrator. However, there are no subdirectories within /DocumentRoot/websites/number/administrator.
[demo.com...] OK
[me.demo.com...] OK
[demo.com...] OK
[me.demo.com...] NOT FOUND
I thought I was on to something with the en-GB name, but alas:
This directory tree exists only as /DocumentRoot/includes/js.
[demo.com...] OK
[me.demo.com...] OK
[demo.com...] OK
[me.demo.com...] NOT FOUND
As far as I understand the rewriting rules in previous posts, all requests to www domains are ignored. So it makes sense that existing paths are found.
At first I thought it had something to do with the en-GB, yet the other directory path fails, also. Next, I thought it was the depth, yet up to 6 non-existent paths work just fine.
Apache reports the correct paths for the URI in its 404 page. Requests for other files in the directories have the same responses. Ownership and permissions are correct.
Here is the return from Live HTTP headers:
[me.demo.com...]
GET /includes/js/ThemeOffice/bogus.php HTTP/1.1
Host: me.demo.com
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.12) Gecko/20080207 Ubuntu/7.10 (gutsy) Firefox/2.0.0.12
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.7,pt-br;q=0.3
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Authorization: Basic d2VibWFzdGVyOmthbnNhc2NpdHk=
Cache-Control: max-age=0
HTTP/1.x 404 Not Found
Date: Thu, 06 Mar 2008 00:39:57 GMT
Server: Apache/2.2.4 (Ubuntu) PHP/5.2.3-1ubuntu6.3
Content-Length: 351
Keep-Alive: timeout=15, max=95
Connection: Keep-Alive
Content-Type: text/html; charset=iso-8859-1
I am not at all sure that this is a rewrite problem. However, does anything stand out that I am missing here?
As implied above, the "file exists" checks are the Achilles heel of any code that uses them. This is because the filepath that is tested is 'invisible' to user and Webmaster alike. It often happens that the wrong path is being checked for "file exists" -- sometimes leading to bizarre behaviour.
There are several ways to 'reveal' the actual path: If you get a 404-error when you don't expect one, look at the filepath (as opposed to the URL) in the server error log. When you get a 200-OK instead of an expected 404, then things are tougher. You can temporarily change the code to do an external redirect as opposed to an internal rewrite, and then attach the file check path to the substitution URL as a 'fake' query_string. Or in your case, you could create another user-variable in your mod_rewrite code, and then 'echo' that variable from your "bogus.php" script to reveal it.
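A minimal sketch of the user-variable approach, assuming the SavLcURL variable and the /DOCUMENT_ROOT placeholder from the configuration posted above; ChkPath is a hypothetical name chosen for illustration. The rule is meant to sit directly after the rule that sets SavLcURL, outside the [C] chains, and it rebuilds the path that the default-filespace check is expected to test rather than tapping the check itself:

```apache
# Debug only; remove when finished.
# "ChkPath" is a hypothetical variable name.
# The "-" substitution leaves the URL untouched; only the variable is set.
RewriteRule ^/ - [E=ChkPath:/DOCUMENT_ROOT/%{ENV:SavLcURL}]
```

Since bogus.php already echoes $_SERVER, ChkPath should appear in its output next to SavLcURL, at least when PHP runs as an Apache module. Any difference between that value and the path on disk, including letter case, is then visible.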
None of these methods are much fun, but if you can't be sure that the file-exists-check path is "correct by construction" then it's often worthwhile to test it -- if only for your own ability to sleep well at night. :)
One more point: We're testing only for files here, and not directories. If the URL you request resolves to a directory, it will not pass a "file exists" check, since it is not a file.
The directory-exists check flag is "-d". For example, to add it to the file check in the first ruleset, use:
[code]
RewriteCond /$1 -f [OR]
RewriteCond /$1 -d
[/code]
Use the same approach for the second ruleset: Copy the file-check RewriteCond, add the [OR] flag, paste the copied RewriteCond below the original, change the "-f" to "-d".
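Applied to the second ruleset, that procedure would give something like the following, assuming the /DOCUMENT_ROOT placeholder from the configuration posted earlier:

```apache
# Pass if the path is an existing file ...
RewriteCond /DOCUMENT_ROOT/$1 -f [OR]
# ... or an existing directory
RewriteCond /DOCUMENT_ROOT/$1 -d
RewriteRule ^/(.*) /DOCUMENT_ROOT/$1 [L]
```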
Jim
Perusing my server logs led me to the need for checking with the -d flag. I just had not reported on it yet. I plan to after testing. I run through a test series after changes. I have DirectoryIndex set, but not Indexes. I use index.html and index.php. It is interesting that my Apache 2.2 server returns index.php with a '/somepath/' request but my Apache 1.3 server does not. I will be looking more into that.
I wrote OK instead of 200 and NOT FOUND instead of 404. The idea is to test for a file in a mapped subdirectory and return it if it exists. If it does not, look for it in DocumentRoot and, if found, return it from there. If it is not found in DocumentRoot either, a 404 NOT FOUND is expected from Apache.
The lengthy list was typed into my web browser exactly as given, except for changing my development domain name. I provided that list to show that I had tried to cover the bases. The directory series one/two/three/four/five/six exists under DocumentRoot and DocumentRoot/administrator. Each directory has bogus.php in it. Each DocumentRoot/websites/number directory has an administrator directory. Both of these also have bogus.php in them. (There are a couple hundred numbered directories. me, you, and him just have bogus.php; others do real work.)
Anyway, thank you for the help, it was very useful.
Again, I'd recommend checking the 'correctness' of the 'file-exists' checking functions, as path errors in these functions can cause unexpected strange behaviour.
Also, you can end up going nuts if you don't remember to flush your browser cache between tests! :)
Jim
The reason I had added a lowercase internal function was to find subdomains in the rewrite map file if entered with the wrong case.
However, upon re-reading the manual, it seems that the NC flag should work.
The Apache documentation states that the NC flag makes the comparison case-insensitive in both the expanded TestString and the CondPattern. However, the NC flag has no effect on filesystem checks such as -f or -d.
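A short sketch of that distinction, reusing the patterns and the /DocumentRoot placeholder from this thread (the grouping of these three lines is for illustration only):

```apache
# [NC] affects the comparison with the pattern, so Me.Demo.com matches here
RewriteCond %{HTTP_HOST} ^(www\.)?([a-z.-]+)\.demo\.com [NC]
# -f is a filesystem check; [NC] would have no effect on it, so on a
# case-sensitive filesystem the letter case of $1 must match the name on disk
RewriteCond /DocumentRoot/$1 -f
RewriteRule ^/(.*) /DocumentRoot/$1 [L]
```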
I would rather just convert all paths and file names to lower case on the servers, but this is beyond my control. Normally, the users do not need to type in anything other than the subdomain and server name.
Here are the settings:
Rewrite map:
me 1001
you 1002
..
him 2999
Relevant parts of vhost container in httpd.conf:
UseCanonicalName off
RewriteEngine on
RewriteMap sub txt:/path/to/map
RewriteRule ^/(.*) - [E=SavLcURL:$1]
RewriteCond %{HTTP_HOST} !^www\.demo\.com [NC]
RewriteCond %{HTTP_HOST} ^(www\.)?([a-z.-]+)\.demo\.com [NC]
RewriteRule . /DocumentRoot/websites/${sub:%2|1000}/%{ENV:SavLcURL} [NC,C]
RewriteCond /$1 -f [OR]
RewriteCond /$1 -d
RewriteRule ^/(.*) - [L]
RewriteCond %{HTTP_HOST} !^www\.demo\.com [NC]
RewriteRule . /%{ENV:SavLcURL} [C]
RewriteCond /DocumentRoot/$1 -f [OR]
RewriteCond /DocumentRoot/$1 -d
RewriteRule ^/(.*) /DocumentRoot/$1 [L]
These rules work as desired with the domains. Entering me.demo.com or Me.demo.com or Me.deMo.com matches in the rewrite map. (I use wildcard DNS, i.e., *.demo.com.)
These rules fail with me.demo.com/adMinistrator/ or me.demo.com/AdminiStrator/bogus.php or me.demo.com/administrator/bogus.pHp.
For me, the answer to the improper case failures will be customized error pages.
I am extremely thankful for the test code that you provided, jdMorgan. Although I had read about the E flags, I had not delved into them and the documentation was terse. Searches on the web returned many pages that were just regurgitations of the Apache documentation. With your help, I have reduced my rewrite rules from fifteen rules to just those five listed here. Also, seeing an example of how the environment variables are used was very enlightening. The experimenter in me has a lot to chew on...