This seems to be a forum for advanced users and pros, so the question might be out of place, but you guys should know all about issues like 'Apache redirecting/rewriting' (and this is the best place that Google and I could find).
The question (in brief):
Are there any drawbacks to using symbolic links to control where a web request lands, instead of using Apache's powerful but sophisticated, takes-a-few-years-to-learn features?
The long story:
You've got this on your Debian server (with Apache):
1. hosting/ (the main dir used for serving pages)
2. --example.com/ (a web-project dir, the app, see below)
3. ----.../
4. ----www/
5. ------htdocs/
6. --------.htaccess
7. --------...
8. ----ssl/
9. ------htdocs/
10.--------.htaccess
11.--------...
12.--project2/
13.--...
1. Is where all browser requests land (e.g. "http://server") on a server you control.
2. Is a typical project folder, organized to mimic the structure your virtual hosting provider assigns you.
3. - (additional dirs, not important)
5,9. Are the directories where HTTP and HTTPS requests, respectively, land.
6,10. The files you use to route requests through a single file (e.g. 'index.php'). Frameworks that use a 'front controller' require this. In my case, it's the Zend Framework. Contents:
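RewriteEngine On
# -s: regular file with size > 0; -l: symbolic link; -d: directory
RewriteCond %{REQUEST_FILENAME} -s [OR]
RewriteCond %{REQUEST_FILENAME} -l [OR]
RewriteCond %{REQUEST_FILENAME} -d
# The request maps to something that exists: leave it untouched and stop
RewriteRule ^.*$ - [NC,L]
# Everything else is handed to the front controller
RewriteRule ^.*$ index.php [NC,L]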
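For anyone who finds those rules cryptic, the decision they make can be imitated roughly in a shell function. This is only a sketch: the function name `route`, the temp docroot and the sample files are made up for illustration, and mod_rewrite's real processing involves more than this.

```shell
#!/bin/sh
# Imitates the .htaccess decision: an existing file, symlink or directory is
# served as-is; everything else goes to index.php.
# "route" and the sample files are hypothetical.
route() {
    docroot=$1
    path=$2
    target="$docroot/$path"
    # -f together with -s is close to mod_rewrite's -s (regular file, size > 0);
    # -L stands in for mod_rewrite's -l (symbolic link); -d is the same test.
    if [ -f "$target" ] && [ -s "$target" ] || [ -L "$target" ] || [ -d "$target" ]; then
        printf '%s\n' "$path"
    else
        printf '%s\n' "index.php"
    fi
}

# Throwaway docroot with one static file and a front controller placeholder.
docroot=$(mktemp -d)
mkdir -p "$docroot/css"
echo 'body {}' > "$docroot/css/site.css"
echo '<?php' > "$docroot/index.php"

route "$docroot" "css/site.css"   # exists on disk: served directly
route "$docroot" "haha"           # nothing on disk: handed to index.php
```

So a request only reaches the framework when nothing on disk matches it, which is why static files keep working next to the front controller.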
So the idea was to have a project identical to the variant that will run on the provider's server, while also letting the customer (or other people) access it on the development server (the one under your control).
The 'live' variant would be accessible under 'http://example.com'
The 'under-development' variant would be accessible under 'http://server/example.com'.
Requesting it as 'http://server/example.com', however, won't get the application working - it'll just get you inside the project dir ("2"). To get it working, you actually need to request it as 'http://server/example.com/ssl/htdocs/' or 'http://server/example.com/www/htdocs/'. In essence, I needed a way of emulating the provider's internal redirect.
Having spent a couple of days reading Apache manuals, looking for GUI tools to control it, writing htaccess files and tampering with everything at once, the thing I understood best is that 'I am LAME': my lifespan is not long enough to grasp how Apache really works, and there are no manuals for the 'lame' 0-)
Moreover, I really don't like "configuring" applications; it's not something I find very interesting.
After that a simple idea dawned upon me:
"ln -s example.com/ssl/htdocs/ example"
i.e. placing a symbolic link with a similar name (here '.com' is omitted) that points to the dir the user would land in if he requested the page over the internet from the provider.
The best (or maybe worst) part is, that it actually works O-)
Requesting 'http://server/example' lands the request at "9" and brings up the application (because of "10"), and 'http://server/example/haha' makes the application throw an error if it has no such controller and there is no file named 'haha'.
I don't know where this will get me - probably there will be problems with resolving file names or something like that, it can't all go so smoothly...
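To make the trick reproducible, here is a small sketch that rebuilds the layout in a throwaway directory and creates the same link. The temp dir stands in for 'hosting/', and the placeholder index.php is invented. One thing the sketch cannot show: as far as I know, Apache only follows the link if 'Options FollowSymLinks' (or 'SymLinksIfOwnerMatch') is in effect for that directory.

```shell
#!/bin/sh
set -eu
# Throwaway stand-in for the "hosting" dir.
hosting=$(mktemp -d)
mkdir -p "$hosting/example.com/www/htdocs" "$hosting/example.com/ssl/htdocs"
echo '<?php // front controller placeholder' > "$hosting/example.com/ssl/htdocs/index.php"

cd "$hosting"
# The target is relative, so it is resolved from the directory that holds the link.
ln -s example.com/ssl/htdocs example

readlink example   # shows the target stored in the link
ls example/        # lists what a request for /example would see
```

Because the target is stored as a relative path, the whole 'hosting' dir can be moved or copied without breaking the link.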
So, what do you guys think? Maybe some recommendations on manuals to read?
>>The 'live' variant would be accessible under 'http://example.com'
>>The 'under-development' variant would be accessible under 'http://server/example.com'.
If this is for dev purposes only, I wouldn't care. But I wouldn't test on a live server...
>>1. hosting/ (the main dir used for serving pages)
I guess you're blocking search engine spidering of 'http://server/example/...' via robots.txt exclusion?
1:
Dev server. Something that lives in your network and that you own and control:
A place of collaboration (other developers and interested people access and use it), i.e.
a place for developing web applications and testing them from a browser, so that the customer can see how the project is growing, approve templates, design, etc.
Access via internet:
'http://devserver.com' - if you're renting a static ip address and own a domain name, then you would be able to hook the domain to the ip
'http://#*$!xx.dyndns.com:#*$!x' - if you don't have a static ip address at your disposal and want to save some money (especially, since you're only starting up)
LAN access is simple - 'http://server' and you're there
2:
Virtual hosting provider's server.
In most cases you would use a provider for the live application, because his technical park provides much better accessibility, speed, responsiveness, etc. than a metal box that hums in your living room O-)
On top of that, it's usually rather cheap, and you're definitely not at a stage where you should or could afford to maintain your own park and act as a hosting provider. Renting is more feasible.
So, a live 'app' sits on 2.
An app being developed sits on 1 (well, and actually continues to live there, since i intend to have 'long-time' relationships with customers and have things 'built on' the current versions). Complete consistency.
At the virtual hosting, you're given a dir, some subdirs of which are already 'prescribed' - 'www' and 'ssl', for instance.
'http://example.com' - lands the request at '/www/htdocs/'
'https://example.com' - lands the request at '/ssl/htdocs/'
You don't control this behavior, but this is what you need to mimic on the devserver. And that was basically the question - mod_rewrite or symlink (as jdMorgan has correctly summarized).
Robots - you're not gonna have many of them on the devserver (I think O-). And you probably wouldn't give a damn about them.
So, in my case, a symlink seemed to solve the issue (for now). It's simple and stupid. However, I haven't yet run into any cases where SSL was needed (where 'http://server/example' and 'https://server/example' have to be mapped to different places). In that event, mod_rewrite would probably have to be used (in an 'htaccess' in either the 'hosting' dir or the app dir). So 'symlinking' will soon lead me to a dead end.
But, say, you actually have a decent hosting machine. You need to serve a number of web-apps...
1) Would it be possible to somehow route requests via symlinks? (different zones, domains, sub-domains, http(s))
2) How bad would that be? Are there any gains?
>>route requests via symlinks? (different zones, domains, sub-domains, http(s))
>>(thus, 'http://server/example' and 'https://server/example' have to be mapped to different places)
Since the dev server is my own machine, I can ... 'heck' ... around with it all I want, so 'example_ssl' will do just fine, should it become necessary.
Getting the hang of mod_rewrite is like reaching for the stars... is there a sane way of debugging it, other than setting a 'RewriteLog' and pointing 'tail -f' at it? "Mod_rewrite can do this and it can do that and it should do just about anything that ever crosses your mind..." - I searched a dozen places (including apache.org), but nowhere did I find a way to actually see such basic things as the values of HTTP_USER_AGENT, HTTP_REFERER, HTTP_COOKIE, HTTP_FORWARDED while a request is being processed (the best I came up with is writing a PHP script that dumps all superglobals).
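A shell counterpart to the PHP dump script could look like the sketch below. It assumes CGI execution is enabled for wherever the script is placed; the file name 'dumpenv.sh' and the function name are made up. It shows what the final handler receives after all rewriting is done, not the intermediate values mod_rewrite works with along the way.

```shell
#!/bin/sh
# Hypothetical CGI script (e.g. saved as cgi-bin/dumpenv.sh and made executable).
# Assumes the server is configured to run CGI scripts in that location.
dump_request_env() {
    # CGI response: header line, empty line, then the body.
    printf 'Content-Type: text/plain\r\n\r\n'
    # Request headers reach a CGI script as HTTP_* environment variables;
    # the other prefixes cover the usual server- and request-related variables.
    env | grep -E '^(HTTP_|HTTPS=|REQUEST_|REDIRECT_|SERVER_|SCRIPT_|DOCUMENT_ROOT=|QUERY_STRING=)' | sort
}

dump_request_env
```

Something like this belongs on the dev server only, since it exposes server details to anyone who can request it.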
Returning to symlinks -
Under 1 - I was wondering whether it would be possible, for instance, to have a dozen 'chained' symbolic links do the job instead of mod_rewrite, and whether there would be any advantage to it. The idea, of course, is most likely insane, but... ?
For instance, you would have a lot of "internal redirects", something like:
1. 'http' or 'https'
2. domain
3. subdomain
4. sub-subdomain.
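A chain like that is easy to try out in a scratch directory. In the sketch below the hop names are invented and the temp dir stands in for the hosting dir. Note what it does not do: a symlink is resolved from the path alone, so it cannot choose between 'http' and 'https' or between domains. That part of the decision still has to be made by the server before the path is walked. The kernel also gives up on overly long chains (40 hops on Linux, as far as I know).

```shell
#!/bin/sh
set -eu
root=$(mktemp -d)
mkdir -p "$root/example.com/www/htdocs"
cd "$root"

# Three hops, each pointing at the next; all names are made up.
ln -s example.com/www/htdocs hop3
ln -s hop3 hop2
ln -s hop2 hop1

readlink hop1      # one hop only: prints "hop2"
readlink -f hop1   # follows the whole chain to the real directory
```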
How do virtual hosting providers do all this redirecting? Since they host a bunch of websites, it should be correct to assume, that .htaccess does it all? (since you can't bring down/restart a server, once you've modified the main apache config)
"<VirtualHost>" can be used in the server config only... so, how does it get done? (something to discuss in another topic?)
Thanks for the feedback, Caterham
It should be different on a Win-web-server probably, though...
>>Since they host a bunch of websites, it should be correct to assume, that .htaccess does it all? (since you can't bring down/restart a server, once you've modified the main apache config)
>>For instance, you would have a lot of "internal redirects", something like:
For symlinks: the symlinks are resolved during the directory_walk. The dir_walk is complex, and I don't have the time to check whether the cache is used if parts of the physical path match a previous dir_walk. Anyway, since you are on a dev system and don't have to deal with 50 requests per second, I wouldn't care if the dir_walk needs to run from root again. If you have a homogeneous filesystem layout, *I*'d use a generic solution which doesn't need to be modified for internal access each time you set up a new dev project, because that sounds annoying.
Variables... yes, printing them is a good idea. Describing them is difficult because the values of some server-side variables differ between the different phases of the request processing.
How did you set up the config for 'http://server/' and 'https://server/'? Via two different <VirtualHost> sections, one for port 80, one for 443 (I'm trying to figure out how you could use an httpd.conf-based solution in the URI-to-filename translation phase)?
Besides setting the DocumentRoot variable, are there other reasons? Maybe this provides better security as well?
If you're using a front-controller-based framework (I may be mistaken), you probably refer to 'directory/filesystem' variables set by the web server just once - in the bootstrap:
"realpath(dirname(__FILE__)..."
__FILE__ - is one of those variables actually coming from a set DocumentRoot?
>>A graceful restart is also possible
I thought the idea behind htaccess was to actually provide a means of configuring some of Apache's 'behaviors' when it's serving certain dirs without having to restart it. Looking at /etc/apache2/ you see a couple of dirs like 'sites-available' - each site (judging from the example) is configured via an individual file. And so, these configs (I guess) can be (un)loaded at run-time.
Is this the graceful restart?
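For reference, on a Debian-style install the usual sequence looks roughly like the sketch below. The site name is hypothetical, and the script only prints the commands unless DRY_RUN=0 is set. The per-site files are not loaded or unloaded while Apache runs: 'a2ensite' merely symlinks a file from 'sites-available' into 'sites-enabled', and Apache picks the change up when it re-reads its config. A graceful restart does that re-read while letting requests already in progress finish.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set DRY_RUN=0 (as root, on a Debian-style Apache install) to really run them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

enable_site() {
    site=$1                      # hypothetical site file in sites-available/
    run a2ensite "$site"         # symlinks it into sites-enabled/
    run apache2ctl configtest    # syntax check before touching the running server
    run apache2ctl graceful      # re-read the config; busy workers finish first
}

enable_site example.com
```

A .htaccess file, by contrast, is read while requests are being served, which is why changes to it take effect without any restart.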
>>*I*'d use a generic solution which doesn't need to be modified
Most people would; if they could O-).
Really, to get it all automated, wise and dandy - you need to learn quite a few things... and that would probably be like 'apache <VirtualHost>', bash scripting and debugging.
Currently, the result is really not worth the effort and time...
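That said, the bash part can stay quite small if the symlink approach is kept. The sketch below is a symlink-based take on a 'generic' setup, not the httpd.conf-based one hinted at above. It assumes every project dir follows the provider layout, and the naming scheme (strip the last dot-suffix, append '_ssl') is made up. The sample projects live in a temp dir.

```shell
#!/bin/sh
set -eu
# Throwaway hosting dir with two sample projects and one unrelated dir.
hosting=$(mktemp -d)
for p in example.com shop.example.org; do
    mkdir -p "$hosting/$p/www/htdocs" "$hosting/$p/ssl/htdocs"
done
mkdir -p "$hosting/notes"   # no www/htdocs inside, so it must be skipped

link_projects() {
    cd "$1"
    for dir in */; do
        name=${dir%/}
        [ -L "$name" ] && continue              # a link from an earlier run
        [ -d "$name/www/htdocs" ] || continue   # not laid out like a project
        short=${name%.*}                        # example.com -> example
        [ "$short" != "$name" ] || continue     # no dot in the name: skip
        [ -e "$short" ] || ln -s "$name/www/htdocs" "$short"
        [ -e "${short}_ssl" ] || ln -s "$name/ssl/htdocs" "${short}_ssl"
    done
}

link_projects "$hosting"
ls -l "$hosting"
```

Running it again after adding a project only creates the missing links, so there is nothing to edit by hand per project.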
>> How did you setup the config for 'http://server/' and 'https://server/'?
Actually, I haven't done anything about it yet. The thing is (again), because you're using ZF, it appears that all scripts are 'hidden' (cannot be accessed from 'http' or 'https' directly)...
Be it 'http' or 'https', both requests are to be routed through the same bootstrap file. So, actually, a request over 'http' or 'https' should land on the same thing in the filesystem. But the server has to be configured to use different protocols for it. So, I'm in for some Apache-manual-reading (I hope I don't break my eyes)...
i'll post it here once i have ... 'something'...
>>__FILE__ - is one of those variables actually coming from a set DocumentRoot?
>>i thought the idea behind htaccess was to actually provide a mean of configuring some of the apache's 'behaviors' when it's serving certain dirs without having to restart it.
>>Is this the graceful restart?
>>each site (judging from the example) is configured via an individual file.
>>And so, these configs (i guess) can be (un)loaded at run-time.
>>and that would probably be like 'apache <VirtualHost>', bash scripting and debugging.
If you have multiple domains on your dev machine (external access) which should point to the desired project, yes. But I was thinking about the internal access via 'http://server/', which should automatically serve the correct folder when 'http://server/example/foo' is requested.
>>i'll post it here once i have ... 'something'...