Using SymLinks vs. mod_rewrite

Are there any drawbacks?

         

moonraker

12:18 pm on Jan 17, 2009 (gmt 0)

10+ Year Member




System: The following message was cut out of thread at: http://www.webmasterworld.com/apache/3818768.htm [webmasterworld.com] by jdmorgan - 11:25 am on Jan. 17, 2009 (CT -6)


Hello, everybody

This seems to be a forum for advanced users and pros, so the question might be out of place, but you guys should know all about 'Apache redirecting/rewriting' (and this is the best place that Google and I could find).

The question (in brief):
Are there any drawbacks to using symbolic links to control where a web request lands, instead of using Apache's powerful but sophisticated, takes-a-few-years-to-learn features?

The long story:
You've got this on your Debian server (with Apache):
1. hosting/ (the main dir used for serving pages)
2. --example.com/ (a web-project dir, the app, see below)
3. ----.../
4. ----www/
5. ------htdocs/
6. --------.htaccess
7. --------...
8. ----ssl/
9. ------htdocs/
10.--------.htaccess
11.--------...
12.--project2/
13.--...

1. Is where all browser requests land (e.g. "http://server") on a server you control.
2. Is a typical project folder, organized to mimic the structure your virtual hosting provider assigns you.
3. - (additional dirs, not important)
5, 9. Are the directories where HTTP and HTTPS requests, respectively, land.
6, 10. Are the files you use to route requests through a single file (e.g. 'index.php'). Frameworks that use a 'front controller' require this. In my case, it's the Zend Framework. Contents:

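RewriteEngine On
# Does the request match an existing non-empty file (-s), symlink (-l) or directory (-d)?
RewriteCond %{REQUEST_FILENAME} -s [OR]
RewriteCond %{REQUEST_FILENAME} -l [OR]
RewriteCond %{REQUEST_FILENAME} -d
# If so, serve it unchanged...
RewriteRule ^.*$ - [NC,L]
# ...otherwise hand the request to the front controller.
RewriteRule ^.*$ index.php [NC,L]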

So, the idea was to have a project identical to the one that will run on the provider's server, and at the same time to let the customer (or other people) access it on the development server (which is under your control).
The 'live' variant would be accessible under 'http://example.com'
The 'under-development' variant would be accessible under 'http://server/example.com'.

Requesting it as 'http://server/example.com', however, won't get the application working: it'll just get you inside the project dir ("2"). To get it working, you actually need to request 'http://server/example.com/ssl/htdocs/' or 'http://server/example.com/www/htdocs/'. In essence, I needed a way of emulating the provider's internal redirect.

Having spent a couple of days reading Apache manuals, looking for GUI tools for controlling it, writing htaccess files and tampering with everything at once, the thing I understood best is that 'I am LAME': my lifespan is not long enough to grasp how Apache really works, and there are no manuals for the 'lame' 0-)
Moreover, I really don't like "configuring" applications, and it's not something I find very interesting.

After that, a simple idea dawned on me:
"ln -s example.com/ssl/htdocs/ example"
that is, placing a symbolic link with a similar name (in this case '.com' was omitted) that points at the dir the user would land in when requesting the page over the internet from the provider.
The best (or maybe worst) part is that it actually works O-)
Requesting 'http://server/example' lands the request at "9" and brings up the application (because of 10), while 'http://server/example/haha' throws an error in it if there is no such controller in the app and no file named 'haha'.

I don't know where this will get me. There will probably be problems with resolving file names or something like that; it can't all go so smoothly...
So, what do you guys think? Maybe some recommendations on manuals to read?
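
One thing the symlink approach quietly depends on is that Apache is allowed to follow symlinks in the first place. A minimal sketch of the server-side settings involved, assuming the 'hosting' dir lives at /hosting (a hypothetical absolute path; the thread never gives the real one) and leaving out access-control directives:

```apache
# Server config fragment. /hosting is an ASSUMED absolute path for the
# "hosting" dir that answers http://server/.
<Directory /hosting>
    # Without FollowSymLinks (or SymLinksIfOwnerMatch) Apache refuses to
    # serve content reached through the "example" symlink, and mod_rewrite
    # rules in .htaccess files are rejected as well.
    Options FollowSymLinks

    # Lets each project's .htaccess use RewriteEngine / RewriteRule.
    AllowOverride FileInfo
</Directory>
```

If the symlink works on the dev box but a similar trick fails elsewhere, a missing FollowSymLinks option is a likely first thing to check.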

Caterham

11:15 pm on Jan 19, 2009 (gmt 0)

10+ Year Member



>> The 'live' variant would be accessible under 'http://example.com'
>> The 'under-development' variant would be accessible under 'http://server/example.com'.

If this is for dev purposes only, I wouldn't care. But I wouldn't test on a live server...

>> 1. hosting/ (the main dir used for serving pages)

Why not /hosting/main/? Making a website available through 'http://example.com' and 'http://server/example.com/....' may not be a good setup since one could bypass <Location> settings. I understand you remove the symlink if your site goes live, but it would still be accessible through the full folder path.
The purpose of separate www and ssl htdocs dirs is that the ssl one isn't accessible to non-SSL requests, but it looks like one can access it via 'http://server/example.com/ssl/htdocs/'. Or is there another mechanism in place to force SSL for that path if it is requested through 'http://server/'?

I guess you're blocking search engine spidering of 'http://server/example/...' via robots.txt exclusion?

moonraker

2:11 pm on Jan 20, 2009 (gmt 0)

10+ Year Member



I guess I forgot about those 'other' details...
There are 2 servers involved.

1:
Dev server. Something that lives in your network and that you own and control:
A place of collaboration that other developers and interested people can access and use, i.e.
a place for developing web applications and testing them in a browser, so that the customer can see how the project is growing, approve templates, design, etc.

Access via internet:
'http://devserver.com' - if you're renting a static ip address and own a domain name, then you would be able to hook the domain to the ip
'http://#*$!xx.dyndns.com:#*$!x' - if you don't have a static ip address at your disposal and want to save some money (especially, since you're only starting up)

LAN access is simple - 'http://server' and you're there

2:
Virtual hosting provider's server.
In most cases, you would use a provider for the live application, because his technical park provides much better accessibility, speed, responsiveness, etc. than a metal box that hums in your living room O-)
On top of that, it's usually rather cheap, and you're definitely not in a position to maintain your own park and act as a hosting provider. Renting is more feasible.
So, a live 'app' sits on 2.
An app being developed sits on 1 (and actually continues to live there, since I intend to have 'long-time' relationships with customers and have things 'built on' the current versions). Complete consistency.

At the virtual hosting, you're given a dir, some subdirs of which are already 'prescribed' - 'www' and 'ssl', for instance.
'http://example.com' - lands the request at '/www/htdocs/'
'https://example.com' - lands the request at '/ssl/htdocs/'
You don't control this behavior, but this is what you need to mimic on the dev server. And that was basically the question: mod_rewrite or symlink (as jdMorgan has correctly summarized).

Robots - you're not gonna have much of them on the devserver (i think O-). And you wouldn't probably give a damn about them.

So, in my case, a symlink seemed to solve the issue (for now). It's simple and stupid. However, I haven't yet run into any cases where SSL was needed (where 'http://server/example' and 'https://server/example' have to be mapped to different places). In that event, mod_rewrite would probably have to be used (either in an 'htaccess' in the 'hosting' dir or in the app dir). So, 'symlinking' will soon lead me to a dead end.

But, say, you actually have a decent hosting machine. You need to serve a number of web-apps...
1) Would it be possible to somehow route requests via symlinks? (different zones, domains, sub-domains, http(s))
2) How bad would that be? Are there any gains?

Caterham

6:46 pm on Jan 20, 2009 (gmt 0)

10+ Year Member



>> route requests via symlinks? (different zones, domains, sub-domains, http(s))

A symlink is not related to some server setup, protocols etc. It links a physical filesystem path to another physical filesystem path.

>> (thus, 'http://server/example' and 'https://server/example' have to be mapped to different places)

If the "example" part is not important for your app, what about using another symlink, 'https://server/example_ssl'? If you're flexible in this way, you shouldn't run into issues. If not, the rewrite docs should be your friend. :-) As for a more generic mod_rewrite solution, life would be easier if your web-project dir had the same name as the path you're requesting (i.e. 'example' instead of 'example.com').

moonraker

10:53 pm on Jan 20, 2009 (gmt 0)

10+ Year Member



"It links a physical filesystem path to another physical filesystem path." - in a way, i understand this. It should be different on a Win-web-server probably, though...

Since the dev server is my own machine, I can ... 'heck' ... around with it all I want, so 'example_ssl' will do just fine, should it become necessary.

Getting the hang of mod_rewrite is like reaching for the stars... is there a sane way of debugging it, other than setting a 'RewriteLog' and pointing 'tail -f' at it? "Mod_rewrite can do this and it can do that and it should do just about anything that ever crosses your mind..." I searched a dozen places (including apache.org), but nowhere did I find a way to actually see such basic things as the values of HTTP_USER_AGENT, HTTP_REFERER, HTTP_COOKIE, HTTP_FORWARDED when processing a request (the best I came up with is writing a php script that dumps all superglobals).
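
For what it's worth, a rough sketch of two server-side ways to look at this, assuming Apache 2.2 (current at the time of the thread) and Debian's usual /var/log/apache2 log directory. The log file names and the 'reqheaders' format nickname are made up for illustration:

```apache
# Server config or <VirtualHost> only; not allowed in .htaccess.

# 1. Trace what mod_rewrite does with each request (Apache 2.2 syntax).
#    Apache 2.4 dropped these two directives in favour of
#    "LogLevel ... rewrite:trace3".
RewriteLog /var/log/apache2/rewrite.log
RewriteLogLevel 3

# 2. Log the raw request headers behind HTTP_USER_AGENT, HTTP_REFERER,
#    HTTP_COOKIE and HTTP_FORWARDED for every request.
LogFormat "%h \"%r\" ua=\"%{User-Agent}i\" referer=\"%{Referer}i\" cookie=\"%{Cookie}i\" forwarded=\"%{Forwarded}i\"" reqheaders
CustomLog /var/log/apache2/reqheaders.log reqheaders
```

The second log shows the headers as they arrive, which is usually enough to check what a RewriteCond on those variables would see. Higher RewriteLogLevel values get verbose quickly, so this is something for a dev box only.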

Returning to symlinks -
Regarding question 1: I was wondering whether it would be possible, for instance, to have a dozen 'chained' symbolic links do the job instead of mod_rewrite, and whether there would be an advantage to it. The idea is, of course, most likely insane, but... ?

For instance, you would have a lot of "internal redirects", something like:
1. 'http' or 'https'
2. domain
3. subdomain
4. sub-subdomain.

How do virtual hosting providers do all this redirecting? Since they host a bunch of websites, is it correct to assume that .htaccess does it all? (since you can't bring down/restart a server once you've modified the main apache config)
"<VirtualHost>" can be used only in the server config... so, how does it get done? (something to discuss in another topic?)

Thanks for the feedback, Caterham

Caterham

1:11 am on Jan 22, 2009 (gmt 0)

10+ Year Member



>> It should be different on a Win-web-server probably, though...

Symlinks were introduced into Windows with Vista and Windows Server 2008 (NT 6.0). Other "links" (shortcuts) in prior versions aren't symlinks.

>> Since they host a bunch of websites, it should be correct to assume, that .htaccess does it all? (since you can't bring down/restart a server, once you've modified the main apache config)

Hopefully no one uses .htaccess files for this if it is some sort of serious hosting. I can't speak generally, but from what I've seen <VirtualHost>s are used. Other solutions (mod_rewrite, mod_vhost_alias) don't, or rather can't, provide a DocumentRoot variable. Unfortunately many programmers rely on such unreliable variables in their scripts.
A graceful restart is also possible, so there's no real "down time" when modifying the configuration file and forcing the main process and child processes to re-read the config.

>> For instance, you would have a lot of "internal redirects", something like:

There are no internal redirects (speaking of the internal apache function) when you set things up in per-server context, before the mapping to the filesystem occurs.

As for symlinks: they are resolved during the directory_walk. The dir_walk is complex, and I don't have the time to check whether the cache is used if parts of the physical path match a previous dir_walk. Anyway, since you are on a dev system and don't have to deal with 50 requests per second, I wouldn't care if the dir_walk needs to run from root again. If you have a homogeneous filesystem layout, *I*'d use a generic solution which doesn't need to be modified for internal access each time you set up a new dev project, because that sounds annoying.

Variables... yes, printing them is a good idea. Describing them is difficult because the values of some server-side variables differ between the different phases of the request processing.

How did you set up the config for 'http://server/' and 'https://server/'? Via two different <VirtualHost> sections, one for port 80 and one for 443 (I'm trying to figure out how you could use an httpd.conf-based solution in the uri-to-filename translation phase)?
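
A sketch of what such a two-section setup might look like, so that the same URL path maps to the www dir over HTTP and to the ssl dir over HTTPS. Everything specific here is an assumption for illustration: the /hosting path, the certificate file locations, and the use of Alias instead of a symlink. It also presumes mod_ssl is loaded and the server listens on port 443:

```apache
# Plain HTTP: http://server/example -> .../www/htdocs
<VirtualHost *:80>
    ServerName server
    DocumentRoot /hosting
    Alias /example /hosting/example.com/www/htdocs
</VirtualHost>

# HTTPS: https://server/example -> .../ssl/htdocs
<VirtualHost *:443>
    ServerName server
    DocumentRoot /hosting
    SSLEngine on
    # Hypothetical certificate and key locations.
    SSLCertificateFile    /etc/ssl/certs/server.pem
    SSLCertificateKeyFile /etc/ssl/private/server.key
    Alias /example /hosting/example.com/ssl/htdocs
</VirtualHost>
```

Because the Alias makes the URL path differ from the directory layout below the document root, the project's .htaccess would then most likely need a RewriteBase /example/ line on the dev server.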

moonraker

2:07 pm on Jan 22, 2009 (gmt 0)

10+ Year Member



>>Hopefully no one uses .htaccess files if this is some sort of serious hosting.

Besides setting the DocumentRoot variable, are there other reasons? Maybe this provides better security as well?

If you're using a front-controller-based framework then (I may be mistaken) you probably refer to 'directory/filesystem' variables set by the web server only once, in the bootstrap:
"realpath(dirname(__FILE__)..."

__FILE__ - is one of those variables actually coming from a set DocumentRoot?

>>A graceful restart is also possible

I thought the idea behind htaccess was to actually provide a means of configuring some of Apache's 'behaviors' when it's serving certain dirs, without having to restart it. Looking at /etc/apache2/ you see a couple of dirs like 'sites-available', where each site (judging from the example) is configured via an individual file. And so, these configs (I guess) can be (un)loaded at run-time.
Is this the graceful restart?

>>*I*'d use a generic solution which doesn't need to be modified

Most people would; if they could O-).
Really, to get it all automated, wise and dandy, you need to learn quite a few things... probably things like Apache's <VirtualHost>, bash scripting and debugging.
Currently, the result is really not worth the effort and time...

>> How did you setup the config for 'http://server/' and 'https://server/'?

Actually, I haven't done anything about it yet. The thing is (again), because you're using ZF, it appears that all scripts are 'hidden' (they cannot be accessed directly via 'http' or 'https')...
Be it 'http' or 'https', both requests are to be routed through the same bootstrap file. So, actually, a request via 'http' or 'https' should land on the same thing in the filesystem. But the server has to be configured to use different protocols for it. So, I'm in for some Apache-manual-reading (I hope I don't break my eyes)...
I'll post it here once I have ... 'something'...

Caterham

3:55 pm on Jan 22, 2009 (gmt 0)

10+ Year Member



__FILE__ is one of PHP's magic constants rather than something defined by the framework: it holds the full path and filename of the file in which it appears, with symlinks resolved.

>> __FILE__ - is one of those variables actually coming from a set DocumentRoot?

No, it isn't derived from DocumentRoot. The physical path of the originally requested resource is another matter. It may come from ORIG_PATH_TRANSLATED or PATH_TRANSLATED or ORIG_SCRIPT_FILENAME or SCRIPT_FILENAME, depending on the setup and, for php, on php_sapi_name(). The values of the variables which should contain the full physical path to the requested resource can differ or may not be available. Your script is invoked either via a "direct" content handler or via a CGI setup. The latter could be a direct "exec call" (shebang line), or an indirect call of the executable (interpreter) via mod_rewrite, the Action directive or other solutions (mod_fastcgi, mod_fcgid).
In other words, determining the physical path is not just 10 characters of code, because there is no single variable accessible to the cgi handler which will always contain the physical path to the originally requested resource. If you'd like to take care of aliases, userdirs (/~user/), symlinks or anything else which causes a resource not to be served from documentroot+request_uri, you'd have to check for the right one.
Side note: In such cases, when using mod_rewrite in per-directory context, e.g. in .htaccess files, you must use the RewriteBase directive to specify the URL-path.
Your .htaccess setup works without it, because your folders are still directly below the document root (only the path differs).
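
To illustrate the side note, here is a sketch of the case where RewriteBase becomes necessary: the project is reached through an Alias, so the URL path no longer mirrors the directory layout below the document root. The /hosting path and the /example URL prefix are assumptions for illustration, and the two parts below belong in different files:

```apache
# --- server config (hypothetical paths) ---
Alias /example /hosting/example.com/www/htdocs

# --- /hosting/example.com/www/htdocs/.htaccess ---
RewriteEngine On
# Tell mod_rewrite which URL prefix this directory is served under, so
# that the relative substitution "index.php" becomes /example/index.php.
RewriteBase /example/
RewriteCond %{REQUEST_FILENAME} -s [OR]
RewriteCond %{REQUEST_FILENAME} -l [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^ - [L]
RewriteRule ^ index.php [L]
```

The trade-off is that this .htaccess is no longer identical to the one on the live server, where the same directory is served from the root URL and the base would be /.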

>> i thought the idea behind htaccess was to actually provide a mean of configuring some of the apache's 'behaviors' when it's serving certain dirs without having to restart it.

The purpose is to give end users without access to the main configuration a chance to modify things, but not to replace the main config for users with access to the main server configuration. See When (not) to use .htaccess files [httpd.apache.org].
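
On a machine where you do control the main config, the front-controller rules could be moved out of .htaccess and into a <Directory> section. A sketch under the assumption that the project lives at /hosting/example.com/www/htdocs (a hypothetical absolute path) and is served from below the document root:

```apache
<Directory /hosting/example.com/www/htdocs>
    # Stop Apache from looking for .htaccess files in this tree at all.
    AllowOverride None
    # mod_rewrite in directory context needs symlink following enabled.
    Options FollowSymLinks

    RewriteEngine On
    RewriteCond %{REQUEST_FILENAME} -s [OR]
    RewriteCond %{REQUEST_FILENAME} -l [OR]
    RewriteCond %{REQUEST_FILENAME} -d
    RewriteRule ^ - [L]
    RewriteRule ^ index.php [L]
</Directory>
```

The rules are read once at startup instead of on every request, but changing them then requires a (graceful) restart, which is exactly what a shared-hosting customer cannot do.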

>> Is this the graceful restart?

The process is described at [httpd.apache.org ]

>> each site (judging from the example) is configured via an individual file.

For external domains, yes. Internal access ('http://server/') should have one setup only.

>> And so, these configs (i guess) can be (un)loaded at run-time.

Depending upon the config, you'll have to comment out an Include directive in your main file; if the Include directive includes all *.conf files of a folder, you'll have to rename the conf file which should be unloaded (so that it is not found by the Include directive anymore).
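
The two situations described above might look like this in the main config file. The file and directory names are hypothetical and only meant to show the two forms of Include:

```apache
# Form 1: one Include per site. Comment the line out to unload that site.
Include /etc/apache2/sites-enabled/example.com
#Include /etc/apache2/sites-enabled/project2

# Form 2: wildcard Include. A file renamed to something that no longer
# matches *.conf (e.g. project2.conf.off) is not loaded anymore.
Include /etc/apache2/conf.d/*.conf
```

Either way, the change only takes effect once Apache re-reads its configuration, for example through a graceful restart.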

>> and that would probably be like 'apache <VirtualHost>', bash scripting and debugging.

If you have multiple domains on your dev machine (external access) which should point to the desired project, yes. But I was thinking about the internal access via 'http://server/', which should automatically serve the correct folder when requesting 'http://server/example/foo'.

>> i'll post it here once i have ... 'something'...

Ok, I'll monitor. Maybe [httpd.apache.org...] is a good place to start.