I've finally seen enough examples to actually grasp Apache's syntax. However, I certainly wouldn't claim to be anywhere near as talented as some of the regulars here, so I'm asking folks to please critique my code for order (of code), performance, security and syntax in general.
The Setup
The setup is very simple: multiple websites with the same structure, but each has its own content and skins/themes. To work professionally I test everything locally first, to minimize the chance that a visitor to whatever site I'm working on encounters a problem caused by code that went live before it was written correctly. That means the Apache .htaccess rules below have to work on both localhost and a live domain without modifying the server configuration files (as I have to use shared hosting, at least for the time being).
Visually the actual site folders may appear like so...
http:// localhost/site1/
http:// localhost/site2/
The asset directories appear like so...
http:// localhost/admin/
http:// localhost/blog/
http:// localhost/forums/
http:// localhost/scripts/
http:// localhost/themes/
The rewrite rules map requests for the specific modules (e.g. admin, blog, forums) and shared assets (e.g. scripts, themes) under each site onto those shared directories, so these URLs are served from them...
http:// localhost/site1/blog/
http:// localhost/site2/blog/
http:// localhost/site1/scripts/
http:// localhost/site2/scripts/
Specific Modules and the CMS Module
I've built my own CMS as well as various specific modules (e.g. blog, forums, private messaging). Regardless of which module Apache rewrites to, that module handles the HTTP status codes itself (e.g. 200, 403, 404, etc).
So one of the important goals was to ensure that specific modules capture requests relative to their path while all other requests are handled by the CMS module.
The image module is what I consider a mixed module: clients will be able to upload their own images, however there are certain specific paths that get rewritten. Thankfully these two paths don't conflict, so I get shared assets in the shared images directory and clients still get to use their own image directory for uploading and using their own images (without those images becoming accessible to other clients).
# Shared Image Module
http:// localhost/images/
# Client-specific Images
http:// localhost/site1/images/
Exceptions
One of my goals was to retain my personal homepage, which contains quick links to various things I use, including links to client websites.
I also noticed that some files with similar strings in their names were being caught. I'm not entirely sure I've resolved all of those issues (e.g. scripts/admin.js was being affected by the admin directory rule until I added the condition). However, in the unlikely event that I have the word 'admin' in another directory, it may or may not affect that URL.
Security Concerns
When you share anything I think security concerns automatically at least double. Concerns like one client uploading an undesirable image and having it appear on other clients' websites are things I've taken into consideration. In regards to CMS and other module content, each client has their own dedicated copy of the database. The database (and file path structures) are determined by the domain in PHP (or the sub-directory for localhost when I do interchangeable testing). So the software can work with any domain I allow it to work with, though all the content remains separate. I'm very open to any concerns this setup may raise.
No clients have permission to execute any of their own server-side code (e.g. PHP) and eval isn't used at all. Since I write literally all of my own software, I don't have to worry about third parties deciding to do those things on my behalf. I use exceptionally strict coding practices and log all errors (JavaScript, PHP and MySQL) as well as HTTP responses that aren't 200. I know exactly what's going on and can easily see if someone is attempting an SQL injection attack, for example, or if JavaScript is being loaded from a different domain, and numerous other things. I think paranoia is a good code of ethics for security concerns, so if anyone has a thought on how to approach an attack against my setup I'd love to hear the details, as I want to address any and every possible security concern.
The Code
http:// localhost/.htaccess
AddHandler application/x-httpd-php .css .js
AddType text/javascript .js
RewriteEngine on
RewriteRule ^$ index.php [QSA]
RewriteCond %{REQUEST_URI} !\.(css|js|zip)$
RewriteRule .*/admin(.+) admin$1 [QSA]
RewriteRule .*/blog(.+) blog$1 [QSA]
RewriteRule .*/forums(.+) forums$1 [QSA]
RewriteRule .*/images/$ .*/images/$ [QSA]
RewriteRule .*/messages(.+) messages$1 [QSA]
RewriteRule .*/redirect\.php redirect\.php [QSA]
RewriteRule .*/scripts(.+) scripts$1 [QSA]
RewriteRule ^(index\.php|test1\.php|test2\.php|images/|scripts/|themes/|redirect\.php) - [L]
RewriteCond %{REQUEST_URI} !.*/(admin|blog|forums|images|messages)
RewriteRule !\.(css|js|zip)$ rewrite.php
What the code does...
The first two lines allow me to execute PHP inside of JavaScript and CSS files. This is essentially used for site visitor preferences.
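One thing worth checking with those two lines: AddHandler matches an extension anywhere in a file name, so under mod_mime's multiple-extension handling a file such as photo.js.jpg (a hypothetical name) could also be handed to PHP. A minimal sketch of a stricter form, assuming the host allows FilesMatch and SetHandler in .htaccess and that application/x-httpd-php (copied from the original line) is the right handler name for the server's PHP setup:

```apache
# Sketch only: hand a file to PHP only when .css or .js is its FINAL extension.
# The handler name is taken from the original AddHandler line and may differ per host.
<FilesMatch "\.(css|js)$">
    SetHandler application/x-httpd-php
</FilesMatch>
```

This matters most for the client image directories, since an upload with a crafted name would otherwise be the easiest way to get server-side code executed.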
----
On the third line (of code) I obviously turn the RewriteEngine on.
----
On the fourth line of code I create an exception for the root index (http:// localhost/). The ^ (starts with) and $ (ends with) with nothing in between equates to http:// localhost/, which is wonderfully simple, although I did unsuccessfully attempt ="" to avoid regular expressions. This line allows me to continue using my customized homepage, as it works just fine.
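For what it's worth, mod_rewrite passes the original query string through unchanged when the substitution contains no query string of its own, so [QSA] should only matter when the target has a ? in it. A minimal sketch of the same homepage exception under that assumption, with [L] added to end the current pass at this rule:

```apache
# Sketch: serve the root request from index.php; the query string is kept by default.
RewriteRule ^$ index.php [L]
```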
----
RewriteCond %{REQUEST_URI} !\.(css|js|zip)$
The fifth line of code is for exceptions for the rules to follow. I added this because scripts/admin.js was being affected (HTTP 404) by the admin rule below. I don't think this line prevents different matches though (e.g. localhost/site1/something/administrative-conduct).
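A point to keep in mind here: a RewriteCond only applies to the single RewriteRule that immediately follows it, so as written the condition guards the admin rule alone and not the blog, forums or other rules after it. If the intent is to guard each module rule, one hedged way to sketch it is to repeat the condition (shown for two rules only):

```apache
# Sketch: each RewriteCond is consumed by the very next RewriteRule only.
RewriteCond %{REQUEST_URI} !\.(css|js|zip)$
RewriteRule .*/admin(.+) admin$1 [QSA]

RewriteCond %{REQUEST_URI} !\.(css|js|zip)$
RewriteRule .*/blog(.+) blog$1 [QSA]
```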
----
RewriteRule .*/admin(.+) admin$1 [QSA]
RewriteRule .*/blog(.+) blog$1 [QSA]
RewriteRule .*/forums(.+) forums$1 [QSA]
RewriteRule .*/images/$ .*/images/$ [QSA]
RewriteRule .*/messages(.+) messages$1 [QSA]
RewriteRule .*/redirect\.php redirect\.php [QSA]
RewriteRule .*/scripts(.+) scripts$1 [QSA]
These lines are specific module rewrites for all URLs that aren't handled by the CMS module. The .* bit dynamically matches the second directory (e.g. "site1" in localhost/site1/whatever.html) up until a forward slash and the matching directory name. I left out the ending slash and used (.+) instead so that the index and all requests inside of those directories would be rewritten to the shared directory paths (e.g. localhost/blog/).
Two things...
First, I should note again that I'm not entirely sure these rules won't match localhost/site1/example/blog-page-concerns. In other words, I'm not certain matching is strictly limited to the first directory (in that example the term blog is in the second directory for the client).
Secondly, I'm not sure (though I imagine it would be possible) whether I can merge these rules into a single rule (item1|item2|item3). I've gotten this far, though I'm not that good, or at least not yet.
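On both points, a hedged sketch: anchoring the pattern with ^[^/]+/ ties the match to the first path segment, and an alternation group folds the module names into one rule. This assumes the .htaccess file sits in the document root (so the path is matched without its leading slash) and that the module list is exactly the one shown. It has not been tested against the setup described here, and the redirect.php and images rules are left out on purpose.

```apache
# Sketch: site1/blog/post -> blog/post, site2/scripts/x.js -> scripts/x.js
# ^[^/]+/  matches exactly one leading directory (e.g. site1/)
# (/.*)$   requires a slash right after the module name, so
#          site1/administrative-conduct and site1/example/blog-page-concerns are not matched
RewriteRule ^[^/]+/(admin|blog|forums|messages|scripts)(/.*)$ $1$2 [L]
```

Because scripts/admin.js has .js rather than a slash after admin, this form should leave it alone without the extra RewriteCond. One caveat: a path such as blog/admin/ would itself match on a later pass, so nested module names would need a further condition.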
----
RewriteRule ^(index\.php|test1\.php|test2\.php|images/|scripts/|themes/|redirect\.php) - [L]
These are general exceptions; I intend to keep this list as minimal as possible. I've added test files as examples. Neither the specific modules nor the CMS module will rewrite these URLs.
----
RewriteCond %{REQUEST_URI} !.*/(admin|blog|forums|images|messages)
This line essentially is an exception list for the specific modules so that the CMS module doesn't rewrite them. If I don't add a directory here then the CMS module rewrite (in the last/next line) is applied.
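The leading .* in that condition is not needed, and because the pattern is unanchored it also matches any URL that merely contains /blog or /admin somewhere (e.g. /site1/example/blog-page-concerns), which would then skip the CMS rewrite. A minimal sketch of a tighter condition, assuming module directories only ever appear at the root or directly under one site directory:

```apache
# Sketch: exclude /blog/... and /site1/blog/... from the CMS rewrite,
# but not /site1/example/blog-page-concerns or /site1/blog-page
RewriteCond %{REQUEST_URI} !^/([^/]+/)?(admin|blog|forums|images|messages)(/|$)
RewriteRule !\.(css|js|zip)$ rewrite.php [L]
```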
----
RewriteRule !\.(css|js|zip)$ rewrite.php
Everything else gets rewritten to rewrite.php which handles the CMS module. The list of file extensions is truncated intentionally.
Final Thoughts
I never thought I'd get this far with Apache, though I have. I don't think the code is perfect, hence why I've posted it here for others to critique, though I can say at least it works! I'm sure there is redundant code in there and I could very likely adjust some of the rewrite rules to be more restrictive, so I'm open to trying that out. If I can understand the why of the code existing as it does, then I can usually grasp the how of the syntax involved; that has been the main issue I've had in trying to understand how to write Apache syntax.
So I'm looking for any critiquing no matter how minimal the concerns or syntax changes may be. Any potential security concerns (doesn't have to be an explicit security hole) are especially important though I also want to improve performance and eliminate redundancy wherever possible as well please.
Alright, fire away please! :)
- John