Converting certain folders in URL to parameters

Using mod_rewrite to convert certain folders to parameters

         

pr0fess0r

9:24 pm on Jan 20, 2010 (gmt 0)

10+ Year Member



Hi, I have a site where I'd like to convert the first subfolder into a parameter (from a set list of subfolders), so

[localhost...]

is displayed as shown above, but to PHP it looks like:

[localhost...]

I only need to do this on certain folders (e.g. not ... localhost/mysite/admin/), and I need to take into account the fact that there may or may not already be parameters appended to the URL.

I'm creating a site for multiple organisations to use, and based on the organisation, colours etc in the site will need to be different, but each organisation will be using the same pages, and the client wants the organisation shortcode in the URL.
I thought the best way to do this would be with mod_rewrite (my server is Ubuntu 9.10).

In .htaccess I have tried
Options +FollowSymLinks
RewriteEngine on
RewriteRule ^folder1/(.*) $1&organisation=folder1 [NC,QSA]

If I browse to [server...] I get a page not found error, and the rewrite log says:

(3) [perdir /var/www/sitename/] add path info postfix: /var/www/sitename/folder1 -> /var/www/sitename/folder1/destination.php
(3) [perdir /var/www/sitename/] strip per-dir prefix: /var/www/sitename/folder1/destination.php -> folder1/destination.php
(3) [perdir /var/www/sitename/] applying pattern '^folder1/(.*)' to uri 'folder1/destination.php'
(2) [perdir /var/www/sitename/] rewrite 'folder1/destination.php' -> 'destination.php&organisation=folder1'
(3) [perdir /var/www/sitename/] add per-dir prefix: destination.php&organisation=folder1 -> /var/www/sitename/destination.php&organisation=folder1
(2) [perdir /var/www/sitename/] strip document_root prefix: /var/www/sitename/destination.php&organisation=folder1 -> /sitename/destination.php&organisation=folder1
(1) [perdir /var/www/sitename/] internal redirect with /sitename/destination.php&organisation=folder1 [INTERNAL REDIRECT]
(3) [perdir /var/www/sitename/] strip per-dir prefix: /var/www/sitename/destination.php&organisation=folder1 -> destination.php&organisation=folder1
(3) [perdir /var/www/sitename/] applying pattern '^folder1/(.*)' to uri 'destination.php&organisation=folder1'
(1) [perdir /var/www/sitename/] pass through /var/www/sitename/destination.php&organisation=folder1

What am I doing wrong?

Many thanks in advance
Lucas

jdMorgan

11:20 pm on Jan 20, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member




RewriteRule ^folder1/(.*)$ /$1?organisation=folder1 [QSA,L]

  • End-anchor the pattern
  • Precede substitution path with "/" to prevent malicious full-path injection.
  • Query strings in the RewriteRule substitution are demarcated by "?". Apache will take care of the rest with the [QSA] flag set.
  • Use [L] flags on all rules, unless you have a rare and specific (provable) case where you cannot use it.
  • Do not use [NC] on internal rewrite patterns. Allowing more than one URL to resolve to the same content creates duplicate-content problems with search engines, which can negatively impact ranking. If incorrect-case URLs exist in incoming links, use a separate (preceding) rule to externally redirect them to the correct-case URL. If incorrect-case links exist on your own site(s) then correct them in source.
  • Be sure to validate the "organisation" value passed to your script. If it is invalid, then your script should return either a 301-Moved Permanently redirect to the correct URL, or a 404-Not Found or 410-Gone error response as appropriate. Failure to do so again results in a duplicate-content problem.
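The points above about anchoring, the leading slash, the "?" separator, [L] and [NC] can be combined into one sketch. It assumes the .htaccess file sits in the document root and that "folder1" in lowercase is the canonical spelling. The 301 rule for wrong-case requests is only an illustration of the "separate (preceding) rule" idea, not something given in the thread.

```apache
RewriteEngine on

# Externally redirect wrong-case variants (Folder1, FOLDER1, ...) to the
# canonical lowercase URL. The RewriteCond is case-sensitive, so a request
# that is already lowercase is not redirected again.
RewriteCond $1 !^folder1$
RewriteRule ^(folder1)/(.*)$ /folder1/$2 [NC,R=301,L]

# Internal rewrite, case-sensitive, end-anchored, with a leading "/" and "?".
# [QSA] appends any query string that was on the original request.
RewriteRule ^folder1/(.*)$ /$1?organisation=folder1 [QSA,L]
```

If the site lives in a subdirectory rather than the document root, both substitution paths would need that subdirectory added.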

    If you have a short list of first-level virtual "organisation" directories which *should* be rewritten, or an even shorter list of first-level 'real' directories which should *not* be rewritten, you can rewrite all of the "organisation" requests with a single rule, either including the "should be" list in the RewriteRule or RewriteCond pattern(s), or adding a negative-match RewriteCond to exclude the "should not be" URLs from being rewritten.
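A minimal sketch of the two single-rule options just described, assuming the .htaccess file is in the document root. The organisation names companya, companyb and companyc are placeholders. "admin" is the real folder mentioned in the question, while "images" and "css" are hypothetical examples of other real folders. Only one of the two options would normally be used.

```apache
RewriteEngine on

# Option A: list the organisations that SHOULD be rewritten.
# $1 is the organisation name, $2 is the rest of the path.
RewriteRule ^(companya|companyb|companyc)/(.*)$ /$2?organisation=$1 [QSA,L]

# Option B: rewrite any first-level folder EXCEPT the real ones.
# The RewriteCond tests $1 from the RewriteRule pattern below it.
RewriteCond $1 !^(admin|images|css)$
RewriteRule ^([^/]+)/(.*)$ /$2?organisation=$1 [QSA,L]
```

With Option B, every real first-level directory has to appear in the exclusion list. Otherwise a rewritten path that still contains a folder would match the rule a second time and pick up a bogus organisation value.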

    I'll make one more recommendation that may help over the long term: Putting all of the 'organisation' virtual folders in the top-level directory risks a long-term maintenance nightmare, and also risks 'collisions' between future organisation names and required site infrastructure folder names. If the number of 'organisations' grows too large, then it could even affect server performance. It is generally better to use a hierarchical approach, and put all 'organisations' into a top-level (virtual, in this case) folder, such as "/orgs/<organisation>" instead of putting all virtual "/<organisation>" folders in the root.

    Using a hierarchical approach can also lead to better efficiency, as only one simple rule (testing for "^orgs/(.*)$") is needed in the top-level .htaccess file, and any 'details' related to rewriting the individual virtual directories can be handled in a separate /orgs/.htaccess file.

    Jim
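As a rough illustration of the hierarchical suggestion, assuming public URLs of the form /orgs/<organisation>/<page> and a .htaccess file in the document root, a single rule can capture the organisation name without listing each one:

```apache
RewriteEngine on

# /orgs/<organisation>/<page>  ->  /<page>?organisation=<organisation>
# "orgs" is the virtual top-level folder proposed above; this rule matches
# on the URL alone and does not require a directory of that name on disk.
RewriteRule ^orgs/([^/]+)/(.*)$ /$2?organisation=$1 [QSA,L]
```

The script still has to validate the captured name, since this pattern accepts any first path segment after /orgs/.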

    pr0fess0r

    12:15 am on Jan 21, 2010 (gmt 0)

    10+ Year Member



    Wow, that's awesome, thanks so much for the detailed response :)

    pr0fess0r

    12:59 am on Jan 21, 2010 (gmt 0)

    10+ Year Member



    Unfortunately the client wants the site to be in the form [domain.com...], so I can't use your /orgs/ idea.
    When customers are created, I have a PHP script to check that the name doesn't match any existing folder in the root directory, and I'll have the PHP script update the .htaccess file with a new entry for each organisation - not ideal, but I'm under a time constraint (aren't we all!). Thanks again for your help
    Lucas

    pr0fess0r

    1:07 am on Jan 21, 2010 (gmt 0)

    10+ Year Member



    Sorry to be a pain, but with this .htaccess

    Options +FollowSymLinks
    RewriteEngine on
    RewriteRule ^folder1/(.*)$ /$1?organisation=folder1 [QSA,L]

    I still get a 404 error, and the rewrite log says:

    (3) [perdir /var/www/sitename/] add path info postfix: /var/www/sitename/folder1 -> /var/www/sitename/folder1/destination.php
    (3) [perdir /var/www/sitename/] strip per-dir prefix: /var/www/sitename/folder1/destination.php -> folder1/destination.php
    (3) [perdir /var/www/sitename/] applying pattern '^folder1/(.*)$' to uri 'folder1/destination.php'
    (2) [perdir /var/www/sitename/] rewrite 'folder1/destination.php' -> '/destination.php?organisation=folder1'
    (3) split uri=/destination.php?organisation=folder1 -> uri=/destination.php, args=organisation=folder1
    (1) [perdir /var/www/sitename/] internal redirect with /destination.php [INTERNAL REDIRECT]

    My internal server is Ubuntu 9.10 and I'm viewing the page with Chrome on a Mac.

    It DOES work however, if I use
    RewriteRule ^folder1/(.*)$ $1?organisation=folder1 [QSA,L]

    Cheers
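Reading the two rewrite logs in this thread side by side suggests why the leading slash fails on this test server. The first log strips a document_root prefix to turn /var/www/sitename/destination.php into /sitename/destination.php, so the document root appears to be /var/www and the site lives in a /sitename/ subdirectory. A substitution of /$1 is then resolved as /destination.php at the server root, where no such file exists. A minimal sketch, assuming exactly that layout, keeps the leading slash but includes the subdirectory:

```apache
Options +FollowSymLinks
RewriteEngine on

# Assumption: this file is /var/www/sitename/.htaccess and the DocumentRoot
# is /var/www, as the rewrite log suggests. The substitution is a URL-path
# from the server root, so it has to name the /sitename/ subdirectory.
RewriteRule ^folder1/(.*)$ /sitename/$1?organisation=folder1 [QSA,L]
```

On a server where the site sits directly in the document root, the plain /$1 form should apply unchanged.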

    g1smd

    2:45 am on Jan 21, 2010 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



    I cannot stress enough that you need a simple way to discover the request is for 'orgs'. If not a folder /orgs/<name>, then orgs-<name> or somesuch instead.

    To ignore this will make the job harder to do in the first place and will cause multiple problems in years to come. It will be a difficult site to maintain.

    What the client 'wants' and what the best technical solution is, might be two separate things.

    I'd ask the client if they want pretty URLs on a slow and unresponsive site (and do note that Google looks at page load times now) that will be difficult to maintain, or a technically robust solution with faster loading pages, while still using simple-ish URLs.

    pr0fess0r

    3:33 am on Feb 22, 2010 (gmt 0)

    10+ Year Member



    Hi Guys

    I have this working great now:

    Options +FollowSymLinks
    RewriteEngine on
    RewriteRule ^companya/?(.*)$ $1?organisation=companya [QSA,L]
    RewriteRule ^companyb/?(.*)$ $1?organisation=companyb [QSA,L]
    RewriteRule ^companyc/?(.*)$ $1?organisation=companyc [QSA,L]

    On my local server it works fine. On the live server, 2 of the redirects aren't working. None of the subfolders of the site match the company names, so it's not like a clash or anything, and the company names are words with no spaces or punctuation or anything - why would only a couple of the rules be working?

    Cheers
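The three rules differ only in the organisation name, so they can be folded into one. Two details in the sketch below are assumptions rather than a confirmed diagnosis of the live-server problem. First, the substitution starts with "/", which presumes the live site is in the document root. Second, the name must be followed by "/" or by the end of the path, because a pattern such as ^companya/?(.*)$ also matches a request like companyabc/page.php.

```apache
Options +FollowSymLinks
RewriteEngine on

# One rule for all three organisations: $1 is the name, $2 the rest of the path.
# (?:/(.*))? allows either nothing or "/..." after the name, so a longer
# name that merely begins with "companya" does not match.
RewriteRule ^(companya|companyb|companyc)(?:/(.*))?$ /$2?organisation=$1 [QSA,L]
```

A request for the bare folder name leaves $2 empty, so the script receives the site root with only the organisation parameter set.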

    g1smd

    8:39 am on Feb 22, 2010 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



    "Not working" gives no clues to be going on with.

    Use "Live HTTP Headers" to see the requests and server responses.

    You have missed point 2 of post #:4064603 - Precede substitution path with "/" to prevent malicious full-path injection.

    "2 of the redirects aren't working" - be aware these aren't redirects, they are internal rewrites.

    jdMorgan

    1:01 pm on Feb 22, 2010 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    The use of the /orgs subdirectory has nothing whatsoever to do with URLs -- It has only to do with file organization inside the server.

    To be clear, I'm only saying that the filesystem should be organized as
    /orgs/site1
    /orgs/site2
    /orgs/site3

    Instead of
    /site1
    /site2
    /site3

    This results in better organization and less clutter in the top-level directory, and allows one single rule to handle all "add-on domain"-to-subdirectory rewriting instead of having to add and maintain a new rule for each "add-on" domain, slowing down the server.

    This change has no effect whatsoever on the URL used "out on the Web" to access the files belonging to each domain; it only affects the file organization inside the server, and the efficiency/simplicity of the supporting configuration code.
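A hedged sketch of the kind of single rule this refers to, for the "add-on domain" case. It assumes each domain's files live in /orgs/<label>/, where <label> is the first part of the hostname after an optional "www."; that naming scheme is hypothetical and not taken from the thread.

```apache
RewriteEngine on

# Skip requests already mapped into /orgs/ so the rule does not loop.
RewriteCond %{REQUEST_URI} !^/orgs/
# Capture the first hostname label (after an optional "www.") as %1.
RewriteCond %{HTTP_HOST} ^(?:www\.)?([^.]+)\. [NC]
# Map the request into that domain's folder; the public URL is unchanged.
RewriteRule ^(.*)$ /orgs/%1/$1 [L]
```

The capturing condition is placed last so that %1 refers to the hostname match.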

    If the client ignores this advice even after understanding it correctly, it may cost them dearly in terms of on-going maintenance, slower server response, difficulty in adding/changing/removing domains, and 'collisions' between the names of 'virtual domain folders' and names of 'real folders' needed by the server. Look to the example in the FTP view of any commercially-hosted virtual server: There's a reason that virtual server DocumentRoots on shared hosting are almost always pointed to a path such as /users/account_name -- It makes *everything* simpler, more efficient, and easier to administer.

    We try to be polite, but many recommendations here are based on years of experience 'fixing' costly mistakes like this... and this is one of them. What the client wants and what the client needs are often very different -- Which is why specialist Web consultants exist and thrive.

    Jim

    pr0fess0r

    12:00 am on Feb 23, 2010 (gmt 0)

    10+ Year Member



    Hi Guys
    Thanks so much for your in-depth responses.
    Preceding the substitution path with / actually fixed the rewrites that didn't work.
    The way this site works is that
    blah.com/orgname/file.php becomes blah.com/file.php?organisation=orgname
    I've been pretty robust with this - the /orgname/ can't match an existing folder, and the organisation request variable is sanitised and checked against the db to ensure it matches an existing organisation. Unfortunately the client insisted on the URL structure and of course had printed the documentation and sales material before we even began development. Such are the ways of clients *sigh*.
    What was happening with the earlier bug was that I had a PHP test page that just echoed the organisation request variable. For some URLs it would echo the organisation name, for others it was blank. For example
    blah.com/digitalus/test.php would output "digitalus"
    blah.com/manukauwater/test.php would output nothing.
    And there were no files in the site called manukauwater. The / change mentioned above fixed this. I couldn't turn on verbose rewrite logging as the live server is a dedicated one with a lot of sites on it. But the issue seems to be fixed now, and while I'd love to know what caused the problem, in the high-paced world of web development we have no time for such luxuries! Thanks again :D

    g1smd

    12:11 am on Feb 23, 2010 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



    I would want to be sure I knew what had caused the problem, and doubly sure that it was 100% fixed.

    If it ever breaks again, the entire site is offline, someone is losing money, and search engines could be removing your pages from their index.

    pr0fess0r

    12:36 am on Feb 23, 2010 (gmt 0)

    10+ Year Member



    It's a private site that's only online for a short period of time and not indexed by Google. It's not critical if it crops up again. The problem was that some pages were displaying blah.com/index.php instead of blah.com/orgname/index.php because the rewrite didn't trigger, but this wasn't affecting the functionality of the site.
    Cheers :)