Forum Moderators: phranque


url rewrite with multiple subdirectories

         

HDClown

11:34 pm on Jan 26, 2015 (gmt 0)

10+ Year Member



I'm trying to adjust/add to an existing rewrite rule and am having some problems. Here's the layout:

A file structure exists like the following.


\
\page1.php
\news\page1.php
\template\page1.php
\template\news\page1.php
\userA\index.php

Extrapolate page#.php to dozens of pages. There are also subdirectories other than \news with pages in them.

There are 100+ \user? directories, and the only file in any \user? subdirectory is index.php.

\template is a duplicate copy of the files/dirs. in the root (with some code changes)


End users would only ever visit the following URLs:


http://www.example.com/page1.php
http://www.example.com/news/page1.php
http://www.example.com/userA/index.php
http://www.example.com/userA/page1.php
http://www.example.com/userA/news/page1.php


Notice that some of the URLs accessed under /userA do not exist in the file system. I do not want hundreds of copies of these files in every user's directory; I want one master template set, hence the \template directory. Likewise, end users never access any URLs under /template (they aren't even aware it exists).

This was accomplished using this rule:

RewriteCond %{DOCUMENT_ROOT}/$1/$2 !-f
RewriteCond %{DOCUMENT_ROOT}/$1/$2 !-d
RewriteCond %{DOCUMENT_ROOT}/template/$2 -f
RewriteRule ^([^/]+)/(.*)$ template/$2 [L]


This rule works ONLY for the URL http://www.example.com/userA/page1.php. When I go to http://www.example.com/userA/news/page1.php, I get a 404.

I've tried two things so far:


#RewriteCond %{DOCUMENT_ROOT}/phmc/$1/$2/$3 !-f
#RewriteCond %{DOCUMENT_ROOT}/phmc/$1/$2/$3 !-d
#RewriteCond %{DOCUMENT_ROOT}/phmc/LO-template/$2/$3 -f
#RewriteRule ^([^/]+)/(.*)$ LO-template/$2 [L]


This one does nothing

#RewriteCond %{DOCUMENT_ROOT}/phmc/$1/$2/$3 !-f
#RewriteCond %{DOCUMENT_ROOT}/phmc/$1/$2/$3 !-d
#RewriteCond %{DOCUMENT_ROOT}/phmc/LO-template/$2/$3 -f
#RewriteRule ^([^/]+)/([^/]+)/(.*) $ LO-template/$2/$3 [L]


This one ends up causing HTTP 500 errors on every URL.


Not sure how to make this work, seems like I'm missing something basic.

[edited by: incrediBILL at 11:22 pm (utc) on Jan 31, 2015]
[edit reason] added [CODE] formatter [/edit]

lucy24

1:59 am on Jan 27, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Ouch, what a lot of -f and -d lookups.

Frankly I'm surprised it isn't the first version that leads to a 500-class error. That's the one whose conditions involve a $3 capture that doesn't seem to exist. Can you confirm that the errors come in the second version?

:: wandering off to experiment meanwhile ::

OK, let's both exclaim "D'oh!" in unison:

RewriteRule ^([^/]+)/([^/]+)/(.*) $ LO-template/$2/$3 [L]


See the space in the middle? That's your 500-class error. Error logs list it as "bad flag delimiters"; it really means "I can't deal with these extra spaces".
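To see why one stray space is enough, here is a rough Python sketch of how the line breaks into arguments. It uses plain whitespace splitting as a stand-in, which is an assumption for illustration and not Apache's actual parser; `split_directive` is a made-up helper name.

```python
# Rough illustration only: plain whitespace splitting, not Apache's real parser.
def split_directive(line):
    """Split a directive line into its name and whitespace-separated arguments."""
    name, *args = line.split()
    return name, args

broken = "RewriteRule ^([^/]+)/([^/]+)/(.*) $ LO-template/$2/$3 [L]"
fixed = "RewriteRule ^([^/]+)/([^/]+)/(.*)$ LO-template/$2/$3 [L]"

for label, line in (("broken", broken), ("fixed", fixed)):
    _, args = split_directive(line)
    # In the broken line the third argument, where flags belong,
    # is the substitution text rather than something in [brackets].
    print(label, len(args), args)
```

With the space present, the lone `$` lands in the substitution slot and `LO-template/$2/$3` lands where the flags should be, which is consistent with a complaint about flag delimiters.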

:: wandering off again ::

Oddly, the server doesn't seem to care a squat whether the three captures actually exist or not. This sounds good, but may actually be bad, because it increases the chance that the !-f and !-d tests will succeed. (No such URL = no such physical file.) It may be more efficient to put the positive condition -- the one with intended -f result -- first.
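The effect of condition order can be sketched by counting lookups. The Python below is a simulation under stated assumptions: the `FILES` and `DIRS` sets are a hypothetical tree modelled loosely on the first post, `count_lookups` is a made-up helper, and `all()` stands in for the way chained RewriteConds stop at the first one that fails.

```python
# Hypothetical file tree, modelled loosely on the layout in the first post.
FILES = {"page1.php", "news/page1.php", "template/page1.php",
         "template/news/page1.php", "userA/index.php"}
DIRS = {"news", "template", "template/news", "userA"}

def count_lookups(first, rest, positive_first):
    """Return (rule_applies, lookups) for one request, stopping at the first failed condition."""
    lookups = 0

    def is_file(path):
        nonlocal lookups
        lookups += 1
        return path in FILES

    def is_dir(path):
        nonlocal lookups
        lookups += 1
        return path in DIRS

    positive = lambda: is_file(f"template/{rest}")
    negatives = [lambda: not is_file(f"{first}/{rest}"),
                 lambda: not is_dir(f"{first}/{rest}")]
    conditions = [positive, *negatives] if positive_first else [*negatives, positive]
    applies = all(cond() for cond in conditions)  # all() stops at the first False
    return applies, lookups

for request in (("userA", "bogus.php"), ("userA", "news/page1.php"), ("news", "page1.php")):
    print(request, count_lookups(*request, positive_first=True),
          count_lookups(*request, positive_first=False))
```

In this toy tree, putting the positive test first saves lookups for URLs that exist nowhere, but costs one extra lookup for real files such as news/page1.php, so the gain depends on the mix of requests.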

[edited by: lucy24 at 2:22 am (utc) on Jan 27, 2015]

HDClown

2:20 am on Jan 27, 2015 (gmt 0)

10+ Year Member



I removed the extra space and the 500 error is gone, but still no go with the rule working.

I tested with both rules active and only with one of each active. No errors, but whenever I try to hit /userA/news/page1.php it just 404's

lucy24

2:31 am on Jan 27, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



RewriteCond %{DOCUMENT_ROOT}/phmc/LO-template/$2/$3 -f

Is /phmc/LO-template/ a real, physical directory that you could theoretically navigate to as
example.com/phmc/LO-template/subdir/filename
? Take a moment to verify that the rewrite works if you comment-out this condition. The other two are supposed to fail; this one's supposed to succeed. And while you're in there, put a / slash at the front of your rewrite target. It won't affect rule execution; it's just a good habit.

It's generally OK to use your real file and directory names, and sometimes it helps avoid confusion.

For testing purposes, you might add the [R] flag. Then you'll see the address bar changing and it's easier to see that things are working as intended. A 404 in response to an internal rewrite can be unhelpful, because you don't know if it's the originally requested file or the target that doesn't exist.

Once you get the rule working, you'll want to start with a RewriteCond along the lines of
RewriteCond %{REQUEST_URI} !(realfile|otherfile|thirdfile)

so the server doesn't have to perform those energy-hogging lookups on requests that you already know aren't supposed to be rewritten. You'll also want to constrain the body of the rule so it only applies to requests for pages.

HDClown

2:49 am on Jan 27, 2015 (gmt 0)

10+ Year Member



Need to clarify something since I messed up some stuff in my first post.

This is the current functioning rule, which works properly:


RewriteCond %{DOCUMENT_ROOT}/phmc/LO-template/$2 -f
RewriteCond %{DOCUMENT_ROOT}/phmc/$1/$2 !-f
RewriteCond %{DOCUMENT_ROOT}/phmc/$1/$2 !-d
RewriteRule ^([^/]+)/(.*)$ LO-template/$2 [L]


Note that I moved the -f condition to the first position, as suggested. If I go to http://www.example.com/userA/page1.php the URL in the address bar stays the same but it's serving me http://www.example.com/LO-template/page1.php

I'm using some GoDaddy hosting with multiple sites on the same hosting Account, so it's a little funky. The doc root is something like /var/server123/html/ and has another site running on it in that root directory.

This site is running out of a subdirectory, /var/server123/html/phmc/, as its site root, but there's only one doc root for the hosting account, which is why I have the /phmc/ path on the RewriteCond. There is no actual /phmc/ directory you can access via the URL for this http://www.example.com site.

Now that I've clarified that, this is the ruleset I am working with


RewriteCond %{DOCUMENT_ROOT}/phmc/LO-template/$2/$3 -f
RewriteCond %{DOCUMENT_ROOT}/phmc/$1/$2/$3 !-f
RewriteCond %{DOCUMENT_ROOT}/phmc/$1/$2/$3 !-d
RewriteRule ^([^/]+)/([^/]+)/(.*) $ LO-template/$2/$3 [R]


I moved the -f match first as suggested and made it [R]

I'm still getting a 404, so it seems like the RewriteCond aren't matching.

The original rewrite ruleset was created by someone else and it worked, so I stuck with it. I figured there may be better ways to do this and I'm open to them, but I'd like to get this working with a second subdirectory level before I look at rewriting the rules altogether.

lucy24

3:15 am on Jan 27, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



multiple sites on the same hosting Account

Oh, yikes, you mean multiple sites passing through the same htaccess? RewriteRules are always easier if only one site is involved. You'd better add one more RewriteCond, at the beginning of the list, looking at %{HTTP_HOST}.

So this works:
RewriteRule ^([^/]+)/(.*)$ LO-template/$2 [L]

And this doesn't:
RewriteRule ^([^/]+)/([^/]+)/(.*) $ LO-template/$2/$3 [R]
Oops! I hope you really meant (for testing purposes)
RewriteRule ^([^/]+)/([^/]+)/(.*)$ http://example.com/LO-template/$2/$3 [R,L]

The only difference between the two rules is that the first one captures in the pattern
$1 first directory
$2 rest of request
while the second one splits the capture into three parts
$1 first directory
$2 second directory
$3 rest of request

if I go to http://www.example.com/userA/page1.php ... it's serving me http://www.example.com/LO-template/page1.php

We may need someone who speaks Apache (phranque? you out there?) because I don't understand the seeming duplication of /phmc/

What I do see is that a request for
/userA/page1.php
wouldn't match the second rule, since there's only one directory and the rule looks for two. So the conditions wouldn't even be evaluated. Is it intended for the specific case of
/template/news/page1.php
from your first post? Never anything but /news/ ? If so, the pattern should say
^([^/]+)/news/(.+)
and then everywhere else in the rule and its conditions, "$1 $2 $3" becomes "$1 news $2".
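The difference between the two patterns is easy to check outside Apache. This is a small Python sketch; the paths are written without a leading slash on the assumption that the rules live in .htaccess, where the pattern is matched against the path relative to that directory. `captures` is a made-up helper.

```python
import re

TWO_PART = re.compile(r"^([^/]+)/(.*)$")            # $1 first directory, $2 rest
THREE_PART = re.compile(r"^([^/]+)/([^/]+)/(.*)$")  # $1, $2 directories, $3 rest

def captures(pattern, path):
    """Return the captured groups, or None when the pattern does not match."""
    match = pattern.match(path)
    return match.groups() if match else None

for path in ("userA/page1.php", "userA/news/page1.php"):
    print(path, captures(TWO_PART, path), captures(THREE_PART, path))
```

The two-part pattern matches both requests, with everything after the first directory (including news/) ending up in $2. The three-part pattern returns nothing for userA/page1.php, so its conditions would never be reached for that request.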

Most occurrences of
.+
should probably really be
[^.]+\.php
if those are the pages you're rewriting.

HDClown

4:05 am on Jan 27, 2015 (gmt 0)

10+ Year Member



Well, after all that, it appears that there was some issue with rules that force URL's to lowercase not behaving properly now that I was trying this with another subdirectory level. I renamed all file names/paths in question to lowercase and am using this rule exclusively:


RewriteCond %{DOCUMENT_ROOT}/phmc/lo-template/$2 -f
RewriteCond %{DOCUMENT_ROOT}/phmc/$1/$2 !-f
RewriteCond %{DOCUMENT_ROOT}/phmc/$1/$2 !-d
RewriteRule ^([^/]+)/(.*)$ /lo-template/$2 [L]


And it works fine. If I understand correctly, since this rule does not reference specific extensions and uses (.*), it's capturing all the subdirectories I might throw at it (which is what I want). So my original rule was the only rule I needed.

lucy24

6:36 am on Jan 27, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



there was some issue with rules that force URL's to lowercase not behaving properly

If it's not one damn thing, it's another :)

If I understand correctly, since this rule does not reference specific extensions and uses (.*), it's capturing all the subdirectories I might throw at it (which is what I want). So my original rule was the only rule I needed.

The rule won't miss anything it is intended to get. No false negatives. But I'm concerned about false positives: how many requests will potentially match the pattern, and then not fit the rule itself after one or more server-intensive conditions have been evaluated.

Remember that mod_rewrite works on a "two steps forward, one step back" principle: conditions are evaluated only after the pattern in the body of the rule has been matched. As currently written, the body of the rule applies to all requests all the time everywhere; the only constraint is that they have to be within a subdirectory. That doesn't just mean pages. It means images, stylesheets, scripts ... everything.

That's why I suggest expressing the rule as something like

RewriteCond %{REQUEST_URI} !^/(images|scripts|styles|subdir|otherdir|not-this-dir)
{ existing conditions here }
RewriteRule ^([^/]+)/((?:[^/]+/)*[^./]+\.php)$ /lo-template/$2 [L]

replacing each [^/] with [^./] if no directory names contain literal periods. That way the server doesn't have to waste time on conditions when it already knows that the rule won't apply. And put in a preliminary exclusion for non-user directories, listed by name, before you start on the -f and -d tests.

If you say (?:[^/]+/)* in the middle-- as part of the $2 capture-- you can have any number of subdirectories after the first one.
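Here is a Python sketch of how a tightened pattern plus a name-based exclusion would filter requests. The pattern is an assumed reading of the suggestion, with balanced parentheses and an end anchor. The exclusion list and `rewrite_target` are hypothetical, and the -f / -d conditions are deliberately left out.

```python
import re

# Assumed reading of the suggested pattern: balanced parentheses plus an end anchor.
PAGE_RULE = re.compile(r"^([^/]+)/((?:[^/]+/)*[^./]+\.php)$")
# Hypothetical exclusion list; replace with the real non-user directories.
EXCLUDED = re.compile(r"^/(images|scripts|styles)")

def rewrite_target(request_uri):
    """Return the template path a request would map to, or None if the rule is skipped."""
    match = PAGE_RULE.match(request_uri.lstrip("/"))
    if match is None:
        return None  # pattern failed, so no conditions would be evaluated at all
    if EXCLUDED.match(request_uri):
        return None  # cheap string test fails before any file-system lookup
    return "/lo-template/" + match.group(2)

for uri in ("/userA/page1.php", "/userA/news/2015/page1.php",
            "/userA/news/logo.png", "/images/gallery.php", "/page1.php"):
    print(uri, "->", rewrite_target(uri))
```

Only .php requests inside a non-excluded first-level directory get as far as the expensive conditions; images, stylesheets and root-level pages drop out on the pattern alone.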

Is "index.php" (from your first post) part of an URL that your users will request by name? If so, you may be able to leave off the -d test, since "index.php" is itself a filename.

HDClown

12:54 pm on Jan 27, 2015 (gmt 0)

10+ Year Member



I would certainly like the rule to be less intensive and be more accurate, just to avoid any wrong matches, like you say. I will test out your suggestions.

I would never specifically have users request a URL with index.php in the URL itself, as index.php is the default document for every directory (if I need a default document in that directory). I'm trying to avoid using filenames with any other extension as the default document, but it's possible I may need to use an .html at some point.

Since index.php is default document, it never ends up appearing in the address bar and thus I can't imagine any instance where someone would craft a URL with it listed.

I'm actually considering adding a rule that would mask off the .php extension and add a trailing slash, giving you that WordPress type URL look (all extensionless URL's), but I'm not sure if that will cause an issue with these rules we've been working on, or if I can combine it into this rule, etc.

Now that I've worked out this multiple subdirectory rule thing, I've come across another issue, related to my menu navigation, and I think I can fix it with .htaccess, as opposed to in PHP.

Any .php files under /lo-template include /lo-template/header.php which has my menu navigation in it. All of the menu navigation links listed in /lo-template/header.php are listed as such:
<a href="section1.php">section1</a>

When you browse to http://www.example.com/userA/page1.php and then use the navigation menu, they work great. You end up with a URL: http://www.example.com/userA/section1.php.

Problem is, when you add in subdirectories, those subdirectories end up in the navigation links. When I load up the page http://www.example.com/userA/news/page1.php and then go to my navigation menu, the links are as such: http://www.example.com/userA/news/section1.php.

As you see, the subdirectory is part of the URL in the link. Because of the URL rewriting going on here, I can't put a static prefix (such as /) on my HREF's in /lo-template/header.php or else you end up linked back to pages in the root site, and not within the /userA directory, which I can't have.

I know one way I can fix this is with a string function within /lo-template/header.php which will check for a specific directory name in the URL (such as /news), or a more complex pattern match that looks for multiple directories, and if it sees them, add a relative path (../section1.php). But, I'm wondering if I can do this with .htaccess instead, perhaps an .htaccess file that sits in /lo-template/news or any other subdirectories under /lo-template?

lucy24

8:48 pm on Jan 27, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Links from rewritten files are always icky, since the user's browser "thinks" it's in a different place than it really is. Absolute links are obviously the best and simplest solution from the linking/URL point of view.

RewriteRules in supplemental directories should be seen as a last resort, because mod_rewrite inheritance doesn't behave the way you'd want it to.

Sure, you can reverse the error with a few lines in htaccess creating a, hm, retro-redirect. But you shouldn't point your own links to a redirect if you can help it-- both for the server's sake and for search engines (assuming all this content is crawled and/or indexed).

Here I think it's most appropriate to put those few lines in the php file that builds the links in the first place, as you said:
a string function within /lo-template/header.php which will check for a specific directory name in the URL (such as /news), or a more complex pattern match that looks for multiple directories, and if it sees them, add a relative path (../section1.php)

For the php output you can use either a relative path or a path that incorporates the username (known because it's already present in the URL). Whichever is more convenient for you.
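Both options can be sketched in a few lines. The real code would live in /lo-template/header.php; Python is used here only to show the logic. The function names are made up, and the sketch assumes the first segment of the requested path is always the user's directory.

```python
def _segments(request_path):
    """Non-empty path segments of the requested URL path."""
    return [s for s in request_path.split("/") if s]

def absolute_href(request_path, target):
    """Root-relative link that stays inside the user's directory."""
    segments = _segments(request_path)
    prefix = f"/{segments[0]}/" if segments else "/"
    return prefix + target

def relative_href(request_path, target):
    """Relative link that climbs out of any subdirectories below the user's directory."""
    segments = _segments(request_path)
    # Everything but the final filename is a directory, unless the URL ends in "/".
    dirs = segments if request_path.endswith("/") else segments[:-1]
    depth = max(len(dirs) - 1, 0)  # directories below the user's directory
    return "../" * depth + target

print(absolute_href("/userA/news/page1.php", "section1.php"))
print(relative_href("/userA/news/page1.php", "section1.php"))
print(relative_href("/userA/page1.php", "section1.php"))
```

Either way the link is derived from the URL the visitor actually requested, not from the rewritten /lo-template path, so the menu keeps pointing into the user's own directory.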

Some people absolutely hate relative links, especially ones that begin in ../ But I think they're completely appropriate when you have material that exists as a package, and the package will always stay together.