htaccess rewrite issue - folders redirect to file in sub folder

         

semmelbroesel

5:00 pm on Jul 7, 2010 (gmt 0)

10+ Year Member



Hi.
I've been working on this for maybe 10 hours without success and hope someone can help me here.

Sounds simple:

I want the user to enter URLs similar to this:
www.site.com/thisuser
or
www.site.com/thisuser/
(never underestimate the stupidity of the user, so I want it to work with or without the last slash)

The folder /thisuser/ (or whatever else is entered) does not exist.

This should redirect the user to:
www.site.com/users/index.php?id=thisuser

And for some reason I can't get it to work. All I get is either a Firefox error page or an Error 500, both complaining about redirect loops.

I keep seeing this version here that should work, but for some stupid reason it just won't for me:

#Options +FollowSymlinks
Options +FollowSymLinks -MultiViews
# // found -MultiViews somewhere and tried it without success
RewriteEngine on

#RewriteCond %{SCRIPT_FILENAME} !-f
#RewriteCond %{SCRIPT_FILENAME} !-d
# // for testing purposes, I simply disabled these for now to make sure it is the next line that does the work:

RewriteRule ^([^/]+)/?$ /testredirect.php?$1 [QSA,L]

# //used /testredirect.php for now to see if it would work - tried /test/$1 before, also without success


According to one or two different forum posts I found, this should be exactly the code I want.

Why does it create these loops?

Any idea on what I can do?

Thanks so much in advance!

jdMorgan

4:30 am on Jul 10, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The problem is that mod_rewrite in .htaccess is recursive. Because your output path matches your rule pattern, the rule loops.

The -f check would have prevented this, assuming that the path /testredirect.php resolves to a physically-existing file, but you commented that line out.
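As a minimal sketch, this is the original rule with the two existence checks restored. It assumes /testredirect.php is a real file in the document root and that the .htaccess file also lives there:

```apache
Options +FollowSymLinks -MultiViews
RewriteEngine on

# Only rewrite when the request does not resolve to an existing file...
RewriteCond %{SCRIPT_FILENAME} !-f
# ...or to an existing directory.
RewriteCond %{SCRIPT_FILENAME} !-d
# First pass:  /thisuser is rewritten to /testredirect.php?thisuser
# Second pass: /testredirect.php exists, so the -f condition fails,
#              the rule is skipped, and the loop ends.
RewriteRule ^([^/]+)/?$ /testredirect.php?$1 [QSA,L]
```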

The code does what you say you want to do, aside from one detail: It *rewrites* requests for the URL "www.site.com/<anything>" or "www.site.com/<anything>/" to the filepath /<DocumentRoot>/users/index.php?id=<anything> as long as <DocumentRoot>/<anything> does not exist as a physical file or directory, and as long as /<DocumentRoot>/users/index.php does exist as a physical file.

To preserve your sanity, I suggest that you make two changes, the first of which will also greatly improve your site's performance:

First, require the 'users' to access example.com/users/<username> instead of putting all 'users' in your root directory. This will prevent 'collisions' between usernames and real files and directories. It will also allow you to eliminate the costly (very slow) 'file and directory exists' checks, and simply rewrite anything that starts with "/users/" to your users/index.php script unless this rewrite has already been done (loop prevention).

One line of code:
RewriteRule ^users/([a-z0-9_\-]+)/?$ /users/index.php?id=$1 [L]

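Note that the 'allowed character set' excludes periods, and therefore prevents recursion: the rewritten path users/index.php contains a period, so it cannot match the pattern a second time.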

An alternative, if you have a unique IP address for your server, is to give each user a subdomain, and rewrite <username>.example.com/ to /users/index.php?user=<username> as long as the requested URL-path is not already /users/index.php (again, loop-prevention, but much faster than checking 'exists'). All this requires is that you configure "wildcard subdomains" in your DNS configuration.

Three lines of code:

RewriteCond %{HTTP_HOST} ^([a-z0-9\-]+)\.example\.com
RewriteCond %1 !^www$
RewriteRule ^$ /users/index.php?user=%1

My second suggestion, if you stick with the URL-path username, is to NOT accept both "/<username>" and "/<username>/". Instead, pick one or the other format, and if the wrong format is requested, redirect to the correct format.

The same would apply with the subdomain approach: if, for example, www.<username>.example.com/ is requested, redirect to <username>.example.com/
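A rough sketch of that "pick one format and redirect the other" idea follows. It assumes the trailing-slash form is chosen as canonical, that the site is served over plain http, and that the .htaccess file sits in the document root. Adjust these to your setup:

```apache
RewriteEngine on

# URL-path approach: externally redirect the slashless form
# /users/<username> to the canonical /users/<username>/ ...
RewriteRule ^users/([a-z0-9_-]+)$ /users/$1/ [R=301,L]
# ...and internally rewrite only the canonical form to the script.
RewriteRule ^users/([a-z0-9_-]+)/$ /users/index.php?id=$1 [L]

# Subdomain approach: externally redirect www.<username>.example.com
# to <username>.example.com, keeping the requested path.
RewriteCond %{HTTP_HOST} ^www\.([a-z0-9-]+)\.example\.com [NC]
RewriteRule ^(.*)$ http://%1.example.com/$1 [R=301,L]
```

With this arrangement each user page is reachable under exactly one URL, and the non-canonical forms cost one extra request instead of creating duplicate URLs.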

These suggestions come from several years of helping people here 'recover' from problems that occur when they are not done initially, so consider carefully... :)

Jim

semmelbroesel

5:26 pm on Jul 10, 2010 (gmt 0)

10+ Year Member



Hi, thanks for the reply! It cleared up a lot for me that I hadn't been able to figure out by myself.

Yesterday, after more putzing around, I managed to get my script to work after all:

Options +FollowSymlinks
RewriteEngine on

# EXCEPTIONS
RewriteRule ^(testredirect.php)(/.*)?$ - [L]
RewriteRule ^(products|services)(/.*)?$ - [L]

# REDIRECT RULES
RewriteCond %{SCRIPT_FILENAME} !-f
RewriteCond %{SCRIPT_FILENAME} !-d
RewriteRule ^([^/]+)/?$ /newfolder/testredirect.php?id=$1 [R,QSA,L]

That worked well for me, and any existing directories can always go into the exceptions section.

I would prefer a different system, too, for users to get to their folders, e.g. site.com/?username or even cleaner site.com/u/username/ but that was thoroughly vetoed by my boss who thinks that we should keep it as simple for the users as possible, and after spending too many years in tech support I have learned not to underestimate human stupidity, so I have to somewhat agree with it. I wish I could change it, but the boss made his stand very clear...

The good news is that this whole system is only supposed to do something for the entry page - after that, we use our regular folder and file structure, and we can also send out real links via email if needed, but every user is supposed to have one simple and easy to remember start URL, and this is it. So it is absolutely OK to have the user see the real URL once they made it to our site.

We expect thousands of users to sign up eventually, and I don't know if our ISP has any way to allow us the automated creation of subdomains (and I don't know anything about automating that, either). And I sure ain't doing it all by hand ;-)

You definitely have a point in excluding characters, and I will update my script to only allow letters and numbers and underscores.

I think the exclusions section I added should help a lot in avoiding collisions with existing folders, right?

I also like the idea of redirecting /username to /username/ - if I understand you correctly, that would increase performance, right?

I think I can find the code for that somewhere else, I know I have seen it somewhere while searching.

So other than the new ideas you have given me here (redirect /x to /x/, exclude special characters), is there anything else you think I should handle in here?
Is there anything blatantly wrong in my file?

Thanks again!

jdMorgan

6:45 pm on Jul 10, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Only that you always keep an eye out for optimizations, and mind the details...

RewriteRule ^(testredirect.php)(/.*)?$ - [L]
RewriteRule ^(products|services)(/.*)?$ - [L]
#
RewriteCond %{SCRIPT_FILENAME} !-f
RewriteCond %{SCRIPT_FILENAME} !-d
RewriteRule ^([^/]+)/?$ /newfolder/testredirect.php?id=$1 [R,QSA,L]

becomes

RewriteCond $1 !^(newfolder/testredirect\.php|products|services)(/.*)?$
RewriteCond $1 !\.(gif|jpe?g|png|ico|css|js|pdf|xml)$
RewriteCond %{SCRIPT_FILENAME} !-f
RewriteCond %{SCRIPT_FILENAME} !-d
RewriteRule ^([^/]+)/?$ /newfolder/testredirect.php?id=$1 [R,QSA,L]

The newfolder/testredirect.php exclusion is not needed if you plan to 'keep' the newfolder level -- That is, if the rewrite/redirect target goes to a different folder, then recursion is not a problem.

The filetypes exclusion would not be necessary if you implement the restricted character-set in the RewriteRule itself, using a pattern like "^([a-z0-9_]+)/?$" along the lines previously suggested. Since this pattern won't accept periods, the filetype exception becomes moot.

The purpose of the filetypes exclusion (or the equivalent effect of the acceptable-character-set "no periods" restriction) is to prevent executing the "-f" and "-d" disk checks unless required. These operations are very slow and they may actually require physical accesses to your hard drive, so they should be avoided whenever possible.

Jim

semmelbroesel

10:40 pm on Jul 10, 2010 (gmt 0)

10+ Year Member



Man, that is so great!

Thank you so much for writing AND explaining all of this!

So after the character restriction, the code should be:

RewriteCond $1 !^(newfolder/testredirect\.php|products|services)(/.*)?$
# RewriteCond $1 !\.(gif|jpe?g|png|ico|css|js|pdf|xml)$
RewriteCond %{SCRIPT_FILENAME} !-f
RewriteCond %{SCRIPT_FILENAME} !-d
RewriteRule ^([a-z0-9\_]+)/?$ /newfolder/testredirect.php?id=$1 [R,QSA,L]
# CHANGED HERE
Is that correct?
Also, in this part: [a-z0-9\_] - do I need to add the code to ignore capital letters? (I think it was /i or something like that - I'm at the wrong computer that doesn't have that tutorial on it that I found). There IS a chance that someone might try something with capital letters in the name - probably wouldn't make a difference when it gets passed on to the PHP file, but you never know...

jdMorgan

9:23 pm on Jul 11, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



As it stands, the pattern accepts only lowercase letters, so no change is needed.

However, you can remove the exclusion for "newfolder/testredirect\.php" from the first RewriteCond as well, since that URL-path contains a period, which will not be accepted by the RewriteRule pattern.

Once you get this working, be sure to remove the [R] flag, as it is undesirable for this application.

Jim

semmelbroesel

11:13 pm on Jul 11, 2010 (gmt 0)

10+ Year Member



What I meant was adding the flag "I don't care if it's capital letters or not", sorry, bad wording on my part... But if I can't find the flag again I can just add A-Z to it, done.

Good point on removing the testredirect.php file, but I should leave the newfolder part in, right?

RewriteCond $1 !^(newfolder|products|services)(/.*)?$

And in my case, [R] is actually just what we want.

We only want to offer a simple way for users to access their site, but once the user has reached his site, everything else will happen with the actual file and folder structure.
E.g.:
www.mysite.com/johnsmith/
will redirect to
www.mysite.com/newfolder/index.php?id=johnsmith
and from there link to other files like
www.mysite.com/newfolder/about.php?id=johnsmith

Yes, eventually the whole id part will go away in favor of hidden session parameters, but that's low priority right now.

Thanks again so much for your help!

g1smd

12:31 am on Jul 12, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



[R] gives you a 302 redirect, which exposes the parameter-driven filepath as a URL.

What you actually need is a Rewrite, so remove the [R] flag.
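For comparison, here is a minimal sketch of the two variants, using the rule and paths posted earlier in this thread (the preceding RewriteCond lines are left out for brevity):

```apache
# External redirect: the server answers with a 302, the browser makes a
# second request, and the address bar shows
# /newfolder/testredirect.php?id=<username>
# RewriteRule ^([a-z0-9_]+)/?$ /newfolder/testredirect.php?id=$1 [R,QSA,L]

# Internal rewrite: the script is served directly, and the address bar
# keeps showing /<username>/
RewriteRule ^([a-z0-9_]+)/?$ /newfolder/testredirect.php?id=$1 [QSA,L]
```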

semmelbroesel

3:18 am on Jul 12, 2010 (gmt 0)

10+ Year Member



Oh, OK, I must have misunderstood something in the tutorial I read...

Thanks so much for clearing that up!

jdMorgan

1:10 am on Jul 13, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



To make a rule case-insensitive, use the [NC] or [NoCase] flag on the rule. This is 50% faster than using "[A-Za-z]" in the regex pattern.
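A sketch of how the rule from earlier in the thread might look with [NC] added. It assumes the newfolder, products and services exclusions are kept and that the [R] flag has been dropped as advised above:

```apache
RewriteEngine on

# Skip known real folders, in any letter case, before touching the disk.
RewriteCond $1 !^(newfolder|products|services)(/.*)?$ [NC]
RewriteCond %{SCRIPT_FILENAME} !-f
RewriteCond %{SCRIPT_FILENAME} !-d
# [NC] lets /JohnSmith/ match as well as /johnsmith/.
RewriteRule ^([a-z0-9_]+)/?$ /newfolder/testredirect.php?id=$1 [NC,QSA,L]
```

Note that [NC] only affects matching. The id parameter still reaches the PHP script exactly as the user typed it, so the script may want to lowercase it before looking the user up.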

Jim