mod_rewrite and subdirectories

     
3:19 am on Mar 18, 2004 (gmt 0)

New User

10+ Year Member

joined:Mar 18, 2004
posts:4
votes: 0


I'm having a weird mod_rewrite problem (Apache 2.0.40/Linux). I have the following physical directories:

  1. /home/me/www/app1/
  2. /home/me/www/app1/a/
  3. /home/me/www/app1/b/
  4. /home/me/www/app1/b/subdir/

From the web, these would be accessible as:

  1. http-//domain.com/app1/
  2. http-//domain.com/app1/a/
  3. http-//domain.com/app1/b/
  4. http-//domain.com/app1/b/subdir/

I have my .htaccess file in the app1 directory with "RewriteBase /app1". No problems. I have a bunch of rules like:


RewriteRule ^a/?$ a.php [L]
RewriteRule ^a/([0-9]+)$ a.php?main=$1 [L]
RewriteRule ^a/function1/([0-9]+)$ a.php?function1=$1 [L]

That works great, and I can still put graphics and stuff in that "a" directory and access them like "http-//domain.com/app1/a/graphic.jpg" or whatever, while the script /home/me/www/app1/a.php handles requests for the "a" module.

This second set of rules doesn't work so well:


RewriteRule ^b/?$ b.php [L]
RewriteRule ^b/([^/.]+)/?$ b2.php?name=$1 [L]

The first rule works fine. The second rule doesn't. If I navigate to "http-//domain.com/app1/b/subdir2" (which does not physically exist), the b2.php script works as expected. However, if I navigate to "http-//domain.com/app1/b/subdir" (which does exist), the b2.php script works BUT the "?name=$1" bit of the RewriteRule gets added to the URL in the browser's location bar, e.g., "http-//domain.com/app1/b/subdir/?name=subdir". That's not what I want, but it only happens when the subdirectory physically exists.

Looking at my logs, requests for "/app1/b/subdir2" are logged as 200, while requests for "/app1/b/subdir" are logged as 301 immediately followed by the unwanted requests for "/app1/b/subdir/?name=subdir". Why is this redirect happening, and how can I make requests for "subdir" work the same way requests for "subdir2" do?

I've been banging my head on the desk trying to figure this out, and any help (or band-aids!) would be greatly appreciated.

Thanks,
Scott

5:09 am on Mar 18, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


Scott,

Welcome to WebmasterWorld!

The 301 external redirect is not happening because of your mod_rewrite code; all of the rewrites you posted here are internal.

You should look carefully at the action of your script to see if it modifies the server response headers, and also at any other code you may have in httpd.conf or in .htaccess files along the path to the file that exists.

Redirect, RedirectMatch, and some other directives in mod_alias can do 301 redirects.

Somebody, somewhere, is doing an external redirect, and that is what you're seeing.

Jim

6:24 am on Mar 18, 2004 (gmt 0)

New User

10+ Year Member

joined:Mar 18, 2004
posts:4
votes: 0


Thanks Jim,

Like yours, my first reaction was to suspect the script, so I've removed everything from it except for a line to dump the name variable I'm passing in from "b2.php?name=$1":


echo $_GET["name"];
exit;

...if you speak PHP. That's the only application code that is executing. Maybe there's a PHP setting that is triggering (or allowing) the redirect - like output buffering, perhaps. I'll look into that.

There are no other matching rules in my .htaccess file, and since all my rules end in [L], I feel pretty confident saying so. I'm not using mod_alias at all (in the .htaccess, anyway)... the only non-mod_rewrite directives in the file are a PHP configuration setting and some ErrorDocument handlers for 400, 401, 403, 404, and 500. There also aren't any other .htaccess files along that path.

The httpd.conf file has a DirectoryIndex directive that looks for index.php in addition to index.html and I at first thought that maybe there was an index.php script in the directory that was getting executed in addition to the intended "b2.php" script. However, there's nothing in the subdirectory other than graphics. I don't see anything else in httpd.conf that looks suspicious. There are a few Alias directives, but none that would affect these directories.

What I can't figure out is why the b2.php script is still being executed after the redirect. Isn't the execution order basically: 301 -> new request to Apache -> evaluate httpd.conf/.htaccess -> mod_rewrite -> send response? Adding the query string after the trailing slash (e.g., "/subdir/?name=subdir") should cause the request to not match the RewriteRule, but it apparently does still get matched since the script is being executed. This is the only RewriteRule that references the b2.php script, and it obviously is not part of the request itself. This is the main reason I thought mod_rewrite might be the problem. Any thoughts on that?

Thanks again,
Scott

6:45 am on Mar 18, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


> Adding the query string after the trailing slash (e.g., "/subdir/?name=subdir") should cause the request to not match the RewriteRule, but it apparently does still get matched since the script is being executed.

Query strings are not seen as part of the URL by mod_rewrite. They are parameters to be passed to the resource at the URL. So, adding a query string to your URL won't affect whether it matches a RewriteRule pattern. If you do want to test or manipulate query strings, it can be done by using
RewriteCond %{QUERY_STRING} name=([^&]*)
which then makes the name= value available for use in the following RewriteRule's substitution string using %1 as a back-reference.
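
A minimal Python sketch of the point above, assuming Python's `re` behaves like Apache's regex engine for these simple patterns. It only imitates the matching step: the path is matched against the RewriteRule pattern with the per-directory prefix removed, and the query string is examined separately, as a RewriteCond on %{QUERY_STRING} would do. The helper name `match_rule` and the `BASE` constant are made up for illustration.

```python
import re
from urllib.parse import urlsplit

RULE_PATTERN = re.compile(r"^b/([^/.]+)/?$")  # pattern from the RewriteRule under discussion
COND_PATTERN = re.compile(r"name=([^&]*)")    # pattern from the RewriteCond above
BASE = "/app1/"                               # assumed per-directory prefix, stripped before matching


def match_rule(url):
    """Return (rule back-reference $1, condition back-reference %1) for a URL."""
    parts = urlsplit(url)
    # The rule pattern only ever sees the path, never the query string.
    local_path = parts.path[len(BASE):] if parts.path.startswith(BASE) else parts.path
    rule = RULE_PATTERN.match(local_path)
    cond = COND_PATTERN.search(parts.query)
    return (rule.group(1) if rule else None, cond.group(1) if cond else None)


print(match_rule("http://domain.com/app1/b/subdir/"))              # ('subdir', None)
print(match_rule("http://domain.com/app1/b/subdir/?name=subdir"))  # ('subdir', 'subdir')
```

Both URLs produce the same rule match, which is why the redirected request still reaches b2.php.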

If your exists/doesn't exist cases were reversed, I'd say you have a problem in your ErrorDocument directives -- a very common mistake is to use a full URL (with http:// and the domain) instead of a local path, which makes Apache answer with a 302 redirect and drops the 40x or 50x server response code. But that is not the case here.

Jim

9:01 pm on Mar 18, 2004 (gmt 0)

New User

10+ Year Member

joined:Mar 18, 2004
posts:4
votes: 0


Hmm, well I guess that explains that part. I've done some more digging around in my logs, and it looks like Apache is seeing the request for "/app1/b/subdir", realizing that "subdir" is a directory and not a file, and issuing the 301 to append a trailing slash. I'm not sure when, where, or why my .htaccess-generated query string (?name=$1) is getting appended.

I've verified that this is what is happening. Looking at my access logs, requests for "/app1/a" (as in my original post) actually result in a similar 301 redirect to "/app1/a/", which then gets internally rewritten to "/app1/a.php". Requests for "/app1/a/9" instantly return 200 and are internally rewritten to "/app1/a.php?main=9", since there is no subdirectory "9" in that "a" directory. I also have another rule nearly identical to my first "a" rule:


RewriteRule ^c/?$ c.php [L]

The difference is that there is no "c" directory, and so like the "/app1/a/9" requests, these "/app1/c" requests don't get redirected before they are rewritten.

There aren't any mod_rewrite rules anywhere in .htaccess or httpd.conf that would tell it to do this. Is there a way to short-circuit this behavior from my .htaccess file? I know the documentation in the httpd.conf talks about trailing slashes on directories when working with the Alias directive (you know, like "/icon" is not the same as "/icon/"), but it isn't practical for me to create an Alias for each subdirectory since they are created dynamically.

In the meantime, I have come up with a workaround. I've added this directive just after my RewriteBase directive:


RewriteRule ^([^\.]+)([^/])$ $1$2/ [R=301,L]

This normalizes all non-file requests (assuming my directory names won't contain periods, and all my file names will) to have a trailing slash. Basically, I'm forcing that 301 for all requests whether the directory exists or not. I've also added "/?" just before the "$" in all my rules. Everything seems to work now, but I'm wondering if there's a more elegant way to do this. It seems like by forcing the additional 301s, I'm creating a lot of unnecessary requests. Is there a way to just prevent the 301s that occur when the directory exists?
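
A rough Python sketch of which paths the workaround pattern catches, assuming Python's `re` matches these simple patterns the same way Apache does. The function name `normalize` and the sample paths are illustrative only; paths are written relative to the RewriteBase, as the rule sees them.

```python
import re

# Pattern copied from the workaround rule above.
NORMALIZE = re.compile(r"^([^\.]+)([^/])$")


def normalize(local_path):
    """Return the redirect target relative to RewriteBase, or None if the rule does not match."""
    m = NORMALIZE.match(local_path)
    return m.group(1) + m.group(2) + "/" if m else None


for path in ["b/subdir", "b/subdir/", "a/graphic.jpg", "c"]:
    print(path, "->", normalize(path))
# b/subdir -> b/subdir/
# b/subdir/ -> None
# a/graphic.jpg -> None
# c -> None
```

Note that the pattern needs at least two characters, so a single-character path such as "c" is not redirected by this rule and falls through to the later rules.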

9:08 pm on Mar 18, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


Sure, see RewriteCond for:

RewriteCond %{REQUEST_FILENAME} -d

(Your auto-redirect is currently being done by mod_dir, apparently... I forgot about that one.)

Jim
[Edited to correct the environment variable name.]

2:15 am on Mar 19, 2004 (gmt 0)

New User

10+ Year Member

joined:Mar 18, 2004
posts:4
votes: 0


Thanks Jim, mod_dir is indeed the culprit. It looks like I'm just going to have to live with the fact that if there is a matching physical directory, mod_dir is going to fire a 301 to append the trailing slash. I don't see any way around that, but I think I can get the two modules to play nice.

I've replaced my problem rule:


RewriteRule ^b/([^/.]+)/?$ b2.php?name=$1 [L]

With this ruleset:


RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^b/([^/.]+)$ - [L]

RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^b/([^/.]+)/?$ - [L]

RewriteRule ^b/([^/.]+)/$ b2.php?name=$1 [L]

For the benefit of anyone else reading along (I know you don't need me to explain, Jim :) ): the first RewriteCond/RewriteRule pair matches requests for any physical directory that lacks the trailing slash. I let it pass through unaltered so that mod_dir can add the slash for me.

The second pair catches any request that looks like a directory (with or without the trailing slash), but isn't. It passes the request through unaltered and ultimately returns a 404. I probably neglected to mention it before, but what my script does is act as a DirectoryIndex for multiple subdirectories. If the subdirectory isn't there, then I'll let Apache return the 404 rather than unnecessarily executing my script to generate a 404.

The final rule matches anything that ends in a trailing slash and rewrites the request to my script. This should catch any legitimate subdirectory (without a period in the directory name) and let my script run its course.
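
The decision flow of the three rules can be sketched in Python as follows. This is only a simulation under stated assumptions: `EXISTING_DIRS` stands in for the real filesystem check done by `-d`, the function names and return labels are invented for illustration, and paths are relative to the RewriteBase.

```python
import re

# Assumption: stand-in for the directories that physically exist under app1/.
EXISTING_DIRS = {"b", "b/subdir"}


def is_dir(local_path):
    """Imitate the -d test on %{REQUEST_FILENAME}."""
    return local_path.rstrip("/") in EXISTING_DIRS


def route(local_path):
    """Walk the three rules in order and report what happens to the request."""
    # Pair 1: existing directory without trailing slash, left for mod_dir to redirect.
    if re.match(r"^b/([^/.]+)$", local_path) and is_dir(local_path):
        return ("pass", "mod_dir 301")
    # Pair 2: looks like a directory but does not exist, left for Apache to 404.
    if re.match(r"^b/([^/.]+)/?$", local_path) and not is_dir(local_path):
        return ("pass", "404")
    # Final rule: existing directory with trailing slash, rewritten to the script.
    m = re.match(r"^b/([^/.]+)/$", local_path)
    if m:
        return ("rewrite", "b2.php?name=" + m.group(1))
    return ("none", None)


for path in ["b/subdir", "b/subdir/", "b/subdir2", "b/subdir2/"]:
    print(path, "->", route(path))
```

With the assumed directory set, only "b/subdir/" reaches b2.php; "b/subdir" is handed to mod_dir, and both "b/subdir2" forms are passed through to a 404.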

Jim, thanks again for your help and getting me pointed in the right direction!

Scott

 
