Forum Moderators: phranque


Weird Issue with trailing slash

         

nickCR

10:13 pm on Jul 12, 2007 (gmt 0)

10+ Year Member



Hello Forum,

I am having a very strange issue with adding a trailing slash after a directory.

Since I am using .htaccess rewrites for most of the URLs on my site, example 1 (mydomain.com/directory1/directory2) is not the same as example 2 (mydomain.com/directory1/directory2/). The rewrite I have works fine for these cases; however, there is a problem when I request directory1 without the trailing /.

The .htaccess is located in the root of directory1 and contains the following code. When I take this code out of my .htaccess, directory1 gets the slash added and works perfectly. According to the code below it should be affected by this code; however, it is trying to load mydomain.com/directory1//home/public_html/directory1/. Kinda doesn't make sense to me?

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} !.html
RewriteCond %{REQUEST_URI} !.php
RewriteCond %{REQUEST_URI} !(.*)/(.*)/$
RewriteRule ^(.*)/(.*)$ http://example.com/directory1/$1/$2/ [R=301,L]

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} !.html
RewriteCond %{REQUEST_URI} !.php
RewriteCond %{REQUEST_URI} !(.*)/$
RewriteRule ^(.*)$ http://example.com/directory1/$1/ [R=301,L]

Also, which is preferable: R=301,L or L,R=301?

I actually will need the rewrite to add this trailing slash for any depth on my site. Is there a method of doing it without having to make a case for each depth?

Thank you kindly for your time,

Nick

[edited by: nickCR at 10:56 pm (utc) on July 12, 2007]

jdMorgan

11:56 pm on Jul 12, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Not sure I fully understand the problem, but let's fix the "any number of subdirectories" problem, and re-arrange and clean up the code for efficiency:
 
# If Requested URI does not end with ".html", ".php", or "/"
RewriteCond %{REQUEST_URI} !(\.html¦\.php¦/)$
# and if it does not resolve to an existing file
RewriteCond %{REQUEST_FILENAME} !-f
# append a trailing slash
RewriteRule ^(([^/]+/)*)(.*)$ http://example.com/directory1/$1$3/ [R=301,L]

For efficiency, always put off "file exists" and "directory exists" checks and remote hostname lookups until last. They are very inefficient and should be avoided if possible, so we put all the other conditions ahead of these checks.

Replace the broken pipe "¦" characters with solid pipe characters before use; posting on this forum modifies the pipe characters.
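For anyone who wants to try the logic outside Apache, here is a rough Python sketch of the rule above, with the solid pipes restored. It is only an approximation: the function name, its parameters, and the `existing_files` set (standing in for the "-f" filesystem test) are assumptions made for illustration and are not part of mod_rewrite.

```python
import re

# Same regexes as the rule above, with solid pipe characters restored.
SKIP_SUFFIX = re.compile(r"(\.html|\.php|/)$")
RULE_PATTERN = re.compile(r"^(([^/]+/)*)(.*)$")


def add_trailing_slash(local_path, request_uri, existing_files):
    """Return the redirect target, or None when the rule would not fire.

    local_path     -- hypothetical: the path as RewriteRule sees it in
                      directory1/.htaccess (directory prefix stripped)
    request_uri    -- hypothetical: the full requested URL-path
    existing_files -- hypothetical stand-in for the "-f" file test
    """
    match = RULE_PATTERN.match(local_path)
    if match is None:
        return None
    # Cheap string test first.
    if SKIP_SUFFIX.search(request_uri):
        return None
    # "File exists" test last, since it is the expensive one on a real server.
    if request_uri in existing_files:
        return None
    return "http://example.com/directory1/" + match.group(1) + match.group(3) + "/"


files = {"/directory1/robots.txt"}
print(add_trailing_slash("a/b", "/directory1/a/b", files))
print(add_trailing_slash("a/b/", "/directory1/a/b/", files))
print(add_trailing_slash("page.html", "/directory1/page.html", files))
print(add_trailing_slash("robots.txt", "/directory1/robots.txt", files))
```

Only the first call produces a redirect target; the other three return None because the request already ends in a slash, ends in ".html", or names an existing file.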

To resolve nested parentheses (as above) to back-references, count the left parentheses.

The order of the flags [R=301,L] makes no difference. I prefer the order shown here, since it corresponds to "time and order of application" and to the examples given by the author of mod_rewrite, and that's how I learned to order them. Otherwise, it is wholly a matter of style.

Jim

nickCR

4:48 pm on Jul 13, 2007 (gmt 0)

10+ Year Member



Jim,

I really liked your response, thank you.

I would like to ask for your help understanding this part of the code:

^(([^/]+/)*)(.*)$ http://example.com/directory1/$1$3/

Don't quite capture the meaning of ([^/]+/), it looks like it has something to do with the slash but not really certain.

I'm more thrown off by the *)(.*) which should in my experience make $1$2 not $1$3. I just don't understand why $2 isn't used?

Good to know the [R=301,L] can be ordered either way.

I didn't know you could make several cases on one line; however, it makes writing these rules much more efficient, as you explained. With this knowledge I'll attack some of the other longer-than-necessary rules I have.

Can you suggest any good articles on "optimizing" the .htaccess code? In one file I have over 200 lines of rules which I feel may be either longer than necessary or in the wrong order.

Thanks Again.

Nick

phranque

6:29 pm on Jul 13, 2007 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



> I would like to ask for your help understanding this part of the code:
>
> ^(([^/]+/)*)(.*)$ http://example.com/directory1/$1$3/
>
> Don't quite capture the meaning of ([^/]+/), it looks like it has something to do with the slash but not really certain.
>
> I'm more thrown off by the *)(.*) which should in my experience make $1$2 not $1$3. I just don't understand why $2 isn't used?

[^/] means any non-'/' character.
([^/]+/) means one or more non-'/' characters followed by a single '/' character in a marked expression or block.

The marked expressions are numbered by counting left parentheses, and they can be nested.
$1 is (([^/]+/)*)
$2 is ([^/]+/)
$3 is (.*)
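The numbering can be checked with a short Python sketch. Python's `re` module numbers groups the same way (by counting left parentheses), so `group(1)`, `group(2)` and `group(3)` play the role of $1, $2 and $3 here. The sample path "dir2/dir3/page" is just an assumed example.

```python
import re

pattern = re.compile(r"^(([^/]+/)*)(.*)$")
m = pattern.match("dir2/dir3/page")

print(m.group(1))  # outer group: the whole directory path -> dir2/dir3/
print(m.group(2))  # inner group: only its last repetition -> dir3/
print(m.group(3))  # final group: what follows the last slash -> page
```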

sc112

6:45 pm on Jul 13, 2007 (gmt 0)

10+ Year Member



Should there be a / in front?

^/(([^/]+/)*)(.*)$ http://example.com/directory1/$1$3/

jdMorgan

6:47 pm on Jul 13, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The pattern ^(([^/]+/)*)(.*)$ means, "Match one or more characters not a slash, followed by a slash, and accept any number (including zero, but as many as possible) of those 'not-slash-then-slash' sequences, followed by any number of any characters (including zero), but not ending in a slash."

This final "not ending in a slash" is implicit in the construct: If the final URL-part did end in slash, all of its characters would have been matched into the preceding part of the pattern, leaving the last subpattern with nothing to match, and $3 blank as a result. And the "but as many as possible" is implicit in the behaviour of the "*" quantifier, which matches zero or more of anything, but is 'greedy' and will always match as many as possible.

So, the outer parentheses (back-referenced as $1) will therefore contain the entire directory-path up to and including the last slash before any "filename" (any final substring not ending in a slash). The inner parentheses, which could be back-referenced as $2, would contain only the last subdirectory level found, which is why it isn't used. That's also why I said "count left parentheses to determine the back-references."
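A small Python sketch makes the greedy behaviour visible. It assumes Python's `re` module treats this particular pattern the same way Apache's regex engine does, and `split_path` is a hypothetical helper name used only for this illustration.

```python
import re

PATTERN = re.compile(r"^(([^/]+/)*)(.*)$")


def split_path(path):
    """Return ($1, $3) for a path as the RewriteRule pattern would capture them."""
    m = PATTERN.match(path)
    return m.group(1), m.group(3)


print(split_path("a/b/c"))  # ('a/b/', 'c')
print(split_path("a/b/"))   # ('a/b/', '')   trailing slash leaves $3 blank
print(split_path("file"))   # ('', 'file')   zero directory levels
print(split_path(""))       # ('', '')       the bare directory itself
```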

I'm not aware of any books or articles on optimizing mod_rewrite. I developed my opinions from reviewing mod_rewrite's source code, from understanding how regular expressions are processed, and from having written command-parsing routines many years ago. Because of the complexity of mod_rewrite and regular-expressions, because of the millions of ways it might be applied, and because of the difficulty in even trying to name some of the elements and concepts involved, writing about it might be a daunting task. So mostly, you'll just find general rules of thumb on the subject. I believe in "making the computer do the work" and prefer readable code over maximally-efficient code -- as long as it's reasonably efficient. Therefore, I limit my efficiency crusade to warnings about multiple ".*" subpatterns in patterns, avoiding unexpected rule recursion, and using start- and/or end-anchoring whenever possible to avoid ambiguity.

Perhaps a Google search may turn up something useful.

Jim

jdMorgan

6:50 pm on Jul 13, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> Should there be a / in front?

Only for use in httpd.conf, conf.d, or some other server-config-level file. Never in .htaccess, where the path to the current directory is always stripped from the URL-path seen by RewriteRule. See note concerning "full URL-path" in RewriteRule documentation notes section.
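To illustrate the difference, here is a rough Python sketch. The helper `strip_directory_prefix` is hypothetical and only imitates the prefix stripping described above; it is not how Apache is implemented.

```python
import re

WITH_SLASH = re.compile(r"^/(([^/]+/)*)(.*)$")
WITHOUT_SLASH = re.compile(r"^(([^/]+/)*)(.*)$")


def strip_directory_prefix(url_path, directory_url):
    """Hypothetical helper: drop the per-directory prefix from a URL-path."""
    prefix = directory_url.rstrip("/") + "/"
    if url_path.startswith(prefix):
        return url_path[len(prefix):]
    return url_path


server_view = "/directory1/dir2/page"  # what a server-config rule would see
htaccess_view = strip_directory_prefix(server_view, "/directory1")

print(htaccess_view)                                   # dir2/page
print(WITH_SLASH.match(htaccess_view) is not None)     # False
print(WITHOUT_SLASH.match(htaccess_view) is not None)  # True
print(WITH_SLASH.match(server_view) is not None)       # True
```

The pattern with the leading slash only matches the unstripped, server-level path, which is why it never fires in .htaccess.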

Jim

nickCR

6:15 pm on Jul 14, 2007 (gmt 0)

10+ Year Member



Thank you all for your responses on this matter.

This will match any directory level, correct?

Let me see if I understand all this right; please correct me if I'm wrong.

Instead of using (.*)(.*), which would match anything, we use ([^/]+/), which specifically searches for anything that "doesn't" include a forward slash, thus the [^/]. The +/, from what I understand, allows a / within the string as long as it's not on the end, which allows this to be used for any level on the site.

Just trying to clearly understand this syntax so I'm not just "copying & pasting".

Thanks again.

Nick

nickCR

6:17 pm on Jul 14, 2007 (gmt 0)

10+ Year Member



One more question. I have several .htaccess files on different levels of the site.

They do not load one after the other, right? For example:

If I'm in root, it will load the .htaccess in the root folder.

If I'm in root/dir1/, it will load the .htaccess in dir1 only, not the .htaccess from root followed by the .htaccess from dir1?

jdMorgan

7:31 pm on Jul 14, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



See the Apache mod_rewrite documentation of the RewriteOptions [httpd.apache.org] directive for the answer to your question about multiple- .htaccess file application.

As for the regex pattern, I described it fully above. For more information about regular expressions, see the tutorial cited in our forum charter [webmasterworld.com].

Jim

nickCR

8:12 pm on Jul 14, 2007 (gmt 0)

10+ Year Member



Thanks, Jim and others, for all your help. I will read more to understand it better.