rewrite trouble from inside a folder


omoutop

3:35 pm on Feb 4, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hello all, and thanks for any tips.

I have a main .htaccess file in the root directory.
It contains this rule:
RewriteRule ^([^/]+)/([^/]+)/([^/]+)/([^-]+)-foo-([^.]+)\.htm$ folder/some_page.php?var1=$1&var2=$2&var3=$3&var4=$4&var5=$5 [NC,L]

Now I want to create the following rewrite rule inside a specific folder, in that folder's own .htaccess file, so that it applies only from that folder down:

RewriteRule ^folder3/some-page\.htm$ folder3/some-page-1.htm [R=301,L]

folder3 is two folders deep (folder1/folder2/folder3).

So, based on the first rule, I create one rewritten URL:
folder1/folder2/folder3/some-page-1.htm

And the second rule is needed to redirect an existing htm page to a dynamic one.

I hope this makes sense.

The second rule does not work.
I understand that I must put part of the first rule inside the .htaccess file of that folder, but everything I try fails.

The second (folder-level) .htaccess file is:
RewriteEngine on
#rule here

jdMorgan

4:58 pm on Feb 4, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You should put the redirect in the top-level .htaccess file, and place it before the internal rewrite.

If you do not do this, the internal rewrite will occur first. Then, if the newly rewritten path matches the redirect rule in the lower-level directory, the resulting external redirect will expose your internal filepath as a URL to the client. This will likely make a mess of your search engine listings.

An alternative is to set "RewriteOptions none" in the lower-level directory's .htaccess file. But you will then have to reproduce in that lower-level .htaccess file every rule from the top-level .htaccess file that you want applied to this directory, because it will no longer 'inherit' those top-level rules (see RewriteOptions Inherit in the Apache mod_rewrite documentation).

As most sites have dozens (possibly hundreds) of rules that should be applied to all requests (e.g. access-control and canonicalization rules), this could turn into a maintenance nightmare...
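For illustration only, here is a rough sketch of what that duplication could look like, assuming the redirect lives in folder1/folder2/folder3/.htaccess and that the root-level internal rewrite is the only rule that needs copying. The folder names are this thread's placeholders, www.example.com stands in for the real host, and "var5" is assumed where the original query string repeats "var1". The copied rule has to be adapted by hand, because patterns in a per-directory .htaccess file are matched against the path relative to that directory:

```apache
# folder1/folder2/folder3/.htaccess (hypothetical sketch)
RewriteEngine on
#
# By default (no "RewriteOptions Inherit"), the root .htaccess rules are
# not applied to this directory, so everything it needs must be repeated.
#
# 1. External redirect first. The pattern is relative to this directory,
#    so the "folder1/folder2/folder3/" prefix is not part of it.
RewriteRule ^some-page\.htm$ http://www.example.com/folder1/folder2/folder3/some-page-1.htm [R=301,L]
#
# 2. Hand-adapted copy of the root-level internal rewrite. The three
#    folder names the root rule captured as $1-$3 are now fixed strings.
RewriteRule ^([^-]+)-foo-([^.]+)\.htm$ /folder/some_page.php?var1=folder1&var2=folder2&var3=folder3&var4=$1&var5=$2 [NC,L]
```

Every later change to the root rule would have to be repeated in this copy, and in every other directory that gets its own copy.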

When viewed overall, all external redirects must come first, ordered from most-specific to least-specific, followed by all internal rewrites, again ordered from most-specific to least-specific. This prevents unexpected operation, "chained" multiple redirects, and exposure of your internal filepaths as URLs.
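A minimal sketch of that ordering in the root .htaccess file, assuming the thread's placeholder paths, www.example.com as the host, and "var5" in place of the repeated "var1" in the original query string:

```apache
# Root .htaccess (hypothetical sketch)
RewriteEngine on
#
# 1. External redirects first, most-specific to least-specific.
#    A full URL is used as the target, as in the later examples in this thread.
RewriteRule ^folder1/folder2/folder3/some-page\.htm$ http://www.example.com/folder1/folder2/folder3/some-page-1.htm [R=301,L]
#
# 2. Internal rewrites afterwards, also most-specific to least-specific.
RewriteRule ^([^/]+)/([^/]+)/([^/]+)/([^-]+)-foo-([^.]+)\.htm$ folder/some_page.php?var1=$1&var2=$2&var3=$3&var4=$4&var5=$5 [NC,L]
```

Because the redirect runs before the rewrite, the client is sent to the new public URL, and only on that next request does the internal rewrite map it to the script.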

Jim

omoutop

7:22 am on Feb 5, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



jdMorgan wrote: "You should put the redirect in the top-level .htaccess file, and place it before the internal rewrite."

Won't this affect performance? We are talking about 1,000 rules to be added that way.
That's why I wanted to split the rules into separate folders, with each folder's .htaccess file holding 10-15 rules.

If there are no performance issues, then yes, keeping all rules in one file is easier to maintain and control.

jdMorgan

3:43 pm on Feb 5, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Of course there is a performance effect. You may be able to optimize things, depending on common attributes of the URLs to be redirected. But here, the choice is between three bad options:
  1. Put the redirects in subfolders, where they will either expose internally-rewritten filepaths as URLs due to execution order, or require that you disable 'RewriteOptions Inherit' and reproduce, modify, and maintain an additional copy of all rules in the main .htaccess file which might apply to this subdirectory.
  2. Suffer a slight to moderate performance reduction by putting the rules in the main .htaccess file, depending on how much effort you put into optimizing the rules and how effective that optimization is.
  3. Give up the idea of redirecting these URLs, and keep this situation in mind the next time you're considering changing any URLs. URLs should never change, and URLs do not have to change just because the filenames change. If, over time, more than 1% of your URLs have ever changed, then that's too many.


As an example of "optimization," consider the following two alternative code snippets:

# 100 brute-force redirects
RewriteRule ^folder3/some-page1\.htm$ http://www.example.com/folder3/some-other-page-1.htm [R=301,L]
RewriteRule ^folder3/some-page2\.htm$ http://www.example.com/folder3/some-other-page-2.htm [R=301,L]
...
RewriteRule ^folder3/some-page99\.htm$ http://www.example.com/folder3/some-other-page-99.htm [R=301,L]
RewriteRule ^folder3/some-page100\.htm$ http://www.example.com/folder3/some-other-page-100.htm [R=301,L]

In this case, 100 rules are processed for every request to the server.


# Skip 100 brute-force redirects unless 'folder3' requested
RewriteRule !^folder3/ - [S=100]
#
# 100 brute-force redirects
RewriteRule ^folder3/some-page1\.htm$ http://www.example.com/folder3/some-other-page-1.htm [R=301,L]
RewriteRule ^folder3/some-page2\.htm$ http://www.example.com/folder3/some-other-page-2.htm [R=301,L]
...
RewriteRule ^folder3/some-page99\.htm$ http://www.example.com/folder3/some-other-page-99.htm [R=301,L]
RewriteRule ^folder3/some-page100\.htm$ http://www.example.com/folder3/some-other-page-100.htm [R=301,L]

In this case, only one rule is processed for most requests to the server, and an average of 50 rules is processed for every request starting with "folder3". This can be improved a bit more by ordering the redirects from most-requested to least-requested "folder3/" URLs (check your server stats).

The second option works well as long as all of the URLs have something in common, such as starting with "folder3" in this case, and as long as you count the number of rules to be skipped correctly and update that count whenever the number of rules changes.
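Since the redirects discussed here are spread across several folders at 10-15 rules each, the same skip idea can be repeated once per folder. A sketch, assuming hypothetical folder and file names (folder4, old-a.htm, and so on) and showing only a few rules per group; each [S=N] count must equal the number of RewriteRules in its own group (comment lines do not count):

```apache
# Root .htaccess, redirect section (hypothetical sketch)
#
# Group 1: skip the next 3 rules unless "folder3/" was requested
RewriteRule !^folder3/ - [S=3]
RewriteRule ^folder3/some-page1\.htm$ http://www.example.com/folder3/some-other-page-1.htm [R=301,L]
RewriteRule ^folder3/some-page2\.htm$ http://www.example.com/folder3/some-other-page-2.htm [R=301,L]
RewriteRule ^folder3/some-page3\.htm$ http://www.example.com/folder3/some-other-page-3.htm [R=301,L]
#
# Group 2: skip the next 2 rules unless "folder4/" was requested
RewriteRule !^folder4/ - [S=2]
RewriteRule ^folder4/old-a\.htm$ http://www.example.com/folder4/new-a.htm [R=301,L]
RewriteRule ^folder4/old-b\.htm$ http://www.example.com/folder4/new-b.htm [R=301,L]
#
# Internal rewrites follow after all redirect groups.
```

For a request outside these folders, only one skip rule per group is evaluated, rather than every redirect rule.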

Then there's this approach, which doesn't require creating or maintaining a 'skip rule' but is complex, and best maintained only by detail-oriented people who are very comfortable with mod_rewrite:

RewriteCond $1>some-other-page-1.htm ^some-page1\.htm>(.+)$ [OR]
RewriteCond $1>some-other-page-2.htm ^some-page2\.htm>(.+)$ [OR]
...
RewriteCond $1>some-other-page-99.htm ^some-page99\.htm>(.+)$ [OR]
RewriteCond $1>some-other-page-100.htm ^some-page100\.htm>(.+)$
RewriteRule ^folder3/(.+)$ http://www.example.com/folder3/%1 [R=301,L]

Here we take advantage of the fact that RewriteConds are not processed unless the RewriteRule pattern matches, and that RewriteConds can test against values composed of fixed strings, server variables, and back-references (here, $1 from the RewriteRule pattern).

The stuff on the 'left side' of each RewriteCond is a back-reference ($1) to the requested URL-path-part captured by the RewriteRule pattern, followed by the 'replacement' URL-path. The stuff on the right side is a pattern to match that requested URL-path-part, plus a generic pattern to 'capture' the replacement URL-path-part and create a back-reference (%1) to it for use in the RewriteRule substitution.

The ">" character is arbitrary, and functions only as a delimiter so that the RewriteCond values can be parsed quickly and reliably. Any "rare" character that cannot be sent in a URL without being URL-encoded will do here. You could use "<" just as well.

Again, the efficiency of this method can be improved by ordering the RewriteConds from most-requested URL-paths to least-requested based on your stats.

Note that the last RewriteCond must not have an [OR] flag on it!

This method allows you to build a 'translation table' with "the new sub-paths on the left and the old sub-paths on the right." But like I said, this kind of thing needs to be maintained by someone who understands it and can get all the details right every time.

If you have server config-level access, a final (and much better) option would be to use a RewriteMap, triggered by a request for a specific folder path so that it only executes when needed.
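A sketch of that RewriteMap approach, assuming access to httpd.conf or a <VirtualHost> block (RewriteMap cannot be declared in .htaccess). The map name "folder3map", the file path, and the host are hypothetical:

```apache
# Server config or <VirtualHost> (hypothetical sketch)
RewriteEngine on
RewriteMap folder3map txt:/etc/apache2/folder3-redirects.txt
#
# Contents of folder3-redirects.txt: one "old-name new-name" pair per line, e.g.
#   some-page1.htm  some-other-page-1.htm
#   some-page2.htm  some-other-page-2.htm
#
# In server context the URL-path keeps its leading slash. The map is only
# consulted for "/folder3/" requests, and the redirect only fires when the
# map holds an entry for the requested name.
RewriteCond ${folder3map:$1|NOT_FOUND} !^NOT_FOUND$
RewriteRule ^/folder3/(.+)$ http://www.example.com/folder3/${folder3map:$1} [R=301,L]
```

The translation table then lives in one plain file instead of hundreds of rules. For very large tables, mod_rewrite also accepts dbm: maps, which can be generated from a text map with the httxt2dbm utility shipped with Apache 2.2 and later.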

Jim