Chaining multiple mod_rewrite rules

         

akatinic

4:48 am on Nov 17, 2007 (gmt 0)

10+ Year Member



Hi everyone,

I'm trying to figure out a more efficient way of doing something with mod_rewrite. I have to rewrite a series of old URLs to new URLs using regexp substitution, converting all underscores to hyphens in the process. I don't have access to httpd.conf, so everything has to be done in .htaccess and hence without RewriteMap.

The old URLs look like this: [url.tld...]

The new URLs look like this:
[url.tld...]

There are, say, a dozen different sections. The old section and page names have varying numbers of underscores, but the total number of underscores in an old URL is never more than six.

So far the best thing I've come up with is to convert old section names to new section names in one step, then convert underscores to hyphens in a second step, e.g.

RewriteRule old/section/name/(.*)$ [url.tld...] [R=301,L]
[followed by 11 more rules for other sections]

RewriteRule ([^_]+)_([^_\.]+).html [url.tld...] [R=301,L]
[followed by more rules covering the other possible numbers of underscores]

I like this because I only have to use 12+6=18 rules, instead of 12x6=72 rules were I to cover every possible combination of section name and underscore-hyphen rewrites in a unique rule.

What's inefficient about this (and potentially annoying to users) is that the only way I've gotten it to work is by using [R=301,L] after the first *and* second sets of rules, thus sending a partially rewritten URL back to the browser and only giving the correct URL on the second try. Which is ugly.

I have tried removing the flags from the rules in the first stage, but when I do that I end up with a completely unrewritten URL. How do I get it to continue rewriting after the first stage, and only send the fully rewritten URL back to the user when it's finished?

Many thanks,
akatinic

jdMorgan

4:34 pm on Nov 17, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Try changing the first rule(s) to internal rewrites with no [L] flag. It is possible, however, that this approach may trigger a bug in Apache mod_rewrite [searchengineworld.com]. If this happens, you will see part of the URL-path 'duplicated' at the end of the rewritten URL-path. There is a work-around for this but it is rather ugly. So test the simple approach first.

RewriteRule ^old/section/name/(([^_]+_)+[^_\.]+\.html)$ this-is-the-new-section-name/$1
RewriteRule ^([^_]+)_([^_\.]+)\.html$ http://www.example.com/$1-$2.html [R=301,L]

Note also that I've introduced the generalized underscored-URL-path pattern to the section-name rewrite rule so that you can be sure that any URL-path that gets internally rewritten by this first rule will later get externally redirected by the second. This avoids a potential duplicate-content problem that would arise if only the first rule were applied to the requested URL.
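
To cover page names with more than one underscore, one option is a separate redirect rule per underscore count, as the original post describes. The following is only a sketch: www.example.com and the section names are the placeholders already used above, and the two- and three-underscore patterns are assumptions about how the remaining rules might look, not tested rules. It also assumes RewriteEngine is already on.

```apache
# Stage 1: internal rewrite of the section name (no flags, so processing continues).
RewriteRule ^old/section/name/(([^_]+_)+[^_\.]+\.html)$ this-is-the-new-section-name/$1

# Stage 2: external redirect, one rule per underscore count (assumed patterns).
# Each [^_]+ group captures one underscore-free segment of the path.
RewriteRule ^([^_]+)_([^_\.]+)\.html$ http://www.example.com/$1-$2.html [R=301,L]
RewriteRule ^([^_]+)_([^_]+)_([^_\.]+)\.html$ http://www.example.com/$1-$2-$3.html [R=301,L]
RewriteRule ^([^_]+)_([^_]+)_([^_]+)_([^_\.]+)\.html$ http://www.example.com/$1-$2-$3-$4.html [R=301,L]
```

Because a rule only has back-references $1 through $9, this pattern style can handle the stated maximum of six underscores (seven groups) but would not scale much further.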

Jim

akatinic

9:45 pm on Nov 18, 2007 (gmt 0)

10+ Year Member



Jim,

Many thanks for the suggestions. I tried leaving the [L] flag off the first set of rules, but it didn't seem to go on to run the second set correctly. In the end, I got the following to work:

# 1. Rewrite underscores to hyphens.
RewriteRule ^(^([^_]+)_(.+)$ $1-$2 [L]

# 2. Rewrite section names.
RewriteRule ^old-section-name(.*)$ new-section-name$1 [R=301,L]
# Followed by additional section-specific rules.

I only used rule #1 once, but with the [L] flag (a) Apache reruns the single rule until it's replaced all underscores in a URL with hyphens and (b) it only generates a 301 when it gets to rule(s) #2. This makes no sense to me, as I thought [L] was supposed to terminate rewriting, not redo it. It didn't work without the flag, and it didn't work with the [N] flag!

Happy I got it working, but a bit confused,
akatinic

akatinic

9:47 pm on Nov 18, 2007 (gmt 0)

10+ Year Member



Oops. Typo in rule #1 above. Should be:

RewriteRule ^([^_]+)_(.+)$ $1-$2 [L]

jdMorgan

12:13 am on Nov 19, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The [L] flag does stop rule processing, but only for the current pass through the .htaccess file. The missing piece, which isn't explicitly documented, is that when processing .htaccess files, Apache re-runs the mod_rewrite code whenever a rewrite is invoked, so that access controls on the new URL can be checked. This is needed to prevent lower-level .htaccess files from being used to gain access to higher-level files by rewriting URLs to avoid access control restrictions.

The [N] flag may not have worked because of the bug I mentioned, which can cause the rewritten URL-path to become 'corrupted' as described.
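
As a rough illustration of the re-run behavior described above, here is how the two working rules would be expected to treat one request. The path old_section_name/page_one.html is made up for the example, and the pass-by-pass trace is an assumption based on the explanation in this thread, not output captured from a rewrite log.

```apache
# Rule 1: replace the first underscore in the path, then end this pass.
RewriteRule ^([^_]+)_(.+)$ $1-$2 [L]
# Rule 2: once no underscores remain, map the section name and redirect.
RewriteRule ^old-section-name(.*)$ new-section-name$1 [R=301,L]

# Hypothetical request: old_section_name/page_one.html
# Pass 1: rule 1 matches -> old-section_name/page_one.html   (internal, re-run)
# Pass 2: rule 1 matches -> old-section-name/page_one.html   (internal, re-run)
# Pass 3: rule 1 matches -> old-section-name/page-one.html   (internal, re-run)
# Pass 4: rule 1 no longer matches; rule 2 matches and sends a single 301
#         for new-section-name/page-one.html
```

Each internal re-run should count against Apache's internal redirect limit (LimitInternalRecursion, 10 by default), so this approach relies on the stated maximum of six underscores per URL.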

Jim

akatinic

2:28 am on Nov 19, 2007 (gmt 0)

10+ Year Member



I didn't know that, but it makes sense...and may explain other odd mod_rewrite behavior I've observed. Thanks for all the help. --akatinic