

indeterminate parameter redirect

dynamic to static url with 301 redirect and "random" parameter order

         

vinnydtm

10:40 pm on Jul 20, 2007 (gmt 0)




Hi,

I've read the thread Changing dynamic to static URLs [webmasterworld.com], and some of the articles on rewrites here. That is essentially what I need to do. But there is one twist that I can't quite figure out.

In the thread, the example seems to assume your parameters will always be in the same order:


http://example.com/mypage.html?p1=v1&p2=v2&p3=v3

However, in my case, there is no such guarantee. All of the following, for example:


http://example.com/mypage.html?p1=v1&p2=v2&p3=v3
http://example.com/mypage.html?p2=v2&p3=v3&p1=v1
http://example.com/mypage.html?p1=v1&p3=v3&p2=v2&

(there may be a trailing "&", as in the last example) need to map to the SAME 301 redirect target:

http://example.com/v1/v2/v3/mypage.html

Does this mean I need some kind of chain of RewriteRules? I can do this when I'm picking out only one parameter, but the additional parameters add a twist (it may not be a real problem, but I'm new at this).

To add to this, the result may be a mixed URL. That is, I can have:


http://example.com/mypage.html?p2=v2&p3=v3&p1=v1&p4=v4

or

http://example.com/mypage.html?p4=v4&p2=v2&p3=v3&p1=v1

both of which need to end up as:


http://example.com/v1/v2/v3/mypage.html?p4=v4

That is, any parameters I'm not picking out must remain in the query string, and those I have picked out must be removed from it.
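For reference, the mapping described above can be pinned down as a small function. This is only a sketch in Python to state the expected result (for example, to generate test cases for whatever rewrite rules end up being used); it is not a mod_rewrite solution. The function name `expected_redirect`, the `PATH_PARAMS` tuple, and the default host and page are assumptions made for illustration.

```python
from urllib.parse import parse_qsl, urlencode

# Parameters that move into the path, in path order (taken from the examples above).
PATH_PARAMS = ("p1", "p2", "p3")


def expected_redirect(query_string, host="example.com", page="mypage.html"):
    """Return the 301 target for a query string, or None if p1..p3 are not all present."""
    # parse_qsl drops empty pairs, so a trailing "&" is harmless.
    pairs = parse_qsl(query_string, keep_blank_values=True)
    values = dict(pairs)
    if any(name not in values for name in PATH_PARAMS):
        return None
    path = "/".join(values[name] for name in PATH_PARAMS)
    # Everything that was not picked out stays in the query, in its original order.
    leftover = [(k, v) for k, v in pairs if k not in PATH_PARAMS]
    url = f"http://{host}/{path}/{page}"
    if leftover:
        url += "?" + urlencode(leftover)
    return url
```

Any ordering of the same parameters, with or without a trailing "&", should give the same target from this function.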

Is this possible with mod_rewrite? Or should I really be looking at a completely different solution?

Thanks!

Vince

[edited by: jdMorgan at 4:19 am (utc) on July 21, 2007]
[edit reason] example.com [/edit]

phranque

11:08 pm on Jul 20, 2007 (gmt 0)




welcome to WebmasterWorld, vince!

i would internally rewrite the url to a "canonical" form:

http://www.example.com/mypage.html?p1=v1&p2=v2&p3=v3&otherparams=othervalues

using 3 steps to move p3 then p2 then p1 to the front of the query string and then externally redirect to the "static" url:
http://www.example.com/v1/v2/v3/mypage.html?otherparams=othervalues

you can access the query string using RewriteCond and backreference grouped parts with %N syntax in subsequent RewriteCond/RewriteRule directives.
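The three "move to the front" steps can be tried out on sample query strings before writing any directives. The following is a minimal sketch in Python, not mod_rewrite: `move_to_front` and `canonical_query` are hypothetical helper names, and the regular expression is only one way such a step might be written. Groups 1 to 3 play the role that %1 to %3 would play after a RewriteCond.

```python
import re


def move_to_front(name, query):
    """One step: move name=value to the front, keeping the other pairs in order."""
    m = re.match(r'^(.*&)?(' + re.escape(name) + r'=[^&]*)&?(.*)$', query)
    if m is None:
        return query  # parameter not present: leave the query untouched
    before = m.group(1) or ""   # everything ahead of the parameter, ending in "&"
    after = m.group(3)          # everything behind it
    rest = (before + after).strip("&")
    return m.group(2) + ("&" + rest if rest else "")


def canonical_query(query):
    """Apply the three steps in the order given above: p3, then p2, then p1."""
    for name in ("p3", "p2", "p1"):
        query = move_to_front(name, query)
    return query
```

After the three steps, p1, p2 and p3 lead the query string in that order and any other parameters follow, so a single pattern can pick them out for the external redirect.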

jdMorgan

4:06 am on Jul 21, 2007 (gmt 0)




Yes, that's a mess you're in...

Here's what I'd try:


RewriteCond %{QUERY_STRING} ^(([^&]+&)*)p1=([^&]+)((&[^&]+)*)&?$
RewriteRule ^mypage\.html$ %3/mypage.html?%1%4 [C]
RewriteCond %{QUERY_STRING} ^(([^&]+&)*)p2=([^&]+)((&[^&]+)*)$
RewriteRule ^([^/]+)/mypage\.html$ $1/%3/mypage.html?%1%4 [C]
RewriteCond %{QUERY_STRING} ^(([^&]+&)*)p3=([^&]+)((&[^&]+)*)$
RewriteRule ^([^/]+)/([^/]+)/mypage\.html$ http://www.example.com/$1/$2/%3/mypage.html?%1%4 [R=301,L]

Hopefully, you won't fall victim to a nasty Apache mod_rewrite bug (see http://archive.apache.org/gnats/7879 [archive.apache.org]) which affects sequential rewrites. The bug was supposed to be fixed in Apache 2.x, but my testing indicates that it remains. If you do encounter this bug, you'll have to use a rather complicated approach to avoid it. The work-around involves setting environment variables instead of repeatedly rewriting the URL at each step, and we can discuss that approach if it's necessary. It works, but it's ugly; this thread [webmasterworld.com] demonstrates the technique.

Jim

phranque

9:57 pm on Jul 21, 2007 (gmt 0)




vin:

i'd like to change my answer to what he said...

vinnydtm

5:16 pm on Jul 26, 2007 (gmt 0)




Thanks for the pointers. I don't think I have the bug, but I was working on another set of rewrites (yeah, all new to this and the next month or so is mostly Apache rewrite work!) and checked out that other link on fixing duplicate content & URL issues [webmasterworld.com]. (I needed to replace all occurrences of non-alphanum with "-" and then redirect, but that's another topic).

Maybe it's just my programmer background, but I tried implementing it like this, using environment variables. To recap more precisely, I'm mapping

http://example.com/search/?p2=v2&p1=v1&p4=v4&p3=v3&
http://example.com/search/?p1=v1&p4=v4&p3=v3&p2=v2
http://example.com/search/?p1=v1&p2=v2&p3=v3&p4=v4

all to

http://example.com/search/v1/v2/v3?p4=v4


#-- process only requests for "/search"; for anything else,
#-- skip the next four rules
RewriteCond %{REQUEST_URI} !^/search[/]*$ [NC]
RewriteRule .? - [S=4]

#-- extract the parameters
RewriteCond %{QUERY_STRING} ^(([^&]+&)*)p1=([^&]+)&?(([^&]+&?)*)&?$ [NC]
RewriteRule .? - [E=pV1:%3,E=myQS:%1%4]

RewriteCond %{ENV:myQS} ^(([^&]+&)*)p2=([^&]+)&?(([^&]+&?)*)$ [NC]
RewriteRule .? - [E=pV2:%3,E=myQS:%1%4]

RewriteCond %{ENV:myQS} ^(([^&]+&)*)p3=([^&]+)&?(([^&]+&?)*)$ [NC]
RewriteRule .? - [E=pV3:%3,E=myQS:%1%4]

#-- reassemble in single redirect
RewriteRule . [%{HTTP_HOST}...] [R=301,L]

This seems to work, and it seems cleaner and clearer to me. Basically, I'm extracting the parameters I need and then reassembling them for the redirect. If I were writing a program, this is probably what I'd do. I could also set a "default" value for a variable first, like:


#-- default value
RewriteRule .? - [E=pV3:defaultV3]

#-- extract the parameter
RewriteCond %{ENV:myQS} ^(([^&]+&)*)p3=([^&]+)&?(([^&]+&?)*)$ [NC]
RewriteRule .? - [E=pV3:%3,E=myQS:%1%4]
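The extract-then-reassemble flow above can be simulated outside Apache to see what each variable would hold. This is a rough sketch in Python, assuming its `re` module treats these simple patterns the same way Apache's regex engine does. The helper names `pattern_for` and `simulate` are made up for illustration, the same pattern shape (the one from the p1 rule) is used for all three steps, and the final redirect is deliberately left out.

```python
import re

# (parameter name, environment variable) pairs, in the order the rules run.
STEPS = (("p1", "pV1"), ("p2", "pV2"), ("p3", "pV3"))


def pattern_for(name):
    """Pattern shape taken from the p1 rule above; re.IGNORECASE stands in for [NC]."""
    return re.compile(
        r'^(([^&]+&)*)' + re.escape(name) + r'=([^&]+)&?(([^&]+&?)*)&?$',
        re.IGNORECASE,
    )


def simulate(query_string, defaults=None):
    """Return the variables the three extraction rules would leave behind."""
    env = dict(defaults or {})  # e.g. {"pV3": "defaultV3"} for the default-value rule
    for step, (name, var) in enumerate(STEPS):
        # The first rule tests QUERY_STRING; the later ones test myQS as set so far.
        subject = query_string if step == 0 else env.get("myQS", "")
        m = pattern_for(name).match(subject)
        if m:
            env[var] = m.group(3)                   # E=pVn:%3
            env["myQS"] = m.group(1) + m.group(4)   # E=myQS:%1%4
    return env
```

With "p1=v1&p2=v2&p3=v3&p4=v4" the leftover myQS comes out as "p4=v4". With "p2=v2&p1=v1&p4=v4&p3=v3&" it comes out as "p4=v4&", so it may be worth checking whether a trailing "&" in the redirected URL matters. When p3 is missing, a default set beforehand stays in place because the p3 condition never matches.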

Now, a couple more questions:

1) Is there any reason to choose the first way (with chaining) over the environment-variable approach, such as performance? I'm dealing with a potentially busy site.

2) I notice that in the "catch-all" RewriteRule, sometimes "." is used and sometimes ".?" is used. I understand the regex "." to mean any one character, while ".?" could also match nothing at all. Is there a reason to use one over the other, given that in our case we aren't really matching anything and only mean "apply this rule"?

Thanks!

Vince

[edited by: jdMorgan at 6:41 pm (utc) on July 26, 2007]
[edit reason] Disable BBcode smilies [/edit]

jdMorgan

6:40 pm on Jul 26, 2007 (gmt 0)




1)
Only performance testing under load can answer this.

2)
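The "." pattern matches exactly one character, but since it's un-anchored it succeeds on any URL-path that contains one or more characters.
The ".*" pattern matches zero or more characters, and the ".?" pattern you used matches zero or one; un-anchored, both succeed on any URL-path, including an empty one.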

Examining the context of the rules you're referring to -- some with "." and some with ".*" -- you will find that some have RewriteConds that will be false if there isn't at least one character in the URL-path, while others will match either way. Since RewriteConds are not processed unless the RewriteRule pattern matches (See "Rule Processing" in the Apache docs), we might as well use the RewriteRule pattern to avoid parsing the RewriteCond(s) if there is no chance that the RewriteCond(s) will match.
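The difference between the catch-all patterns is easy to see against an empty and a non-empty URL-path. A small sketch in Python, assuming `re.search` as a stand-in for the un-anchored matching that a RewriteRule pattern does; `rule_pattern_matches` is a hypothetical helper name.

```python
import re


def rule_pattern_matches(pattern, url_path):
    """True if an un-anchored pattern would match the given URL-path."""
    return re.search(pattern, url_path) is not None


for pattern in (".", ".?", ".*"):
    # Compare an empty URL-path with a non-empty one.
    print(pattern, rule_pattern_matches(pattern, ""), rule_pattern_matches(pattern, "search/"))
```

Only "." fails on the empty URL-path; ".?" and ".*" match both, so for an "always apply" rule those two behave the same.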

Regarding your exact implementation, I'd say that

 RewriteCond %{QUERY_STRING} ^(([^&]+&)*)p1=([^&]+)((&[^&]+)*)&?$ [NC] 

would be more efficient and less ambiguous than
 RewriteCond %{QUERY_STRING} ^(([^&]+&)*)p1=([^&]+)&?(([^&]+&?)*)&?$ [NC] 
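One point worth checking before swapping one pattern for the other: both capture the same p1 value in %3, but they differ in where the "&" separator ends up, so the %1%4 concatenation is not identical. The sketch below compares them in Python, on the assumption that Python's `re` behaves like Apache's regex engine for these simple patterns; `FIRST`, `SECOND` and `value_and_leftover` are names made up for the comparison.

```python
import re

# The two RewriteCond patterns quoted above; re.IGNORECASE stands in for [NC].
FIRST = re.compile(r'^(([^&]+&)*)p1=([^&]+)((&[^&]+)*)&?$', re.IGNORECASE)
SECOND = re.compile(r'^(([^&]+&)*)p1=([^&]+)&?(([^&]+&?)*)&?$', re.IGNORECASE)


def value_and_leftover(pattern, query_string):
    """Return (%3, %1%4) for a query string, or None if the pattern does not match."""
    m = pattern.match(query_string)
    if m is None:
        return None
    return m.group(3), m.group(1) + m.group(4)


for qs in ("p1=v1&p2=v2", "p2=v2&p1=v1&p3=v3"):
    print(qs, value_and_leftover(FIRST, qs), value_and_leftover(SECOND, qs))
```

For "p1=v1&p2=v2" the first form leaves "&p2=v2" and the second leaves "p2=v2"; for "p2=v2&p1=v1&p3=v3" they leave "p2=v2&&p3=v3" and "p2=v2&p3=v3" respectively. If the leftover is fed into a further extraction step, as in the E=myQS rules, a leading or doubled "&" may keep the next pattern from matching, so the first form might need the separator handled some other way.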

Jim

[edited by: jdMorgan at 6:40 pm (utc) on July 26, 2007]