Forum Moderators: Ocean10000 & incrediBILL & phranque

Query strings in RewriteMap

     
12:18 am on Jan 21, 2015 (gmt 0)

Junior Member

5+ Year Member

joined:Apr 3, 2009
posts:57
votes: 0


I'm working on adding a lot of 1:1 redirects for a site migration and the list I've been given contains a lot of URLs with query strings they'd like redirected. I'd like to use a single rewrite map file containing all URLs with and without query parameters. I came across the solution here [stackoverflow.com] which appears to be the same thing I'm after. Here is an example of the URLs in my redirect map list:


/ --> http://www.example.com/us/en/main.html
/?ps=1 --> http://www.example.com/us/en/main.html
/browse.cfm?prdID=99ABCDE-88FG-77HI--66JKLMNOP --> http://www.example.com/shop/catalog/item/12345
/pdfs/iteam-A.pdf --> http://www.example.com/shop/catalog/item/112233
/Products/View.cfm?cat=ItemClass&ID=0123ABC --> http://www.example.com/us/en/main/category1/subcategory1/productDetail.html
/previews/?d=7788 --> http://www.example.com/us/en/main/category1/subcategory2/productDetail.html
/Products/Search.cfm?prdID=88900ABCDEF-77GH-66IJ-55KLMNOP --> http://www.example.com/us/en/main/category1/subcategory2/productDetail.html


I've tried using the below in my config, but I keep getting a redirect loop with and without a query string. I know I don't have the proper conditions to capture all the query string URIs in my example, but would expect the one I'm working on to work. Any suggestions on what I should try next?


RewriteEngine on
RewriteMap rdm txt:/apps/httpd-2.4.10/conf/extra/redirects/rdm.txt

#Base URL contains a query string
RewriteCond %{HTTP_HOST} ^.*test.com
RewriteCond %{REQUEST_URI} ^/(b|B)rowse.cfm$
RewriteCond %{rdm:%1?%{QUERY_STRING}} ^(.*\?.*)$
RewriteRule ^.*$ %1 [R=301,L]

#Base URL does not contain query string
RewriteCond %{HTTP_HOST} ^.*test.com
RewriteRule ^.*$ ${rdm:$1} [R=301,L]
8:01 am on Jan 21, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13681
votes: 446


RewriteRule ^.*$ %1 [R=301,L]

Likeliest culprit: You're missing a ? at the end of the target to get rid of the existing query string. Otherwise the condition keeps being true forever. The targets are supposed to end up having no query, right? Otherwise there doesn't seem much point.
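For illustration, a minimal sketch of two ways to drop the incoming query string on a redirect. The single URL is borrowed from the example list purely as a placeholder, the pattern assumes virtual-host context (where it sees the leading slash), and the QSD flag is an Apache 2.4+ alternative that nobody in this thread has tested. Use one form or the other, not both.

```apache
# Option 1: a trailing "?" on the target discards the original query string.
RewriteCond %{QUERY_STRING} ^prdID=99ABCDE-88FG-77HI--66JKLMNOP$
RewriteRule ^/browse\.cfm$ http://www.example.com/shop/catalog/item/12345? [R=301,L]

# Option 2 (Apache 2.4 and later): the QSD flag does the same job without the "?".
RewriteCond %{QUERY_STRING} ^prdID=99ABCDE-88FG-77HI--66JKLMNOP$
RewriteRule ^/browse\.cfm$ http://www.example.com/shop/catalog/item/12345 [QSD,R=301,L]
```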

As long as we're here...

RewriteCond %{HTTP_HOST} ^.*test.com

The ^.* isn't needed. In fact the Condition as a whole isn't needed unless more than one site is passing through this same section. Are these RewriteRules lying loose in the config file, or in a <Directory> section limited to one site?

RewriteCond %{REQUEST_URI} ^/(b|B)rowse.cfm$

Never put something in a Condition that can go in the body of the rule. The server has to waste time evaluating conditions on every single request ever, when it could just take a quick look at the pattern in the Rule itself.

RewriteCond %{rdm:%1?%{QUERY_STRING}} ^(.*\?.*)$

Don't use non-final .* or .+ if you can possibly help it. What is this line intended to do? It seems a bit recursive.

RewriteRule ^.*$ %1 [R=301,L]

See above about Conditions and body of rule. It would be far more efficient to express the rule as
RewriteRule /(b|B)rowse\.cfm$ http://www.example.com/ {etcetera} [R=301,L]

(Replace the leading / in the pattern with the actual path, whatever it is at this point.)

Always include protocol-plus-domain in the target of an external redirect-- unless this part is variable and therefore has to be part of the RewriteMap. This doesn't seem probable. If protocol-and-domain are always the same, it makes no sense to repeat them over and over in the RewriteMap instead of just saying them once in the rule target.
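A rough sketch of what that could look like, assuming the map values are rewritten to hold only the path on the new site (the map as posted holds full URLs) and that the rules sit in virtual-host context. Query-string handling is left out here to keep the point visible.

```apache
RewriteMap rdm txt:/apps/httpd-2.4.10/conf/extra/redirects/rdm.txt

# Assumed map line (path only, no scheme or host):
#   /pdfs/iteam-A.pdf   /shop/catalog/item/112233
#
# The lookup happens in the condition and its result is captured as %1.
# If the map has no entry, the lookup returns an empty string, the
# condition fails, and the request is left alone.
RewriteCond ${rdm:%{REQUEST_URI}} ^(/.+)$
RewriteRule ^ http://www.example.com%1 [R=301,L]
```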

#Base URL does not contain query string

But the rule doesn't say so. You need a RewriteCond that says
RewriteCond %{QUERY_STRING} !.

meaning "there ain't none".

The first rule is for a specific URL, so that goes in the body of the rule. Is the second rule for all (other) URLs, or only for pages? How do you stop the second rule from executing on requests that have previously been redirected via the first rule? Seems like you'd need at least one more Condition.
9:36 pm on Jan 21, 2015 (gmt 0)

Junior Member

5+ Year Member

joined:Apr 3, 2009
posts:57
votes: 0


My setup is just a blank Apache 2.4.10 install. As a proof of concept, I'm using my local hosts file to point the hostname I'm building these redirects for at it. All of the configs are in a virtual host block. I modified the RewriteRule in the block that handles query strings to include the ?, as below, but no luck. I actually did have the "QUERY_STRING is empty" line in the second rewrite block initially; it got cut off by accident when I pasted.
RewriteRule ^.*$ %1? [R=301,L]


The purpose of this setup is to handle the sunset of a large site whose products are all being moved to another site. As of right now they've given me a list of about 10K URLs containing query strings that they want redirected to specific pages on the new site. This list will likely grow. Only a handful of URLs contain no query string and can be handled with a single pattern-based rule. If I can't use the rewrite map, my other choice is to implement each of these URLs as a RewriteCond %{QUERY_STRING}/RewriteRule combo. Regardless of my approach I have no way to avoid every single request being matched against the rules for query strings since that's the majority of the redirects. It's really frustrating that mod_rewrite won't go a little deeper and match query strings in rewrite maps by default. As an example, they took the individual rewrite rule approach on a previously sunset site; that redirects config file has over 155K lines, and over 100K of those lines are unique RewriteCond %{QUERY_STRING}/RewriteRule entries.
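For what it's worth, a text map can carry the query string as part of the key, provided the rule builds its lookup key the same way (path, "?", query string). A sketch using entries from the list in the first post. The DBM file name is made up, and converting to DBM is only something that may be worth considering for a list this size, not something tested here.

```apache
# Lines in the text map: key, whitespace, value. Keys must not contain spaces.
#   /?ps=1               http://www.example.com/us/en/main.html
#   /previews/?d=7788    http://www.example.com/us/en/main/category1/subcategory2/productDetail.html
#
# Optional: build a DBM version with the httxt2dbm utility shipped with httpd,
#   httxt2dbm -i rdm.txt -o rdm.map
# and point the map at that file instead of the text file.
RewriteMap rdm dbm:/apps/httpd-2.4.10/conf/extra/redirects/rdm.map
```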
10:23 pm on Jan 21, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13681
votes: 446


The question is: will the target URLs contain query strings? Matching the pattern doesn't seem to be a problem.

On the live site, will the old and new hostnames be on the same server, so both requests pass through the same ruleset? Or is this only an issue while you're testing locally?

There's always Option B: rewrite to a php script that assesses all parts of the request before issuing the redirect. php-- or other language of your choice-- can be a good deal more muscular than a RewriteMap. I don't think RewriteMaps were ever meant to give more than one-to-one correspondences: input A yields output B. They're not suited for assessing If/Then conditions, let alone anything more nuanced.

I have no way to avoid every single request being matched against the rules

Generally when I talk about "every single request" I really mean "request type". At a minimum, it should be possible to constrain the rule to requests for pages. Admittedly, if the whole site is moving, your only non-page requests will be from search engines-- but you can serve those a comprehensive 410 before the remaining rules kick in.
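One way such a blanket 410 might look, placed ahead of the map rules. The extension list is only a guess at what counts as a non-page request on this site; PDFs are deliberately left out because the redirect list includes them.

```apache
# Answer requests for images, stylesheets and scripts with 410 Gone
# before any map lookup is attempted. The G flag implies L.
RewriteRule \.(?:gif|jpe?g|png|ico|css|js)$ - [G,NC]
```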
10:50 pm on Jan 21, 2015 (gmt 0)

Junior Member

5+ Year Member

joined:Apr 3, 2009
posts:57
votes: 0


The destination URLs contain no query strings. Only the old URLs we want to redirect will contain query strings. The redirect configs will go on a server pretty much dedicated to redirects and send to new servers on a different domain. I haven't given any thought to using an external script since PHP isn't an option here and my coding skills are limited to shell scripting mostly.
11:32 pm on Jan 21, 2015 (gmt 0)

Junior Member

5+ Year Member

joined:Apr 3, 2009
posts:57
votes: 0


I found this old post here [webmasterworld.com] and I think it has everything I needed! Below are the configs I'm using now, and all are testing successfully! No idea why this wasn't coming up in my Google results earlier. I know it's not the most efficient, but at this point I have something to work with.


RewriteEngine on
RewriteMap rdm txt:/apps/httpd-2.4.10/conf/extra/redirects/redirect_map.txt

#Base URL contains a query string
RewriteCond %{QUERY_STRING} !=""
RewriteCond %{REQUEST_URI} !=""
RewriteCond ${rdm:%{REQUEST_URI}?%{QUERY_STRING}|NOT_FOUND} !=NOT_FOUND
RewriteRule ^.*$ ${rdm:%{REQUEST_URI}?%{QUERY_STRING}}? [R=301,L]

#Base URL does not contain query string
RewriteCond %{QUERY_STRING} !.
RewriteCond %{REQUEST_URI} ^(.*)
RewriteRule ^.*$ ${rdm:%1} [R=301,L]
11:35 pm on Jan 21, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13681
votes: 446


Question I should have asked sooner: Is the infinite loop internal (ErrorLogs and 500-class error) or external (repeated requests ending in browser message)? I'd assumed external, but I should double-check.

So far, this is happening on a test server, right? No live content? That's a big help because you can set both LogLevel (error logs) and the mod_rewrite logging level (RewriteLogLevel in Apache 2.2; in 2.4 it is part of LogLevel, as rewrite:trace1 through rewrite:trace8) to the highest possible level without fear of affecting performance.
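A sketch of the 2.4 form, for a test box only. The trace level chosen here is arbitrary; higher numbers are more verbose.

```apache
# Keep general logging at "warn" but raise mod_rewrite to trace level.
# The rewrite trace lines are written to the error log.
LogLevel warn rewrite:trace4
```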

Post a few log snippets (Error or Access, whichever is relevant) and we'll try to figure it out. You can use the real file and directory names; replace any hostname with "example.com". Or example.some-other-tld if you need to name more than one.
12:17 am on Jan 22, 2015 (gmt 0)

Junior Member

5+ Year Member

joined:Apr 3, 2009
posts:57
votes: 0


I think I responded right before you did, Lucy. Take a look at my previous message and let me know your thoughts. I always forget to explicitly say it, but thanks a lot for all the advice you provide here. I know you take a lot of time to help out everyone here.
10:41 pm on Jan 26, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13681
votes: 446


Whoops! Thanks to those overlaps, I didn't notice the follow-up posts.

RewriteCond %{QUERY_STRING} !=""
RewriteCond %{REQUEST_URI} !=""
RewriteCond ${rdm:%{REQUEST_URI}?%{QUERY_STRING}|NOT_FOUND} !=NOT_FOUND
RewriteRule ^.*$ ${rdm:%{REQUEST_URI}?%{QUERY_STRING}}? [R=301,L]

RewriteCond %{QUERY_STRING} !.
RewriteCond %{REQUEST_URI} ^(.*)
RewriteRule ^.*$ ${rdm:%1} [R=301,L]


Yowzuh. That should work. But note that the form
!.
in the second rule can perfectly well be used in the first rule too:

RewriteCond %{QUERY_STRING} .
RewriteCond %{REQUEST_URI} .


... except what does that second condition mean? Under what circumstances would a RewriteRule ever be invoked if there was no request? It's like prefacing every paragraph of a post with "If you are breathing..." I assume you don't mean "If the request is for something other than the root" since that would be a different pattern.

Alternatively, you should be able to capture the request-- just the URI/path part-- and feed it back into your map as $1 in the same way you use %1 in the first rule. Don't know if it ends up making any difference.

If everything is being redirected to a different server, it seems as if the second rule doesn't need conditions at all. It just catches the overflow, right?

Finally:
RewriteRule ^.*$ etcetera

If you're not capturing, all you need here is
RewriteRule .? etcetera

or
RewriteRule ^ etcetera

(with no text-to-match at all). The second form gives me the fantods, but that's just me.
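Pulling the suggestions above together, one possible tidied version of the two rules. This is a sketch, not a tested config: it assumes virtual-host context, a map holding full target URLs with no query strings, and Apache 2.4 for the QSD flag.

```apache
RewriteEngine on
RewriteMap rdm txt:/apps/httpd-2.4.10/conf/extra/redirects/redirect_map.txt

# Request has a query string: look up "path?query" and capture the result.
RewriteCond %{QUERY_STRING} .
RewriteCond ${rdm:%{REQUEST_URI}?%{QUERY_STRING}} ^(.+)$
RewriteRule ^ %1 [QSD,R=301,L]

# Request has no query string: look up the path alone.
RewriteCond %{QUERY_STRING} !.
RewriteCond ${rdm:%{REQUEST_URI}} ^(.+)$
RewriteRule ^ %1 [R=301,L]
```

Capturing the lookup result in the condition means each request triggers one map lookup instead of two, and a request with no map entry falls through instead of being redirected to an empty target.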

That brings us back to: Is this rule located in a place where it will only, ever, encounter requests for pages? It still seems as if non-page requests, if they occur, should be handled differently.

thanks a lot for all the advice you provide

I keep waiting for the moment you figure out that I don't actually speak a word of Apache. I just know regular expressions.