Mod_rewrite to go from dynamic to static

Reverse of the normal situation, to clean up old robot-spidered dynamic URLs

         

lgn1

6:11 pm on Jan 27, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I have converted my site from dynamic to static using mod_rewrite. However, I am trying to get requests for the old dynamic pages (bookmarked by clients or still being fetched by robots) to return a 301 redirect.

When a client (or robot) enters a URL in the form:

www.example.com/cgi-bin/webscript.pl?page=rrrr&cart_id=ssss

I want a redirect to www.example.com/rrrr

I'm using:

RewriteCond %{REQUEST_URI} webscript.pl
RewriteRule ^([^=]+)=([^&]+)&(.*)$ www.example.com/%2 [R=301,L]

but with no luck

The .htaccess file is under the cgi-bin directory.

The only thing I got to work is redirecting to the home page using:

RewriteCond %{REQUEST_URI} webscript.pl
RewriteRule ^(.*)$ www.example.com/ [R=301,L]

but as soon as I start playing around with regular expressions against the query portion of the string, everything breaks down, as if the query string is not visible to the RewriteRule.

Any suggestions?

jdMorgan

6:49 pm on Jan 27, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



In order to prevent interaction with your static->dynamic rules, and also as one way to access the query string, you need to use %{THE_REQUEST}. The query string is not directly available for testing in RewriteRule, and must be tested using either %{QUERY_STRING} or %{THE_REQUEST} in a RewriteCond.

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /webscript\.pl\?([^&]+&)*page=([^&\ ]+)
RewriteRule ^webscript\.pl$ http://www.example.com/%2 [R=301,L]

Note that %{THE_REQUEST} is the complete first line of the HTTP request (the request line) as sent by the browser:
GET /webscript.pl?prod=widget&color=blue&texture=fuzzy HTTP/1.1

The code, while appearing to be somewhat redundant, is what is required to prevent the looping mentioned above.
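
As a rough sketch of how the two rule sets can coexist without looping: the internal static->dynamic rule, the /cgi-bin/ path, the page-name pattern, and the single root-level .htaccess below are all assumptions for illustration, not taken from the original site. It also assumes cgi-bin is a real directory under the document root rather than a ScriptAlias.

```apache
# Hypothetical .htaccess in the document root (layout is an assumption).
RewriteEngine On

# 1) Externally redirect only client-requested dynamic URLs.
#    THE_REQUEST holds the original request line and is not changed by
#    internal rewrites, so the output of rule 2 never matches this condition.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /cgi-bin/webscript\.pl\?([^&\ ]+&)*page=([^&\ ]+)
RewriteRule ^cgi-bin/webscript\.pl$ http://www.example.com/%2? [R=301,L]

# 2) Internally rewrite the static URL back to the script (assumed form).
#    The pattern excludes "/" and ".", so it cannot match the script path.
RewriteRule ^([a-z0-9_-]+)$ /cgi-bin/webscript.pl?page=$1 [L]
```

If the first rule tested %{QUERY_STRING} or the rewritten URL-path instead, the request produced internally by the second rule would match it too, and the two rules would bounce the client back and forth.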

Jim

jd01

6:56 pm on Jan 27, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi lgn1,

You are close with the condition, but will need to use THE_REQUEST to match a request and prevent looping if you are still using the dynamic page(s) to serve your content from.

This can range from complicated to very complicated depending on how many variables you are passing on the dynamic pages.

The problem you are having with the query string is that Apache does not treat information after the "?" as part of the URL-path, because it is technically data, *not* location information.

If you do not need to use the dynamic pages to serve information, you can simply use a QUERY_STRING condition to match the information after the "?":

RewriteCond %{QUERY_STRING} ^[^=]+=([^&]+)&
RewriteRule ^webscript\.pl$ http://www.example.com/%1 [R=301,L]

By not 'end anchoring' ($) the condition, you can use the implicit 'and everything else...' rather than having to match a pattern. You can also remove the first back-reference since it does not appear you need to use it.

One of the problems you may have been running into is that %1 is a condition (RewriteCond) back-reference, while $1 is a rule (RewriteRule) back-reference.
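
To make the difference concrete, here is a small illustrative sketch. The sample values in the comments are made up, and the extra parentheses in the rule pattern are there only to show where $1 comes from.

```apache
# For a request to /webscript.pl?page=widgets&cart_id=42 (hypothetical values):
#   $1 -> "webscript"  (first group in the RewriteRule pattern)
#   %1 -> "widgets"    (first group in the last matched RewriteCond pattern)
RewriteCond %{QUERY_STRING} ^page=([^&]+)
RewriteRule ^(webscript)\.pl$ http://www.example.com/%1 [R=301,L]
```

A %N reference only has a value if the RewriteCond pattern itself contains parentheses; a condition such as "webscript.pl" with no groups leaves %1 and %2 empty.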

There are some good examples around for the use of THE_REQUEST, but I am not sure if you will be needing to use it, so please, let us know how things work out, or if you get stuck...

Hope this helps.

Justin

ADDED: There is actually a nice example of THE_REQUEST above... Sometimes I think Jim checks to see if I am online and makes sure he posts his answer first =) Hey again Jim

lgn1

9:36 pm on Jan 28, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I got it to work with jd01's example, but not with jdMorgan's.

I also added a question mark after %1 to drop the query string, since I don't need it. Without the question mark, the original query string gets appended to the static URL.
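
For reference, the rule with that change would presumably look something like this (reconstructed from jd01's example, not copied from the actual file):

```apache
RewriteCond %{QUERY_STRING} ^[^=]+=([^&]+)&
# The trailing "?" gives the target an empty query string, so the original
# page=...&cart_id=... is not re-appended to the redirect URL.
RewriteRule ^webscript\.pl$ http://www.example.com/%1? [R=301,L]
```

On Apache 2.4 and later, the [QSD] flag is another way to discard the original query string.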

I also had to change the name of my main script on the website, because I had two .htaccess files: one for going static to dynamic and the other for going dynamic to static. This was causing an infinite loop on my test server, but that was an easy fix.

One question about the first example, by jdMorgan: what is the purpose of the {3,9}? I know it says to match the preceding expression from 3 to 9 times, but why is that important?

Thanks for the input.

jdMorgan

10:12 pm on Jan 28, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



[A-Z]{3,9} matches three to nine uppercase letters. This is to match the GET, HEAD, POST, PROPFIND, and the other HTTP Methods that are found in THE_REQUEST, as shown in the quote box of my first post. You could use [A-Z]+, or even just "(GET|HEAD)" if you have no need to support other HTTP Methods.
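
Side by side, the three variants might look like the sketch below. It reuses the /webscript.pl path from the earlier example; only one condition should be active at a time.

```apache
# Three interchangeable ways to match the method at the start of THE_REQUEST.
# Use only one; the other two are commented out.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /webscript\.pl\?([^&\ ]+&)*page=([^&\ ]+)
# RewriteCond %{THE_REQUEST} ^[A-Z]+\ /webscript\.pl\?([^&\ ]+&)*page=([^&\ ]+)
# RewriteCond %{THE_REQUEST} ^(GET|HEAD)\ /webscript\.pl\?([^&\ ]+&)*page=([^&\ ]+)
#   Note: the (GET|HEAD) group becomes %1, so the page value moves to %3.
RewriteRule ^webscript\.pl$ http://www.example.com/%2 [R=301,L]
```

The extra parentheses in the last variant shift the back-reference numbering, so the substitution would need %3 instead of %2 if that condition is used.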

> [...] causing an infinite loop

You might want to review my comments about looping, also in that first post.

Jim