Removing part of a query string

Yet another rewrite-rule request for a fresh look

         

AlexK

7:10 am on May 16, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



URLs to my site could previously include nocompress=1 in the query string, which forced page compression OFF for that page. That is now causing duplicate-content penalties on Google, so I'm trying to redirect all such URLs to remove just that part of the query string, i.e. turn

http://www.mysite.com/mfcs.php?nocompress=1&mid=30
to
http://www.mysite.com/mfcs.php?mid=30

The following is working fine except for the second rule (the one that handles a lone nocompress=1):

#
# redirect all `nocompress=1' requests
# 2005-05-16 added -AK
#
RewriteCond %{QUERY_STRING} ^nocompress=1\&(.*)$
RewriteRule ^/(.*) http://www.mysite.com/$1?%1 [L,R=permanent]
RewriteCond %{QUERY_STRING} ^nocompress=1$
RewriteRule ^/(.*) http://www.mysite.com/$1 [L,R=permanent]

RewriteCond %{QUERY_STRING} ^(.+)\&nocompress=1\&(.*)$
RewriteRule ^/(.*) http://www.mysite.com/$1?%1&%2 [L,R=permanent]
RewriteCond %{QUERY_STRING} ^(.+)\&nocompress=1$
RewriteRule ^/(.*) http://www.mysite.com/$1?%1 [L,R=permanent]
_____________________________________________________________

Because there is no `?' in the rewrite string, Apache is re-appending the wretched query to the redirect, leaving it the same as before! I cannot find a way to instruct Apache not to add the query.

So,

http://www.mysite.com/mfcs.php?nocompress=1
stays exactly the same.

Essentially, I need the opposite of the QSA key (QS-Kill, perhaps?).

Does anyone know? Have I missed something obvious?

[edited by: jdMorgan at 3:20 pm (utc) on May 16, 2005]
[edit reason] Examplified. [/edit]

jd01

7:40 am on May 16, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi Alex,

To stop Apache appending the current query string, end the substitution with a blank query string. In other words, end it with a ?

RewriteCond %{QUERY_STRING} ^nocompress=1$
RewriteRule ^/(.*) http://www.yoursite.com/$1? [L,R=permanent]

Just as a technicality, I believe the preceding / that is included in your rules is stripped by Apache when comparing, so it might be wise to remove those from your rules, because they will never technically match, but if it works...

Hope this helps.

Justin
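
For illustration, here is a minimal sketch of that fix applied to the lone-parameter case. The host name and the server-config style pattern are carried over from the first post. The commented-out alternative is an assumption that you are on Apache 2.4 or later, where the QSD flag discards the query string without needing the trailing question mark.

```apache
# Sketch only: redirect a lone nocompress=1 to the same URL without it.
# www.mysite.com is the placeholder host used in the first post.
RewriteCond %{QUERY_STRING} ^nocompress=1$
# The trailing "?" supplies an empty query string, so the original
# query string is not re-appended to the redirect target.
RewriteRule ^/(.*) http://www.mysite.com/$1? [L,R=permanent]

# Assumption: Apache 2.4.0 or later. QSD ("query string discard")
# drops the original query string, so no trailing "?" is needed.
# RewriteCond %{QUERY_STRING} ^nocompress=1$
# RewriteRule ^/(.*) http://www.mysite.com/$1 [L,R=permanent,QSD]
```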

AlexK

10:16 am on May 16, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



jd01:
you need to append your rule with a blank query string

Yes! Thanks very much, that works.

My fear, of course, was that Apache would leave the `?' in place, giving yet another dupe page, but no - the 301 comes up without the question-mark. Excellent, and thanks again.

I believe the preceding / that is included in your rules is stripped by Apache when comparing
I got into the habit through reading the original rewrite guide [httpd.apache.org] by Ralf S. Engelschall - it is in all the RewriteRule rules, so I do not think so.

I don't suppose that you can think of a neater way of achieving this? It is an academic question, because all the rules do now work.

jdMorgan

3:30 pm on May 16, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> I don't suppose that you can think of a neater way of achieving this?

You can reduce this kind of rewrite to two rules. I worked this out for a client who apparently had other server config issues and ended up stiffing me for the effort, so you can have it:


# Case 1: More parameters follow nocompress=1
# Here we rewrite the query (foo=bar&)nocompress=1&(quux=naz) to foo=bar&quux=naz.
# The ampersand following the preceding parameters, if they are present, is retained.
#
RewriteCond %{QUERY_STRING} ^(.+&)?nocompress=1&(.+)?$ [NC]
RewriteRule (.*) http://%{HTTP_HOST}/$1?%1%2 [R=301,L]
#
# Case 2: No more parameters follow nocompress=1
# Here we rewrite the query (foo=bar)&nocompress=1 to foo=bar.
# The ampersand following the preceding parameters, if they are present, is discarded.
#
RewriteCond %{QUERY_STRING} ^((.+)&)?nocompress=1$ [NC]
RewriteRule (.*) http://%{HTTP_HOST}/$1?%2 [R=301,L]

Jim

AlexK

4:59 am on May 17, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



jdMorgan:
You can reduce this kind of rewrite to two rules.

Well, I doubt that it is often that anyone gets the opportunity to update your sterling contributions, Mr Morgan, but we have all just discovered why Mr Engelschall recommends using `RewriteRule ^/(.*)'.

I used the 2 sets of rewrite rules exactly as provided, and this is what happened to 2 pages:

http://www.mysite.com/mfcs.php?nocompress=1&mid=30
- and -
[mysite.com...]
both became

http://www.mysite.com//mfcs.php?mid=30
(note the double `/')

Changing the `RewriteRule (.*)' to `RewriteRule ^/(.*)' then fixed the above problem. This is just a quibble, of course, but since the whole exercise is to eliminate duplicate pages I am paranoid about getting it right. Many thanks for your input.

Can you confirm one thing: the very last line confused the dickens out of me at first:

RewriteCond %{QUERY_STRING} ^((.+)&)?nocompress=1$ [NC]
RewriteRule (.*) [%{HTTP_HOST}...] [R=301,L]

(After some thought) I take it that the regex treats the outer `()' as `%1' and the inner `()' as `%2'? At first I could not see how this was going to work, and just took it on blind-faith trust.

jd01

5:30 am on May 17, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>> RewriteRule ^/(.*)

You would be correct in the httpd.conf file, but I suggest revisiting the documentation regarding .htaccess configuration, noting that the preceding / is stripped by Apache in per directory comparisons.

Of course it is very difficult to tell which way to recommend when the user does not specify which file they are using, so .htaccess is normally assumed.

Justin
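
One way to sidestep the context question is a pattern with an optional leading slash. The sketch below reuses the two rules posted earlier in the thread, changing only that pattern. It assumes that, when used in .htaccess, the file sits in the document root, since a deeper .htaccess has its directory prefix stripped as well and the redirect target would then be wrong.

```apache
RewriteEngine On

# Case 1: more parameters follow nocompress=1.
RewriteCond %{QUERY_STRING} ^(.+&)?nocompress=1&(.+)?$ [NC]
# ^/?(.*) matches "/mfcs.php" (httpd.conf) and "mfcs.php" (.htaccess) alike,
# so $1 never carries a leading slash and the target has no "//".
RewriteRule ^/?(.*) http://%{HTTP_HOST}/$1?%1%2 [R=301,L]

# Case 2: nocompress=1 is the last (or only) parameter.
RewriteCond %{QUERY_STRING} ^((.+)&)?nocompress=1$ [NC]
RewriteRule ^/?(.*) http://%{HTTP_HOST}/$1?%2 [R=301,L]
```

It is still worth checking the result with a real request in whichever context you deploy it.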

jdMorgan

12:39 pm on May 17, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Alex,

Yes, I didn't notice you had leading slashes in the code above my post. The code I posted is for .htaccess use, since most webmasters don't have httpd.conf privileges. So include a leading slash for httpd.conf, and omit it for .htaccess.

You are correct about nested parentheses; to determine the back-reference assignment, count left parentheses. Nesting often comes in useful when parentheses are needed both to create back-references and to group optional subexpressions.

Jim
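
The left-parenthesis counting rule can be traced on the Case 2 condition. The sample query strings in the comments are made up for illustration; the rule itself is the .htaccess form posted earlier in the thread.

```apache
# Back-references are numbered by the position of each opening parenthesis.
#
#   ^((.+)&)?nocompress=1$
#    ^^
#    |+-- 2nd "(" -> %2 : the preceding parameters without the trailing "&"
#    +--- 1st "(" -> %1 : the preceding parameters plus the trailing "&"
#
# Query string "foo=bar&nocompress=1":  %1 = "foo=bar&"   %2 = "foo=bar"
# Query string "nocompress=1":          %1 and %2 are both empty
RewriteCond %{QUERY_STRING} ^((.+)&)?nocompress=1$ [NC]
# Only %2 is used, so the "&" that preceded nocompress=1 is dropped.
RewriteRule (.*) http://%{HTTP_HOST}/$1?%2 [R=301,L]
```

The outer group exists only to make the whole "parameters plus ampersand" prefix optional, while the inner group is the part worth keeping.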