htaccess remove specific querystrings only

         

OliverF

6:15 pm on Aug 17, 2010 (gmt 0)

10+ Year Member



Hi,

I need to remove the following querystring from all filenames: ?hop=*

But I need the following querystring to still work:
?tags=

And I need to exclude certain directories from these rules

This is what I tried:

# Rewrite ?hop= to URL without hop

# this one does not seem to work
#RewriteCond %{QUERY_STRING} ^hop=[^&]+

# this seems to work
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^?]*)\?

# ?tags should work, therefore (this one does not work yet)
RewriteCond %{QUERY_STRING} !^tags=

# exclude customers directory from these rewrites (seems to work ok)
RewriteCond %{REQUEST_URI} !^/customers/

RewriteRule !\.html$ [#*$!.com...] [R=301,L]

Thanks for taking a look.

Oliver

jdMorgan

4:02 am on Aug 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here is a method to remove a query string name/value pair whether it appears alone, at the beginning, in the middle, or at the end of other parameters. The only addition is your excluded directory:

# Excluding the "customers" directory URL-path,
RewriteCond $1 !^customers/
# allow and retain optional parm(s) before hop= name/value, but none after
RewriteCond %{QUERY_STRING} ^(([^&]*(&[^&]*)*)&)?hop=[^&]*$ [OR]
# require and retain parm(s) before and accept optional parm(s) after hop= name/value
RewriteCond %{QUERY_STRING} ^(([^&]*(&[^&]*)*)&)hop=[^&]*((&[^&]*)*)$
RewriteRule ^(.*)$ http://www.example.com/$1?%2%4 [R=301,L]
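
One case the two conditions above do not appear to cover is hop= as the first parameter with other parameters following it (for example ?hop=abc&tags=x): the first condition allows nothing after hop=, and the second requires something before it. A minimal sketch of a companion rule for that case, assuming the same example.com hostname and customers/ exclusion as above, and an .htaccess file in the document root with the rewrite engine already on:

```apache
# Sketch only: handle "hop=" as the FIRST parameter when others follow it.
# Skip the excluded "customers" directory, as in the rule above.
RewriteCond $1 !^customers/
# Match hop=value at the start, then capture everything after the "&" in %1
RewriteCond %{QUERY_STRING} ^hop=[^&]*&(.+)$
# Redirect to the same path, keeping only the parameters that followed hop=
RewriteRule ^(.*)$ http://www.example.com/$1?%1 [R=301,L]
```

With this in place, a request for /page.html?hop=abc&tags=x should be redirected to /page.html?tags=x.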

It is unclear why you found it necessary to use %{THE_REQUEST} in your rewritecond. Doing so implies that perhaps you are using a later internal rewrite rule to add "hop=" back into the filepath after this redirect is invoked. If that is the case, then the above code will need to be modified to use %{THE_REQUEST} as well.

If this rule does need to be modified, care must be taken to be sure that the %2 and %4 back-references still contain the preceding and following query string parameters in both of the described [OR]ed cases.
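
To illustrate the point about %{THE_REQUEST}: that variable holds the request line exactly as the client sent it (for example GET /page.html?hop=abc HTTP/1.1) and is not changed by internal rewrites, whereas %{QUERY_STRING} reflects any rewriting already done. One way to sketch the modification, assuming a hypothetical later rule that adds hop= internally, is to leave the back-reference logic alone and add a guard condition that only checks that the client itself sent hop=:

```apache
# Sketch only: same redirect as above, plus a guard on the original request line.
RewriteCond $1 !^customers/
# Require "hop=" in the query string the client actually requested,
# either directly after the "?" or after an "&"
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /[^?\ ]*\?([^\ ]*&)?hop=
# Unchanged from the rule above: these supply the %2 and %4 back-references
RewriteCond %{QUERY_STRING} ^(([^&]*(&[^&]*)*)&)?hop=[^&]*$ [OR]
RewriteCond %{QUERY_STRING} ^(([^&]*(&[^&]*)*)&)hop=[^&]*((&[^&]*)*)$
RewriteRule ^(.*)$ http://www.example.com/$1?%2%4 [R=301,L]
```

Because %N back-references are taken from the last matched RewriteCond, the guard sits ahead of the query-string conditions so that %2 and %4 still come from them.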

Jim

OliverF

11:41 am on Aug 19, 2010 (gmt 0)

10+ Year Member



Hi Jim,

Thanks for your help.

It is unclear why you found it necessary to use %{THE_REQUEST} in your rewritecond.

I am new to this and was just trying to block something from happening. I googled, tried different things, and somehow this seemed to work - somewhat, anyway.

I am new to the syntax of htaccess files and would like to learn it better myself.

The reason I ran into this is that I had duplicate content issues, and I also saw that my homepage was NOT indexed; instead, someone's hoplink to my home page was indexed.

I tried rel=canonical, but that is not always honored, so I wanted to do it through htaccess.

I understand that your code now only blocks ?hop

1- How could I make it that it blocks for example ?hop and ?bling - is that even possible?

On my site I also use ?tags, and that should definitely NOT be blocked.

2- Is it possible to block every ?* but ?tags.

3- If I want to block this rule for more than one directory, could I do it with:
RewriteCond $1 !^customers/|login/|example

Thanks for the help!

Oliver

jdMorgan

3:27 pm on Aug 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Please decide *exactly* what query strings you want to remove, or equivalently, what query-strings you want to keep. Then compile a list of all exceptions, such as the directories "/customers or /login or /stats". Otherwise, this thread will take far too much effort if we keep coding correct solutions to incompletely-defined problems. Contributors quickly lose interest when this happens.

A good attitude to adopt is: "I have one chance to fully-define this problem before the rocket takes off for Mars, never to return. What is the complete/correct definition of the problem to be solved?"

The code I posted does not "block" anything. It simply redirects the client to a URL+querystring that no longer contains the unwanted query string name/value pair(s).

1- How could I make it that it blocks for example ?hop and ?bling
Use the regex subpattern (hop|bling)
Please refer to the resources cited in our Apache Forum Charter for more information.
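
As a sketch of that substitution applied to the hop= rule posted earlier in this thread: adding (hop|bling) as an ordinary capturing group introduces a new group 4, so the back-reference for the trailing parameters moves from %4 to %5. The name bling is just the example from the question, and example.com and customers/ are the same placeholders as before.

```apache
# Sketch only: remove either a "hop=" or a "bling=" name/value pair.
RewriteCond $1 !^customers/
# Optional parm(s) before the unwanted pair, none after
RewriteCond %{QUERY_STRING} ^(([^&]*(&[^&]*)*)&)?(hop|bling)=[^&]*$ [OR]
# Required parm(s) before the unwanted pair, optional parm(s) after
RewriteCond %{QUERY_STRING} ^(([^&]*(&[^&]*)*)&)(hop|bling)=[^&]*((&[^&]*)*)$
# %2 = parameters before the pair, %5 = parameters after it (was %4)
RewriteRule ^(.*)$ http://www.example.com/$1?%2%5 [R=301,L]
```

Each redirect removes one matching pair, so a query string containing both hop= and bling= would take two redirects to clean up. On Apache 2.x, where mod_rewrite uses PCRE, writing the group as (?:hop|bling) should leave the original %2%4 numbering unchanged.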

On my site I also use ?tags, and that should definitely NOT be blocked.
2- Is it possible to block every ?* but ?tags.
3- If I want to block this rule for more than one directory, could I do it with:
RewriteCond $1 !^customers/|login/|example

# Redirect to remove all query string parameters except the "tags=" name/value
# pair from any requested URL whose query string contains one, excluding
# several directories
RewriteCond $1 !^(customers|login|example)/
RewriteCond %{QUERY_STRING} !^(tags=[^&]*)?$
RewriteCond %{QUERY_STRING} ^([^&]*&)*(tags=[^&]*)
RewriteRule ^(.*)$ http://www.example.com/$1?%2 [R=301,L]
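
Note that the last condition above only matches when the name/value pair to keep is actually present, so a query string without it (such as ?hop=abc) passes through that rule untouched. If those should be stripped too, a companion rule along these lines might do it. This is a sketch under the same assumptions (example.com hostname, same excluded directories, and tags= as the parameter name from the question), not something tested against a particular setup:

```apache
# Sketch only: strip the whole query string when it contains no "tags=" pair
RewriteCond $1 !^(customers|login|example)/
# Query string is not empty
RewriteCond %{QUERY_STRING} .
# No "tags=" parameter at the start or after an "&"
RewriteCond %{QUERY_STRING} !(^|&)tags=
# A trailing "?" in the substitution discards the existing query string
RewriteRule ^(.*)$ http://www.example.com/$1? [R=301,L]
```

A request for /page.html?hop=abc would then be redirected to /page.html, while /page.html?tags=x is left alone.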

Jim

OliverF

9:55 am on Aug 20, 2010 (gmt 0)

10+ Year Member



Thanks Jim,

I apologise for not stating the problem clearly from the start.

You pinpoint a problem that I have as a beginner trying to bend my head around htaccess rewrite syntax and logic.

This being the first time that I had to try and solve an actual problem that could be solved with htaccess, my focus was just on the one problem, not realising that the solution would cause other problems.

Your first reply helped fix the original problem and then helped me get a sharper view of the real extent of the problem.

So thanks again for your help.

I will continue my learning path by checking out syntax and logic further on this forum.

Oliver

jdMorgan

1:02 pm on Aug 20, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



No need for an apology -- but if *you* need a quick and correct solution, the fastest way to get one is to define the problem exactly. No good code has ever been written without a correct and complete requirements specification - whether formal or informal. A well-considered plan is required.

Many don't see "the big picture" and don't appreciate that their "one little new rule" can have huge effects and interactions with other rules in this .htaccess file, rules in other .htaccess files, rules in httpd.conf and other server config-level files, directives belonging to other modules in all of these locations, external redirects done by scripts, server performance, disk lifespan, search engine ranking, and the "user experience" on their sites, even if their new rule doesn't crash the server right away... :)

That is why we emphasize here that .htaccess code is server configuration code and should not be modified without understanding all of these possible effects. It certainly can't be "guessed at" with much chance of success.

Even when you've got to the point of good understanding, this stuff is not easy or simple. I like to keep a personal library of well-commented, tested, and known-good rules, so that I don't have to re-write everything every time (and make the very same mistakes that I've made before)... ;) The code in the first example above was derived from very-similar code that I wrote more than 14 months ago in a response to a question posted here.

Jim