Forum Moderators: phranque

Rewriting Query Strings

I want to change the query string but leave the URL intact

         

wealthyteddy

7:48 pm on Dec 11, 2006 (gmt 0)

10+ Year Member



I want to change part of the query string from one value to another, but I need to leave the URL intact.

For example:

http://www.example.com/index.php?cName=toys-plush-animal-sets

needs to be changed to

http://www.example.com/index.php?cName=toys-plush-animals

and

http://www.example.com/product_info.php?pName=standing-hippo-40&cName=plush-animal-sets-safari-animal-sets

needs to be changed to

http://www.example.com/product_info.php?pName=standing-hippo-40&cName=plush-animals-safari-animal-singles

and

http://www.example.com/product_info.php?pName=standing-elephant-with-sound-set&cName=plush-animal-sets-safari-animal-sets

needs to be changed to

http://www.example.com/product_info.php?pName=standing-elephant-with-sound-set&cName=plush-animals-safari-animal-sets

There are other specific changes I need to apply, but they follow the same general pattern.

Note that, in the second example above, there are two phrases that need to be changed.

I'm aware that I need to use RewriteCond to test the query string, but it feels as though I've tried every combination of RewriteCond and RewriteRule code possible and not yet hit on the right solution.

I'm trying to apply these rewrites via my .htaccess file.

Any help gratefully received!

Thanks,

Mark.

[edited by: jdMorgan at 10:35 pm (utc) on Dec. 11, 2006]
[edit reason] example.com [/edit]

jdMorgan

8:29 pm on Dec 11, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Please post your best-effort code as a basis for discussion.

Thanks,
Jim

wealthyteddy

10:24 pm on Dec 11, 2006 (gmt 0)

10+ Year Member



Jim,

Thanks for your reply.

I must not have tried every possibility after all, because when attempting to find my best yet, I came up with the following, which seems to work:


# Change the Category Name of plush-animal-sets to plush-animals
RewriteCond %{QUERY_STRING} (.*)plush-animal-sets(.*)
RewriteRule .? %{REQUEST_URI}?%1plush-animals%2 [R]

The new pages aren't live yet, so it does generate a 404 error, but I wanted to make sure I could do this rewrite before I change the name of the product categories in my shopping cart system.

Once it's working fine, should I change the R parameter to a R=301?

Thanks again,

Mark.

wealthyteddy

10:27 pm on Dec 11, 2006 (gmt 0)

10+ Year Member



Jim,

Oops!

I forgot to include sample inputs and outputs to my previous reply.

The original URL I entered was:

http://www.example.com/index.php?cName=toys-plush-animal-sets

and the URL it ended up at was:

http://www.example.com/resources/error404.php?url=http://www.example.com/index.php&cName=toys-plush-animals

The first part of the target URL is my custom 404 error page, but the url= parameter looks like it's the correct destination, once I rename the plush-animal-sets category to plush-animals.

Best wishes,

Mark.

[edited by: jdMorgan at 10:36 pm (utc) on Dec. 11, 2006]
[edit reason] example.com [/edit]

jdMorgan

10:47 pm on Dec 11, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'd suggest making the query-matching pattern more efficient and more specific, specifying the canonical substitution URL, using a permanent (301) redirect, and telling mod_rewrite to quit processing immediately using the [L] flag if this rule is invoked:

# Change the Category Name of "plush-animal-sets" to "plush-animals"
RewriteCond %{QUERY_STRING} ^(([^&]+&)*)cName=plush-animal-sets(&.+)?$
RewriteRule (.*) http://www.example.com/$1?%1cName=plush-animals%3 [R=301,L]

Using a specific pattern instead of ".*" in the query-string checking should dramatically speed up processing. The "([^&]+&)*" pattern shown means, "one or more characters not equal to an ampersand, followed by an ampersand, and as many of those sequences as you like (including zero)." This allows the pattern-matching to proceed directly from left to right, and avoids the multiple match-attempt iterations required if you ask it to find your query string 'floating' between two ambiguous, greedy, and promiscuous ".*" sequences.
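To see what that condition does and does not match, here is a minimal sketch that runs the same pattern through Python's re module as a stand-in for Apache's regex engine (for a pattern this simple the two should behave alike, but that is an assumption, not something tested on a server). The function name rewrite_query and the "page" parameter are hypothetical, used only for illustration.

```python
import re

# Same pattern as the RewriteCond above.
PATTERN = re.compile(r"^(([^&]+&)*)cName=plush-animal-sets(&.+)?$")


def rewrite_query(query):
    """Return the rewritten query string, or None if the condition would not match."""
    m = PATTERN.match(query)
    if m is None:
        return None
    # group(1) plays the role of %1 (everything before cName=),
    # group(3) plays the role of %3 (everything after the value, possibly unset).
    return f"{m.group(1)}cName=plush-animals{m.group(3) or ''}"


# cName is the first parameter: the "zero repetitions" case.
print(rewrite_query("cName=plush-animal-sets"))
# cName follows another name/value pair.
print(rewrite_query("pName=standing-hippo-40&cName=plush-animal-sets"))
# cName is followed by another (hypothetical) parameter.
print(rewrite_query("cName=plush-animal-sets&page=2"))
# The value has extra text in front of it, so the pattern does not match.
print(rewrite_query("cName=toys-plush-animal-sets"))
```

The last call returns None: the pattern only matches when the whole cName value is "plush-animal-sets", so values with text before or after that phrase need their own, equally specific patterns.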

Jim

[edited by: jdMorgan at 10:48 pm (utc) on Dec. 11, 2006]

phranque

2:06 am on Dec 12, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



ambiguous, greedy, and promiscuous ".*"

jim - do you have a paste buffer containing this phrase ready at all times?
=8)

i love it!
you actually show up #2 if you google "ambiguous greedy promiscuous"

wealthyteddy

2:39 am on Dec 12, 2006 (gmt 0)

10+ Year Member



Jim,

Thanks for your advice.

I tried it out, and it seems to work fine.

However, sometimes the cName parameter will be the first parameter in the query string, and sometimes it will be after the first parameter.

For example:

http://www.example.com/product_info.php?pName=standing-elephant-with-sound-set&cName=plush-animal-sets-safari-animal-sets

Here, the first parameter is "pName", so the "cName=" will be preceded by an "&", which I believe is what your code looks for.

But in this example:

http://www.example.com/index.php?cName=plush_stuffed-toys

(which needs to be changed to

http://www.example.com/index.php?cName=toys-small-plush-toys
)

the "cName=" is the first parameter and therefore preceded by a "?" instead.

What is the best way to amend your code to cater for this?

Also, the text I'm looking for may occur directly after "cName=", and sometimes there will be other characters in between the "cName=" and the text I need to change, and sometimes there will be other characters after the text I need to change but before the next parameter or end of the query string.

I think I will always know what text would precede / follow the text I'm looking to change, so I suppose the simplest solution, even though it would mean adding more rules than my more generic but inefficient solution might require, would be to specify exactly what I need in both the search and replace strings?

Finally, having renamed my Plush Animal Sets category to Plush Animals in my shopping cart, the code I posted earlier doesn't work (it generates a 404) because it's changing the "?" in my substitution URL to an "&", which it doesn't do with your solution, and I'm not sure why that is / what is different.

Thanks again,

Mark.

jdMorgan

4:03 am on Dec 12, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



the "cName=" is the first parameter and therefore preceded by a "?" instead.

What is the best way to amend your code to cater for this?

Did you test the code I posted? The "as many as you like, including zero" clause on my first subpattern should allow for any number of name/value pairs to precede "cName=xyz" without any trouble. If you're sure that cName is and always will be the first parameter, then you can dispense with that bit of the pattern altogether, and just use:


# Change the Category Name of "plush-animal-sets" to "plush-animals"
RewriteCond %{QUERY_STRING} ^cName=plush-animal-sets(&.+)?$
RewriteRule (.*) http://www.example.com/$1?cName=plush-animals%1 [R=301,L]

Along with my well-known (and highly-ranked*) dislike for ".*" patterns (due primarily to their inefficiency when used multiple times in one pattern), another one of my opinions is that, having gained some comfort and proficiency with mod_rewrite, one soon discovers that coding is easy; it's defining the problem precisely that is most difficult.

You can write one rule per name/value pair that needs to change, or if there are similarities between some or all of the changes, you can take advantage of them to create a smaller number of rules to do all of them. But since I'm not familiar with your site and parameter-naming conventions, I have no idea what shortcuts you might be able to use. Only you can decide or discover them. So indeed, that really is the hard part.

*"Greedy" is a commonly-used description of the ".*" and ".+" patterns, because each will match as many characters as possible.

I use the word "promiscuous" because both patterns will also match *any* characters, often leading to unexpected results, a quick example of which is "(.*)/?" where the $1 back-reference will always contain the trailing slash if present in the request, because ".*" is greedier than "/?" and will always consume the trailing slash.
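The trailing-slash behavior is easy to reproduce outside Apache. The sketch below uses Python's re module as a stand-in for the server's regex engine, with a made-up path ("products/toys/") and an alternative pattern built from negated character classes; both are illustrative assumptions, not rules from this thread.

```python
import re

path = "products/toys/"  # hypothetical request path

# Greedy ".*" swallows the trailing slash, leaving "/?" to match nothing.
greedy = re.match(r"^(.*)/?$", path)
print(greedy.group(1))  # products/toys/

# A more specific pattern: slash-separated segments, then an optional final slash.
specific = re.match(r"^([^/]+(/[^/]+)*)/?$", path)
print(specific.group(1))  # products/toys
```

Because each "[^/]+" segment cannot contain a slash, the optional "/?" is the only part of the second pattern that can consume the trailing slash, so the captured group never includes it.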

Using multiple (.*) patterns can also lead to ambiguity as to exactly which kinds of URLs will be matched -- it's a common source of functional rewriting problems.

Use of multiple ".*" subpatterns in one pattern also causes huge processing inefficiencies, since the matching routine often has to "loop and back off" many, many times to find a match. In short, avoiding the use of ".*" whenever possible is a good practice.

Jim

wealthyteddy

12:39 am on Dec 13, 2006 (gmt 0)

10+ Year Member



Jim,

Thanks for your reply.

I have to confess that I only tested it on the one URL, as I didn't want to rename all of my other category names before testing it out on the first one.

However, I understand what you are saying, having studied your code once again in more detail.

The cName parameter will not always be the first one, so I'll have to stick with your first solution.

Having worked in IT for over 25 years before giving up the corporate life to work from home, I also fully understand your comment about the hard part being to define the precise problem. And that's causing me some hard thinking in this case.

There is one more problem I'm encountering, however.

After the rewrite rule that you kindly gave me, I have a rule to trap 404 errors and redirect people to my own page, and it's a bit of RewriteRule I got from a website somewhere:


# Redirect 404 errors to custom error page
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^/?(.*)$ /resources/error404.php?url=$1 [L,QSA,R]

What I've found is that if I leave the "L" flag off my previous rules (as it's possible that, even after changing part of the query string, another part of the URL may still be wrong), then the 404 rule is being triggered, in spite of the fact that the target page, after the rewrite, is valid.

For example, the following URL:

http://www.example.com/product_info.php?pName=rolling-horse-brown-26&cName=plush-animal-sets-farm_domestic-sets

gets changed to:

http://www.example.com/product_info.php?pName=rolling-horse-brown-26&cName=plush-animals-farm_domestic-sets

using another rule I created earlier today, based on the one you supplied me.

This new target page exists, and if I add the "L" flag to the rewrite rule that does this change, then my browser takes me to the correct page.

However, if I remove the "L" flag, it generates a 404 error instead, presumably because it's triggering the 404 rule.

I would have thought that the %{REQUEST_FILENAME} test should find the file, even though this is meant to be the full filesystem path.

(I know you can also use the ErrorDocument 404 command, but I've tried using that before, and it doesn't seem to trap all 404 errors, for some reason. What I found, on a couple of my sites, is that if you try to visit a (page in a) directory that doesn't exist, it works fine, but if you try to visit a file / page that doesn't exist in a directory that does exist, then it still presents the visitor with the host's default 404 page, not my custom one.)

The simple solution for now would be to use the "L" flag on my rules, but I'd be interested in knowing why the 404 rule is being triggered for pages that apparently exist, how I might resolve this, and why the ErrorDocument command doesn't always seem to work.

Sorry for the long post and the continued requests for help, and thanks for your patience and help,

Mark.

jdMorgan

4:12 am on Dec 13, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Just a quick note.

I don't know why you're getting the 404 problem -- what does your server error log have to say about it?

Using the ErrorDocument 404 /resources/error404.php method, are you getting the default server 404 error document, or the default 403 error document? I'd expect the latter.

ErrorDocument works as designed, in that "/" is defined by default not as a file, but as the "index" -- the auto-generated "table of contents," if you will -- of each directory. So, in normal circumstances, it always exists, and you cannot get a 404 on requests for it.

However, that's a simplified answer, applying only to your particular problem. In the wider view, DirectoryIndex, ErrorDocument, and Options +/-Indexes all come into play, along with mod_dir functions, in determining what happens if a "/" URL is requested.

I'd avoid using that 404-handler-rewrite approach, because it makes two (additional and redundant) filesystem searches in addition to the built-in one used by the default Apache missing-file handling. On a busy site, it could really slow down the server.

In addition, since it generates a Redirect response, you're sending a 302-Found, not a 404-Not Found response to the client, and that's very, very bad if you care about search rankings...

If you're not already doing so, use the "Live HTTP Headers" extension for Firefox, and take a look at your response headers for "404" errors -- I think you'll find they're 302s. :(
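The same check can be scripted. The following is only a sketch: it starts a tiny local server whose handler imitates the redirect-style 404 rule (RedirectingHandler and fetch_status are hypothetical names, and the handler sends a relative Location for brevity, whereas Apache would send an absolute URL), then fetches a missing page without following the redirect so the raw status code is visible.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectingHandler(BaseHTTPRequestHandler):
    """Imitates the rewrite-based handler: every request is redirected, never a 404."""

    def do_GET(self):
        self.send_response(302)
        self.send_header(
            "Location", "/resources/error404.php?url=" + self.path.lstrip("/")
        )
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_status(host, port, path):
    """Return (status, Location header) for one GET, without following redirects."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


# Port 0 asks the OS for any free port.
server = HTTPServer(("127.0.0.1", 0), RedirectingHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

status, location = fetch_status("127.0.0.1", server.server_port, "/no-such-page.php")
server.shutdown()
server.server_close()

print(status, location)
```

Pointing fetch_status at a real site (host name, port 80, and a path known not to exist) shows whether the server answers a missing page with 404 or with a redirect.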

Jim