RewriteRule converting %26 back to ampersand (&)


Andy_I

9:46 am on Oct 22, 2009 (gmt 0)

10+ Year Member



Hi,

I have the following rule in .htaccess, which allows usage of a SEF URL which is then expanded to call the underlying PHP page along with the query string:

RewriteRule ^channel/vehicle_results-(.+)-(.+)-(.+) /channel/vehicle_results.php?make=$1&model=$2&type=$3

This works fine apart from when there are 'foreign' characters in the parameters. For example, in the following incoming URL the German 'ä' (lowercase a, umlaut) is encoded as %26%23228%3B (the URL-encoded form of '&#228;', the HTML numeric entity for 'ä'):

http://www.example.com/channel/vehicle_results-FENDT-M%26%23228%3BHDRESCHER-Any

...however in the rewritten URL it's converting the %26 back to an ampersand (&), thereby confusing the receiving PHP page when it does its GETs:

http://www.example.com/channel/vehicle_results.php?make=FENDT&model=M&%23228%3bHDRESCHER&type=Any

Is there any way I can stop this?

I've found lots of Rewrite/Ampersand queries in the forum but nothing which quite seems to help with my problem here.
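
For anyone wanting to reproduce the symptom, here is a minimal PHP sketch. It assumes the receiving page reads its parameters the usual way; parse_str() stands in for what PHP does when it fills $_GET, and the variable names are only for illustration. The two query strings are the intended one and the one quoted above.

```php
<?php
// Query string as intended: the ampersand inside the value stays escaped as %26.
parse_str('make=FENDT&model=M%26%23228%3BHDRESCHER&type=Any', $intact);

// Query string as actually produced by the rewrite: %26 has become a bare "&",
// so the parser treats it as a separator between two parameters.
parse_str('make=FENDT&model=M&%23228%3bHDRESCHER&type=Any', $broken);

echo $intact['model'], "\n";   // M&#228;HDRESCHER
echo $broken['model'], "\n";   // M
echo count($broken), "\n";     // 4: the tail of the value became its own, empty parameter

// The intact value still holds an HTML numeric entity, which can be turned
// into the real character if the script needs it.
echo html_entity_decode($intact['model'], ENT_QUOTES, 'UTF-8'), "\n";
```

So the model arrives truncated to "M", and the rest of the value shows up as a stray parameter name.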

jdMorgan

3:01 pm on Oct 22, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It's highly unusual that Apache should be un-encoding the query string -- Most of the time, the problem is that it will doubly-encode any previously-encoded strings.

Your PHP code should be 'flexible' in that *any* HTTP agent may encode (or re-encode) characters as the request passes through the network. It should accept any of "&", "%26" or "%2526", "%252526", or even "%2525252525252526" as an ampersand.
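
One way to get that kind of flexibility is to keep decoding until the value stops changing. This is only a sketch: decode_fully() and its pass limit are hypothetical names made up for this example, not anything from the original scripts.

```php
<?php
// Repeatedly percent-decode a value until another pass changes nothing.
// $maxPasses is a safety limit so hostile input cannot keep the loop busy.
function decode_fully(string $value, int $maxPasses = 10): string
{
    for ($i = 0; $i < $maxPasses; $i++) {
        $decoded = rawurldecode($value);
        if ($decoded === $value) {
            break;              // stable: nothing left to decode
        }
        $value = $decoded;
    }
    return $value;
}

echo decode_fully('%26'), "\n";       // &
echo decode_fully('%2526'), "\n";     // &
echo decode_fully('%252526'), "\n";   // &
```

Bear in mind that PHP has already decoded $_GET values once before the script sees them, and that this approach will also decode a value that was genuinely meant to contain a literal sequence like "%41", so it suits fields where that can't legitimately occur.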

Try using the [NE] flag, and while you're at it, add an [L] flag (each of your rules should have an [L] flag unless the output from that rule needs to be further re-written by a subsequent rule, which is quite rare.)

Also, fix your regex pattern, because it's very ambiguous and is therefore processed extremely inefficiently...

Try these two variations one at a time, and see if one works for you:


RewriteRule ^channel/vehicle_results-([^-]+)-([^-]+)-(.+)$ /channel/vehicle_results.php?make=$1&model=$2&type=$3 [NE,L]
-or-

RewriteCond %{THE_REQUEST} ^[A-Z]+\ /channel/vehicle_results-([^-]+)-([^-]+)-([^?\ ]+)(\?[^\ ]*)?\ HTTP/
RewriteRule ^channel/vehicle_results-[^-]+-[^-]+- /channel/vehicle_results.php?make=%1&model=%2&type=%3 [NE,L]

The second one looks at the raw request exactly as received from the client, without any encoding or un-encoding, but as you can see, it's more complex than the first option.
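
To see what the THE_REQUEST pattern captures, the same expression can be run against a sample request line in PHP. This is only an illustration under assumptions: the request line is made up from the URL earlier in the thread, and preg_match() stands in for mod_rewrite's own matching.

```php
<?php
// A raw request line as the server would receive it (hypothetical sample).
$requestLine = 'GET /channel/vehicle_results-FENDT-M%26%23228%3BHDRESCHER-Any HTTP/1.1';

// Same expression as the RewriteCond, wrapped in "~" delimiters for PHP.
$pattern = '~^[A-Z]+\ /channel/vehicle_results-([^-]+)-([^-]+)-([^?\ ]+)(\?[^\ ]*)?\ HTTP/~';

if (preg_match($pattern, $requestLine, $m)) {
    // $m[1]..$m[3] correspond to %1..%3 in the RewriteRule substitution.
    $target = "/channel/vehicle_results.php?make={$m[1]}&model={$m[2]}&type={$m[3]}";
    echo $target, "\n";
    // /channel/vehicle_results.php?make=FENDT&model=M%26%23228%3BHDRESCHER&type=Any
}
```

Because the captures come from the undecoded request line, the %26 is still a %26 when it lands in the query string.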

If you have any other rules with multiple occurrences of the ambiguous sub-patterns like ".+" and ".*" in your original rule as posted here, I suggest you look into using the negative-match technique demonstrated here -- You may find that your need for a server upgrade can be put off for a year or two, simply by using efficient patterns. Using more-specific subpatterns in rules can make processing each rule twenty to two thousand times faster, given average-length URLs...
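
The ambiguity is easy to see outside Apache. mod_rewrite patterns are Perl-compatible regular expressions, so PHP's preg_match() can serve as a rough stand-in. The path below is a hypothetical one whose last value contains a hyphen; the sketch shows only how the captures differ and makes no attempt to measure speed.

```php
<?php
// Hypothetical decoded URL-path in which the last value contains a hyphen.
$path = 'channel/vehicle_results-FIAT-BRAVO-Semi-Trailer';

// Original style: three greedy ".+" groups.
preg_match('~^channel/vehicle_results-(.+)-(.+)-(.+)$~', $path, $greedy);

// Negative-match style: the first two groups cannot cross a hyphen.
preg_match('~^channel/vehicle_results-([^-]+)-([^-]+)-(.+)$~', $path, $strict);

echo implode(' | ', array_slice($greedy, 1)), "\n";  // FIAT-BRAVO | Semi | Trailer
echo implode(' | ', array_slice($strict, 1)), "\n";  // FIAT | BRAVO | Semi-Trailer
```

With ".+" the engine first swallows the whole remainder and then backs off character by character until the rest of the pattern fits, which is where the wasted work comes from. The "[^-]+" groups stop at the first hyphen and never have to back up.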

Jim

Andy_I

6:02 pm on Oct 22, 2009 (gmt 0)

10+ Year Member



Jim,

Your 2nd suggestion (raw request) completely addresses my original problem. I'll now have to spend a couple of evenings trying to understand that syntax!

The only problem I now have is that if the requested URL contains '%2F' (hex code for '/') then for some reason it doesn't meet the RewriteCond/Rule and therefore doesn't rewrite the URL (causing a 'page not found'). This seems very strange. If I pass '/' in the URL rather than '%2F' then the RewriteCond/Rule picks it up but then I have problems because the '/' in the URL throws out all the page-relative requests.

Thanks so much for your help on this. If you do have any ideas about the %2F issue then I'd love to sort this bit out.

Andy

jdMorgan

6:34 pm on Oct 22, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Where, specifically, is this "%2f" in the URL?

If it's in the URL-path rather than in the query string appended to the URL-path, then you've got a serious problem in whatever script generates the link(s) to that URL. This needs to be fixed in the script, and not in .htaccess, where the request has already been sent by a client and received by your server...

Jim

Andy_I

10:34 pm on Oct 22, 2009 (gmt 0)

10+ Year Member



Hi Jim,

Sequence for generating URL is as follows:

1. User enters 3 values into a HTML form (make, model, type)
2. Form is processed by (new) PHP program which retrieves the 3 params passed to it and then calls the PHP program that was originally used to process the form, but now with a SEF URL. It does this as follows:

$make = urlencode($_GET['make']);
$model = urlencode($_GET['model']);
$type = urlencode($_GET['type']);
$make = str_replace('-','%2d',$make);
$model = str_replace('-','%2d',$model);
$type = str_replace('-','%2d',$type);
Header ('Location: vehicle_results-'.$make.'-'.$model.'-'.$type);

If there is a '/' in one of the fields, e.g. model = Bravo/Brava, then the generated URL is:

http://www.example.com/channel/vehicle_results-FIAT-BRAVO%2FBRAVA-Any

...and this is when I'm hitting the problem.

jdMorgan

1:25 am on Oct 23, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



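Well, I suggest that you replace the slash with a different character, such as "+". Since urlencode() has already turned the slash into "%2F" by the time your str_replace() calls run, match on that:

$model = str_replace('%2F','%2b',$model);

or replace the slash with a string such as "-or-" :

$model = str_replace('%2F','%2dor%2d',$model);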

or don't allow slashes... Or don't encode them in the first place. Or decode them in the form presentation script.

I can't really make a good recommendation, since it doesn't really matter as far as the rewriterule is concerned -- In its present form the rule will simply move whatever you give it from the URL-path to the script query string, with the name/value pairs determined by the un-encoded-hyphen delimiters in the requested URL-path. The rule doesn't care about slashes, so the issue appears to be entirely within your scripts.
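
Putting the pieces together on the script side, here is a minimal sketch of building one URL segment. sef_segment() is a hypothetical helper name, and choosing "+" as the stand-in for the slash is just one of the options above. The order matters: urlencode() turns "/" into "%2F", so the slash has to be swapped either before urlencode() runs (as here) or by matching "%2F" afterwards.

```php
<?php
// Build one value for the friendly URL: no slash survives, and hyphens in
// the value are escaped so they cannot be mistaken for the field delimiter.
function sef_segment(string $value): string
{
    $value = str_replace('/', '+', $value);   // swap the slash before encoding
    $value = urlencode($value);               // "+" becomes %2B, space becomes "+"
    return str_replace('-', '%2d', $value);   // urlencode() leaves "-" alone
}

echo 'vehicle_results-' . sef_segment('FIAT') . '-'
   . sef_segment('BRAVO/BRAVA') . '-' . sef_segment('Any'), "\n";
// vehicle_results-FIAT-BRAVO%2BBRAVA-Any
```

The swap is lossy: once encoded, a value that really contained "+" looks the same as one that contained "/", so the receiving script can't tell them apart. As for why %2F fails at all: Apache's core AllowEncodedSlashes directive defaults to Off, in which case the server refuses any URL-path containing %2F with a 404. That may well be the 'page not found' reported earlier, though this is an inference and not something confirmed in this thread.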

However, while it may seem attractive to allow *any* user-input characters to be passed, you must comply with the HTTP protocol requirements and should pay attention to the fact that escaped characters make for very ugly URLs -- counterproductive to the whole point of the 'friendly URL' exercise.

Jim