I have the following rule in .htaccess, which allows usage of a SEF URL which is then expanded to call the underlying PHP page along with the query string:
RewriteRule ^channel/vehicle_results-(.+)-(.+)-(.+) /channel/vehicle_results.php?make=$1&model=$2&type=$3
This works fine except when there are 'foreign' characters in the parameters. For example, in the following incoming URL the German 'ä' (lowercase a with umlaut) is encoded as %26%23228%3B, which is the URL-encoded form of the HTML entity '&#228;' for 'ä':
http://www.example.com/channel/vehicle_results-FENDT-M%26%23228%3BHDRESCHER-Any
...however in the rewritten URL it's converting the %26 back to an ampersand (&), thereby confusing the receiving PHP page when it does its GETs:
http://www.example.com/channel/vehicle_results.php?make=FENDT&model=M&%23228%3bHDRESCHER&type=Any
Is there any way I can stop this?
I've found lots of Rewrite/Ampersand queries in the forum but nothing which quite seems to help with my problem here.
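To see why the decoded ampersand confuses the receiving page, the rewritten query string can be fed to PHP's parse_str(), which splits on "&" much as PHP does when it fills $_GET. This is only an illustrative sketch; the variable names are made up for the example.

```php
<?php
// Query string as it looks after the rewrite, with %26 already decoded to "&".
$rewritten = 'make=FENDT&model=M&%23228%3bHDRESCHER&type=Any';

// parse_str() splits name/value pairs on "&".
parse_str($rewritten, $params);

// 'model' is cut short at the stray "&"; the rest of the intended value
// becomes an extra parameter name with an empty value.
var_dump($params['model']);      // string(1) "M"
print_r(array_keys($params));
```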
Your PHP code should be 'flexible' in that *any* HTTP agent may encode (or re-encode) characters as the request passes through the network. It should accept any of "&", "%26" or "%2526", "%252526", or even "%2525252525252526" as an ampersand.
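One way to get that kind of flexibility is to keep decoding a value until it stops changing. The helper below is a hypothetical sketch, not something from the original scripts; decode_fully and its round limit are assumptions made for illustration.

```php
<?php
// Hypothetical helper: decode repeatedly until the string stops changing,
// so "%26", "%2526", "%252526", ... all end up as "&".
function decode_fully(string $value, int $maxRounds = 10): string
{
    for ($i = 0; $i < $maxRounds; $i++) {
        $decoded = rawurldecode($value);
        if ($decoded === $value) {
            break;               // nothing left to decode
        }
        $value = $decoded;
    }
    return $value;
}

echo decode_fully('%26'), "\n";                 // &
echo decode_fully('%252526'), "\n";             // &
echo decode_fully('%2525252525252526'), "\n";   // &
```

Note the trade-off: a value that is genuinely meant to contain the literal text "%26" will also come out as "&", so this only suits fields where that cannot happen.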
Try using the [NE] flag, and while you're at it, add an [L] flag (each of your rules should have an [L] flag unless the output from that rule needs to be further re-written by a subsequent rule, which is quite rare.)
Also, fix your regex pattern, because it's very ambiguous and is therefore processed extremely inefficiently...
Try these two variations one at a time, and see if one works for you:
RewriteRule ^channel/vehicle_results-([^-]+)-([^-]+)-(.+)$ /channel/vehicle_results.php?make=$1&model=$2&type=$3 [NE,L]
-or-
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /channel/vehicle_results-([^-]+)-([^-]+)-([^?\ ]+)(\?[^\ ]*)?\ HTTP/
RewriteRule ^channel/vehicle_results-[^-]+-[^-]+- /channel/vehicle_results.php?make=%1&model=%2&type=%3 [NE,L]
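The second variation differs in that THE_REQUEST holds the raw request line as the client sent it, whereas the RewriteRule pattern is matched against the already-decoded URL-path. The sketch below applies the same pattern to a sample request line using PHP's PCRE functions as a stand-in for mod_rewrite; the request line is an assumed example.

```php
<?php
// Example raw request line, as a client would send it (illustrative).
$theRequest = 'GET /channel/vehicle_results-FENDT-M%26%23228%3BHDRESCHER-Any HTTP/1.1';

// Same pattern as the RewriteCond above, wrapped in "#" delimiters.
$pattern = '#^[A-Z]+\ /channel/vehicle_results-([^-]+)-([^-]+)-([^?\ ]+)(\?[^\ ]*)?\ HTTP/#';

if (preg_match($pattern, $theRequest, $m)) {
    // The captures are still percent-encoded, so "%26" stays "%26"
    // when they are pasted into the query string.
    $query = "make={$m[1]}&model={$m[2]}&type={$m[3]}";
    parse_str($query, $params);

    echo $query, "\n";            // make=FENDT&model=M%26%23228%3BHDRESCHER&type=Any
    echo $params['model'], "\n";  // M&#228;HDRESCHER
}
```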
If you have any other rules with multiple occurrences of the ambiguous sub-patterns like ".+" and ".*" in your original rule as posted here, I suggest you look into using the negative-match technique demonstrated here -- You may find that your need for a server upgrade can be put off for a year or two, simply by using efficient patterns. Using more-specific subpatterns in rules can make processing each rule twenty to two thousand times faster, given average-length URLs...
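The ambiguity is easy to see outside Apache. The sketch below runs both styles of pattern through PHP's preg_match() on a made-up path containing an extra hyphen; it only shows how the captures differ and makes no claim about timings.

```php
<?php
// Made-up path with one more hyphen than the rule expects.
$path = 'channel/vehicle_results-FENDT-M-X-Any';

// Original style: each ".+" may also swallow hyphens, so the engine has to
// backtrack to find a split, and the greedy first group takes "FENDT-M".
preg_match('#^channel/vehicle_results-(.+)-(.+)-(.+)$#', $path, $loose);

// Negated classes: the first two groups stop at the first hyphen they meet.
preg_match('#^channel/vehicle_results-([^-]+)-([^-]+)-(.+)$#', $path, $strict);

echo implode(' | ', array_slice($loose, 1)), "\n";   // FENDT-M | X | Any
echo implode(' | ', array_slice($strict, 1)), "\n";  // FENDT | M | X-Any
```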
Jim
Your 2nd suggestion (raw request) completely addresses my original problem. I'll now have to spend a couple of evenings trying to understand that syntax!
The only problem I now have is that if the requested URL contains '%2F' (the hex code for '/'), then for some reason it doesn't match the RewriteCond/RewriteRule and therefore the URL isn't rewritten (causing a 'page not found'). This seems very strange. If I pass '/' in the URL rather than '%2F', the RewriteCond/RewriteRule picks it up, but then I have problems because the '/' in the URL throws off all the page-relative requests.
Thanks so much for your help on this. If you do have any ideas about the %2F issue then I'd love to sort this bit out.
Andy
If it's in the URL-path rather than in the query string appended to the URL-path, then you've got a serious problem in whatever script generates the link(s) to that URL. This needs to be fixed in the script, and not in .htaccess, where the request has already been sent by a client and received by your server...
Jim
Sequence for generating URL is as follows:
1. User enters 3 values into a HTML form (make, model, type)
2. Form is processed by (new) PHP program which retrieves the 3 params passed to it and then calls the PHP program that was originally used to process the form, but now with a SEF URL. It does this as follows:
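// URL-encode each form value for use in the URL-path.
$make = urlencode($_GET['make']);
$model = urlencode($_GET['model']);
$type = urlencode($_GET['type']);
// Encode literal hyphens so they can't be mistaken for the field delimiter.
$make = str_replace('-','%2d',$make);
$model = str_replace('-','%2d',$model);
$type = str_replace('-','%2d',$type);
header('Location: vehicle_results-'.$make.'-'.$model.'-'.$type);
exit; // stop the script once the redirect header has been sent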
If there is a '/' in one of the fields, e.g. make = Bravo/Brava, then the generated URL is:
http://www.example.com/channel/vehicle_results-FIAT-BRAVO%2FBRAVA-Any
...and this is when I'm hitting the problem.
You could replace the slash with an encoded plus sign ('%2b'):
$model = str_replace('/','%2b',$model);
or replace the slash with a string such as "-or-" :
$model = str_replace('/','%2dor%2d',$model);
or don't allow slashes... Or don't encode them in the first place. Or decode them in the form presentation script.
I can't really make a good recommendation, since it doesn't really matter as far as the rewriterule is concerned -- In its present form the rule will simply move whatever you give it from the URL-path to the script query string, with the name/value pairs determined by the un-encoded-hyphen delimiters in the requested URL-path. The rule doesn't care about slashes, so the issue appears to be entirely within your scripts.
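If the "-or-" replacement is used, the order of operations matters: once urlencode() has run, the slash has already become %2F, so a str_replace() looking for a literal '/' finds nothing, and a replacement containing '%' inserted before urlencode() would itself be re-encoded. A minimal sketch of one workable ordering follows; sef_segment is a hypothetical helper name, not part of the original scripts.

```php
<?php
// Hypothetical helper: turn one form value into one hyphen-delimited URL segment.
function sef_segment(string $value): string
{
    // 1. Swap the slash for a readable stand-in while the value is still raw.
    $value = str_replace('/', '-or-', $value);
    // 2. Percent-encode whatever is not URL-safe.
    $value = urlencode($value);
    // 3. Encode the remaining hyphens so only the delimiters stay bare.
    return str_replace('-', '%2d', $value);
}

echo 'vehicle_results-' . sef_segment('FIAT') . '-'
    . sef_segment('BRAVO/BRAVA') . '-' . sef_segment('Any'), "\n";
// vehicle_results-FIAT-BRAVO%2dor%2dBRAVA-Any
```

The receiving script then sees "BRAVO-or-BRAVA" and would have to map "-or-" back to "/" itself, which is ambiguous if a real value can contain "-or-".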
However, while it may seem attractive to allow *any* user-input characters to be passed, you must comply with the HTTP protocol requirements and should pay attention to the fact that escaped characters make for very ugly URLs -- counterproductive to the whole point of the 'friendly URL' exercise.
Jim