mod_rewrite and encoding

a rather pedantic question about how mod_rewrite handles encoding

         

gliff

2:20 pm on Jun 23, 2005 (gmt 0)

I have a question about how mod_rewrite handles one very specific bit of encoding. The easiest way to explain what's confusing me is to set up a completely contrived example, so here goes. Assume the following rewrite rule in a .htaccess file.

RewriteEngine On
RewriteBase /
RewriteRule ^testthing/(.+)$ /testbed/encode.php?test=$1

Second, assume encode.php is the following

<?php
var_export($_GET);
?>

So, calling a URL like

http://example.com/testthing/foo

Would give you a page with the following

array ( 'test' => 'foo', )

So far so good. Now, take a look at the next URL/Results pair

http://example.com/testthing/one+two
array ( 'test' => 'one two', )

This is the expected result. In a query string the "+" character is a space. However, if I encode the +, I get the following.

http://example.com/testthing/one%2Btwo
array ( 'test' => 'one two', )

This is not what I expect. The 'test' key in the array should be 'one+two', not 'one two'. Compare the following example:

http://example.com/testthing/one%2Ctwo
array ( 'test' => 'one,two', )

Here the encoded value is correctly decoded.

Now, I know what's happening. During the mod_rewrite phase, Apache sees the %2B and decodes it to a + before the rule is applied. The captured text is then placed into the new query string as-is, so the test parameter reaches encode.php with an unencoded +, which PHP interprets as a space.
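The double decoding described above can be mimicked outside Apache. The following is only a rough Python sketch, not Apache's or PHP's actual code: the function name simulate_rewrite is made up for illustration, and it assumes that the path is percent-decoded once before the rule matches, that the capture is pasted into the query string without re-encoding, and that the script then decodes the query string with form-style rules where + means a space.

```python
from urllib.parse import unquote, parse_qs


def simulate_rewrite(request_path):
    """Hypothetical model of the rule ^testthing/(.+)$ -> encode.php?test=$1."""
    # Step 1 (assumed): the server percent-decodes the path before matching,
    # so %2B becomes a literal +. unquote() leaves a literal + untouched.
    decoded_path = unquote(request_path)

    prefix = "/testthing/"
    if not decoded_path.startswith(prefix):
        return {}
    captured = decoded_path[len(prefix):]  # plays the role of $1

    # Step 2 (assumed): the capture is substituted without re-encoding.
    query = "test=" + captured

    # Step 3: the script decodes the query string; here + becomes a space.
    return {key: values[0] for key, values in parse_qs(query).items()}


def direct_request(query):
    """Model of calling encode.php?... directly, with only one decode."""
    return {key: values[0] for key, values in parse_qs(query).items()}


print(simulate_rewrite("/testthing/one%2Btwo"))  # + is lost on the second decode
print(simulate_rewrite("/testthing/one%2Ctwo"))  # comma survives both decodes
print(direct_request("test=one%2Btwo"))          # + survives a single decode
```

Under these assumptions the sketch reproduces the three results shown earlier: the %2B case ends up as a space, the %2C case ends up as a comma, and the direct request keeps the +.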

My question is, is this behavior "correct"? If so, is there any way to pass the + character on to a script through a URL without resorting to a "list of parameters" style URL (like the following)?

http://example.com/testbed/encode.php?test=one%2Btwo
array ( 'test' => 'one+two', )

I've noticed this behavior in both Apache 1.3.x and 2.x. I also tested it with a variety of clients (IE, Firefox, cURL) to make sure they weren't munging the encoding differently (they weren't).

(If you're still here, thanks for reading this far.)

jd01

8:35 pm on Jun 23, 2005 (gmt 0)

Running out of time, so I will be brief:

If you cannot find a way to pass the encoded/unencoded portion correctly, you could try inserting the character on the right side of the rule:

RewriteRule ^testthing/([a-z]+)(.([a-z]+))?$ /testbed/encode.php?test=$1+$3 [L]

Hope this gives you some ideas.

Justin

gliff

1:17 pm on Jun 24, 2005 (gmt 0)

"If you cannot find a way to pass the encoded/unencoded portion correctly, you could try inserting the character on the right side of the rule:"

That might work for specific cases where I know there's going to be a plus, but I'm more interested in creating general, catch-all rules.

Thanks for the thought, though.