Use of mod_rewrite in Apache2 breaks PHP $_REQUEST parsing

mod_rewrite renders $_SERVER[ 'QUERY_STRING' ] in plain text

         

AlexK

4:40 pm on Jul 13, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Whilst testing a new site layout, with dynamic URLs written to be human-friendly (and mod_rewrite used in Apache to turn those URLs back into GET variables), I found that any 'directory' containing an ampersand (`&') or a plus (`+') just would not open within the edit pages.

Digging deeper revealed the following:

$_SERVER[ 'QUERY_STRING' ] is normally urlencoded, and $_REQUEST auto-urldecodes it.
That changes with the use of mod_rewrite in Apache; mod_rewrite renders QUERY_STRING already urldecoded. That fact in turn has at least 2 undesirable effects:
  1. `+' (`%2B' in REQUEST_URI) ends up as a space in $_REQUEST
  2. `&' (`%26' in REQUEST_URI) within query values breaks $_REQUEST parsing
    (spurious, and wrong, parameter-value pairs)

Additionally, in a "normal" (non-mod_rewrite) Apache request a `%26' (`&') in the QUERY_STRING will
end up as `&amp;' in $_REQUEST (also $_GET, though not in $_POST).

To illustrate this further, consider the following 2 situations for a page where dir='A & B' and file='A + B':

normal, non mod_rewrite situation:

href=
[my-site.com...]
.
URL=
[my-site.com...]
.
$_SERVER[ 'QUERY_STRING' ]=
'dir=A%20%26%20B&file=A%20%2B%20B'
.
$_SERVER[ 'REQUEST_URI' ]=
'/page.php?dir=A%20%26%20B&file=A%20%2B%20B'
.
$_REQUEST[ 'dir' ]= // note: auto urldecoded
'A &amp; B' // note: `&amp;' not `&'
$_REQUEST[ 'file' ]=
'A + B'
.
$_GET[ 'dir' ]= // note: auto urldecoded
'A &amp; B' // note: `&amp;' not `&'
$_GET[ 'file' ]=
'A + B'
.
$_POST[ 'dir' ]= // note: auto urldecoded
'A & B'
$_POST[ 'file' ]= // note: not affected
'A + B'

mod_rewrite situation:
href=
[my-site.com...]
.
URL=
[my-site.com...]
.
$_SERVER[ 'QUERY_STRING' ]= // note: has been urldecoded
'dir=A & B&file=A + B'
.
$_SERVER[ 'REQUEST_URI' ]=
'/dir/A%20%26%20B/A%20%2B%20B'
.
$_REQUEST[ 'dir' ]= // note: parsing is broken
'A '
$_REQUEST[ 'B' ]=
''
$_REQUEST[ 'file' ]= // note: `+' becomes ` '
'A B'

Because the mod_rewrite QUERY_STRING is already urldecoded, any ampersand contained within a value breaks the $_REQUEST and $_GET parsing. Additionally, any `+' in the QUERY_STRING becomes a space within the $_REQUEST and $_GET arrays.
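
The two cases can be reproduced without Apache by feeding both forms of the query string to PHP's built-in parse_str(), which parses a string the same way PHP fills $_GET. This is only a sketch: parse_query() is a hypothetical helper name, and the code is written for a current PHP release rather than the PHP 4.3.9 mentioned in this post.

```php
<?php
// Hypothetical helper for illustration: parse a query string with the
// built-in parse_str(), which applies the same rules PHP uses for $_GET.
function parse_query(string $query): array
{
    $vars = [];
    parse_str($query, $vars);
    return $vars;
}

// QUERY_STRING as it normally arrives: still urlencoded.
$encoded = parse_query('dir=A%20%26%20B&file=A%20%2B%20B');

// QUERY_STRING as reported under mod_rewrite: already urldecoded.
$decoded = parse_query('dir=A & B&file=A + B');

var_dump($encoded);
var_dump($decoded);
```

The first call should give dir 'A & B' and file 'A + B'. In the second, the bare `&' is treated as a separator, so 'dir' is cut short and a stray 'B' key appears, and the `+' in 'file' is decoded to a space.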

I fixed this on the site by bypassing both layers: I re-implemented the Apache mod_rewrite mapping and the PHP query parsing from scratch in PHP. It is definitely a bug.
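
The code for that fix is not shown here, so the following is only a sketch of one way to do it: read the parameters straight from REQUEST_URI, which is still percent-encoded under mod_rewrite. The function name params_from_request_uri() is hypothetical, and the /dir/<dir>/<file> layout is assumed from the example above.

```php
<?php
// Hypothetical helper: recover 'dir' and 'file' from REQUEST_URI instead of
// trusting the already-decoded QUERY_STRING.
// Assumes the /dir/<dir>/<file> layout from the example in this post.
function params_from_request_uri(string $requestUri): array
{
    // Drop any query string, then split the path on literal slashes
    // BEFORE decoding, so encoded characters cannot alter the split.
    $path = explode('?', $requestUri, 2)[0];
    $segments = explode('/', trim($path, '/'));

    if (count($segments) !== 3 || $segments[0] !== 'dir') {
        return [];
    }

    // rawurldecode() leaves '+' alone; urldecode() would turn it into a space.
    return [
        'dir'  => rawurldecode($segments[1]),
        'file' => rawurldecode($segments[2]),
    ];
}

$params = params_from_request_uri('/dir/A%20%26%20B/A%20%2B%20B');
var_dump($params);
```

In a live request the argument would be $_SERVER[ 'REQUEST_URI' ]. Decoding each segment only after splitting keeps `&' and `+' intact inside the values.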

Apache: 2.0.52
PHP: 4.3.9

Finally, whilst on this subject (and for those new to PHP), here is some sage advice from the online PHP manual [uk.php.net] concerning web link ("<a href=") urls:

  • Use urlencode for all GET parameters (things that come after each "=").
  • Use rawurlencode for parts that come before "?".
  • Use htmlspecialchars for HTML tag parameters and HTML title-text content.
    (The last one means: use `&amp;' as the parameter separator, not `&', but use plain `&' within URLs passed to header( 'Location: ' ).)
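
A short sketch of the three rules applied together, using the dir and file values from earlier in this post. The variable names and the page.php link target are just for illustration.

```php
<?php
// Example values taken from the situation described earlier in the thread.
$dir  = 'A & B';
$file = 'A + B';

// Parts that come before the "?": rawurlencode().
$pathUrl = '/dir/' . rawurlencode($dir) . '/' . rawurlencode($file);

// GET parameter values (after each "="): urlencode().
$queryUrl = '/page.php?dir=' . urlencode($dir) . '&file=' . urlencode($file);

// Inside HTML attributes and text: htmlspecialchars(), which turns the
// "&" separator into "&amp;".
$link = '<a href="' . htmlspecialchars($queryUrl) . '">'
      . htmlspecialchars($dir) . '</a>';

echo $pathUrl, "\n", $queryUrl, "\n", $link, "\n";
```

Note that $queryUrl itself keeps the plain `&' separator, which is the form to use in a Location header. Only the copy placed inside the HTML is escaped.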

[edited by: AlexK at 4:45 pm (utc) on July 13, 2006]

coopster

6:48 pm on Jul 15, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



AlexK,

Before even delving into the mod_rewrite portion, how are you getting &amp; in your $_REQUEST and $_GET 'dir' index values? You even have a note on them ...

normal, non mod_rewrite situation:


...
$_REQUEST[ 'dir' ]= // note: auto urldecoded
'A &amp; B' // note: `&amp;' not `&'
$_REQUEST[ 'file' ]=
'A + B'
.
$_GET[ 'dir' ]= // note: auto urldecoded
'A &amp; B' // note: `&amp;' not `&'
$_GET[ 'file' ]=
'A + B'
.
$_POST[ 'dir' ]= // note: auto urldecoded
'A & B'
$_POST[ 'file' ]= // note: not affected
'A + B'

The $_POST array value for 'dir' looks correct and that is the same value I get in $_REQUEST and $_GET on my test servers. I find myself trying to figure out which configuration directive we have different, but I cannot think of any. Are you certain that is the value you are getting in a non-mod_rewrite environment?

I tested on both PHP 4.3.11 and PHP 5.0.4

AlexK

5:24 pm on Jul 21, 2006 (gmt 0)



coopster:
Are you certain that is the value you are getting in a non-mod_rewrite environment?

Yes, absolutely certain (but read the 'mea-culpa' at bottom).

[Sorry for the very long delay with this reply - I am in the final, final stages of getting my site rewrite live, and that is consuming my attention. Back to your reply...]

You may imagine that this situation was driving me quite crazy. mod_rewrite is implemented on a test site which has full error-checking turned on. My live site has all error-checking turned OFF and is currently non-mod_rewrite. My admin pages have full error-checking turned ON and are also non-mod_rewrite. I got the result above on the admin pages, using (from memory; it was an age ago now) var_dump( $GLOBALS ).

When I checked the source of the result page, the $_REQUEST and $_GET arrays contained an ampersand as "&amp;", whereas the $_POST array (when POSTed) contained "&". I was as thorough as I could be.

Sticky me, and I will set up a test page for you on the test site with a var_dump reply.

Mea Culpa:
So, being an "as thorough as I could be" sort of chap, I first constructed the page on both the admin and the test site and checked it, and... no "&amp;" (sob). It was there before, I swear it!

No point in stickying me now.

The mod_rewrite comments are still correct (mumble, grumble).