"Force proxy" gets POST data thru redirect

Solves problem of 301 redirect dropping POST data

         

jerrykrinock

10:30 pm on May 14, 2010 (gmt 0)

10+ Year Member



I manage a site which is hosted on a low-priced shared server running Apache and cPanel on Linux. After setting up a 301 redirect of an entire subdirectory of my public_html/cgi-bin using cPanel, I found that the scripts in the redirected subdirectory were not receiving any POST data. I then read in Apache documentation that this is expected behavior.

But it's not what I wanted. I found the solution to the problem [on another site]:

Editing public_html/.htaccess, I found that cPanel had entered my redirect thus:

RewriteCond %{HTTP_HOST} ^example.com$ [OR] 
RewriteCond %{HTTP_HOST} ^www.example.com$
RewriteRule ^cgi\-bin\/sales\/?(.*)$ "http\:\/\/example\.com\/cgi\-bin\/live\/$1" [R=301,L]


So I changed the "L" in the [flags] at the end to "P" and voila, the scripts in the redirected subdirectory started getting POST data!

The "P" stands for "force proxy" and is explained in the Apache documentation here:

[httpd.apache.org ] (Search that page for the text "force proxy".)

At the end it says "Note: mod_proxy must be enabled in order to use this flag." Apparently it is enabled: when I asked the support team at my web host if this was true, they said "yes", but couldn't explain why or how.

Another interesting thing about this "force proxy" redirect is that it is stealthy; the web browser shows the original URL instead of the redirected URL. I may be wrong, but I think that, unlike a normal redirect which goes back to the client and gives it the redirected URL and then expects the client to re-send, this "force proxy" appears to instead simply forward the original request internally, not even telling the client. That seems like a bonus -- less dependence on client's web browser, faster response time. (The client's application is an application which I publish, and it doesn't have any "bookmark" to change. I'll change the URL in future releases.)

So it seems like a win-win-win, except that I wish I better understood what I've done. Can anyone confirm or explain it better? Is there any downside to this little trick?

Thank you,

Jerry Krinock

[edited by: jdMorgan at 12:02 am (utc) on May 15, 2010]
[edit reason] Removed links, changed to example.com. Please see TOS. [/edit]

g1smd

10:40 pm on May 14, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



RewriteCond %{HTTP_HOST} ^sheepsystems.com$ [OR]
RewriteCond %{HTTP_HOST} ^www.sheepsystems.com$
RewriteRule ^cgi\-bin\/sales\/?(.*)$ "http\:\/\/sheepsystems\.com\/cgi\-bin\/live\/$1" [R=301,L]


Whoever wrote that code needs a severe talking to.
- Periods in patterns need to be escaped.
- Slashes, hyphens and colons in patterns do not need to be escaped.
- Unless this folder hosts other domains the two RewriteConds are not required.
- The code above also doesn't cater for non-canonical hostname requests with an appended port number and/or appended period, another reason to change or dump the RewriteConds.
- RewriteRule cannot see query strings, so the "?" in the RewriteRule pattern does not mark a query string; it makes the preceding slash optional. In that case, it would match
/cgi-bin/salesblahblah
as a valid URL request.
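
To illustrate the points in the list above, here is a rough sketch of how the cPanel-generated rule might be tidied while remaining an external redirect. The hostname pattern is an assumption, not part of the original rule: it accepts an optional "www.", any letter case, an optional trailing period and an optional port number. The sketch also assumes "RewriteEngine On" is already present in the file. Being a 301, it would still be subject to the POST problem discussed in this thread.

```apache
# Sketch only: a cleaned-up form of the cPanel rule, still an external 301.
# The host pattern is illustrative (an assumption); drop the RewriteCond
# entirely if this folder serves only one domain.
RewriteCond %{HTTP_HOST} ^(www\.)?example\.com\.?(:[0-9]+)?$ [NC]
# The slash is now required before any trailing path, so a request for
# /cgi-bin/salesblahblah no longer matches.
RewriteRule ^cgi-bin/sales(/.*)?$ http://example.com/cgi-bin/live$1 [R=301,L]
```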

I hope you're aware that using the proxy flag messes up your server logs and analytics, as well as slowing down your site for all visitors, as the server has to make a new request for the page, and then organise passing the reply back to the browser.

At this point it is very important to understand the differences between an external redirect, an internal rewrite, an include, a same-domain proxy request and a cross-domain proxy request. They each have their place, and are very different in outcome, even though some are only a few characters different from each other in implementation.
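
As a rough illustration of how few characters separate some of these, the three rules below are alternatives for the same request, written for an .htaccess file in the document root. This is a sketch, assuming mod_rewrite is available and "RewriteEngine On" is already set; the hostname and paths are the example ones used in this thread. Includes and cross-domain proxying are left out.

```apache
# Alternatives, not a sequence: use only one of these rules.

# 1. External redirect: the server answers with a 301 and the client is
#    expected to make a second request to the new URL.
RewriteRule ^cgi-bin/sales(/.*)?$ http://example.com/cgi-bin/live$1 [R=301,L]

# 2. Internal rewrite: the same request is served from a different filepath.
#    The client is never told, and no second HTTP request is made.
RewriteRule ^cgi-bin/sales(/.*)?$ cgi-bin/live$1 [L]

# 3. Same-domain proxy: the server itself makes a new HTTP request to the
#    target URL and relays the response (requires mod_proxy).
RewriteRule ^cgi-bin/sales(/.*)?$ http://example.com/cgi-bin/live$1 [P]
```

The only differences between rule 1 and rule 3 are the flags; the difference between rule 2 and the others is whether the substitution carries a scheme and hostname.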

jdMorgan

12:07 am on May 15, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The first question is "Why did you want to do this redirect in the first place?"

By setting up a proxy through-put, you are causing your server to accept the client's (browser's) HTTP request, then make a new (outgoing) HTTP request to the URL you specified, accept the response from that server, and then pass it back to the originally-requesting client.

This is not a redirect, and does not function as a redirect in terms of what search engines will make of it.

So what did you want to accomplish with a redirect? -- that's the question.

Jim

jerrykrinock

1:22 am on May 15, 2010 (gmt 0)

10+ Year Member



> Whoever wrote that code needs a severe talking to...
Well, it was generated automatically by cPanel when I set up a "Redirect". I'll review your points, but it does seem to work without side effects. (cPanel is a website control panel which is provided by my web host.)

> I hope you're aware that using the proxy flag messes up your server logs and analytics, as well as slowing down your site for all visitors, as the server has to make a new request for the page, and then organise passing the reply back to the browser.
OK, I'll watch for any slowdown. The requests are typically quite tiny.

> At this point it is very important to understand the differences between an external redirect, an internal rewrite, an include, a same-domain proxy request and a cross-domain proxy request. They each have their place, and are very different in outcome, even though some are only a few characters different from each other in implementation.
Thank you, g1smd. That gives me some terms to go learn about. My guess is that I did either an internal rewrite or a same-domain proxy request.

Now on to Jim's thoughts...

> The first question is "Why did you want to do this redirect in the first place?"
Thank you, Jim. Answer: I've published several applications which have different URLs hard-coded into them, and as a result of spring cleaning and reorganization, I want them to all hit the same scripts. After a few weeks, as users update, the redirects will be used less and less. After maybe 6 months I'll delete this code from .htaccess.

> By setting up a proxy through-put, you are causing your server to accept the client's (browser's) HTTP request, then make a new (outgoing) HTTP request to the URL you specified, accept the response from that server, and then pass it back to the originally-requesting client.
I believe you're telling me that it's as I suspected...it's all done without involving the client, and furthermore it's all internal since it's actually the same server, example.com. It's forwarding from example.com/cgi-bin/sales/Whatever.pl to example.com/cgi-bin/live/Whatever.pl. So it should never leave the box it's running on until the final response.

> This is not a redirect, and does not function as a redirect in terms of what search engines will make of it.
I'm glad there's another word for it. As far as search engines, I believe that's not an issue since I have no interest in advertising these scripts to the public. They are hit either by an application which I publish, or with notifications from Paypal or Google Checkout. When hit with appropriate POST data, they will do one of: return pricing information, initiate an order, send out a key, mark a sale as completed, etc.

jdMorgan

4:30 pm on May 15, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> So it should never leave the box it's running on until the final response.

It does leave the box. Your server sends the client's POST to the internet through one or more datacenter routers, one of which (thankfully) is sending it back to your own server. Your server then sends its reply back to itself, again through the router(s). Finally, that reply is sent back to the client.

This introduces extra traffic within the datacenter network and causes delays. Also, your server will log two requests for each of these POSTs.

An internal rewrite would seem to be a better solution than this reverse-proxy through-put. You need not change a URL just because a filepath changes. In fact, for pages on your own site, you need not and should not ever change any URL.

The internal rewrite syntax for use in .htaccess would look like this:

RewriteRule ^cgi-bin/sales(/.*)?$ cgi-bin/live$1 [L]

Note that all unnecessary and incorrect character-escaping has been corrected as well.

This has the effect of rewriting requests for URL-path /cgi-bin/sales/xyz to the filepath /cgi-bin/live/xyz, so that only the filepath "associated" with that URL-path is modified.

You may find the resources cited in our Apache Forum Charter and the tutorials in our Apache Forum Library to be useful.

Jim

jerrykrinock

8:52 pm on May 15, 2010 (gmt 0)

10+ Year Member



Ah, of course if there's no need for it to leave the box, I shouldn't be giving it a domain! And the "one-liner" in the last post works perfectly. POST data gets through.

After reading some of the documentation Jim suggested, I believe that apache.org documentation should, but does not, explain *what* a *rewrite* is and *why* it is useful. So I submitted some text to apache.org:

[issues.apache.org ]

Thanks again, Jim.

g1smd

10:24 pm on May 15, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



That's why I said above: At this point it is very important to understand the differences between an external redirect, an internal rewrite, an include, a same-domain proxy request and a cross-domain proxy request. They each have their place, and are very different in outcome, even though some are only a few characters different from each other in implementation.

I'll tell you now that I was completely lost until I had those all completely clear in my head, and knew the subtle differences in coding required to implement each one.

I'll also guess that the vast majority of web designers, and a very large number of SEO practitioners, don't understand the concepts at all - and that's why you see some truly awful server configurations, URL structures, and site implementations, even on (no! especially on) projects costing millions.

As for Apache, I am not that happy with the wording they use to describe some of the concepts. What we like to call an "internal server filepath" here they sometimes call a "local URL", and they also muddy the waters with their usage of the redirect and rewrite terms on occasions too.