
Apache Web Server Forum

    
Form GET question mark breaking mod rewrite rule
montclairguy




msg:4565419
 9:58 pm on Apr 16, 2013 (gmt 0)

For files that don't actually exist in a "search" directory, this rewrite rule:

RewriteCond /usr/local/apache/htdocs/search/$1 !-f
RewriteRule /search/(.*) /usr/local/apache/cgi-bin/s.cgi?$1 [L,T=application/x-httpd-cgi,NE,NC]


allows me to use URLs like this:

http://www.example.com/search/whatever=whatever&something=something

instead of this:

http://www.example.com/cgi-bin/s.cgi?whatever=whatever&something=something

I'm aware of other methods that split up the query and remove the ampersands and equal signs, but I don't care about that right now. The problem I'm having is with a form that uses "get" instead of "post". The browser address bar shows this, which probably means s.cgi is being sent the question mark:

http://www.example.com/search/?whatever=whatever&something=something

I need the question mark removed when submitting a form with the "get" method. Again, this rule works perfectly for direct links and forms using post.

Sorry if this is an easy one, but mod_rewrite is not one of my strong points. Also, this is an older version of Apache and mod_rewrite, and it cannot handle (?:) style back-references.

[edited by: bill at 9:00 am (utc) on Apr 17, 2013]
[edit reason] disable smilies [/edit]

 

lucy24




msg:4565457
 1:46 am on Apr 17, 2013 (gmt 0)

Y'know, I just spent a lonnnng time on another question, hoping the fairies would answer this one in the meantime. No luck. I suspect the Apache fairies are waiting for me to say something egregiously incorrect and then they'll come to the rescue. But it's cheating if I intentionally say something wrong.

Judging by ordinary smiley behavior, that was a (?:) before Happy Face got hold of it. But ?: means a non-capturing group. Not sure if I even want to know what role it was supposed to play. Or did you mean a (?<= etc. lookbehind?

First reaction: Someone who writes a rule involving the [T] flag-- which I've never used in my life-- has a ### of a nerve turning around and saying "mod_rewrite is not one of my strong points". (Do you really need the flag? What else would a filename in .cgi be?)

Second reaction: Do you mean that you want to mess with your URL so that a real-life query string looks as if it's simply part of the path? This could get dicey, since literal ampersands also don't belong in paths.

Third reaction: Now wait a minute.

RewriteRule /search/(.*) /usr/local/apache/cgi-bin/s.cgi?$1 [L,T=application/x-httpd-cgi,NE,NC]
...
The browser address bar is showing this, which probably means s.cgi is getting the question mark sent to it:
http://www.example.com/search/?whatever=whatever&something=something

Do you have some reason to believe that you're seeing a literal question mark rather than the ordinary query-string question mark? Is that even physically possible? Seems like, at worst, you're getting a redirect instead of the intended rewrite.

What happens when you try the inverse of your current rule? Something like

RewriteCond %{REQUEST_METHOD} GET
RewriteCond %{QUERY_STRING} (.+)
RewriteRule ^search/$ http://www.example.com/search/%1 [R=301,L]

This is all assuming the troublesome requests started out as GET, rather than as POST requests that got redirected and underwent a sea change.

phranque




msg:4565549
 8:55 am on Apr 17, 2013 (gmt 0)

first you should understand how forms work.
if you use the GET method the form sends the request parameters in the url, with the query string separated from the action path by a question mark, and with each parameter=value pair in the query string separated by ampersands.
if you use the POST method the form sends the request parameters in the body of the request rather than in the url, with each parameter=value pair in the body separated by ampersands.

what you are seeing is normal behavior according to the HTTP protocol and can't be changed.

as far as how you are using urls, keep in mind that & and = are reserved characters to be used for delimiters and when not used as such according to protocol should be percent-encoded.

http://tools.ietf.org/html/rfc3986#section-2.2
URIs include components and subcomponents that are delimited by characters in the "reserved" set. These characters are called "reserved" because they may (or may not) be defined as delimiters by the generic syntax, by each scheme-specific syntax, or by the implementation-specific syntax of a URI's dereferencing algorithm. If data for a URI component would conflict with a reserved character's purpose as a delimiter, then the conflicting data must be percent-encoded before the URI is formed.

montclairguy




msg:4569467
 10:29 pm on Apr 30, 2013 (gmt 0)

Thank you both for your replies.

montclairguy




msg:4569751
 7:38 pm on May 1, 2013 (gmt 0)

Let me try and be a little more clear. I need a rewrite rule that will handle both of these GET requests:

http://www.example.com/search/something=whatever
http://www.example.com/search/?something=whatever

It should rewrite them to this, without redirection:

/usr/local/apache/cgi-bin/s.cgi?something=whatever

I've been unsuccessful in my attempts to get this working.

@Lucy - yes, I misspoke and meant non-capturing groupings aren't supported.

lucy24




msg:4569798
 10:55 pm on May 1, 2013 (gmt 0)

non-capturing groupings aren't supported.

That means Apache 1.something. Ouch! Is it absolutely out of your power to upgrade?

I need a rewrite rule that will handle both of these GET requests:
<snip>
without redirection

Ouch again. The two inputs you show are fundamentally different, meaning two different URLs for the same page. Are you in a position where Duplicate Content will not be a problem? Most likely situation: all this is happening in non-indexed search-result pages. Otherwise we're getting into "Just show him how to aim the gun" territory ;)

In what follows, any dotdotdot ... represents stuff you can fill in for yourself depending on exact environment.

First form:
... %{QUERY_STRING} (.+)
... /search/$ ...
Second form: conditionless
... /search/(.+) ...

The first gets rewritten to
... /usr/local/apache/cgi-bin/s.cgi?%1 [L]
while the second ends in
... $1

Is there any possibility whatsoever that a request could come in with both? That is, content after /search/ but also a query string? If so, you've got several possibilities, depending on the form of the query and also on the mood of your Apache installation. Simplest is to capture the path as above, shove it into a fresh query string and then use the [QSA] flag to keep the original query.

For that matter, you could do it all in one conditionless rule:

... /search/(.*) ... ?$1 [QSA]

again depending on Apache. Early versions may get cranky about empty captures. I assume your cgi is already equipped to handle empty or invalid parameters, so this part depends entirely on Apache.

I've said /search/ throughout. That's the config-file version. If you're in htaccess-- or in a <Directory> section within config-- leave off the leading / slash.

And then there's the potential issue of a request for /index.html or similar. Since these are rewrites alone, any redirecting has already happened. But better shove in a [NS] flag to be safe. mod_dir normally executes after mod_rewrite, but it's better not to take chances. There was a spell when none of my RewriteRules worked unless I wrote them as if "index.html" was already in place. Go figure.

Then again, you could express the non-query version of the capture as ([^=]*=.*) meaning "if it doesn't contain at least one = sign, disregard it". Change to [^=]+ if you think your cgi is already working hard enough.
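
Filled in, the two rules sketched above might look something like the following. This is only a sketch under assumptions: it is written for server-config context (hence the leading slash), it reuses the script path and the T and NE flags from the original post, and it adds the [NS] flag and the "must contain an = sign" capture described above. It has not been checked against an Apache 1.x installation, so treat the exact flag list as a starting point.

```apache
# Assumption: server config or <VirtualHost> context, mod_rewrite loaded.
RewriteEngine On

# Query-string form: /search/?something=whatever
# %1 is the capture from the RewriteCond just above the rule.
RewriteCond %{QUERY_STRING} (.+)
RewriteRule ^/search/$ /usr/local/apache/cgi-bin/s.cgi?%1 [L,NS,T=application/x-httpd-cgi]

# Path form: /search/something=whatever
# The capture only matches if there is at least one "=" after /search/.
RewriteRule ^/search/([^=]*=.*)$ /usr/local/apache/cgi-bin/s.cgi?$1 [L,NS,NE,T=application/x-httpd-cgi]
```

In .htaccess or a <Directory> section the patterns would start with ^search/ instead of ^/search/, as noted above.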

montclairguy




msg:4569837
 11:40 pm on May 1, 2013 (gmt 0)

The two inputs you show are fundamentally different, meaning two different URLs for the same page. Are you in a position where Duplicate Content will not be a problem? Most likely situation: all this is happening in non-indexed search-result pages. Otherwise we're getting into "Just show him how to aim the gun" territory ;)


Initially, I was attempting to correct this:

http://www.example.com/search/?whatever=whatever

to this:

http://www.example.com/search/whatever=whatever

via a 301 redirect to avoid duplicating content. Again, I'm very weak in the mod_rewrite area. If you've got a solution to that, so I can roll this all up, that would be great.

Regarding your solutions, this works for both URLs with and without question marks:

RewriteRule ^/search/(.*) /usr/local/apache/cgi-bin/s.cgi?$1 [L,T=application/x-httpd-cgi,NE,NC,QSA]

This also worked for both versions:

## ? version rule ##
RewriteCond %{REQUEST_METHOD} GET
RewriteCond /usr/local/apache/htdocs/search/?$1 !-f
RewriteCond %{QUERY_STRING} (.+)
RewriteRule ^/search/$ /usr/local/apache/cgi-bin/s.cgi?%1 [L,NE,NC,T=application/x-httpd-cgi]

## no ? version rule ##
RewriteCond %{REQUEST_METHOD} GET
RewriteCond /usr/local/apache2/htdocs/search/$1 !-f
RewriteRule ^/search/(.*) /usr/local/apache/cgi-bin/s.cgi?$1 [L,T=application/x-httpd-cgi,NE,NC]

If you can help me redirect search/?whatever to search/whatever in that first rule (my attempts have failed) I can close this thread.

Lastly, you mentioned the need for the T flag -- without it, the script source gets output instead of executed.

lucy24




msg:4569860
 12:53 am on May 2, 2013 (gmt 0)

without it, the script source gets output instead of executed

Ugh. Isn't there some AddType or AddHandler directive that covers this globally? See under mod_mime in the Apache docs. In fact the blurb on the [T] flag specifically says it "has the same effect as" AddType.
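
As a rough illustration of the handler approach, something like the following could sit in the server config. It assumes the script lives in /usr/local/apache/cgi-bin (the path from the rewrite rules) and that CGI execution is acceptable for every .cgi file in that directory. Whether it makes the [T] flag unnecessary in this particular setup is a guess, not something confirmed in this thread.

```apache
# Assumption: the directory path matches the RewriteRule target above.
<Directory "/usr/local/apache/cgi-bin">
    # Allow CGI execution for files in this directory.
    Options +ExecCGI
    # Treat anything ending in .cgi as a CGI script (mod_mime).
    AddHandler cgi-script .cgi
</Directory>
```

A likely reason the source is served as plain text without [T]: the rule rewrites straight to a filesystem path, so a ScriptAlias for /cgi-bin/ never gets a chance to mark the file as a script.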

redirect search/?whatever to search/whatever

You should be able to simply turn around your existing rule.

RewriteCond %{QUERY_STRING} (.+)
RewriteRule ^/search/$ http://www.example.com/search/%1? [R=301 et cetera]


Better put this after the "index.html" redirect. Since the directory doesn't really exist it should not matter, but you never know what a search engine might take it into its head to request. Note the trailing ? to get rid of the query string.

This redirect is entirely separate from the subsequent cgi-script rewrite.
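
Put together, the redirect and the rewrite could be ordered roughly like this. It is a sketch, not the poster's final config: the [R=301,NE,L] flag list stands in for the "et cetera" above and is an assumption, as is the GET-only condition (borrowed from the earlier suggestion so that POST requests are not redirected). NE is included on the assumption that the query string may already contain percent-encoded characters that should not be escaped a second time.

```apache
RewriteEngine On

# 1. External redirect: /search/?whatever=whatever -> /search/whatever=whatever
#    The trailing "?" on the target drops the original query string,
#    so the redirected request no longer matches this rule.
RewriteCond %{REQUEST_METHOD} GET
RewriteCond %{QUERY_STRING} (.+)
RewriteRule ^/search/$ http://www.example.com/search/%1? [R=301,NE,L]

# 2. Internal rewrite: /search/whatever=whatever -> s.cgi?whatever=whatever
#    Only applies when no real file of that name exists under /search/.
RewriteCond /usr/local/apache/htdocs/search/$1 !-f
RewriteRule ^/search/(.*) /usr/local/apache/cgi-bin/s.cgi?$1 [L,T=application/x-httpd-cgi,NE,NC]
```

With this ordering a form submitted with "get" lands on the question-mark URL, is redirected once to the path-style URL, and is then handed to s.cgi by the second rule.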

montclairguy




msg:4572794
 6:29 pm on May 10, 2013 (gmt 0)

Works perfectly now. Thank you very much for your help.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved