Forum Moderators: phranque


Parsing redirect QSA to rewrite and omitting from substitution

         

jameshopkins

4:00 pm on Jun 2, 2009 (gmt 0)

10+ Year Member



In theory, what I'm trying to do should be relatively simple, but it's taking a long time for me to figure out.

As some of you may already know from my previous posts, I am performing an external redirect followed by an internal rewrite.

My external redirect is:-


RewriteRule ^ukhti/newapp.asp/application.nav/param.classic.uk([a-z-]+)/walk.yahlo.uk([a-z-]+)$ http://localhost/${level1-1:$1}? [R=301,L,QSA]

As you'll notice, I'm using the [QSA] flag to append the original query string onto the redirect URL. The resulting URL looks like /horses?cm_re=hello_-_hello

I then want to be able to use this query string parameter in my internal rewrite pattern, whilst at the same time removing the query string from the substitution value.


RewriteCond %{QUERY_STRING} ^cm_re=([a-z-_]+)$ HTTP [NC]
RewriteRule ^([a-z-]+)?$ http://www.site.com/ukhti/newapp.asp/application.nav/param.classic.uk${level1:$1|401}/walk.yahlo.uk${level1:$1|401}?cm_re=%1 [P]

First off, I seem unable to capture the query string value using my current RewriteCond statement. Moreover, I'm unable to then remove this appended query string from my substitution URL (you'll notice the question mark appended, which I thought could do this)

Any help would be much appreciated

jdMorgan

6:41 pm on Jun 2, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The pattern "^cm_re=([a-z-_]+)$" may not work as expected. Note the "dual function" of the hyphen character here: within a character class (a bracketed set of alternatives), it serves as a range indicator unless it is escaped with a backslash. Also, various versions of the regex libraries bundled with the server OS (on Apache 1.x) or with Apache 2 itself won't handle it properly unless it appears in certain positions in the class. I suggest "^cm_re=([a-z_\-]+)$".
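A minimal sketch of the suggested condition in context, assuming .htaccess (per-directory) context and that the RewriteMap named level1 is already defined in the server config, as elsewhere in this thread. It also drops the stray "HTTP" token from the original RewriteCond, since that directive takes only a test string, a pattern and an optional [flags] field:

```apache
RewriteEngine On

# The hyphen is escaped and placed last in the character class, so it can
# only mean a literal "-" (Apache 2.x uses PCRE, where "\-" is a literal).
RewriteCond %{QUERY_STRING} ^cm_re=([a-z_\-]+)$ [NC]

# %1 is the value captured by the RewriteCond above;
# $1 is the path segment captured by this RewriteRule's own pattern.
RewriteRule ^([a-z-]+)$ /ukhti/newapp.asp/application.nav/param.classic.uk${level1:$1|401}/walk.yahlo.uk${level1:$1|401}?cm_re=%1 [L]
```

As written, the rule still re-attaches the captured value as a new query string; the sketch is only meant to show the corrected pattern and the two kinds of back-reference.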

Next, use of [QSA] is only needed if you wish to append new query string data to the original query string. If you want to clear the query string, you put a "?" at the end of your substitution path. If you wish to append additional name/value pairs, then you specify those name/value pairs in the substitution path, and use [QSA]. If you simply want to keep the original query string, then you omit the "?" in the substitution path and do not use [QSA] -- mod_rewrite's default behaviour is to pass query strings through without modification.
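The four cases can be sketched as follows. The paths page1 to page4 and /target are hypothetical, and each comment assumes .htaccess context and an incoming request carrying the query string a=1 (for example /page1?a=1):

```apache
# 1. No "?" and no [QSA]: the original query string passes through.
#    Result: /target?a=1
RewriteRule ^page1$ /target [L]

# 2. Trailing "?" and no [QSA]: the original query string is discarded.
#    Result: /target
RewriteRule ^page2$ /target? [L]

# 3. New query string, no [QSA]: the new one replaces the original.
#    Result: /target?b=2
RewriteRule ^page3$ /target?b=2 [L]

# 4. New query string plus [QSA]: the two are combined.
#    Result: /target?b=2&a=1
RewriteRule ^page4$ /target?b=2 [QSA,L]
```

In case 4 the new name/value pairs come first and the original query string follows after an "&".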

That said, it appears to me that you are explicitly re-attaching the query string in your second rule, rather than removing it...

Jim

jameshopkins

11:44 pm on Jun 2, 2009 (gmt 0)

10+ Year Member



Thanks for the regex advice; I'll check this out.

I assumed I needed to use the [QSA] flag in my 301 redirect so that my original query string would be passed to the subsequent internal rewrite. From there, I thought I would be able to grab that query string from the 301 redirect URL via my RewriteCond statement, and then pass it into the pattern for my internal rewrite. I could then remove the query string from my substitution by using the question mark.

I can, however, see how these steps may be unnecessary, and that I could possibly make it simpler.

An example of how this would work is below:-

I would have a URL such as ukhti/newapp.asp/application.nav/param.classic.ukgh/walk.yahlo.ukgh?cm_re=hello_-_hello

This URL would then be transformed by the 301 redirect to a form such as /horses. At this stage, I assumed that I had to use the [QSA] flag so that I could transfer the original query string (which I want to use in the pattern within my subsequent internal rewrite) to the subsequent internal rewrite statement.

My internal rewrite would then accept a request for /horses and would fetch content from the internal server filepath at ukhti/newapp.asp/application.nav/param.classic.ukgh/walk.yahlo.ukgh?cm_re=hello_-_hello.

If there's a simpler way of retaining the original query string, but removing it from the internal rewrite substitution, then I would really appreciate it if you could tell me how to achieve this.

Regards

James

jdMorgan

1:35 am on Jun 3, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I don't really understand what you're trying to do here. The first rule is a redirect, which terminates the current HTTP transaction, sending a response to the client saying, "the resource you requested has moved, please ask for it again at this new URL." That's the end of processing and the end of the HTTP transaction for this request.

Then the client comes back with the new URL you sent it, and you take that request and perform a proxy through-put using your rewrite map.

That's what your code is written to do. Now whether that's what you want or not, I don't know. But the fact that you're still not sure about the [QSA] flag indicates that a thorough review of the mod_rewrite documentation is probably in order...

Jim

[edited by: jdMorgan at 1:36 am (utc) on June 3, 2009]

jameshopkins

8:34 am on Jun 4, 2009 (gmt 0)

10+ Year Member



My code is working as I would like it to; to understand this issue, I think it's wise to ignore the [P] flag and the absolute URL in my internal rewrite. I'll be the first to admit that the through-put is a hacky way of doing things.

I simply want to retain the original query string from the old URL (before the external redirect) so that it is included in the server filepath in the internal rewrite, whilst removing it from the internal rewrite substitution.

g1smd

9:28 am on Jun 4, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The query string passes through RewriteRule untouched unless you add a ? to the end to clear it, or you specify different parameters instead.

That is, as far as RewriteRule is concerned, a query string is not part of the URL. It is data "attached" to the end of the URL-path. RewriteRule deals directly only with the filepath and filename parts. If you need to look at, or change, anything else, then you need to use a RewriteCond to look at HTTP_HOST or SERVER_PORT or QUERY_STRING or whatever.
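A minimal illustration of that division of labour, assuming .htaccess context; the target /horses-handler is a hypothetical name used only for this example:

```apache
# A rule like this will not match a normal request such as /horses?cm_re=x,
# because the RewriteRule pattern is tested against the URL-path only and
# never sees the "?" or anything after it:
#   RewriteRule ^horses\?cm_re=(.+)$ /horses-handler [L]

# Instead, test the query string in a RewriteCond and the path in the rule.
# (^|&) lets cm_re appear anywhere in the query string; its value is %2.
RewriteCond %{QUERY_STRING} (^|&)cm_re=([^&]+)
RewriteRule ^horses$ /horses-handler?cm_re=%2 [L]
```

Because the substitution supplies its own query string and [QSA] is not used, the rewritten request carries only cm_re=%2, not the rest of the original query string.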

jameshopkins

9:45 am on Jun 4, 2009 (gmt 0)

10+ Year Member



Thanks for the reply, g1smd.

I was originally thinking that I do indeed need a RewriteCond to lookup the query string, which is what I _tried_ to do initially, but to no avail. To me, the code below looks correct from a logic point of view; does it look OK syntactically?

RewriteCond %{QUERY_STRING} ^cm_re=([a-z-_]+)$ HTTP [NC]
RewriteRule ^([a-z-]+)?$ ukhti/newapp.asp/application.nav/param.classic.uk${level1:$1|401}/walk.yahlo.uk${level1:$1|401}?cm_re=%1

Interesting you mention that the query string passes through the RewriteRule untouched. However, presumably you are talking about the internal rewrite as opposed to the preceding redirect. The only reason I am using [QSA] in my redirect is to retain that query string so it can be processed subsequently by the internal rewrite.

EDIT: Absolute path and [P] flag removed from internal rewrite in example for clarity.

My worry is that if I don't capture the query string in my internal rewrite pattern, then it won't be invoked when the substitution URL is requested.

g1smd

10:07 am on Jun 4, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The query string passes through RewriteRule untouched unless you add a ? to the end to clear it, or you specify different parameters instead - that happens whether you are using that rule to generate a redirect or a rewrite, both will be the same.

jameshopkins

10:39 am on Jun 4, 2009 (gmt 0)

10+ Year Member



Guys, I do apologise for my misunderstanding.
I completely missed the fact that I had a question mark appended to the substitution in the redirect, and this had forced me to use [QSA] to re-append the query string. Doh!
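For reference, here are two equivalent forms of the redirect from the first post, in the same (.htaccess) context as the original. The original is left commented out so that only one of the two would be active:

```apache
# As originally posted: a trailing "?" would normally discard the query
# string, but because nothing follows it, [QSA] leaves the original query
# string in place, so the net effect is that the query string is kept.
#   RewriteRule ^ukhti/newapp.asp/application.nav/param.classic.uk([a-z-]+)/walk.yahlo.uk([a-z-]+)$ http://localhost/${level1-1:$1}? [R=301,L,QSA]

# Simpler equivalent: with no "?" and no [QSA], mod_rewrite passes the
# original query string through to the redirect target unchanged.
RewriteRule ^ukhti/newapp.asp/application.nav/param.classic.uk([a-z-]+)/walk.yahlo.uk([a-z-]+)$ http://localhost/${level1-1:$1} [R=301,L]
```

To drop the query string from the redirect instead, keep the trailing "?" and remove [QSA].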

This leaves me with one issue; I've appended a question mark to the substitution in the rewrite, however the query string isn't being removed.


RewriteRule ^([a-z-]+)?$ http://www.site.com/ukhti/newapp.asp/application.nav/param.classic.uk${level1:$1|401}/walk.yah.uk${level1:$1|401} [P]

Does anyone know why this is?

[edited by: jameshopkins at 11:14 am (utc) on June 4, 2009]

g1smd

11:02 am on Jun 4, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I have to ask what the heck most of that junk in the URL is actually for?

The URL format is horrendous in many ways.

Are you kidding me?

http://www.example.com/uklalalala/lalalaapp.aspx/app.detail/params.frames.y.tpl.uktsv.item.tsv.cm_scid.TB-TSV/left.html.|tsvmetadrill,html/walk.yah.ukHB?cm_re=LN-_-OnLaLaLaLa-_-TodaysSpecialOffer

Is this some kind of joke?

[Self-Mod Note: Obfuscated identifying parts of URL with: lalala]

jameshopkins

11:21 am on Jun 4, 2009 (gmt 0)

10+ Year Member



g1smd wrote: "Is this some kind of joke?"

:)

Are you able to give me any pointers as to why the question mark in the substitution doesn't remove the query string?

Thanks for the considerate URL obfuscation, BTW

g1smd

11:34 am on Jun 4, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I don't see a question mark in the target URL of the Rule.

I see one in the pattern, which makes the group optional, so the pattern matches a request for "/" as well as a string of letters.
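A sketch of the difference, assuming .htaccess context in the document root (where a request for "/" is matched against an empty path) and the level1 map used earlier in the thread:

```apache
# Optional group: the "?" after (...) lets the pattern match an empty path,
# so a request for the site root "/" is also rewritten, with $1 empty:
#   RewriteRule ^([a-z-]+)?$ ...

# Required group: at least one letter or hyphen, so "/" is left alone.
# The trailing "?" in the substitution (not in the pattern) is what clears
# the query string.
RewriteRule ^([a-z-]+)$ /ukhti/newapp.asp/application.nav/param.classic.uk${level1:$1|401}/walk.yah.uk${level1:$1|401}? [L]
```

The question mark only has the "clear the query string" meaning at the end of the substitution; in the pattern it is an ordinary regex quantifier.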

jameshopkins

12:38 pm on Jun 4, 2009 (gmt 0)

10+ Year Member



Thanks for your continued help.
I'm going to go away and do some experimenting with RewriteCond.

jdMorgan

12:56 pm on Jun 4, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



RewriteCond can be used to *test* query strings and other context variables not directly-visible to RewriteRule. But only RewriteRule takes action to *change* anything.

There is no apparent need for a proxy through-put here. Have you tried simply adding this much-discussed question mark to the substitution path and changing from proxy through-put syntax to internal rewrite syntax?


RewriteRule ^([a-z-]+)?$ /ukhti/newapp.asp/application.nav/param.classic.uk${level1:$1|401}/walk.yah.uk${level1:$1|401}? [L]

Use of a rewritemap makes things a little complicated, and the regex patterns and the documentation can be a bit cryptic. But overall, mod_rewrite 'works as expected' and it really isn't as hard to use as this thread would seem to imply...

Jim

jameshopkins

2:10 pm on Jun 4, 2009 (gmt 0)

10+ Year Member



jdMorgan wrote: "There is no apparent need for a proxy through-put here."

I am coming up with a proof-of-concept where I'm replicating the URL structure of our site, but running this mod_rewrite statement on my localhost. So the job of the [P] flag is to fetch (proxy) the content of our live site when I request that file locally.

I've found where I was going wrong; my thinking behind this issue was all wrong from the start.

My understanding was that if the query string was removed by the question mark in the redirect, then that query string would be lost forever and couldn't be requested within the substitution in the internal rewrite. I notice from Live Headers (Firefox add-on) that this in fact isn't the case; the query string is just being masked.

I've really appreciated everyone's perseverance with regard to my difficulty in understanding mod_rewrite. I'm sure there'll be more questions in the future, but I'll try and keep them to a minimum!

jdMorgan

3:37 pm on Jun 4, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



jameshopkins wrote: "I notice from Live Headers (Firefox add-on) that this in fact isn't the case; the query string is just being masked."

I'm not sure what you mean by "masked," but the query string will in fact be completely removed by the redirect if the substitution URL ends with a question mark. Any 'survival' of that query information may be due to client-side caching effects, and cannot be counted on for normal site operation.

If you need to pass that query string's "information" through the redirect, while at the same time *not* having it show in the redirected-to URL, then look into setting a client-side cookie to 'keep' that state information. The client will then send that information back to the server for each request to the defined cookie 'realm', unless and until you set the cookie to a different value or let it expire (either by date and time or by browser session close, as you choose when creating the cookie). However, I wouldn't count on search engines to handle or use your cookie, so be aware that they will likely 'lose' that query information if you move it to a cookie when redirecting.

In essence, this technique moves the query information from the client's HTTP request line (loosely-speaking "the URL") to an HTTP header that accompanies that client request line when sent to the server. You can use Live HTTP Headers on any site that uses cookies to see/watch how cookies are set by server responses and sent in client requests.
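A rough sketch of the cookie approach using mod_rewrite's own CO flag, whose fields are name:value:domain:lifetime (in minutes):path. The host www.example.com, the 60-minute lifetime and the cookie name cm_re are placeholder assumptions, as are .htaccess context and the level1/level1-1 maps from the thread:

```apache
# Step 1: external redirect. Copy the cm_re value into a cookie and strip
# the query string from the redirect target with the trailing "?".
# Assumes the value contains no ":" or ";" (they would break the cookie).
RewriteCond %{QUERY_STRING} (^|&)cm_re=([^&]+)
RewriteRule ^ukhti/newapp.asp/application.nav/param.classic.uk([a-z-]+)/walk.yahlo.uk([a-z-]+)$ http://www.example.com/${level1-1:$1}? [R=301,L,CO=cm_re:%2:www.example.com:60:/]

# Step 2: internal rewrite. Read the value back from the Cookie request
# header and pass it to the back-end script as a query string.
RewriteCond %{HTTP_COOKIE} (^|;\s*)cm_re=([^;]+)
RewriteRule ^([a-z-]+)$ /ukhti/newapp.asp/application.nav/param.classic.uk${level1:$1|401}/walk.yahlo.uk${level1:$1|401}?cm_re=%2 [L]
```

Requests that arrive without the cookie (including, most likely, search-engine crawlers, as noted above) will not match step 2 and would need their own rule.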

There's no need to "keep questions to a minimum." However, be aware that every detail is important when working at this level -- technical details, code details, functional details, and the details that members post in their responses to your questions must not be overlooked if a 'quick and correct answer' is sought.

With server configuration changes like URL-rewriting, it's *all* details, and getting one wrong (e.g. a single typo) can take your site down instantly -- if you're lucky. If you're not so lucky, it can quietly and slowly erode the search ranking of your site's pages over time, and --especially given today's economic realities-- potentially put you out of business... :o

Jim