QUERY STRING issues with static pages

dave_c00

1:39 pm on Oct 14, 2010 (gmt 0)

Hi,

I am having a few little issues with my QUERY_STRING htaccess code. I need to get the following to rewrite:

http://www.example.com/products/3/dvds/?a=10&b=name&c=desc

to go to

http://www.example.com/products/3/dvds-10-name-desc.html

I am creating all my pages statically for speed purposes.

Cheers,

Dave

dave_c00

2:32 pm on Oct 14, 2010 (gmt 0)

Think I have got it -

Something like:

RewriteCond %{QUERY_STRING} ^(a|b|c|d|e|f)=([^&]*)$
RewriteRule ^products/([^/]+)/([^/]+)/$ products/$1/$2-%1-%2.html [L]

sublime1

3:04 pm on Oct 14, 2010 (gmt 0)

(edit: your solution may work as well or better ... but read mine for another view anyway :-)

Dave --

I suspect we need a little more information, but I'll assume that the static pages already exist as files and that your public URLs will continue to use the query strings as in your first example. If not, please let us know. I'm also assuming Apache .htaccess rewrite rules.

To get at the query string, you'll need a RewriteCond and will probably want to use the %{REQUEST_URI} variable as the string to match against, with a regular expression that "captures" each of the components of the query string -- they can be back-referenced in a subsequent RewriteRule. In your case, the rewrite should be internal (so don't use the "R" flag).

So something similar to this (you can probably be more specific in the pattern if you know something should be a number or a character).


RewriteCond %{REQUEST_URI} ^(/products/[0-9]+/.+)/\?a=(.*?)&b=(.*?)&c=(.*?)$
RewriteRule ^.*$ %1/%2-%3-%4.html [L]

I am not sure about the $1 back reference in this case. In the REQUEST_URI I think the URI will contain a leading /, whereas (if in a .htaccess context) the path matched by the RewriteRule will not contain the leading slash. Also note that the first capture group in the RewriteCond does not capture the trailing slash on the path part of the URI, which is why I added it in the rewrite -- mostly just for clarity.

------------

HOWEVER....

Are you rolling your own caching? Even if you aren't using Apache 2.2 and don't have access to the server configuration, check out the Apache 2.2 Caching Guide: [httpd.apache.org...] -- it will reveal the many (many!) complexities of caching. There are scores of pre-existing caching mechanisms that deal with all the special cases.

How is your cache updated if content changes? Are you sure all the files will exist? Will files ever expire? Are there exceptions?

I have found that the most reliable software on the sites I manage is the software that didn't need to get written specially, either because we decided it was trickier than it was worth, or because someone had an existing, tested, flexible method of doing what we wanted that worked in our environment.

Tom

dave_c00

8:27 am on Oct 15, 2010 (gmt 0)

Thanks Tom,

I am still having deeper problems, but your way looks like a better solution.

I have got this, which works well:

RewriteCond %{REQUEST_URI} ^(.+)/(.+)/(.+)/(.+)/$
RewriteRule ^.*$ %1/%2/%4-%3.html [L]

products/3/15/dvds/
- rewrites to -
products/3/dvds-15.html

But if I add the following:

RewriteCond %{REQUEST_URI} ^(.+)/(.+)/(.+)/(.+)/\?p=1$
RewriteRule ^.*$ %1/%2/%4-%3.html [L]

Surely then
products/3/15/dvds/?p=1
- would still rewrite to -
products/3/dvds-15.html

And then

RewriteCond %{REQUEST_URI} ^(.+)/(.+)/(.+)/(.+)/\?p=(.+)$
RewriteRule ^.*$ %1/%2/%4-%3-%5.html [L]

making
products/3/15/dvds/?p=1
- rewrite to -
products/3/dvds-15-1.html

Even replacing the (.+) with (.*?) on the query part doesn't do anything.

From the original post, the slashes were getting caught in the matches.

Dave

dave_c00

9:12 am on Oct 15, 2010 (gmt 0)

It appears it will not match the ? as I have got the following working:

RewriteCond %{REQUEST_URI} ^(.+)/(.+)/(.+)/(.+)/p=(.+)$
RewriteRule ^.*$ %1/%2/%4-%3-%5.html [L]

products/3/15/dvds/p=1
- rewriting to -
products/3/dvds-15-1.html

I would have thought \? would work fine.. I am sure I have used the backslash before as my char to ignore the special meaning of the next character.

dave_c00

9:20 am on Oct 15, 2010 (gmt 0)

REQUEST_URI is obviously not picking up my query string is it... I annoy myself sometimes..

dave_c00

9:52 am on Oct 15, 2010 (gmt 0)

$_SERVER['REQUEST_URI'] is picking up the whole url . . .

My apache is doing weird things.

jdMorgan

6:42 pm on Oct 15, 2010 (gmt 0)

No, not weird: The variables are differently-scoped in mod_rewrite versus PHP.

To get the URL-path only in mod_rewrite, use the pattern in the RewriteRule itself (preferred for efficiency), or use %{REQUEST_URI} in a RewriteCond.

To get the query string in mod_rewrite, use %{QUERY_STRING} in a RewriteCond.

To get the entire client request line in mod_rewrite, including URL-path, query string, URL-fragment, and request protocol, use %{THE_REQUEST} in a RewriteCond. Note that this is the entire request line sent by the client, exactly as it appears as a quoted string in your raw server access log file.
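
As an illustration only, here is a sketch of what each of those variables would hold for the products/3/15/dvds/?p=1 request discussed earlier, followed by a rule that reads the query string from %{QUERY_STRING}. It assumes the rules live in a .htaccess file in the document root, that RewriteEngine is already on, and that p is always a number; none of that is confirmed in the thread.

```apache
# Client request line:  GET /products/3/15/dvds/?p=1 HTTP/1.1
#   RewriteRule pattern sees (.htaccess context):  products/3/15/dvds/
#   %{REQUEST_URI}  = /products/3/15/dvds/        (no query string)
#   %{QUERY_STRING} = p=1                          (no leading "?")
#   %{THE_REQUEST}  = GET /products/3/15/dvds/?p=1 HTTP/1.1

# Assumption: p is numeric. %1 is captured by the RewriteCond,
# $1..$4 are captured by the RewriteRule pattern.
RewriteCond %{QUERY_STRING} ^p=([0-9]+)$
RewriteRule ^([^/]+)/([^/]+)/([^/]+)/([^/]+)/$ /$1/$2/$4-$3-%1.html [L]
# products/3/15/dvds/?p=1  ->  /products/3/dvds-15-1.html
```

Note that the "?" itself never appears in any of the per-variable values except THE_REQUEST, which is why a \? in a REQUEST_URI pattern cannot match.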

Which form you need to use depends on exactly what you are doing. If you do not use the query-stringed path as an internal script filepath, then the simple QUERY_STRING method should work. However, if you are internally rewriting the 'friendly' URLs back to a form that resembles the URL-path-plus-query, then that rule and this new one may together create an infinite loop, and you will have to use the THE_REQUEST method to avoid that.
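
A rough sketch of the THE_REQUEST form, applied to the URL format from the first post. It is only worth the extra complexity if some other rule rewrites back to a query-string form; the parameter names a, b and c are taken from the first post, and the assumption that a is numeric is mine. THE_REQUEST is not changed by internal rewrites, so the condition can only match what the client originally asked for.

```apache
# THE_REQUEST looks like:
#   GET /products/3/dvds/?a=10&b=name&c=desc HTTP/1.1
# \s matches the spaces around the URL; [^&\s]+ stops each value
# at the next "&" or at the space before "HTTP/".
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/[^?\s]*\?a=([0-9]+)&b=([^&\s]+)&c=([^&\s]+)\sHTTP/
RewriteRule ^([a-z]+/[0-9]+/[a-z]+)/$ /$1-%1-%2-%3.html [L]
# /products/3/dvds/?a=10&b=name&c=desc  ->  /products/3/dvds-10-name-desc.html
```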

Simple version of your added rule above:


RewriteCond %{QUERY_STRING} ^p=1$
RewriteRule ^([^/]+/[^/]+)/([^/]+)/([^/]+)/$ /$1/$3-$2.html [L]

But that does not comport with the stated goal in your first post which was to rewrite
URL-path /products/3/dvds/?a=10&b=name&c=desc to
filepath /products/3/dvds-10-name-desc.html
That would be answered by


RewriteCond %{QUERY_STRING} ^a=([0-9]+)&b=([^&]+)&c=([^&]+)$
RewriteRule ^([a-z]+/[0-9]+/[a-z]+)/$ /$1-%1-%2-%3.html [L]

assuming all-lowercase names and descriptions -- otherwise use [NC] flag. Also assuming that you want to rewrite all subdirectory-URL-paths -- "products" and all others matching the specified format...

Please notice that I have made the regular-expression subpatterns much more specific. The rules I've posted may run dozens to hundreds of thousands of times faster than yours, because I explicitly define where each subpattern match is to end. Your use of multiple ".+" subpatterns forces the matching engine to backtrack, trying again and again to find a "best fit" apportionment of the requested string among the multiple ambiguous subpatterns.

This is because on the first pass, the matching engine will match the entire string into the subpattern for "$1". It will then find that the rest of the pattern fails to match. So it will remove one character from $1 and try again. This continues until the rest of the subpatterns past $1 are no longer "starved", and it's easy to see that with a long requested URL, this may take many, many iterations.

In contrast, by using specific character-set-matches (e.g. [a-z]+ or [0-9]+) or negative-character-matches (e.g. [^/]+ meaning "match one or more characters not a slash"), the improved pattern can always be matched in a single left-to-right pass.
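
To make the comparison concrete, here is a sketch of the four-segment rule from earlier in the thread rewritten with negative-character-matches. It assumes a .htaccess file in the document root, so the RewriteRule pattern sees the path without its leading slash.

```apache
# Ambiguous form from earlier in the thread: each ".+" can also
# swallow slashes, so the engine has to back off character by character.
#   RewriteCond %{REQUEST_URI} ^(.+)/(.+)/(.+)/(.+)/$
#   RewriteRule ^.*$ %1/%2/%4-%3.html [L]

# Specific form: each [^/]+ must stop at the next slash, so every
# segment has exactly one place where it can end.
RewriteRule ^([^/]+)/([^/]+)/([^/]+)/([^/]+)/$ /$1/$2/$4-$3.html [L]
# products/3/15/dvds/  ->  /products/3/dvds-15.html
```

Matching in the RewriteRule pattern itself also removes the need for a RewriteCond on REQUEST_URI, and the back-references become $1 to $4 instead of %1 to %4.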

Jim