Recursive [N] RewriteRules -p1:v1 -> &p1=v1

Make rule to convert ShortURL -param:value pair to full URL

         

msandersen

4:25 pm on Jun 26, 2005 (gmt 0)

10+ Year Member



I'm working on ShortURLs for an F/OSS PHP system that will be distributed and can have 3rd-party modules, so the parameters are not known in advance. I have ShortURLs for the system itself, but for the moment I have some clumsy generic rules at the end to catch URLs with various numbers of parameters:
# [ModuleName]/[Function]-[Param1]:[Value1]-[Param2]:[Value2]-[Param3]:[Value3].phtml
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([^-+/]+)[/-]([^-+.]+)\.p?html?$ index.php?module=$1&func=$2 [L,NS,QSA]
RewriteRule ^([^-+/]+)[/-]([^-+]+)-([^-:]+)[:-]([^-.]*)\.p?html?$ index.php?module=$1&func=$2&$3=$4 [L,NS,QSA]
RewriteRule ^([^-+/]+)[/-]([^-+]+)-([^-:]+)[:-]([^-]*)-([^-:]+)[:-]([^-\.]*)\.p?html?$ index.php?module=$1&func=$2&$3=$4&$5=$6 [L,NS,QSA]
RewriteRule ^([^-+/]+)[/-]([^-+]+)-([^-:]+)[:-]([^-]*)-([^-:]+)[:-]([^-]*)-([^-:]+)[:-]([^-\.]*)\.p?html?$ index.php?module=$1&func=$2&$3=$4&$5=$6&$7=$8 [L,NS,QSA]

As you can see, this accounts for the modulename as a virtual directory, and the function name as a virtual file with up to 3 parameter/value pairs. But I want to be able to do more, rather than have URLs with 4 or more parameters as long URLs.

This has led me to experiment with recursive rules, but that's asking for trouble on Apache, since it ends in an infinite loop every time.

I made rules for a 3rd-party filter for MS IIS, which is based on Apache's mod_rewrite with some differences: it uses an up-to-date RegEx library, can match more than 9 back references, and does recursion without problems, stopping after a certain number of repeats anyway. From what I understand, this has only come to Apache as of the latest Apache2 version, 2.0.54, with the RewriteOptions MaxRedirects. I have 2.0.52, but I cannot count on other servers having the latest Apache, or even Apache2. Testing is troublesome when I cannot break out of the infinite loop and end up with 200 MB+ RewriteLogs.

The non-working code follows. WARNING: On Apache older than 2.0.54, it will cause an infinite loop.

# Requires Apache 2.0.54
RewriteOptions MaxRedirects=20

# Recursive rule to convert all parameters for index page from -p:v to &p=v
RewriteCond %{QUERY_STRING} ^(.*?)-([^:&]*):(.*)$
RewriteRule ^index\.php\?.*$ index.php?%1&%2=%3 [N]

... Some other rules...

# Don't process remaining rules if not .phtml or .htm(l) files.
RewriteCond %{REQUEST_URI} !^.*\.p?html?$
RewriteRule ^.*$ - [PT]

... Specific system rules...

# [ModuleName]/[Function](-[Param1]:[Value1] .. -[ParamN]:[ValueN]).(p)htm(l)
# Rewrite Module/Function-P1:V1-P2:V2-P3:V3.(p)htm(l) to index.php?module=Module&func=Function-[ParamN]:[ValueN]
RewriteRule ^([^/+]+)/?\+([^-.]+)([^.]*)\.p?html?$ index.php?module=$1&func=$2$3 [N]

The problem is that although the URL is rewritten correctly and processing is directed back to the top, where Apache splits the URI and query string, at the top it adds the WRONG arguments! Hence the arguments rewrite condition fails, and it goes to the bottom rule, rewrites, goes to the top, adds the wrong arguments, and so on...

As for an extract of the RewriteLog, I'll spare you the full 200Mb+ log and post this extract instead, showing a log level of 3 with a few bits edited out, the rules looping 3 times, using a rule for a slightly different style of URL with the Rewrite rule
RewriteRule ^([^/+]+)/?\+([^-.]+)([^.]*)\.p?html?$ index.php?name=$1&file=$2$3 [N]
where I've distinguished the shortURL with module/+index

(2) init rewrite engine with requested uri /pntest/gallery/+index-full:1-set_albumName:jessica-id:100_1113_001-include:view_photo.phtml
(1) pass through /pntest/gallery/+index-full:1-set_albumName:jessica-id:100_1113_001-include:view_photo.phtml
(3) [per-dir C:/Apache2/htdocs/pntest/] add path info postfix: C:/Apache2/htdocs/pntest/gallery
-> C:/Apache2/htdocs/pntest/gallery/+index-full:1-set_albumName:jessica-id:100_1113_001-include:view_photo.phtml
(3) [per-dir C:/Apache2/htdocs/pntest/] strip per-dir prefix: C:/Apache2/htdocs/pntest/gallery/+index-full:1-set_albumName:jessica-id:100_1113_001-include:view_photo.phtml
-> gallery/+index-full:1-set_albumName:jessica-id:100_1113_001-include:view_photo.phtml
(3) [per-dir C:/Apache2/htdocs/pntest/] applying pattern '^index\.php\?.*$' to uri 'gallery/+index-full:1-set_albumName:jessica-id:100_1113_001-include:view_photo.phtml'
(3) [per-dir C:/Apache2/htdocs/pntest/] add path info postfix: C:/Apache2/htdocs/pntest/gallery
-> C:/Apache2/htdocs/pntest/gallery/+index-full:1-set_albumName:jessica-id:100_1113_001-include:view_photo.phtml
(3) [per-dir C:/Apache2/htdocs/pntest/] strip per-dir prefix: C:/Apache2/htdocs/pntest/gallery/+index-full:1-set_albumName:jessica-id:100_1113_001-include:view_photo.phtml
-> gallery/+index-full:1-set_albumName:jessica-id:100_1113_001-include:view_photo.phtml
(3) [per-dir C:/Apache2/htdocs/pntest/] applying pattern '^([^/+]+)/?\+([^-.]+)([^.]*)\.p?html?$' to uri 'gallery/+index-full:1-set_albumName:jessica-id:100_1113_001-include:view_photo.phtml'
(2) [per-dir C:/Apache2/htdocs/pntest/] rewrite gallery/+index-full:1-set_albumName:jessica-id:100_1113_001-include:view_photo.phtml
-> index.php?name=gallery&file=index-full:1-set_albumName:jessica-id:100_1113_001-include:view_photo

(3) split uri=index.php?name=gallery&file=index-full:1-set_albumName:jessica-id:100_1113_001-include:view_photo
-> uri=index.php, args=name=gallery&file=index-full:1-set_albumName:jessica-id:100_1113_001-include:view_photo
(3) [per-dir C:/Apache2/htdocs/pntest/] add per-dir prefix: index.php
-> C:/Apache2/htdocs/pntest/index.php
(3) [per-dir C:/Apache2/htdocs/pntest/] add path info postfix: C:/Apache2/htdocs/pntest/index.php
-> C:/Apache2/htdocs/pntest/index.php/+index-full:1-set_albumName:jessica-id:100_1113_001-include:view_photo.phtml
(3) [per-dir C:/Apache2/htdocs/pntest/] strip per-dir prefix: C:/Apache2/htdocs/pntest/index.php/+index-full:1-set_albumName:jessica-id:100_1113_001-include:view_photo.phtml
-> index.php/+index-full:1-set_albumName:jessica-id:100_1113_001-include:view_photo.phtml
(3) [per-dir C:/Apache2/htdocs/pntest/] applying pattern '^index\.php\?.*$' to uri 'index.php/+index-full:1-set_albumName:jessica-id:100_1113_001-include:view_photo.phtml'
(3) [per-dir C:/Apache2/htdocs/pntest/] add path info postfix: C:/Apache2/htdocs/pntest/index.php
-> C:/Apache2/htdocs/pntest/index.php/+index-full:1-set_albumName:jessica-id:100_1113_001-include:view_photo.phtml
(3) [per-dir C:/Apache2/htdocs/pntest/] strip per-dir prefix: C:/Apache2/htdocs/pntest/index.php/+index-full:1-set_albumName:jessica-id:100_1113_001-include:view_photo.phtml
-> index.php/+index-full:1-set_albumName:jessica-id:100_1113_001-include:view_photo.phtml
(3) [per-dir C:/Apache2/htdocs/pntest/] applying pattern '^([^/+]+)/?\+([^-.]+)([^.]*)\.p?html?$' to uri 'index.php/+index-full:1-set_albumName:jessica-id:100_1113_001-include:view_photo.phtml'
(2) [per-dir C:/Apache2/htdocs/pntest/] rewrite index.php/+index-full:1-set_albumName:jessica-id:100_1113_001-include:view_photo.phtml
-> index.php?name=index.php&file=index-full:1-set_albumName:jessica-id:100_1113_001-include:view_photo

(3) split uri=index.php?name=index.php&file=index-full:1-set_albumName:jessica-id:100_1113_001-include:view_photo
-> uri=index.php, args=name=index.php&file=index-full:1-set_albumName:jessica-id:100_1113_001-include:view_photo
(3) [per-dir C:/Apache2/htdocs/pntest/] add per-dir prefix: index.php
-> C:/Apache2/htdocs/pntest/index.php
(3) [per-dir C:/Apache2/htdocs/pntest/] add path info postfix: C:/Apache2/htdocs/pntest/index.php
-> C:/Apache2/htdocs/pntest/index.php/+index-full:1-set_albumName:jessica-id:100_1113_001-include:view_photo.phtml
(3) [per-dir C:/Apache2/htdocs/pntest/] strip per-dir prefix: C:/Apache2/htdocs/pntest/index.php/+index-full:1-set_albumName:jessica-id:100_1113_001-include:view_photo.phtml
-> index.php/+index-full:1-set_albumName:jessica-id:100_1113_001-include:view_photo.phtml
(3) [per-dir C:/Apache2/htdocs/pntest/] applying pattern '^index\.php\?.*$' to uri 'index.php/+index-full:1-set_albumName:jessica-id:100_1113_001-include:view_photo.phtml'
(3) [per-dir C:/Apache2/htdocs/pntest/] add path info postfix: C:/Apache2/htdocs/pntest/index.php
-> C:/Apache2/htdocs/pntest/index.php/+index-full:1-set_albumName:jessica-id:100_1113_001-include:view_photo.phtml
(3) [per-dir C:/Apache2/htdocs/pntest/] strip per-dir prefix: C:/Apache2/htdocs/pntest/index.php/+index-full:1-set_albumName:jessica-id:100_1113_001-include:view_photo.phtml
-> index.php/+index-full:1-set_albumName:jessica-id:100_1113_001-include:view_photo.phtml
(3) [per-dir C:/Apache2/htdocs/pntest/] applying pattern '^([^/+]+)/?\+([^-.]+)([^.]*)\.p?html?$' to uri 'index.php/+index-full:1-set_albumName:jessica-id:100_1113_001-include:view_photo.phtml'
(2) [per-dir C:/Apache2/htdocs/pntest/] rewrite index.php/+index-full:1-set_albumName:jessica-id:100_1113_001-include:view_photo.phtml
-> index.php?name=index.php&file=index-full:1-set_albumName:jessica-id:100_1113_001-include:view_photo

In case you wonder, the IIS filter is ISAPI Rewrite (google for it, I'm not allowed to post the URL here), and the rules that work for it are:

# Recursive rule to convert all parameters for index page from -p:v to &p=v
RewriteRule (.*)(index\.php\?.*?)-([^&:]*):(.*) $1$2&$3=$4 [NS,I]

# Don't process remaining rules if not .phtml or .htm(l) files.
RewriteRule .+\.(?!p?htm).* $0 [L]

... system rules ...

# New modules: [ModuleName]/[Function]-[Param1]:[Value1]-[Param2]:[Value2] ... [ParamN]:[ValueN].phtml
RewriteRule (.*)/([^/]+)/([^-+.]+)([^.]*)\.p?html?(?:\?(.*))?(#.*)? $1/index.php\?module=$2&func=$3$4(?5&$5:$6) [N]


The [NS] flag is not the Apache [NS] flag, it means loop this rule only without going to the top, a handy type of flag Apache could use also. Maximum iterations default to 32, but can be changed.

- Martin

[edited by: jdMorgan at 10:37 pm (utc) on July 5, 2005]
[edit reason] Disabled graphic smilies. [/edit]

msandersen

4:32 pm on Jun 26, 2005 (gmt 0)

10+ Year Member



Slight error: I confused the two versions. The recursive rewrite rule contained an extra \+, as in the rewrite log; it should be:
RewriteRule ^([^/+]+)/?([^-.]+)([^.]*)\.p?html?$ index.php?module=$1&func=$2$3 [N]

gergoe

6:39 pm on Jun 26, 2005 (gmt 0)

10+ Year Member



How about this one?

#
# Recursively translate the query string
RewriteCond %{QUERY_STRING} ^(.+)&params=([^:]+):([^-]*)(-(.+))?$
RewriteRule ^index\.php$ index.php?%1&%2=%3&params=%5 [NC,N]
#
# Rewrite the original (forged) request to index.php, and translate the main parameters
RewriteRule ^([^/]+)(/(\+([^-]+)(-(.*))?\.p?html?)?)?$ index.php?module=$1&func=$4&params=$6 [NC,N]

It does the same as you intended to do, but with a few small differences (mainly the params parameter, which I used to hold the not-yet-converted original parameters). I tested it quickly and it seems to do the trick and even a bit more...

But if you want to keep working with yours, then the first thing you should check is the first RewriteRule:


RewriteRule ^index\.php\?.*$ index.php?%1&%2=%3 [N]

The ^index\.php\?.*$ regex pattern is not valid there: it will never match anything, because a RewriteRule pattern is matched only against the URL path (everything before the question mark), in this case index.php. The query string is only accessible through RewriteConds (as you did). So the rule should look like this:

RewriteRule ^index\.php$ index.php?%1&%2=%3 [NC,N]
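The point about path versus query string can be checked outside Apache. The following is only a rough Python sketch, not mod_rewrite itself: it assumes that `urlsplit` is a fair stand-in for how the request is divided, and the helper name `rule_input` and the sample URL are made up for illustration.

```python
import re
from urllib.parse import urlsplit


def rule_input(url):
    """Return (path, query): roughly what a RewriteRule pattern and a
    RewriteCond on %{QUERY_STRING} would each get to look at.
    This is a simplified model, not Apache's actual processing."""
    parts = urlsplit(url)
    # In per-directory context the leading slash/prefix is stripped.
    return parts.path.lstrip("/"), parts.query


# Hypothetical request used only for this illustration.
path, query = rule_input("/index.php?module=Example&func=view")

WRONG = re.compile(r"^index\.php\?.*$")  # expects a literal '?' in the path
RIGHT = re.compile(r"^index\.php$")      # matches the path alone

wrong_hit = WRONG.match(path)
right_hit = RIGHT.match(path)
```

Here `path` holds only `index.php`, so the pattern containing `\?` has nothing to match against, while the query string is available separately for a condition to inspect.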

About the dead loop I haven't got a clear idea, but I have the impression that you did not post all of your rules, and it seems that these rules are strongly affected by the missing ones (somewhere on the way to these rules, the .phtml is appended to the request).

Have fun.

msandersen

5:08 pm on Jun 27, 2005 (gmt 0)

10+ Year Member



I missed the \?.*$ problem. Thanks. As mentioned, I've just ported the Apache rules to ISAPI Rewrite, which works on the whole Request_Uri, including the query string. I had trouble until I got used to having to match the whole URI. While working on it I got the idea to try recursion. It worked like a charm there, so I backported it to the original Apache .htaccess file, but forgot to remove the extra bit.

I like your idea, but it doesn't solve the recursion problem. It is probably only a problem in .htaccess files, but I would like to know whether it's something I'm doing wrong, or Apache. Could it be specific to this version? I actually have Apache 2.0.43, not 2.0.52 as mentioned above, running on a Windows XP test server.
The problem is when it goes to the top, it adds a postfix, the /func-p:v.phtml bit from the original query.

Here's my test rules for a test rewrite.php file:

RewriteCond %{QUERY_STRING} ^(.+)&rwparams=-([^:&]+):([^-]*)(.*)?$
RewriteRule ^rewrite\.php$ rewrite.php?%1&%2=%3&params=%4 [NC,N]

RewriteCond %{QUERY_STRING} ^(.+)&rwparams=(.*)$
RewriteRule ^rewrite\.php$ rewrite.php?%1%2 [L]

RewriteRule ^([^/]+)/([^-.]+)([^.]*)\.p?html?$ rewrite.php?module=$1&func=$2&rwparams=$3 [NC,N,QSA]


There are no other rules.

Specifically, from the log, using the URL
[localhost...]
the last rule correctly rewrites the URL:
rewrite Example/view-test:1-test2:23-val:one-val2:two-album:jessica.html
-> rewrite.php?module=Example&func=view&rwparams=-test:1-test2:23-val:one-val2:two-album:jessica

It then splits the URI:
split uri=rewrite.php?module=Example&func=view&rwparams=-test:1-test2:23-val:one-val2:two-album:jessica
-> uri=rewrite.php, args=module=Example&func=view&rwparams=-test:1-test2:23-val:one-val2:two-album:jessica&lang=eng

adds per-dir prefix: rewrite.php -> C:/Apache2/htdocs/pntest/rewrite.php
and adds path info postfix: C:/Apache2/htdocs/pntest/rewrite.php
-> C:/Apache2/htdocs/pntest/rewrite.php/view-test:1-test2:23-val:one-val2:two-album:jessica.html

Whoa! Hang on! It's re-injecting /view-test:1-test2:23-val:one-val2:two-album:jessica.html
this is why it is looping endlessly! It will always add this to rewrite.php when it goes to the top, and the last rule will always match it, rewrite it, and send it to the top, add this postfix, and so on...
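The loop described above can be reproduced with plain regular expressions. This is a simplified Python model, not Apache: it assumes, based only on the log extract, that each restart re-appends the same path-info postfix to the rewritten file name. The function name `next_pass_uri` is hypothetical.

```python
import re

# The two patterns from the test rules above ([NC] modeled with IGNORECASE).
ANCHORED = re.compile(r"^rewrite\.php$", re.IGNORECASE)
CATCH_ALL = re.compile(r"^([^/]+)/([^-.]+)([^.]*)\.p?html?$", re.IGNORECASE)

# Path-info postfix as it appears in the log extract.
POSTFIX = "/view-test:1-test2:23-val:one-val2:two-album:jessica.html"


def next_pass_uri(rewritten_path):
    """Model of the restart: the postfix is glued back onto the new path.
    This mirrors the logged behaviour; it is an assumption, not Apache code."""
    return rewritten_path + POSTFIX


uri = next_pass_uri("rewrite.php")

anchored_hit = ANCHORED.match(uri)    # fails: '$' cannot match before '/view-...'
catch_all_hit = CATCH_ALL.match(uri)  # succeeds again, now with module=rewrite.php
```

Because the anchored pattern never matches the postfixed URI and the last rule always does, every pass ends at the last rule and restarts, which is consistent with the `module=rewrite.php` value seen in the log.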

For argument's sake, I then tried changing the 1st rule to:
RewriteRule ^rewrite\.php/ rewrite.php?%1&%2=%3&params=%4 [NC,N]
This matches, the Condition is met, and the 1st parameter in the URL rewritten: &test=1
But then it goes through the same process of adding prefix and the same damn postfix, so I'm back to square one, stuck in a loop!

Question: why is it doing this, and how do I stop it? I keep thinking I'm missing something perhaps not-so-obvious here. If it worked for you, maybe something is different. If I cannot have a reliable rule that works on older versions, then it is of no use to me. A 3rd-party IIS derivative filter would actually have one up on Apache, which is an annoying thought.

[edited by: jdMorgan at 10:38 pm (utc) on July 5, 2005]
[edit reason] Disabled graphic smilies. [/edit]

msandersen

5:17 pm on Jun 27, 2005 (gmt 0)

10+ Year Member



I might add the 1st rewrite condition matches with this horror query, which is doubled up with the 1st query string:
RewriteCond: input='module=rewrite.php&func=view&rwparams=-test:1-test2:23-val:one-val2:two-album:jessica&module=Example&func=view&test=1&params=-test2:23-val:one-val2:two-album:jessica&lang=eng'
pattern='^(.+)&rwparams=-([^:&]+):([^-]*)(.*)?$' => matched

[edited by: jdMorgan at 10:38 pm (utc) on July 5, 2005]
[edit reason] Disabled graphic smilies. [/edit]

msandersen

5:25 pm on Jun 27, 2005 (gmt 0)

10+ Year Member



Hmm, that's probably because I left the QSA flag on the last rule... The URL is still screwed up, though.

msandersen

6:34 pm on Jun 27, 2005 (gmt 0)

10+ Year Member



All right! Got it working!

The final rules are:

RewriteCond %{QUERY_STRING} ^(.+)&rwparams=-([^:&]+):([^-]*)(.*)$
RewriteRule ^rewrite\.php rewrite.php?%1&%2=%3&rwparams=%4 [NC,N]

# Filter out rwparam
RewriteCond %{QUERY_STRING} ^(.+)&rwparams=$
RewriteRule ^rewrite\.php rewrite.php?%1 [L]

RewriteRule ^([^/]+)/([^-.]+)([^.]*)\.p?html?$ rewrite.php?module=$1&func=$2&rwparams=$3 [NC,N,QSA]

I decided to rename the params= to rwparams= for added uniqueness, but forgot to convert one instance in the first rule's substitution, so the condition promptly failed on the next pass.

The other important factor is that the RewriteRule pattern for the first 2 rules is ^rewrite\.php without the $ on the end, so that the stupid postfix won't interfere. Using ^rewrite\.php/ would also work, but with the condition looking for rwparams= in the query string, it is not necessary.
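To see what the final rules do pass by pass, here is a small Python simulation. It is a sketch under stated assumptions, not a model of Apache internals: the three regexes are copied from the final rules, while the function name `expand`, the `MAX_PASSES` guard, and the choice to ignore QSA and the path-info postfix are simplifications introduced for illustration.

```python
import re

# Patterns copied from the final rules above.
STEP = re.compile(r"^(.+)&rwparams=-([^:&]+):([^-]*)(.*)$")  # peel off one -p:v pair
DONE = re.compile(r"^(.+)&rwparams=$")                       # nothing left to convert
ENTRY = re.compile(r"^([^/]+)/([^-.]+)([^.]*)\.p?html?$", re.IGNORECASE)

MAX_PASSES = 20  # arbitrary safety limit for this sketch only


def expand(short_url):
    """Convert Module/Function-p1:v1-p2:v2.html into a query string by
    repeatedly applying STEP, the way the [N] rule is re-run."""
    entry = ENTRY.match(short_url)
    if entry is None:
        return None
    query = "module={}&func={}&rwparams={}".format(*entry.groups())
    for _ in range(MAX_PASSES):
        step = STEP.match(query)
        if step is None:
            break
        # Same substitution as the first rule: %1&%2=%3&rwparams=%4
        query = "{}&{}={}&rwparams={}".format(*step.groups())
    done = DONE.match(query)
    return done.group(1) if done else query


result = expand("Example/view-test:1-test2:23-val:one-val2:two-album:jessica.html")
no_params = expand("Example/view.html")
```

Each pass moves exactly one pair out of rwparams=, so a URL with N pairs needs N passes plus the final clean-up. That gives a rough feel for the cost compared with a single hardcoded rule.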

I notice that, as jdmorgan has mentioned in the forums before, once the URL is rewritten, processing restarts once more, but thankfully this time it passes the recursive rules by and goes on to any further rules.

Only question now is: How efficient is it? Is having a bunch of specific rules with up to 3 parameter pairs, as mentioned in the first post, better up front, and leaving this for the last resort?

[edited by: jdMorgan at 10:36 pm (utc) on July 5, 2005]
[edit reason] Disabled graphic smilies. [/edit]

gergoe

10:36 pm on Jun 27, 2005 (gmt 0)

10+ Year Member



I'm not really sure about the performance impact of this, but it seems obvious that the "hardcoded" version is better for performance. Don't take my word for it, though: I never did any profiling on this, it is just my impression. Regardless of which way you choose, you can still switch to the other one later, or fine-tune the rules to make them faster.

Good luck.