Forum Moderators: phranque

Message Too Old, No Replies

Why is this an infinite loop?

mod_rewrite

         

Poor_Knight

6:51 am on Feb 4, 2009 (gmt 0)

10+ Year Member



I'm struggling with some url rewriting. One of my problems is this infinite loop. Can anyone tell me where I'm failing?

I was having success with the 2nd rule but getting 404 if trailing slash was excluded. So I added the 1st rule to added the missing trailing slash. Now I get a loop but I don;t know why.

RewriteRule ^campaign/(.*)/(.*)$ /campaign/$1/$2/ [R]
RewriteRule ^campaign/(.*)/(.*)/$ /?action=campaign-view&campaign=$1&section=$2

Thx in advance :)

g1smd

11:37 am on Feb 4, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The (.*) pattern is very very inefficient. It grabs "everything" then backs off one character at a time until a match is found. With two such patterns in each rule, the server might have to try thousands of permutations before a match is found.

Replace

(.*)
with
([^/]+)
or similar. The new code says "match everything until the next slash".

You need to have

[L]
on the end of both rules. That issue is causing many problems.

Your first rule is also very problematical. With just

[R]
you are serving a 302 redirect. You really need to make the first line into 301 a redirect, by adding the domain name and
[R=301,L]
to it.

If you fail to do that, you'll have two URLs delivering the same content (302 URL is indexed as well as the target). You should have a 301 redirect so that if the right URL is requested, the content is delivered and if the wrong URL is requested, the user is first redirected to make a new request for the right URL. That redirect should also 'fix' the domain name so that if non-www is requested, the www is also forced at the same time.

jdMorgan

2:10 pm on Feb 4, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The reason that the first rule loops is that the second (.*) subpattern matches a requested URL-path with or without a trailing slash, so that even a previously-redirected request with a trailing slash added will match it. In other words, that rule detects a request for /campaign/a/b and redirects it to /campaign/a/b/. The client then requests /campaign/a/b/, and that URL-path also matches the pattern in the first rule, and so is redirected to /campaign/a/b//. This continues, adding a slash each time, until the client or the server reaches its maximum redirection limit, and quits.

The sub-pattern supplied by g1smd above will fix this problem. However, there is another problem, in that you are issuing a 302-Moved Temporarily redirect to add the slash. Therefore, search engines will keep the non-slash URLs if they have already indexed them, and continue to send you traffic using those URLs. This means a large number of redirects that your server will *have to* issue. This isn't good for your users, your server, or for the usability of your server stats reports -- You will log both requests (the initial one and the one resulting from the redirect), and so see a doubling of the 'hit count.' Depending on how many requests you get that must be redirected, this might make your traffic-tracking invalid and useless.

By specifying a 301-Moved Permanently redirect, you tell the search engines to replace the /campaign/a/b URL in their index with /campaign/a/b/, and to give all of the ranking credit currently assigned to /campaign/a/b to /campaign/a/b/ instead.

I suggest:


# Externally redirect to add a trailing slash if one is not present
RewriteRule ^campaign/([^/]+)/([^/]+)$ /campaign/$1/$2/ [R=301,L]
#
# Internally rewrite to script, copying URL-path to query string
RewriteRule ^campaign/([^/]+)/([^/]+)/$ /?action=campaign-view&campaign=$1&section=$2 [L]

Jim

Poor_Knight

2:55 pm on Feb 4, 2009 (gmt 0)

10+ Year Member



As always, I appreciate the assistance. Your suggestions worked like a champ. I suspected the greed of .* but regex is still quite foreign to me. This recent task has been enlightening.

Now if I may be so bold, I've also struggled with getting the original URL (with the query string) to rewrite/redirect to the new format.

/?action=campaign-view&campaign=X&section=Y to /campaign/X/Y/ (which then uses the rule in the first post to display the correct page).

One attempt (applying some of your new suggestions) was:
RewriteRule ^index.php?action=campaign-view&campaign=([^&]+)&section=([^/]+)$ /campaign/$1/$2/ [R=301,L]

This does not appear to do anything.

jdMorgan

3:54 pm on Feb 4, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



RewriteRule cannot see query string data attached to URL-paths. You have to use a RewriteCond to examine the query string (see mod_rewrite RewriteCond documentation).

Furthermore, if you try to redirect a URL from "b" to "a" and also rewrite it from "a" to "b", you end up with another infinite loop unless you take steps to make that redirect conditional upon the client request. In other words, you want to externally redirect the URL only if that URL was requested by the client, and is not the result of your previously-invoked internal rewrie.

The problem may be clearer if you realize three facts: That an external redirect terminates the current HTTP transaction and relies on the client to start a new one, that HTTP is a 'stateless' protocol, meaning that the server has no 'memory' of any previously-completed HTTP transaction, and that mod_rewrite in .htaccess is recursive: If any rule is invoked, mod_rewrite processing is re-started from the top of the file to see if any more rules apply to the newly-rewritten path (this is necessary, for example, to enforce access controls).

So, the solution in non-obvious, but relies on examining the initial client request line, which is unaffected by any previous internal rewrites:


RewriteCond %{THE_REQUEST} ^[A-Z]+\ /(index\.php)?\?action=campaign-view&campaign=([^&]+)&section=([^&\ ]+)\ HTTP/
RewriteRule ^(index\.php)?$ http://www.example.com/campaign/%2/%3/? [R=301,L]

In both lines, I have made "index.php" optional, so that requests for both
www.example.com/index.php?action=campaign-view&campaign=x&section=y
and
www.example.com/?action=campaign-view&campaign=x&section=y
will be redirected to
www.example.com/campaign/x/y/

Note also the change from "$1" and "$2" to "%2" and "%3", so that the proper variables in the RewriteCond are back-referenced in the substitution URL.

To clarify the RewriteCond pattern, the value of "THE_REQUEST" is the entire client request line, as shown in quotes in your raw server access log. For the "/x/y/" example URL I referred to above, it would look like this:

GET /index.php?action=campaign-view&campaign=x&section=y HTTP/1.1

I should note that the query string pattern in the RewriteCond is specific: only the name/value pairs shown will be rewritten. If some of them may be missing, if they might be in a different order, or if additional name/value pairs might be present, then the code will need to be modified so that all cases are properly-handled.

Jim

[Edit] Corrected as noted below. [/edit]

[edited by: jdMorgan at 5:01 pm (utc) on Feb. 4, 2009]

Poor_Knight

4:06 pm on Feb 4, 2009 (gmt 0)

10+ Year Member



Ah, I was experimenting with rewritecond QueryString last night but to no avail. During my readings I failed to comprehend that rewrite cannot see the query string though.

Again, I appreciate the time taken to answer my questions. I'll go apply this new-found understanding. Heeding your last advise, there are other options to be added and situations to account for but I feel more confident now and hopefully can resolve those conditions now.

Thanks again

-PK

Poor_Knight

4:47 pm on Feb 4, 2009 (gmt 0)

10+ Year Member



Ah, success to a degree. It's appending the query string onto the end of the newly formatted URL.

So a request of:
www.example.com/?action=campaign-view&campaign=x&section=y
Results as:
www.example.com/x/y/?action=campaign-view&campaign=x&section=y

The correct page is displayed, so it seems as though the internal redirect worked.

jdMorgan

5:03 pm on Feb 4, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yeah, I *always* forget that... :(

Please see the correction to the code in my post above (Append a "?" to the substitution URL to clear the query string).

Jim