
Apache Web Server Forum

    
Circular mod_rewrite Redirect Problem
Dolemite




msg:1504189
 9:18 pm on May 8, 2003 (gmt 0)

I have a dynamic site that I'm making a bit more SE-friendly by removing the querystrings with mod_rewrite.

I've got that part working, but the problem lies in getting the pages that are already indexed by Google switched to the new format using 301 redirects.

It seems that the RewriteRules could be rewriting each other, since the old URLs need to be 301 redirected to the new URLs, but the new URLs are being rewritten to the old URLs.

It's like this:

I need index.php?p=* to permanently redirect to post*, while post* URLs are rewritten to index.php?p=*.

I've tried various permutations of these rules (different order, with/without [L], etc.):

RewriteRule ^(.*)index\.php\?p\=(.*)$ $1post$2 [R=301]
RewriteRule ^(.*)post(.*)$ $1index.php?p=$2 [L]

The post* to index.php?p=*. rewriting works, but I can't get the other rule to send a 301 (I've checked the headers). It seems like Apache sees these two rules as contradictory and throws one out, though they should both be valid and [L] should prevent any looping.

 

jdMorgan




msg:1504190
 12:42 am on May 9, 2003 (gmt 0)

Dolemite,

I hope I understand what you're trying to do...

If so, the first problem is that the query string is not available to be tested in a RewriteRule, so add a RewriteCond to test it and put the parameters into backreference %1. Then use the RewriteRule to look only for the index.php page URL, and build the destination URL using both $1 and %1, thus:

RewriteCond %{QUERY_STRING} ^p=(.*)$
RewriteRule ^(.*)index\.php$ http://www.yourdomain.com/$1post%1 [R=301,L]

Note that I'm also assuming that you want to redirect the request at this point to give search engines the new "public" URL, and so I've added the [L]. The next rule will then be applied after the client takes the 301 redirect and comes back with a new request for a *post*-formatted URL. At this point the client will be served *index.php?p=something transparently - with no external redirect - by your original Rule:

RewriteRule ^(.*)post(.*)$ /$1index.php?p=$2 [L]
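To make the two backreferences in the 301 rule concrete, here is the same rule annotated for one sample request. The request /index.php?p=10 and the assumption that the rules sit in the document-root .htaccess are illustrative only and are not taken from the thread.

```apache
# Sample request (assumed): /index.php?p=10
#   In .htaccess the RewriteRule pattern sees "index.php"  -> $1 is empty
#   %{QUERY_STRING} is "p=10"                              -> %1 is "10"
RewriteCond %{QUERY_STRING} ^p=(.*)$
RewriteRule ^(.*)index\.php$ http://www.yourdomain.com/$1post%1 [R=301,L]
# The substitution has no "?" of its own, so the original query string
# is carried over and the Location header becomes
#   http://www.yourdomain.com/post10?p=10
```

The carried-over query string is worth keeping in mind when reading the follow-up posts.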

HTH,
Jim

<corrected> Added "http://domain_name/" to 301 Rule </corrected>

Dolemite




msg:1504191
 5:05 am on May 9, 2003 (gmt 0)

Thanks, Jim.

From your code I'm getting looping requests for [mydomain.com...]

Adding "?" to the end of the rewriterule prevents the querystring from being passed:

RewriteCond %{QUERY_STRING} ^p=(.*)$
RewriteRule ^(.*)index\.php$ [yourdomain.com...] [R=301,L]

But things still seem to loop when /post10 is requested. Any ideas?

I've found this page [fluidthoughts.com] which uses the following example for something similar:

RewriteCond %{QUERY_STRING} id=([^&;]*)
RewriteRule ^/$ [%{SERVER_NAME}...] [R]
RewriteRule ^/([^\/]*)/?$ /index.php?id=$1 [L]

That redirects a querystring to a directory and then rewrites it back to the querystring. I haven't been able to adapt it for my use, but if it works, it might be a good reference.

BTW, I'd really like to get this right before I try any more possibilities. I get the feeling that if I set off another infinite loop, my web host is going to wish a slow and painful death on me. ;)

Dolemite




msg:1504192
 7:55 am on May 9, 2003 (gmt 0)

I'm not exactly sure how Apache applies .htaccess rules. It seems like either the last flag [L] doesn't work as I imagine, or .htaccess rules are applied to both the original URI and the rewritten URI.

Taking that into consideration, I can't imagine how to prevent looping given what I need to do. I'm thinking something like this could work:

1 RewriteCond %{REQUEST_URI} !index\.php
2 RewriteRule ^(.*)post(.*)$ $1index.php?p=$2 [L]
3
4 RewriteCond %{REQUEST_URI} !post[0-9]
5 RewriteCond %{QUERY_STRING} ^p=(.*)$
6 RewriteRule ^(.*)index\.php$ http://www.yourdomain.com/$1post%1? [R=301,L]

But then again, what happens when you click on /post* and it's rewritten to index.php?p=* on line 2? Wouldn't the .htaccess rules then be applied to the rewritten URL, which would match the RewriteConds on lines 4 & 5? Then you're back in a loop, unless REQUEST_URI has somehow stayed constant through this process.
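
The suspicion above matches how mod_rewrite is documented to behave in per-directory (.htaccess) context: after an internal rewrite the new URL is run through the ruleset again from the top, so [L] only ends the current pass. One common way to break the loop is to test %{THE_REQUEST}, which holds the client's original request line and is not changed by internal rewrites. The sketch below is an assumption, not a fix confirmed in this thread; it presumes the rules live in the document-root .htaccess.

```apache
# Redirect only when the client itself asked for index.php?p=...
# THE_REQUEST looks like: GET /index.php?p=10 HTTP/1.1
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /[^?\ ]*index\.php\?p=([^&\ ]+)
RewriteRule ^(.*)index\.php$ http://www.yourdomain.com/$1post%1? [R=301,L]

# Internal rewrite. On the second pass THE_REQUEST still reads
# "GET /post10 ...", so the condition above fails and no 301 is sent.
RewriteRule ^(.*)post(.*)$ /$1index.php?p=$2 [L]
```

With this condition in place, the rewritten index.php?p=* URL can pass through the ruleset a second time without matching either rule.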

jdMorgan




msg:1504193
 2:11 pm on May 9, 2003 (gmt 0)

Dolemite,

I think it's simply a matter of the order of your rules. The external 301 redirect must be first, and it must have an [L] flag. The second rule must also have an [L] flag, but must not be an external (R=301 or R=302) redirect.

The code should not loop if the [R=301,L] Rule is processed first, and the internal redirect is placed after that.

If the [R=301,L] rule is processed first, then the client browser is redirected to the new URL, and processing stops for that request.

Then the client returns, this time with the URL pattern matching the second rule, which is an internal rewrite only. So in this case, after the internal rewrite is done no further rewriting takes place, either.

You must not have any other RewriteRules which invoke a 301 redirect for URLs matching the output of the second rule. This includes additional .htaccess files in subdirectories, script outputs, etc. If you do, then you will indeed get an infinite loop.

You could set this test up with some dummy files and URLs so that testing doesn't affect your live pages.

I'm not intimately familiar with your site, so please cite what URL is input and what that URL is rewritten to in each case - I can't tell whether you are telling me the input URL which failed, or what the RewriteRule output was when it failed - and in this case in particular, that can be very confusing!

Try the rules in this exact order. Also, remove (comment out) any other RewriteRules you may have which might affect requests for "index.php". That should sort out the looping, and then let's see about other problems.

RewriteCond %{QUERY_STRING} ^p=(.*)$
RewriteRule ^(.*)index\.php$ http://www.yourdomain.com/$1post%1 [R=301,L]
RewriteRule ^(.*)post(.*)$ /$1index.php?p=$2 [L]

Jim

Dolemite




msg:1504194
 8:48 pm on May 9, 2003 (gmt 0)

Using that code, the 301 works, but the internal rewrite doesn't. Checking my logs, I can see that the internal rewrite (requests for /post*) is generating a 301 also, so it must be matching the redirect RewriteCond.

So
[mydomain.com...]
301 redirects correctly & as intended to
[mydomain.com...]

but
[mydomain.com...]
also 301 redirects to
[mydomain.com...]

If I knew more about how .htaccess/apache worked, I think I could figure this out for myself, but there just isn't much documentation on how this all fits together. I.E., after a redirect, is the new URL completely reprocessed through .htaccess? Or does an internal rewrite change the REQUEST_URI variable?

It makes sense to me that the redirect needs to occur before the internal rewrite, but changing the order of the corresponding lines doesn't seem to affect their function.

There are a fairly limited number of the querystring URLs in the google index, so I could do more hard-coded redirects. Logically, they shouldn't be any different, though.

Another option is that I could have basically another index.php file, completely identical, but just with a different filename that I'd use for the internal rewrite (/post* to /index2.php?p=*). I'm sure this would work, so maybe I'll just do that. At this point, it's more like something I need to conquer than a search for the most practical solution, but it may not be worth the time to figure this mess out.

Dolemite




msg:1504195
 9:12 pm on May 9, 2003 (gmt 0)

Yep, the 2nd index.php file thing works like a champ. I don't like it, but it works.

I guess I can deal with it.
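
For reference, the renamed-file workaround might look like the sketch below. It assumes index2.php is an exact copy of index.php in the same directory and that the rules live in the document-root .htaccess. The exact rules actually used were not posted, so treat this as an illustration only.

```apache
# External 301: old query-string URLs -> new "post" URLs.
# The trailing "?" drops the original query string from the target.
RewriteCond %{QUERY_STRING} ^p=(.*)$
RewriteRule ^(.*)index\.php$ http://www.yourdomain.com/$1post%1? [R=301,L]

# Internal rewrite: serve "post" URLs from the copy, index2.php.
# "index2.php" does not match ^(.*)index\.php$, so a second pass
# through the rules cannot trigger the 301 again.
RewriteRule ^(.*)post(.*)$ /$1index2.php?p=$2 [L]
```

The design trade-off is the one already noted: two identical scripts to maintain in exchange for rules that are mutually exclusive by filename.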

jdMorgan




msg:1504196
 9:17 pm on May 9, 2003 (gmt 0)

If I knew more about how .htaccess/apache worked, I think I could figure this out for myself, but there just isn't much documentation on how this all fits together. I.E., after a redirect, is the new URL completely reprocessed through .htaccess? Or does an internal rewrite change the REQUEST_URI variable?

After a 301 or 302 redirect which includes the [L] flag, rewriting is terminated, and the 30x response is sent back to the client (browser).

If there is no [L] flag, then subsequent RewriteRules will be processed if their RewriteConds are met.

In the case of a server-internal rewrite, only the REQUEST_URI is changed - the client is NOT notified, and again rewriting will continue in the absence of an [L] flag.

I can't for the life of me figure out how a request for [yourdomain.com...] is matching the first rule, unless the whole requested URL is [yourdomain.com...]
If that's the case, you'll need to add another RewriteCond to block the loop:

RewriteCond %{REQUEST_URI} !post/index\.php
RewriteCond %{QUERY_STRING} ^p=(.*)$
RewriteRule ^(.*)index\.php$ http://www.yourdomain.com/$1post%1 [R=301,L]
RewriteRule ^(.*)post(.*)$ /$1index.php?p=$2 [L]

If that isn't applicable, then it is possible that something outside of the scope of this single .htaccess file is causing the recursion. In that case, you will have to rename or move files, scripts, or directories to allow the rewrites to become completely mutually-exclusive. IF the recursion is occurring in the context of a single request - that is, if it is happening in the same process, then you may be able to use mod_rewrite's capability to set and test environment variables to work around the problem. However, those environment variables will not persist from one client request to the next, so it's a big "if."
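
A minimal sketch of the environment-variable idea, assuming Apache 2.x, where REDIRECT_STATUS is empty on the client's original request and is set once the request has been internally redirected. That behavior is an assumption about the server version and was not verified in this thread.

```apache
# Redirect only on the client's original request, i.e. while
# REDIRECT_STATUS is still empty (no internal rewrite has happened yet).
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{QUERY_STRING} ^p=(.*)$
RewriteRule ^(.*)index\.php$ http://www.yourdomain.com/$1post%1? [R=301,L]

# Internal rewrite; the pass that follows sees REDIRECT_STATUS set,
# so the 301 rule above is skipped.
RewriteRule ^(.*)post(.*)$ /$1index.php?p=$2 [L]
```

As noted above, this only helps when the recursion happens within a single request; nothing here persists from one client request to the next.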

Then there is another thing... mod_rewrite works only between the receipt of a request and the serving of a resource. It cannot be used to rewrite URLs which are output from a script, unless that script is returning those URLs to the client with a 301/2 redirect header - in which case mod_rewrite will see those as the incoming URLs of new requests. In that case, the script and mod_rewrite code is going to loop unless you make changes to create mutual exclusion.

Again, please be as specific as possible about the requested URLs and their querystrings - I suspect the devil is in the details there.

I hope this makes sense!
Jim

Dolemite




msg:1504197
 10:55 pm on May 9, 2003 (gmt 0)

Thanks for all your help, Jim.

I just can't figure it out either! It's really just as simple as I'm describing it...no fancy URL-writing scripts, complicated query strings, etc. I've stared at the damn thing for hours myself and can't make sense of it, so I think for now I'll stick with the renamed file solution. It's not terribly elegant, but it's completely transparent to users, so I can't complain too much.

Again, thanks for all your help and if I do ever figure it out, I'll be sure to let you know how. ;)

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved