rewrite redirect loop

how to stop?

         

Doood

4:43 pm on Apr 16, 2008 (gmt 0)

10+ Year Member



I'm using GeoIP to redirect a few countries to a different page, but I think those redirected are being stuck in a loop because when they land on the page they're redirected back to it again and again. How do you stop that?

Is this correct to use in htaccess?


RewriteEngine on
RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^(EG|PS|CN|IN|TR)$
RewriteRule ^(.*)$ /block/block.php [R,L]

One problem I had was using "$1" after the RewriteRule url like the instructions said to do. It created an "Error 500 - Internal server error" and wouldn't work until it was removed.

I turned on the RewriteLog and in 5 seconds it was 75MB! This is a high traffic site though.

The Log showed they were redirected to /block/block.php and then redirected to /block/block.php again.
So what am I doing wrong here?

Oh, it's apache 2.2.3

edit:
if the "$1" is added to the rewrite rule after the url and you visit domain.com/geoip/index.html then it will redirect you to domain.com/block/block.phpindex.html

[edited by: Doood at 4:55 pm (utc) on April 16, 2008]

jdMorgan

5:53 pm on Apr 16, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Simply exclude block/block.php from being redirected to itself:

RewriteEngine on
#
RewriteRule %{REQUEST_URI} !^/block/block\.php$
RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^(EG|PS|CN|IN|TR)$
RewriteRule .* /block/block.php [R,L]

I used a RewriteCond to do the exclusion in a generic way, in case you want to change how the rule works.

If the separators in the country list above do not appear as solid pipe characters ("|"), replace them with solid pipe characters before use; posting on this forum modifies the pipe characters.

Jim

Doood

6:24 pm on Apr 16, 2008 (gmt 0)

10+ Year Member



I set it like you mentioned and it kind of worked.

It goes like this,
request page
checks if it is block/block.php
checks country for match
redirects to block/block.php
checks if it is block/block.php again
check country again
then ignores rewrite because of "initial URL equal rewritten URL"

With loglevel set to 4 it shows 17 steps to process this.

jdMorgan

6:33 pm on Apr 16, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sorry, I'm not sure which actions are taken by your script and which by the rule. I'm also not sure what your script does, or what your (possibly-various) URLs are.

In general, such a rule should only be executed on a single entry page URL, or only if a cookie is not present indicating that they've set their preferred language, or whatever it is that you're trying to accomplish here.

Here's the code with comments. Only you can decide if it's appropriate to your goal or not:


# Turn on the rewriting engine
RewriteEngine on
#
# If the requested URL-path is NOT /block/block.php
RewriteRule %{REQUEST_URI} !^/block/block\.php$
# and if the country code is one of those listed
RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^(EG|PS|CN|IN|TR)$
# redirect ALL requested URLs to block/block.php
RewriteRule .* /block/block.php [R,L]

So this will redirect all requested URLs except for /block/block.php to /block/block.php if the country code matches -- including requests for images, external JavaScripts, style sheets, error pages, robots.txt, etc.

Jim

Doood

6:49 pm on Apr 16, 2008 (gmt 0)

10+ Year Member



I'm just redirecting all requests from these countries based on their IP to a blank page so they'll stop clicking on my links which the referrers are paid for. It's an ad program.

I found that removing the first slash and the "R" gives the same end result in fewer steps. Once they're sent to block/block.php it still tries to send them there again based on their country, but the server stops after sending them there twice. This only works if the htaccess is in the site root, though.

Seems like this would be a very common problem being in a loop when redirecting, but maybe not.

[edited by: Doood at 6:50 pm (utc) on April 16, 2008]

g1smd

12:13 am on Apr 17, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The [R] serves a 302 redirect, and that forces the browser to request the new URL from the server.

The target URL of the redirect is "domain unsafe" as it does not fix any domain canonicalisation issues in the process. The 302 redirect also allows search engines to index an infinite number of URLs for the "blocked" page error message. That is also very bad.

Changing to a 301 redirect [R=301] and specifying the domain name in the redirect would mostly fix those issues.

However, using any sort of redirect causes the browser to request the new URL, and the user therefore sees that new URL on screen.

Removing the [R] and NOT specifying the domain name changes the redirect into a rewrite. With a rewrite, the user does not see a new URL in their browser.

In this situation you MUST make sure that the "blocked" error-message page has a meta robots noindex tag in it. That ensures the "blocked" error message can NEVER be indexed for any of the real page URLs of the site. Each URL of the site can return two completely different pages of content. Only one of those versions should be indexed. The error message must never be indexed.
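
As a rough sketch of those three options, using the block page path and country list from earlier in the thread: www.example.com is a placeholder domain, the two condition lines are an assumption about how the rule would be guarded against looping, and only one of the three RewriteRule lines should be active at a time.

```apache
RewriteEngine on
# Never send the block page to itself
RewriteCond %{REQUEST_URI} !^/block/block\.php$
# Apply only to the listed country codes
RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^(EG|PS|CN|IN|TR)$

# Option 1: [R] alone sends a 302; the browser requests and displays the new URL
# RewriteRule .* /block/block.php [R,L]

# Option 2: 301 redirect with the domain named (www.example.com is a placeholder)
# RewriteRule .* http://www.example.com/block/block.php [R=301,L]

# Option 3: no [R] and no domain, so this is an internal rewrite;
# the browser keeps showing the URL it originally requested
RewriteRule .* /block/block.php [L]
```

With option 3 the block page is served under every real URL of the site, which is why the noindex tag on that page matters.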

Doood

4:35 pm on Apr 18, 2008 (gmt 0)

10+ Year Member



I'm still having troubles with this.

To make it easier, what if the country rules and redirection are only applied if a certain page is requested? This way it would avoid any chance of looping. Plus I'm really only needing this for one page.

Say we want the rules to apply only if they're visiting this page
mydomain.com/links.php
and redirect them if the country matches to
mydomain/block/block.php

So all other pages on the site are accessible from all countries and the page /links.php redirects to block/block.php only if they're from a blocked country.

The actual url would be more like
mydomain.com/links.php?id=123&banner=one
but I don't guess that matters.

So how would this be done?

jdMorgan

4:42 pm on Apr 18, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That description doesn't sound like it's "codeable" to me.

If I were you, I'd simply return a 403-Forbidden response and be done with this.


# If the requested URL-path is NOT 403 error page
RewriteRule %{REQUEST_URI} !^/path-to-403-page$
# and if the country code is one of those listed
RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^(EG|PS|CN|IN|TR)$
# Deny the request
RewriteRule .* - [F]

Jim

Doood

5:39 pm on Apr 18, 2008 (gmt 0)

10+ Year Member



The problem with 403 errors is that I'll probably have a minimum of 20k redirects per day and the error log would be kinda big and legit errors would be hard to find.

This rewrite stuff is confusing to me.

It just seems like if all calls to a specific page can be redirected then extra rules could be able to be added in a chain.

Like,

# if the page requested is /block/block.php then stop, if not proceed
RewriteRule %{REQUEST_URI} ^/block/block\.php$ [L]
# if the page is /links.php then proceed to next condition in chain
RewriteCond ^/link.php$ [C]
# check for country match, if yes proceed to RewriteRule. If no match then pass thru to page /links.php
RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^(EG|PS|CN|IN|TR)$
RewriteRule ^(.*)$ block/block.php [L]

Maybe it's just wishful thinking that this would work. Or maybe I'm not explaining it correctly.

Doood

9:59 pm on Apr 18, 2008 (gmt 0)

10+ Year Member



It's the simple things that get ya.

I stopped all further processing of the rules by placing an htaccess in the /block/ directory with,


RewriteEngine off

So now if they're blocked and sent to this directory all the rule processing that sent them here stops.

If this is wrong or bad practice let me know please.

g1smd

10:50 pm on Apr 18, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



As long as you avoid a redirection chain - multiple redirect actions for any one initial request - you should be OK with that (as far as I can see).

jdMorgan

2:23 am on Apr 19, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Since the code I posted above explicitly prevents looping, I'm at a loss to explain why you had to go to these measures to stop recursion. The only reason that code might loop would be a typo in transcription to your "real" URL-paths.

Jim

Doood

2:47 pm on Apr 19, 2008 (gmt 0)

10+ Year Member



JD, the one you posted produced a 500 error so after removing the first slashes and the 'R' it redirects but it still checks their country again and tries to redirect them again.

This is what the RewriteLog shows happens (country changed to US for testing)
request index.php
applying pattern '%{REQUEST_URI}' to uri 'index.php'
applying pattern '^(.*)$' to uri 'index.php'
RewriteCond: input='US' pattern='^US$' => matched
rewrite 'index.php' -> 'block/block.php'
internal redirect with /block/block.php [INTERNAL REDIRECT]
applying pattern '%{REQUEST_URI}' to uri 'block/block.php'
applying pattern '^(.*)$' to uri 'block/block.php'
RewriteCond: input='US' pattern='^US$' => matched
rewrite 'block/block.php' -> 'block/block.php'
initial URL equal rewritten URL: /home/mysite/public_html/block/block.php [IGNORING REWRITE]

And this is what it shows when turning off the RewriteEngine in the /block/ directory,
request index.php
applying pattern '^(.*)$' to uri 'index.php'
RewriteCond: input='US' pattern='^US$' => matched
rewrite 'index.php' -> 'block/block.php'
internal redirect with /block/block.php [INTERNAL REDIRECT]

Using LogLevel 4 there are 17 total processes in the first one and with the engine off in the sub-directory there are only 7.

I probably messed up your code trying to get it to work.

jdMorgan

3:03 pm on Apr 19, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The code you posted above is invalid. %{REQUEST_URI} cannot be used as a rewriterule pattern. Please see the mod_rewrite documentation.

Unless there is a problem with your server configuration, this code should work fine, with no looping:


# Turn on the rewriting engine
RewriteEngine on
#
# If the requested URL-path is NOT /block/block.php
RewriteRule %{REQUEST_URI} !^/block/block\.php$
# and if the country code is one of those listed
RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^(EG|PS|CN|IN|TR)$
# redirect ALL requested URLs to block/block.php
RewriteRule .* /block/block.php [L]

The only changes or additions I'd make would be to exclude robots.txt and included-object (e.g. image) requests from this rule, *after* the basic function has been tested successfully.

Completely flush your browser cache after each test case.

If the separators in the country list above do not appear as solid pipe characters ("|"), replace them with solid pipe characters before use; posting on this forum modifies the pipe characters.

Jim

Doood

3:46 pm on Apr 19, 2008 (gmt 0)

10+ Year Member



Maybe you made a typo cause your code has %{REQUEST_URI} as a RewriteRule?

I'll do some more testing and see what gives.

jdMorgan

3:57 pm on Apr 19, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes, sorry. That should be:

RewriteCond %{REQUEST_URI} !^/block/block\.php$
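
For reference, the complete rule set with that one line corrected would look roughly like this. It assumes the .htaccess file sits in the site root and that the GeoIP module sets GEOIP_COUNTRY_CODE, as in the earlier posts; solid pipe characters are used in the country list.

```apache
# Turn on the rewriting engine
RewriteEngine on
#
# If the requested URL-path is NOT /block/block.php
RewriteCond %{REQUEST_URI} !^/block/block\.php$
# and if the country code is one of those listed
RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^(EG|PS|CN|IN|TR)$
# internally rewrite ALL other requested URLs to /block/block.php
RewriteRule .* /block/block.php [L]
```

The first condition is what stops the second pass from matching: after the internal rewrite, the request is for /block/block.php, so the rule no longer applies.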

Jim

Doood

4:43 pm on Apr 19, 2008 (gmt 0)

10+ Year Member



When changed to RewriteCond then your code works perfectly.

It opened up a new can of worms since it's also applying the patterns to favicon.ico and redirecting favicon.ico to /block/block.php
I'm sure that can easily be fixed though.

One other thing, in the rewrite log is it normal to see this,


[perdir /home/mysite/public_html/] strip per-dir prefix: /home/mysite/public_html/index.php -> index.php


[perdir /home/mysite/public_html/] add per-dir prefix: block/block.php -> /home/mysite/public_html/block/block.php


[perdir /home/mysite/public_html/] strip document_root prefix: /home/mysite/public_html/block/block.php -> /block/block.php

Doood

8:31 pm on Apr 25, 2008 (gmt 0)

10+ Year Member



Is it ok for the rewrite conditions to check the visitor for every single page and image they view? Wasn't sure if that's how it's supposed to work.

I just noticed that while browsing around my site the rewrite log showed it checked the country of my own IP about 100 times. It applied the pattern to every single thing I viewed.

I guess though if a condition was set to not apply patterns to jpg's then it would check every file to see if it was a jpg or not though right?

jdMorgan

8:50 pm on Apr 25, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The pattern in the RewriteRule is evaluated first. Make it as specific as possible.

Then if that RewriteRule pattern matches, the RewriteConds are evaluated in order. Put the simplest/most efficient RewriteConds first, saving the more CPU-intensive ones --such as your GeoIP call, file-exists checks, and reverse-DNS lookups-- for last.

Think carefully about what kind of URLs you need to block and are willing to run the complete check on: URLs that resolve to a certain directory level? URLs with a "filetype" on them, or URLs with a specific filetype on them? Or maybe you want to check them if they don't have a filetype or a certain filetype? URLs with certain words in them?

There are a million ways to classify URLs, but your goal is to pick a method that clearly puts them into three groups: Those to be blocked, those to be fully-checked to see if they need to be blocked, and those that you wish to by-pass the full check on. Use the RewriteRule and RewriteCond patterns to construct that "URL classifier system."

Just another way of looking at it, but this is actually the difficult part of using mod_rewrite once you fully-understand mod_rewrite coding and operation -- The major work becomes devising the most efficient method of determining which URLs are and are not to be rewritten. Ah, some of the almost-random-URL e-commerce sites I've seen! ;)
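
As one possible illustration of a specific pattern, here is a minimal sketch for the links.php case raised earlier in the thread. It assumes the .htaccess file is in the site root (where the leading slash is stripped before the pattern is tested, as the "strip per-dir prefix" log lines show) and that only /links.php needs the country check.

```apache
RewriteEngine on
# The RewriteRule pattern is tested first, so only requests for links.php
# ever reach the GeoIP condition below
RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^(EG|PS|CN|IN|TR)$
RewriteRule ^links\.php$ /block/block.php [L]
```

Because the pattern no longer matches block/block.php, no separate loop exclusion is needed in this form. The pattern is tested against the URL-path only, so a query string such as ?id=123&banner=one does not affect the match.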

Jim

Doood

9:07 pm on Apr 25, 2008 (gmt 0)

10+ Year Member



haha. I think if the developers of mod_rewrite tried to make this more difficult... they couldn't.

If I knew mod_rewrite well enough, I would create a program that would spit out the correct code to use based on what you want to do with it. Then charge people top dollar to use it.

jdMorgan

9:42 pm on Apr 25, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There are many of those automatic code-generators, and just about every one of them produces simply-awful code.

It's very easy to tell posters here who have read the mod_rewrite documentation and have written their own code from those who are simply posting the code 'written' for them by their site's cPanel -- In many cases, the self-written code (although wrong) is better (e.g. more-efficiently written) than the cPanel code!

As I said, it is not the code-writing that is difficult -- take note of how few "instructions" mod_rewrite has compared to, say, PERL, PHP, C, or any assembly language. Rather, the difficulty is deciding how to describe the URLs you do and do not wish to rewrite (based on the capabilities of mod_rewrite and regular-expressions pattern-matching), and that is a mental exercise.

The primary challenge is to visualize the entire URL-space of a site and divide it --in the most efficient manner possible-- into URLs that should be rewritten by a particular rule, and those that shouldn't.

Take a look at some of the longer threads here. They often start out "I want to rewrite all URLs to this script." First question: "Do you really want to rewrite the script's own URL to the script? That's a loop!" How about robots.txt? What about other "well-known-location" files like /w3c/p3p.xml or sitemap.xml, or labels.rdf? What about your "stats" page and control panel URLs? Do you really want to rewrite image and multimedia resource requests to your script? Do you really want to rewrite your custom 500-Server Error page to your script?--Consider what happens if you have a PHP configuration error if you do that!
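
A sketch of how a few of those exclusions might be written for the country-block rule from this thread; the particular file names and extensions listed are assumptions and would need adjusting to the site in question.

```apache
RewriteEngine on
# Skip the block page itself (this is what prevents the loop)
RewriteCond %{REQUEST_URI} !^/block/block\.php$
# Skip well-known files such as robots.txt and favicon.ico
RewriteCond %{REQUEST_URI} !^/(robots\.txt|favicon\.ico|sitemap\.xml)$
# Skip image, style sheet, and script requests (extension list is an assumption)
RewriteCond %{REQUEST_URI} !\.(gif|jpe?g|png|css|js)$
# Most expensive test last: the GeoIP country code
RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^(EG|PS|CN|IN|TR)$
RewriteRule .* /block/block.php [L]
```

The cheap string tests come first so that the GeoIP check only runs for requests that survive them.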

See, it's not the coding, but the unambiguous specification of the requirements that is the hard part. And when you get right down to it, it's actually far easier to describe the two URL groups using regular-expressions, instead of English!

Jim

Doood

11:39 pm on Apr 25, 2008 (gmt 0)

10+ Year Member



The difficulty for me now is not getting the rewrite to perform the action I want, but it's getting it done the most efficient way possible.

There are just too many ways to seemingly do the same thing (on the surface) but with very different results in the rewrite log resulting in extra work for the server.

For me, the small amount of rewrite docs means there are many questions left unanswered and the only way to know is all day long trial and error.