Infinite redirect loop (.htaccess) with MaxMind GeoIP

Redirect based on visitor country on a WordPress multi-site

         

seanenns

9:55 pm on May 26, 2014 (gmt 0)

10+ Year Member



I'm using WordPress multi-site to handle visitors from different countries.

If they're from Canada, they go to the Canadian site. Everywhere else, currently, they'd go to the US.

I'm just trying to get the Canadian rule working for now. In Chrome, I get an error:

"This webpage has a redirect loop"

This is my entire .htaccess file


GeoIPEnable On

RewriteEngine On
RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^CA$ [NC]
RewriteRule ^(.*)$ http://subdomain.example.com/cad/$ [R,L]

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteBase /
RewriteRule ^index\.php$ - [L]

# add a trailing slash to /wp-admin
RewriteRule ^([_0-9a-zA-Z-]+/)?wp-admin$ $1wp-admin/ [R=301,L]

RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^ - [L]
RewriteRule ^([_0-9a-zA-Z-]+/)?(wp-(content|admin|includes).*) $2 [L]
RewriteRule ^([_0-9a-zA-Z-]+/)?(.*\.php)$ $2 [L]
RewriteRule . index.php [L]
</IfModule>

# END WordPress
#
# <Files wp-config.php>
# order allow,deny
# deny from all
# </Files>

# order deny,allow
# deny from all
# allow from 11.111.111.11
# allow from 22.22.222.22

# directory browsing
# Options All -Indexes

# <Files ~ "^.*\.([Hh][Tt][Aa])">
# order allow,deny
# deny from all
# satisfy all
# </Files>


I've read through a few other answers here. They all look more complicated than what I'm doing, though my thought is that it's detecting me once I get to somedomain.example.com/cad/ and redirecting me to somedomain.example.com/cad/.

Thanks

lucy24

10:26 pm on May 26, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



RewriteEngine On
RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^CA$ [NC]
RewriteRule ^(.*)$ http://subdomain.example.com/cad/$ [R,L]

Where's the rule or condition that says "Only redirect if the request is not already for the stated subdomain"?

[R] should be [R=301] so it won't default to 302. But this doesn't affect the problem at hand.

Incidentally, what's the NC for? Can "ca" or "Ca" occur?
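
For illustration, a guard of the kind being asked about could look roughly like the sketch below. It assumes the Canadian site lives under /cad/ on the same host, as described in the original post, and it is not a tested fix for this particular setup: internal rewrites further down the file can still change what the condition sees.

```apache
GeoIPEnable On
RewriteEngine On

# Only act on visitors that mod_geoip places in Canada.
RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^CA$
# Skip requests that are already under /cad/, so the redirect cannot feed itself.
RewriteCond %{REQUEST_URI} !^/cad/
RewriteRule ^ http://subdomain.example.com/cad/ [R=301,L]
```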

seanenns

10:56 pm on May 26, 2014 (gmt 0)

10+ Year Member



That's an excellent question. So, I added this:

RewriteCond %{REQUEST_URI} !^/cad/$

But it didn't fix the loop.

The NC probably isn't required... I've been looking at a half dozen or more examples. Some use it, others don't.

lucy24

1:37 am on May 27, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Can you access your raw logs? The error message from your browser means that it has put in 20 or 30 requests and is throwing in the towel. Each of those 20 or 30 requests will be listed in logs. Alternatively, you can use an extension such as LiveHeaders (Firefox, but there are probably equivalents for other major browsers).

Empty your browser's cache, or better yet use a different one. Redirect responses can be cached, so the browser might not make a fresh request. (In my own experience, a forced refresh takes care of things, but some browsers are more stubborn.)

The fact that the error message comes from your browser means that the problem concerns external redirects. An internal rewrite would lead to an error message from the server, and it would generally be the same in all browsers. So your problem is easier to diagnose than it could have been.

Some use it, others don't.

The [NC] flag is only a few bytes to type, but it makes the server accept every casing of the pattern (four variants for a two-letter code) and adds a little work to each match. That's why you only use [NC] when a differently cased form can actually occur-- and, of course, when casing isn't part of what you're matching against (like real "Googlebot" vs. spoofed "GoogleBot").

RewriteRule ^([_0-9a-zA-Z-]+/)?wp-admin$ $1wp-admin/ [R=301,L]

Good Lord. Just how many /wp-admin/ directories have you got?! Incidentally, [_0-9A-Za-z] can be expressed as \w, leaving only the - hyphen to be separately named. Unless you've got URLs with non-ASCII characters that you need to exclude; \w is pretty comprehensive. But, again, just how many directories are there?
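
As a sketch of the \w shorthand mentioned above, the WordPress trailing-slash rule could be written as follows. It assumes the PCRE regex engine that Apache's mod_rewrite uses; apart from the character class, the rule is the one WordPress generated.

```apache
# Same trailing-slash redirect, with \w standing in for [_0-9A-Za-z].
# The hyphen still has to be listed on its own inside the class.
RewriteRule ^([\w-]+/)?wp-admin$ $1wp-admin/ [R=301,L]
```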

seanenns

3:49 am on May 27, 2014 (gmt 0)

10+ Year Member



Well, there's only one actual wp-admin directory (but there's one per site: /cad/, /us/, and the main directory. Also the network admin, so, 4?). Those are the rules WordPress wrote by itself. I think it has to do with the multi-user config details, but I'm not sure.

So, I removed the NC. Thanks for the explanation.

Re: Raw logs. They all say: 70.67.151.192 (that's me) - - [26/May/2014:20:38:36 -0700] "GET /cad/$ HTTP/1.1" 301 255 "-" "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0"

I've used different browsers, different computers, emptied my cache, done a hard refresh. Nada.

lucy24

4:04 am on May 27, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes, but what do they say next? Or do they say the same thing 30 times in a row?

A redirect is an instruction to the browser to make a fresh request. That means it will show up in your logs as a separate entry, because the server's memory is shorter than mine.

:: wait, stop, rewind ::

GET /cad/$

Is that a literal $ sign in the request?!

###. I'm sorry I missed that; it's right there in the rule. Get rid of it! The $ (ending anchor) only has meaning in patterns, including conditions.

http://subdomain.example.com/cad/

That extraneous $ is also what's making the condition fail.

Those are the rules WordPress wrote by itself.

As long as you're editing your htaccess, change the rule to list only the exact path(s) to your /wp-admin/ directory. But what's the rule for anyway? I thought /wp-admin/ was only for the site administrator (that's you) and malign robots, so why do you even need a redirect? Presumably you'll get the name right when you type it yourself. Is it just to prevent the later !-f business from kicking in if you or the Ukrainians do make a mistake?

seanenns

4:30 am on May 27, 2014 (gmt 0)

10+ Year Member



They all say the same thing 30 times in a row.

So, followed your advice. My rule now reads like so:

GeoIPEnable On

RewriteEngine On
RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^CA
RewriteCond %{REQUEST_URI} !^/cad/
RewriteRule http://somesite.example.com/cad/ [R=301,L]

Now, there's no error, but it's not redirecting either

phranque

4:44 am on May 27, 2014 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



i think this is what you want:
GeoIPEnable On

RewriteEngine On
RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^CA
RewriteCond %{REQUEST_URI} !^/cad/
RewriteRule ^(.*)$ http://somesite.example.com/cad/$1 [R=301,L]


the $1 is a backreference to the first capture group (grouping parentheses) in the pattern.
the "1" was missing in your original code.
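
To make the backreference concrete, here is the same rule annotated with what it would do for one request. The path /shop/widgets is made up for the example, and the sketch assumes the .htaccess file sits in the site's document root.

```apache
RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^CA
RewriteCond %{REQUEST_URI} !^/cad/
# In .htaccess the pattern sees the path without its leading slash, so for a
# request to /shop/widgets (a made-up path) (.*) captures "shop/widgets" and
# the target becomes http://somesite.example.com/cad/shop/widgets
RewriteRule ^(.*)$ http://somesite.example.com/cad/$1 [R=301,L]
```

Without the 1, the target ends in a literal $ character, which matches the "GET /cad/$" entries in the access log.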

seanenns

5:06 am on May 27, 2014 (gmt 0)

10+ Year Member



Well, that's new.

Now I get a 500 internal server error.

lucy24

6:11 am on May 27, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Oh, double ###. I really need to read more slowly. Not "remove $" but "ADD 1". That makes all the difference in the world-- but the wrong form wouldn't create a 500 error or infinite loop, just a bunch of unwanted redirects.

Are you using phranque's code exactly as shown?

RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^CA
RewriteCond %{REQUEST_URI} !^/cad/
RewriteRule ^(.*)$ http://somesite.example.com/cad/$1 [R=301,L]


That certainly should do it. If you're getting a 500 error, there's a typo somewhere.

Incidentally... You've got
subdomain.example.com
and also
/cad/

What would the two other possibilities
www.example.com/cad/blahblah
and
subdomain.example.com/blahblah
lead to?

seanenns

6:34 am on May 27, 2014 (gmt 0)

10+ Year Member



Yeah, so.

It's a subdomain because it's a development site for a client. The configuration is like this.

The domain is example.com
I created a subdomain. Let's call it clientarea.
I installed a multi-site WordPress installation at clientarea.example.com

It's e-commerce. There's a Canadian store, and a US store.

The Canadian store is clientarea.example.com/cad/
The US store is at clientarea.example.com/us/

clientarea.example.com is the main installation. That's where all the WordPress files are.

Eventually, it'll just be clientsite.com/cad/, and clientsite.com/us/

As regards above, I've copied the code exactly as posted. I still get a 500 error.

lucy24

8:06 am on May 27, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Comment-out the three lines. (Put a # at the beginning of each line: the two Conditions and the Rule.) Verify that the 500 error has gone away.

Now un-comment the second condition (the one with /cad/) and the rule itself, leaving the first condition (the one with {ENV}) commented-out. See whether you get a 500.

Meanwhile I will go check something.

seanenns

4:03 pm on May 27, 2014 (gmt 0)

10+ Year Member



Fair point. I commented out the rules, and was still getting the 500 error.

So, I restored my .htaccess file from a backup. I did that and checked it, no 500 error.

I copied phranque's code example above exactly and pasted it in. The only change I made to the WordPress rules was to delete the second "RewriteEngine On."

I commented out the second condition, and the rule itself. And, it's back to the redirect loop.

This is my htaccess file, currently.

(I've left in the crazy characters for the WordPress rules, since I know they work. I don't want to fix two things at the same time, but will look at streamlining it once my current issue is resolved.)

# GeoIPEnable On

RewriteEngine On
# RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^CA
RewriteCond %{REQUEST_URI} !^/cad/
RewriteRule ^(.*)$ http://clientarea.example.com/cad/$1 [R=301,L]

# BEGIN WordPress
RewriteBase /
RewriteRule ^index\.php$ - [L]

# add a trailing slash to /wp-admin
RewriteRule ^([_0-9a-zA-Z-]+/)?wp-admin$ $1wp-admin/ [R=301,L]

RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^ - [L]
RewriteRule ^([_0-9a-zA-Z-]+/)?(wp-(content|admin|includes).*) $2 [L]
RewriteRule ^([_0-9a-zA-Z-]+/)?(.*\.php)$ $2 [L]
RewriteRule . index.php [L]

lucy24

7:37 pm on May 27, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I commented out the rules, and was still getting the 500 error.

Ha! That means there's a typo somewhere else.

In Apache (unlike, say, robots.txt) a blank line has no syntactic meaning. So it's a very good practice to put a blank line after each ruleset (RewriteRule with any preceding RewriteCond) for your own sanity and convenience.

RewriteRule ^([_0-9a-zA-Z-]+/)?wp-admin$ $1wp-admin/ [R=301,L]

Always include a leading slash / in all targets. There are no exceptions to this rule. This, in turn, makes the RewriteBase line unnecessary. It does no harm, it just isn't needed because it will never be invoked.

Also: when a rule creates an external redirect (R flag), give the full protocol-plus-domain in the target.
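
A rough sketch of those two conventions applied to rules from this thread, assuming clientarea.example.com is the host being served:

```apache
# Internal rewrite: the target starts with a slash.
RewriteRule . /index.php [L]

# External redirect: the target spells out protocol and host.
RewriteRule ^([_0-9a-zA-Z-]+/)?wp-admin$ http://clientarea.example.com/$1wp-admin/ [R=301,L]
```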

So this rule
# RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^CA
RewriteCond %{REQUEST_URI} !^/cad/
RewriteRule ^(.*)$ http://clientarea.example.com/cad/$1 [R=301,L]

leads to an infinite redirect loop? (You can un-comment the CA condition, since we seem to have established that it wasn't responsible for the 500.) And logs show 30 consecutive requests for the identical file?

We need to take a closer look at that subdomain. Since servers don't operate in four dimensions,
clientarea.example.com
is not physically meaningful. It has to be located somewhere else, most likely a directory inside the
example.com
directory. It doesn't have to be there, though; it could even be on an entirely different server. Find out where it is. This is important.

phranque

7:38 pm on May 27, 2014 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



a 500 internal server error should have a corresponding message in the web server error log file

seanenns

9:20 pm on May 27, 2014 (gmt 0)

10+ Year Member



Yes. This rule


# RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^CA
RewriteCond %{REQUEST_URI} !^/cad/
RewriteRule ^(.*)$ http://clientarea.example.com/cad/$1 [R=301,L]


Leads to a redirect loop.

clientarea.example.com is physically located on the server at example.com/clientarea.

phranque

9:31 pm on May 27, 2014 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



what do the URLs in the redirect chain look like?

seanenns

9:43 pm on May 27, 2014 (gmt 0)

10+ Year Member



When I check the raw access logs, this is what I get:

... "GET / HTTP/1.1" 301 254 ...
... "GET /cad/ HTTP/1.1" 301 263 ...
... "GET /cad/index.php HTTP/1.1" 301 263 ...

The ellipses denote identical information. Time stamp, IP address, browser and OS info.

phranque

11:20 pm on May 27, 2014 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



i don't see a loop in that pattern.
is there more to follow that shows a repeat?
where does the 3rd redirect go?

what is causing the 2nd redirect?
i don't see anything in your directives that would make that happen.

lucy24

12:34 am on May 28, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



... "GET /cad/ HTTP/1.1" 301 263 ...
... "GET /cad/index.php HTTP/1.1" 301 263 ...

Independent of anything else, there's something seriously wrong with this sequence. Why is anything at all getting externally redirected to "index.php"? That's only supposed to come out in an internal rewrite, after all external redirecting is done.

Have you posted your entire current htaccess? Let me make sure of this before I start scrutinizing.

... "GET / HTTP/1.1" 301 254 ...

Did you put back the ^CA condition? It now occurs to me that-- aside from any other issues-- requests for the root should never be redirected. So the body of the rule should say
(.+)

with + rather than * to exclude empty requests (that's what the root looks like in htaccess). The anchors aren't needed in this situation; by default a Regular Expression starts as soon as it can, and continues as long as it can.

i don't see a loop

It doesn't have to be a loop. Browsers simply count redirects. (I looked this up once.) But I don't see what happens after this third request: is it then 20 or more of the same thing?

RewriteRule ^index\.php$ - [L]

This rule needs to go before any rules with an [R] flag. This is a special CMS thing; it's an exception to the general "external redirects before internal rewrites" rule.
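
Put together, the two suggestions in this post might look like the sketch below: the index.php pass-through sits ahead of the redirect, and the pattern uses + so an empty path (the root) never matches. Host and path are taken from the thread. Whether the root should really be left alone is a site decision, so treat this as an illustration only.

```apache
RewriteEngine On

# Stop here for the internally rewritten front controller.
RewriteRule ^index\.php$ - [L]

RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^CA
RewriteCond %{REQUEST_URI} !^/cad/
# (.+) needs at least one character, so a request for / is not redirected.
RewriteRule (.+) http://clientarea.example.com/cad/$1 [R=301,L]
```

The ordering matters because WordPress rewrites pretty URLs to index.php internally. Once that happens, REQUEST_URI no longer begins with /cad/, and a redirect rule placed first would fire again on the rewritten request.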

phranque

12:46 am on May 28, 2014 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



It doesn't have to be a loop. Browsers simply count redirects.

i doubt any browsers stop at 3 redirects by default.

seanenns

1:12 am on May 28, 2014 (gmt 0)

10+ Year Member



On my way out the door, here's my entire htaccess.

# GeoIPEnable On

RewriteEngine On
# RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^CA
# RewriteCond %{REQUEST_URI} !^/cad/
# RewriteRule ^(.*)$ http://clientarea.example.com/cad/$1 [R=301,L]

# BEGIN WordPress
RewriteBase /
RewriteRule ^index\.php$ - [L]

# add a trailing slash to /wp-admin
RewriteRule ^([_0-9a-zA-Z-]+/)?wp-admin$ $1wp-admin/ [R=301,L]

RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^ - [L]
RewriteRule ^([_0-9a-zA-Z-]+/)?(wp-(content|admin|includes).*) $2 [L]
RewriteRule ^([_0-9a-zA-Z-]+/)?(.*\.php)$ $2 [L]
RewriteRule . index.php [L]

[edited by: phranque at 3:49 am (utc) on May 28, 2014]
[edit reason] exemplified domain [/edit]

seanenns

11:24 pm on May 29, 2014 (gmt 0)

10+ Year Member



So, I tested the redirect code on another server, and it works. I was successfully able to detect my country, and redirect the traffic.

Knowing it wasn't that, I commented out all of the WordPress code, and I still get a redirect loop.

So, I caused the loop, and downloaded my raw error logs. It shows the following message in my logs 19 times (I guess it gave up on the 20th)

myIP - - [29/May/2014:16:23:15 -0700] "GET /cad/404.shtml HTTP/1.1" 301 263 "-" "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0"

I know now though that it's not being caused by my .htaccess. It's something else.

seanenns

12:56 am on May 30, 2014 (gmt 0)

10+ Year Member



So, it works now.

Modified one line, added another:

GeoIPEnable On

RewriteEngine On
RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^CA
RewriteCond %{REQUEST_URI} !\.(jpe?g|gif|bmp|png|tiff|css|js)$ [NC]
RewriteCond %{REQUEST_URI} !^/(cad/|index\.php) [NC]
RewriteRule ^(.*)$ http://clientarea.example.com/cad/$1 [R=301,NC,L]

Now I'll go through and tidy it up, but this is a huge victory.

You were all awesome, and invaluable. Thanks!

lucy24

2:00 am on May 30, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



RewriteCond %{REQUEST_URI} !\.(jpe?g|gif|bmp|png|tiff|css|js)$ [NC]
<snip>
RewriteRule ^(.*)$ http://clientarea.example.com/cad/$1 [R=301,NC,L]

Remaining issues:

By its nature, this redirect is only intended to apply to requests for pages. You want to make it so the server doesn't have to stop and evaluate conditions every time it gets a request for an image or stylesheet. Exact wording depends on your URL structure; one approach is

RewriteRule ^([^.]+(\.html|/))?$ http etcetera


replacing ".html" with whatever extension(s) you actually use.

Leave off [NC] flags; wrongly cased requests should get 404s. (The rule as written doesn't need an [NC] anyway, because there's no text to match. This will change if you put text such as ".html" into the body of the rule.)
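
Spelled out in full, the pages-only approach might look like the sketch below. The ".html" extension is a placeholder, as noted above, and the host and /cad/ prefix come from the working rule earlier in the thread.

```apache
RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^CA
RewriteCond %{REQUEST_URI} !^/cad/
# Matches the root, paths ending in a slash, and paths ending in .html,
# provided nothing before the ending contains a dot. Requests such as
# style.css or logo.png fail the pattern, so the conditions are never evaluated.
RewriteRule ^([^.]+(\.html|/))?$ http://clientarea.example.com/cad/$1 [R=301,L]
```

Because mod_rewrite tests the rule pattern before its conditions, moving the file-type filter into the pattern is what saves the work on image and stylesheet requests.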

It's something else.

Well, that's a little worrying. Doesn't your rule explicitly say that requests for /cad/ are not to be redirected?

Sometimes you have to look at error logs and access logs side by side to get the full picture. Here I see a 404 page being requested by name-- which should never be happening in the first place. It looks like an internal request that got externally redirected. Does each subdomain have its own error documents? If not, you're at risk for internal redirect loops.
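
One commonly used way to keep internally generated requests (error documents, rewritten index.php requests) away from an external redirect is to test the REDIRECT_STATUS environment variable. The sketch below is an assumption about what could help here, not something tested in this thread:

```apache
RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^CA
RewriteCond %{REQUEST_URI} !^/cad/
# REDIRECT_STATUS is normally empty on the browser's original request and set
# once Apache has redirected internally (for example to an error document).
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteRule ^(.*)$ http://clientarea.example.com/cad/$1 [R=301,L]
```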

seanenns

3:06 am on May 30, 2014 (gmt 0)

10+ Year Member



You can ignore my statement that it was something else. I was mistaken. It was definitely the .htaccess.

It just seemed strange that it would work on one server, and not the other, given that each server is configured the same (I'm close with the sysadmin). The only difference was that on server 1, I tried the rule in isolation on a "standard" site, and on server 2, it was a multi-site WordPress.

And then when I commented out all of the WordPress rules, and it still didn't work, well... you can see where I'm going.

Anyway, that's all academic. Now that it works, I can optimize it.

Thanks again!