.htaccess rewriterule resulting in loop

geoip, virtual directories, shared pages

         

adamski

7:28 pm on Jul 20, 2009 (gmt 0)

10+ Year Member



Hi All,

I'm at my wits' end with this one. I've never been strong with .htaccess, and this one has me baffled.

I'm trying to build a site that uses geoip to direct users to country pages, e.g. domain.com > domain.com/us/ or domain.com/eu/

This works fine to achieve that


RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^(DE|FR|GB)$
RewriteCond %{REQUEST_URI} !^/eu/
RewriteRule ^(.*)$ /eu/$1 [R,L]
Good.

Now I want to use one set of pages in the root that each virtual directory pulls and then the content will be dynamically generated.

This code works fine for that bit


RewriteRule ^eu/(.*)[/]?$ /$1 [NC,L]

What's the problem then? Well I can't get them working together because it goes into a loop. I understand why it goes into the loop but I can't figure out how to get round it. Can anyone point me in the right direction?

Total code in .htaccess in root


RewriteEngine On
RewriteBase /

RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^(DE|FR|GB)$
RewriteCond %{REQUEST_URI} !^/eu/
RewriteRule ^(.*)$ /eu/$1 [R,L]
RewriteRule ^eu/(.*)[/]?$ /$1 [NC,L]

Hi and all that too :)

TIA

Adam

adamski

7:30 pm on Jul 20, 2009 (gmt 0)

10+ Year Member



The good in the first bit of good isn't there. That should be in the post !

adamski

9:47 pm on Jul 20, 2009 (gmt 0)

10+ Year Member



This is what I have now which is working. All pages stored in own subdirectory. Would still be interested to know if what I originally wanted can be done (ie pages in root)


RewriteEngine On
RewriteBase /

RewriteCond %{REQUEST_URI} !^/eu/
RewriteCond %{REQUEST_URI} !^/pages/
RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^(DE|FR|GB)$
RewriteRule ^(.*)$ /eu/$1 [R,L]

RewriteCond %{REQUEST_URI} !^/us/
RewriteCond %{REQUEST_URI} !^/pages/
RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^US$
RewriteRule ^(.*)$ /us/$1 [R,L]

RewriteCond %{REQUEST_URI} !^/pages/
RewriteRule ^eu(.*)[/]?$ pages/$1 [NC,L]
RewriteRule ^us/(.*)[/]?$ pages/$1 [NC,L]

RewriteCond $1 !^pages/
RewriteRule ^(.*)$ pages/$1 [L]

jdMorgan

4:46 am on Jul 21, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Why are you externally 302-redirecting to the region-specific URLs, instead of just rewriting and serving a region-specific file at the same URL? For example:

RewriteCond %{QUERY_STRING} !&?region=[^&]+&?
RewriteCond %{ENV:GEOIP_COUNTRY_CODE}>eu ^(DE|FR|GB)>(.+)$ [OR]
RewriteCond %{ENV:GEOIP_COUNTRY_CODE}>us ^(US)>(.+)$
RewriteRule ^(.*)$ /$1?region=%2 [QSA,L]

rewrites any incoming request, adding the 'region code' as a query string parameter, unless this has already been done.

If you want to have all of these 'region' files in root, then you've got to either give them different names or append a 'region-identifier' query-string to the filepath, or do something to make each region-page's path different (as you did with the pseudo-path "/pages/"). Also, look at AcceptPathInfo, Content-Negotiation, etc. as well.

The bottom line is that if the 'output' path of Rule B matches the 'input' pattern of Rule A, and the 'output' path of Rule A matches the 'input' pattern of Rule B, then of course you get an 'infinite' rewriting loop in .htaccess.
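
One way to break such a loop, sketched here under assumptions rather than taken from this thread, is to let the redirect rule fire only for the client's original request. The sketch relies on Apache setting the REDIRECT_STATUS environment variable once an internal redirect has happened (it is empty on the first pass). It reuses the eu rules from the first post, and www.example.com is a placeholder hostname.

```apache
RewriteEngine On

# Rule A: externally redirect /<path> to /eu/<path>, but only on the
# first pass. REDIRECT_STATUS is empty for the client's original request
# and non-empty after Rule B has already rewritten the path internally.
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{REQUEST_URI} !^/eu/
RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^(DE|FR|GB)$
RewriteRule ^(.*)$ http://www.example.com/eu/$1 [R=302,L]

# Rule B: internally rewrite /eu/<path> to the shared file at /<path>.
# Its output would match Rule A's pattern, but Rule A is now guarded.
RewriteRule ^eu/(.*)$ /$1 [L]
```

With this guard, the output of Rule B still matches the pattern of Rule A, but Rule A's first condition fails on the second pass, so the rewritten path is served instead of being redirected again.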

Note: Replace all broken pipe "¦" characters in code you see posted here with solid pipes before use; Posting on this forum modifies the pipe characters.

Jim

adamski

5:34 am on Jul 21, 2009 (gmt 0)

10+ Year Member



Not sure I fully understand what you mean Jim.

The last code posted works fine. Rather than have a query string I want the url to look like below as that will look nicer in address bar.

domain.com/uk/
domain.com/us/
domain.com/eu/

These directories are all virtual only. The content for these pages all comes from the same files in real directory /pages/

Won't the example you give just give me domain.com/region=eu ?

jdMorgan

6:11 am on Jul 21, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It will give you a *URL* of domain.com/

That URL, when requested from an EU IP address, will resolve to the DirectoryIndex-defined *file* referred to as "/" with a query string of "region=eu", but when that URL is requested from a US IP address, it will resolve to the DirectoryIndex-defined *file* referred to as "/" with a query string of "region=us".

You could do the same thing pointing to /eu and /us as you did before. Or you could use "eu" and "us" subdomains (hint here).

My main point is that there is no reason to 'expose' the inner subdirectories or the inner mechanism of your site to the client with an external redirect. And if you feel you need an external redirect, then you really ought not to be linking to the domain, but rather to the 'region directories' anyway.

This thread has two aspects: First the usability and SEO design aspects of multi-region (multi-language?) sites, and second, implementation of those SEO and usability factors. I'm actually pushing you back a bit from implementation to design to be sure you've thought this through.

I'll also apologize for giving you 'scattered' responses -- replying to different aspects in the same thread without making too much of a fuss about changing the immediate subject... But here's another:

If you want to redirect a URL, but you don't want to redirect a previously-rewritten server-filepath, then you can check THE_REQUEST to be sure that the client asked for the path that you're testing with your rule pattern, and that it didn't arise as the result of a previously-invoked internal rewrite.
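
A minimal sketch of that THE_REQUEST check, assuming the /pages/ layout used earlier in this thread and www.example.com as a placeholder hostname. THE_REQUEST holds the client's original request line (for example "GET /pages/about HTTP/1.1") and is not changed by internal rewrites, so the redirect fires only when the client itself asked for /pages/.

```apache
RewriteEngine On

# Externally redirect /pages/<path> to /<path>, but only when the client
# really requested /pages/<path>. After an internal rewrite the path
# below starts with pages/, yet THE_REQUEST still shows the original URL.
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /pages/
RewriteRule ^pages/(.*)$ http://www.example.com/$1 [R=301,L]

# Internally rewrite all other requests to /pages/<path>.
RewriteCond $1 !^pages/
RewriteRule ^(.*)$ pages/$1 [L]
```

A request for /about is rewritten to pages/about and served. On the second pass the first rule's pattern matches, but THE_REQUEST still reads "GET /about ...", so no redirect is issued and no loop results.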

Be sure you're quite clear on those terms, too: External redirect vs. internal rewrite and URL vs. filepath. None are the same thing (or even similar, really).

Jim

adamski

7:13 am on Jul 21, 2009 (gmt 0)

10+ Year Member



Thanks Jim. Will digest when I return from the day job!

adamski

7:44 pm on Jul 21, 2009 (gmt 0)

10+ Year Member



Much easier to see it in action. That code you gave works nicely Jim. I'm not keen on not having the country subdir showing so just figuring out how to do that.

Thanks thus far :)

jdMorgan

9:03 pm on Jul 21, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm not keen on not having the country subdir showing so just figuring out how to do that.

Use the internal rewrite syntax instead of the external redirect syntax. An external redirect, by definition, sends a response to the client (e.g. browser) that says, "The resource you requested has moved. Please ask for it again at this new URL." So the client (usually) updates its address bar, and issues a new HTTP request for what it wanted, but using the new URL provided in the redirect response.

An internal rewrite, in contrast, simply tells the server, "Instead of using the default URL-to-filepath translation of adding DocumentRoot to the requested URL-path, use DocumentRoot and this filepath instead." The client is not informed of any change in the URL-to-filename translation, and happily receives and displays whatever content the server sends back from that new filepath.
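
The two syntaxes can be put side by side in a short sketch. The paths old-page and new-page and the hostname www.example.com are placeholders for illustration only; the eu and pages paths follow the layout used in this thread.

```apache
RewriteEngine On

# External redirect: the [R] flag makes the server answer with a 302 and
# a Location header. The browser then requests the new URL and shows it
# in the address bar.
RewriteRule ^old-page$ http://www.example.com/new-page [R=302,L]

# Internal rewrite: no [R] flag and no hostname. The server maps the
# requested URL-path /eu/<path> to the file pages/<path>, and the
# browser keeps showing /eu/<path>.
RewriteRule ^eu/(.*)$ pages/$1 [L]
```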

As I said above, rewrites, redirects, URLs, and filepaths... all are different, and this must be clear in order to understand this stuff. BTW, the primary purpose of an HTTP server is to translate URLs used on the Web into whatever filepaths are used by the operating system of the server, so that Web clients don't need to know anything about the filesystem inside the server. Mod_rewrite sits right at the 'boundary' between the URL-based Web and the server's filesystem, and can modify this URL-to-filename translation.

Jim

adamski

9:13 pm on Jul 21, 2009 (gmt 0)

10+ Year Member



Ok this is where I am at just now. I'm probably still not fully understanding external and internal so further reading no doubt needed.


RewriteEngine On
RewriteBase /

RewriteCond %{REQUEST_URI} !^/pages/
RewriteCond %{REQUEST_URI} !^/eu/
RewriteCond %{REQUEST_URI} !^/us/
RewriteCond %{REQUEST_URI} !^/eu
RewriteCond %{ENV:GEOIP_COUNTRY_CODE}>eu ^(DE|FR|GB)>(.+)$ [OR]
RewriteCond %{ENV:GEOIP_COUNTRY_CODE}>us ^(US)>(.+)$
RewriteRule ^(.*)$ /%2/$1 [R,L]

RewriteCond %{THE_REQUEST} !^/pages/
RewriteRule ^eu(.*)[/]?$ pages/$1 [NC,L]
RewriteRule ^us/(.*)[/]?$ pages/$1 [NC,L]

RewriteCond $1 !^pages/
RewriteRule ^(.*)$ pages/$1 [L]

The first block is using the code you supplied but have changed it to use country subdir. As I've put the R flag on it have I now made that external? If so I don't know any other way of updating the url in the address bar (which is what I want to happen).

Also is there a cleaner way to write all those RewriteCond for the pages?

The second block takes the info to be displayed from the pages subdir where all pages will be stored. Only one copy of a page for each page as content will be dynamically loaded (and as such may well put back the refer variable).

Last block is to make sure root reads the pages from pages subdir too.

Last thing is to update url if someone stumbles across pages so that it goes to root but I think I'm stuck on loops again. We'll see.

Thanks for your patience!

Adam

adamski

9:14 pm on Jul 21, 2009 (gmt 0)

10+ Year Member



Meant to add: I will also be adding the necessary rules to make sure trailing slashes are forced.

jdMorgan

11:39 pm on Jul 21, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This should be equivalent:

RewriteEngine on
#
# Externally redirect to remove trailing slashes from /eu/<path>/ and /us/<path>/ URL-paths
RewriteRule ^(eu|us)/([^/]*)/$ http://www.example.com/$1/$2 [R=301,L]
#
# Externally redirect /<path> to /eu/<path> or /us/<path> URLs based on geoip lookup
RewriteCond $1 !^(pages|eu|us)/
RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^([A-Z]{2})$
RewriteCond %1>eu ^(DE|FR|GB)>(.+)$ [OR]
RewriteCond %1>us ^(US)>(.+)$
RewriteRule ^(.*)$ http://www.example.com/%2/$1 [R=301,L]
#
# Internally rewrite /eu/<path> and /us/<path> URLs to /pages/<path>
RewriteRule ^(eu|us)/(.*)$ pages/$2 [NC,L]
#
# Internally rewrite any remaining requests to /pages/<path>
RewriteCond $1 !^pages/
RewriteRule ^(.*)$ pages/$1 [L]

Note that I reduced the possible number of calls to geoip from two to one to speed up your server.

Replace all broken pipe "¦" characters with solid pipes before use; Posting on this forum modifies the pipe characters.

Jim

[edit] Corrections as noted below. [/edit]

[edited by: jdMorgan at 3:57 pm (utc) on July 22, 2009]

adamski

6:03 am on Jul 22, 2009 (gmt 0)

10+ Year Member



Of course - completely passed me by that I could use the pipe for checking the pages/eu/us bit! Good work!

I've changed


RewriteRule ^(eu|us)/(.*)$ pages/$1 [NC,L]

to


RewriteRule ^(eu|us)/(.*)$ pages/$2 [NC,L]

adamski

6:18 am on Jul 22, 2009 (gmt 0)

10+ Year Member



And have also spotted on oddity when domain.com/eu/test is called it url redirects to domain.com/pages//test but domain.com/eu/test/ is fine. Must go to work now!

adamski

7:55 pm on Jul 22, 2009 (gmt 0)

10+ Year Member



This is my code as of now, allowing cookies to override geoip. I'm pleased and grateful to Jim for all his assistance. htaccess is great :)


RewriteEngine on
#
# Rewrite URL based on cookie value
RewriteCond $1 !^(pages|eu|us)
RewriteCond %{HTTP_COOKIE} location=([^;]+) [NC]
RewriteRule ^(.*)$ /%1/$1 [R,L]
#
# Externally redirect to add trailing slashes to /eu/<path> and /us/<path> URL-paths
RewriteRule ^(eu|us)(.*[^/])?$ http://domain.com/$1$2/ [R=301,L]
RewriteRule ^(.*)(eu|us)/$ - [co=location:$2:.domain.com:2592000:/]
#
# Externally redirect /<path> to /eu/<path> or /us/<path> URLs based on geoip lookup
RewriteCond $1 !^(pages|eu|us)
RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^([A-Z]{2})$
RewriteCond %1>eu ^(DE|FR|GB)>(.+)$ [OR]
RewriteCond %1>us ^(US)>(.+)$
RewriteRule ^(.*)$ /%2/$1 [R,L]
#
# Internally rewrite /eu/<path> and /us/<path> URLs to /pages/<path>
RewriteRule ^(eu|us)(.*)[/]?$ pages/$2/ [NC,L]
#
# Internally rewrite any remaining requests to /pages/<path>
RewriteCond $1 !^pages/
RewriteRule ^(.*)$ pages/$1 [L]

Good stuff :)

g1smd

8:05 pm on Jul 22, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



# Rewrite URL based on cookie value

In this section, you have a redirect, not a rewrite.

Note, too, that [R,L] gives you a 302 redirect. You likely need [R=301,L] as Jim explained in the first post.

jdMorgan

8:11 pm on Jul 22, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



To prevent potentially-major problems, always specify a protocol and domain in addition to the URL-path in your RewriteRules which redirect (the rules with the [R=] flag). It's also a very good idea to specify [R=301] or [R=302] explicitly, and to know why you picked the one you did.

It's very likely that you could add the "cookie-setting flag" to your now-fourth rule, and eliminate the third rule.

I'd also recommend validating the cookie value in your first rule using ";?location=(eu|us);?", because cookies can be spoofed on the client side.
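
A sketch of what that validation might look like in the cookie rule, assuming the only valid regions are eu and us as elsewhere in this thread. The hostname www.example.com is a placeholder, and R=302 is an illustrative choice, not a recommendation from the thread. The pattern here is anchored a little more tightly than the one quoted above, so that a cookie whose name merely ends in "location" does not match.

```apache
RewriteEngine On

# Redirect by cookie only when its value is one of the known regions.
# A missing, empty, or spoofed value fails the second condition, and the
# request falls through to the later geoip rules.
RewriteCond $1 !^(pages|eu|us)(/|$)
RewriteCond %{HTTP_COOKIE} (^|;\ *)location=(eu|us)(;|$)
RewriteRule ^(.*)$ http://www.example.com/%2/$1 [R=302,L]
```

Because the region is now the second parenthesized group of the last matched condition, the substitution uses %2 rather than %1.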

Jim

adamski

8:27 pm on Jul 22, 2009 (gmt 0)

10+ Year Member



Yes 301 redirect will be in place (overlooked as I've been playing).

Thanks both. Cookie validation added.

adamski

8:44 pm on Jul 22, 2009 (gmt 0)

10+ Year Member



I think just one thing remaining. When I click on a link (eg eu) to change the page from us to eu the output from $_COOKIE doesn't update until I've clicked it twice. Would this be because of how the cookie is set in htaccess or do I need to be more creative in the php?!

jdMorgan

3:41 pm on Jul 23, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm not sure what that means. But to clarify, the cookie is set on the client side by the server's first response that contains a "Set-Cookie" HTTP header, and the client (browser) will then send that cookie back to your server with each of its subsequent requests to your server. So the cookie is set and stored client-side. This is a common point of confusion.

Make sure that the page that is being requested when you intend to set the cookie or to check "$_COOKIE" has been marked with proper cache-control headers, so that the browser *must* send a request to the server after the cookie has been set. Using "Cache-control: no-cache, must-revalidate" might fix your problem.
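
One possible way to send that header from .htaccess, assuming mod_headers is loaded and header overrides are allowed for the directory. As written it applies to every response served from that directory, which may be broader than you want.

```apache
<IfModule mod_headers.c>
    # Ask browsers to check back with the server before reusing a
    # cached copy, so the updated cookie is sent with the next request.
    Header set Cache-Control "no-cache, must-revalidate"
</IfModule>
```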

Note that the "no-cache" attribute doesn't mean exactly what it sounds like; Due to errors, misinterpretations, and liberties taken on the early Web, all this header will do is to make the client actually send a request to your server if the visitor's browser re-loads the page for any reason. In most cases, this will be a request with an "If-Modified-Since" header, and if you are currently returning Last-Modified headers, then your server will not send back the page, it will only send back a "304-Not Modified." But, the cookie header will also be sent in that response.

As you can see, this gets complicated... :)

Jim