Forum Moderators: phranque
Is this correct to use in htaccess?
RewriteEngine on
RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^(EG|PS|CN|IN|TR)$
RewriteRule ^(.*)$ /block/block.php [R,L]
One problem I had was using "$1" after the RewriteRule URL, as the instructions said to do. It produced an "Error 500 - Internal server error" and wouldn't work until the "$1" was removed.
I turned on the RewriteLog and in 5 seconds it was 75MB! This is a high traffic site though.
The Log showed they were redirected to /block/block.php and then redirected to /block/block.php again.
So what am I doing wrong here?
Oh, it's apache 2.2.3
edit:
if the "$1" is added to the rewrite rule after the url and you visit domain.com/geoip/index.html then it will redirect you to domain.com/block/block.phpindex.html
[edited by: Doood at 4:55 pm (utc) on April 16, 2008]
RewriteEngine on
#
RewriteCond %{REQUEST_URI} !^/block/block\.php$
RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^(EG|PS|CN|IN|TR)$
RewriteRule .* /block/block.php [R,L]
Jim
It goes like this,
request page
checks if it is block/block.php
checks country for match
redirects to block/block.php
checks if it is block/block.php again
check country again
then ignores rewrite because of "initial URL equal rewritten URL"
With loglevel set to 4 it shows 17 steps to process this.
In general, such a rule should only be executed on a single entry page URL, or only if a cookie is not present indicating that they've set their preferred language, or whatever it is that you're trying to accomplish here.
Here's the code with comments. Only you can decide if it's appropriate to your goal or not:
# Turn on the rewriting engine
RewriteEngine on
#
# If the requested URL-path is NOT /block/block.php
RewriteCond %{REQUEST_URI} !^/block/block\.php$
# and if the country code is one of those listed
RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^(EG|PS|CN|IN|TR)$
# redirect ALL requested URLs to block/block.php
RewriteRule .* /block/block.php [R,L]
Jim
I found that removing the leading slash and the "R" flag gives the same end result in fewer steps. With the redirect, once visitors land on block/block.php the rule tries to send them there again based on their country, and the server only stops after sending them there twice. Without the leading slash, though, this only works if the htaccess is in the site root.
Seems like this would be a very common problem being in a loop when redirecting, but maybe not.
[edited by: Doood at 6:50 pm (utc) on April 16, 2008]
The target URL of the redirect is "domain unsafe" as it does not fix any domain canonicalisation issues in the process. The 302 redirect also allows search engines to index an infinite number of URLs for the "blocked" page error message. That is also very bad.
Changing to a 301 redirect [R=301] and specifying the domain name in the redirect would mostly fix those issues.
However, using any sort of redirect causes the browser to request the new URL, and the user therefore sees that new URL on screen.
Removing the [R] and NOT specifying the domain name changes the redirect into a rewrite. With a rewrite, the user does not see a new URL in their browser.
In this situation you MUST make sure that the "blocked" error-message page has a meta robots noindex tag in it. That ensures the "blocked" error message can NEVER be indexed for any of the real page URLs of the site. Each URL of the site can return two completely different pages of content. Only one of those versions should be indexed. The error message must never be indexed.
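As a rough sketch of the 301 variant described above (not tested against this particular server), the redirect with an explicit hostname might look like the following. The hostname www.example.com is a placeholder assumption, not the poster's real domain, and the first condition is there only to keep the redirect from looping.

```apache
RewriteEngine on
# Skip the block page itself so the redirect cannot loop
RewriteCond %{REQUEST_URI} !^/block/block\.php$
# Only act on the listed country codes
RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^(EG|PS|CN|IN|TR)$
# www.example.com is a placeholder for the site's canonical hostname
RewriteRule .* http://www.example.com/block/block.php [R=301,L]
```

Dropping the hostname and the R=301 flag from the last line would turn this back into an internal rewrite, in which case the noindex advice above applies.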
To make it easier, what if the country rules and redirection are only applied if a certain page is requested? This way it would avoid any chance of looping. Plus I'm really only needing this for one page.
Say we want the rules to apply only if they're visiting this page
mydomain.com/links.php
and redirect them if the country matches to
mydomain.com/block/block.php
So all other pages on the site are accessible from all countries, and the page /links.php redirects to block/block.php only if they're from a blocked country.
The actual url would be more like
mydomain.com/links.php?id=123&banner=one
but I don't guess that matters.
So how would this be done?
If I were you, I'd simply return a 403-Forbidden response and be done with this.
# If the requested URL-path is NOT the 403 error page
RewriteCond %{REQUEST_URI} !^/path-to-403-page$
# and if the country code is one of those listed
RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^(EG|PS|CN|IN|TR)$
# Deny the request
RewriteRule .* - [F]
This rewrite stuff is confusing to me.
It just seems like if all calls to a specific page can be redirected, then extra rules should be able to be added in a chain.
Like,
# if the page requested is /block/block.php then stop, if not proceed
RewriteRule %{REQUEST_URI} ^/block/block\.php$ [L]
# if the page is /links.php then proceed to next condition in chain
RewriteCond ^/link.php$ [C]
# check for country match, if yes proceed to RewriteRule. If no match then pass thru to page /links.php
RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^(EG|PS|CN|IN|TR)$
RewriteRule ^(.*)$ block/block.php [L]
Maybe it's just wishful thinking that this would work. Or maybe I'm not explaining it correctly.
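For what it's worth, the single-page idea above can probably be expressed without any chaining, because the RewriteRule pattern itself can name the one page. This is only a sketch, assuming the htaccess sits in the site root (so the pattern has no leading slash) and that an internal rewrite rather than an external redirect is wanted:

```apache
RewriteEngine on
# Check the country only when the rule pattern below has already matched
RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^(EG|PS|CN|IN|TR)$
# Match /links.php only; the query string (?id=123&banner=one) is not
# part of what the RewriteRule pattern sees, so it does not matter here
RewriteRule ^links\.php$ /block/block.php [L]
```

Since /block/block.php never matches ^links\.php$, the rewritten request cannot trigger the rule a second time, and every other URL on the site is left alone.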
I stopped all further processing of the rules by placing an htaccess in the /block/ directory with,
RewriteEngine off
If this is wrong or bad practice let me know please.
This is what the RewriteLog shows happens (country changed to US for testing)
request index.php
applying pattern '%{REQUEST_URI}' to uri 'index.php'
applying pattern '^(.*)$' to uri 'index.php'
RewriteCond: input='US' pattern='^US$' => matched
rewrite 'index.php' -> 'block/block.php'
internal redirect with /block/block.php [INTERNAL REDIRECT]
applying pattern '%{REQUEST_URI}' to uri 'block/block.php'
applying pattern '^(.*)$' to uri 'block/block.php'
RewriteCond: input='US' pattern='^US$' => matched
rewrite 'block/block.php' -> 'block/block.php'
initial URL equal rewritten URL: /home/mysite/public_html/block/block.php [IGNORING REWRITE]
And this is what it shows when turning off the RewriteEngine in the /block/ directory,
request index.php
applying pattern '^(.*)$' to uri 'index.php'
RewriteCond: input='US' pattern='^US$' => matched
rewrite 'index.php' -> 'block/block.php'
internal redirect with /block/block.php [INTERNAL REDIRECT]
Using LogLevel 4 there are 17 total processes in the first one and with the engine off in the sub-directory there are only 7.
I probably messed up your code trying to get it to work.
Unless there is a problem with your server configuration, this code should work fine, with no looping:
# Turn on the rewriting engine
RewriteEngine on
#
# If the requested URL-path is NOT /block/block.php
RewriteCond %{REQUEST_URI} !^/block/block\.php$
# and if the country code is one of those listed
RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^(EG|PS|CN|IN|TR)$
# rewrite ALL requested URLs to block/block.php
RewriteRule .* /block/block.php [L]
Completely flush your browser cache after each test case.
Jim
It opened up a new can of worms, since it's also applying the patterns to favicon.ico and rewriting favicon.ico to /block/block.php
I'm sure that can easily be fixed though.
One other thing, in the rewrite log is it normal to see this,
[perdir /home/mysite/public_html/] strip per-dir prefix: /home/mysite/public_html/index.php -> index.php
[perdir /home/mysite/public_html/] add per-dir prefix: block/block.php -> /home/mysite/public_html/block/block.php
[perdir /home/mysite/public_html/] strip document_root prefix: /home/mysite/public_html/block/block.php -> /block/block.php
I just noticed that while browsing around my site the rewrite log showed it checked the country of my own IP about 100 times. It applied the pattern to every single thing I viewed.
I guess, though, that if a condition were set to skip JPGs, it would still have to check every file to see whether it is a JPG or not, right?
mod_rewrite tests the RewriteRule pattern first, so a request that does not match that pattern never reaches the conditions at all. Then, if that RewriteRule pattern matches, the RewriteConds are evaluated in order. Put the simplest/most efficient RewriteConds first, saving the more CPU-intensive ones (such as your GeoIP call, file-exists checks, and reverse-DNS lookups) for last.
Think carefully about what kind of URLs you need to block and are willing to run the complete check on: URLs that resolve to a certain directory level? URLs with a "filetype" on them, or URLs with a specific filetype on them? Or maybe you want to check them if they don't have a filetype or a certain filetype? URLs with certain words in them?
There are a million ways to classify URLs, but your goal is to pick a method that clearly puts them into three groups: Those to be blocked, those to be fully-checked to see if they need to be blocked, and those that you wish to by-pass the full check on. Use the RewriteRule and RewriteCond patterns to construct that "URL classifier system."
Just another way of looking at it, but this is actually the difficult part of using mod_rewrite once you fully-understand mod_rewrite coding and operation -- The major work becomes devising the most efficient method of determining which URLs are and are not to be rewritten. Ah, some of the almost-random-URL e-commerce sites I've seen! ;)
Jim
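To make the ordering advice above concrete, here is one possible classifier, offered as a sketch rather than a drop-in answer. The list of file extensions is an assumption about which requests are safe to bypass; adjust it to the site.

```apache
RewriteEngine on
# Cheap test first: leave anything under /block/ alone (prevents looping)
RewriteCond %{REQUEST_URI} !^/block/
# More expensive test last: only reached for URLs that survived the above
RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^(EG|PS|CN|IN|TR)$
# The rule pattern is tested before either condition, so requests for
# icons, images, stylesheets and scripts bypass the country check entirely
RewriteRule !\.(ico|gif|jpe?g|png|css|js)$ /block/block.php [L]
```

With this arrangement a favicon.ico or image request is rejected by the rule pattern alone, and the country code is only examined for the remaining page requests.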
It's very easy to tell posters here who have read the mod_rewrite documentation and have written their own code from those who are simply posting the code 'written' for them by their site's cPanel -- In many cases, the self-written code (although wrong) is better (e.g. more-efficiently written) than the cPanel code!
As I said, it is not the code-writing that is difficult -- take note of how few "instructions" mod_rewrite has compared to, say, PERL, PHP, C, or any assembly language. Rather, the difficulty is deciding how to describe the URLs you do and do not wish to rewrite (based on the capabilities of mod_rewrite and regular-expressions pattern-matching), and that is a mental exercise.
The primary challenge is to visualize the entire URL-space of a site and divide it --in the most efficient manner possible-- into URLs that should be rewritten by a particular rule, and those that shouldn't.
Take a look at some of the longer threads here. They often start out "I want to rewrite all URLs to this script." First question: "Do you really want to rewrite the script's own URL to the script? That's a loop!" How about robots.txt? What about other "well-known-location" files like /w3c/p3p.xml or sitemap.xml, or labels.rdf? What about your "stats" page and control panel URLs? Do you really want to rewrite image and multimedia resource requests to your script? Do you really want to rewrite your custom 500-Server Error page to your script?--Consider what happens if you have a PHP configuration error if you do that!
See, it's not the coding, but the unambiguous specification of the requirements that is the hard part. And when you get right down to it, it's actually far easier to describe the two URL groups using regular-expressions, instead of English!
Jim
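As an illustration of the "rewrite everything to a script" case discussed above, the exclusions might be written along these lines. The script name /index.php and the /errors/ directory are hypothetical stand-ins, and the list of excluded files is only a starting point:

```apache
RewriteEngine on
# Do not rewrite the script's own URL to the script (that would loop)
RewriteCond %{REQUEST_URI} !^/index\.php$
# Leave "well-known-location" files alone
RewriteCond %{REQUEST_URI} !^/(robots\.txt|sitemap\.xml|labels\.rdf|favicon\.ico|w3c/p3p\.xml)$
# Leave image and other static resource requests alone
RewriteCond %{REQUEST_URI} !\.(gif|jpe?g|png|css|js)$
# Leave the custom error pages alone (hypothetical directory)
RewriteCond %{REQUEST_URI} !^/errors/
# Everything else goes to the script
RewriteRule .* /index.php [L]
```

Each negated condition describes one group of URLs that must not be rewritten; whatever is left over is the group that should be.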
There are just too many ways to seemingly do the same thing (on the surface) but with very different results in the rewrite log resulting in extra work for the server.
For me, the small amount of rewrite docs means there are many questions left unanswered, and the only way to know is all-day trial and error.