Forum Moderators: phranque
Problems:
Problem #1 I need to block traffic that comes in from MULTIPLE malicious domains (badsite1.com, badsite2.com, etc.). Wherever they arrive on the site, I want them to be redirected to PAGE1.html.
Problem #2 I need to have a list of IP addresses in my .htaccess file that I'm blocking. (I already have a long list.) These visitors will be redirected to a different page (PAGE2.html).
Problem #3 If a visitor whose IP I have already blocked visits my site through a malicious domain, I want him to be redirected to PAGE1.html, NOT PAGE2.html.
------------------------------------------
To achieve this effect, I'm not sure what order everything needs to be in the .htaccess file or what code needs to be added. I need to be able to add new malicious domains in the future without breaking the existing code. I'm not terribly good at .htaccess, as you can tell, and would greatly appreciate some help with this issue.
Below is everything that my .htaccess file CURRENTLY consists of in the order they appear in the .htaccess file: (spacing is for clarity)
# Block of 301 redirects
Redirect 301 http://www.example.com/somepage.html http://www.example.com/some-other-page.html
Redirect 301 http://www.example.com/somepage.html http://www.example.com/some-other-page.html
Redirect 301 http://www.example.com/somepage.html http://www.example.com/some-other-page.html
#List of Blocked IP addresses:
RewriteCond %{REMOTE_HOST} xx\.xx\.xx\.xx [OR]
RewriteCond %{REMOTE_HOST} xx\.xx\.xx\.xx [OR]
RewriteCond %{REMOTE_HOST} xx\.xx\.xx\.xx [OR]
RewriteCond %{REMOTE_HOST} xx\.xx\.xx\.xx
#NOT SURE WHAT SHOULD GO HERE ONCE BLOCKED DOMAINS ARE ADDED
------------------------------------------
[edited by: jdMorgan at 3:18 am (utc) on June 20, 2009]
[edit reason] example.com [/edit]
Do you really want a redirect? With a redirect you will tell the other end that it needs to make a new request for a new URL (i.e. they will see that they are being handled differently). If you used a rewrite, the other end would see 'alternative content' at the same URL that they requested (i.e. this happens silently).
Basically, the code above is so flawed that it indicates that more than just a simple "add this line" answer is needed. For example, the "Redirect" lines each contain a major error, and the variable "REMOTE_HOST" requires a full or partial domain name, not an IP address.
It's also not clear if you want to block requests from servers at those addresses, or if you want to block visitors referred by the Web sites at those addresses.
If you mix directives from mod_alias and mod_rewrite as you have done, then you cannot control the order that directives from those two modules will execute. Furthermore, if you change hosts, or if your existing host 'upgrades' your server, that execution order may change -- breaking your site. You also won't be able to control what happens if the 'bad request' meets the criteria for both blocking methods -- at least, not reliably.
So your simple answer is that you need to add a RewriteRule, such as
RewriteRule .* - [F]
Also be aware that unless you add an exception, you won't be able to use a custom 403 error document on this site. Also, any robots accessing your site from those blocked addresses will likely assume that they can/should attempt to spider your entire site, since they won't be able to fetch robots.txt.
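For illustration, a minimal sketch of what that blocking rule and its exceptions might look like (the IP addresses and the 403.html filename here are made-up placeholders, not from the original post; REMOTE_ADDR is used because, as noted above, REMOTE_HOST expects a domain name):

```apache
# Deny requests from blocked IP addresses, but still allow
# them to fetch robots.txt and the custom 403 document
RewriteCond %{REMOTE_ADDR} ^192\.168\.1\.2$ [OR]
RewriteCond %{REMOTE_ADDR} ^192\.168\.3\.4$
RewriteCond %{REQUEST_URI} !^/robots\.txt$
RewriteCond %{REQUEST_URI} !^/403\.html$
RewriteRule .* - [F]
ErrorDocument 403 /403.html
```

Without the two REQUEST_URI exceptions, the server's attempt to serve the custom 403 document would itself be blocked, and robots would never see robots.txt.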
There are many 'wider issues' here. If you're interested in avoiding simplistic solutions and the resultant problems, then you'd do well not to brush off the well-intentioned advice of our few contributors. If you do, you may end up trying to contract with some of those very same people to fix your site later, and I assure you that it won't be inexpensive.
Jim
Basically I was just trying to say that:
I have a block of 301 redirects (all working) at the beginning of my document, followed by a list of IP addresses that I redirect to a particular URL on my site. (All work fine, though I don't know anything about a full domain name...) I have a RewriteRule beneath these in real life.
The 'custom 403 page' is not an actual 403 page, just a page that *looks* like a 403 page, and that's what I'm going for. I don't know anything about exceptions, however.
I didn't mention anything about blocking servers... (though that would be nice). For now, that is not a necessity. I'm just after redirecting visitors from that domain.
To be honest, I've been trying to advertise this as a paid job (Don't think it's possible to post jobs on WebmasterWorld). For someone who actually knows how .htaccess files should work it shouldn't take longer than 30 minutes I would think.
If you need a visual on what this .htaccess file is supposed to look like, I have a picture on flickr.
[flickr.com...]
Problem #1 I need to block traffic referred from MULTIPLE malicious domains (badsite1.com, badsite2.com, etc.). Wherever they arrive on the site, I want them to be redirected to PAGE1.html.
Problem #2 I need to have a list of IP addresses in my .htaccess file that I'm blocking. (I already have a long list.) These visitors will be redirected to a different page (PAGE2.html).
Problem #3 If a visitor I have already blocked by IP address visits my site through a link on (i.e. a referral from) a malicious domain, I want him to be redirected to PAGE1.html, NOT PAGE2.html.
# Redirect visitors referred by 'bad' sites
RewriteCond $1 !^(page1|page2)\.html$
RewriteCond %{HTTP_REFERER} ^http://(www\.)?badsite1\.com [OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?badsite2\.co\.uk [OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?badsite3\.ca
RewriteRule ^(([^/]+/)*([^.]+\.(html|php))?)$ http://www.example.com/page1.html [R=302,L]
#
# Redirect visitors from blocked IP addresses:
RewriteCond $1 !^(page1|page2)\.html$
RewriteCond %{REMOTE_ADDR} 192\.168\.01\.02 [OR]
RewriteCond %{REMOTE_ADDR} 192\.168\.03\.04 [OR]
RewriteCond %{REMOTE_ADDR} 192\.168\.05\.06
RewriteRule ^(([^/]+/)*([^.]+\.(html|php))?)$ http://www.example.com/page2.html [R=302,L]
#
# Per-page 301 redirects
RewriteRule ^somepage\.html$ http://www.example.com/some-other-page.html [R=301,L]
RewriteRule ^somepage\.html$ http://www.example.com/some-other-page.html [R=301,L]
RewriteRule ^somepage\.html$ http://www.example.com/some-other-page.html [R=301,L]
Rule order is important, and is correct as shown here.
The first RewriteCond in each 'blocking' rule prevents an infinite redirection loop, based on the requested page's URL-path.
Note that I used 302 redirects for the page1.html and page2.html redirects. Actually, a 303-See Other might be more appropriate, but I haven't thought about it because I never use redirects for access control (see further discussion below).
Note that only requests for .html and .php pages (including any index pages at "/" in any directory) are redirected by these rules. This is because a browser will not show an HTML page if it is requesting an image or other object 'included' in an HTML page. So there is not much use trying to redirect such requests to an HTML page. You may modify the subpattern to add or remove 'page' filetypes as needed.
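As an illustration of modifying that subpattern, adding .htm and .asp as additional 'page' filetypes would look like this (a sketch only; use whatever extensions your site actually serves):

```apache
# Extended alternation covering .html, .htm, .php, and .asp pages
RewriteRule ^(([^/]+/)*([^.]+\.(html|htm|php|asp))?)$ http://www.example.com/page1.html [R=302,L]
```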
Now, backing up quite a bit from this to get a wider perspective, consider: If page1.html or page2.html are some kind of "Hah, hah! We caught you!" pages, then I recommend against that. Doing such things throws down a challenge, and will simply motivate some people to try to crack your defenses. Failing that, they may start a DoS attack out of spite. When issuing 'error messages' in response to potentially-malicious malformed requests or to unwelcome visitors, it is best to provide only minimal information. In fact, if a malicious request is intentionally denied, it's sometimes better to pretend that there is a problem with your server; instead of saying "Access denied," you might be better off saying "Server error." Keep in mind that information is power: Don't unnecessarily empower the bad guys.
Also, as g1smd stated above, an external redirect will 'expose' the new 'page1.html' and 'page2.html' URLs to these blocked visitors. Therefore, they will know that the URL changed, and that "you did something." And if these visitors are robots, they may not follow the redirect (remember that it is up to the client browser/robot whether it wants to follow a redirect). For this reason, you might consider using an internal rewrite, rather than an external redirect. For example, the second rule would change to
RewriteRule ^(.+\.(html|php))?$ /page2.html [L]
Keep in mind that HTTP Referer headers are not always present in HTTP requests. It is up to the client whether to send them, and they are optional -- not required by the HTTP protocol specification. Some ISPs' caching proxies and some "internet security" software will also block this header. What this means is that your referrer-based blocking will not be 100% effective, and that in cases where no Referer header is sent, visitors blocked by both methods will see page2.html instead of page1.html.
Jim
I tested the first part ("redirecting visitors from bad sites") separately to see if it works, but though I replaced the broken pipe with a solid pipe and changed the file names and domains appropriately, it doesn't redirect anywhere... which is rather curious. No idea what needs fixing...
One thing I don't understand is the "$1" after the first RewriteCond. What does that signify and could that be causing the problem? I just don't see what else could be flawed...
I've made sure to make the redirects very discreet so there wouldn't be much chance of the users noticing anything. However, the internal rewrite looks like a great option as well, which I'll test and look into once I get the original code you wrote implemented.
In this case, we are looking at the requested URL-path to be sure that we don't redirect previously-redirected requests for either page1.html or page2.html. This avoids an infinite redirection loop. Note that "!" means "NOT" in this context.
Be aware that within a mod_rewrite "routine," the RewriteRule pattern is evaluated first. The RewriteConds are not evaluated if the RewriteRule pattern does not match. One effect (and reason for doing it this way) is that RewriteConds can then 'see' the back-references created by RewriteRules, and RewriteRules and RewriteConds can use back-references created by a preceding matched RewriteCond. This is quite useful when using multiple RewriteConds and when building the RewriteRule substitution URL-path.
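As a sketch of that evaluation order (the pattern, sites, and query-string parameter here are illustrative assumptions, not from the thread): the RewriteRule pattern is matched first and creates $1, which the RewriteConds can then test; a matched RewriteCond in turn creates %1, %2, etc., usable in the substitution.

```apache
# $1 comes from the RewriteRule pattern (evaluated first);
# %2 comes from the second group of the last matched RewriteCond
RewriteCond $1 !^page1\.html$
RewriteCond %{HTTP_REFERER} ^http://(www\.)?(badsite[0-9]+)\.com
RewriteRule ^([^.]+\.html)$ /page1.html?from=%2 [R=302,L]
```

Here a request for any .html page referred from badsite1.com would be redirected to /page1.html?from=badsite1, illustrating both directions of back-reference flow.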
You will have to completely flush your browser cache and then click on a link on one of the bad sites for this to work. You can use the Live HTTP Headers add-on for Firefox/Mozilla browsers to see the transactions between your browser and your server.
If you have not already done so, read the mod_rewrite documentation at apache.org. Yes, it is long and complicated, but be aware that one single typo or other small error in your code can take down your server -- if you are lucky. If you are not so lucky, it can slowly erode your search engine rankings over time, and perhaps put you out of business. Therefore "looking it all up so I can understand what it means" and reading the documentation are both very, very good ideas.
Jim
[edited by: jdMorgan at 8:55 pm (utc) on June 20, 2009]
# Redirect visitors referred by 'bad' sites
RewriteCond $1 !^(not-found|not-found2)\.html$
RewriteCond %{HTTP_REFERER} ^http://(www\.)?subdomain\.example\.com [OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?example\.com [OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?example2\.com
RewriteRule ^(([^/]+/)*([^.]+\.(html|php))?)$ [mysite.com...] [R=302,L]
#
I have a link pointed to mysite.com (which has this .htaccess file) from example.com, but the .htaccess file appears to be doing nothing. I can't see anything obvious that would be making this happen...
Oh, and I have flushed my browser's cache, cookies, etc. completely a few times now. I've also tried using several different browsers... still no go. (And as said previously, I *did* repair the pipes.)
Options +FollowSymLinks -MultiViews
RewriteEngine on
RewriteRule ^test-foo\.html$ http://www.google.com/ [R=302,L]
Jim
[edited by: jdMorgan at 9:31 pm (utc) on June 20, 2009]
I put in "RewriteEngine on" at the top of the file. If that works by itself, do I still need the 1st line?
Also, do these directives just have to be present once in the .htaccess file or do I have to place them before every rewriterule? Was never sure about that...
RewriteRule ^(.+\.(html|php))?$ /not-found.html [L]
I replaced the external redirect rule with the code above, and when I followed a link from a 'bad domain', I got a 500 Error page. The address bar showed the destination that the link was pointing to, which is good but the error message was the usual 500 error message, and didn't contain the code that I had in my custom page.
I'm just not sure what you mean by 'content substitution'. I thought the page was supposed to show my error message from not-found.html on the page that was linked to... am I wrong in my thinking here or not?
The pattern I just posted looks identical to the one you posted above. I just changed the filename, I thought. And the same RewriteCond is there. The only thing I changed was the RewriteRule.
You posted: RewriteRule ^(.+\.(html|php))?$ /page2.html [L]
I posted: RewriteRule ^(.+\.(html|php))?$ /not-found.html [L]
Is there something better I should use?
Try the more-specific pattern as in the redirect rule:
RewriteRule ^(([^/]+/)*([^.]+\.(html|php))?)$ /not-found.html [L]
If you commented-out the last RewriteCond (the one without an [OR] flag), leaving one with an [OR] as the last RewriteCond, then that would cause a problem. But without access to the error log, you're kind of lost, and I cannot recommend using or developing mod_rewrite code without access to the error log.
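To make that flag arrangement concrete, here is the correct shape (a generic sketch with placeholder addresses): every RewriteCond in the group carries [OR] except the last one, so the group reads "A or B or C". Commenting out the final condition leaves a dangling [OR] as the last flag, which changes the logic of the whole group.

```apache
# Correct: [OR] on every condition EXCEPT the last
RewriteCond %{REMOTE_ADDR} ^192\.168\.1\.2$ [OR]
RewriteCond %{REMOTE_ADDR} ^192\.168\.3\.4$ [OR]
RewriteCond %{REMOTE_ADDR} ^192\.168\.5\.6$
RewriteRule .* - [F]
```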
Jim
I summarily pitched it in the bin.
There were simply too many perfectly-good keyboards available at reasonable cost to bother with a defective one.
Without access to error logs, you will pay over and over and over again -- in terms of time wasted "debugging code by staring at it" and because of "unfindable" bugs. For anything except a pure static-HTML site with no scripting, no SSI, and no mod_rewrite, error logs are NOT optional.
I've got error logs on hosts that charge me less than 35 cents a day for hosting, so it is not a matter of cost.
Jim
I just wonder if I'll actually be able to decipher what the error logs are actually telling me if I *did have access.