Need .htaccess help - Answers greatly appreciated!

Using domain and ip blocking together...

         

timothius

10:21 pm on Jun 19, 2009 (gmt 0)

10+ Year Member



If someone can come up with something solid that helps me, I wouldn't mind throwing them a bone... I've been having a hard time getting a reliable solution for this.

Problems:

Problem #1 I need to block traffic that comes in from MULTIPLE malicious domains (badsite1.com, badsite2.com, etc.). Wherever they arrive on the site, I want them to be redirected to PAGE1.html.

Problem #2 I need to have a list of IP addresses in my .htaccess file that I'm blocking. (I already have a long list.) These visitors will be redirected to a different page (PAGE2.html).

Problem #3 If a visitor whose IP I have already blocked visits my site through a malicious domain, I want him to be redirected to PAGE1.html, NOT PAGE2.html.

------------------------------------------

To achieve this effect, I'm not sure what order everything needs to be in or what code needs to be added. I need to be able to add new malicious domains in the future without messing up the code. I'm not too terribly good at .htaccess, as you can tell, and would really appreciate some help with this issue.

Below is everything that my .htaccess file CURRENTLY consists of, in the order it appears in the file (spacing is for clarity):

# Block of 301 redirects

Redirect 301 http://www.example.com/somepage.html http://www.example.com/some-other-page.html
Redirect 301 http://www.example.com/somepage.html http://www.example.com/some-other-page.html
Redirect 301 http://www.example.com/somepage.html http://www.example.com/some-other-page.html

#List of Blocked IP addresses:

RewriteCond %{REMOTE_HOST} xx\.xx\.xx\.xx [OR]
RewriteCond %{REMOTE_HOST} xx\.xx\.xx\.xx [OR]
RewriteCond %{REMOTE_HOST} xx\.xx\.xx\.xx [OR]
RewriteCond %{REMOTE_HOST} xx\.xx\.xx\.xx

#NOT SURE WHAT SHOULD GO HERE ONCE BLOCKED DOMAINS ARE ADDED

------------------------------------------

[edited by: jdMorgan at 3:18 am (utc) on June 20, 2009]
[edit reason] example.com [/edit]

g1smd

12:46 am on Jun 20, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If you use RewriteRule for some of your rules, use RewriteRule for all of your rules. Don't mix in stuff using Redirect or RedirectMatch on the same site. In any case, with any type of redirect, the pattern on the left should contain only a local path (not a domain name) and the target on the right should contain both domain name and full path.
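
As a rough illustration of that last point (a sketch only, reusing the thread's placeholder file names), both forms below take a local URL-path on the left and a full URL on the right. Only one of the two styles would be used on a given site, so the mod_alias form is shown commented out:

```apache
# mod_alias form: URL-path on the left (with leading slash), full URL on the right
# Redirect 301 /somepage.html http://www.example.com/some-other-page.html

# mod_rewrite form of the same redirect, as it would appear in a top-level .htaccess file
# (in .htaccess context the pattern has no leading slash)
RewriteEngine on
RewriteRule ^somepage\.html$ http://www.example.com/some-other-page.html [R=301,L]
```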

Do you really want a redirect? With a redirect you will tell the other end that it needs to make a new request for a new URL (i.e. they will see that they are being handled differently). If you used a rewrite, the other end would see 'alternative content' at the same URL that they requested (i.e. this happens silently).

timothius

12:52 am on Jun 20, 2009 (gmt 0)

That isn't the exact format I used for my 301 redirects. It was just an example.

I'm not really sure what you're talking about with the RewriteRule. Whatever the case, the last post doesn't address any of the problems that I listed... can anyone help?

jdMorgan

3:40 am on Jun 20, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You'd do well to go along with the 'discussion' here, even if it doesn't *yet* address your question.

Basically, the code above is so flawed that it indicates that more than just a simple "add this line" answer is needed. For example, the "Redirect" lines each contain a major error, and the variable "REMOTE_HOST" requires a full or partial domain name, not an IP address.

It's also not clear if you want to block requests from servers at those addresses, or if you want to block visitors referred by the Web sites at those addresses.

If you mix directives from mod_alias and mod_rewrite as you have done, then you cannot control the order that directives from those two modules will execute. Furthermore, if you change hosts, or if your existing host 'upgrades' your server, that execution order may change -- breaking your site. You also won't be able to control what happens if the 'bad request' meets the criteria for both blocking methods -- at least, not reliably.

So your simple answer is that you need to add a RewriteRule, such as

 RewriteRule .* - [F]

below your RewriteConds. But it won't work, because of the other problems.

Also be aware that unless you add an exception, you won't be able to use a custom 403 error document on this site. Also, any robots accessing your site from those blocked addresses will likely assume that they can/should attempt to spider your entire site, since they won't be able to fetch robots.txt.
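
A minimal sketch of such an exception might look like the following. It assumes a custom error page named /403.html (a hypothetical file name) and uses placeholder addresses from the 192.0.2.0/24 documentation range; neither comes from the code posted above.

```apache
# Hypothetical custom error page; the file name is an assumption
ErrorDocument 403 /403.html

RewriteEngine on
# Placeholder addresses, anchored so that each pattern matches one address only
RewriteCond %{REMOTE_ADDR} ^192\.0\.2\.10$ [OR]
RewriteCond %{REMOTE_ADDR} ^192\.0\.2\.11$
# Forbid everything except the error page itself and robots.txt
RewriteRule !^(403\.html|robots\.txt)$ - [F]
```

With the negated pattern, blocked clients can still fetch the 403 page and robots.txt, while every other request from those addresses gets a 403 response.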

There are many 'wider issues' here. If you're interested in avoiding simplistic solutions and the resultant problems, then you'd do well not to brush off the well-intentioned advice of our few contributors. If you do, you may end up trying to contract with some of those very same people to fix your site later, and I assure you that it won't be inexpensive.

Jim

timothius

4:09 am on Jun 20, 2009 (gmt 0)

Hmmm... I'm not trying to brush anyone off, Jim. I just simplified the code that is actually in my current .htaccess file for the example (probably not a good idea).

Basically I was just trying to say that:

I have a block of 301 redirects (all working) at the beginning of my file, followed by a list of IP addresses that I redirect to a particular URL on my site. (These all work fine, though I don't know anything about a full domain name...) In real life, I have a RewriteRule beneath these.

The 'custom 403 page' is not an actual 403 page, just a page that looks like a 403 page, and that's what I'm going for. I don't know anything about exceptions, however.

I didn't mention anything about blocking servers... (though that would be nice). For now, that is not a necessity. I'm just after redirecting visitors from that domain.

To be honest, I've been trying to advertise this as a paid job (Don't think it's possible to post jobs on WebmasterWorld). For someone who actually knows how .htaccess files should work it shouldn't take longer than 30 minutes I would think.

If you need a visual on what this .htaccess file is supposed to look like, I have a picture on flickr.

[flickr.com...]

jdMorgan

2:00 pm on Jun 20, 2009 (gmt 0)

OK, so as I understand it:

Problem #1 I need to block traffic referred from MULTIPLE malicious domains (badsite1.com, badsite2.com, etc.). Wherever they arrive on the site, I want them to be redirected to PAGE1.html.

Problem #2 I need to have a list of IP addresses in my .htaccess file that I'm blocking. (I already have a long list) These visitors will be redirected to a different page (PAGE2.html)

Problem #3 If a visitor whom I have already blocked by IP address visits my site through a link on (i.e. a referral from) a malicious domain, I want him to be redirected to PAGE1.html, NOT PAGE2.html.



# Redirect visitors referred by 'bad' sites
RewriteCond $1 !^(page1|page2)\.html$
RewriteCond %{HTTP_REFERER} ^http://(www\.)?badsite1\.com [OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?badsite2\.co\.uk [OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?badsite3\.ca
RewriteRule ^(([^/]+/)*([^.]+\.(html|php))?)$ http://www.example.com/page1.html [R=302,L]
#
# Redirect visitors from blocked IP addresses
# (patterns are anchored, and written without leading zeros, so each matches exactly one address)
RewriteCond $1 !^(page1|page2)\.html$
RewriteCond %{REMOTE_ADDR} ^192\.168\.1\.2$ [OR]
RewriteCond %{REMOTE_ADDR} ^192\.168\.3\.4$ [OR]
RewriteCond %{REMOTE_ADDR} ^192\.168\.5\.6$
RewriteRule ^(([^/]+/)*([^.]+\.(html|php))?)$ http://www.example.com/page2.html [R=302,L]
#
# Per-page 301 redirects
RewriteRule ^somepage\.html$ http://www.example.com/some-other-page.html [R=301,L]
RewriteRule ^somepage\.html$ http://www.example.com/some-other-page.html [R=301,L]
RewriteRule ^somepage\.html$ http://www.example.com/some-other-page.html [R=301,L]

If any pipe characters in code copied from this forum appear as broken pipes ("¦"), replace them with solid pipes ("|") before use; posting on this forum modifies the pipe characters.

Rule order is important, and is correct as shown here.

The first RewriteCond in each 'blocking' rule prevents an infinite redirection loop, based on the requested page's URL-path.

Note that I used 302 redirects for the page1.html and page2.html redirects. Actually, a 303-See Other might be more appropriate, but I haven't thought about it because I never use redirects for access control (see further discussion below).

Note that only requests for .html and .php pages (including any index pages at "/" in any directory) are redirected by these rules. This is because a browser will not show an HTML page if it is requesting an image or other object 'included' in an HTML page. So there is not much use trying to redirect such requests to an HTML page. You may modify the subpattern to add or remove 'page' filetypes as needed.

Now, backing up quite a bit from this to get a wider perspective, consider: If page1.html or page2.html are some kind of "Hah, Hah! We caught you!" pages, then I recommend against that. Doing such things throws down a challenge, and will simply motivate some people to try to crack your defenses. Failing that, they may start a DoS attack out of spite. When issuing 'error messages' to potentially-malicious malformed requests or to unwelcome visitors, it is best to provide only minimal information. In fact, if a malicious request is intentionally denied, it's sometimes better to pretend that there is a problem with your server; instead of saying "Access denied," you might be better off saying "Server Error." Keep in mind that information is power: Don't unnecessarily empower the bad guys.

Also, as g1smd stated above, an external redirect will 'expose' the new 'page1.html' and 'page2.html' URLs to these blocked visitors. Therefore, they will know that the URL changed, and that "you did something." And if these visitors are robots, they may not follow the redirect (remember that it is up to the client browser/robot whether it wants to follow a redirect). For this reason, you might consider using an internal rewrite, rather than an external redirect. For example, the second rule would change to

RewriteRule ^(.+\.(html|php))?$ /page2.html [L]

-- the protocol, domain, and [R=] flag are removed to make it a rewrite instead of a redirect. In this case, only one HTTP transaction is involved, and the server simply returns the content of the page2.html file in response to an unwelcome request for any URL. You can view this as 'content substitution' rather than as a redirect.
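
Putting the pieces together, a sketch of the whole second rule as an internal rewrite, with the anti-loop condition kept in place, might look like this. The IP addresses are placeholders from the 192.0.2.0/24 documentation range and are assumptions for illustration only.

```apache
RewriteEngine on
# Anti-loop check: do not rewrite requests that are already for page1.html or page2.html
RewriteCond $1 !^(page1|page2)\.html$
# Placeholder addresses (assumed for illustration), anchored to match one address each
RewriteCond %{REMOTE_ADDR} ^192\.0\.2\.10$ [OR]
RewriteCond %{REMOTE_ADDR} ^192\.0\.2\.11$
# Internal rewrite: local path only, no protocol, no domain, no [R] flag
RewriteRule ^(([^/]+/)*([^.]+\.(html|php))?)$ /page2.html [L]
```

The anti-loop condition still matters here: in .htaccess context the rewritten request for page2.html is processed by the same rules again, and without that condition it would be rewritten over and over.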

Keep in mind that HTTP Referer headers are not always present in HTTP requests. It is up to the client whether to send them, and they are optional -- not required by the HTTP protocol specification. Some ISPs' caching proxies and some "internet security" software will also block this header. What this means is that your referrer-based blocking will not be 100% effective, and that in cases where no Referer header is sent, visitors blocked by both methods will see page2.html instead of page1.html.

Jim

timothius

8:35 pm on Jun 20, 2009 (gmt 0)

Thank you so very much for your help, Jim. The code looks very complete. I'm looking it all up so I can understand what it means.

I tested the first part ("Redirect visitors referred by 'bad' sites") separately to see if it works. Although I replaced the broken pipes with solid pipes and changed the file names and domains appropriately, it doesn't redirect anywhere... which is rather curious. No idea what needs fixing...

One thing I don't understand is the "$1" after the first RewriteCond. What does that signify and could that be causing the problem? I just don't see what else could be flawed...

I've made sure to make the redirects very discreet so there wouldn't be much chance of the users noticing anything. However, the internal rewrite looks like a great option as well, which I'll test and look into once I get the original code you wrote implemented.

jdMorgan

8:53 pm on Jun 20, 2009 (gmt 0)

In mod_rewrite, variables $1 through $9 refer back to (they "back-reference") the matched contents of the first through ninth sub-pattern in the RewriteRule, respectively. %1 through %9 refer back to the matched contents of the first through ninth sub-pattern in the last-matched RewriteCond.

In this case, we are looking at the requested URL-path to be sure that we don't redirect previously-redirected requests for either page1.html or page2.html. This avoids an infinite redirection loop. Note that "!" means "NOT" in this context.

Be aware that within a mod_rewrite "routine," the RewriteRule pattern is evaluated first. The RewriteConds are not evaluated if the RewriteRule pattern does not match. One effect (and reason for doing it this way) is that RewriteConds can then 'see' the back-references created by RewriteRules, and RewriteRules and RewriteConds can use back-references created by a preceding matched RewriteCond. This is quite useful when using multiple RewriteConds and when building the RewriteRule substitution URL-path.
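
To make the evaluation order and both kinds of back-reference concrete, here is a small sketch. It is unrelated to the blocking rules above; the hostname example.com and the www-to-non-www redirect are assumptions chosen only to show $1 and %1 together.

```apache
RewriteEngine on
# Step 1: the RewriteRule pattern is tested first; whatever (.*) matches becomes $1
# Step 2: only if the pattern matched is the RewriteCond tested; (example\.com) becomes %1
# Step 3: the substitution is built from both back-references
RewriteCond %{HTTP_HOST} ^www\.(example\.com)$ [NC]
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]
```

A request for www.example.com/foo.html would set $1 to "foo.html" and %1 to "example.com", giving the target http://example.com/foo.html.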

You will have to completely flush your browser cache and then click on a link on one of the bad sites for this to work. You can use the Live HTTP Headers add-on for Firefox/Mozilla browsers to see the transactions between your browser and your server.

If you have not already done so, read the mod_rewrite documentation at apache.org. Yes, it is long and complicated, but be aware that one single typo or other small error in your code can take down your server -- if you are lucky. If you are not so lucky, it can slowly erode your search engine rankings over time, and perhaps put you out of business. Therefore "looking it all up so I can understand what it means" and reading the documentation are both very, very good ideas.

Jim

[edited by: jdMorgan at 8:55 pm (utc) on June 20, 2009]

timothius

9:13 pm on Jun 20, 2009 (gmt 0)

OK, so this is all the code I have in my .htaccess file at this point (I'm testing it block by block):

# Redirect visitors referred by 'bad' sites
RewriteCond $1 !^(not-found|not-found2)\.html$
RewriteCond %{HTTP_REFERER} ^http://(www\.)?subdomain\.example\.com [OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?example\.com [OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?example2\.com
RewriteRule ^(([^/]+/)*([^.]+\.(html|php))?)$ [mysite.com...] [R=302,L]
#

I have a link pointing to mysite.com (which has this .htaccess file) from example.com, but the .htaccess file appears to be doing nothing. I can't see anything obvious that would be making this happen...

Oh, and I have flushed my browser's cache, cookies, etc. completely a few times now. I've also tried using several different browsers... still no go. (And as said previously, I did repair the pipes.)

jdMorgan

9:19 pm on Jun 20, 2009 (gmt 0)

If this is *all* the code, then it must be preceded by either the second line below, or both the first and second lines -- the only way to find out is to test.

Options +FollowSymLinks -MultiViews
RewriteEngine on

It is best to test this with a very simple rule, such as

RewriteRule ^test-foo\.html$ http://www.google.com/ [R=302,L]

Request "test-foo.html" from your server, and you should land at Google.

Jim

[edited by: jdMorgan at 9:31 pm (utc) on June 20, 2009]

timothius

9:34 pm on Jun 20, 2009 (gmt 0)

Ok, well that was the problem. Thanks!

I put in "RewriteEngine on" at the top of the file. If that works by itself, do I still need the 1st line?

Also, do these directives just have to be present once in the .htaccess file or do I have to place them before every rewriterule? Was never sure about that...

jdMorgan

9:46 pm on Jun 20, 2009 (gmt 0)

Place the one or two directives required (as indicated by your testing) once before your list of rewriterules.
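
As a rough sketch of that layout (the rules themselves are only placeholders built from examples earlier in this thread):

```apache
# Setup directives: once, at the top (keep only the ones your testing showed are needed)
Options +FollowSymLinks -MultiViews
RewriteEngine on
#
# First rule: its conditions apply to this rule only
RewriteCond %{HTTP_REFERER} ^http://(www\.)?badsite1\.com
RewriteRule ^test-foo\.html$ http://www.example.com/page1.html [R=302,L]
#
# Second rule: RewriteEngine is not repeated here
RewriteRule ^somepage\.html$ http://www.example.com/some-other-page.html [R=301,L]
```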

Jim

timothius

10:50 pm on Jun 20, 2009 (gmt 0)

Ok, so I was wondering about what you said with internal rewriting...

RewriteRule ^(.+\.(html|php))?$ /not-found.html [L]

I replaced the external redirect rule with the code above, and when I followed a link from a 'bad domain', I got a 500 Error page. The address bar showed the destination that the link was pointing to, which is good but the error message was the usual 500 error message, and didn't contain the code that I had in my custom page.

I'm just not sure what you mean by 'content substitution'. I thought the linked-to URL was then supposed to show the error message from my not-found.html page... am I wrong in my thinking here or not?

jdMorgan

10:56 pm on Jun 20, 2009 (gmt 0)

Look at your server error log. It should tell you what the problem is.

Make sure the anti-loop RewriteCond is still present.

Your simplified pattern will not execute as fast as the one I posted, BTW.

Jim

timothius

11:26 pm on Jun 20, 2009 (gmt 0)

Hmmm... sorry to be a pain here, but my host actually doesn't allow access to Apache error logs for "technical reasons".

The pattern I just posted looks identical to the one you posted above. I thought I had just changed the filename. And the same RewriteCond is there. The only thing I changed was the RewriteRule.

You posted: RewriteRule ^(.+\.(html|php))?$ /page2.html [L]
I posted: RewriteRule ^(.+\.(html|php))?$ /not-found.html [L]

Is there something better I should use?

timothius

11:45 pm on Jun 20, 2009 (gmt 0)

Woot! I think the internal rewrite works now. I guess I had a commented line between the last RewriteCond and the RewriteRule. I guess that might do it?

Still, is the rewriterule directly above not the best thing to use?

jdMorgan

4:10 am on Jun 21, 2009 (gmt 0)

> my host actually doesn't allow access to Apache error logs for "technical reasons".
Then you need a better host -- for technical reasons... You need one that is technically competent to provide you with basic logging resources.

Try the more-specific pattern as in the redirect rule:

 RewriteRule ^(([^/]+/)*([^.]+\.(html|php))?)$ /not-found.html [L]

If you commented-out the last RewriteCond (the one without an [OR] flag), leaving one with an [OR] as the last RewriteCond, then that would cause a problem. But without access to the error log, you're kind of lost, and I cannot recommend using or developing mod_rewrite code without access to the error log.
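
For illustration only, here is how the condition list might look after one 'bad' domain is removed. The domain names are the placeholders used earlier in this thread, and the point is simply that the new last condition must not carry an [OR] flag:

```apache
RewriteCond $1 !^(not-found|not-found2)\.html$
RewriteCond %{HTTP_REFERER} ^http://(www\.)?badsite1\.com [OR]
# The former last condition was removed, so [OR] is dropped from the new last one:
RewriteCond %{HTTP_REFERER} ^http://(www\.)?badsite2\.co\.uk
RewriteRule ^(([^/]+/)*([^.]+\.(html|php))?)$ /not-found.html [L]
```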

Jim

timothius

4:19 am on Jun 21, 2009 (gmt 0)

Well, it gives me access to quite a few different kinds of logs, just not the error logs.

After a lot of testing, it looks like I'm finally in business. Thank you so much for your help, Jim. I couldn't have done it (right) without you!

jdMorgan

12:25 pm on Jun 21, 2009 (gmt 0)

I had a keyboard once, with only a slight problem. It gave me access to quite a few different keys, just not the backspace key.

I summarily pitched it in the bin.

There were simply too many perfectly-good keyboards available at reasonable cost to bother with a defective one.

Without access to error logs, you will pay over and over and over again -- in terms of time wasted "debugging code by staring at it" and because of "unfindable" bugs. For anything except a pure static-HTML site with no scripting, no SSI, and no mod_rewrite, error logs are NOT optional.

I've got error logs on hosts that charge me less than 35 cents a day for hosting, so it is not a matter of cost.

Jim

timothius

7:13 pm on Jun 22, 2009 (gmt 0)

Thank you for the advice, Jim. I have 2 hosting accounts: one doesn't allow access to the error logs, and the other does, but only with a more expensive hosting plan. With the traffic my sites currently get, I'm still fine with basic shared hosting accounts. However, I will probably be upgrading sometime in the future.

I just wonder if I'd actually be able to decipher what the error logs are telling me if I did have access.

g1smd

8:18 pm on Jun 22, 2009 (gmt 0)

They can be cryptic, but the chances are that many people will have had that exact error message some time in the past decade or more, and literally thousands of them will have posted the message into some forum or other and had the error explained and their code debugged.

timothius

8:23 pm on Jun 22, 2009 (gmt 0)

Right... that's true.

Till then, however, I'll keep to the status quo. I'll just be really careful when editing my .htaccess file in the future. :) Thanks for all the help, guys!