htaccess and rewrites; forbid all files, except specified ones

         

whitenoise

1:19 pm on Mar 25, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



Hi everyone, I wonder if you could help me out with what might seem a basic htaccess problem, please?

In my htaccess file I have a section which bans certain IP addresses after they go somewhere on the website they shouldn't. So far the code is as below:

RewriteCond %{REMOTE_HOST} ^xx\.xxx\.xxx\.x$ [OR]
RewriteCond %{REMOTE_HOST} ^xx\.xx\.xx\.xxx$
RewriteRule .* - [F,L]


While this shows my custom error 403 page, it just shows an unstyled page, since, of course, the stylesheet, logo, background image etc are blocked by the rule.

In my attempt to ban all files, apart from a few, I tried the following lines placed at the top of the others:

RewriteCond %{REQUEST_URI} ^/images/logo\.jpg$ [L]


That didn't work, so I removed it and tried:

RewriteRule ^/images/logo\.jpg$ [L]


But neither of these allows logo.jpg to show on the error page. Am I doing something wrong? Thanks.

lucy24

6:46 pm on Mar 25, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Short answer: omit the leading / slash. You also forgot the empty target - but I'm guessing that was just an artifact of posting, or you would probably have got an Apache error.

Long answer:
RewriteRule ^/

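This formulation will never work in htaccess, because of the leading / slash. In per-directory context, mod_rewrite strips the directory prefix (including that slash) from the path before the pattern is tested, so the pattern never sees a path beginning with /.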
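
To illustrate the difference, here is a minimal sketch assuming the .htaccess file sits in the document root. The logo path is the one from the question; the rest is illustrative rather than a tested configuration.

```apache
RewriteEngine On

# Per-directory (.htaccess) context: the pattern is matched against the path
# with the directory prefix removed, so it never starts with a slash.
RewriteRule ^images/logo\.jpg$ - [L]

# %{REQUEST_URI} in a RewriteCond holds the full URL-path and does start
# with a slash, so the slash belongs here.
RewriteCond %{REQUEST_URI} ^/images/logo\.jpg$
RewriteRule ^ - [L]
```

Either form alone should be enough; both are shown only for comparison.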

What I personally recommend is constraining most access-control RewriteRules to requests for pages. Malign robots asking for images are vanishingly rare, especially if they haven't been able to crawl the page and learn the image URLs in the first place. This gives your server a break, since it doesn't have to evaluate conditions over and over again on every single request. So something on this pattern, replacing "html" with whatever extension(s) you actually use:
RewriteCond blahblah
RewriteRule (^|/|\.html)$ - [F]
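
Filled in with a concrete condition, the pattern above might look like the sketch below. The address 203.0.113.7 is a placeholder from the documentation range, not one from this thread, and REMOTE_ADDR is used on the assumption that the match is against a literal IP.

```apache
RewriteEngine On

# Hypothetical blocked address (documentation range).
RewriteCond %{REMOTE_ADDR} ^203\.0\.113\.7$
# Matches the site root (empty path), any URL ending in a slash,
# and any URL ending in .html; images and stylesheets fall through.
RewriteRule (^|/|\.html)$ - [F]
```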

How do your unwanted visitors even see your custom 403 page? I don't see a rule exempting it. (Every mod that issues a 403 needs a separate hole poked for it.)

That list of REMOTE_HOST conditions is going to get exceedingly long. Why don't you use the more common mod_authz_host approach (Allow/Deny directives)?
Deny from aa.bb
Deny from aa.bb.cc.0/19
Deny from aa.bb.0.0/14

et cetera.
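
As a fuller sketch of that approach, assuming Apache 2.2 syntax (on 2.4 these directives need mod_access_compat) and placeholder ranges taken from the documentation blocks:

```apache
# Allow everyone by default, then subtract the listed ranges.
Order Allow,Deny
Allow from all
# Partial address: every IP beginning with 192.0.2.
Deny from 192.0.2
# CIDR notation: 198.51.100.0 through 198.51.100.255.
Deny from 198.51.100.0/24
```

On Apache 2.4 without mod_access_compat, the rough equivalent would be `Require not ip 192.0.2` inside a `<RequireAll>` block alongside `Require all granted`.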

"after going somewhere on the website they shouldn't"

I don't see anything in the rule about "going somewhere they shouldn't". It seems to be an absolutely universal rule.

Incidentally, the [F] flag carries an implied [L]. The form [F,L] will do no harm; it just isn't needed.

whitenoise

2:04 pm on Mar 27, 2015 (gmt 0)




Thanks very much for your answer and the time taken to reply Lucy. I'm still not 100% sure I understand your answer though, sorry to be a pain.

The situation is that on each page of my website I have a tiny hidden image/link which, if clicked, runs a script that records the IP and then redirects the visitor to the forbidden error page. After that they cannot see anything on the website apart from this page. The only way the hidden image/link will be clicked is if someone is trying to screen-scrape or download the whole site, and this is designed to stop that.

I had been using a method of

SetEnvIf Remote_Addr ^xx\.xxx\.xxx\.xx$ getout
SetEnvIf Request_URI "^(/error/error403\.php|/styles/main\.css|/robots\.txt)$" allowsome
order deny,allow
deny from env=getout
allow from env=allowsome


This seemed to work well: it denied access to every page except the error page, while still allowing things like the stylesheet, logo, backgrounds etc.

The problem is that my host is now using LiteSpeed instead of Apache, and the above no longer works. Consequently I have had to change to the Rewrite approach, but I am having difficulty.

Hopefully this clarifies what I am trying to do, and any further help would be much appreciated :)

whitenoise

3:29 pm on Mar 27, 2015 (gmt 0)




As an update, I think I've managed to sort it. After a bit of playing around, I've come up with the following

# 27-03-2015, 10:13
RewriteCond %{REMOTE_HOST} ^xx\.xxx\.xxx\.x$ [OR]
RewriteCond %{REMOTE_HOST} ^xx\.xx\.xxx\.xxx$
RewriteCond %{REQUEST_URI} !^(/error/error403\.html|/styles/main\.css|/images/background\.jpg|/images/error/403\.jpg|/images/logo\.jpg|/favicon\.ico|/robots\.txt)$
RewriteRule .* - [F]


This might not be the most efficient, but it seems to work as expected. I might need to manually go through this file once in a while to try and combine IP ranges/blocks. Alternatively if the request came from aa.bb.ccc.ddd, I could just block people from aa.bb.ccc.
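
Blocking a whole aa.bb.ccc. block only needs the pattern left open at the end. A sketch, using placeholder addresses from the documentation ranges and REMOTE_ADDR (an assumption; the rules above use REMOTE_HOST):

```apache
# Single address: anchored at both ends.
RewriteCond %{REMOTE_ADDR} ^203\.0\.113\.7$ [OR]
# Whole block: no closing anchor, so any final octet matches.
RewriteCond %{REMOTE_ADDR} ^198\.51\.100\.
RewriteRule ^ - [F]
```

A prefix pattern like this only covers ranges that fall on an octet boundary; anything finer needs a more detailed expression or the CIDR form of Deny from.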

Could you see any problems in using the above code? Thanks :)

lucy24

8:16 pm on Mar 27, 2015 (gmt 0)




"my host is now using Litespeed instead of Apache, and the above no longer works. Consequently I have had to change this to the Rewrite approach"

That makes no sense. SetEnvIf and RewriteRule are both Apache. If you can use one, why not the other? On most installations, mod_setenvif runs before mod_rewrite, so code accordingly.
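
If both modules are available, the two approaches can even be combined, since a variable set by SetEnvIf is visible to a later RewriteCond. A sketch under that assumption, reusing the getout name from the earlier post with a placeholder address; whether LiteSpeed honours it is untested here:

```apache
# SetEnvIf with no explicit value sets the variable to 1.
SetEnvIf Remote_Addr ^203\.0\.113\.7$ getout

RewriteEngine On
# Fires only when getout was set for this request.
RewriteCond %{ENV:getout} =1
RewriteRule (^|/|\.html)$ - [F]
```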

"Could you see any problems in using the above code?"

The whole rule seems backward. Why don't you make a single conditionless rule and place it before all other RewriteRules, like this? Here I've split it into categories for readability.

RewriteRule ^error/error403\.html$ - [L]
RewriteRule ^styles/main\.css$ - [L]
RewriteRule ^(favicon\.ico|robots\.txt)$ - [L]
RewriteRule ^images/(background|error/403|logo)\.jpg$ - [L]

You may want to give universal access to other stuff in your /error/ directory, in which case you'd scratch some of the above and go to
RewriteRule ^error/ - [L]

without closing anchor. But in any case you should have a robots.txt exemption right at the beginning. If someone disobeys robots.txt you want it to be because they're nasty Ukrainian scrapers-- or because they just didn't ask for it-- not because you wouldn't let them see it.

I like to keep a universal exemption for .css and .ico because it helps identify humans who were wrongly blocked.
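
Put together, the ordering described above might look like the following sketch: exemptions first, the block rule last. The IP address is a placeholder, and the blanket .css/.ico exemption is one possible reading of the suggestion.

```apache
RewriteEngine On

# Exemptions: "-" means no substitution, [L] stops further rules in this pass.
RewriteRule ^robots\.txt$ - [L]
RewriteRule ^error/ - [L]
RewriteRule \.(css|ico)$ - [L]
RewriteRule ^images/(background|error/403|logo)\.jpg$ - [L]

# Block rule: reached only by requests not exempted above.
RewriteCond %{REMOTE_ADDR} ^203\.0\.113\.7$
RewriteRule ^ - [F]
```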

"but allows things like the stylesheet, the error page, logo, backgrounds etc."

You can also achieve this using a Files or FilesMatch envelope with an "Allow from all".
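
A sketch of that envelope, assuming Apache 2.2-style Allow/Deny directives and a placeholder address. It pairs with the SetEnvIf/Deny approach; it would not override a 403 issued by a RewriteRule.

```apache
# Outer rule: block the flagged visitor everywhere.
SetEnvIf Remote_Addr ^203\.0\.113\.7$ getout
Order Allow,Deny
Allow from all
Deny from env=getout

# Inner override: these file names stay reachable for everyone.
# FilesMatch tests the file name only, not the directory path.
<FilesMatch "^(error403\.html|main\.css|logo\.jpg|robots\.txt|favicon\.ico)$">
    Order Allow,Deny
    Allow from all
</FilesMatch>
```

Because the match is on the file name alone, a logo.jpg in any directory would be exempted, which may be broader than intended.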

whitenoise

10:15 am on Mar 28, 2015 (gmt 0)




Many thanks for your help once again Lucy, you are a star :)