Forum Moderators: phranque

Help Phrasing htaccess

Blocking hotlinkers and site downloaders

         

DonX

10:59 am on Oct 9, 2004 (gmt 0)

10+ Year Member



I am currently updating my website to be based on PHP+MySQL to help increase traffic, but at the same time I would like to severely reduce the number of users who link to my files or download my whole website each month so I might reduce bandwidth costs there. The trouble is I'm not quite sure how I would write the statements I'm looking for within the htaccess file. I believe I would need to know if ANDs are possible and how I would use them.

At present my htaccess file resembles:

RewriteEngine on
SetEnvIfNoCase Request_URI "^(.*)/$" valid-link=1
SetEnvIfNoCase Referer "(.*)mydomain\.com(.*)" valid-link=1
SetEnvIfNoCase Referer "(.*)mydomain\.co\.uk(.*)" valid-link=1
SetEnvIfNoCase REQUEST_URI "\.php$" valid-link=1
SetEnvIfNoCase REQUEST_URI "/public/" valid-link=1
<FilesMatch "\.*$">
order allow,deny
allow from env=valid-link
</FilesMatch>

RewriteCond %{HTTP_USER_AGENT} leech1 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} leech2 [NC]
RewriteRule /* http://www.mydomain.com/leech.html [L,R]

What I would actually like to have is:

Allow links to any php pages from any referrer OR
Allow links to any index folder* from any referrer OR
Allow links to all other files from my domain/my php pages only OR
Let the public folder be completely visible, even for hot-linkers
THEN
Block website downloaders

*To allow cases where people type www.mydomain.com/bob or www.mydomain.com/bob/ instead of www.mydomain.com/bob/index.php

If anyone could give me any advice as to how I might update my htaccess statements to this effect, or even point out any errors in what I have written so far, I would greatly appreciate it.

jdMorgan

8:45 pm on Oct 9, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



DonX,

Welcome to WebmasterWorld!

Your code (and life) would probably be much simpler if you would choose either mod_access or mod_rewrite to accomplish what you need. As it is, you have mixed the two methods, which unnecessarily complicates things.

Please post specific questions, rather than asking for a code rewrite. We can help you get your code working, but can't write it for you -- See our charter [webmasterworld.com].

As to your question about ANDing conditions, this is easily done by omitting [OR] in RewriteCond directives -- the default multiple-RewriteCond behavior is AND.

Jim

DonX

10:20 pm on Oct 9, 2004 (gmt 0)

10+ Year Member



I am sorry, I didn't mean for it to sound like I was asking for a code rewrite. I was just attempting to give as much information as I could to describe what I was trying to do.

I am looking to learn how I would put AND/OR SetEnvIf statements together, eg. What I was thinking was

SetEnvIf ... "..." valid-link
SetEnvIf ... "..."
SetEnvIf ... "..." valid-link
SetEnvIf ... "..." valid-link
<FilesMatch "\.*$">
order allow,deny
allow from env=valid-link
</FilesMatch>

to give (line1 OR (line2 AND line3) OR line4) but that doesn't sound right for writing an AND and I can't find an example of ANDing online.

I hadn't thought of using SetEnvIfs for the second part, but I think I'll wait until I can find a full list of commands before I try that.

jdMorgan

11:31 pm on Oct 9, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



SetEnvIfs don't lend themselves to ANDing - I'd suggest you replace all of that with mod_rewrite RewriteCond directives, and AND those.

You can use SetEnvIfs if you must, by employing negative logic, but mod_rewrite is more straightforward.
NOT ( (NOT A) OR (NOT B) ) is equivalent to A AND B
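As a sketch of that negative-logic approach (using example.com as a placeholder domain and a hypothetical block-me variable), you can set a flag on every request, clear it when condition A holds, and re-set it when condition B fails:

```apache
# A AND B, written as NOT( (NOT A) OR (NOT B) ):
# start by assuming every request is blocked...
SetEnvIf Request_URI ".*" block-me
# ...clear the flag when the referer matches our domain (condition A)...
SetEnvIfNoCase Referer "example\.com" !block-me
# ...then re-set it if the user-agent is a downloader (NOT B)
SetEnvIfNoCase User-Agent "wget" block-me
Order Allow,Deny
Allow from all
Deny from env=block-me
```

The "!" prefix on the variable name unsets it, which is what makes the NOT possible here.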

For an example using mod_rewrite, the following will return a 403-Forbidden response to the WGET user-agent referred from example.com, and only if a subdirectory is requested:


RewriteCond %{HTTP_REFERER} ^http://example.com
RewriteCond %{HTTP_USER_AGENT} ^WGET
RewriteRule ^.+/ - [F]

In the absence of an [OR] flag on the first RewriteCond, it is ANDed with the second one, thus requiring both to match before the RewriteRule can be invoked. The RewriteRule itself requires the requested resource URL-path to contain one or more characters followed by a slash, thus identifying a subdirectory request.

The references cited in our charter will lead you to the Apache documentation, where you will find documentation for all Apache directives.

Jim

DonX

1:48 pm on Oct 10, 2004 (gmt 0)

10+ Year Member



Thank you for that information. I have a couple of questions about writing RewriteConds.

Can you have several sets of RewriteConds followed by their own RewriteRule?

If you were to have a statement such as:

RewriteCond %{HTTP_REFERER} ^http://example1.com [OR]
RewriteCond %{HTTP_REFERER} ^http://example2.com
RewriteCond %{HTTP_USER_AGENT} ^WGET [OR]
RewriteCond %{HTTP_REFERER} ^http://example3.com
RewriteRule ^.+/ - [F]

Would that give you (line1 OR (line2 AND line3) OR line4)?
If that is the case, then to have (line1 AND (line2 OR line3 OR line4)) would that have to be written as several separate sets of statements? As you can tell, I'm trying to get a feel for how the logic works :)

jdMorgan

3:37 pm on Oct 10, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



RewriteCond %{HTTP_REFERER} ^http://example1.com [OR]
RewriteCond %{HTTP_REFERER} ^http://example2.com
RewriteCond %{HTTP_USER_AGENT} ^WGET [OR]
RewriteCond %{HTTP_REFERER} ^http://example3.com
RewriteRule ^.+/ - [F]

Would that give you (line1 OR (line2 AND line3) OR line4)?

No, I believe it would give you ((line1 OR line2) AND (line3 OR line4)). The [OR] flag is referred to as a "local OR" and its scope is limited to the line it's on and the line that follows.

Frankly, I avoid such constructs, and I'd recommend testing to find out; for purposes of clarity and ease of maintenance, it's often better to avoid large combinatorial-logic fur-balls by breaking the code up into multiple RewriteRules. Until your .htaccess file exceeds 15kB or so, you won't notice any performance issues (unless you get 500,000+ hits per day).
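Going by the "local OR" scoping described above, your second grouping, (line1 AND (line2 OR line3 OR line4)), shouldn't need splitting at all: a condition without [OR] is ANDed with the OR-group that follows it. Here's a sketch (using the same placeholder domains and user-agent as before) that's worth testing on your own server:

```apache
# (A) AND (B OR C OR D):
# The first condition carries no [OR] flag, so it is ANDed with
# the three-way OR-group below it.
RewriteCond %{HTTP_REFERER} ^http://example1\.com
RewriteCond %{HTTP_USER_AGENT} ^WGET [OR]
RewriteCond %{HTTP_REFERER} ^http://example2\.com [OR]
RewriteCond %{HTTP_REFERER} ^http://example3\.com
RewriteRule ^.+/ - [F]
```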

Note that you can also do a "local AND." This is tricky, and you must keep in mind that we are using string comparison here and account for anchoring issues, but this works, and would give you (line1 OR (line2a AND line2b) OR line3):


RewriteCond %{HTTP_REFERER} ^http://example1.com [OR]
RewriteCond %{HTTP_REFERER}<->%{HTTP_USER_AGENT} ^http://example2\.com[^<]+<->WGET [OR]
RewriteCond %{HTTP_REFERER} ^http://example3.com
RewriteRule ^.+/ - [F]

Note that the "<->" is a completely arbitrary string, serving only as a "marker" to delimit the two pieces of the combined pattern; as such, its characters should be unlikely to appear in either value. I used "<->" because it implies concatenation, but it means nothing special to mod_rewrite or to the regular-expression parser. It simply denotes the end of one sub-pattern and the beginning of the next, and so serves as a sort of "start anchor" for the second sub-pattern.

This is not a well-known technique, so I'd suggest you document it well if someone might come along after you in maintaining the site.

Jim

DonX

5:19 pm on Oct 12, 2004 (gmt 0)

10+ Year Member



Perfect, thank you, that makes sense. You have been ever so helpful.

As hopefully my last question: what has confused me is that if I write the following, it stops hotlinking, but if I then use a website downloader the images and zip files are downloaded anyway:

RewriteCond %{HTTP_REFERER} !mysite\.com [NC]
RewriteCond %{REQUEST_URI} !\.php [NC]
RewriteRule ^.*$ http://www.mysite.com/blocked.html [L,R]

I had thought that the referer wouldn't be my URL and would instead be the program's name or a forged browser header (causing it to be blocked), but I am now guessing that is not the case? Unless some form of caching of the called webpages is helping the downloader work.

jdMorgan

6:09 pm on Oct 12, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



A site download utility usually won't send any referrer, so you're stuck with the problem that if you block blank referrers, then you'll also block anyone behind a caching proxy, such as corporate users or any AOL user. Blocking blank referrers will also lock out anyone who types your URL directly into their browser address bar.
So, in this case, you're much better off blocking the site downloaders by user-agent name rather than by blank referrer. Use "RewriteCond %{HTTP_USER_AGENT}" to check the UA and enforce UA-based access control.
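A minimal sketch of that UA-based check follows; the agent names here are purely illustrative, so substitute the real strings you find in your own logs:

```apache
# Return 403-Forbidden to known download utilities.
# The user-agent tokens below are examples only -- pull the
# actual strings from your raw access logs.
RewriteCond %{HTTP_USER_AGENT} (wget|httrack|webzip|teleport) [NC]
RewriteRule .* - [F]
```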

Take a look at your raw server logs and review legitimate and unwelcome accesses; reviewing the referrer and user-agent strings in the log will make all of this much clearer.

One more issue: You have specified an external redirect in your RewriteRule. Understand that this requires handshaking with -- and the cooperation of -- the requesting user-agent. I'd suggest you use an internal rewrite instead.
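Using your mysite.com example from above, an internal-rewrite version might look like this sketch; the extra first condition keeps the block page itself from being rewritten in a loop:

```apache
# Internal rewrite: the client never sees blocked.html's URL and
# no cooperation from the requesting user-agent is required.
# Exempt the block page itself to avoid a rewrite loop:
RewriteCond %{REQUEST_URI} !blocked\.html
RewriteCond %{HTTP_REFERER} !mysite\.com [NC]
RewriteCond %{REQUEST_URI} !\.php [NC]
RewriteRule .* /blocked.html [L]
```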

Jim

DonX

3:06 pm on Oct 13, 2004 (gmt 0)

10+ Year Member



Wouldn't blocking blank referrers be fine for AOL users etc. as long as full access to all PHP webpages and directories was granted?
Although blocking blank referrers still seems to allow these programs to download my images and zip files?

Ah, I hadn't realised the redirect was external. That is useful.

[edited by: jdMorgan at 7:51 pm (utc) on Oct. 13, 2004]
[edit reason] speling [/edit]

jdMorgan

8:07 pm on Oct 13, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



All I can say is, if you want to block blank referrers, try it. However, you will need a 24-hour help desk to answer the phone and explain to your visitors that they can't access your site from AOL or from behind corporate caching proxies without seeing broken images.

One source of confusion that may be applicable here: HTTP is a "stateless" protocol. A single server request has no "memory" of any request that has gone before, and cannot have any effect on those that come after, unless some other method (such as cookies) is used to pass state information back to the browser to be included with the next request. Each request to your server, whether for a page, an image on that page, a client-side script, or a CSS style sheet, is a completely separate request. So, if you block blank referrers, and the request comes from an ISP that caches images (such as AOL), then the request for the page will work, but the requests for the images on that page won't -- because AOL's cache will send a blank referrer, and your site will then block AOL's image requests. I cite AOL as a well-known example only; many ISPs and many corporations use the same kind of caching proxy to minimize their bandwidth utilization.

The bottom line is that blocking by referrer works only well enough to discourage most hot-linkers, and cannot be made 100% reliable without causing problems for innocent, legitimate users. There are other methods, such as blocking IP addresses, blocking by behaviour (using scripts), and blocking by user-agent, that can be used to supplement referrer blocking. Combining all of these methods provides fairly good access control against 'casual' hot-linkers.

If you need 100% access control, then you'll need to use cookies, or cookies and sessions, or password-protection.
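For the password-protection route, a minimal mod_auth sketch would be something like the following; the AuthUserFile path is a placeholder you'd replace with the real location of a password file created with the htpasswd utility:

```apache
# Require a valid username/password for everything in this directory.
# The .htpasswd path below is a placeholder, not a real location.
AuthType Basic
AuthName "Protected downloads"
AuthUserFile /full/path/to/.htpasswd
Require valid-user
```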

So, I strongly suggest that you don't try to block blank referrers - It simply causes too many problems for legitimate users.

Jim