The preamble
-----------------------
This is the code that I'm currently using for anti-leeching purposes, but there are a couple of improvements I'd like to make for different scenarios.
In the first scenario, where someone is leeching bandwidth (or infringing copyright) by embedding my images in their site, I'd like to use REWRITE RULE 1, so that their site displays an antileeching.jpg containing an appropriate alert/warning message.
In the second scenario, where someone uses a hypertext link to one of my images (so they aren't actually displaying the image on their site, but rather linking directly to an image on my site), I think it would be better to redirect the traffic to my homepage, using something like REWRITE RULE 2.
This sounds good in theory, but does anyone have ideas on how I'd go about writing the conditions to handle this?
RewriteEngine On
RewriteCond %{HTTP_REFERER} !^$
Options +FollowSymlinks
RewriteCond %{HTTP_REFERER} !^http://(www\.)?mydomain.com(/)?.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://(www\.)?myfriends.org(/)?.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://(www\.)?mywork.com(/)?.*$ [NC]
# REWRITE RULE 1
RewriteRule .*\.(gif|jpg|jpeg|png|swf)$ [mydomain.com...] [R,NC]
# REWRITE RULE 2
RewriteRule .*\.(gif|jpg|jpeg|png|swf)$ [mydomain.com...] [R,NC]
Welcome to WebmasterWorld!
There's no foolproof way to tell the difference between a page that embeds your images and a direct link to one of your images. This is because the HTTP Referer header (HTTP_REFERER in mod_rewrite) is notoriously unreliable. A few searches on WebmasterWorld for 'hotlinking' will turn up a lot more detail on why this is so.
In addition, you cannot redirect a request for an image file to an HTML page: when the image is embedded with <img src=...>, the browser expects image data and can't handle an HTML page in its place.
Looking at your code, the first RewriteCond is misplaced, and should either be moved into the rule-set or commented out. Also, you may want to consider using an internal rewrite, rather than a redirect -- simply substitute your hotlink image for the requested image inside your server. This method does not require the cooperation of the client, and so keeps them unaware of the image substitution.
Changing that, and removing several unnecessary leading and trailing ".*" sub-patterns, the code looks like this:
Options +FollowSymlinks
RewriteEngine On
RewriteCond %{HTTP_REFERER} .
RewriteCond %{HTTP_REFERER} !^http://(www\.)?mydomain\.com [NC]
RewriteCond %{HTTP_REFERER} !^http://(www\.)?myfriends\.org [NC]
RewriteCond %{HTTP_REFERER} !^http://(www\.)?mywork\.com [NC]
RewriteRule \.(gif|jpg|jpeg|png|swf)$ /img/antileech.gif [NC]
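To make the logic of the rule set above concrete, here is a minimal Python model of it. This is not Apache: it assumes mod_rewrite's usual order (test the RewriteRule pattern first, then the RewriteConds, which are ANDed together) and uses Python's `re` module, which treats these simple patterns the same way. The function name `served_path` and the hotlinker URL are hypothetical.

```python
import re

# (pattern, negated) pairs, in the same order as the RewriteCond lines above.
# All conditions must hold for the rule to fire (mod_rewrite's implicit AND).
CONDITIONS = [
    (r".", False),                              # referrer is not blank
    (r"^http://(www\.)?mydomain\.com", True),   # True = negated with "!"
    (r"^http://(www\.)?myfriends\.org", True),
    (r"^http://(www\.)?mywork\.com", True),
]
RULE_PATTERN = r"\.(gif|jpg|jpeg|png|swf)$"
SUBSTITUTE = "/img/antileech.gif"


def served_path(path: str, referer: str) -> str:
    """Return the file the server would actually send for this request (model only)."""
    # The RewriteRule pattern is tested first; [NC] makes matching case-insensitive.
    if not re.search(RULE_PATTERN, path, re.IGNORECASE):
        return path
    for pattern, negated in CONDITIONS:
        matched = re.search(pattern, referer, re.IGNORECASE) is not None
        if matched == negated:  # this condition does not hold, so the rule is skipped
            return path
    # Internal rewrite: the browser keeps the same URL, but a different file is served.
    return SUBSTITUTE


print(served_path("photos/cat.jpg", ""))                               # photos/cat.jpg
print(served_path("photos/cat.jpg", "http://www.mydomain.com/a"))      # photos/cat.jpg
print(served_path("photos/cat.jpg", "http://hotlinker.example.net/"))  # /img/antileech.gif
```

A blank referrer falls straight through because the first condition fails; that is deliberate, since many proxies and firewalls strip the Referer header.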
Jim
Thanks for the reply.
I suspected that might be the case. Thanks also for picking up those mistakes. I originally used a code generator, but it was my fault for mixing up the first few lines.
<snip>
The 2nd RewriteRule does actually work; I have tested it. If someone links directly to an image, it redirects them to an HTML or PHP page. That's a good option because you can redirect to your homepage, which is a fairly seamless, behind-the-scenes approach.
The only downside to the 2nd RewriteRule is that if they embed the image, it produces a broken-image icon, unlike the 1st method, which does the substitution.
Regarding the trailing ".*" sub-patterns, I'm cautiously wondering why the generator added them. Could they be there for a reason that we've overlooked?
Many thanks again for your help.
riki
[edited by: jdMorgan at 1:24 am (utc) on Feb. 2, 2005]
[edit reason] No URLs or sigs, please. See TOS. [/edit]
The 2nd RewriteRule does actually work; I have tested it. If someone links directly to an image, it redirects them to an HTML or PHP page. That's a good option because you can redirect to your homepage, which is a fairly seamless, behind-the-scenes approach.
What happens if an image search engine picks up that link and follows it? It makes a mess. And if the image is hotlinked, the browser can't handle the redirect properly: browsers can't display an HTML page returned for an <img src=...> link.
The only downside to the 2nd RewriteRule is that if they embed the image, it produces a broken-image icon, unlike the 1st method, which does the substitution.
I don't bother spending bandwidth on other sites; that's their problem. I prefer to serve a simple, short 403-Forbidden response and worry about more important things.
Regarding the trailing ".*" sub-patterns, I'm cautiously wondering why the generator added them. Could they be there for a reason that we've overlooked?
No. It is there because the generator (or its author) took the easy route instead of handling the special case.
Avoid ".*" whenever possible. It is the greediest and most ambiguous pattern, and therefore the least efficient to process. In an unanchored pattern, a leading "^.*" or ".*" and a trailing ".*$" or ".*" don't change which strings match; they only waste space and CPU time.
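A quick way to check that claim: the Python sketch below compares the generator's padded patterns with the trimmed ones on a few sample strings, using `re.search` as a stand-in for mod_rewrite's unanchored matching. The dots in the referrer pair are escaped in both versions so that only the ".*" padding differs. The samples are made up, and this only shows that the match results agree; it says nothing about speed.

```python
import re

# (padded, trimmed) pattern pairs: the generator's version vs. the cleaned-up one.
PAIRS = [
    (r".*\.(gif|jpg|jpeg|png|swf)$", r"\.(gif|jpg|jpeg|png|swf)$"),
    (r"^http://(www\.)?mydomain\.com(/)?.*$", r"^http://(www\.)?mydomain\.com"),
]
# Hypothetical request paths and referrers.
SAMPLES = [
    "img/logo.gif", "a/b/photo.jpg", "movie.swf", "page.html", "x.gif.txt",
    "http://mydomain.com", "http://www.mydomain.com/dir/page.html",
    "http://other.example.net/",
]


def same_matches(padded: str, trimmed: str, samples: list[str]) -> bool:
    """True if both patterns accept exactly the same samples under unanchored search."""
    return all(
        (re.search(padded, s) is not None) == (re.search(trimmed, s) is not None)
        for s in samples
    )


for padded, trimmed in PAIRS:
    print(trimmed, same_matches(padded, trimmed, SAMPLES))  # True for both pairs
```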
You can use a generator to get started, but don't count on automation for quirk-free, efficient code.
Jim
If one version blocks image requests from your own URL, that at least proves mod_rewrite is working on your server. It also hints that the version blocking your own referrer is blocking blank-referrer requests (which is exactly what happens when you do that), whereas the other version allows blank-referrer requests (which you must do to avoid such problems). The second approach obviously has a hole in it, but it is the best you can do, because the Referer header is notoriously unreliable: many ISP caches (e.g. AOL) strip it, and many PC security packages, such as Norton Internet Security, block it. So blank referrers must be allowed.
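The difference between the two versions can be sketched in Python. This is a simplified model, assuming a hypothetical site at example.com and a single own-site referrer test: `strict_blocks` stands for a rule set without the `RewriteCond %{HTTP_REFERER} .` line, and `lenient_blocks` for one with it.

```python
import re

OWN_SITE = r"^http://(www\.)?example\.com"  # hypothetical: your own domain


def strict_blocks(referer: str) -> bool:
    """Rule set WITHOUT "RewriteCond %{HTTP_REFERER} .": blank referrers are blocked too."""
    return re.search(OWN_SITE, referer, re.IGNORECASE) is None


def lenient_blocks(referer: str) -> bool:
    """Rule set WITH "RewriteCond %{HTTP_REFERER} .": blank referrers are let through."""
    if re.search(r".", referer) is None:  # blank referrer: the first condition fails
        return False
    return re.search(OWN_SITE, referer, re.IGNORECASE) is None


cases = {
    "own page, referrer intact": "http://www.example.com/gallery.html",
    "own page, referrer stripped by a cache or firewall": "",
    "hotlinking page": "http://hotlinker.example.net/",
}
for label, ref in cases.items():
    print(f"{label}: strict={strict_blocks(ref)} lenient={lenient_blocks(ref)}")
```

A stripped referrer looks the same whether the request came from your own page or from a hotlinking page, so the lenient version has to let both through; that is the hole described above.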
If you allow blank referrers, then some hotlinked requests will still succeed, but others won't. The webmaster of the other site will probably get plenty of complaints about the broken image, even though it won't look broken to every visitor. I like to think this might just help drive him crazy...
You can also make this method more effective by controlling your image caching policy. If you don't set a caching policy on your files, ISP caches and browsers will use their defaults. This may leave copies of your images sitting in some ISP's cache for a long time, making it appear that your code does not work if you test through that ISP. Expire your images sooner to avoid old cached copies staying accessible for a long time; expire them later to reduce server load. It's a balance.
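On Apache, this policy is usually set with mod_expires (for example, its ExpiresByType directive) or mod_headers. To show what such settings ultimately send, here is a small Python sketch that builds the two relevant response headers for a given lifetime; the function name and the one-hour lifetime are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def image_cache_headers(now: datetime, max_age_seconds: int) -> dict[str, str]:
    """Headers telling browsers and ISP caches how long they may keep an image.

    `now` must be a timezone-aware UTC datetime, because HTTP dates are given in GMT.
    """
    expires = now + timedelta(seconds=max_age_seconds)
    return {
        "Cache-Control": f"max-age={max_age_seconds}",     # understood by HTTP/1.1 caches
        "Expires": format_datetime(expires, usegmt=True),  # for older HTTP/1.0 caches
    }


now = datetime(2005, 2, 7, 6, 52, tzinfo=timezone.utc)
print(image_cache_headers(now, 3600))
# {'Cache-Control': 'max-age=3600', 'Expires': 'Mon, 07 Feb 2005 07:52:00 GMT'}
```

A shorter max-age makes a new hotlink rule take effect sooner in caches; a longer one saves requests to your server.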
This points to another factor: to test access-control code, you must flush your browser cache before testing each change. If your browser has a copy of the image in its local cache, the image won't be requested from the server, so your server-side access-control code can have no effect. So, flush that cache!
As I stated above, using .htaccess to block hotlinking based on the HTTP referrer is a convenient, simple, and only partially effective approach. If you need better protection, you've got to modify your scripts and establish a context-based image access policy. This is typically done with cookies tested by a script: if the cookie is present, the script supplies the image (as if from a database); if not, it supplies nothing or an alternate image. This approach is more complicated, but it works against all but determined image thieves.
So it's your choice: a simple, partially effective method, or a complex and very effective solution.
Jim
Mod_rewrite those image requests from the e-commerce script to a second script. This second script opens the image file, outputs the response header and MIME-type of the image, and then sends the image data. So the script pretends that it is the image file. However, this allows you to store your images in a directory that is completely inaccessible via HTTP. And the script can check for the cookie that allows the image to be served, and output a 403-Forbidden response if it's not valid.
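Here is a minimal sketch of such an image-serving script, written in Python purely for illustration, under several assumptions: the images live in a hypothetical directory outside the web root (`/srv/private-images`), the shop pages set a hypothetical cookie `img_ok=1`, and the surrounding web framework passes in the image name and the raw Cookie header and sends back the `(status, headers, body)` tuple the function returns. A fixed cookie value like this is trivially forged; a real deployment would want a per-session value.

```python
import mimetypes
from http.cookies import SimpleCookie
from pathlib import Path

# Hypothetical settings: a directory outside the web root, and the cookie the
# shop pages are assumed to set while a visitor browses the site normally.
IMAGE_DIR = Path("/srv/private-images")
COOKIE_NAME = "img_ok"
COOKIE_VALUE = "1"


def serve_image(name: str, cookie_header: str, image_dir: Path = IMAGE_DIR):
    """Return (status, headers, body) for a request for the image called `name`."""
    cookies = SimpleCookie()
    cookies.load(cookie_header or "")
    morsel = cookies.get(COOKIE_NAME)
    if morsel is None or morsel.value != COOKIE_VALUE:
        return 403, {"Content-Type": "text/plain"}, b"Forbidden"

    # Only plain file names directly inside image_dir are allowed (no "../" tricks).
    base = image_dir.resolve()
    path = (base / name).resolve()
    if path.parent != base or not path.is_file():
        return 404, {"Content-Type": "text/plain"}, b"Not Found"

    # Pretend to be the image file: send its MIME type, then its bytes.
    mime, _ = mimetypes.guess_type(path.name)
    body = path.read_bytes()
    headers = {"Content-Type": mime or "application/octet-stream",
               "Content-Length": str(len(body))}
    return 200, headers, body
```

Subdirectories are rejected along with "../" paths, which keeps the path check simple; a site with nested image folders would need a slightly more careful containment test.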
You might try searching for scripts that do this, using keywords specific to e-commerce, hotlinking, anti-leeching, and image and bandwidth protection scripts.
Jim
# BLOCK linking from outside our domain except Google, Yahoo, AllTheWeb, AltaVista, Gigablast,
# Comet Systems, SearchHippo, Wayback Machine, and freetranslation.com translators and caches,
# plus Netscape4 image loading.
RewriteCond %{HTTP_REFERER} .
# Your domain(s)
RewriteCond %{HTTP_REFERER} !^http://([^.]+\.)?example\.com
# Your IP address(es)
RewriteCond %{HTTP_REFERER} !^http://192\.168\.0\.1$
# SE cache and translation service exclusions (substitute your own domain name for "example")
RewriteCond %{HTTP_REFERER} !^http://.*(search|cache|translate).+example\.com
RewriteCond %{HTTP_REFERER} !^http://images\.google\..+www\.example\.com
# Google image IPs
RewriteCond %{HTTP_REFERER} !^http://216\.239\.(3[2-9]|[45][0-9]|6[0-3])\..*example\.com
RewriteCond %{HTTP_REFERER} !^http://rds\.yahoo\.com/.+example\.com
RewriteCond %{HTTP_REFERER} !^http://aolsearch\.aol\.com/aol/search
RewriteCond %{HTTP_REFERER} !^http://babelfish\.altavista\.com/.*example\.com
RewriteCond %{HTTP_REFERER} !^http://.*gigablast\.com/
RewriteCond %{HTTP_REFERER} !^http://.*searchhippo\.com.*example\.com
RewriteCond %{HTTP_REFERER} !^http://web\.archive\.org/web/.+example\.com
RewriteCond %{HTTP_REFERER} !^http://fets.*\.freetranslation\.com.+example
RewriteCond %{HTTP_REFERER} !^http://client\.sidesearch\.lycos\.com
RewriteCond %{HTTP_REFERER} !^http://cc\.msnscache\.com/cache\.aspx
RewriteCond %{HTTP_REFERER} !^http://web.ask.com/redir.*example\.com
# Netscape 4
RewriteCond %{HTTP_REFERER} !^wy[cs]iwyg://[0-9]{1,2}/http://(www\.)?example\.com
# Synergetics translation
RewriteCond %{REMOTE_ADDR} !^207\.228\.(19[2-9]|2[01][0-9]|22[0-3])\.
RewriteRule \.(jpg|jpeg?|gif|js|css)$ - [F]
[added] Make sure you replace the broken pipe "¦" characters with solid pipes before trying to use the code above. [/added]
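The third-octet alternation in the Google image IP condition is the easiest part of this rule set to get wrong. Assuming the intended range is 216.239.32.0/19 (third octet 32 through 63, which is what the alternation spells out), this Python check uses the standard `ipaddress` module to confirm that the IP portion of the pattern accepts exactly that block and nothing else in 216.239.0.0/16. It tests only the IP part, not the `^http://` prefix or the trailing `example\.com`.

```python
import ipaddress
import re

# IP part of the Google image referrer condition above, with solid pipes.
GOOGLE_IP_PREFIX = re.compile(r"^216\.239\.(3[2-9]|[45][0-9]|6[0-3])\.")


def regex_matches_block(cidr: str) -> bool:
    """True if the regex accepts exactly the 216.239.x.y addresses that fall in `cidr`."""
    net = ipaddress.ip_network(cidr)
    for third in range(256):
        addr = f"216.239.{third}.1"  # one sample host per third-octet value
        in_regex = GOOGLE_IP_PREFIX.match(addr) is not None
        if in_regex != (ipaddress.ip_address(addr) in net):
            return False
    return True


print(regex_matches_block("216.239.32.0/19"))  # True
print(regex_matches_block("216.239.32.0/20"))  # False: the regex also accepts 48-63
```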
Jim
[edited by: jdMorgan at 6:52 am (utc) on Feb. 7, 2005]
Normally only one picture is stolen, so I can easily substitute that with something else.
A couple of months ago, a Chinese site used a picture of a game box cover I had scanned to help them sell a software version of the same game. I substituted a picture of two men doing something that I'm quite sure would get you sent to a correctional facility in China. I suspect their sales suffered.
RewriteCond %{HTTP_REFERER} ^http://images\.google\..+www\.example\.com [OR]
RewriteCond %{HTTP_REFERER} ^http://216\.239\.(3[2-9]|[45][0-9]|6[0-3])\..*example\.com
RewriteCond %{REQUEST_URI} !/watermark/
RewriteRule ^(.+/)?([^/]+)\.(jpg|jpeg?|gif|js|css)$ /$1watermark/$2\.$3 [L]
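Here is a Python model of the watermark rule above, to show how the back-references are assembled. It assumes .htaccess (per-directory) context, where the path the RewriteRule sees has no leading slash, and it treats an unmatched optional group as empty, as mod_rewrite does when expanding `$1`. The referrer and file names are hypothetical.

```python
import re

# Rule pattern from above, with solid pipes; "jpeg?" accepts both "jpe" and "jpeg".
RULE = re.compile(r"^(.+/)?([^/]+)\.(jpg|jpeg?|gif|js|css)$")
# The two referrer conditions, joined by [OR] in the rule set.
GOOGLE_REFERERS = [
    re.compile(r"^http://images\.google\..+www\.example\.com"),
    re.compile(r"^http://216\.239\.(3[2-9]|[45][0-9]|6[0-3])\..*example\.com"),
]


def watermark_target(path: str, referer: str):
    """Return the rewritten path, or None if the rule would not fire.

    `path` is the URL-path as seen in .htaccess context, i.e. without a leading slash.
    """
    m = RULE.search(path)
    if not m:
        return None
    if not any(p.search(referer) for p in GOOGLE_REFERERS):
        return None  # neither [OR]-ed referrer condition matched
    if "/watermark/" in "/" + path:
        return None  # RewriteCond %{REQUEST_URI} !/watermark/ prevents a rewrite loop
    directory = m.group(1) or ""  # an unmatched $1 expands to an empty string
    return f"/{directory}watermark/{m.group(2)}.{m.group(3)}"


ref = "http://images.google.com/imgres?imgurl=http://www.example.com/photos/cat.jpg"
print(watermark_target("photos/cat.jpg", ref))  # /photos/watermark/cat.jpg
print(watermark_target("cat.gif", ref))         # /watermark/cat.gif
```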
Jim
# SE cache and translation service exclusions (substitute your own domain name for "example")
RewriteCond %{HTTP_REFERER} !^http://.*(search|cache|translate).+example\.com
RewriteCond %{HTTP_REFERER} !^http://images\.google\..+www\.example\.com
These don't seem to work for the Google cache, and I can't figure out why. They do work for the MSN cache, though. (Yes, I changed "example" to my site.)
I don't see why the original code didn't work for the referrer you posted.
"http://64.233.167.104/" matches [.*...]
"search" matches (search|cache|translate)
"?q=cache:lKyRJzLEOGoJ:www." matches .+
"example.com" matches example.com
and "/images/+inurl:mysite.com/images/&hl=en" is discarded and not required to match, because the pattern is unanchored.
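The same check can be run mechanically. This Python sketch reassembles the referrer from the pieces quoted above and tests it against the condition pattern twice: once with solid pipes and once with the forum's broken pipes. Python's `re` is used as a stand-in for Apache's regex engine; for this pattern they behave the same way.

```python
import re

# Referrer reassembled from the pieces quoted above ("example.com" stands in for the real domain).
REFERER = ("http://64.233.167.104/search?q=cache:lKyRJzLEOGoJ:www.example.com"
           "/images/+inurl:mysite.com/images/&hl=en")

SOLID_PIPES = r"^http://.*(search|cache|translate).+example\.com"
BROKEN_PIPES = r"^http://.*(search¦cache¦translate).+example\.com"  # as mangled by the forum

# The RewriteCond is negated ("!"), so a match here means "don't block this request".
print(re.search(SOLID_PIPES, REFERER) is not None)   # True: excluded from blocking
print(re.search(BROKEN_PIPES, REFERER) is not None)  # False: the image gets blocked
```

With broken pipes, "(search¦cache¦translate)" stops being an alternation and becomes one literal string that never appears in the referrer, so the negated condition is true and the image is blocked. That would explain the symptom.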
I would recommend you try the original rule again, but make sure you replace the broken pipe "¦" characters with solid pipes before trying to use the code -- posting on this forum modifies some characters for security reasons, and "¦" is one of them (emphasis added for later readers of this thread).
Jim
make sure you replace the broken pipe "¦" characters with solid pipes before trying to use the code -- posting on this forum modifies some characters for security reasons, and "¦" is one of them (emphasis added for later browsers in this thread)
Why is that symbol so insecure, anyway? I'd like to know.
Sorry for going slightly off-topic.
I'm not sure I understand all of it, but this morning when I checked the hotlinking pages again, all of my images were showing. I thought I might have broken something in the code, but it was fine. I turned off my firewall and, lo and behold, there was the new swapped-in hotlink GIF. Would this be the case for most people? Doesn't it almost negate the effectiveness of the code, or will they all see the new hotlink GIF once the server's cache has updated?
The method shown above is a simple, easy method that works most of the time. Better solutions using cookies and image-serving scripts are available if you have the time and need to implement them.
Jim
I'm not sure I understand all of it, but this morning when I checked the hotlinking pages again, all of my images were showing. I thought I might have broken something in the code, but it was fine. I turned off my firewall and, lo and behold, there was the new swapped-in hotlink GIF. Would this be the case for most people? Doesn't it almost negate the effectiveness of the code, or will they all see the new hotlink GIF once the server's cache has updated?
That is because some (bad) firewalls block all referrer information. In that case, your rules allow the image to be downloaded because of the line "RewriteCond %{HTTP_REFERER} !^$". You can comment out that line, but then anyone who visits your page through such a firewall will not see your images, and image bots like Googlebot-Image will not index the content.
RewriteCond %{HTTP_REFERER} ^http://images\.google\..+www\.example\.com [OR]
RewriteCond %{HTTP_REFERER} ^http://216\.239\.(3[2-9]|[45][0-9]|6[0-3])\..*example\.com
I noticed the [OR] flag here. If we have multiple RewriteCond statements, does the order in which we list the domains matter? Did the images.google.com RewriteCond come before the 216.239.xx.xx one because that reduces processing time?
For example, *all* AOL users.
> If we have multiple RewriteCond statements, does the order in which we list the domains matter? Did the images.google.com RewriteCond come before the 216.239.xx.xx one because that reduces processing time?
The domain is listed first because I see requests from the domain more often than I see requests from the IP address.
Try to make your RewriteRule pattern as specific and exclusive as possible. If the RewriteRule pattern fails to match, none of its RewriteConds are processed, which saves time. Then order the RewriteConds from [most likely to fail and fastest to process] to [least likely to fail and slowest to process].
Generally, RewriteConds testing back-references and server variables are fastest.
RewriteConds testing file-exists or directory-exists must query the filesystem and are therefore slower.
The slowest RewriteCond is one testing %{REMOTE_HOST}, because it invokes a reverse-DNS lookup: your server must send a request to the Domain Name System and await a response before the current transaction can proceed. Avoid this at all costs; if it's unavoidable, make it happen as infrequently as possible by writing specific RewriteRule and RewriteCond patterns and by putting the %{REMOTE_HOST} test last in the list of RewriteConds.
I should point out that if you run a database-driven site, you're unlikely to notice much difference from optimizing mod_rewrite code; database query time will likely swamp any gains from optimizing mod_rewrite. The same is true, to a lesser extent, if you run complex PHP scripts, or any server-side scripts for that matter.
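To illustrate the ordering advice (and the [OR] question above), here is a simplified Python model of how a rule is walked: the RewriteRule pattern is tested first, then each RewriteCond in turn, stopping at the first failed ANDed condition and skipping the rest of an [OR] chain once one member matches. The short-circuit details are an assumption about mod_rewrite's internals, and for brevity every condition here tests only the referrer. The `log` list records each test, so you can count how much work a request costs; the condition patterns are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Cond:
    pattern: str
    negate: bool = False   # leading "!" on the CondPattern
    ornext: bool = False   # the [OR] flag


def apply_rule(rule_pattern: str, conds: list[Cond], path: str, referer: str,
               log: list[str]) -> bool:
    """Simplified model of one RewriteRule; every condition tests the referrer only."""
    log.append("rule pattern")
    if not re.search(rule_pattern, path):
        return False  # no RewriteCond is evaluated at all
    i = 0
    while i < len(conds):
        c = conds[i]
        log.append(c.pattern)
        ok = (re.search(c.pattern, referer) is not None) != c.negate
        if c.ornext:
            if ok:
                # Assumed behavior: skip the rest of this [OR] chain, including its last member.
                while i < len(conds) and conds[i].ornext:
                    i += 1
        elif not ok:
            return False  # an ANDed condition failed: stop here
        i += 1
    return True


# Hypothetical [OR] pair, shaped like the watermark rule's referrer conditions.
GOOGLE = [Cond(r"^http://images\.google\.", ornext=True), Cond(r"^http://216\.239\.")]

log: list[str] = []
print(apply_rule(r"\.jpg$", GOOGLE, "page.html", "http://images.google.com/", log), log)
# False ['rule pattern']
```

Putting the more frequent referrer first in the [OR] chain means the second pattern is usually never tested, which is the point made above.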
Just for reference, I have a site with a 35kB .htaccess file. I readily admit this is excessive, and I've been paring it down recently, now that the bad guys have figured out they can't steal anything without getting banned and reported and are starting to leave me alone. But the fact is that there is no noticeable difference in site performance when this big .htaccess file is enabled or disabled; The other site factors are much more important to performance.
At the same time, I believe in making whatever code I've got run as efficiently as possible when I write it, even if there is a *lot* of it. So it's a balance: try to write efficient code, but don't beat yourself to death trying to fine-tune everything... The computers are supposed to work for us, not the other way around! ;)
Jim