Home / Forums Index / Code, Content, and Presentation / Apache Web Server
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL & phranque

Apache Web Server Forum

This 38 message thread spans 2 pages.

Mod Rewrite Anti-Leech (A Better Version)
riki




msg:1513400
 1:54 pm on Feb 1, 2005 (gmt 0)

Hi everyone, this is my first post on these boards.

The preamble
-----------------------

This is the code that I'm currently using for anti-leeching purposes, but there are a couple of improvements I'd like to make for different scenarios.

In the first scenario, where someone is leeching bandwidth (or infringing copyright) by embedding my images in their site, I'd like to use REWRITE RULE 1, so that it displays an antileeching.jpg which would contain an appropriate alert/warning message on their site.

In the second scenario, where someone is using a hypertext link to one of my images (so not actually displaying the image on their site, but rather linking directly to an image on my site), I think it would be better to redirect traffic to my homepage, using something like REWRITE RULE 2.

This sounds good in theory, but any ideas on how I'd go about writing the conditional statements to handle this?

RewriteEngine On
RewriteCond %{HTTP_REFERER}!^$
Options +FollowSymlinks
RewriteCond %{HTTP_REFERER}!^http://(www\.)?mydomain.com(/)?.*$ [NC]
RewriteCond %{HTTP_REFERER}!^http://(www\.)?myfriends.org(/)?.*$ [NC]
RewriteCond %{HTTP_REFERER}!^http://(www\.)?mywork.com(/)?.*$ [NC]

# REWRITE RULE 1

RewriteRule .*\.(gif¦jpg¦jpeg¦png¦swf)$ [mydomain.com...] [R,NC]

# REWRITE RULE 2

RewriteRule .*\.(gif¦jpg¦jpeg¦png¦swf)$ [mydomain.com...] [R,NC]

 

jdMorgan




msg:1513401
 5:04 pm on Feb 1, 2005 (gmt 0)

riki,

Welcome to WebmasterWorld!

There's no foolproof way to tell the difference between a link on a page that includes your images and a direct access to your image. This is because the HTTP_REFERER header is notoriously unreliable. A few searches on WebmasterWorld for 'hotlinking' will turn up a lot more detail on why this is so.

In addition, you cannot redirect from an image file to an HTML page -- browsers can't handle that.

Looking at your code, the first RewriteCond is misplaced, and should either be moved into the rule-set or commented out. Also, you may want to consider using an internal rewrite, rather than a redirect -- simply substitute your hotlink image for the requested image inside your server. This method does not require the cooperation of the client, and so keeps them unaware of the image substitution.

Changing that, and removing several instances of unnecessary leading and trailing ".*" sub-patterns, the code looks like this:

Options +FollowSymlinks
RewriteEngine On
RewriteCond %{HTTP_REFERER} .
RewriteCond %{HTTP_REFERER} !^http://(www\.)?mydomain\.com [NC]
RewriteCond %{HTTP_REFERER} !^http://(www\.)?myfriends\.org [NC]
RewriteCond %{HTTP_REFERER} !^http://(www\.)?mywork\.com [NC]
RewriteRule \.(gif¦jpg¦jpeg¦png¦swf)$ /img/antileech.gif [NC]

This implements what I'd consider 'best practices' for casual-hotlink protection. It will prevent 'easy' hotlinking and dissuade people who don't know it's wrong and who don't know how to get around your blocking. It is easy and simple, but it won't help against a determined hotlinker.

If you need better protection, then you'll need to use a script-and-cookies-based approach: you set a cookie on an 'authorized' page of your site, and then use a script to serve images only if the correct cookie is present in the image request. Images are kept in a directory accessible only to the script, and not via the Web. So, the script acts as an 'image server' on your site.
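[Editor's note: the cookie gate can be roughly approximated in mod_rewrite alone -- a simplified sketch, not the full script-based image server Jim describes. The cookie name "authz" is a placeholder, mod_headers is assumed to be available, and a copied cookie defeats this entirely:]

```apache
# Set the cookie whenever an HTML page on the site is served
<FilesMatch "\.html$">
Header set Set-Cookie "authz=1; path=/"
</FilesMatch>

# Refuse image requests that don't carry the cookie
RewriteEngine On
RewriteCond %{HTTP_COOKIE} !authz=1
RewriteRule \.(gif|jpg|jpeg|png|swf)$ - [F,NC]
```

This is weaker than a real image-serving script, since the image files are still reachable over HTTP, but it illustrates the idea of tying image access to a prior page view.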

Jim

riki




msg:1513402
 12:32 am on Feb 2, 2005 (gmt 0)

Hi Jim

thanks for the reply.

I suspected that might be the case. Thanks also for picking up those mistakes. I originally used a code generator, but it was my fault for mixing up the first few lines.

<snip>

The 2nd rewrite rule does actually work; I have tested it. If someone links directly to an image, it will redirect them to an HTML or PHP page, which is a good option because you can redirect to your homepage. It's a kind of seamless background approach.

The only downside to the 2nd rewrite rule is that if they embed the image, it produces a broken-image icon, unlike the 1st method, which does the substitution.

Just regarding the trailing ".*" sub-patterns, I'm cautiously wondering why that was added by the generator. Is it there for a reason that we've overlooked?

Many thanks again for your help.

riki

[edited by: jdMorgan at 1:24 am (utc) on Feb. 2, 2005]
[edit reason] No URLs or sigs, please. See TOS. [/edit]

jdMorgan




msg:1513403
 1:36 am on Feb 2, 2005 (gmt 0)

The 2nd rewrite rule does actually work; I have tested it. If someone links directly to an image, it will redirect them to an HTML or PHP page, which is a good option because you can redirect to your homepage. It's a kind of seamless background approach.

What happens if an image search engine picks up that link and follows it? It makes a mess. And if the image is hotlinked, then the browser can't handle it properly... Browsers can't handle a redirect to an HTML page from an <img src=...> link.

The only downside to the 2nd rewrite rule is that if they embed the image, it produces a broken-image icon, unlike the 1st method, which does the substitution.

I don't fool around with wasting bandwidth on other sites. That's their problem. I prefer to serve up a simple, short, 403-Forbidden response, and worry about other more important things.
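[Editor's note: the 403-Forbidden approach Jim mentions is his earlier rule-set with the image substitution replaced by the [F] (Forbidden) flag -- a minimal sketch, with mydomain.com as the placeholder domain:]

```apache
RewriteEngine On
# Block hotlinks with a short 403 response instead of serving a substitute image
RewriteCond %{HTTP_REFERER} .
RewriteCond %{HTTP_REFERER} !^http://(www\.)?mydomain\.com [NC]
RewriteRule \.(gif|jpg|jpeg|png|swf)$ - [F,NC]
```

The "-" substitution means "leave the URL unchanged"; the [F] flag makes Apache answer 403 Forbidden without sending any image data at all.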

Just regarding the trailing ".*" sub-patterns, I'm cautiously wondering why that was added by the generator. Is it there for a reason that we've overlooked?

No. It is there because the generator or its author took the easy route, rather than handling the special case.
Avoid ".*" whenever possible. It is the greediest and most ambiguous pattern, and therefore the least efficient to process. Leading "^.*" and ".*" and trailing ".*$" and ".*" patterns are a waste of space and CPU time.

You can use a generator to get started, but don't count on automation for quirk-free, efficient code.

Jim

ichthyous




msg:1513404
 5:02 pm on Feb 6, 2005 (gmt 0)

I have had this same problem for some time now. The problem is growing worse and worse, and this morning I even found a major NYC university, which will remain nameless (Columbia U.), hotlinking to one of my images. I have tried at least 100 variations of this .htaccess code, including the one above, and none of them work right. They either block all images, even from my own URL, or they do nothing. The directory where the images are located is part of an online store, so it can't be removed, renamed, or altered in any way. Do you think it might be more effective to add this to my httpd.conf file than as .htaccess in the directory folder itself? How would I rewrite it to do that? Thanks

jdMorgan




msg:1513405
 6:00 pm on Feb 6, 2005 (gmt 0)

Moving the code to httpd.conf won't make any difference in whether it blocks hotlink attempts or not. The only difference is in execution speed; code in httpd.conf is compiled at server start, whereas code in .htaccess is interpreted for each HTTP request.

If one version blocks image requests from your own URL, at least that proves that mod_rewrite is functioning on your server. What this hints at is that the version that blocks your own referrer is also blocking blank-referrer requests (and this is what happens when you do that), whereas the other version allows blank-referrer requests (which is what you must do in order to avoid such problems). This second approach obviously has a hole in it, but it is the best you can do, because the HTTP_REFERER header is notoriously unreliable. Many ISP caches (e.g. AOL) block it, and many PC security packages like Norton Internet Security block it. So, blank referrers must be allowed.

If you allow blank referrers, then some proportion of the hotlink requests will work. But others won't. The webmaster of the other site will probably get plenty of complaints about the broken image, but it won't look broken for all visitors. I like to think this might just help drive him crazy...

You can also make this method more effective by controlling your image caching policy. If you don't set caching policy on your files, then the ISP caches and browsers will use their defaults. This may result in copies of your images sitting around in some ISP's cache for a long time, making it appear that your code does not work if you test through that ISP. Expire your images faster to avoid having old cached copies accessible for a long time. Expire them later to reduce server load. It's a balance.
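[Editor's note: caching policy of the kind described here is usually set with mod_expires. A sketch, assuming the module is loaded; the one-day lifetime is an arbitrary placeholder to be tuned against server load:]

```apache
<IfModule mod_expires.c>
  ExpiresActive On
  # Expire images relatively quickly so stale cached copies
  # don't mask changes to the hotlink-protection rules
  ExpiresByType image/gif  "access plus 1 day"
  ExpiresByType image/jpeg "access plus 1 day"
  ExpiresByType image/png  "access plus 1 day"
</IfModule>
```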

This points out another factor; In order to test access-control code, you must flush your browser cache before testing each change to the code. If your browser has a copy of the image in its local cache, then that image won't be requested from the server, and so your server-side access control code can have no effect. So, flush that cache!

As I stated above, using .htaccess to block hotlinking based on the HTTP referrer is a convenient, simple, and only partially-effective approach. If you need better protection, then you've got to modify your scripts and establish a context-based image access policy. This is typically done with cookies tested by the script. If the cookie is present, the script supplies the image (as if from a database), and if not, supplies nothing or supplies an alternate image. Of course, this approach is complicated, but it works against all but determined image thieves.

So, it's your choice: a simple, partially-effective method, or a complex and very effective solution.

Jim

ichthyous




msg:1513406
 6:57 pm on Feb 6, 2005 (gmt 0)

Thanks JD!... I did empty my cache several times, but not every time. I will go back and see if some of the umpteen thousand versions I tried will work. At this point I would gladly spend some $ to buy a stronger script, as I have spent countless hours on this. Would a script of the type you mentioned work with a pre-existing image folder? In order for my store to work, the images must be served from a specific folder. From what I understand, you are saying the script would require serving all images from another folder. Can you recommend any script? Thanks!

jdMorgan




msg:1513407
 7:26 pm on Feb 6, 2005 (gmt 0)

No, this is not my market area, so I'm speaking generally.

Mod_rewrite those image requests from the e-commerce script to a second script. This second script opens the image file, outputs the response header and MIME-type of the image, and then sends the image data. So the script pretends that it is the image file. However, this allows you to store your images in a directory that is completely inaccessible via HTTP. And the script can check for the cookie that allows the image to be served, and output a 403-Forbidden response if it's not valid.
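[Editor's note: the routing half of this can be sketched in mod_rewrite. The script name /imgserve.php is hypothetical, and the script itself -- which checks the cookie, opens the file from a directory outside the web root, emits the Content-Type header, and streams the image data or returns 403 -- must be written separately:]

```apache
RewriteEngine On
# Hand every image request to the serving script;
# guard against rewriting requests for the script itself
RewriteCond %{REQUEST_URI} !^/imgserve\.php
RewriteRule ^(.+\.(gif|jpe?g|png))$ /imgserve.php?img=$1 [L,NC]
```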

You might try searching for scripts that do this, using keywords specific to e-commerce, hotlinking, anti-leeching, and image and bandwidth protection scripts.

Jim

ichthyous




msg:1513408
 7:41 pm on Feb 6, 2005 (gmt 0)

I actually just used the code in this post and tried it in all my browsers after flushing the cache...works like a charm! The only problem is that now the search engine cached pages also show the redirected-to jpeg. I know I can add code to exclude the cached search engines...but which URLs should I use for them? Is there a way to allow all Google related referrers to see the images for example?

ichthyous




msg:1513409
 7:59 pm on Feb 6, 2005 (gmt 0)

I went into Google and got the IP address where the cached pages are stored, so they are showing the normal images now...but don't Google and the other SEs have hundreds of IP addresses for this purpose? Thanks for the cleaned up code...works great

jdMorgan




msg:1513410
 8:22 pm on Feb 6, 2005 (gmt 0)

Here's an example of how to add exclusions. This is copied from one of my sites. This code comes with no warranties, expressed or implied, and no support. You *will* probably have to modify it to suit your needs.

# BLOCK linking from outside our domain except Google, Yahoo, AllTheWeb, AltaVista, Gigablast,
# Comet Systems, SearchHippo, Wayback Machine, and freetranslation.com translators and caches,
# plus Netscape4 image loading.
RewriteCond %{HTTP_REFERER} .
# Your domain(s)
RewriteCond %{HTTP_REFERER} !^http://([^.]+\.)?example\.com
# Your IP address(es)
RewriteCond %{HTTP_REFERER} !^http://192\.168\.0\.1$
# SE cache and translation service exclusions (substitute your own domain name for "example")
RewriteCond %{HTTP_REFERER} !^http://.*(search¦cache¦translate).+example\.com
RewriteCond %{HTTP_REFERER} !^http://images\.google\..+www\.example\.com
# Google image IPs
RewriteCond %{HTTP_REFERER} !^http://216\.239\.(3[2-9]¦[45][0-9]¦6[0-3])\..*example\.com
RewriteCond %{HTTP_REFERER} !^http://rds\.yahoo\.com/.+example\.com
RewriteCond %{HTTP_REFERER} !^http://aolsearch\.aol\.com/aol/search
RewriteCond %{HTTP_REFERER} !^http://babelfish\.altavista\.com/.*example\.com
RewriteCond %{HTTP_REFERER} !^http://.*gigablast\.com/
RewriteCond %{HTTP_REFERER} !^http://.*searchhippo\.com.*example\.com
RewriteCond %{HTTP_REFERER} !^http://web\.archive\.org/web/.+example\.com
RewriteCond %{HTTP_REFERER} !^http://fets.*\.freetranslation\.com.+example
RewriteCond %{HTTP_REFERER} !^http://client\.sidesearch\.lycos\.com
RewriteCond %{HTTP_REFERER} !^http://cc\.msnscache\.com/cache\.aspx
RewriteCond %{HTTP_REFERER} !^http://web.ask.com/redir.*example\.com
# Netscape 4
RewriteCond %{HTTP_REFERER} !^wy[cs]iwyg://[0-9]{1,2}/http://(www\.)?example\.com
# Synergetics translation
RewriteCond %{REMOTE_ADDR} !^207\.228\.(19[2-9]¦2[01][0-9]¦22[0-3])\.
RewriteRule \.(jpg¦jpeg?¦gif¦js¦css)$ - [F]

You'll have to work out what you want and need by examining your raw server logs and testing.

[added] Make sure you replace the broken pipe "¦" characters with solid pipes before trying to use the code above. [/added]

Jim

[edited by: jdMorgan at 6:52 am (utc) on Feb. 7, 2005]

ichthyous




msg:1513411
 9:12 pm on Feb 6, 2005 (gmt 0)

Thanks for posting that... I am wondering if I really need to worry about excluding any of these anyway. Does it really matter if the images on my page can't be seen in the SE cached pages? Seeing that my site is also more about images than text, and my clients are overwhelmingly English speakers, I am not worried about translation services either. Is there some benefit I may not be seeing in allowing all of these to see the images?

matt21811




msg:1513412
 10:19 pm on Feb 6, 2005 (gmt 0)

My sites are too small to worry about this serious prevention of bandwidth theft, so I can deal with it on a case-by-case basis. This isn't helpful to your discussion, but I'm hoping you find it amusing.

Normally, only one picture is stolen, so I can easily substitute it with something else.

A couple of months ago a Chinese site used the picture of a cover of a game box I scanned to help them sell a software version of the same game. I substituted a picture of two men doing something that I'm quite sure would get you sent to a correctional facility in China. I suspect their sales suffered.

philaweb




msg:1513413
 10:27 pm on Feb 6, 2005 (gmt 0)

RewriteRule \.(gif¦jpg¦jpeg¦png¦swf)$ /img/antileech.gif [NC]

Make sure the anti-leech image isn't covered by the same .htaccess code. Made that mistake once. ;)

paulroberts3000




msg:1513414
 10:56 pm on Feb 6, 2005 (gmt 0)


How about redirecting Google to a watermarked version of the image?

sun818




msg:1513415
 12:41 am on Feb 7, 2005 (gmt 0)

> # BLOCK linking from outside

Is it possible to block image hotlinkers coming from specific domains only? The examples I am seeing seem to allow only those you specify. I'm trying to figure out a way to block the major abusers only.

[edited by: sun818 at 1:35 am (utc) on Feb. 7, 2005]
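[Editor's note: inverting the whitelist logic shown earlier in the thread gives a blacklist of specific referrers. A sketch; badsite.example and othersite.example are placeholders for the abusing domains:]

```apache
RewriteEngine On
# Block only requests referred from known abusers; everyone else gets the image.
# [OR] joins the conditions, so a match on either referrer triggers the rule.
RewriteCond %{HTTP_REFERER} ^http://(www\.)?badsite\.example [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?othersite\.example [NC]
RewriteRule \.(gif|jpg|jpeg|png|swf)$ - [F,NC]
```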

jdMorgan




msg:1513416
 12:43 am on Feb 7, 2005 (gmt 0)

> watermarked...
Sure, here's one way to do it:

RewriteCond %{HTTP_REFERER} ^http://images\.google\..+www\.example\.com [OR]
RewriteCond %{HTTP_REFERER} ^http://216\.239\.(3[2-9]¦[45][0-9]¦6[0-3])\..*example\.com
RewriteCond %{REQUEST_URI} !/watermark/
RewriteRule ^(.+/)?([^/]+)\.(jpg¦jpeg?¦gif¦js¦css)$ /$1watermark/$2\.$3 [L]

This would rewrite a Google request for /shopping/images/widget.gif to /shopping/images/watermark/widget.gif
or a request for /widget.jpg to /watermark/widget.jpg

Jim

M0nKeY




msg:1513417
 1:12 am on Feb 7, 2005 (gmt 0)

# SE cache and translation service exclusions (substitute your own domain name for "example")
RewriteCond %{HTTP_REFERER} !^http://.*(search¦cache¦translate).+example\.com
RewriteCond %{HTTP_REFERER} !^http://images\.google\..+www\.example\.com

These don't seem to work for the Google cache; I can't figure out why. Works for the MSN cache though. (Yes, I changed EXAMPLE to my site.)

jdMorgan




msg:1513418
 2:45 am on Feb 7, 2005 (gmt 0)

It works for me. Did you flush your browser cache before testing? (See message 6 above).

If you still have problems, paste the image request from your raw server access log so we can look at it. Remove your IP address to protect your privacy.

Jim

M0nKeY




msg:1513419
 3:30 am on Feb 7, 2005 (gmt 0)

No, it's not my browser caching.

"GET /images/pic.jpg HTTP/1.1" 302 190 "http://64.233.167.104/search?q=cache:lKyRJzLEOGoJ:www.mysite.com/images/+inurl:mysite.com/images/&hl=en"

M0nKeY




msg:1513420
 3:35 am on Feb 7, 2005 (gmt 0)

Fixed it.

[EDITED OUT MY FAULTY RULESET]

needed a * after the "(search¦cache¦translate)"

[edited by: M0nKeY at 4:10 am (utc) on Feb. 7, 2005]

jdMorgan




msg:1513421
 3:58 am on Feb 7, 2005 (gmt 0)

Adding a star after the grouped subexpression makes it optional (it would match the previous parenthesized subexpression repeated zero or more times). This means that any referrer that begins with
"http://" and contains your domain will bypass the rule. That opens a fairly large hole in the protection.

I don't see why the original code didn't work for the referrer you posted.

"http://64.233.167.104/" matches [.*...]
"search" matches (search¦cache¦translate)
"?q=cache:lKyRJzLEOGoJ:www." matches .+
"example.com" matches example.com
and "/images/+inurl:mysite.com/images/&hl=en" is discarded and not required to match, because the pattern in unanchored.

I would recommend you try the original rule again, but make sure you replace the broken pipe "¦" characters with solid pipes before trying to use the code -- posting on this forum modifies some characters for security reasons, and "¦" is one of them (emphasis added for later browsers in this thread).

Jim

M0nKeY




msg:1513422
 4:05 am on Feb 7, 2005 (gmt 0)

Yep it was the pipes.

Thanks for the rules!

dmmh




msg:1513423
 1:10 pm on Feb 7, 2005 (gmt 0)

make sure you replace the broken pipe "¦" characters with solid pipes before trying to use the code -- posting on this forum modifies some characters for security reasons, and "¦" is one of them (emphasis added for later browsers in this thread)

Why is that symbol so insecure anyway? I'd like to know.

sorry for going slightly OT

ichthyous




msg:1513424
 2:02 pm on Feb 7, 2005 (gmt 0)

"If one version blocks image requests from your own URL, at least that proves that mod_rewrite is functioning on your server. What this hints at is that the version that blocks your own referrer is blocking blank referrer requests (and this is what happens when you do that), whereas the other version allows blank referrer requests (which is what you must do in order to avoid such problems). This second approach obviously has a hole in it, but it is the best you can do, because the HTTP_REFERRER header is notoriously unreliable. Many ISP caches (e.g. AOL) block it, and many PC security packages like Norton Internet Security block it. So, blank referrers must be allowed. If you allow blank referrers, then some proportion of the hotlink requests will work. But others won't."

I'm not sure I understand all of it, but this morning when I went to check the pages that were hotlinking again, all of my images were there. I thought that maybe I had done something to the code, but it was fine. I turned off my firewall and, lo and behold, there was the new swapped hotlink gif. Would this be the case with most people? Doesn't it almost negate the effectiveness of the code, or will they all see the new hotlink GIF once the server's cache has updated?

sun818




msg:1513425
 7:22 pm on Feb 7, 2005 (gmt 0)

> will they all see the new hotlink GIF once the server's cache has updated?

The effect should be immediate. jdMorgan has instructed that you clear your browser cache before testing any code changes. I would say this includes any proxy or personal firewall software that might cache images, too.

jdMorgan




msg:1513426
 8:20 pm on Feb 7, 2005 (gmt 0)

I'll point out once again that access control based on the HTTP_REFERER header is not 100% reliable. But it is reliable enough that it will cause noticeable problems for the site hotlinking to yours.

The method shown above is a simple, easy method that works most of the time. Better solutions using cookies and image-serving scripts are available if you have the time and need to implement them.

Jim

M0nKeY




msg:1513427
 8:23 pm on Feb 7, 2005 (gmt 0)

I'm not sure I understand all of it, but this morning when I went to check the pages that were hotlinking again, all of my images were there. I thought that maybe I had done something to the code, but it was fine. I turned off my firewall and, lo and behold, there was the new swapped hotlink gif. Would this be the case with most people? Doesn't it almost negate the effectiveness of the code, or will they all see the new hotlink GIF once the server's cache has updated?

That is because some (bad) firewalls block all referer info. In this case your rules allow the image to be downloaded because of this line: "RewriteCond %{HTTP_REFERER}!^$". You can comment out that line, but then anyone who comes to your page with such a firewall will not see images, and image bots like googlebot-image will not index the content.

sun818




msg:1513428
 9:35 pm on Feb 7, 2005 (gmt 0)

RewriteCond %{HTTP_REFERER} ^http://images\.google\..+www\.example\.com [OR]
RewriteCond %{HTTP_REFERER} ^http://216\.239\.(3[2-9]¦[45][0-9]¦6[0-3])\..*example\.com

I noticed the [OR] statement here. If we have multiple RewriteCond statements, does the order we list the domains in matter? Did the images.google.com RewriteCond statement come before 216.239.xx.xx because that would reduce processing time?

jdMorgan




msg:1513429
 10:28 pm on Feb 7, 2005 (gmt 0)

> then anyone who comes to your page with such a firewall will not see images...

For example, *all* AOL users.

> If we have multiple RewriteCond statements, does the order we list the domains in matter? Did the images.google.com RewriteCond statement come before 216.239.xx.xx because that would reduce processing time?

The domain is listed first because I see requests from the domain more often than I see requests from the IP address.

Try to make your RewriteRule pattern as specific and exclusive as possible. If the pattern match in the RewriteRule fails, no RewriteConds will be processed, which saves time. Then put the RewriteConds in order from [most likely to fail and fastest to process] to [least likely to fail and slowest to process].

Generally, RewriteConds testing back-references and server variables are fastest.

RewriteConds testing file-exists or directory-exists must query the filesystem and are therefore slower.

The slowest RewriteCond is one testing %{REMOTE_HOST}, because this invokes a reverse-DNS request; your server must send a request to the domain name system and await a response before the current transaction can proceed. Avoid this at all costs, and if unavoidable, try to make it happen as infrequently as possible by writing specific RewriteRule and RewriteCond patterns, and by putting the %{REMOTE_HOST} test last in the list of RewriteConds.
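[Editor's note: the ordering advice might look like this in practice -- a hypothetical sketch; example.com and the googlebot.com host test are placeholders:]

```apache
RewriteEngine On
# 1. Cheap server-variable test, most likely to fail: is a referrer present?
RewriteCond %{HTTP_REFERER} .
# 2. Still cheap: the referrer is not our own site
RewriteCond %{HTTP_REFERER} !^http://(www\.)?example\.com [NC]
# 3. The expensive reverse-DNS test goes last, so it runs
#    only when everything above has already matched
RewriteCond %{REMOTE_HOST} !\.googlebot\.com$ [NC]
RewriteRule \.(gif|jpg|jpeg|png)$ - [F,NC]
```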

I should point out that if you run a database-driven site, it's unlikely that you'll notice much difference from optimizing mod_rewrite code; database query time will likely swamp any gains from optimizing mod_rewrite. The same is true, to a lesser extent, if you run complex PHP scripts -- or any server-side scripts, for that matter.

Just for reference, I have a site with a 35kB .htaccess file. I readily admit this is excessive, and I've been paring it down recently, now that the bad guys have figured out they can't steal anything without getting banned and reported and are starting to leave me alone. But the fact is that there is no noticeable difference in site performance when this big .htaccess file is enabled or disabled; The other site factors are much more important to performance.

At the same time, I believe in making whatever code I've got run as efficiently as possible when I write it, even if there is a *lot* of it. So it's a balance: try to write efficient code, but don't beat yourself to death trying to fine-tune everything... The computers are supposed to work for us, not the other way around! ;)

Jim

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved