Preventing hotlinking

but allowing search engines?

         

geekay

7:20 pm on Dec 13, 2004 (gmt 0)

10+ Year Member



Hotlinking to picture files on other people's sites seems to have increased dramatically after the invention of efficient image search engines. To prevent this waste of bandwidth I had to implement the appropriate mod_rewrite code.

The drawback is that legitimate users of image search engines who click the "Actual size: see image alone" link in the SE, or a link in a cached page, will no longer get the file. In 2002 there was a thread on WW about how to exclude specific SEs in this mod_rewrite code, but it relied on their actual IP addresses.

Is there any simple way to find out what IP ranges are used by at least Google and Yahoo today, and to keep that list up-to-date? Or are the different ranges too numerous and inconsistent? Any other solution to the main problem (the hotlinking)?

jdMorgan

8:24 pm on Dec 13, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I think you'll find that all you need to do is check the HTTP_REFERER and allow requests referred from the image search providers you wish to allow.

For example:


RewriteCond %{HTTP_REFERER} .
RewriteCond %{HTTP_REFERER} !^http://www\.yoursite\.com
RewriteCond %{HTTP_REFERER} !^http://images\.google\..+www\.yoursite\.com
RewriteRule \.gif - [F]
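
As a rough sketch, the same idea can be extended to more image types and made case-insensitive. The extension list and the optional "www." are assumptions added for illustration, and yoursite.com stands in for your own domain:

```apache
RewriteEngine On
# Let requests with a blank referer through (direct visits, some proxies)
RewriteCond %{HTTP_REFERER} .
# Allow your own pages, with or without "www."
RewriteCond %{HTTP_REFERER} !^http://(www\.)?yoursite\.com [NC]
# Allow Google Images referers that mention your site
RewriteCond %{HTTP_REFERER} !^http://images\.google\..+www\.yoursite\.com [NC]
# Refuse (403) any other referred request for common image types
RewriteRule \.(gif|jpe?g|png)$ - [F,NC]
```

Each RewriteCond is ANDed with the next, so the rule fires only when the referer is non-blank and matches none of the allowed patterns.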

Jim

geekay

9:42 pm on Dec 13, 2004 (gmt 0)

10+ Year Member



This is the solution! But it also works when I omit the text ".+www\.yoursite\.com" from the last Cond. That text is indeed part of the full referer in my web log, but I cannot figure out why it is worth including in the code.

Maybe other readers here would also be interested in learning why that text is useful?

jdMorgan

9:44 pm on Dec 13, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Because it prevents images from being displayed if Google Images indexes a site that hotlinks to your image. It requires that the referring link in Google contain your site address, and not someone else's.
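
For illustration, here is a stricter variant of that last condition. It assumes the Google Images referer carries the referring page's address in an imgrefurl= query parameter, unencoded; that is an assumption about the referer format, so check your own logs before relying on it. Since the image's own address may also appear in the referer, keying on the referring page is one way to make sure it is your page, not a hotlinker's, that was indexed:

```apache
# Blank referers pass
RewriteCond %{HTTP_REFERER} .
# Your own pages pass
RewriteCond %{HTTP_REFERER} !^http://www\.yoursite\.com
# Google Images passes only when the indexed page (imgrefurl, assumed name) is yours
RewriteCond %{HTTP_REFERER} !^http://images\.google\..+[?&]imgrefurl=http://www\.yoursite\.com
RewriteRule \.gif$ - [F]
```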

Jim

kewen

9:42 pm on Dec 14, 2004 (gmt 0)

10+ Year Member



J.D.

I have the same problem but am using a different method.

This is the code I am using:

RewriteCond %{http_referer} ^.*\.co.*|.*\.net.*|.*\.org.*$
RewriteCond %{request_method}<>%{request_uri} !^.*\.htm.*|.*\.css.*|.*\.txt.*|.*\.ico.*$
RewriteCond %{http_referer} !^.*\google.*|.*\xnyf\.org.*$
RewriteRule ^.*$ xbcgdb/image/special/remove\.gif [L]

My hotlinks come from chat rooms on .co, .com, .net, and .org domains, so I block requests referred from those domains for everything except the permitted file types.

Most people cannot make it work this way because:

1. Their own referer is .com or .org, whereas mine is .xx.

2. 99% of the traffic on the site does not arrive at the index page first. When visitors do click through to it, the request carries the site's referrer, which does not end in .com etc.

So this works for me, with one exception.

One percent of visitors do want to go to the front page, and maybe five percent of those arrive with a referrer ending in .com, .org, etc.

I added an exception for one site that links to the front page from a .org domain.

I would like to allow the front page to be seen by anyone, but people do not type in www.mysite.cc/index.html.

They type: www.mysite.xx

I don't know how to make the front page an exclusion.

Such as:

RewriteCond %{request_method}<>%{request_uri}!

One reason I use this method is that I can serve a "Please remove hotlink" GIF as soon as a hotlink is attempted, so people in the chat rooms can see what it looks like.

It has been stopping new hotlinks.

(Google is the only search engine sending image traffic to the site. When the hotlinker checks the hotlink from the .co etc. site and the "please remove" image pops up, this seems to stop them, and they remove the hotlink.)

Thanks for any help on allowing the front page to anyone.

jdMorgan

5:17 am on Dec 15, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



kewen,

Your "front page" is just that -- a page. If you block images, they are blocked, no matter what page they are included on. It's important to understand that each page and each image included by each page is requested separately, and the server has no idea what page is responsible for requesting each image unless that information is included in the HTTP_REFERER header sent by the browser.

The code you posted is very badly broken in many ways. I suggest you use the code posted in msg#2 above as a starting point to fix the problem with your front page images.
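
As a rough sketch of that starting point, the msg#2 conditions could be adapted to serve a replacement image rather than a 403. The path /images/remove.gif is a hypothetical placeholder, yoursite.com stands in for your own domain, and the extension list is an assumption; treat this as untested against your setup:

```apache
RewriteEngine On
# Blank referers pass, so typed-in addresses and bookmarks are unaffected
RewriteCond %{HTTP_REFERER} .
# Requests referred by your own pages (front page included) pass
RewriteCond %{HTTP_REFERER} !^http://www\.yoursite\.com
# Google Images referers that mention your site pass
RewriteCond %{HTTP_REFERER} !^http://images\.google\..+www\.yoursite\.com
# Never rewrite the replacement image itself
RewriteCond %{REQUEST_URI} !^/images/remove\.gif$
# Everything else asking for an image gets the "please remove" image
RewriteRule \.(gif|jpe?g)$ /images/remove.gif [NC,L]
```

Because only image requests are matched, pages such as the front page are served to everyone regardless of referer.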

Jim

kewen

3:25 pm on Dec 15, 2004 (gmt 0)

10+ Year Member



Thanks for the reply.

After posting the above I changed the code to:

RewriteCond %{http_referer} ^.*\.co.*|.*\.ne.*|.*\.or.*$
RewriteCond %{request_method}<>%{request_uri} ^.*\.jp.*|.*\.gif
RewriteCond %{http_referer} !^.*\google.*$
RewriteRule ^.*$ vvvvv/image/special/remove\.gif [L]

I realized I could just reverse what I was trying to do and it would work without affecting those trying to get to the front page.

Also, anyone could use it simply by adding part of their domain name next to google as an additional exclusion.

I'm not sure what you mean by the code being "very badly broken in many ways," but even without an explanation, thank you for your comments anyway.