Hotlink Protection from Sites Blocking Refers on Cloudflare

         

3zero

12:01 am on Jul 12, 2018 (gmt 0)



Hello

Thought I might share this script and, if possible, improve it with community input.

OK, so first of all the problem: most image hotlink scripts rely on the "referer" to block the image, but more advanced hotlinking simply removes the referer.

So why not block a blank referer? Well, yes, that's kind of what you have to do, with an exception for search bots.

I use Cloudflare and have now deployed the following JavaScript with their new "Workers" feature, applied to the image directory.

I also think this could be used to protect other areas of the site (like forms).


addEventListener('fetch', event => {
event.respondWith(fetchAndApply(event.request))
})

async function fetchAndApply(request) {
let referer = request.headers.get('Referer')
// check if there is a user agent
if (request.headers.get('user-agent')) {
// if a search engine allow request
// user-agent strings are mixed case (e.g. "Googlebot"), so compare in lower case
const ua = request.headers.get('user-agent').toLowerCase()
if (ua.includes('googlebot') || ua.includes('msn') || ua.includes('yandex')) {
return fetch(request)
} else {

//check if there is a referer
if (referer) {
// It's an image and there's a Referer. Verify that the
// hostnames match.
if (new URL(referer).hostname ==
new URL(request.url).hostname) {
return fetch(request)
} else {

console.log('referer',referer)
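// Hosts don't match. This is a hotlink. Answer with a 404.
// (A Location header only redirects on 3xx responses, so it is omitted.)
return new Response('Sorry, this page is not available at this time.', {
status: 404
})
}
} else {
// No Referer at all: block as well, as described above.
return new Response('Sorry, this page is not available at this time.', {
status: 404
})
}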
}
}
else {
console.log('referer',referer)
// No user agent. Answer with a 404.
return new Response('Sorry, this page is not available at this time.', {
status: 404
})
}
}


The reason I wanted to use a 404 as the error page is it gives the hotlinker the least information

Leosghost

12:53 am on Jul 12, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



My browser doesn't send a referer unless I set it to do so..I'm not the only one by any means whose browser is set to do this..

Sending a 404 gives us no information either..whereas a 403 does give hint that for some reason "we" cannot access a page which does in fact exist..If one accepts that a 404 is "temporarily moved", a 410 is "gone" and a 403 is "forbidden", a 403 would seem to be the logical code to use.

3zero

1:18 am on Jul 12, 2018 (gmt 0)



Thanks Leosghost, really appreciate the feedback. Great idea on the 403. Also, I didn't explain it very well: it only blocks images, so the page is still visible.

Amended code:


addEventListener('fetch', event => {
event.respondWith(fetchAndApply(event.request))
})

async function fetchAndApply(request) {
let referer = request.headers.get('Referer')
if (request.headers.get('user-agent')) {
// user-agent strings are mixed case (e.g. "Googlebot"), so compare in lower case
const ua = request.headers.get('user-agent').toLowerCase()
if (ua.includes('googlebot') || ua.includes('msn') || ua.includes('yandex')) {
return fetch(request)
} else {


if (referer) {
// It's an image and there's a Referer. Verify that the
// hostnames match.
if (new URL(referer).hostname ==
new URL(request.url).hostname) {
return fetch(request)
} else {
console.log('referer',referer)
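// Hosts don't match. This is a hotlink. Answer with a 403.
// (A Location header only redirects on 3xx responses, so it is omitted.)
return new Response('Sorry, this image is not available at this time.', {
status: 403
})
}
} else {
// No Referer at all: block as well.
return new Response('Sorry, this image is not available at this time.', {
status: 403
})
}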
}
}
else {
console.log('referer',referer)
// No user agent. Answer with a 403.
return new Response('Sorry, this image is not available at this time.', {
status: 403
})
}
}

Leosghost

8:53 am on Jul 12, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Last night, late (early morning my time, actually about 03.00 am), I wrote above:
"If one accepts that a 404 is "temporarily moved""

which is incorrect. I was tired, and realised whilst drifting off to sleep what I'd written, but was certainly not going to get back in front of the keyboard to correct it then.
So it should read:
"if one accepts that a 404 is "cannot find" or "not found"" (which may be a temporary condition),
because "temporarily moved" would be a 302, although 302 got redefined as "found".
[en.wikipedia.org...]

Leosghost

9:03 am on Jul 12, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You could also, under some interpretations of which code to use and why, reply with a 400 or a 401.
Some of the 400s are somewhat wide in scope and some overlap, depending upon what one considers to be included in, or excluded by, each definition.
[en.wikipedia.org...]
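For reference, the status codes mentioned so far can be kept in a small lookup with their standard reason phrases. This is only a sketch; `STATUS_TEXT` and `describeStatus` are names made up for illustration and are not part of the script above.

```javascript
// Standard reason phrases for the status codes discussed in this thread.
const STATUS_TEXT = {
  302: 'Found',
  400: 'Bad Request',
  401: 'Unauthorized',
  403: 'Forbidden',
  404: 'Not Found',
  410: 'Gone',
}

// Hypothetical helper: describe what a blocked hotlinker would be told.
function describeStatus(code) {
  const text = STATUS_TEXT[code]
  return text ? `${code} ${text}` : `${code} (not discussed here)`
}

console.log(describeStatus(403)) // prints "403 Forbidden"
```

A 403 tells the requester the resource exists but access is refused, while a 404 gives nothing away, which is the trade-off being debated above.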

keyplyr

9:43 am on Jul 12, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



My browser doesn't send a referer...
That's the problem. In these times of HTTPS and various security headers, most requests do not include a referer so anti-hotlinking measures are no longer the smart move.

I removed my anti-hotlinking code a couple years ago. It became a useless endeavor and caused a few problems.

If you swap the hotlinked image with an alternate image, that one will often get cached by ISPs and even indexed in the various Image Searches... I've seen it happen more than a few times.

If you see a significant impact from hotlinking, example: several thousand hourly requests from a remote page, just replace that image with a 1X1 clear gif (even if the image is a jpg) on your server.

Then stop worrying about it.
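The advice above is to swap the file on the server itself. Purely as an illustration, the same clear pixel could instead be answered from a Cloudflare Worker. The function names below are hypothetical, and serving it with `no-store` is an assumption aimed at the caching concern mentioned above, not something the post recommends.

```javascript
// Widely used base64 encoding of a 1x1 transparent GIF.
const CLEAR_GIF_BASE64 = 'R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7'

// Decode the base64 text into raw bytes (atob exists in Workers and in Node 16+).
function clearGifBytes() {
  const binary = atob(CLEAR_GIF_BASE64)
  const bytes = new Uint8Array(binary.length)
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i)
  }
  return bytes
}

// Hypothetical Worker-side use: answer a hotlinked request with the pixel.
function clearGifResponse() {
  return new Response(clearGifBytes(), {
    status: 200,
    headers: { 'Content-Type': 'image/gif', 'Cache-Control': 'no-store' },
  })
}
```

Browsers sniff the image type from the bytes, which is why a GIF can stand in for a JPG as the post says.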

3zero

10:31 pm on Jul 16, 2018 (gmt 0)



Hi Keyplyr, I appreciate your advice, but possibly I didn't explain this fully.

most requests do not include a referer


OK, we are only filtering requests to the image folder, not webpages. Users that land on a webpage will normally send a referrer with their image requests (about 98.7% of the time). The remaining 1.3% of requests are probably users that mask the referrer, plus hotlinks, but it's by no means most.

Most requests are actually bots, and maybe they don't include a referrer; this script is designed for users, and it whitelists search engine bots.

I am happy to post visuals that support this if you're interested.
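A figure like the 98.7% above could be reproduced from request logs roughly as follows. This is a sketch under assumptions: the record shape (`userAgent`, `referer`), the function `refererShare` and the crude `looksLikeBot` test are all invented for illustration, and the sample data is made up.

```javascript
// Share of non-bot requests that carried a Referer header.
// entries: hypothetical log records shaped like { userAgent, referer }.
function refererShare(entries, isBot) {
  let humans = 0
  let withReferer = 0
  for (const entry of entries) {
    if (isBot(entry.userAgent)) continue // bots are left out of the ratio
    humans++
    if (entry.referer) withReferer++
  }
  return humans === 0 ? null : withReferer / humans
}

// Crude, assumed bot test; real logs would need a better classifier.
const looksLikeBot = ua => /bot|crawl|spider/i.test(ua || '')

const sample = [
  { userAgent: 'Mozilla/5.0 (Windows NT 10.0) Chrome/67.0', referer: 'https://example.com/page.html' },
  { userAgent: 'Mozilla/5.0 (Windows NT 10.0) Firefox/61.0', referer: null },
  { userAgent: 'Mozilla/5.0 (compatible; Googlebot/2.1)', referer: null },
]
console.log(refererShare(sample, looksLikeBot)) // prints 0.5
```

How bots are separated from users changes the result a lot, which may explain why the two sides of this discussion see such different numbers.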

keyplyr

1:18 am on Jul 17, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



There are numerous reasons for not blocking hot-linking.

It is not true that most of your visitors send a referrer to your image directory. They don't send a referrer period. Most browsers just don't do this anymore. You may think this because of some stats software or analytics report.

In addition, when Facebook comes around to cache your images, it won't get them since their image caching agent doesn't send a referrer either. Then when someone wants to share your site at FB, and potentially bring you thousands of visitors, there won't be that nice image linked to your site, so most won't follow the link.

Most all the social media apps do not send a referrer.

The list goes on... but I guess you'll have to find out for yourself.
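Since crawlers such as Facebook's send no referrer, a filter like the one in this thread has to recognise them by user agent instead. A minimal sketch of such an allowlist follows; the token list is an assumption and should be checked against each service's published user agent strings.

```javascript
// Assumed tokens, all lower case; verify against each service's documentation.
const ALLOWED_AGENT_TOKENS = [
  'googlebot',
  'bingbot',
  'yandex',
  'applebot',
  'facebookexternalhit',
  'twitterbot',
  'pinterest',
]

// True when the user agent contains one of the allowed tokens.
// The comparison is lower-cased because real user agents are mixed case.
function isAllowedAgent(userAgent) {
  if (!userAgent) return false
  const ua = userAgent.toLowerCase()
  return ALLOWED_AGENT_TOKENS.some(token => ua.includes(token))
}
```

Keep in mind that a user agent is trivially spoofed, so this only lets friendly crawlers in; it does not keep a determined hotlinker out.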

.

3zero

9:18 pm on Jul 17, 2018 (gmt 0)



Hi Keyplyr

Thanks for the suggestions. I have amended the script to allow sharing of images on social networks. Can you point me in the right direction about "most" browsers not sending referrers? I can't find any evidence of this. Yes, a browser can be set to not send a referrer, but it's an extremely low percentage of users.


addEventListener('fetch', event => {
event.respondWith(fetchAndApply(event.request))
})

async function fetchAndApply(request) {
let referer = request.headers.get('Referer')
let urlreq = new URL(request.url).hostname


if (request.headers.get('user-agent')) {
// user-agent strings are mixed case (e.g. "Googlebot"), so compare in lower case
const ua = request.headers.get('user-agent').toLowerCase()
const allowedAgents = ['googlebot', 'bingbot', 'pinterest', 'facebookexternalhit', 'facebook', 'twitter', 'googleimageproxy', 'yandex']
if (allowedAgents.some(token => ua.includes(token))) {
return fetch(request)
} else {



if (referer) {
// It's an image and there's a Referer. Verify that the hostnames match.

if (new URL(referer).hostname ==
new URL(request.url).hostname) {
return fetch(request)

// The checks below compare the *referring* host. (A single "=" would assign
// instead of compare, and request.url is always our own host.)

// Else if it's Pinterest

} else if (new URL(referer).hostname.endsWith('pinterest.com')) {
return fetch(request)

// Else if it's Twitter

} else if (new URL(referer).hostname.endsWith('twitter.com')) {
return fetch(request)

// Else if it's Google

} else if (new URL(referer).hostname.endsWith('google.com')) {
return fetch(request)
// Else if it's Bing

} else if (new URL(referer).hostname.endsWith('bing.com')) {
return fetch(request)
} else {

console.log('referer',referer)
// Hosts don't match. This is a hotlink and its not a friendly search bot or social media
return new Response('Sorry, this image is not available at this time.', {
status: 403,


})
}
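} else {
// No Referer at all: block as well.
return new Response('Sorry, this image is not available at this time.', {
status: 403,
})
}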
}
}
else {
console.log('referer',referer)
// No useragent
return new Response('Sorry, this image is not available at this time.', {
status: 403,


})
}

}


keyplyr

10:06 pm on Jul 17, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Can you point me in the right direction about "most" browsers not sending referrers, I can't find any evidence of this...
What's your evidence. Where are you getting your data from?

If you are going to do this, it's highly recommended to allow blank referrer.

3zero

10:51 pm on Jul 17, 2018 (gmt 0)



Sorry, correction to the code. Please use this instead:


addEventListener('fetch', event => {
event.respondWith(fetchAndApply(event.request))
})

async function fetchAndApply(request) {
let referer = request.headers.get('Referer')
let urlreq = new URL(request.url).hostname


if (request.headers.get('user-agent')) {
// user-agent strings are mixed case (e.g. "Googlebot"), so compare in lower case
const ua = request.headers.get('user-agent').toLowerCase()
const allowedAgents = ['googlebot', 'bingbot', 'pinterest', 'facebookexternalhit', 'facebook', 'twitter', 'googleimageproxy', 'yandex']
if (allowedAgents.some(token => ua.includes(token))) {
return fetch(request)
} else {



if (referer) {
// It's an image and there's a Referer. Verify that the
// hostnames match.
if (new URL(referer).hostname ==
new URL(request.url).hostname) {
return fetch(request)

// The checks below look at the *referring* host and use includes(), because
// indexOf() returns -1 (which is truthy) when the text is not found.

// Else if it's Pinterest
} else if (new URL(referer).hostname.includes('pinterest')) {
return fetch(request)

// Else if it's Twitter
} else if (new URL(referer).hostname.includes('twitter')) {
return fetch(request)

// Else if it's Google
} else if (new URL(referer).hostname.includes('google')) {
return fetch(request)
// Else if it's Bing
} else if (new URL(referer).hostname.includes('bing')) {
return fetch(request)
} else {

console.log('referer',referer)
// Hosts don't match. This is a hotlink and its not a friendly search bot or social media
return new Response('Sorry, this image is not available at this time.', {
status: 403,


})
}
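} else {
// No Referer at all: block as well.
return new Response('Sorry, this image is not available at this time.', {
status: 403,
})
}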
}
}
else {
console.log('referer',referer)
// No useragent
return new Response('Sorry, this image is not available at this time.', {
status: 403,


})
}

}

keyplyr

11:01 pm on Jul 17, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You can always block the Cloudflare bot that caches the hotlinked image from your images files:
Mozilla/5.0 (compatible; CloudFlare-AlwaysOnline/1.0; +http://www.cloudflare.com/always-online) AppleWebKit/534.34
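In a Worker, spotting that crawler could look something like this. It is a sketch only: the function name is made up, and the token is simply taken from the user agent string quoted above.

```javascript
// True when the request comes from the Always Online crawler quoted above.
function isAlwaysOnlineCrawler(userAgent) {
  return (userAgent || '').toLowerCase().includes('cloudflare-alwaysonline')
}

const quoted = 'Mozilla/5.0 (compatible; CloudFlare-AlwaysOnline/1.0; +http://www.cloudflare.com/always-online) AppleWebKit/534.34'
console.log(isAlwaysOnlineCrawler(quoted)) // prints true
```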

3zero

11:06 pm on Jul 17, 2018 (gmt 0)



Hi Keyplyr,

I'm basing it on log requests and research. For instance, Chrome will still send a referrer even in incognito. Most browser requests do; bots, however, don't, of course. As for implementation, it's live already. The purpose of this is to eliminate sites that not only hotlink images but also offer free downloads of those images, causing bandwidth costs and brand dilution. It is therefore essential that blank referrers to image directories are filtered and good domains whitelisted. It's unfortunate that a small number of users may be affected, I agree, but by no means is it "most" users.

lucy24

11:06 pm on Jul 17, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



most requests do not include a referer so anti-hotlinking measures are no longer the smart move
How do you get from point A to point B? Blocking hotlinking was never about denying requests with no referer--you’d have to poke holes right and left for every legitimate search engine, for starters. It’s about requests with wrong referers, of which there are still plenty.
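The distinction drawn here, a missing referer versus a wrong one, can be made explicit in code. The sketch below is illustrative only; `classifyReferer` and its return labels are invented names, and what to do with each class is left to the site owner.

```javascript
// Classify a request by its Referer header relative to the requested URL.
// Returns 'missing', 'malformed', 'same-site' or 'foreign'.
function classifyReferer(referer, requestUrl) {
  if (!referer) return 'missing'
  let refHost
  try {
    refHost = new URL(referer).hostname
  } catch (err) {
    return 'malformed' // new URL() throws on strings that are not absolute URLs
  }
  return refHost === new URL(requestUrl).hostname ? 'same-site' : 'foreign'
}

console.log(classifyReferer('https://other.example/gallery', 'https://example.com/images/a.jpg')) // prints "foreign"
```

Classic hotlink protection blocks only the 'foreign' class; the script in this thread goes further and also blocks 'missing'.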

3zero

11:10 pm on Jul 17, 2018 (gmt 0)



You can always block the Cloudflare bot that caches the hotlinked image from your images files:


Now you're talking... great idea, I'll look into it.

3zero

11:16 pm on Jul 17, 2018 (gmt 0)



Hi Lucy,

This is designed to tackle the specific problem of sites that hotlink, block referrers and then offer free downloads of your image files, on your bandwidth. I am not happy about filtering blank referrers and potentially blocking users from images, which is why I published this and opened it up for contributions. However, at present it's the only way I can think of to resolve this nasty issue.

keyplyr

3:27 am on Jul 18, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The Cloudflare bot crawls the original site with the image and caches it. Once done, you won't see that many referrers from the hot-linking site, since requests for that image will now be pointed toward the Cloudflare caching system.

So it's important to catch the hotlinker early, before all this gets hidden. The hotlinking damage (depending on how you look at it) continues to exist, since they now have your unique image and can do whatever they want with it. It also dilutes branding, since Google Image Search and others will now have an additional copy of this image and you won't be the only source.

3zero

10:35 pm on Jul 18, 2018 (gmt 0)



Thanks for your help with this, Keyplyr, really appreciated. OK, I think this should do the trick. This is the latest script:



addEventListener('fetch', event => {
event.respondWith(fetchAndApply(event.request))
})

async function fetchAndApply(request) {
let referer = request.headers.get('Referer')
let urlreq = new URL(request.url).hostname

if (request.headers.get('user-agent')) {
// user-agent strings are mixed case (e.g. "Googlebot"), so compare in lower case
const ua = request.headers.get('user-agent').toLowerCase()
const allowedAgents = ['googlebot', 'applebot', 'bingbot', 'pinterest', 'facebookexternalhit', 'facebook', 'twitter', 'googleimageproxy', 'yandex']
if (allowedAgents.some(token => ua.includes(token))) {
return fetch(request)
} else {



if (referer) {


// Check the *referring* host against the whitelist (request.url is always our own host)
const refHost = new URL(referer).hostname
const allowedHosts = ['google', 'apple', 'bing', 'twitter', 'facebook', 'yandex']
if (refHost !== new URL(request.url).hostname && !allowedHosts.some(name => refHost.includes(name))) {
// Hosts don't match. This is a hotlink and its not a friendly search bot or social media
return new Response('Sorry, this image is not available at this time.... ', {

status: 403,
})

} else {
return fetch(request)

}
// no referer
} else {
console.log('referer',referer)
return new Response('Sorry, this image is not available at the moment.', {
status: 403,
})
// close else
}
}

} else { return new Response('Sorry, this image is not available at the moment.', {
status: 403,
})
}
}


And then, to prevent other sites caching your images on Cloudflare, I put this in httpd.conf:


<If "%{HTTP_REFERER} =~ /yourdomain/">
</If>
<Else>
<FilesMatch "\.(ico|pdf|flv|jpg|jpeg|png|gif)$">
Header set Cache-Control "private, no-cache, max-age=0"
Header set Pragma "no-cache"
</FilesMatch>
</Else>


Seems to have worked: all images are down on the hotlinking sites :)

keyplyr

11:40 pm on Jul 18, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Now you just need to check every single App, every Social Media site, every browser, every proxy, every company firewall, every Search Engine thumbnail, every Search Engine image search, every directory screenshot, every translation service, ad infinitum... to see if your images are missing.

3zero

11:55 pm on Jul 18, 2018 (gmt 0)



Now you just need to check every single App, every Social Media site, every browser, every proxy, every company firewall, every Search Engine thumbnail, every Search Engine image search, every directory screenshot, every translation service, ad infinitum... to see if your images are missing.


LOL, you're funny. Why would I need to do that? This system whitelists where images are allowed to show. Admittedly it's not yet a comprehensive list, but if they ain't on the list they ain't getting in. Apple, Pinterest, Twitter, Google, Facebook and Bing isn't a bad start though....

keyplyr

11:58 pm on Jul 18, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Because, as I said up above, I have done all this. It doesn't work as effectively as you think.

3zero

1:13 am on Jul 19, 2018 (gmt 0)



You know what, the next person who spams my webform, I'm gonna write some code to eliminate all bot submissions without captcha and release it free!

3zero

12:00 am on Jul 20, 2018 (gmt 0)



UPDATE: The script now also lets browser users with no referer view images. It sets a cookie when a user visits a page on YOUR site, and then allows images to be shown even if the referer is empty. The script should now be deployed to the entire site.



addEventListener('fetch', event => {
event.respondWith(fetchAndApply(event.request))
})

async function fetchAndApply(request) {
let referer = request.headers.get('Referer')
let urlreq = new URL(request.url).hostname
let url = new URL(request.url).pathname
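// Only requests outside the image directory are fetched here. Image requests are
// fetched further down, after the checks, so a blocked hotlink costs no origin bandwidth.
if (url.indexOf('/IMAGES/') === -1){
let response = await fetch(request)
// Copy the response so that its headers can be modified
response = new Response(response.body, response)
console.log('cookies',url)
// Path=/ makes the browser send the cookie with requests under /IMAGES/ too;
// append() keeps any Set-Cookie header the origin already sent
response.headers.append("Set-Cookie", "images=true; Path=/")
return response
}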
if (url.startsWith('/IMAGES/')) {


let cookies = request.headers.get('Cookie') || ''
if (cookies.includes("images=true")) {
// Its a user on your site let them through.
return fetch(request)
} else {
if (request.headers.get('user-agent')) {
// user-agent strings are mixed case (e.g. "Googlebot"), so compare in lower case
const ua = request.headers.get('user-agent').toLowerCase()
const allowedAgents = ['googlebot', 'applebot', 'bingbot', 'pinterest', 'facebookexternalhit', 'facebook', 'twitter', 'googleimageproxy', 'yandex']
if (allowedAgents.some(token => ua.includes(token))) {
return fetch(request)
} else {



if (referer) {


// Check the *referring* host against the whitelist (request.url is always our own host)
const refHost = new URL(referer).hostname
const allowedHosts = ['google', 'apple', 'bing', 'twitter', 'facebook', 'yandex']
if (refHost !== new URL(request.url).hostname && !allowedHosts.some(name => refHost.includes(name))) {
// Hosts don't match. This is a hotlink and its not a friendly search bot or social media
console.log('referer',url)
return new Response('Sorry, this image is not available at this time.... ', {

status: 403,
})

} else {
return fetch(request)

}
// no referer
} else {
console.log('referer',url)
return new Response('Sorry, this image is not available at the moment.', {
status: 403,
})
// close else
}
}

} else { return new Response('Sorry, this image is not available at the moment.', {
status: 403,
})
}
}
} else { return fetch(request)

}
}


Code for httpd.conf to remove existing cached images on Cloudflare (can be removed after 1 month):


<If "%{HTTP_REFERER} =~ /yourdomain/">
</If>
<Else>
<FilesMatch "\.(ico|pdf|flv|jpg|jpeg|png|gif)$">
Header set Cache-Control "private, no-cache, max-age=0"
Header set Pragma "no-cache"
</FilesMatch>
</Else>
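One detail of the cookie check in the script above: `cookies.includes("images=true")` is a substring match, so a cookie such as `myimages=true` would also pass. A stricter test might look like the sketch below; `hasCookie` is a made-up helper name, not part of the Workers API.

```javascript
// True when the Cookie header contains exactly name=value as one of its pairs.
function hasCookie(cookieHeader, name, value) {
  return (cookieHeader || '')
    .split(';')
    .map(part => part.trim())
    .some(part => part === `${name}=${value}`)
}

console.log(hasCookie('session=abc; images=true', 'images', 'true')) // prints true
console.log(hasCookie('myimages=true', 'images', 'true'))            // prints false
```

Either way, any client can send this cookie by hand, so it marks a likely site visitor rather than proving one.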

3zero

12:13 am on Jul 20, 2018 (gmt 0)



I think I have now gone as far as I can with the script. I am now working on a new script to protect forms without the use of CAPTCHA, which I will share when it's working. Thanks for the feedback and suggestions.

Steven29

8:06 pm on Jul 24, 2018 (gmt 0)



I think you have something confused. How is that going to block hotlinking of your images? That is checking the referrer of images loading on the specific page with that JavaScript. What happens when the image is linked from elsewhere? You're basically checking the referrer of people on your website.

3zero

8:59 pm on Jul 25, 2018 (gmt 0)



Hello Steven29 thanks for the interest in the script. Can I suggest you read the accompanying post and the comments within the javascript..... I have highlighted the relevant part below:


if ((new URL(referer).hostname !== new URL(request.url).hostname) ......
// Hosts don't match. This is a hotlink ......
return new Response('Sorry, this image is not available at this time.... ', {
status: 403,
})


Unfortunately the other script I was writing for forms didn't work out, sorry bout that....

NickMNS

9:23 pm on Jul 25, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Where exactly will this script be placed?

If you have an image at example.com/images/image1.jpg that appears on page example.com/my-image-one-page.html and your script appears on my-image-one-page.html page. Then when someone links directly to example.com/images/image1.jpg instead of the page, your script will never be called and thus it is rendered useless.

Also, most bots do not execute JS, so the likelihood of blocking bots is low whereas the likelihood of blocking legitimate users (that do execute your script) is high.

3zero

9:33 pm on Jul 25, 2018 (gmt 0)



Hi NickMNS, thanks for the interest. The script runs on Cloudflare Workers, as I said at the start, and should be implemented for all pages in the final version. This is JavaScript for routing, not on-page JavaScript; perhaps that's why people are getting confused. This link may help.

[cloudflare.com...]

Just to add: because the code is bound to neither the user nor the browser, it will stop bots, it will stop hotlinking, and it will always be called. Only in the event of a user blocking both referers and cookies will it block images. A whitelist of user agents and hosts is also included, to prevent legitimate bots and social media from being blocked. This can be added to.
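The order of checks described in this post can be written as one pure function, which makes it easier to see who gets through. This is a sketch of the intended logic rather than the deployed script: `decide`, its parameter names and both token lists are assumptions chosen for illustration.

```javascript
// Assumed allowlists; extend or verify before relying on them.
const AGENT_TOKENS = ['googlebot', 'bingbot', 'facebookexternalhit', 'twitterbot', 'pinterest']
const REFERER_TOKENS = ['google', 'bing', 'twitter', 'facebook', 'pinterest']

// Returns 'allow' or 'block' for a request described by plain values.
function decide({ path, host, cookie, userAgent, referer }) {
  if (!path.startsWith('/IMAGES/')) return 'allow'            // only the image directory is filtered
  if ((cookie || '').includes('images=true')) return 'allow'  // visitor has already loaded a page
  if (!userAgent) return 'block'
  const ua = userAgent.toLowerCase()
  if (AGENT_TOKENS.some(token => ua.includes(token))) return 'allow'
  if (!referer) return 'block'                                // no cookie and no referer
  let refHost
  try {
    refHost = new URL(referer).hostname
  } catch (err) {
    return 'block'                                            // unparseable referer
  }
  if (refHost === host) return 'allow'
  return REFERER_TOKENS.some(token => refHost.includes(token)) ? 'allow' : 'block'
}
```

Note that matching host names by substring is loose: a referer host that merely contains "google" somewhere would also be allowed.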


[edited by: not2easy at 12:34 am (utc) on Jul 26, 2018]
[edit reason] See ToS #12 [/edit]

3zero

11:58 pm on Jul 25, 2018 (gmt 0)



NickMNS, you are a respected part of the forum and I have read and appreciated your posts, so having read my reply it would be appreciated if you confirmed that EVERYTHING you said about the script was in fact complete bollox.

For the record, the amount of negativity on here means I will probably post elsewhere in future. It's ironic the only plus votes are for unfounded criticism. I think I'm swimming with sharks here... If so, I'm afraid, guys, the orcas are coming and the scripts are gonna .... well, you'll see!

3zero

2:28 am on Jul 26, 2018 (gmt 0)



Ok I am leaving this script for now.

Things to add:
Automate dmca requests where hotlink occurs