Forum Moderators: phranque
Recently I've seen an increase in the number of times my landing pages are stolen. I don't actually mind that much, but the number of people who don't even remove my tracking code is scary, and it's really messing up my stats. As the majority are behind WhoisGuard, contacting them is impossible, so I'm currently using this .htaccess code to prevent their sites from hotlinking to my server.
## SITE REFERRER BANNING
RewriteEngine on
# Options +FollowSymlinks
RewriteCond %{HTTP_REFERER} domain1\.com [NC,OR]
RewriteCond %{HTTP_REFERER} domain2\.com [NC]
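# Deny (403 Forbidden) any request referred by the listed domains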
RewriteRule .* - [F]
This works pretty nicely as the server sends back a 403 Forbidden, but I was thinking... we could make this a little more fun, and maybe teach the thieves to remove the tracking code if they can't think up their own landing pages.
I'm trying to figure out how to send back an image, have a bunch of JavaScript snow fall down their sites, or pop up a JavaScript alert, since they are linking to a JavaScript file on my server - that would teach them a lesson :)
Any ideas how I could do this via .htaccess?
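Something like this is roughly what I have in mind -- rewriting hotlinked requests for the tracking script to a prank script instead (tracking.js and snow.js are just placeholder names, not my real files):
## Sketch only: serve a prank script to hotlinkers instead of the real tracker
RewriteEngine on
RewriteCond %{HTTP_REFERER} domain1\.com [NC,OR]
RewriteCond %{HTTP_REFERER} domain2\.com [NC]
# Silently serve snow.js whenever a listed referrer requests tracking.js
RewriteRule ^tracking\.js$ snow.js [L]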
Thanks
Lee
Blocking them is enough; playing games past that is a waste of your time and a waste of Internet bandwidth.
Also remember who suffers -- not the hotlinker, but the people who visit the hotlinker's site. And in most cases, they're utterly blameless, so why pick on them or confuse them?
We deal in protocols, markup languages, Web standards, and coding. Playing jokes on likely-innocent Web users has no place here. Your time will be better spent doing something constructive, such as blocking the servers that are stealing your pages and causing the better part of this problem in the first place.
Jim
The main point is to fix the scraping problem at the earliest opportunity, instead of chasing the resulting hotlinking problem. You can, however, use the hotlink accesses to identify the referring domains, look them up to get their IP addresses (or IP address ranges), and block those. This assumes that the same server hosting the stolen copies is also the one used to scrape your content, but that is likely the case because scrapers tend to be lazy (and sloppy), and they do things the easiest way possible. Because of this, they also tend to pick the low-hanging fruit, so making your site harder to scrape will likely prevent many problems in the future.
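As a sketch only, blocking an offending address or range in .htaccess could look something like this (the addresses below are documentation examples, not real scrapers):
## Sketch: block a scraping host by IP address or range (example addresses)
Order Allow,Deny
Allow from all
# Addresses identified by looking up the hotlinking referrer domains
Deny from 192.0.2.10
Deny from 198.51.100.0/24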
Look into access control using both whitelists and blacklists of user-agents and IP addresses. Look at HTTP header validation. Look at bot traps based on robots.txt violations and on frequency/speed of access. Using all of these techniques can head off many scraper-related problems from the start.
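A user-agent blacklist, for instance, might be sketched like this (the strings below are only examples of agents some sites choose to block):
## Sketch: deny requests from blacklisted user-agents (example strings)
SetEnvIfNoCase User-Agent "libwww-perl" bad_bot
SetEnvIfNoCase User-Agent "WebCopier" bad_bot
Order Allow,Deny
Allow from all
Deny from env=bad_bot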
The problem with serving alternate images is that it tells the hotlinking Webmaster that you've caught him.
That can be good if he decides to pack it in and go pick on someone else, but it can also be bad if he's determined and simply decides to copy all of your images as well, in order to avoid your anti-hotlinking methods.
When dealing with on-line nuisances and criminals, remember that knowledge is power. Don't empower your enemies. Tell them nothing. (In some cases, I have intentionally returned false server responses and status codes that make it look like my site is badly broken, just to mislead and get rid of unwelcome 'visitors'.)
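For example, answering a known bad address with a misleading status such as 410 Gone can be done with mod_rewrite (the address here is a placeholder):
## Sketch: answer a known scraper address with a misleading status code
RewriteEngine on
RewriteCond %{REMOTE_ADDR} ^192\.0\.2\.10$
# The [G] flag sends 410 Gone, suggesting the content no longer exists
RewriteRule .* - [G]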
Jim