.htaccess blacklist certain domains

         

burcot

2:31 am on Sep 20, 2009 (gmt 0)

10+ Year Member



Hi All,

Recently I've seen an increase in the number of times my landing pages are stolen. I don't actually mind that in itself, but the number of people who don't even remove my tracking code is scary, and it is really messing up my stats. As the majority are behind whois guard, contacting them is impossible, so I'm currently using this .htaccess code to prevent their sites from hotlinking to my server.

## SITE REFERRER BANNING
RewriteEngine on
# Options +FollowSymlinks

RewriteCond %{HTTP_REFERER} domain1\.com [NC,OR]
RewriteCond %{HTTP_REFERER} domain2\.com [NC]
RewriteRule .* - [F]

This works pretty nicely as the server will send back a 404, but I was thinking... we could make this a little more fun, and maybe teach the thieves to remove the tracking code if they can't think up their own landing pages.

I'm trying to figure out how to send back an image, or have a bunch of JavaScript snow fall down their sites, or show a JavaScript alert, as they are linking to a JavaScript file on my server - that would teach them a lesson :)

Any ideas how I would be able to do this via .htaccess?

Thanks
Lee

jdMorgan

4:52 am on Sep 20, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That code doesn't return a 404; it returns a 403-Forbidden response, which is the proper response.

Playing games past that is a waste of your time and a waste of internet bandwidth.

Also remember who suffers -- Not the hotlinker, but the people who visit the hotlinker's site. And in most cases, they're utterly blameless, so why pick on them or confuse them?

We deal in protocols, markup languages, Web standards, and coding. Playing jokes on likely-innocent Web users has no place here. Your time will be better spent doing something constructive, such as blocking the servers that are stealing your pages and causing the better part of this problem in the first place.

Jim

burcot

8:39 am on Sep 20, 2009 (gmt 0)

10+ Year Member



Thanks for the feedback Jim, and you're right. Mostly, when my landing pages are stolen, they change the content and products to take a more negative approach. So it would be nice to warn these users with an alert, but I see where you're coming from.

Cheers,

Lee

jdMorgan

2:52 pm on Sep 20, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you can identify the servers used to scrape your content, then you can block them using mod_access or by rules similar to those anti-hotlinking rules above, but testing REMOTE_ADDR and/or REMOTE_HOST (the latter only if supported by your server config and if you're willing to accept the performance implications).
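As a rough sketch of the two approaches just described (not tested against any particular server), the rules might look like the following. The addresses are placeholders from the reserved documentation ranges, not real scrapers, and the mod_access directives use the older Order/Allow/Deny syntax, which assumes your host permits them in .htaccess.

```apache
## SCRAPER IP BANNING (sketch; 192.0.2.15 and 203.0.113.* are placeholder addresses)
RewriteEngine on

# mod_rewrite version: match one exact address, or a whole block by prefix
RewriteCond %{REMOTE_ADDR} ^192\.0\.2\.15$ [OR]
RewriteCond %{REMOTE_ADDR} ^203\.0\.113\.
RewriteRule .* - [F]

# mod_access version of the same idea: allow everyone, then deny the listed hosts
Order Allow,Deny
Allow from all
Deny from 192.0.2.15
Deny from 203.0.113.0/24
```

Only one of the two versions is needed. The mod_rewrite form fits in alongside the existing anti-hotlinking rules, while the mod_access form is simpler to read when the list is nothing but addresses.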

The main point is to fix that problem at the earliest opportunity, instead of chasing the resulting hotlinking problem. You can, however, use the hotlink accesses to identify the referring domains, look them up to get their IP addresses (or IP address ranges), and block those. This does assume that the same server used to serve your content is used to scrape your content, but this is likely the case because scrapers tend to be lazy (and sloppy), and they do things the easiest way possible. Because of this, they also tend to pick the low-hanging fruit, so making your site harder to scrape will likely prevent many problems in the future.

Look into access control using both whitelists and blacklists of both user-agents and IP addresses. Look at HTTP header validation. Look at bot-traps based on robots.txt violations and on frequency/speed-of-access. Using all of these techniques can head off many scraper-related problems from the start.
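A minimal illustration of a user-agent blacklist combined with one very simple header check might look like this. The user-agent strings are only examples of tools sometimes used for bulk fetching, not a vetted list, and treating a blank User-Agent header as suspicious is an assumption that may not suit every site.

```apache
## USER-AGENT BLACKLIST AND BASIC HEADER VALIDATION (sketch)
RewriteEngine on

# Blacklist: illustrative user-agent substrings, matched case-insensitively
RewriteCond %{HTTP_USER_AGENT} (libwww-perl|HTTrack) [NC,OR]
# Header validation: real browsers normally send a non-empty User-Agent
RewriteCond %{HTTP_USER_AGENT} ^$
RewriteRule .* - [F]
```

Bot-traps and rate-based checks generally need a script or log processing behind them, so they are not shown here; a static .htaccess file can only enforce a list that something else maintains.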

The problem with serving alternate images is that it tells the hotlinking Webmaster that you've caught him.

That can be good if he decides to pack it in and go pick on someone else, but it can also be bad if he's determined and simply decides to copy all of your images as well, in order to avoid your anti-hotlinking methods.

When dealing with on-line nuisances and criminals, remember that knowledge is power. Don't empower your enemies. Tell them nothing. (In some cases, I have intentionally returned false server responses and status codes that make it look like my site is badly broken, just to mislead and get rid of unwelcome 'visitors'.)
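One possible way to return a misleading status code, offered only as a sketch, is to reuse the referrer test from the first post and answer with a server-error status instead of 403. This assumes an Apache version whose mod_rewrite accepts non-redirect codes in the R flag; domain1.com is the same placeholder used earlier in the thread, and 503 is an arbitrary choice.

```apache
## MISLEADING STATUS FOR UNWELCOME REFERRERS (sketch)
RewriteEngine on

RewriteCond %{HTTP_REFERER} domain1\.com [NC]
# With a non-3xx code, the substitution is dropped and only the status is returned
RewriteRule .* - [R=503,L]
```

Check the result with a header viewer before relying on it, since a custom ErrorDocument for that status would change what the visitor sees.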

Jim