The raw log files also show these accesses from the domain and sub-domain (e.g., "www.evildomain.com/evilsubdomain/evilpage.html"). But (if I'm reading the records correctly) it shows a variety of different DNS source numbers (the DNS numbers at the end of the record), so I'm not sure what that's about. I've looked up the domain on WHOIS and it shows two server names but no DNS. How is that possible? If I could find a DNS I'd just use that.
I just added code in my .htaccess today to block that domain specifically from accessing graphics files, but I've had problems in the past trying to prevent hotlinking (too much or too little of a good thing) so I'm wary of it.
I'm also puzzled that the first try didn't keep them out. (See the sample code below.) Can anyone see where I went wrong and how these guys are still apparently getting through? Is it because the request is coming from a subdomain and I've only included the domain? Should I be including the subdomain in the code?
Here's the equivalent of the code I used in my first attempt. The second line is where I'm also blocking another DNS, and then I've put the hotlinking domain name in there:
order deny,allow
deny from 999.999.99.
deny from .evildomain.com
allow from all
Thanks for any help about this!
Starhugger
The accesses come from the end-user's web browser, not from the web site that is hot-linking to you.
The reason you know which site is hot-linking is that the browser sends you a "referrer" field (the HTTP Referer header), which tells you which page linked to your images. But blocking the referring site itself, by domain name or IP address, is going to do nothing for you, because that site never requests your files. The DNS issues with the referring site, then, are moot.
There are solutions to this, which hopefully others can help you with. The solution is not, however, blocking the referring site's own address.
You need something (most likely an Apache module) that will sit in front of all of your pages and screen for the offensive referrer. It will then either return a 404 error or a substitute image stating that the referrer has hot-linked to your images without permission.
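As a rough sketch of that screening step (assuming mod_rewrite is enabled and .htaccess overrides are allowed, and using the thread's placeholder evildomain.com), rules along these lines would refuse image requests only when the Referer names the hot-linking site or one of its subdomains. This version answers with 403 Forbidden rather than a 404 or a substitute image, and the file-extension list is an assumption to adjust to the image types you serve.

```apache
RewriteEngine On
# True when the Referer is a page on evildomain.com or any of its subdomains
RewriteCond %{HTTP_REFERER} ^https?://([^/]+\.)?evildomain\.com(/|$) [NC]
# Answer matching image requests with 403 Forbidden instead of the file
RewriteRule \.(jpe?g|gif|png)$ - [F,NC]
```

Because the test is on the Referer header rather than on the visitor's address, it doesn't matter which IPs the viewers come from. Requests with no Referer, or with a different one, are served normally.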
See if the following set-up works a little better:
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://72\.18\.130\.37/ [NC]
RewriteCond %{HTTP_REFERER} !^http://64\.233\.161\.104/ [NC]
RewriteCond %{HTTP_REFERER} !^http://64\.233\.167\.104/ [NC]
RewriteCond %{HTTP_REFERER} !^http://okaytranslationsite\.com/ [NC]
RewriteCond %{HTTP_REFERER} !^http://(www\.)?yourdomain\.com/ [NC]
RewriteCond %{HTTP_REFERER} !^http://(www\.)?friendlydomain\.com/ [NC]
RewriteCond %{HTTP_REFERER} !^http://(www\.)?forumyouposton\.com/ [NC]
RewriteRule .*\.(jpg|gif)$ /hotlink.png [R,NC]
Of course, you'll want to replace the numeric IPs (for search-engine caches) and the faked URLs with the domain names and IPs that apply to your situation.
You can find lots of other information by searching for ".htaccess hotlink" (without the quote marks).
Eliz.
jtara wrote: The accesses come from the end-user's web browser, not from the web site that is hot-linking to you.
Oh okay, so that must be the different DNS's I'm seeing then. That would make sense that they show the origin of the viewer who accesses the hotlinking site.
I wonder why WHOIS doesn't show a DNS for that website though? I find that very strange.
Eliz/Stapel wrote: You're not wanting to block people from evildomain.com from accessing your site (that is, from following links to your site). You're wanting to block images from being served from domains other than what you specify.
Yes, that makes sense, and I think I see the subtle difference now. Actually, if "evilpeople" had bothered to include a link to my site along with the graphic, I might not mind so much. It's quite bizarre actually; the graphic they chose is one that's way in the basement of my site's directory system and is just a very vanilla navigation bar image that has nothing to do with their site topic. (?!) Weird! But I guess spam sites aren't trying to follow any logic other than making viewers bored enough to click on their ads.
Thanks for the sample code. I tried a similar code a few months back and found that the translation sites couldn't access my site either, and I don't want to block them. But I don't know how to track them all down and ensure that they all have access through the .htaccess, and I really don't have the time to research it right now. So I took the code off again.
I was trying to block specific domains/DNS's that have pages that hotlink to my graphics, instead of "everyone-except." I was under the impression that I could do that...? I'm sure I found a site recently that showed how to block GET requests from specific domains, but I couldn't find it in my links.
That other DNS that I'm blocking (in the sample code I posted) is a rather obnoxious bot that's been making the rounds lately. So far I haven't seen it trying to access my site since I put up the code. But it sounds like it's a totally different situation between a bot trying to crawl my site versus a site trying to serve my files remotely.
Starhugger
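On the address-based block in the first post: with "order deny,allow", Apache evaluates the Deny lines first and then lets any matching Allow override them, so a trailing "allow from all" re-admits everyone. A minimal sketch of the usual arrangement for shutting out a single client such as a bot, assuming the Apache 2.2-style access directives used in that post and keeping its placeholder address:

```apache
# With this order, Allow is evaluated first and a matching Deny wins
Order allow,deny
Allow from all
# Placeholder partial address copied from the original post
Deny from 999.999.99.
```

This only affects requests made directly from that address. It does nothing about hotlinking, because those requests come from visitors' browsers.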
I nip the problem of hotlinked images in the bud by banning the various crawlers from indexing images via robots.txt. Firstly I ban my /images/ folder and then I ban the dedicated image/multimedia search bots so it looks like this (in part):
User-agent: *
Disallow: /images/

User-agent: ConveraMultiMediaCrawler
User-agent: Googlebot-Image
User-agent: psbot
User-agent: Web.Image.Collector
User-agent: Yahoo-MMCrawler
Disallow: /
You might need to submit your robots.txt to Google's removal service, or it could take a very long time for your graphics to expire from their index: [services.google.com:8882...]
Then, to deter those who found our images by visiting the website by normal means, I have placed anti-hotlinking mod_rewrite rules in the images directory, somewhat similar to stapel's (above), with specific exceptions for translation services and SE caches.
starhugger said: I tried a similar code a few months back and found that the translation sites couldn't access my site either, and I don't want to block them.
RewriteCond %{HTTP_REFERER} !^http://babelfish.altavista.com/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://translate.google.com/.*$ [NC]
Then their users will be able to see the images in the translated page.
The same deal works with search-engine caches. If you don't mind being viewed in the caches (which often store the HTML but not the graphics), then put those IPs in the "exceptions" list.
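Putting the exceptions together, here is a sketch of how the list might look in context. The conditions have to sit directly above the RewriteRule they guard. In this sketch, yourdomain.com and the cache IP are placeholders reused from earlier in the thread, and the 403 response and extension list are assumptions, one option among several.

```apache
RewriteEngine On
# Let through requests that send no Referer at all
RewriteCond %{HTTP_REFERER} !^$
# Your own pages
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?yourdomain\.com/ [NC]
# Translation services
RewriteCond %{HTTP_REFERER} !^http://babelfish\.altavista\.com/ [NC]
RewriteCond %{HTTP_REFERER} !^http://translate\.google\.com/ [NC]
# A search-engine cache addressed by IP (placeholder)
RewriteCond %{HTTP_REFERER} !^http://64\.233\.161\.104/ [NC]
# Every other referrer asking for an image gets 403 Forbidden
RewriteRule \.(jpe?g|gif|png)$ - [F,NC]
```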
And since legitimate sites (such as Google and Babelfish) will probably have more users, and more consistent IPs, than the spammer-scammer-scraper sites, it should be a lot easier to make "exceptions" for the "good guys" than to try to hunt down all the bad guys. At least, this has been my experience.
By the way, with respect to search-engine image-caches: People seem to think that, if it's in the image-cache, it's "fair game". I have had to block the image-bots in order to cut down on the plagiarism. If your images are custom-designed, you might want to do the same.
Eliz.