

Is this hotlinking, or...?

         

Lorel

4:52 pm on Jun 8, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I noticed that a client's site started getting a lot of traffic, so I did some checking. There are 3 sites (they appear to be search sites), all mirrors of each other, that are linking to one of my client's images. I set up code to prevent hotlinking of images, but the traffic hasn't stopped. Did I set this up right?

RewriteCond %{HTTP_REFERER} !^http://(.+\.)?example\.com/ [NC]
RewriteCond %{HTTP_REFERER} !^$
RewriteRule .*\.(jpe?g|gif|bmp|png)$ - [F]


Also, can someone tell me whether this is indeed hotlinking, or what exactly they are doing? The second link below is a thumbnail of the image on their site, which is linked to the larger image on the client's site.

There is so much traffic that I really doubt people are clicking on the image on the other site. Also, the visitor source on the client's site meter shows up as "Unknown", so they aren't using the link to get to the site. That is why I suspect it is a hotlink to the image.

<td align="left" class="imageresulttd" width="20%"><a
href="http://www.example.com/kitchen-dining.html"><img class="resultimage"
src="SCRAPERSITE.com/nimage/97f07ca8ad4741fa" border="0"></a><br>
<span class="imagetitle"><b>kitchen</b>1 jpg</span>
<br>
<span class="imagedetails">407x480&nbsp;-&nbsp;24.51K&nbsp;-&nbsp;jpeg</span>
<br>
<span class="imagedomain">www.example.com</span> <br>
<span class="imagepreview">[&nbsp;<a class="imagepreviewlink" rel="lightbox['result']" title="Image Results" href="http://www.example.com/images/kitchen1.jpg">View full size</a>&nbsp;]</span>
</td>

jdMorgan

6:16 pm on Jun 8, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That anti-hotlinking code should work, if not very efficiently.

RewriteCond %{HTTP_REFERER} ^.
RewriteCond %{HTTP_REFERER} !^http://([^./]+\.)*example\.com [NC]
RewriteRule \.(jpe?g|gif|bmp|png)$ - [F]

would be more efficient.
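To check which requests these rules would actually forbid, the three lines can be modeled in Python. This is a hypothetical test harness, not part of Apache; `example.com` stands in for your own domain, and the patterns are copied from the rules above.

```python
import re

# Hypothetical Python model of the three mod_rewrite lines above, for testing
# which requests they would forbid. "example.com" stands in for your domain.
IMAGE_RE = re.compile(r"\.(jpe?g|gif|bmp|png)$")  # RewriteRule pattern (no [NC], so case-sensitive)
NON_EMPTY_RE = re.compile(r"^.")  # RewriteCond %{HTTP_REFERER} ^.
OWN_SITE_RE = re.compile(r"^http://([^./]+\.)*example\.com", re.IGNORECASE)  # negated [NC] condition


def would_forbid(path: str, referer: str) -> bool:
    """Return True if the rules would answer 403 Forbidden (the [F] flag)."""
    return (
        IMAGE_RE.search(path) is not None           # request is for an image
        and NON_EMPTY_RE.search(referer) is not None  # a referrer was sent
        and OWN_SITE_RE.search(referer) is None       # and it is not our own site
    )
```

Because the referrer pattern is anchored at `^http://`, a Yahoo image-search referrer that merely mentions `www.EXAMPLE.com` in its query string is still treated as foreign, while an empty referrer is always let through.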

You really need to look at your raw server access logs to find out what's going on here. It is likely that the requests are being made with no referrer, and therefore are "allowed" by your code (as they must be).

In cases like this, it's often useful to look at large numbers of the hotlinked-image requests in your raw server access logs to see if there is any 'pattern' to the requesting IP addresses. If they all come from one ISP or hosting company, for example, you could block them by IP address range.
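As a rough sketch of that kind of pattern-hunting, the Python below counts image requests per IP and per /24 range in Apache's "combined" log format (the format of the log lines quoted later in this thread). The /24 grouping is an assumption; real ISP and hosting allocations vary in size, so confirm a range with a WHOIS lookup before blocking it.

```python
import re
from collections import Counter
from ipaddress import ip_address, ip_network

# Apache "combined" log format: ip ident user [time] "request" status size "referer" "agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"\S+ (?P<path>\S+)[^"]*" \d{3} \S+ "[^"]*" "[^"]*"'
)
IMAGE_RE = re.compile(r"\.(jpe?g|gif|bmp|png)$", re.IGNORECASE)


def image_requesters(lines):
    """Count image requests per IP and per IPv4 /24 range."""
    per_ip, per_range = Counter(), Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m or not IMAGE_RE.search(m["path"]):
            continue  # unparsable line or not an image request
        per_ip[m["ip"]] += 1
        try:
            addr = ip_address(m["ip"])
        except ValueError:
            continue  # obscured entries like "***.50.204.164" get no range
        if addr.version == 4:
            per_range[str(ip_network(f"{addr}/24", strict=False))] += 1
    return per_ip, per_range
```

Calling `per_range.most_common(10)` on the result lists the busiest ranges first.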

There are several other useful methods, but selecting the right ones depends on exactly what the problem is...

Jim

Lorel

11:29 pm on Jun 8, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Jim, Thanks for the new and improved code.

Re: checking the raw server logs, I'm on a Mac and have looked several times, to no avail, for a program to do this that doesn't cost too much. Can you recommend one?

wilderness

12:58 am on Jun 9, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Any text editor that offers a "no wrap" option (no automatic line breaks) will work for viewing raw log files.

Hard to believe that Macs don't come standard with a text editor.

Sgt_Kickaxe

1:01 am on Jun 9, 2010 (gmt 0)



It works. The site you're describing first tries to hotlink your image, but if that fails, it hotlinks Google's cache of your image.

Lorel

3:47 pm on Jun 9, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



@wilderness:
Sorry, what I meant was a program to analyze the log files. I'm able to pick up the log files with any text editor.

There have been 64 visitors to the site already this morning, all from the same IP and picking up different images on the site (plus the CSS file), but all hitting one page on the site. They hit the site at about 30 requests per minute, then do it again 25 minutes later with 30 more. It looks like they are getting 404 errors due to the anti-hotlinking code, but it's still registering as traffic on the site meter.
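A burst like this is easy to spot programmatically. The following hypothetical Python sketch buckets combined-format log lines by IP and minute and reports the minutes at or above a threshold; the default threshold of 20 is an arbitrary assumption.

```python
import re
from collections import Counter
from datetime import datetime

# Pull the IP and the bracketed timestamp from the start of each log line.
LINE_RE = re.compile(r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\]')


def bursts(lines, threshold=20):
    """Return {(ip, minute): count} for minutes with at least `threshold` requests."""
    per_minute = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        # Timestamps look like "09/Jun/2010:08:26:32 -0600".
        ts = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
        per_minute[(m["ip"], ts.replace(second=0))] += 1
    return {key: n for key, n in per_minute.items() if n >= threshold}
```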

Re: getting the images from a cache, they are using Yahoo's cache of the image:

***.50.204.164 - - [09/Jun/2010:08:26:32 -0600] "GET /kitchen-dining.html HTTP/1.1" 200 3465 "http://images.search.yahoo.com/images/view?back=http%3A%2F%2Fimages.search.yahoo.com%2Fsearch%2Fimages%3Fp%3Dkitchen%26ni%3D21%26fr2%3Dxpl&w=480&h=407&imgurl=www.EXAMPLE.com%2Fimages%2Fkitchen1.jpg&rurl=http%3A%2F%2Fwww.EXAMPLE.com%2Fkitchen-dining.html&size=25k&name=kitchen1+jpg&p=kitchen&oid=97f07ca8ad4741fa&fr2=xpl&no=5&tt=20621139&ni=21&sigr=11hbhr0dj&sigi=11avlbfb2&sigb=1249tdra5" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 (.NET CLR 3.5.30729)"

Here is another that doesn't go after the Yahoo cached image:

***.50.204.164 - - [09/Jun/2010:08:26:33 -0600] "GET /images/kkclogo3.gif HTTP/1.1" 404 793 "http://www.example.com/kitchen-dining.html" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 (.NET CLR 3.5.30729)"

I searched for the IP address and it produces an error in the browser. I checked it in a server header checker and it says it timed out although it only took a second for the result.

So it won't do any good to block it in .htaccess, correct?

I assume I can't post the IP address here so I blocked part of it out.

Can someone explain what this site is doing? Are they just trying to generate so much traffic that it blocks traffic to my client's site, or is it something more sinister?

jdMorgan

5:04 pm on Jun 9, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Who can know? They're stealing your images and server bandwidth for fun and/or profit... In many cases, it's not worth the bother to worry about it -- Just 403 the requests and move on to more-productive work.

Go ahead and block the IP address or the IP address range and test the results. If you cannot get better info on the IP address(es), then try a different lookup site: ARIN.net for the US, which also provides links to many others (RIPE, APNIC, etc.).
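Before editing .htaccess, it helps to test which addresses a range actually covers. This Python sketch uses the standard `ipaddress` module; the two /24 ranges are assumptions built from the partially obscured IPs posted later in this thread, and the generated lines use Apache 2.2 `Order`/`Deny` syntax (Apache 2.4 uses `Require` instead).

```python
from ipaddress import ip_address, ip_network

# Hypothetical block list: /24 ranges guessed from partially obscured IPs.
BLOCKED = [ip_network(cidr) for cidr in ("60.50.204.0/24", "162.119.238.0/24")]


def is_blocked(ip: str) -> bool:
    """True if `ip` falls inside any blocked range."""
    addr = ip_address(ip)
    return any(addr in net for net in BLOCKED)


def deny_lines(nets):
    """Apache 2.2-style access rules that deny each range and allow everyone else."""
    return ["Order Allow,Deny", "Allow from all"] + [f"Deny from {net}" for net in nets]
```

Printing `"\n".join(deny_lines(BLOCKED))` gives lines you could review and paste, after confirming the ranges with a WHOIS lookup.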

No matter what you do in htaccess or server config files, these requests will always show in your log files and "site meter." The only way to keep them out of your logs is to use custom logging to exclude them (not a good idea, because it will 'hide' all exploits) or to block them by IP address using a firewall.

We don't like to have IP addresses posted here for security reasons (the host might be malicious) and for legal reasons. However, if you obscure only the final digits, we can at least get an idea of the country and the hosting company or ISP involved...

Some webmasters block entire hosting company ranges, or even entire countries. In the first case, why should another server make requests to your server unless it's a search engine or a smart-firewall security scanner? And secondly, if you market country- or region-specific goods, then losing a tiny bit of legitimate traffic from other countries or regions may be acceptable if most of the traffic from those areas is malicious.

Similarly, some of us allow image-bots from Google, etc. to fetch only low-quality versions of images from our sites. Or we serve watermarked images to search engines, with our URLs plastered the whole way across the image (diagonally is good, so it can't be trimmed off easily). The former can be handled by allowing only thumbnail fetches in robots.txt, while the latter is "IP-address-or-user-agent-based content selection" -- or "protective cloaking," if you prefer.
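The user-agent half of "protective cloaking" boils down to choosing a file per request. Below is a hypothetical Python sketch; the bot tokens and the `/images-wm/` directory of watermarked copies are illustrative assumptions. User-Agent strings are trivially spoofed, so this selection alone does not prove a request came from a search engine.

```python
# Illustrative User-Agent substrings for image-search crawlers (an assumption,
# not a complete or authoritative list).
IMAGE_BOT_TOKENS = ("Googlebot-Image", "Yahoo! Slurp", "msnbot-media")


def image_variant(path: str, user_agent: str) -> str:
    """Return the watermarked copy's path for known image bots, else the original path.

    Assumes watermarked copies live in a hypothetical /images-wm/ directory that
    mirrors /images/.
    """
    ua = user_agent.lower()
    if any(token.lower() in ua for token in IMAGE_BOT_TOKENS):
        return path.replace("/images/", "/images-wm/", 1)
    return path
```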

Really, it depends on whether you are trying to protect the rights to your images, prevent server bandwidth wasted on hot-linkers, or both.

Jim

Lorel

5:10 pm on Jun 10, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yesterday I blocked the 2 IP addresses that showed up in the log using Yahoo's image cache and receiving a 404 from the .htaccess file, which tells me they were hotlinking the images. However, now there is a whole new set of IPs doing the same thing.

I tried ARIN.net to see which company owns the IPs, but no info was available. I also tried DNSstuff, and still nothing was available.

I found the 3 sites doing the hotlinking (3 mirrors of each other) by searching for links to the affected page. But when I check their domains with the Reverse IP tool at DNSstuff, their IPs are not available there either.

The URL format for all 3 is the following:
http://example.com/search/images?search=kitchen&type=images

Anyone have another idea on how to block those sites?

One of the numbers is very close to a Google IP, so I suspect they are using a proxy for some of the searches.

Lorel

5:15 pm on Jun 10, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



PS: I'm trying to prevent wasted server bandwidth and hotlinking. The images aren't that important.

wilderness

5:52 pm on Jun 10, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Anyone have another idea on how to block those sites?

One of the numbers is very close to a Google ip so I suspect they are using a proxy for some of the searches.


You may post the three IPs here; HOWEVER, obscure the Class D numbers (the last octet). That will still get others in the ballpark and allow them to help you identify the source.
Example:
123.456.789.zzz

Lorel

6:22 pm on Jun 10, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You may provide the three IP's here


OK, thanks:

59.103.193.***
86.96.227.***
74.78.164.***

wilderness

6:34 pm on Jun 10, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You're going to need to disregard these IP ranges and focus on the User-Agent, the page/image requested, or the referrer to prevent access.

None of the three IP ranges you've provided are used by the major search engines.
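One place to start is the referrer field. This Python sketch tallies the referring hostnames in combined-format log lines so that the dominant outside sources stand out; the regex assumes the standard combined format and skips empty (`-`) referrers.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Combined format up to the referrer field: ip ident user [time] "request" status size "referer"
LOG_RE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "\S+ (?P<path>\S+)[^"]*" \d{3} \S+ '
    r'"(?P<referer>[^"]*)"'
)


def referring_hosts(lines):
    """Count requests per referring hostname, ignoring requests with no referrer."""
    hosts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m or m["referer"] in ("", "-"):
            continue
        hosts[urlsplit(m["referer"]).hostname or "(unparsable)"] += 1
    return hosts
```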

jdMorgan

10:38 pm on Jun 10, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



59.103.193.*** Pakistan
86.96.227.*** United Arab Emirates
74.78.164.*** RoadRunner (New York State)

I suspect that you're not using the DNS tools properly.

An rDNS lookup --a reverse-DNS lookup-- takes an input IP address and tells you the ISP, hosting company, or specific hostname (loosely, the domain name) -- sometimes the latter is not available, or available only by using the ISP/host's rwhois servers as given in the results.

A forward DNS lookup takes an input hostname and returns the IP address(es) assigned to it.
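The two lookups can be combined into a "forward-confirmed reverse DNS" check: look up the hostname for an IP, then confirm that the hostname resolves back to the same IP. A minimal Python sketch using the standard `socket` module follows; the resolver functions are parameters only so that the logic can be tested without network access. WHOIS queries against ARIN, RIPE, or APNIC are a separate protocol and are not shown.

```python
import socket


def forward_confirmed_rdns(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Return the hostname for `ip` if reverse and forward DNS agree, else None.

    reverse(ip) and forward(hostname) both return (hostname, aliases, addresses).
    """
    try:
        hostname = reverse(ip)[0]       # rDNS: IP -> hostname
        addresses = forward(hostname)[2]  # forward DNS: hostname -> IPs
    except (socket.herror, socket.gaierror):
        return None  # no PTR record, or the hostname does not resolve
    return hostname if ip in addresses else None
```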

If you use ARIN, for example, it will either give you the ISP, hosting company, or hostname, or it will give you a message that says, in essence, "That is not a North American address. Try the European lookup service at RIPE.net." Or it names the Asian lookup at APNIC.net, etc. You have to use those others if ARIN says it doesn't have the info.

I'm looking at your reports so far, and I have to ask, if you get an image referral from
http://images.search.yahoo.com/images/view?back=http...
is that referral not being blocked/403'ed by the code I posted above?

If not, then there's a problem with the .htaccess file, the .htaccess file's location relative to your image directory, or the code snippet's position in that .htaccess file.

Your log entry above shows a 200-OK response for one of these referred requests, but the request is for an HTML page, not an image, and so is not affected by the anti-hotlinking code I posted above.

Jim

Lorel

4:54 pm on Jun 11, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Those 3 IPs posted above are not the ones that I blocked in htaccess the other day. The two I blocked did not show up in the log the next day so I "assumed" the block was working. I haven't blocked any more yet as I want to make sure I'm doing it right.

Here are the two I blocked the other day:
60.50.204.***
162.119.238.***

They are still not showing up in the log today, nor are the 3 listed above. So it looks like every visitor is unique, although several of the requests are for the image file. This is how I recognized something was afoot: the site meter I use indicates that all the "Unknown" visitors are going directly to the kitchen page. So I searched for links to that page and found those 3 mirrored sites linking to that image. I tried getting the IPs of those 3 sites, but as I explained yesterday, they weren't available. I'll try your suggestions above for those.

Here is one from the log this morning seeking the image and getting a 404:

206.130.173.** - - [11/Jun/2010:07:50:30 -0600] "GET /images/kitchen1.jpg HTTP/1.1" 404 793 "http://ca.images.search.yahoo.com/images/view?back=http%3A%2F%2Fca.images.search.yahoo.com%2Fsearch%2Fimages%3Fp%3Dkitchen%26rs%3D0%26ei%3DUTF-8%26fr%3Dyfp-t-715%26fr2%3Dtab-web&w=480&h=407&imgurl=www.EXAMPLE.com%2Fimages%2Fkitchen1.jpg&rurl=http%3A%2F%2Fwww.EXAMPLE.com%2Fkitchen-dining.html&size=25k&name=kitchen1+jpg&p=kitchen&oid=97f07ca8ad4741fa&fr2=tab-web&no=5&tt=21004689&sigr=11hbhr0dj&sigi=11avlbfb2&sigb=130beefge&type=JPG" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)"

jdMorgan

5:04 pm on Jun 11, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



All requests will show up in the raw access log if they reach your server.

If a request is blocked, it will show a 403-Forbidden response.

If a request is not blocked and the requested resource exists, then it will show a 200-OK, 304-Not Modified, or 206-Partial Content response. If the requested resource does not exist, it will show a 404-Not Found.

Your log entry above shows a 404. If the resource exists, then this is an error, because it should have been blocked (403). You need to follow up on this and find out why a 404 was returned instead of the proper 403. The most likely cause (assuming that /images/kitchen1.jpg does actually exist) is that you have defined a custom 404 error page in .htaccess or by using your control panel, and that custom 404 error page does not exist. This puts your server into a loop from which it recovers only after several/many iterations, essentially making the cure worse than the disease...

The details are very important here. All requests get logged, and the server response code is critical.
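To apply those response-code meanings to a whole log rather than one line at a time, a small Python tally of image responses can help. The status-to-meaning mapping follows the descriptions above; any other code is reported as "other".

```python
import re
from collections import Counter

# Combined format up to the status code.
LOG_RE = re.compile(r'\S+ \S+ \S+ \[[^\]]+\] "\S+ (?P<path>\S+)[^"]*" (?P<status>\d{3}) ')
IMAGE_RE = re.compile(r"\.(jpe?g|gif|bmp|png)$", re.IGNORECASE)
MEANING = {"403": "blocked", "200": "served", "206": "served", "304": "served", "404": "not found"}


def triage_images(lines):
    """Tally image requests by what their response code means."""
    tally = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m and IMAGE_RE.search(m["path"]):
            tally[MEANING.get(m["status"], "other " + m["status"])] += 1
    return tally
```

A nonzero "not found" count for images you know exist is exactly the inconsistency described above.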

Jim

Lorel

5:23 pm on Jun 11, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I used the APNIC.net lookup, and all 3 of the domains that are linking to those images are located in Texas, using different IPs. I blocked them in .htaccess but didn't block the whole range, as one was a well-known host. We'll see how that goes.

The reason I was confused about using ARIN.net is that it didn't say what you mentioned above. However, I understand now that I have to try different country-specific lookups.

It looks like I was previously blocking the individual visitors to the site instead of the sites linking to the image page.

Thanks

Lorel

5:49 pm on Jun 11, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Sorry, I was confused between a 403 and a 404.

The images are loading just fine on the website.

I do have a custom 404 page, and it works when I misspell the name of that page.

Here is the relevant line in the htaccess for the custom 404:
ErrorDocument 404 /missing.html

Is this in error, or should it be placed somewhere besides the root?
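One way to rule out the missing-error-page loop Jim described is to confirm that every local `ErrorDocument` target actually exists on disk. In this hypothetical Python sketch, `docroot` is the filesystem directory the site is served from (an assumption you would adjust); targets that are full URLs or quoted messages are ignored.

```python
import os
import re

# Matches lines such as "ErrorDocument 404 /missing.html" (directive names are case-insensitive).
ERRORDOC_RE = re.compile(
    r"^\s*ErrorDocument\s+(\d{3})\s+(/\S*)\s*$", re.IGNORECASE | re.MULTILINE
)


def missing_error_documents(htaccess_text, docroot):
    """Return [(status, path)] for local ErrorDocument targets that are not files under docroot."""
    missing = []
    for status, path in ERRORDOC_RE.findall(htaccess_text):
        if not os.path.isfile(os.path.join(docroot, path.lstrip("/"))):
            missing.append((status, path))
    return missing
```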

jdMorgan

12:41 am on Jun 15, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That's fine. My question is why did this (from your post above) go 404?

206.130.173.** - - [11/Jun/2010:07:50:30 -0600] "GET /images/kitchen1.jpg HTTP/1.1" 404 793 "http://ca.images.search.yahoo.com/images/view?back=http...

Jim

Lorel

1:52 pm on Jun 15, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I have no idea what's going on. When I posted a message last week, this was happening to an image on the kitchen page. Today there have been several visitors and they are allllll landing on a bedroom page, which is strange.

It appears that any time the request is coming via Google or Yahoo, it's a 404 for just one image on the page (there are 4 other images on the page). Yet I just searched for that page in Google, clicked on the cache, and the image comes up just fine. There is no robots.txt rule stopping it.

When other people click on that page and don't come via Google, or Yahoo, the image on that page shows a 200.

I have this pop out of frames code on the page. Could this cause a 404 somehow?

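<script type="text/javascript">
// Frame-breakout: if this page is shown inside another site's frame, reload it as the top window.
if (top !== self) top.location.href = self.location.href;
</script>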

Lorel

1:59 pm on Jun 15, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



PS: I just checked the site meter. All the visitors are "Unknown" and all from Asia, i.e., they are typing in the URL. That tells me they are seeing this site/page somewhere that doesn't have a direct link, and they are all landing on the same page.

I'm wondering if the page has been scraped. I have full URLs to all the pages but not to the images, so that doesn't make sense, as sometimes the images load just fine.

jdMorgan

1:54 am on Jun 16, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



No progress can be made here if you cannot find out why an image request gets a 404-Not Found one time and a 200-OK the next. I'd suggest removing your hotlinking code to see if that makes the images always return a 200-OK. If so, then your 403-Forbidden response-handling is broken, and is actually returning a 404 under certain circumstances (which is wrong/bad). If not, then you may have a serious problem with your server.
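To find out why a given image gets a 404 one time and a 200 the next, it can help to list every path that received both responses, along with the referrers on its 404s. Below is a hypothetical Python sketch over combined-format log lines. If the 404s line up with foreign referrers (the requests the hotlink rules should forbid), that points at the 403 handling rather than at the files themselves.

```python
import re
from collections import defaultdict

# Combined format up to the referrer field.
LOG_RE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "\S+ (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ "(?P<referer>[^"]*)"'
)


def inconsistent_paths(lines):
    """Map each path that got both a 404 and a 200/206/304 to the sorted referrers of its 404s."""
    statuses = defaultdict(set)
    referers_404 = defaultdict(set)
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue
        statuses[m["path"]].add(m["status"])
        if m["status"] == "404":
            referers_404[m["path"]].add(m["referer"])
    return {
        path: sorted(referers_404[path])
        for path, seen in statuses.items()
        if "404" in seen and seen & {"200", "206", "304"}
    }
```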

This means you'll have to ignore the hotlinking issue for now, and get your server working properly. All details of where the IPs are located or how they are fetching from your site are, unfortunately, irrelevant at this time.

Jim