
Google SEO News and Discussion Forum

    
How does Google really handle hotlinked images?
Sgt_Kickaxe

Msg#: 4522181 posted 12:59 pm on Nov 24, 2012 (gmt 0)

Like most, I've had my fair share of images being hotlinked, and I've followed the steps required to prevent hotlinking as best I can, including those from some great threads here on WebmasterWorld. But something changed recently (Nov 15th/16th?).

Since then I've experienced a surge in scraping/hotlinking that has forced me to shut down my site's partial RSS feed. My htaccess method of blocking other domains from displaying my images is still working, but my image rankings aren't holding against scrapers.

In one case a site is producing pages filled with scraped text and images. The text is a mashup of the top 10 search result "descriptions" shown under each result link in the SERPs, and the five images are, of course, the top five images in Google image results. I'd say that's pretty blatant.

The site is outranking at least three of the five results whose images it is displaying, i.e. the scraper site is credited as being the original and it ranks ahead of the sites the images are hosted on. That's a 60% failure rate by Google (I know that it's my legal problem, not Google's, but still).

Question: How does Google REALLY handle images that it comes across on a page?

- Any hotlink protection I employ does not stop my image URL from appearing in the scraper site's code. Does Google look at the image on THEIR domain to see how it renders, or do they just fetch the image directly from the URL provided, in which case it will look fine regardless of how much hotlink protection I use?

If it matters, I return an error code, so scraper sites simply display nothing. I don't switch images, for fear that Google will credit me with the switched version.

Displaying nothing on scraper sites isn't helping me retain credit for the image right now, even though the block itself works and the scraper sites display nothing. Ideas?

 

lucy24

Msg#: 4522181 posted 11:11 pm on Nov 24, 2012 (gmt 0)

Does Google look at the image on THEIR domain

Google doesn't look at images on anyone's domain. It gets every image in isolation.

Hotlinking is one of those areas where a human has the advantage over a robot, because we're seeing it in context. The googlebot just has a shopping list that says "pick up suchandsuch dot jpg". With rare exceptions it comes in without a referer-- and that means the picture comes through loud and clear for them, because with equally rare exceptions, hotlink routines make an exception for blank referers.

It would be interesting if Googlebot-Image made one simple procedural change: name the originating page as referer in all its visits. Now that it can tell what a picture looks like (see assorted threads from, oh, a year or so back) it should be easy to tell which one is a picture of the Eiffel Tower at dawn and which one is a four-color png blaring NO HOTLINKS. And return its results accordingly.
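The behaviour described above (blank referers are let through, while unrecognised referers get the substitute graphic) can be sketched in a few lines of Python. This is only an illustration of the decision logic, not anyone's actual .htaccess: the function name `hotlink_decision` and the `ALLOWED_HOSTS` whitelist are made up for the example.

```python
from urllib.parse import urlsplit

# Hypothetical whitelist; a real setup would list the site's own hostnames.
ALLOWED_HOSTS = {"example.org", "www.example.org"}


def hotlink_decision(referer, allowed_hosts=ALLOWED_HOSTS):
    """Return 'original' or 'placeholder' for an image request.

    Mirrors the usual hotlink routine: requests with no referer are let
    through, requests from whitelisted hosts are let through, and anything
    else gets the substitute graphic.
    """
    if not referer:
        # Blank referer: typical of a referer-less Googlebot fetch.
        return "original"
    host = (urlsplit(referer).hostname or "").lower()
    return "original" if host in allowed_hosts else "placeholder"


print(hotlink_decision(""))                                  # no referer
print(hotlink_decision("http://www.example.org/gallery"))    # own site
print(hotlink_decision("http://made-up-name.example/page"))  # unknown site
```

Under these assumptions a referer-less request gets the real picture, while a request naming some site that cannot possibly be whitelisted would be handed the NO HOTLINKS graphic, which is the comparison the suggested procedural change would make possible.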

Sgt_Kickaxe

Msg#: 4522181 posted 3:24 am on Nov 25, 2012 (gmt 0)

I'm not sure that's completely right anymore, lucy. In August 2011 Google stated:
We've recently launched an update to the algorithm that looks at the relevance and quality of both the webpage and the image to surface more relevant results in Google Images. Improving the quality of results in Google Images is very important to us, and we're always working hard to improve our algorithms.


Bolding mine. Google claims to be weighing the quality of the rest of the page content in determining image rank. They might be visiting image urls directly but they are at least using the page as reference for quality and context.

I used the "view site as googlebot" feature in GWT and aimed it at an image on my site that has htaccess hotlink protection on, and they returned a 200 code, which is good. But does it return a 200 OK code even on the scraper's site despite being hotlinked, since Googlebot DOES visit image URLs directly?

And if it does return a 200 OK code even when the site in question gets a 403 Forbidden when trying to show the image... what more can I do? I'd love some official clarification: does Googlebot see the 403 Forbidden (or nasty switcheroo) at all?

edit: I just Googled the question and one of the top 10 results is a 100% blackhat site. I'm not feeling optimistic here :)

Bewenched

Msg#: 4522181 posted 5:25 am on Nov 25, 2012 (gmt 0)

I'd love to know an official answer to this as well. Lately I found about 5 of our competitors actually hotlinking to our images. I'm furious about it and renamed the image. I know I can block hotlinking totally, but I really don't want to block everyone, i.e. Google and eBay (we sometimes sell on eBay).

I'd love to know if we actually get some kind of link credit in exchange for our bandwidth.

lucy24

Msg#: 4522181 posted 10:00 am on Nov 25, 2012 (gmt 0)

Most hotlink routines use rewriting rather than redirecting, so the image will always come through as a 200. It just won't come through as the physical picture they asked for.
I used the "view site as googlebot" feature in GWT and aimed it at an image on my site that has htaccess hotlink protection on, and they returned a 200 code, which is good. But does it return a 200 OK code even on the scraper's site despite being hotlinked, since Googlebot DOES visit image URLs directly?

My point was that for google there's no such thing as "on the scraper's site". On rare occasions you get the googlebot (by name) asking for an image and giving your site as referer-- but what it needs to do is give some other site as referer. Preferably some made-up name that you can't possibly have whitelisted.

Come to think of it, what do you mean by "on the scraper's site despite being hotlinked"? It's either scraped or hotlinked. It can't be both.

Bewenched

Msg#: 4522181 posted 5:11 am on Nov 26, 2012 (gmt 0)

Well, what I meant was if they scraped an entire page, leaving the HTML tags intact... I literally have a couple of sites that are hotlinking directly to the image on my site...

I did, however, start blocking them individually with .htaccess and serving up a large fluorescent green and black .png file with my site address on it.

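# Match requests whose referer is the hotlinking domain (example.com stands in for it here)
RewriteCond %{HTTP_REFERER} ^http://(www\.)?example\.com(/.*)*$ [NC]
# Answer those image requests with the substitute graphic instead of the real file
RewriteRule \.(jpeg|JPEG|jpe|JPE|jpg|JPG|gif|GIF)$ hotlink.png [L]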

lucy24

Msg#: 4522181 posted 7:06 am on Nov 26, 2012 (gmt 0)

RewriteRule \.(jpeg|JPEG|jpe|JPE|jpg|JPG|gif|GIF)$ hotlink.png [L]

=

RewriteRule \.(jpe?g|gif)$ /hotlink.png [L,NC]

Is there really such a thing as .jpe? Technically it exists, but I'm sure I've never seen one.
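The "=" above is close but not exact, and a quick check makes the difference visible. The sketch below uses Python's `re` module as a stand-in for the mod_rewrite pattern matching, with `re.IGNORECASE` playing the role of the `[NC]` flag; the helper name `compare` and the sample file names are made up for the illustration.

```python
import re

# The original pattern lists each spelling explicitly and is case-sensitive.
ORIGINAL = re.compile(r"\.(jpeg|JPEG|jpe|JPE|jpg|JPG|gif|GIF)$")
# The shortened pattern relies on a case-insensitive match (the [NC] flag).
SHORTENED = re.compile(r"\.(jpe?g|gif)$", re.IGNORECASE)


def compare(path):
    """Return (matched_by_original, matched_by_shortened) for a request path."""
    return bool(ORIGINAL.search(path)), bool(SHORTENED.search(path))


for name in ["photo.jpg", "photo.JPEG", "photo.gif", "photo.Jpg", "photo.jpe", "photo.png"]:
    print(name, compare(name))
```

For the common spellings both patterns agree. The shortened one additionally catches mixed-case extensions such as `.Jpg`, and it no longer matches `.jpe`, which is presumably why the question about that extension comes up.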

levo

Msg#: 4522181 posted 3:21 am on Feb 24, 2013 (gmt 0)

- Sitemap with images - every URL in the sitemaps includes its images, including thumbnails.
- PuSH full atom feed with images

I think these two help associate the image with the page. Hotlink protection has nothing to do with association, because Googlebot doesn't send a referer.

lucy24

Msg#: 4522181 posted 3:31 am on Mar 18, 2013 (gmt 0)

Oh, good. Was hoping this particular thread was still open, because I just found a "GoogleWHAT?!" in logs.

Unedited except for name of referring site, which is neither me nor google:

66.249.73.132 - - [17/Mar/2013:18:38:49 -0700] "GET /paintings/sparerats/blowups/largeratboxing.jpg HTTP/1.1" 200 4811 "http://example.de/_mm/rat-boxing" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

See how that works? It's the real googlebot-- from the identical IP that they've been using the last few days if not longer-- but giving a different site as referer.
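For anyone who wants to look for the same pattern in their own logs, here is a rough Python sketch. It assumes the Apache "combined" log format shown in the line above; `OWN_HOSTS`, `parse_line`, and `is_googlebot_with_foreign_referer` are names invented for the example, and the user-agent test alone does not prove a request is really from Google (checking the IP, as described above, is a separate step).

```python
import re
from urllib.parse import urlsplit

# Combined format: host ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

OWN_HOSTS = ("example.org",)  # hypothetical: replace with your own domain(s)

SAMPLE = (
    '66.249.73.132 - - [17/Mar/2013:18:38:49 -0700] '
    '"GET /paintings/sparerats/blowups/largeratboxing.jpg HTTP/1.1" 200 4811 '
    '"http://example.de/_mm/rat-boxing" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)


def parse_line(line):
    """Split one combined-format log line into named fields, or return None."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def is_googlebot_with_foreign_referer(entry, own_hosts=OWN_HOSTS):
    """True if the UA claims to be Googlebot and the referer is someone else's site."""
    referer = entry["referer"]
    if "Googlebot" not in entry["agent"] or referer in ("", "-"):
        return False
    host = (urlsplit(referer).hostname or "").lower()
    return not any(host == own or host.endswith("." + own) for own in own_hosts)


entry = parse_line(SAMPLE)
print(entry["referer"], entry["status"], entry["size"])
print(is_googlebot_with_foreign_referer(entry))
```

Run over a whole log file, a filter like this would surface the rare Googlebot requests that carry a third-party referer, so they can be compared with the referer-less fetches of the same image.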

This is a flat contradiction of what I just got through saying. Where "just" = four months ago.

Quick investigation in a different browser confirms that it's a hotlinking page. There's a dreary sameness to them. But thanks to a directory-specific hotlinking routine, this one didn't get my green-and-magenta NO HOTLINKS graphic. Instead they got the thumbnail version of the requested picture, weighing in at 4K instead of 33K. If the googlebot had asked in the usual way, without referer, it would have gotten the full-size version.

It would be interesting to know whether G concurrently picked up any other images associated with that page. I've met them with a referer before, but rarely for more than one or two images at a gulp. And, of course, never with someone else's site in the "referer" slot.

I don't see any recent referer-less requests for the same image. Seems like the only point of sending a referer would be to compare the with-referer version against the version they get if they ask for it "cold".

Oh, and either the offending page is brand new or they get absolutely no traffic, because their name was new to me. Heh.

Kendo

Msg#: 4522181 posted 3:42 am on Mar 18, 2013 (gmt 0)

Google checking an image for quality is worrisome and my imagination runs wild about how they can totally screw that up. For example I can see no difference in images at 100% quality and those at 60-70% quality, so we reduce to 70% quality as it reduces file size dramatically. So I would hate to think that some not-so-clever script is deciding that our images are poor quality and penalising us.

Leosghost

Msg#: 4522181 posted 4:04 am on Mar 18, 2013 (gmt 0)

Lucy ..:)

Don't forget ..they are running their image search differently in Germany ( and France ) so as not to fall foul of German law..
May account for the apparent "tricksyness" with their referer..

I ( you know my IP , but you can ::sanitise:: it for here if you post the "string" ;) just ran you a test on the same image Lucy ( using Google.fr )..I get the thumbnail ( low quality image ) 126 x 88px and 4.4kb "blown up" to 504 x 352 px and overlaid on your page..with your page slightly greyed out ( the entire greyed out area is clickable and leads direct to your page when clicked ) but legible behind it ..original image size is indicated off to the right..

Screenshot if you want it ?

Btw..I've seen Gbot running ( clean, but with another domain in the string ) out of Germany many times in the past..Usually one of the first times Gbot indexes a newly hosted site of mine ( even if hosted in the USA ) it comes out of Germany ..its subsequent visits mostly ( but not always ) come from the usual "stables"..

lucy24

Msg#: 4522181 posted 5:16 am on Mar 18, 2013 (gmt 0)

I really don't see how it can have anything to do with Image Search. Setting aside the Google IP and UA, why would a hotlinking site search for an image it's already got? I really don't see g### going to the trouble of constructing a whole fake site just to get around local legislation.

@leo: Your search comes through exactly as expected. Your IP, your UA, and the ordinary blahblah in the referer slot, same as it would in the US:

{leo's IP here} - - [17/Mar/2013:20:55:33 -0700] "GET /paintings/sparerats/blowups/largeratboxing.jpg HTTP/1.1" 200 4811 "http://www.google.fr/imgres?q=largeratboxing.jpg&{yata yata yata}" "{leo's UA here}"

I couldn't find any google.de image searches in logs, though admittedly I got bored after the first dozen or so manual searches and maybe I would have found something from 2012. I simply don't get a lot of German visitors at all. Instead I went to google.de myself-- from my local IP, of course-- and tried the same image search. Looked more or less like the google.fr version in logs, with my own IP and UA. And what a lot of entertaining results :)

Interestingly I got the old-style image search, the kind that shows an almost-full-size page preview when you click on a result. I always get this in Camino (faulty UA detection, heh) but this was in up-to-date Safari with no current g### cookies.

Also interestingly: Because it's old-style search, they didn't request the image up front. Instead they requested-- and showed-- the entire page when I clicked on the result.

But like I said, I don't think it has anything to do with image search. Except that possibly the next German who searches for filename.jpg will get that same physical file.

I get the thumbnail ( low quality image ) 126 x 88px and 4.4kb "blown up" to 504 x 352 px and overlaid on your page

Yup. They're using the image dimensions explicitly given in the page itself, and attaching them to the file that they've been handed, which is 1/4 the linear size. This rewrite system works nicely except when the hotlinker takes that extra step of spelling out an image size and they don't already have it in their cache (as google search generally does, though not here).
