lucy24 - 12:18 am on Jan 10, 2012 (gmt 0) [edited by: tedster at 12:24 am (utc) on Jan 10, 2012]
Background: There are a couple of earlier threads touching on this subject:
Googlebot getting images with html ? [webmasterworld.com]
Googlebot with Referer [webmasterworld.com]
but neither of them got to anything decisive, so I wanted to bring it up again. Besides, both threads ran about five minutes before I started reading the Forum (early April 2011); I had to search to make sure it wasn't another of those things everyone but me has always known.
* * *
I discovered this phenomenon while-- stop me if you've heard this one-- looking up something else. The Googlebot has got a sideline in image-harvesting... and it's doing it with the HTML page as referer, exactly like a human's browser. This is the regular Googlebot, by that name, from a regular Google IP-- so far, always in the 66.249 range. The detour can come smack in the middle of a string of regular Googlebot hits, and then the bot carries on as if nothing out of the ordinary had happened.
There's no absolute pattern, but there are two behaviors I see pretty often. One is when a brand-new image or stylesheet has been added to a pre-existing page; it's as if the Googlebot decides on its own initiative to grab it quick before it gets roboted out. The other is when I've got a cluster of thumbnails on a single page (a pattern that one of those earlier posters also hinted at). Googlebot will then scoop up every last one of them at a rate of up to 2-3 files per second, which is faster than its usual pace on my site. But there's no rhyme or reason to which image pickups send a referer; sometimes the referer gets switched on or off halfway through the visit.
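For anyone who wants to look for the same thing in their own logs, here is a rough sketch of how the hits described above could be pulled out. It assumes the Apache "combined" log format; the function name, the list of file extensions, and the sample lines are all made up for illustration (they are not from my real logs), and the 66.249 check is only my own observation so far, not an official list of Google addresses.

```python
import re

# Apache "combined" format:
# IP ident user [time] "METHOD path PROTO" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Assumption: these are the "supporting file" types worth watching.
ASSET_EXTS = (".png", ".jpg", ".jpeg", ".gif", ".css")


def googlebot_asset_hits_with_referer(lines):
    """Return (time, path, referer) for Googlebot asset requests that sent a referer."""
    hits = []
    for line in lines:
        m = LOG_RE.match(line.strip())
        if m is None:
            continue  # not a combined-format line
        if "Googlebot" not in m["agent"]:
            continue
        if not m["ip"].startswith("66.249."):
            continue  # only the range observed in this thread
        path = m["path"].split("?", 1)[0].lower()
        if not path.endswith(ASSET_EXTS):
            continue  # pages themselves are not the interesting part
        if m["referer"] in ("", "-"):
            continue  # ordinary referer-less bot request
        hits.append((m["time"], m["path"], m["referer"]))
    return hits


# Invented sample lines, IPs masked the same way as elsewhere in this post.
UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
SAMPLE_LINES = [
    '66.249.nn.nn - - [01/Jan/2012:10:00:00 -0800] "GET /pics/index.html HTTP/1.1" '
    '200 5120 "-" "' + UA + '"',
    '66.249.nn.nn - - [01/Jan/2012:10:00:01 -0800] "GET /pics/thumb01.png HTTP/1.1" '
    '200 2048 "http://www.example.com/pics/index.html" "' + UA + '"',
    '66.249.nn.nn - - [01/Jan/2012:10:00:01 -0800] "GET /pics/thumb02.png HTTP/1.1" '
    '200 2048 "-" "' + UA + '"',
]

for when, path, referer in googlebot_asset_hits_with_referer(SAMPLE_LINES):
    print(when, path, "<-", referer)
```

Only the second sample line should come back: it is the one image request that carries the page as its referer. On a real log you would pass in the open file instead of the sample list.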
* * *
Postscript: While re-checking, I found this masked bandit trying to blend in with the scenery.
67.221.235.nn - - [26/Dec/2011:15:55:56 -0800] "GET /games/LucysDownloads.html HTTP/1.1" 200 11941 "http://www.example.com/games/LucysDownloads.html" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
Far as I can tell, it was a one-off. Never seen the IP before, but the auto-referer (the page naming itself as its own referer) is a dead giveaway.
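The two tells in that log line can be checked mechanically. What follows is only a sketch, again assuming the Apache "combined" log format; the function name and the returned keys are my own invention, and "outside 66.249" reflects nothing more than what I have seen on my own site.

```python
import re
from urllib.parse import urlsplit

# Apache "combined" format:
# IP ident user [time] "METHOD path PROTO" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def check_claimed_googlebot(line):
    """Return the two flags for a line whose user-agent claims to be Googlebot.

    Returns None if the line does not parse or does not claim to be Googlebot.
    """
    m = LOG_RE.match(line.strip())
    if m is None or "Googlebot" not in m["agent"]:
        return None
    referer = m["referer"]
    # Auto-referer: the referer URL's path is the very page being requested.
    self_referer = (
        referer not in ("", "-")
        and urlsplit(referer).path == urlsplit(m["path"]).path
    )
    return {
        "self_referer": self_referer,
        # Assumption: genuine visits come from 66.249, as observed in this thread.
        "outside_66_249": not m["ip"].startswith("66.249."),
    }


# The log line quoted in the postscript above, unchanged.
BANDIT = (
    '67.221.235.nn - - [26/Dec/2011:15:55:56 -0800] '
    '"GET /games/LucysDownloads.html HTTP/1.1" 200 11941 '
    '"http://www.example.com/games/LucysDownloads.html" '
    '"Googlebot/2.1 (+http://www.googlebot.com/bot.html)"'
)

print(check_claimed_googlebot(BANDIT))
```

Both flags come out true for the line above. Either one alone proves little; it's the combination with a never-before-seen IP that made this visitor stand out.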
[edit reason] Added the titles for the two threads [/edit]