We've blocked many Google "features" for years: feed bots, etc.
It appears to be only a matter of time before we block all G bots as they sink in value and trustworthiness.
Appreciate you waking us up, Bill.
I've actually used Google Translate to get past sites' so-called protection that was designed to keep prying eyes from seeing their cloaking activities. People think they're being clever and secure, but nothing short of a full round-trip DNS check (reverse lookup on the IP, then a forward lookup on the resulting hostname that must return the same IP) can be perfect.
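For anyone who wants a starting point, here is a rough sketch of that round-trip check in Python. It is not the code from the blog post mentioned in this thread. The function name, the injectable lookup parameters and the hostname suffixes (.googlebot.com and .google.com) are my own assumptions for illustration, and it only handles IPv4.

```python
import socket

# Assumption: hostname suffixes treated as genuine Google crawler hosts.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Round-trip DNS check: IP -> hostname -> IPs, which must include the IP.

    The two lookup parameters are hypothetical hooks so the logic can be
    exercised without touching real DNS; by default the stdlib resolver is used.
    """
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])

    # Step 1: reverse (PTR) lookup on the connecting IP.
    try:
        host = reverse_lookup(ip).rstrip(".").lower()
    except OSError:
        return False  # no PTR record, or the lookup failed

    # Step 2: the hostname must sit under an accepted domain.
    if not host.endswith(GOOGLE_SUFFIXES):
        return False

    # Step 3: forward lookup on that hostname must return the original IP,
    # otherwise the PTR record could simply be forged by the IP's owner.
    try:
        return ip in forward_lookup(host)
    except OSError:
        return False
```

In practice you would cache the result per IP, since doing two DNS lookups on every request gets expensive fast.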
See the original blog post where Matt Cutts explained how to do it:
Also, witness these recent threads about Google IPs:
Google? Is that you?
Google Test-Bot: Google-Test2
Bing, Ask, Yandex and all the rest you might allow need to be validated as tightly as possible too.
Bill I'm blocking GoogleImageProxy, are you?
Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7 (via ggpht.com GoogleImageProxy)
I've been seeing this UA scrape images from directories that are disallowed in robots.txt. The same images are also served with "X-Robots-Tag: noindex" in the response header, yet they are showing up in Google's Image Search. So far my demands to have them removed have been ignored.
Interesting find, GoogleImageProxy, from those awfully helpful people at Mountain View:
"If you can not directly access image links or the loading is slowly, this script will rewrite the image links to googleusercontent.com proxy address. With Google server you can display image normally and load faster."
KeyP: Better late than never: What are your belt and braces blocks for this nasty?
|What are your belt and braces blocks for this nasty? |
I'm not blocking any G ranges, only whitelisting header & UA.
|If you can not directly access image links or the loading is slow, this script will rewrite the image links to googleusercontent.com proxy address. With Google server you can display image normally and load faster. |
I'm trying to read that as something other than "Look! A useful new alternative to hotlinking!"
Here's another one I'm watching:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/537.36 (KHTML, like Gecko, Google Publisher Plugin; Googlebot/2.1) Chrome/27.0.1453 Safari/537.36
|Bill I'm blocking GoogleImageProxy, are you? |
Since I whitelist, the answer to any question of that form is always: YES!
KeyP: "...only whitelisting header & UA."
Presumably they have to match the desirable G traffic?
iBill: Difficult to share your method in a public forum without wrecking it, but at least give us a lead-in.
It is Christmas :)
Whitelist Bill Mar 2006 [webmasterworld.com]
Whitelist Jim Nov 2006 [webmasterworld.com]
Only allow access from Google-IP addresses where the accessing host also presents a UA that identifies it as Googlebot or Googlebot Mobile. :)
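A minimal sketch of that rule in Python, in case it helps as a lead-in. This is not anyone's actual production method. The network list (one range, 66.249.64.0/19, used here purely as an example), the UA pattern and the function name are assumptions you would replace with your own verified values.

```python
import ipaddress
import re

# Assumption: example range only; maintain your own list of verified Google networks.
ALLOWED_NETWORKS = [ipaddress.ip_network("66.249.64.0/19")]

# Matches "Googlebot/2.1" and "Googlebot-Mobile/2.1", but not e.g. "Googlebot-Image/1.0".
GOOGLEBOT_UA = re.compile(r"\bGooglebot(-Mobile)?/\d", re.IGNORECASE)


def allow_as_googlebot(remote_ip, user_agent):
    """Allow only when the IP is in a listed network AND the UA claims Googlebot."""
    try:
        addr = ipaddress.ip_address(remote_ip)
    except ValueError:
        return False  # malformed address: never whitelisted

    in_range = any(addr in net for net in ALLOWED_NETWORKS)
    claims_googlebot = bool(GOOGLEBOT_UA.search(user_agent or ""))
    return in_range and claims_googlebot
```

Both conditions have to hold, so a Google IP presenting a browser UA (as the image proxy does) fails just as a spoofed Googlebot UA from a random host does.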