

Google IPs Cannot Be Trusted

Only Googlebot With Round-Trip DNS Validation

   
10:58 am on Dec 18, 2013 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



I know many of you blindly accept Google's IP ranges as trustworthy, and that is hardly the case. If you're giving anything on a Google IP range global access, all you're doing is handing a certain subset of scrapers carte blanche because of their IP.

The Google IP ranges host all sorts of tools that can be used for nefarious purposes, including:

  • Google Wireless Transcoder
  • Google Translator
  • Google App Engine

Luckily, Google App Engine forces all requests to carry an "AppEngine-Google" prefix that can be easily filtered.
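
A minimal sketch of that filter in Python (the helper name is mine and purely illustrative; it just assumes the token shows up somewhere in the User-Agent string):

# Hedged sketch: flag requests relayed through Google App Engine's URL
# fetcher by the "AppEngine-Google" token it adds to the user agent.
def is_appengine_request(user_agent):
    return 'AppEngine-Google' in (user_agent or '')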

Plus, I've seen the old proxy hijacking, which I thought had been eliminated much like polio, rear its ugly head once again. The only way to stop this is to verify that Googlebot is only crawling from its valid IP addresses.

Full round-trip Googlebot validation is a must-have front-line defense, so use it!
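
For anyone who hasn't wired this up yet, here's a minimal sketch of the round-trip check using nothing but the Python standard library (the function name and the example IP are illustrative only):

import socket

def is_real_googlebot(ip):
    # Reverse lookup: the claimed crawler's IP must resolve to a hostname
    # ending in googlebot.com or google.com.
    try:
        host = socket.gethostbyaddr(ip)[0]
    except OSError:
        return False
    if not host.endswith(('.googlebot.com', '.google.com')):
        return False
    # Forward lookup: that hostname must resolve back to the same IP.
    try:
        return ip in socket.gethostbyname_ex(host)[2]
    except OSError:
        return False

# e.g. is_real_googlebot('66.249.66.1') is True only if both lookups agree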

[edited by: incrediBILL at 6:31 pm (utc) on Dec 18, 2013]

11:12 am on Dec 18, 2013 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



We've blocked many Google "features" for years:
translate,
prefetch,
preview,
feed bots, etc.

It appears to be only a matter of time before we block all G bots as they sink in value and trustworthiness.
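
As an illustration only, a UA-substring blocklist for that kind of traffic might look like the sketch below; the tokens are examples of common Google fetcher strings, not an authoritative list, so adjust them to whatever your own logs show:

# Illustrative User-Agent tokens for Google "feature" fetchers; this is an
# assumption-based example, not a complete or official list.
BLOCKED_UA_TOKENS = (
    'Google Web Preview',   # instant preview screenshots
    'Feedfetcher-Google',   # feed fetching
    'Google Favicon',       # favicon fetching
)

def is_blocked_feature(user_agent):
    ua = user_agent or ''
    return any(token in ua for token in BLOCKED_UA_TOKENS)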

Appreciate you waking us up Bill.
8:51 am on Dec 19, 2013 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



I've actually used Google Translate to get past sites' so-called protection that was designed to keep prying eyes from seeing their cloaking activities. People think they're being clever and secure, but nothing short of a full round-trip DNS check comes close to airtight.

See the original blog post where Matt Cutts explained how to do it:
[googlewebmastercentral.blogspot.com...]

Also, witness these recent threads about Google IPs:

Google? Is that you?
[webmasterworld.com...]

Google Test-Bot: Google-Test2
[webmasterworld.com...]

Google Translate
[webmasterworld.com...]

Bing, Ask, Yandex and all the rest you might allow need to be validated as tightly as possible too.
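
The same round-trip check generalizes: the only per-engine piece is the hostname suffix the reverse lookup should return. The suffixes in this sketch are assumptions based on each engine's published guidance, so confirm them against your own logs before relying on them:

import socket

# Assumed reverse-DNS suffixes per crawler; verify before trusting.
CRAWLER_HOST_SUFFIXES = {
    'googlebot': ('.googlebot.com', '.google.com'),
    'bingbot':   ('.search.msn.com',),
    'yandexbot': ('.yandex.ru', '.yandex.net', '.yandex.com'),
}

def validates_as(ip, crawler):
    suffixes = CRAWLER_HOST_SUFFIXES.get(crawler, ())
    if not suffixes:
        return False
    try:
        host = socket.gethostbyaddr(ip)[0]                # reverse lookup
    except OSError:
        return False
    if not host.endswith(suffixes):
        return False
    try:
        return ip in socket.gethostbyname_ex(host)[2]     # forward lookup
    except OSError:
        return False
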
9:50 am on Dec 19, 2013 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Bill I'm blocking GoogleImageProxy, are you?

Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7 (via ggpht.com GoogleImageProxy)


I've been seeing this UA scrape images from robots.txt-disallowed directories. The same images are also served with an "X-Robots-Tag: noindex" response header, yet they are showing up in Google's Image Search. So far my demands to have them removed have been ignored.
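
A minimal sketch of turning that marker into a block, assuming a plain substring test on the User-Agent is enough for your setup (the function name is illustrative):

# Hedged example: refuse (e.g. with a 403) any request whose user agent
# carries the "GoogleImageProxy" marker quoted above.
def deny_image_proxy(user_agent):
    return 'GoogleImageProxy' in (user_agent or '')
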
10:57 am on Dec 19, 2013 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Interesting find, GoogleImageProxy, from those awfully helpful people at Mountain View:

"If you can not directly access image links or the loading is slow, this script will rewrite the image links to googleusercontent.com proxy address. With Google server you can display image normally and load faster."

KeyP: Better late than never: What are your belt and braces blocks for this nasty?
7:11 pm on Dec 19, 2013 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month





What are your belt and braces blocks for this nasty?


I'm not blocking any G ranges, only whitelisting header & UA.
8:41 pm on Dec 19, 2013 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



If you can not directly access image links or the loading is slow, this script will rewrite the image links to googleusercontent.com proxy address. With Google server you can display image normally and load faster.

I'm trying to read that as something other than "Look! A useful new alternative to hotlinking!"
9:02 pm on Dec 19, 2013 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month





Here's another one I'm watching:

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/537.36 (KHTML, like Gecko, Google Publisher Plugin; Googlebot/2.1) Chrome/27.0.1453 Safari/537.36
5:51 am on Dec 20, 2013 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Bill I'm blocking GoogleImageProxy, are you?


Since I whitelist, the answer to any question of that form is always: YES!
12:11 pm on Dec 23, 2013 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



KeyP: "...only whitelisting header & UA."

Presumably they have to match the desirable G traffic?

iBill: Difficult to share your method in a public forum without wrecking it, but at least give us a lead-in.

It is Christmas :)
2:53 pm on Dec 23, 2013 (gmt 0)

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



whitelist Bill Mar 2006 [webmasterworld.com]

Whitelist Jim Nov 2006 [webmasterworld.com]
10:55 am on Jan 31, 2014 (gmt 0)



Only allow access from Google IP addresses where the accessing host also presents a UA that identifies it as Googlebot or Googlebot Mobile. :)
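
Put together, that rule is just two tests chained: the UA has to claim Googlebot (which also covers Googlebot Mobile as a substring) and the IP has to survive the round-trip DNS check discussed earlier in the thread. A sketch with illustrative names:

import socket

def claims_googlebot(user_agent):
    # 'Googlebot' also matches 'Googlebot-Mobile' as a substring.
    return 'Googlebot' in (user_agent or '')

def allow_google_access(ip, user_agent):
    if not claims_googlebot(user_agent):
        return False
    try:
        host = socket.gethostbyaddr(ip)[0]                # reverse lookup
    except OSError:
        return False
    if not host.endswith(('.googlebot.com', '.google.com')):
        return False
    try:
        return ip in socket.gethostbyname_ex(host)[2]     # forward lookup
    except OSError:
        return False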
 
