GoogleImageProxy


keyplyr

2:38 am on Sep 1, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



UA: Mozilla/5.0 (Windows NT 5.1; rv:11.0) Gecko Firefox/11.0 (via ggpht.com GoogleImageProxy)
Protocol: HTTP/1.1
Robots.txt:
Host: google.com
64.233.160.0 - 64.233.191.255
64.233.160.0/19

This is the bot that retrieves images for many of Google's services, including Google Places, Google Plus and Google Search.

Note: ggpht.com is not accessible
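For anyone who wants to let this agent through by both UA and address, here is a minimal sketch using Python's standard `ipaddress` module. It assumes the single /19 above is the only range in play, which is simply what showed up in these logs; Google may well fetch from other addresses. The function name `looks_like_image_proxy` is hypothetical.

```python
import ipaddress

# Range reported in this thread for GoogleImageProxy fetches. It comes from
# one site's logs; Google may use other addresses as well.
IMAGE_PROXY_NET = ipaddress.ip_network("64.233.160.0/19")

PROXY_UA = ("Mozilla/5.0 (Windows NT 5.1; rv:11.0) Gecko Firefox/11.0 "
            "(via ggpht.com GoogleImageProxy)")

def looks_like_image_proxy(remote_ip: str, user_agent: str) -> bool:
    """True when the UA carries the GoogleImageProxy token AND the IP is in the listed /19."""
    try:
        addr = ipaddress.ip_address(remote_ip)
    except ValueError:
        return False  # unparseable address: don't vouch for it
    # An IPv6 address is never "in" an IPv4 network, so it also returns False here.
    return "GoogleImageProxy" in user_agent and addr in IMAGE_PROXY_NET

# The /19 and the dotted range in the profile describe the same block of addresses.
print(IMAGE_PROXY_NET.network_address, IMAGE_PROXY_NET.broadcast_address)
# 64.233.160.0 64.233.191.255
print(looks_like_image_proxy("64.233.172.5", PROXY_UA))  # True
print(looks_like_image_proxy("203.0.113.7", PROXY_UA))   # False: right UA, wrong network
```

Requiring both conditions matters because the UA token on its own is trivially spoofable.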

lucy24

3:37 am on Sep 1, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I spent some time poring over logs, headers and htaccess before figuring out that the “Firefox/11” is what done them in. Why, oh why must law-abiding operators do this kind of thing? (You may remember that until a few years ago, the faviconbot called itself Firefox/6.) Unless you're fetching PNGs with an alpha channel and calling yourself MSIE <=6, what possible difference can it make?
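To illustrate how the "Firefox/11" token trips an old-browser rule, here is a rough sketch of such a rule plus an exemption checked first. The cutoff `MIN_FIREFOX = 20` and the `EXEMPT_TOKENS` list are made-up values for illustration, not anyone's actual htaccess settings.

```python
import re

# Hypothetical "outdated browser" rule: treat any UA claiming a Firefox
# major version below MIN_FIREFOX as a likely bot. The threshold is an
# illustrative assumption, not a recommendation.
MIN_FIREFOX = 20
FIREFOX_RE = re.compile(r"Firefox/(\d+)")

# Tokens of wanted agents that are exempted before the version check (assumed list).
EXEMPT_TOKENS = ("GoogleImageProxy",)

def blocked_as_old_firefox(user_agent: str, exempt: bool = True) -> bool:
    """Return True if the version rule would block this UA."""
    if exempt and any(tok in user_agent for tok in EXEMPT_TOKENS):
        return False
    m = FIREFOX_RE.search(user_agent)
    return m is not None and int(m.group(1)) < MIN_FIREFOX

PROXY_UA = ("Mozilla/5.0 (Windows NT 5.1; rv:11.0) Gecko Firefox/11.0 "
            "(via ggpht.com GoogleImageProxy)")
print(blocked_as_old_firefox(PROXY_UA, exempt=False))  # True: Firefox/11 trips the rule
print(blocked_as_old_firefox(PROXY_UA))                # False: exempted by its own token
```

Exempting on the UA token alone lets anyone in who copies the string, so in practice the exemption would be paired with an IP check.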

keyplyr

4:03 am on Sep 1, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The very 1st file this bot requested was favicon.ico.

Firefox/11 running from a server may look like that, but it's 32-bit only. I have no way to check headers right now.

what possible difference can it make?
Googlebot crawls and feeds the search index. Other bots belonging to Google do different things. This bot also collects images used at YouTube, Google Analytics, ad networks & possibly CDNs.

ggpht.com is a domain controlled by four nameservers at google.com, all of them on different IP networks... servers in the United States circa Dec 15, 2008

lucy24

5:59 am on Sep 1, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I meant, more narrowly: why does a UA string have to contain elements that are all too likely to get it blocked? It could so easily just replace “Googlebot” with “GoogleImageProxy” and keep everything else the same. In fact, that’s just what they do with Google Web Preview and Google Search Console: the two UA strings are otherwise identical.

keyplyr

6:15 am on Sep 1, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



So when they build their UA string, you think they say... "I wonder if lucy24 or keyplyr think this is OK?"

I guess it *is* somebody's responsibility to decide that stuff. Maybe you should apply for the job. You'd have to relocate to Mountain View, but I think you get to wear rollerblades at work.

lucy24

5:49 pm on Sep 1, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Har har. Yuk yuk.

Seriously, you would think that people in the business would know that calling yourself {browser that hasn't been used by humans since 1997} is a good way to get blocked.

A year or two back, I came upon the best job title in the entire universe: Staff Linguist at Google. (Really. I don't know if there's just one of him, or if it's an entire department.) So, yeah, there must also be someone whose job it is to assign User-Agent names.

keyplyr

5:58 pm on Sep 1, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Seriously, you would think that people in the business would know that calling yourself {browser that hasn't been used by humans since 1997} is a good way to get blocked.
By the same thinking, it's amazing how many stick "Googlebot" in their string thinking it will enable better access.

lucy24

7:18 pm on Sep 1, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



USER_AGENT Googlebot
REMOTE_IP not-Google's-crawl-range
[F]

As it were.
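Spelled out, that shorthand rule amounts to roughly the following sketch. The single range `66.249.64.0/19` is an illustrative stand-in for "Google's crawl range" and is an assumption here; Google publishes its current Googlebot ranges, and a real rule would load those instead. `status_for` is a hypothetical name.

```python
import ipaddress

# Assumed crawl range, for illustration only. Replace with the ranges
# Google currently publishes for Googlebot.
GOOGLEBOT_NETS = [ipaddress.ip_network("66.249.64.0/19")]

def status_for(remote_ip: str, user_agent: str) -> int:
    """403 if the UA claims Googlebot but the IP is outside the crawl ranges, else 200."""
    if "Googlebot" not in user_agent:
        return 200  # rule only applies to self-declared Googlebot
    try:
        addr = ipaddress.ip_address(remote_ip)
    except ValueError:
        return 403  # claims Googlebot with an unparseable address
    if any(addr in net for net in GOOGLEBOT_NETS):
        return 200
    return 403  # the [F] case: Googlebot UA from a non-Google address

BOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
print(status_for("66.249.66.1", BOT_UA))  # 200
print(status_for("203.0.113.7", BOT_UA))  # 403
```

Note that the GoogleImageProxy UA never contains "Googlebot", so this rule leaves it alone either way.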

keyplyr

7:29 pm on Sep 1, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Nah... I don't block all those faking Googlebot. Some do get blocked, but some others are beneficial to me even though they do stupid stuff.