

Search Engine Spider and User Agent Identification Forum

    
Google? Is that you?
lucy24




msg:4625732
 12:25 am on Nov 25, 2013 (gmt 0)

Does anyone know for sure what this is?

66.249.83.176 - - [23/Nov/2013:07:38:12 -0800] "GET / HTTP/1.1" 200 1867 "-" "Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20110814 Firefox/6.0"
66.249.83.176 - - [23/Nov/2013:07:38:13 -0800] "GET /favicon.ico HTTP/1.1" 200 661 "-" "Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20110814 Firefox/6.0"

That's the full request: front page and favicon only.

The closest match I can find is the snippetbot:
66.249.83.169 - - [01/Sep/2013:03:06:28 -0700] "GET / HTTP/1.1" 403 1497 "-" "Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20110814 Firefox/6.0 Google (+https://developers.google.com/+/web/snippet/)"

(If it hadn't received a 403 it would have gone on to ask for the favicon-- not realizing that this isn't blocked anyway. A human gets the favicon with their 403 page. This reminds me that I think I've figured out what the snippetbot does. It gets the favicon for any sites you've got listed on your google profile. One of these days I'll unblock it and see if I'm correct.)

The punch line is that I've recently started quasi-blocking FF 6. On my personal site it would have been redirected to "goaway.html". This one coasted through because it was visiting the art studio's site, which doesn't have as detailed an htaccess file.

 

lucy24




msg:4628203
 11:04 pm on Dec 5, 2013 (gmt 0)

:: bump ::

And no sooner do they try out this UA than we go to the long version:

66.249.84.204 - - [26/Nov/2013:20:37:00 -0800] "GET / HTTP/1.1" 301 602 "-" "Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20110814 Firefox/6.0 Google favicon"
66.249.84.204 - - [26/Nov/2013:20:37:00 -0800] "GET /boilerplate/goaway.html HTTP/1.1" 200 1481 "-" "Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20110814 Firefox/6.0 Google favicon"
66.249.84.204 - - [26/Nov/2013:20:37:01 -0800] "GET /favicon.ico HTTP/1.1" 200 661 "-" "Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20110814 Firefox/6.0 Google favicon"

I think this was its first appearance; there have been later ones.

This version, like the shorter form, gets redirected to the "Are you human? Prove it!" page. I pasted in the full visit because it shows the combination of humanoid and robotic behavior: YES instant following of redirect, YES favicon, NO stylesheet.

I think it replaces one or both of:
--the old faviconbot which came without a UA. (Logged as "-" which I think means they didn't send a UA header at all?)
--the snippetbot (the one with /+/ in the UA string). The last time I looked at my Google Plus page, the favicons were back; they disappeared at some time after I started blocking the snippetbot.
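
The humanoid-versus-robotic checklist above (follows the redirect, fetches the favicon, skips the stylesheet) can be sketched as a small log summarizer. This is only a rough illustration, assuming Apache "combined" format lines like the ones quoted in this thread; the function name `summarize_visits` and the redirect heuristic are made up for the example, not anything the posters use.

```python
import re
from collections import defaultdict

# Apache "combined" log format, matching the lines quoted in this thread.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)

def summarize_visits(lines):
    """Group requests by IP and flag the three behaviors discussed above."""
    visits = defaultdict(list)
    for line in lines:
        m = LOG_RE.match(line)
        if m:  # silently skip lines that do not parse
            visits[m.group("ip")].append((m.group("path"), int(m.group("status"))))

    summary = {}
    for ip, reqs in visits.items():
        paths = [p for p, _ in reqs]
        statuses = [s for _, s in reqs]
        summary[ip] = {
            # Heuristic: a 301/302 that is not the last request from this IP
            # means something was requested after the redirect.
            "followed_redirect": any(s in (301, 302) for s in statuses[:-1]),
            "got_favicon": "/favicon.ico" in paths,
            "got_stylesheet": any(p.endswith(".css") for p in paths),
        }
    return summary
```

Fed the three-request visit from 66.249.84.204 quoted above, this reports the redirect as followed, the favicon as fetched, and no stylesheet request. Because it groups by IP only, it will not stitch together a visit that hops between addresses.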

not2easy




msg:4628256
 4:45 am on Dec 6, 2013 (gmt 0)

I am blocking the snippet bot also, and apparently some other form of a googlebot. I need to do some access log checking to figure out what it might be. In GWT (Google Webmaster Tools) they now show any crawl errors along with server responses, broken down by Desktop, Smart Phone and Feature Phone. I'm blocking Feature Phones somehow, but GWT does not show a 403 response for them; it shows no response at all. It is about #122 on my list of things to do right now. I just copied the list of URLs they say they were blocked from and the datestamp shown.

tangor




msg:4628259
 5:17 am on Dec 6, 2013 (gmt 0)

Does anyone know for sure what this is?

No, but I do recognize a number of strings I've disallowed for several years... And because of that I don't see url strings like this...

Sorry. No help!

JAB Creations




msg:4628424
 6:08 pm on Dec 6, 2013 (gmt 0)

I just noticed this is the only user agent with an IP that resolves to Google that has been getting blocked at my site.

Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20110814 Firefox/6.0 Google favicon


I'm not sure what it is, though when it first visits the site it goes to the front page 99% of the time.

Anyone have an idea what Google is doing with this bot?

- John

[edited by: incrediBILL at 4:23 am (utc) on Dec 15, 2013]
[edit reason] spliced to thread [/edit]
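
For anyone wanting to check "an IP that resolves to Google" in code, one common approach is a reverse lookup followed by a confirming forward lookup. The following is a minimal sketch under stated assumptions: the function name `resolves_to_google` is invented for the example, and the accepted hostname suffixes (`.google.com`, `.googlebot.com`) are an assumption to adjust against whatever your own logs show.

```python
import socket

# Assumed hostname suffixes; adjust to match what your reverse lookups return.
GOOGLE_SUFFIXES = (".google.com", ".googlebot.com")

def resolves_to_google(ip, reverse=socket.gethostbyaddr,
                       forward=socket.gethostbyname_ex):
    """Reverse-resolve ip, check the hostname suffix, then confirm that the
    hostname resolves forward to the same ip.

    The reverse/forward parameters exist only so the lookups can be
    replaced with stubs when testing without network access.
    """
    try:
        host = reverse(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False
    if not host.lower().endswith(GOOGLE_SUFFIXES):
        return False
    try:
        return ip in forward(host)[2]  # third element is the address list
    except OSError:
        return False
```

The forward confirmation matters because anyone who controls the reverse DNS for their own address block can make it claim to be a Google hostname. Note that passing this check only says the address belongs to Google, not which Google service or user is behind the request.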

bobothecat2




msg:4630733
 10:40 pm on Dec 14, 2013 (gmt 0)

I started noticing these crawls a couple of days ago:

66.249.84.72 - - [12/Dec/2013:03:33:05 -0700] "GET / HTTP/1.1" 301 253 "-" "Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20110814 Firefox/6.0 Google favicon"
66.249.84.115 - - [12/Dec/2013:03:33:06 -0700] "GET / HTTP/1.1" 200 2339 "-" "Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20110814 Firefox/6.0 Google favicon"
66.249.84.115 - - [12/Dec/2013:03:33:06 -0700] "GET /favicon.ico HTTP/1.1" 200 2238 "-" "Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20110814 Firefox/6.0 Google favicon"
66.249.83.193 - - [12/Dec/2013:03:33:11 -0700] "GET / HTTP/1.1" 301 253 "-" "Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20110814 Firefox/6.0 Google favicon"
66.249.83.67 - - [12/Dec/2013:03:33:11 -0700] "GET / HTTP/1.1" 200 2339 "-" "Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20110814 Firefox/6.0 Google favicon"
66.249.81.104 - - [12/Dec/2013:03:33:11 -0700] "GET / HTTP/1.1" 301 253 "-" "Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20110814 Firefox/6.0 Google favicon"
66.249.83.67 - - [12/Dec/2013:03:33:12 -0700] "GET /favicon.ico HTTP/1.1" 200 2238 "-" "Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20110814 Firefox/6.0 Google favicon"
66.249.81.8 - - [12/Dec/2013:03:33:12 -0700] "GET / HTTP/1.1" 200 2339 "-" "Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20110814 Firefox/6.0 Google favicon"
66.249.81.8 - - [12/Dec/2013:03:33:12 -0700] "GET /favicon.ico HTTP/1.1" 200 2238 "-" "Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20110814 Firefox/6.0 Google favicon"

Not really sure what to make of it... anyone have any insight as to what the heck Google is trying to do now?

[edited by: incrediBILL at 4:23 am (utc) on Dec 15, 2013]
[edit reason] spliced to thread [/edit]

dstiles




msg:4630895
 6:24 pm on Dec 15, 2013 (gmt 0)

66.249.80.0 - 66.249.95.255 are proxy IPs or their own tools (e.g. feed, translate, etc.; they are rubbish at segregation), so it's not a bot, it's a person (or bad bot) using the proxy. It's possible it's a Google employee, but it could as easily be public access.

The complete Google range in this block is 66.249.64.0 - 66.249.95.255. Within that, I treat 66.249.64.0 - 66.249.79.255 as the bot range.

lucy24




msg:4630931
 10:27 pm on Dec 15, 2013 (gmt 0)

Does that mean we're better off treating it as two unrelated ranges?

66.249.64.0/20 >> googlebot
66.249.80.0/20 >> assorted bots of unknown origin, including but not limited to googloid activity
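
The two-range split can be checked with Python's standard `ipaddress` module. This is just a sketch of the arithmetic; the labels `bot-range` and `utils-range` and the function name `classify` are invented for the example.

```python
import ipaddress

# The two halves discussed above.
BOT_RANGE = ipaddress.ip_network("66.249.64.0/20")    # 66.249.64.0 - 66.249.79.255
UTILS_RANGE = ipaddress.ip_network("66.249.80.0/20")  # 66.249.80.0 - 66.249.95.255

def classify(ip):
    """Return which half of the block an address falls in, if either."""
    addr = ipaddress.ip_address(ip)
    if addr in BOT_RANGE:
        return "bot-range"
    if addr in UTILS_RANGE:
        return "utils-range"
    return "other"

# The two /20 networks are adjacent and merge into a single /19,
# which is the full 66.249.64.0 - 66.249.95.255 block.
FULL_RANGE = list(ipaddress.collapse_addresses([BOT_RANGE, UTILS_RANGE]))
```

Every "Google favicon" address quoted in this thread (66.249.81.x, 66.249.83.x, 66.249.84.x) lands in the second /20.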

not2easy




msg:4630961
 1:11 am on Dec 16, 2013 (gmt 0)

This looks like the same UA that was coming in from 66.249.83.187 and 66.249.83.144 with
(+https://developers.google.com/+/web/snippet/)
where it now shows "favicon". It never asked for anything but the front page and favicon. After more than enough visits it started getting a 403.

lucy24




msg:4631003
 4:42 am on Dec 16, 2013 (gmt 0)

If this is in fact the robot responsible for the Google Plus list of icons, I can only say that they're pushing the idea of "I'm not a robot so I don't have to heed robots.txt" to the absolute limit. It's not as if people visit your profile page and explicitly click on something that says "Yes, please, show me the icons for these sites!"

But it's more principle than real injury. I'm not a front-driven site, so I generally don't even care if a robot stops by for an unauthorized pickup. And the favicon is exempt from all 403 blocks.

dstiles




msg:4631283
 9:03 pm on Dec 16, 2013 (gmt 0)

Lucy - I think unrelated ranges, yes. I have the first part enabled as a bot range, the second is 403-blocked.

For the bot range, if any access is not a proper acceptable bot UA then it gets blocked (can't recall if it's 403 or 405).

For the "utils" range, there are proxy, translate and feed amongst other things on that range (or were - the ban goes back a while). I know I should enable the proxy and translate (at least) but if G can't differentiate between bot-like, proxy, translation and other activities by allocating a few of their VAST range of IPs to each, I'm afraid a few G-centric customers will have to be disappointed.
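
The policy described in this post can be written out roughly as follows. It is a sketch under assumptions, not the poster's actual setup: the post does not say which UAs count as acceptable, so the `ACCEPTABLE_BOT_TOKENS` list is hypothetical, and since the poster could not recall whether the bot-range block is a 403 or a 405, the example arbitrarily uses 403.

```python
import ipaddress

BOT_RANGE = ipaddress.ip_network("66.249.64.0/20")    # enabled as a bot range
UTILS_RANGE = ipaddress.ip_network("66.249.80.0/20")  # proxy/translate/feed, blocked

# Hypothetical allow-list of UA substrings; not taken from the post.
ACCEPTABLE_BOT_TOKENS = ("Googlebot",)

def response_status(ip, user_agent):
    """Return the HTTP status this policy would give a request."""
    addr = ipaddress.ip_address(ip)
    if addr in UTILS_RANGE:
        return 403  # whole "utils" range is blocked regardless of UA
    if addr in BOT_RANGE:
        if any(token in user_agent for token in ACCEPTABLE_BOT_TOKENS):
            return 200
        return 403  # assumption: the poster was unsure whether this is 403 or 405
    return 200  # outside both ranges: this sketch takes no position
```

Under this policy the "Google favicon" requests would be refused purely on address, before the user agent is even looked at. The cost, as the post says, is that real people coming through the proxy or translate services are refused too.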

bobothecat2




msg:4631637
 10:22 pm on Dec 17, 2013 (gmt 0)

For clarity's sake, I gather that it's safe to assume that blocking 66.249.80.0/20 might be a good thing then?

dstiles




msg:4631942
 10:33 pm on Dec 18, 2013 (gmt 0)

It's what I do. Whether it's appropriate for you is something else: as I said, it seems to have a variety of "utils" and proxies.

Personally, I block as much of G as I can get away with; I'd block all of it but my clients would object. :(


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved