
Forum Moderators: Ocean10000 & incrediBILL

NerdyBot and the Google Cloud

Google range with strange referrer

   
3:08 am on Apr 6, 2014 (gmt 0)



this showed up in my logs:

162.222.176.2 - - [04/Apr/2014:10:05:50 -0600] "GET / HTTP/1.0" 200 8501 "-" "NerdyBot"

that is the entire entry; no user agent. the ip address appears to be Google. What is going on here?
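For reference, an entry like the one above is in the usual Apache "combined" layout, where the last two quoted fields are the referer and the User-Agent. A rough sketch of splitting it apart follows; the regex and the helper names (`parse_line`, `has_no_user_agent`) are made up for illustration and assume the stock combined format, not any particular server config.

```python
import re

# Assumed layout (Apache "combined"): host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"\s*$'
)

def parse_line(line):
    """Return a dict of fields, or None if the line is not in combined format."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None

def has_no_user_agent(entry):
    """True when the User-Agent field is empty or logged as "-"."""
    return entry["agent"].strip() in ("", "-")

line = ('162.222.176.2 - - [04/Apr/2014:10:05:50 -0600] '
        '"GET / HTTP/1.0" 200 8501 "-" "NerdyBot"')
entry = parse_line(line)
print("referer:", entry["referer"])
print("agent:", entry["agent"])
print("missing UA:", has_no_user_agent(entry))
```

Read this way, the "-" is the (absent) referer and the final quoted field is the UA string.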
3:14 am on Apr 6, 2014 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



The IP is Googleusercontent, which isn't the same as where Googlebot comes from. It's the Google cloud and should be blocked.
1:38 pm on Apr 7, 2014 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



In the logs for one of my sites, it looks like:
Host: 23.251.159.154
/
Http Code: 403 Date: Apr 06 22:37:17 Http Version: HTTP/1.0 Size in Bytes: 13
Referer: -
Agent: NerdyBot

So in this log entry it is considered an agent. This is also a different IP. I saw it several days ago and already added it to my blocked user-agents list in .htaccess.
4:24 pm on Apr 7, 2014 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



that is the entire entry; no user agent

"NerdyBot" is the user agent. Anything in the User-Agent header is considered the UA string, whether or not it looks like a human browser. A missing or empty UA field can be blocked at the gate.
5:35 pm on Apr 7, 2014 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



This is also a different IP.


That's because they use the Google cloud.

It's similar to AWS and should be blocked.

This is a simple example of why not every Google IP should be assumed to be Googlebot, which is why I strictly validate Googlebot and the other Google spiders to the best of my ability.
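One common way to do that kind of validation is a reverse DNS lookup followed by a forward lookup, which is the approach Google documents for verifying Googlebot. The sketch below is only an illustration of that idea, not the setup described in this post: the function name and the injectable `reverse`/`forward` parameters are hypothetical, and it handles IPv4 only.

```python
import socket

# Host name suffixes Google documents for its crawlers.
GOOGLE_CRAWLER_SUFFIXES = (".googlebot.com", ".google.com")

def is_verified_googlebot(ip, reverse=None, forward=None):
    """Reverse-resolve ip, check the host name, then forward-resolve it back.

    reverse and forward are injectable (an assumption made for offline
    testing); by default they use the system resolver.
    """
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse(ip)
    except OSError:  # no PTR record or lookup failure
        return False
    if not host.lower().rstrip(".").endswith(GOOGLE_CRAWLER_SUFFIXES):
        return False
    try:
        # The name must resolve back to the same address, or it could be spoofed.
        return ip in forward(host)
    except OSError:
        return False
```

An address that reverse-resolves to a googleusercontent.com name fails the suffix check, so cloud tenants do not pass as Googlebot. Caching the result per IP would be sensible on a busy site.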
11:24 am on May 16, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



same UA from 107.178.212.96, 107.178.223.63 and 23.251.151.180.

NetRange: 107.178.192.0 - 107.178.255.255
CIDR: 107.178.192.0/18
OriginAS: AS15169
NetName: GOOGLE-CLOUD

NetRange: 23.251.128.0 - 23.251.159.255
CIDR: 23.251.128.0/19
OriginAS: AS15169
NetName: GOOGLE-CLOUD
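If you only have the NetRange from whois and want the CIDR form for a deny list, Python's standard `ipaddress` module can do the conversion. A small sketch; `range_to_cidrs` is just an illustrative helper name.

```python
import ipaddress

def range_to_cidrs(first, last):
    """Convert an inclusive first-last address range to a list of CIDR blocks."""
    start = ipaddress.ip_address(first)
    end = ipaddress.ip_address(last)
    return [str(net) for net in ipaddress.summarize_address_range(start, end)]

print(range_to_cidrs("107.178.192.0", "107.178.255.255"))  # ['107.178.192.0/18']
print(range_to_cidrs("23.251.128.0", "23.251.159.255"))    # ['23.251.128.0/19']
```

A range that does not fall on a power-of-two boundary comes back as several blocks rather than one.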
2:17 pm on May 16, 2014 (gmt 0)



Hit my site yesterday, two different times.

Thanks for the ranges blend27!
8:10 pm on May 16, 2014 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month





Here's my Google-Cloud deny list:

23.236.48.0 - 23.236.63.255
23.236.48.0/20

108.59.80.0 - 108.59.95.255
108.59.80.0/20

162.222.176.0 - 162.222.183.255
162.222.176.0/21
10:01 pm on May 16, 2014 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month





And another Google-Cloud

199.223.232.0 - 199.223.239.255
199.223.232.0/21
1:46 pm on May 18, 2014 (gmt 0)



Thank you, keyplyr.
11:17 am on May 20, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



here is another one:

8.35.200.0 - 8.35.207.255
route: 8.35.200.0/21
descr: Google via LEVEL3

[bgp.he.net...]

and here is ALL Google: [bgp.he.net...]
6:08 pm on May 20, 2014 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



here is another one:

8.35.200.0 - 8.35.207.255
route: 8.35.200.0/21

Which is inside of...
8.35.192.0 - 8.35.207.255
8.35.192.0/20
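The containment is easy to double-check with the standard `ipaddress` module (Python 3.7+ for `subnet_of`). Just a quick sketch:

```python
import ipaddress

inner = ipaddress.ip_network("8.35.200.0/21")
outer = ipaddress.ip_network("8.35.192.0/20")

# True when every address in inner also falls inside outer.
print(inner.subnet_of(outer))

# The /20 splits into exactly two /21 halves; the reported range is the upper one.
halves = [str(net) for net in outer.subnets(new_prefix=21)]
print(halves)
```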
2:15 am on May 26, 2014 (gmt 0)

WebmasterWorld Administrator 5+ Year Member Top Contributors Of The Month



Seeing new "visits" from:
NetRange: 108.170.192.0 - 108.170.255.255
CIDR: 108.170.192.0/18
OriginAS: AS15169
NetName: GOOGLE
no referer/homepage. I can't download the logs yet so data is from an auto email notification.
2:38 am on May 26, 2014 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month





Thanks not2easy, didn't have that one. Of course, the potential danger is that, as Google expands its various services, these Google Inc ranges will be used for something that affects our sites.
2:53 am on May 26, 2014 (gmt 0)

WebmasterWorld Administrator 5+ Year Member Top Contributors Of The Month



I downloaded the information shown in blend27's link above and this range is listed there, but it isn't productive to block all those ranges "just in case". I think we may want a Google Cloud thread to parallel the Amazon listings: the hits aren't all going to be NerdyBot, but it sure is a growing list of Amazon-type traffic.
8:41 am on May 26, 2014 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



I think we may want a Google Cloud thread to parallel the Amazon listings

Yeah... maybe. Guess we'll find out :)

A lot of what I see from Google Inc and Google-Cloud comes in with "dev", "developer", "usercontent" and/or "app" somewhere in the UA. So far blocking these attributes has worked for me. When I see these (I pull all 403s and periodically give them a look) I have better info to judge whether further action is warranted.
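The token approach described above can be sketched roughly as below. The actual rules are not shown in this post, so the matching details are assumptions: in particular the comparison is kept case-sensitive, because a case-insensitive "app" would also match "AppleWebKit" and lock out most real browsers. `is_blocked_agent` is a hypothetical name.

```python
# Tokens taken from the post; how they are matched is an assumption.
BLOCKED_TOKENS = ("dev", "developer", "usercontent", "app")

def is_blocked_agent(user_agent, tokens=BLOCKED_TOKENS):
    """True if any token appears in the UA string (case-sensitive substring match)."""
    return any(token in user_agent for token in tokens)

app_engine = "AppEngine-Google; (+http://code.google.com/appengine; appid: getfavicon)"
chrome = ("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
          "(KHTML, like Gecko) Chrome/29.0.1547.76 Safari/537.36")

print(is_blocked_agent(app_engine))  # matches on "app" in "appengine" / "appid"
print(is_blocked_agent(chrome))      # "AppleWebKit" has a capital A, so no match
```

Note that "dev" already covers "developer", and that NerdyBot itself would not trip any of these tokens, so a UA rule for it has to be separate.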
3:59 pm on Jun 7, 2014 (gmt 0)



Here are the ranges I found this morning that resolve to some form of “googleusercontent.com”

23.236.48.0 - 23.236.63.255
162.222.176.0 - 162.222.183.255
192.158.28.0 - 192.158.31.255
23.251.128.0 - 23.251.159.255
107.167.160.0 - 107.167.191.255
146.148.0.0 - 146.148.127.255
173.255.112.0 - 173.255.127.255
107.178.192.0 - 107.178.255.255
108.170.192.0 - 108.170.255.255
8.34.216.0 - 8.34.223.255
8.35.192.0 - 8.35.199.255
66.102.0.0 - 66.102.15.255
108.59.80.0 - 108.59.95.255
199.223.232.0 - 199.223.239.255
8.34.208.0 - 8.34.215.255
199.192.112.0 - 199.192.115.255
8.35.200.0 - 8.35.207.255
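Several of the ranges above sit next to each other and can be merged before they go into a deny list. A sketch using the standard `ipaddress` module, with only a handful of the listed ranges for brevity; `to_deny_list` and `is_denied` are illustrative names.

```python
import ipaddress

# A subset of the ranges listed above.
RANGES = [
    ("8.34.208.0", "8.34.215.255"),
    ("8.34.216.0", "8.34.223.255"),
    ("8.35.192.0", "8.35.199.255"),
    ("8.35.200.0", "8.35.207.255"),
    ("162.222.176.0", "162.222.183.255"),
]

def to_deny_list(ranges):
    """Turn first-last ranges into CIDR blocks and merge adjacent ones."""
    nets = []
    for first, last in ranges:
        nets.extend(ipaddress.summarize_address_range(
            ipaddress.ip_address(first), ipaddress.ip_address(last)))
    return list(ipaddress.collapse_addresses(nets))

DENY = to_deny_list(RANGES)

def is_denied(ip):
    """True if ip falls inside any block of the deny list."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in DENY)

print([str(net) for net in DENY])
print(is_denied("162.222.176.2"))
```

The two 8.34 ranges collapse into one /20 and the two 8.35 ranges into another, which matches the 8.35.192.0/20 mentioned earlier in the thread.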

So far the NerdyBot has done nothing but visit my homepage on a daily basis, but the smarmy tone of this page just hacks me off:

[nerdybot.com...]
8:52 pm on Jun 7, 2014 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



8.34.208.0 - 8.34.215.255
8.34.216.0 - 8.34.223.255
8.35.200.0 - 8.35.207.255
8.35.192.0 - 8.35.199.255

... and that's why it was so satisfying to be able to say

Deny from 8


Somewhere along the line I must have looked up
8.35.200.0/21
because I've got it labeled GoogleAppCenter. That was probably before I threw in the towel and blocked the whole alpha.
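For scale: in Apache 2.2 syntax a partial address such as "Deny from 8" matches on the leading octet, so it covers all of 8.0.0.0/8. A quick comparison with the single /21, using the standard `ipaddress` module (a sketch only):

```python
import ipaddress

whole_alpha = ipaddress.ip_network("8.0.0.0/8")    # what "Deny from 8" covers
app_range = ipaddress.ip_network("8.35.200.0/21")  # the one range looked up earlier

print(whole_alpha.num_addresses)  # 16777216
print(app_range.num_addresses)    # 2048
print(app_range.subnet_of(whole_alpha))
```

So the blanket rule blocks roughly 8000 times as many addresses as the one range, which is why it is only practical on a site that can afford to lose whatever human traffic lives in the rest of the block.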

17, ###, really? I thought the whole thing was Apple. Is 18 still all MIT or have they been selling off, like Merck?

:: detour to raw logs ::

Huh. They must not like me.

We allow users to search the full source code of web pages in our index, not just the plaintext.

###! And this is supposed to increase webmasters' desire to be crawled?

:: idly wondering about alternative names ::
12:05 am on Jun 8, 2014 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month





Somewhere along the line I must have looked up
8.35.200.0/21

I block the /20

That was probably before I threw in the towel and blocked the whole [8] alpha.

Wow... you must not be a commerce site. I tried that for a week about a year ago and sales dropped. Too many humans. YMMV.
12:44 am on Jun 8, 2014 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



you must not be a commerce site

Me? I'm not an anything site; I'm just a human with a web page. I thought everyone knew that.

:: detour to logs to confirm hunch ::

Haha. Here is the sum total of all 8.x.y.z requests in the past year for errorstyles.css and/or the favicon (both indicative of locked-out humans):

8.35.201.53 - - [17/Feb/2014:22:55:22 -0800] "GET /boilerplate/errorstyles.css HTTP/1.1" 200 2899 "http://example.com/fun/panda.html" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36 AppEngine-Google; (+http://code.google.com/appengine; appid: s~leifengwang7)" 
8.35.201.49 - - [17/Feb/2014:22:55:22 -0800] "GET /favicon.ico HTTP/1.1" 200 606 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36 AppEngine-Google; (+http://code.google.com/appengine; appid: s~leifengwang7)"
8.35.200.35 - - [06/Jan/2014:21:45:40 -0800] "GET /favicon.ico HTTP/1.1" 200 606 "-" "AppEngine-Google; (+http://code.google.com/appengine; appid: s~getfavicon27)"
8.35.200.38 - - [09/May/2014:08:00:47 -0700] "GET /favicon.ico HTTP/1.1" 200 606 "-" "AppEngine-Google; (+http://code.google.com/appengine; appid: getfavicon)"

Yes, that's the page linked from my profile. Hence the haha.

Other site (no, I do not understand TextWrangler's alphabet, thank you for asking):
8.36.230.242 - - [10/Nov/2013:12:28:50 -0800] "GET /boilerplate/errorstyles.css HTTP/1.1" 200 2899 "http://www.example.com/fun/ComingHome.html" "Mozilla/5.0 (Linux; U; Android 2.3.6; en-us; SCH-R680 Build/GINGERBREAD) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1" 
8.36.230.242 - - [10/Nov/2013:12:28:51 -0800] "GET /favicon.ico HTTP/1.1" 200 606 "http://www.example.com/fun/ComingHome.html" "Mozilla/5.0 (Linux; U; Android 2.3.6; en-us; SCH-R680 Build/GINGERBREAD) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1"

Note that I don't have any explicitly coded links to favicon.ico, so this one's a barefaced lie and deserves to get locked out.
8.35.200.36 - - [27/Nov/2013:10:44:02 -0800] "GET /favicon.ico HTTP/1.1" 200 606 "-" "AppEngine-Google; (+http://code.google.com/appengine; appid: getfavicon)" 
8.35.201.0 - - [15/Aug/2013:14:16:11 -0700] "GET /favicon.ico HTTP/1.1" 200 606 "-" "AppEngine-Google; (+http://code.google.com/appengine; appid: s~getfavicon27)"
8.35.200.47 - - [26/Jan/2014:07:51:11 -0800] "GET /favicon.ico HTTP/1.1" 200 1695 "-" "AppEngine-Google; (+http://code.google.com/appengine; appid: s~getfavicon27)"
8.26.250.226 - - [05/Oct/2013:15:18:21 -0700] "GET /boilerplate/errorstyles.css HTTP/1.1" 200 2954 "http://www.example.com/ebooks/blind/ThreeBlindMice.html" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.76 Safari/537.36"
8.26.250.226 - - [05/Oct/2013:15:18:22 -0700] "GET /favicon.ico HTTP/1.1" 200 661 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.76 Safari/537.36"


That, at least, explains where I got the "AppEngine-Google" part from. Anyway, those are all expendable requests. If anyone had been trying for the /fonts/ or /hovercraft/ directory, I'd have had to give it another think.
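Since the App Engine requests carry an appid in the UA string, pulling it out makes it easier to see which app is behind a hit. A rough sketch, assuming the UA keeps the shape seen in the entries above; the pattern and the name `app_engine_id` are made up for illustration.

```python
import re

# Assumed shape: "AppEngine-Google; (+http://code.google.com/appengine; appid: <id>)"
APPID = re.compile(
    r"AppEngine-Google; \(\+http://code\.google\.com/appengine; "
    r"appid: (?P<appid>[^)\s]+)\)"
)

def app_engine_id(user_agent):
    """Return the appid from an App Engine UA string, or None if there is none."""
    match = APPID.search(user_agent)
    return match.group("appid") if match else None

print(app_engine_id(
    "AppEngine-Google; (+http://code.google.com/appengine; appid: s~getfavicon27)"
))
```

The same function works on the browser-looking UAs that merely have the AppEngine-Google part tacked on the end.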

One way and another, there are probably many, many WebmasterWorld readers who would not be able to get into their own sites.
7:11 am on Jun 8, 2014 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Well I do block appengine.
3:36 am on Sep 2, 2014 (gmt 0)

WebmasterWorld Administrator 5+ Year Member Top Contributors Of The Month



DigitalOcean now brings you "NerdyBot":
104.131.0.0 - 104.131.255.255
104.131.0.0/16
DIGITALOCEAN-9

along with another Google-Cloud NerdyBot:
130.211.0.0 - 130.211.255.255
130.211.0.0/16
GOOGLE-CLOUD
1:32 am on Sep 3, 2014 (gmt 0)

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Nerdy uses HTTP/1.0
2:40 am on Sep 3, 2014 (gmt 0)

WebmasterWorld Administrator 5+ Year Member Top Contributors Of The Month



Looking through a list of 37 "NerdyBot" UAs from 8 different CIDRs during the past month on one site, they are all using HTTP/1.1, so I would not count on them all being set up the same way.
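A tally like that can be pulled from raw logs with a few lines of Python. This is only a sketch: it assumes combined-format log lines, and `protocol_counts` is a hypothetical helper name. The sample line is the entry from the first post.

```python
import re
from collections import Counter

# Picks the protocol out of the request field and the UA out of the last quoted field.
REQUEST_AND_AGENT = re.compile(
    r'"(?:[A-Z]+) \S+ (?P<protocol>HTTP/\d\.\d)" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def protocol_counts(lines, agent="NerdyBot"):
    """Count HTTP protocol versions used by requests with the given exact UA."""
    counts = Counter()
    for line in lines:
        match = REQUEST_AND_AGENT.search(line)
        if match and match.group("agent") == agent:
            counts[match.group("protocol")] += 1
    return counts

sample = [
    '162.222.176.2 - - [04/Apr/2014:10:05:50 -0600] '
    '"GET / HTTP/1.0" 200 8501 "-" "NerdyBot"',
]
print(protocol_counts(sample))
```

Passing an open log file instead of the sample list works the same way, since the function only iterates over lines.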
6:40 pm on Oct 3, 2014 (gmt 0)



Another range to add to the Google Cloud list:

104.154.0.0 - 104.155.255.255
104.154.0.0/15
10:03 pm on Nov 25, 2014 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



Looking through this month's security log of proxy accesses I came upon further abuse of the google proxy IP ranges by google itself (again!). I hadn't come across these two user-agents before...

Mozilla/5.0 (compatible; X11; Linux x86_64; Google-StructuredDataTestingTool; +http://www.google.com/webmasters/tools/richsnippets)

...which has no reason that I can see to visit any of the sites it did, and certainly not as often as it did, and...

Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20110814 Firefox/6.0 Google favicon

...which fetched favicons (including the home page) several times where I would have thought once should have been enough.

Someone posted a while ago (here or in the Google forum) that webpreview had been discontinued. Not according to this month's log, not by a long way. Another infraction of the proxy status.

IP ranges involved:
64.233.173.0 - 64.233.173.255
66.249.80.0 - 66.249.93.255
8:14 am on Nov 26, 2014 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20110814 Firefox/6.0 Google favicon

...which fetched favicons (including the home page) several times where I would have thought once should have been enough.

Gosh. Have you really not met this before? For years the faviconbot traveled with no UA at all-- getting it an automatic lockout on most sites it visited, presumably including yours and mine. Then it changed to Firefox 6, a browser so ancient, even _I_ lock it out.

The faviconbot always requests the front page before the favicon itself. It has just this instant dawned on me that it isn't just doing this to be annoying: it needs to read one page in order to see whether the HTML includes an icon reference. The irony in my case is that I've poked a <Files> hole for the favicon, as it's one more way to identify wrongly blocked humans. So there's nothing to stop the robot from requesting and receiving example.com/favicon.ico even if I won't show it anything else.
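What a favicon fetcher presumably has to do with that front page can be sketched with the standard `html.parser` module: look for a link element whose rel includes "icon", and fall back to /favicon.ico otherwise. The fallback and the names `IconLinkFinder` and `find_icons` are assumptions for illustration; nothing here is known about how the Google bot actually works.

```python
from html.parser import HTMLParser

class IconLinkFinder(HTMLParser):
    """Collect href values of <link> tags whose rel attribute includes "icon"."""

    def __init__(self):
        super().__init__()
        self.icons = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if "icon" in rel and attr.get("href"):
            self.icons.append(attr["href"])

def find_icons(html):
    """Return declared icon URLs, or the conventional default if none are declared."""
    finder = IconLinkFinder()
    finder.feed(html)
    return finder.icons or ["/favicon.ico"]

page = '<html><head><link rel="shortcut icon" href="/img/site.ico"></head></html>'
print(find_icons(page))
print(find_icons("<html><head><title>no icon link</title></head></html>"))
```

A page with no icon link still leads to a request for /favicon.ico, which fits the behaviour seen in the logs.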

I see Google Preview sometimes in logs, but I'm ### if I can figure out where it's coming from, since I never see a Preview link in SERPs.
8:53 pm on Nov 26, 2014 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



I don't think I have seen it before, no. No doubt it was there but it escaped me. I don't log images (except as a normal site log) as I cannot (or at least could not) block their access in IIS/ASP.

I suspect the real reason it grabs the page first is that firefox (if it really is that) needs to. I noticed it was v6 as well. I sometimes wonder what world some of these people live in. :(

I forgot to mention the (current?) preview UA...

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko; Google Web Preview) Chrome/27.0.145

The referer claims google.com/search but it seems fixated on a handful of actual pages. I wonder if they've switched off the search display but forgotten to turn off the bot. Or (insidious thought!) they haven't forgotten and are just using it as an excuse to bypass robots.txt. But they wouldn't do that, would they? :)
10:09 pm on Nov 26, 2014 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



The Google Preview UA is also used by some GWT functions. For example
66.249.85.40 - - [19/Feb/2014:04:47:18 -0800] "GET /boilerplate/legal.html HTTP/1.1" 200 4273 "http://www.google.com/search" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko; Google Web Preview) Chrome/27.0.1453 Safari/537.36" 
69.228.abc.def - - [19/Feb/2014:04:47:18 -0800] "GET /boilerplate/legal.html HTTP/1.1" 200 4273 "-" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en; rv:1.9.2.28) Gecko/20120308 Camino/2.1.2 (like Firefox/3.6.28)"

(The second line is me; that's how I'm sure it was GWT.) This is a no-indexed page, so the "/search" in the referer is clearly bogus.

Hm, here's another one I never noticed (log wrangling auto-ignores any 403 responses):
66.249.85.40 - - [20/Jan/2014:18:42:31 -0800] "GET /silence/nagvaarniq/kajuaq.html HTTP/1.1" 403 1124 "http://www.google.com/search" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko; Google Web Preview) Chrome/27.0.1453 Safari/537.36"

That one's roboted-out and the linking text is something neutral like "The rest of the story" so there is, again, zero possibility that someone actually found it in search. I had to do some hunting to figure out why it got a 403-- but the mere fact that this UA asked for the page shows that I was right to keep the rule.

I suspect the real reason it grabs the page first is that firefox (if it really is that) needs to.

But, but, it's not really FF is it? It's just a robot sending a UA string.
5:14 am on Nov 27, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Shock quote/
Firefox 6, a browser so ancient, even _I_ lock it out.
/shockquote

Sent using PPC Mac FF 3.5.9 entering second decade and still towing barges :)
This 54 message thread spans 2 pages.
 
