
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
NerdyBot and the Google Cloud
Google range with strange referrer
dupres01




msg:4660807
 3:08 am on Apr 6, 2014 (gmt 0)

This showed up in my logs:

162.222.176.2 - - [04/Apr/2014:10:05:50 -0600] "GET / HTTP/1.0" 200 8501 "-" "NerdyBot"

That is the entire entry; no user agent. The IP address appears to be Google's. What is going on here?

 

incrediBILL




msg:4660808
 3:14 am on Apr 6, 2014 (gmt 0)

The IP resolves to googleusercontent.com, which isn't where Googlebot comes from. It's the Google cloud, and it should be blocked.

aristotle




msg:4661128
 1:38 pm on Apr 7, 2014 (gmt 0)

In the logs for one of my sites, it looks like:
Host: 23.251.159.154
/
Http Code: 403 Date: Apr 06 22:37:17 Http Version: HTTP/1.0 Size in Bytes: 13
Referer: -
Agent: NerdyBot

So in this log entry, NerdyBot is recorded as the agent. This is also a different IP. I saw it several days ago and have already added it to the blocked user-agents list in my .htaccess.

lucy24




msg:4661181
 4:24 pm on Apr 7, 2014 (gmt 0)

that is the entire entry; no user agent

"NerdyBot" is the user agent. Anything in the User-Agent header is considered the UA string, whether or not it looks like a human browser. A missing or empty UA field can be blocked at the gate.

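To make that concrete: in Apache's default "combined" log format, the last two quoted fields are the Referer and User-Agent headers, so the `"-" "NerdyBot"` at the end of the first post means "no referrer sent, UA string is NerdyBot". A minimal Python sketch, assuming the logs use the combined format (the regex will not match custom LogFormat layouts; the helper names are hypothetical):

```python
import re

# Apache "combined" format: host ident user [time] "request" status size "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of the log fields, or None if the line is not in combined format."""
    m = COMBINED.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    fields["status"] = int(fields["status"])
    return fields


def ua_is_missing(fields):
    """True when the User-Agent field was logged as "-" (header absent) or left empty."""
    return fields["agent"].strip() in ("", "-")


if __name__ == "__main__":
    line = ('162.222.176.2 - - [04/Apr/2014:10:05:50 -0600] '
            '"GET / HTTP/1.0" 200 8501 "-" "NerdyBot"')
    entry = parse_line(line)
    print(entry["referer"])      # "-": no referrer was sent
    print(entry["agent"])        # NerdyBot: this IS the UA string
    print(ua_is_missing(entry))  # False
```

A genuinely missing UA would show up as `"-"` in that last field, which is the case lucy24 suggests blocking outright.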
incrediBILL




msg:4661189
 5:35 pm on Apr 7, 2014 (gmt 0)

This is also a different IP.


That's because they use the Google cloud.

It's similar to AWS and should be blocked.

This is a simple example of why you should not assume every Google IP is Googlebot, and why I strictly validate Googlebot and the other Google spiders to the best of my ability.
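Google's documented way to verify Googlebot is a forward-confirmed reverse DNS lookup: resolve the IP to a hostname, check that the hostname is under googlebot.com or google.com, then resolve that hostname forward and confirm it returns the same IP. A Cloud IP like the one in the first post fails the domain check because it reverse-resolves under googleusercontent.com. This is a sketch, not a drop-in validator: it is IPv4-only (`gethostbyname_ex` does not return IPv6 addresses), the function name is hypothetical, the suffix list should be checked against Google's current documentation, and the resolvers are parameters only so the logic can be tested offline.

```python
import socket

# Hostname suffixes accepted as Googlebot; an assumption to verify against Google's docs.
GOOGLEBOT_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse=socket.gethostbyaddr,
                          forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check for an IPv4 address claiming to be Googlebot."""
    try:
        hostname = reverse(ip)[0]          # step 1: IP -> hostname (PTR lookup)
    except OSError:                        # socket.herror / socket.gaierror subclass OSError
        return False
    hostname = hostname.rstrip(".").lower()
    if not hostname.endswith(GOOGLEBOT_SUFFIXES):
        return False                       # step 2: e.g. *.googleusercontent.com fails here
    try:
        addresses = forward(hostname)[2]   # step 3: hostname -> list of IPv4 addresses
    except OSError:
        return False
    return ip in addresses                 # the forward lookup must return the original IP
```

Called with the defaults, each check does live DNS lookups, so in practice you would cache the result per IP rather than run it on every request.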

blend27




msg:4671622
 11:24 am on May 16, 2014 (gmt 0)

Same UA from 107.178.212.96, 107.178.223.63 and 23.251.151.180.

NetRange: 107.178.192.0 - 107.178.255.255
CIDR: 107.178.192.0/18
OriginAS: AS15169
NetName: GOOGLE-CLOUD

NetRange: 23.251.128.0 - 23.251.159.255
CIDR: 23.251.128.0/19
OriginAS: AS15169
NetName: GOOGLE-CLOUD
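WHOIS gives both a NetRange and a CIDR here; when you only have the start and end addresses, Python's standard `ipaddress` module can compute the CIDR blocks for you. A small sketch (`range_to_cidrs` is a hypothetical helper name):

```python
import ipaddress


def range_to_cidrs(first, last):
    """Convert a WHOIS-style NetRange into the smallest list of CIDR blocks covering it."""
    start = ipaddress.ip_address(first)
    end = ipaddress.ip_address(last)
    return [str(net) for net in ipaddress.summarize_address_range(start, end)]


if __name__ == "__main__":
    print(range_to_cidrs("107.178.192.0", "107.178.255.255"))  # ['107.178.192.0/18']
    print(range_to_cidrs("23.251.128.0", "23.251.159.255"))    # ['23.251.128.0/19']
```

A range that does not fall on a power-of-two boundary comes back as several blocks rather than one.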

slipkid




msg:4671698
 2:17 pm on May 16, 2014 (gmt 0)

Hit my site yesterday, two different times.

Thanks for the ranges blend27!

keyplyr




msg:4671771
 8:10 pm on May 16, 2014 (gmt 0)



Here's my Google-Cloud deny list:

23.236.48.0 - 23.236.63.255
23.236.48.0/20

108.59.80.0 - 108.59.95.255
108.59.80.0/20

162.222.176.0 - 162.222.183.255
162.222.176.0/21

keyplyr




msg:4671782
 10:01 pm on May 16, 2014 (gmt 0)



And another Google-Cloud range:

199.223.232.0 - 199.223.239.255
199.223.232.0/21
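A quick way to see which logged addresses a deny list actually covers, sketched with the standard `ipaddress` module. The CIDRs are keyplyr's from the two posts above; the list name and helper are hypothetical.

```python
import ipaddress

# keyplyr's Google-Cloud deny list, in CIDR notation.
GOOGLE_CLOUD_DENY = [
    ipaddress.ip_network(cidr)
    for cidr in ("23.236.48.0/20", "108.59.80.0/20",
                 "162.222.176.0/21", "199.223.232.0/21")
]


def is_denied(ip, networks=GOOGLE_CLOUD_DENY):
    """True if the address falls inside any listed network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


if __name__ == "__main__":
    print(is_denied("162.222.176.2"))   # True: the NerdyBot hit from the first post
    print(is_denied("23.251.159.154"))  # False: not in any of these four blocks
```

The second address (aristotle's NerdyBot hit) is only caught once blend27's 23.251.128.0/19 is added to the list.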

dupres01




msg:4672039
 1:46 pm on May 18, 2014 (gmt 0)

Thank you, keyplyr.

blend27




msg:4672566
 11:17 am on May 20, 2014 (gmt 0)

Here is another one:

8.35.200.0 - 8.35.207.255
route: 8.35.200.0/21
descr: Google via LEVEL3

[bgp.he.net...]

And here is ALL of Google: [bgp.he.net...]

keyplyr




msg:4672759
 6:08 pm on May 20, 2014 (gmt 0)

here is another one:

8.35.200.0 - 8.35.207.255
route: 8.35.200.0/21

Which is inside of...
8.35.192.0 - 8.35.207.255
8.35.192.0/20
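The containment is easy to confirm with the standard `ipaddress` module (`subnet_of` needs Python 3.7 or later): the /21 blend27 found is the upper half of the /20.

```python
import ipaddress

announced = ipaddress.ip_network("8.35.200.0/21")  # the route blend27 posted
blocked = ipaddress.ip_network("8.35.192.0/20")    # the enclosing /20

print(announced.subnet_of(blocked))  # True
# Splitting the /20 into its two /21 halves:
print([str(n) for n in blocked.subnets(prefixlen_diff=1)])  # ['8.35.192.0/21', '8.35.200.0/21']
```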

not2easy




msg:4674423
 2:15 am on May 26, 2014 (gmt 0)

Seeing new "visits" from:
NetRange: 108.170.192.0 - 108.170.255.255
CIDR: 108.170.192.0/18
OriginAS: AS15169
NetName: GOOGLE
No referer or homepage. I can't download the logs yet, so this data comes from an automatic email notification.

keyplyr




msg:4674431
 2:38 am on May 26, 2014 (gmt 0)



Thanks, not2easy, I didn't have that one. Of course, the potential danger is that as Google expands its various services, these Google Inc ranges will be used for things that may affect our sites.

not2easy




msg:4674433
 2:53 am on May 26, 2014 (gmt 0)

I downloaded the information from blend27's link above and this range is listed there, but it isn't productive to block all those ranges "just in case". I think we may want a Google Cloud thread to parallel the Amazon listings; they aren't all going to be NerdyBot hits, but it sure is a growing list of Amazon-type traffic.

keyplyr




msg:4674497
 8:41 am on May 26, 2014 (gmt 0)

I think we may want a Google Cloud thread to parallel the Amazon listings

Yeah... maybe. Guess we'll find out :)

A lot of what I see from Google Inc and Google-Cloud comes in with "dev", "developer", "usercontent" and/or "app" somewhere in the UA. So far blocking these attributes has worked for me. When I see these (I pull all 403s and periodically give them a look) I have better info to judge whether further action is warranted.
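A rough sketch of that kind of UA token test in Python, in case you pre-screen log lines before writing .htaccess rules. The token list is illustrative, taken from keyplyr's posts in this thread, and the function names are hypothetical. It also shows a pitfall: a case-insensitive substring match on "app" fires on "AppleWebKit", which appears in Chrome and Safari UA strings, so the whole-word version is the safer of the two (at the cost of needing "appengine" as its own token).

```python
import re

# Illustrative tokens from this thread; tune them against your own 403 log.
TOKENS = ("dev", "developer", "usercontent", "app", "appengine")


def flag_naive(ua):
    """Case-insensitive substring match: also hits 'AppleWebKit' in real browsers."""
    lowered = ua.lower()
    return any(tok in lowered for tok in TOKENS)


# Whole-word match: a token only counts when it is not part of a longer word.
WORD_PATTERN = re.compile(
    r"\b(?:" + "|".join(map(re.escape, TOKENS)) + r")\b", re.IGNORECASE
)


def flag_word(ua):
    return WORD_PATTERN.search(ua) is not None


if __name__ == "__main__":
    chrome = ("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/29.0.1547.76 Safari/537.36")
    print(flag_naive(chrome))  # True: a false positive on a human browser
    print(flag_word(chrome))   # False
```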

Flyby Knight




msg:4678205
 3:59 pm on Jun 7, 2014 (gmt 0)

Here are the ranges I found this morning that resolve to some form of “googleusercontent.com”

23.236.48.0 - 23.236.63.255
162.222.176.0 - 162.222.183.255
192.158.28.0 - 192.158.31.255
23.251.128.0 - 23.251.159.255
107.167.160.0 - 107.167.191.255
146.148.0.0 - 146.148.127.255
173.255.112.0 - 173.255.127.255
107.178.192.0 - 107.178.255.255
108.170.192.0 - 108.170.255.255
8.34.216.0 - 8.34.223.255
8.35.192.0 - 8.35.199.255
66.102.0.0 - 66.102.15.255
108.59.80.0 - 108.59.95.255
199.223.232.0 - 199.223.239.255
8.34.208.0 - 8.34.215.255
199.192.112.0 - 199.192.115.255
8.35.200.0 - 8.35.207.255
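Because several of these ranges are adjacent, they can be merged before turning them into deny lines. A sketch using the standard `ipaddress` module (`ranges_to_networks` is a hypothetical name): the two adjacent 8.34.x /21s merge into 8.34.208.0/20 and the two 8.35.x /21s into 8.35.192.0/20, so the 17 ranges become 15 CIDR blocks.

```python
import ipaddress

# Flyby Knight's googleusercontent.com ranges as (first, last) pairs.
RANGES = [
    ("23.236.48.0", "23.236.63.255"),
    ("162.222.176.0", "162.222.183.255"),
    ("192.158.28.0", "192.158.31.255"),
    ("23.251.128.0", "23.251.159.255"),
    ("107.167.160.0", "107.167.191.255"),
    ("146.148.0.0", "146.148.127.255"),
    ("173.255.112.0", "173.255.127.255"),
    ("107.178.192.0", "107.178.255.255"),
    ("108.170.192.0", "108.170.255.255"),
    ("8.34.216.0", "8.34.223.255"),
    ("8.35.192.0", "8.35.199.255"),
    ("66.102.0.0", "66.102.15.255"),
    ("108.59.80.0", "108.59.95.255"),
    ("199.223.232.0", "199.223.239.255"),
    ("8.34.208.0", "8.34.215.255"),
    ("199.192.112.0", "199.192.115.255"),
    ("8.35.200.0", "8.35.207.255"),
]


def ranges_to_networks(ranges):
    """Convert (first, last) pairs to CIDR blocks, then merge adjacent or overlapping ones."""
    nets = []
    for first, last in ranges:
        nets.extend(ipaddress.summarize_address_range(
            ipaddress.ip_address(first), ipaddress.ip_address(last)))
    return list(ipaddress.collapse_addresses(nets))  # returned in sorted order


if __name__ == "__main__":
    for net in ranges_to_networks(RANGES):
        print(net)
```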

So far the NerdyBot has done nothing but visit my homepage on a daily basis, but the smarmy tone of this page just hacks me off:

[nerdybot.com...]

lucy24




msg:4678233
 8:52 pm on Jun 7, 2014 (gmt 0)

8.34.208.0 - 8.34.215.255
8.34.216.0 - 8.34.223.255
8.35.200.0 - 8.35.207.255
8.35.192.0 - 8.35.199.255

... and that's why it was so satisfying to be able to say

Deny from 8

Somewhere along the line I must have looked up
8.35.200.0/21
because I've got it labeled GoogleAppCenter. That was probably before I threw in the towel and blocked the whole alpha.

17, ###, really? I thought the whole thing was Apple. Is 18 still all MIT or have they been selling off, like Merck?

:: detour to raw logs ::

Huh. They must not like me.

We allow users to search the full source code of web pages in our index, not just the plaintext.

###! And this is supposed to increase webmasters' desire to be crawled?

:: idly wondering about alternative names ::

keyplyr




msg:4678254
 12:05 am on Jun 8, 2014 (gmt 0)



Somewhere along the line I must have looked up
8.35.200.0/21

I block the /20

That was probably before I threw in the towel and blocked the whole [8] alpha.

Wow... you must not be a commerce site. I tried that for a week about a year ago and sales dropped. Too many humans. YMMV.
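For scale: under Apache 2.2's mod_authz_host, a partial address such as `Deny from 8` matches every IPv4 address whose first octet is 8, i.e. all of 8.0.0.0/8. A quick size comparison in Python (the variable names are just for illustration):

```python
import ipaddress

whole_alpha = ipaddress.ip_network("8.0.0.0/8")    # what "Deny from 8" covers
google_20 = ipaddress.ip_network("8.35.192.0/20")  # the narrower /20 block

print(whole_alpha.num_addresses)                             # 16777216
print(google_20.num_addresses)                               # 4096
print(whole_alpha.num_addresses // google_20.num_addresses)  # 4096
```

So the whole-alpha block is 4,096 times larger than the /20, which fits keyplyr's experience of locking out human customers along with the bots.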

lucy24




msg:4678258
 12:44 am on Jun 8, 2014 (gmt 0)

you must not be a commerce site

Me? I'm not an anything site; I'm just a human with a web page. I thought everyone knew that.

:: detour to logs to confirm hunch ::

Haha. Here is the sum total of all 8.x.y.z requests in the past year for errorstyles.css and/or the favicon (both indicative of locked-out humans):

8.35.201.53 - - [17/Feb/2014:22:55:22 -0800] "GET /boilerplate/errorstyles.css HTTP/1.1" 200 2899 "http://example.com/fun/panda.html" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36 AppEngine-Google; (+http://code.google.com/appengine; appid: s~leifengwang7)"
8.35.201.49 - - [17/Feb/2014:22:55:22 -0800] "GET /favicon.ico HTTP/1.1" 200 606 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36 AppEngine-Google; (+http://code.google.com/appengine; appid: s~leifengwang7)"
8.35.200.35 - - [06/Jan/2014:21:45:40 -0800] "GET /favicon.ico HTTP/1.1" 200 606 "-" "AppEngine-Google; (+http://code.google.com/appengine; appid: s~getfavicon27)"
8.35.200.38 - - [09/May/2014:08:00:47 -0700] "GET /favicon.ico HTTP/1.1" 200 606 "-" "AppEngine-Google; (+http://code.google.com/appengine; appid: getfavicon)"

Yes, that's the page linked from my profile. Hence the haha.

Other site (no, I do not understand TextWrangler's alphabet, thank you for asking):
8.36.230.242 - - [10/Nov/2013:12:28:50 -0800] "GET /boilerplate/errorstyles.css HTTP/1.1" 200 2899 "http://www.example.com/fun/ComingHome.html" "Mozilla/5.0 (Linux; U; Android 2.3.6; en-us; SCH-R680 Build/GINGERBREAD) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1"
8.36.230.242 - - [10/Nov/2013:12:28:51 -0800] "GET /favicon.ico HTTP/1.1" 200 606 "http://www.example.com/fun/ComingHome.html" "Mozilla/5.0 (Linux; U; Android 2.3.6; en-us; SCH-R680 Build/GINGERBREAD) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1"

Note that I don't have any explicitly coded links to favicon.ico, so this one's a barefaced lie and deserves to get locked out.
8.35.200.36 - - [27/Nov/2013:10:44:02 -0800] "GET /favicon.ico HTTP/1.1" 200 606 "-" "AppEngine-Google; (+http://code.google.com/appengine; appid: getfavicon)"
8.35.201.0 - - [15/Aug/2013:14:16:11 -0700] "GET /favicon.ico HTTP/1.1" 200 606 "-" "AppEngine-Google; (+http://code.google.com/appengine; appid: s~getfavicon27)"
8.35.200.47 - - [26/Jan/2014:07:51:11 -0800] "GET /favicon.ico HTTP/1.1" 200 1695 "-" "AppEngine-Google; (+http://code.google.com/appengine; appid: s~getfavicon27)"
8.26.250.226 - - [05/Oct/2013:15:18:21 -0700] "GET /boilerplate/errorstyles.css HTTP/1.1" 200 2954 "http://www.example.com/ebooks/blind/ThreeBlindMice.html" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.76 Safari/537.36"
8.26.250.226 - - [05/Oct/2013:15:18:22 -0700] "GET /favicon.ico HTTP/1.1" 200 661 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.76 Safari/537.36"


That, at least, explains where I got the "AppEngine-Google" part from. Anyway, those are all expendable requests. If anyone had been trying for the /fonts/ or /hovercraft/ directory, I'd have had to give it another think.

One way and another, there are probably many, many WebmasterWorld readers who would not be able to get into their own sites.
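lucy24's check can be automated: scan the log for requests from a blocked range that hit the files a browser fetches after being served an error page. A sketch, assuming combined-format logs and her paths (swap in your own error-page stylesheet); the helper name is hypothetical, and as her own list shows, the results still need eyeballing because AppEngine favicon fetchers turn up too.

```python
import ipaddress
import re

# Combined-format line: capture host, request path and user agent; other fields are skipped.
LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

# Paths a locked-out human's browser tends to request; these are lucy24's, adjust to taste.
HUMAN_TELLS = {"/favicon.ico", "/boilerplate/errorstyles.css"}


def locked_out_humans(lines, blocked=ipaddress.ip_network("8.0.0.0/8")):
    """Yield (host, path, agent) for requests from the blocked range that hit a 'tell' path."""
    for line in lines:
        m = LINE.match(line)
        if m is None:
            continue
        try:
            host = ipaddress.ip_address(m["host"])
        except ValueError:
            continue  # a hostname was logged instead of an IP
        if host in blocked and m["path"] in HUMAN_TELLS:
            yield m["host"], m["path"], m["agent"]
```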

keyplyr




msg:4678293
 7:11 am on Jun 8, 2014 (gmt 0)

Well, I do block appengine.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved