
Search Engine Spider and User Agent Identification Forum

This 54 message thread spans 2 pages.
NerdyBot and the Google Cloud
Google range with strange referrer
dupres01



 
Msg#: 4660805 posted 3:08 am on Apr 6, 2014 (gmt 0)

This showed up in my logs:

162.222.176.2 - - [04/Apr/2014:10:05:50 -0600] "GET / HTTP/1.0" 200 8501 "-" "NerdyBot"

That is the entire entry; no user agent. The IP address appears to be Google's. What is going on here?

 

incrediBILL

WebmasterWorld Administrator Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



 
Msg#: 4660805 posted 3:14 am on Apr 6, 2014 (gmt 0)

The IP is Googleusercontent, which isn't the same as where Googlebot comes from. It's the Google cloud and should be blocked.

aristotle

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



 
Msg#: 4660805 posted 1:38 pm on Apr 7, 2014 (gmt 0)

In the logs for one of my sites, it looks like:
Host: 23.251.159.154
/
Http Code: 403 Date: Apr 06 22:37:17 Http Version: HTTP/1.0 Size in Bytes: 13
Referer: -
Agent: NerdyBot

So in this log entry NerdyBot is recorded as the agent. This is also a different IP. I saw it several days ago and have already added it to my blocked user-agents list in .htaccess.

lucy24

WebmasterWorld Senior Member Top Contributor of All Time Top Contributors Of The Month



 
Msg#: 4660805 posted 4:24 pm on Apr 7, 2014 (gmt 0)

"that is the entire entry; no user agent"

"NerdyBot" is the user agent. Anything in the User-Agent header is considered the UA string, whether or not it looks like a human browser. A missing or empty UA field can be blocked at the gate.

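As a rough illustration of the point above, here is a minimal Python sketch that splits an Apache "combined" format line into fields and checks whether the User-Agent field is missing. It assumes the log uses the combined format and that the fields contain no escaped quotes; the function names are made up for this example.

```python
import re

# Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

def parse_line(line):
    """Return the named fields of one combined-format line, or None."""
    match = COMBINED.match(line.strip())
    return match.groupdict() if match else None

def has_empty_agent(fields):
    """True when the User-Agent field is missing (logged as "-" or empty)."""
    return fields["agent"] in ("", "-")

# The log entry from the first post in this thread.
line = ('162.222.176.2 - - [04/Apr/2014:10:05:50 -0600] '
        '"GET / HTTP/1.0" 200 8501 "-" "NerdyBot"')
fields = parse_line(line)
print(fields["referer"], fields["agent"], has_empty_agent(fields))
```

In that entry the "-" is the referer and "NerdyBot" is the last quoted field, which is the user agent.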
incrediBILL

WebmasterWorld Administrator Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



 
Msg#: 4660805 posted 5:35 pm on Apr 7, 2014 (gmt 0)

"This is also a different IP."


That's because they use the Google cloud.

It's similar to AWS and should be blocked.

This is a simple example to show why all Google IPs should not be assumed to be Googlebot, which is why I strictly validate Googlebot or any of the other Google spiders to the best of my ability.
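One common way to validate a claimed Googlebot is a forward-confirmed reverse DNS check: look up the hostname for the IP, confirm it is in a Google crawler domain, then resolve that hostname and confirm it maps back to the same IP. The sketch below is only one possible approach, not necessarily what anyone in this thread runs. The suffix list is an assumption to check against Google's current guidance, and the function names are invented for the example.

```python
import socket

# Assumed crawler hostname suffixes; verify against Google's documentation.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def reverse_lookup(ip):
    """PTR lookup: IP -> hostname."""
    return socket.gethostbyaddr(ip)[0]

def forward_lookup(hostname):
    """A-record lookup: hostname -> list of IPv4 addresses."""
    return socket.gethostbyname_ex(hostname)[2]

def is_verified_googlebot(ip, reverse=reverse_lookup, forward=forward_lookup):
    """Forward-confirmed reverse DNS check for a claimed Googlebot IP."""
    try:
        hostname = reverse(ip).rstrip(".").lower()
    except OSError:          # no PTR record or resolver failure
        return False
    if not hostname.endswith(GOOGLE_SUFFIXES):
        # e.g. *.googleusercontent.com is the Google cloud, not Googlebot
        return False
    try:
        return ip in forward(hostname)
    except OSError:
        return False

# Usage (performs real DNS queries):
#   is_verified_googlebot("162.222.176.2")
```

The resolver functions are passed in as parameters so the check can be exercised without network access.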

blend27

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4660805 posted 11:24 am on May 16, 2014 (gmt 0)

Same UA from 107.178.212.96, 107.178.223.63 and 23.251.151.180.

NetRange: 107.178.192.0 - 107.178.255.255
CIDR: 107.178.192.0/18
OriginAS: AS15169
NetName: GOOGLE-CLOUD

NetRange: 23.251.128.0 - 23.251.159.255
CIDR: 23.251.128.0/19
OriginAS: AS15169
NetName: GOOGLE-CLOUD

slipkid



 
Msg#: 4660805 posted 2:17 pm on May 16, 2014 (gmt 0)

Hit my site yesterday, two different times.

Thanks for the ranges blend27!

keyplyr

WebmasterWorld Senior Member Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4660805 posted 8:10 pm on May 16, 2014 (gmt 0)



Here's my Google-Cloud deny list:

23.236.48.0 - 23.236.63.255
23.236.48.0/20

108.59.80.0 - 108.59.95.255
108.59.80.0/20

162.222.176.0 - 162.222.183.255
162.222.176.0/21
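For anyone who wants to test addresses against a list like the one above outside the server config, a small sketch using Python's standard `ipaddress` module follows. The `is_denied` helper is a name invented for this example.

```python
import ipaddress

# The CIDR blocks from the deny list above.
DENY = [ipaddress.ip_network(cidr) for cidr in
        ("23.236.48.0/20", "108.59.80.0/20", "162.222.176.0/21")]

def is_denied(ip):
    """True if the address falls inside any listed block."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in DENY)

print(is_denied("162.222.176.2"))    # the first NerdyBot hit in this thread
print(is_denied("23.251.159.154"))   # in 23.251.128.0/19, which is not in this list
```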

keyplyr

WebmasterWorld Senior Member Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4660805 posted 10:01 pm on May 16, 2014 (gmt 0)



And another Google-Cloud range:

199.223.232.0 - 199.223.239.255
199.223.232.0/21

dupres01



 
Msg#: 4660805 posted 1:46 pm on May 18, 2014 (gmt 0)

Thank you, keyplyr.

blend27

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4660805 posted 11:17 am on May 20, 2014 (gmt 0)

Here is another one:

8.35.200.0 - 8.35.207.255
route: 8.35.200.0/21
descr: Google via LEVEL3

[bgp.he.net...]

And here is ALL Google: [bgp.he.net...]

keyplyr

WebmasterWorld Senior Member Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4660805 posted 6:08 pm on May 20, 2014 (gmt 0)

"here is another one:

8.35.200.0 - 8.35.207.255
route: 8.35.200.0/21"

Which is inside of...
8.35.192.0 - 8.35.207.255
8.35.192.0/20
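The containment claim is easy to confirm with Python's `ipaddress` module. This is just a quick check, not part of anyone's blocking setup.

```python
import ipaddress

narrow = ipaddress.ip_network("8.35.200.0/21")
wide = ipaddress.ip_network("8.35.192.0/20")

# The /21 is a subnet of the /20, so listing both is redundant.
print(narrow.subnet_of(wide))
print(wide[0], "-", wide[-1])

# collapse_addresses drops the redundant /21.
print(list(ipaddress.collapse_addresses([narrow, wide])))
```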

not2easy

WebmasterWorld Administrator 5+ Year Member Top Contributors Of The Month



 
Msg#: 4660805 posted 2:15 am on May 26, 2014 (gmt 0)

Seeing new "visits" from:
NetRange: 108.170.192.0 - 108.170.255.255
CIDR: 108.170.192.0/18
OriginAS: AS15169
NetName: GOOGLE
no referer/homepage. I can't download the logs yet so data is from an auto email notification.

keyplyr

WebmasterWorld Senior Member Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4660805 posted 2:38 am on May 26, 2014 (gmt 0)



Thanks not2easy, didn't have that one. Of course, the potential danger is that, as Google expands its various services, these Google Inc ranges may be used for something that affects our sites.

not2easy

WebmasterWorld Administrator 5+ Year Member Top Contributors Of The Month



 
Msg#: 4660805 posted 2:53 am on May 26, 2014 (gmt 0)

I downloaded the information shown in blend27's link above and this range is listed there, but it isn't productive to block all those ranges "just in case". I think we may want a Google Cloud thread to parallel the Amazon listings. The hits aren't all going to be NerdyBot, but it sure is a growing list of Amazon-type traffic.

keyplyr

WebmasterWorld Senior Member Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4660805 posted 8:41 am on May 26, 2014 (gmt 0)

"I think we may want a Google Cloud thread to parallel the Amazon listings"

Yeah... maybe. Guess we'll find out :)

A lot of what I see from Google Inc and Google-Cloud comes in with "dev", "developer", "usercontent" and/or "app" somewhere in the UA. So far blocking these attributes has worked for me. When I see these (I pull all 403s and periodically give them a look) I have better info to judge whether further action is warranted.
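How the matching is actually done is not stated above, so the following is only a sketch of one cautious way to test for those markers. A naive case-insensitive substring test for "app" would also hit "AppleWebKit", which appears in most browser UAs, so this version compares whole tokens instead. The token approach and the function names are assumptions made for this example.

```python
import re

# Markers quoted in the post above. Matching whole tokens is this
# sketch's choice, not necessarily the poster's method.
MARKERS = {"dev", "developer", "usercontent", "app"}

def ua_tokens(user_agent):
    """Lower-cased alphanumeric tokens of a User-Agent string."""
    return set(re.findall(r"[a-z0-9]+", user_agent.lower()))

def flagged(user_agent):
    """True if any marker appears as a whole token in the UA."""
    return bool(MARKERS & ua_tokens(user_agent))

browser = ("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
           "(KHTML, like Gecko) Chrome/29.0.1547.76 Safari/537.36")
print("app" in browser.lower())   # the naive substring test matches a normal browser
print(flagged(browser))           # the token test does not
```

With whole-token matching, a UA such as "AppEngine-Google" is not caught by "app" and would need its own rule.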

Flyby Knight



 
Msg#: 4660805 posted 3:59 pm on Jun 7, 2014 (gmt 0)

Here are the ranges I found this morning that resolve to some form of “googleusercontent.com”

23.236.48.0 - 23.236.63.255
162.222.176.0 - 162.222.183.255
192.158.28.0 - 192.158.31.255
23.251.128.0 - 23.251.159.255
107.167.160.0 - 107.167.191.255
146.148.0.0 - 146.148.127.255
173.255.112.0 - 173.255.127.255
107.178.192.0 - 107.178.255.255
108.170.192.0 - 108.170.255.255
8.34.216.0 - 8.34.223.255
8.35.192.0 - 8.35.199.255
66.102.0.0 - 66.102.15.255
108.59.80.0 - 108.59.95.255
199.223.232.0 - 199.223.239.255
8.34.208.0 - 8.34.215.255
199.192.112.0 - 199.192.115.255
8.35.200.0 - 8.35.207.255
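Ranges written as "first - last" can be turned into CIDR blocks, and adjacent blocks merged, with Python's `ipaddress` module. The sketch below runs on the four 8.x ranges from the list above; `range_to_cidrs` is a helper name invented for this example.

```python
import ipaddress

def range_to_cidrs(line):
    """Convert 'first - last' into the minimal list of CIDR blocks."""
    first, last = (ipaddress.ip_address(part.strip())
                   for part in line.split("-"))
    return list(ipaddress.summarize_address_range(first, last))

ranges = [
    "8.34.208.0 - 8.34.215.255",
    "8.34.216.0 - 8.34.223.255",
    "8.35.192.0 - 8.35.199.255",
    "8.35.200.0 - 8.35.207.255",
]
blocks = [net for r in ranges for net in range_to_cidrs(r)]

# Adjacent /21 blocks merge into /20 blocks.
merged = list(ipaddress.collapse_addresses(blocks))
print([str(net) for net in merged])
```

Each pair of neighbouring /21 ranges collapses into a single /20, which keeps a deny list shorter.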

So far the NerdyBot has done nothing but visit my homepage on a daily basis, but the smarmy tone of this page just hacks me off:

[nerdybot.com...]

lucy24

WebmasterWorld Senior Member Top Contributor of All Time Top Contributors Of The Month



 
Msg#: 4660805 posted 8:52 pm on Jun 7, 2014 (gmt 0)

8.34.208.0 - 8.34.215.255
8.34.216.0 - 8.34.223.255
8.35.200.0 - 8.35.207.255
8.35.192.0 - 8.35.199.255

... and that's why it was so satisfying to be able to say

Deny from 8

Somewhere along the line I must have looked up
8.35.200.0/21
because I've got it labeled GoogleAppCenter. That was probably before I threw in the towel and blocked the whole alpha.
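To put the scale of that trade-off in numbers, here is a quick comparison. It assumes `Deny from 8` is the Apache 2.2-style partial-address match on the first octet, which makes it equivalent to 8.0.0.0/8.

```python
import ipaddress

whole_alpha = ipaddress.ip_network("8.0.0.0/8")    # assumed equivalent of "Deny from 8"
app_range = ipaddress.ip_network("8.35.200.0/21")  # the single range looked up earlier

print(app_range.subnet_of(whole_alpha))
print(app_range.num_addresses, whole_alpha.num_addresses)
print(whole_alpha.num_addresses // app_range.num_addresses)
```

The blanket rule covers 8192 times as many addresses as the one /21, so it will also catch whatever human visitors happen to sit elsewhere in 8.x.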

17, ###, really? I thought the whole thing was Apple. Is 18 still all MIT or have they been selling off, like Merck?

:: detour to raw logs ::

Huh. They must not like me.

"We allow users to search the full source code of web pages in our index, not just the plaintext."

###! And this is supposed to increase webmasters' desire to be crawled?

:: idly wondering about alternative names ::

keyplyr

WebmasterWorld Senior Member Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4660805 posted 12:05 am on Jun 8, 2014 (gmt 0)



"Somewhere along the line I must have looked up
8.35.200.0/21"

I block the /20

"That was probably before I threw in the towel and blocked the whole [8] alpha."

Wow... you must not be a commerce site. I tried that for a week about a year ago and sales dropped. Too many humans. YMMV.

lucy24

WebmasterWorld Senior Member Top Contributor of All Time Top Contributors Of The Month



 
Msg#: 4660805 posted 12:44 am on Jun 8, 2014 (gmt 0)

"you must not be a commerce site"

Me? I'm not an anything site; I'm just a human with a web page. I thought everyone knew that.

:: detour to logs to confirm hunch ::

Haha. Here is the sum total of all 8.x.y.z requests in the past year for errorstyles.css and/or the favicon (both indicative of locked-out humans):

8.35.201.53 - - [17/Feb/2014:22:55:22 -0800] "GET /boilerplate/errorstyles.css HTTP/1.1" 200 2899 "http://example.com/fun/panda.html" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36 AppEngine-Google; (+http://code.google.com/appengine; appid: s~leifengwang7)"
8.35.201.49 - - [17/Feb/2014:22:55:22 -0800] "GET /favicon.ico HTTP/1.1" 200 606 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36 AppEngine-Google; (+http://code.google.com/appengine; appid: s~leifengwang7)"
8.35.200.35 - - [06/Jan/2014:21:45:40 -0800] "GET /favicon.ico HTTP/1.1" 200 606 "-" "AppEngine-Google; (+http://code.google.com/appengine; appid: s~getfavicon27)"
8.35.200.38 - - [09/May/2014:08:00:47 -0700] "GET /favicon.ico HTTP/1.1" 200 606 "-" "AppEngine-Google; (+http://code.google.com/appengine; appid: getfavicon)"

Yes, that's the page linked from my profile. Hence the haha.

Other site (no, I do not understand TextWrangler's alphabet, thank you for asking):
8.36.230.242 - - [10/Nov/2013:12:28:50 -0800] "GET /boilerplate/errorstyles.css HTTP/1.1" 200 2899 "http://www.example.com/fun/ComingHome.html" "Mozilla/5.0 (Linux; U; Android 2.3.6; en-us; SCH-R680 Build/GINGERBREAD) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1"
8.36.230.242 - - [10/Nov/2013:12:28:51 -0800] "GET /favicon.ico HTTP/1.1" 200 606 "http://www.example.com/fun/ComingHome.html" "Mozilla/5.0 (Linux; U; Android 2.3.6; en-us; SCH-R680 Build/GINGERBREAD) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1"

Note that I don't have any explicitly coded links to favicon.ico, so this one's a barefaced lie and deserves to get locked out.
8.35.200.36 - - [27/Nov/2013:10:44:02 -0800] "GET /favicon.ico HTTP/1.1" 200 606 "-" "AppEngine-Google; (+http://code.google.com/appengine; appid: getfavicon)"
8.35.201.0 - - [15/Aug/2013:14:16:11 -0700] "GET /favicon.ico HTTP/1.1" 200 606 "-" "AppEngine-Google; (+http://code.google.com/appengine; appid: s~getfavicon27)"
8.35.200.47 - - [26/Jan/2014:07:51:11 -0800] "GET /favicon.ico HTTP/1.1" 200 1695 "-" "AppEngine-Google; (+http://code.google.com/appengine; appid: s~getfavicon27)"
8.26.250.226 - - [05/Oct/2013:15:18:21 -0700] "GET /boilerplate/errorstyles.css HTTP/1.1" 200 2954 "http://www.example.com/ebooks/blind/ThreeBlindMice.html" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.76 Safari/537.36"
8.26.250.226 - - [05/Oct/2013:15:18:22 -0700] "GET /favicon.ico HTTP/1.1" 200 661 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.76 Safari/537.36"


That, at least, explains where I got the "AppEngine-Google" part from. Anyway, those are all expendable requests. If anyone had been trying for the /fonts/ or /hovercraft/ directory, I'd have had to give it another think.

One way and another, there are probably many, many WebmasterWorld readers who would not be able to get into their own sites.
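The log check described in this post (looking for requests from a blocked range for files that only a locked-out browser would fetch) could be sketched roughly as below. The regular expression assumes combined-format logs; the marker paths are the two named in the post, and the function name is invented for this example.

```python
import re

# Leading part of a combined-format line, up to the status code.
REQUEST = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
)

# Files served alongside the 403 page, as named in the post above.
MARKER_PATHS = {"/boilerplate/errorstyles.css", "/favicon.ico"}

def lockout_markers(lines, prefix="8."):
    """Yield (host, path) for marker-file requests from a blocked prefix."""
    for line in lines:
        m = REQUEST.match(line)
        if m and m["host"].startswith(prefix) and m["path"] in MARKER_PATHS:
            yield m["host"], m["path"]

# Two entries quoted earlier in this thread.
sample = [
    '8.35.200.38 - - [09/May/2014:08:00:47 -0700] "GET /favicon.ico HTTP/1.1" '
    '200 606 "-" "AppEngine-Google; (+http://code.google.com/appengine; appid: getfavicon)"',
    '162.222.176.2 - - [04/Apr/2014:10:05:50 -0600] "GET / HTTP/1.0" 200 8501 "-" "NerdyBot"',
]
print(list(lockout_markers(sample)))
```

The user agent still has to be read by eye afterwards, since the same marker files are also requested by robots such as the favicon fetchers above.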

keyplyr

WebmasterWorld Senior Member Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4660805 posted 7:11 am on Jun 8, 2014 (gmt 0)

Well I do block appengine.

not2easy

WebmasterWorld Administrator 5+ Year Member Top Contributors Of The Month



 
Msg#: 4660805 posted 3:36 am on Sep 2, 2014 (gmt 0)

DigitalOcean now brings you "NerdyBot":
104.131.0.0 - 104.131.255.255
104.131.0.0/16
DIGITALOCEAN-9

along with another Google-Cloud NerdyBot:
130.211.0.0 - 130.211.255.255
130.211.0.0/16
GOOGLE-CLOUD

wilderness

WebmasterWorld Senior Member Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4660805 posted 1:32 am on Sep 3, 2014 (gmt 0)

Nerdy uses HTTP/1.0

not2easy

WebmasterWorld Administrator 5+ Year Member Top Contributors Of The Month



 
Msg#: 4660805 posted 2:40 am on Sep 3, 2014 (gmt 0)

Looking through a list of 37 requests with the "NerdyBot" UA from 8 different CIDRs during the past month on one site, they are all using HTTP/1.1, so I would not count on them all being set up the same way.
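A tally like the one above can be produced with a few lines of Python. This is a sketch that assumes combined-format logs; `protocol_counts` is a name invented for the example, and the only sample line used is the entry from the first post.

```python
import re
from collections import Counter

# Tail of a combined-format line: "METHOD path PROTO" status bytes "referer" "agent"
LINE = re.compile(
    r'"[A-Z]+ \S+ (?P<proto>HTTP/\d\.\d)" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"$'
)

def protocol_counts(lines, agent="NerdyBot"):
    """Count HTTP protocol versions used by requests with the given UA."""
    counts = Counter()
    for line in lines:
        m = LINE.search(line.strip())
        if m and m["agent"] == agent:
            counts[m["proto"]] += 1
    return counts

first_post = ('162.222.176.2 - - [04/Apr/2014:10:05:50 -0600] '
              '"GET / HTTP/1.0" 200 8501 "-" "NerdyBot"')
print(dict(protocol_counts([first_post])))
```

Run over a full log, a mix of HTTP/1.0 and HTTP/1.1 in the result would confirm that the protocol version alone is not a reliable fingerprint.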

bobothecat2



 
Msg#: 4660805 posted 6:40 pm on Oct 3, 2014 (gmt 0)

Another range to add to the Google Cloud list:

104.154.0.0 - 104.155.255.255
104.154.0.0/15

dstiles

WebmasterWorld Senior Member Top Contributor of All Time 5+ Year Member



 
Msg#: 4660805 posted 10:03 pm on Nov 25, 2014 (gmt 0)

Looking through this month's security log of proxy accesses I came upon further abuse of the google proxy IP ranges by google itself (again!). I hadn't come across these two user-agents before...

Mozilla/5.0 (compatible; X11; Linux x86_64; Google-StructuredDataTestingTool; +http://www.google.com/webmasters/tools/richsnippets)

...which has no reason that I can see to visit any of the sites it did, and certainly not as often as it did, and...

Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20110814 Firefox/6.0 Google favicon

...which fetched favicons (including the home page) several times where I would have thought once should have been enough.

Someone posted a while ago (here or in google forum) that webpreview had been discontinued. Not according to this month's log, not by a long way. Another infraction of the proxy status.

IP ranges involved:
64.233.173.0 - 64.233.173.255
66.249.80.0 - 66.249.93.255

lucy24

WebmasterWorld Senior Member Top Contributor of All Time Top Contributors Of The Month



 
Msg#: 4660805 posted 8:14 am on Nov 26, 2014 (gmt 0)

"Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20110814 Firefox/6.0 Google favicon

...which fetched favicons (including the home page) several times where I would have thought once should have been enough."

Gosh. Have you really not met this before? For years the faviconbot traveled with no UA at all-- getting it an automatic lockout on most sites it visited, presumably including yours and mine. Then it changed to Firefox 6, a browser so ancient, even _I_ lock it out.

The faviconbot always requests the front page before the favicon itself. It has just this instant dawned on me that it isn't just doing this to be annoying: it needs to read one page in order to see whether the HTML includes an icon reference. The irony in my case is that I've poked a <Files> hole for the favicon, as it's one more way to identify wrongly blocked humans. So there's nothing to stop the robot from requesting and receiving example.com/favicon.ico even if I won't show it anything else.
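How Google's favicon fetcher works internally is not known here, but the general idea of reading a page to find its icon reference can be sketched with the Python standard library. The class and function names are invented for this example, and the fallback to /favicon.ico is an assumption about typical fetcher behaviour.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class IconFinder(HTMLParser):
    """Collect href values of <link> tags whose rel includes 'icon'."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if "icon" in rels and attr.get("href"):
            self.hrefs.append(attr["href"])

def favicon_url(page_url, html):
    """Return the icon URL declared in the page, else the default location."""
    finder = IconFinder()
    finder.feed(html)
    href = finder.hrefs[0] if finder.hrefs else "/favicon.ico"
    return urljoin(page_url, href)

page = '<html><head><link rel="shortcut icon" href="/img/site.ico"></head></html>'
print(favicon_url("http://example.com/", page))
print(favicon_url("http://example.com/fun/panda.html", "<html><head></head></html>"))
```

A page with no icon link falls through to the site-root /favicon.ico, which is why a robot that is refused the front page can still ask for that file directly.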

I see Google Preview sometimes in logs, but I'm ### if I can figure out where it's coming from, since I never see a Preview link in SERPs.

dstiles

WebmasterWorld Senior Member Top Contributor of All Time 5+ Year Member



 
Msg#: 4660805 posted 8:53 pm on Nov 26, 2014 (gmt 0)

I don't think I have seen it before, no. No doubt it was there but it escaped me. I don't log images (except as a normal site log) as I cannot (or at least could not) block their access in IIS/ASP.

I suspect the real reason it grabs the page first is that firefox (if it really is that) needs to. I noticed it was v6 as well. I sometimes wonder what world some of these people live in. :(

I forgot to mention the (current?) preview UA...

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko; Google Web Preview) Chrome/27.0.145

The referer claims google.com/search but it seems fixated on a handful of actual pages. I wonder if they've switched off the search display but forgotten to turn off the bot. Or (insidious thought!) they haven't forgotten and are just using it as an excuse to bypass robots.txt. But they wouldn't do that, would they? :)

lucy24

WebmasterWorld Senior Member Top Contributor of All Time Top Contributors Of The Month



 
Msg#: 4660805 posted 10:09 pm on Nov 26, 2014 (gmt 0)

The Google Preview UA is also used by some GWT functions. For example
66.249.85.40 - - [19/Feb/2014:04:47:18 -0800] "GET /boilerplate/legal.html HTTP/1.1" 200 4273 "http://www.google.com/search" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko; Google Web Preview) Chrome/27.0.1453 Safari/537.36"
69.228.abc.def - - [19/Feb/2014:04:47:18 -0800] "GET /boilerplate/legal.html HTTP/1.1" 200 4273 "-" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en; rv:1.9.2.28) Gecko/20120308 Camino/2.1.2 (like Firefox/3.6.28)"

(The second line is me; that's how I'm sure it was GWT.) This is a no-indexed page, so the "/search" in the referer is clearly bogus.

Hm, here's another one I never noticed (log wrangling auto-ignores any 403 responses):
66.249.85.40 - - [20/Jan/2014:18:42:31 -0800] "GET /silence/nagvaarniq/kajuaq.html HTTP/1.1" 403 1124 "http://www.google.com/search" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko; Google Web Preview) Chrome/27.0.1453 Safari/537.36"
That one's roboted-out and the linking text is something neutral like "The rest of the story" so there is, again, zero possibility that someone actually found it in search. I had to do some hunting to figure out why it got a 403-- but the mere fact that this UA asked for the page shows that I was right to keep the rule.

"I suspect the real reason it grabs the page first is that firefox (if it really is that) needs to."

But, but, it's not really FF is it? It's just a robot sending a UA string.

Angonasec

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4660805 posted 5:14 am on Nov 27, 2014 (gmt 0)

Shock quote/
Firefox 6, a browser so ancient, even _I_ lock it out.
/shockquote

Sent using PPC Mac FF 3.5.9 entering second decade and still towing barges :)
