
Forum Moderators: Ocean10000 & incrediBILL & keyplyr


GoogleBot/2.1

Has anyone else seen this UA?

     
2:59 pm on Jun 5, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 17, 2002
posts:2251
votes: 0


One of the people who use my browscap.ini file reported seeing the user agent "GoogleBot/2.1" (note: that's the entire UA) in their logs. The IP address belongs to Google even though it's not in Dan's list. Is this something new, or just something I've missed? Thanks.

6:09 pm on June 5, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 22, 2001
posts:2450
votes: 0


What's the IP, Gary?
6:38 pm on June 5, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts:2038
votes: 1


Going back a while, all I ever show are these kinds of host+string combos in connection with "Googlebot/2.1" (from two different sites' logs):

2002

crawl4.googlebot.com - - [06/Mar/2002:05:52:29 -0800] "GET /robots.txt HTTP/1.0" 200 204 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
crawl4.googlebot.com - - [08/Apr/2002:12:04:55 -0700] "GET /robots.txt HTTP/1.0" 200 204 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
crawl5.googlebot.com - - [01/May/2002:18:25:27 -0700] "GET /robots.txt HTTP/1.0" 200 204 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"

2006

crawl-66-249-66-52.googlebot.com - - [01/Jun/2006:00:02:01 -0700] "GET /robots.txt HTTP/1.1" 200 9770 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
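For anyone who wants to pull these host+UA combos out of their own logs instead of eyeballing them, here's a minimal Python sketch. It assumes Apache's "combined" log format, which the lines above appear to use; other log formats would need a different pattern.

```python
import re
from typing import NamedTuple, Optional

# Apache "combined" format: host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


class Hit(NamedTuple):
    host: str
    time: str
    request: str
    status: int
    agent: str


def parse_line(line: str) -> Optional[Hit]:
    """Return the fields of one combined-format log line, or None if it doesn't match."""
    m = COMBINED_RE.match(line)
    if m is None:
        return None
    return Hit(m["host"], m["time"], m["request"], int(m["status"]), m["agent"])
```

Fed the 2006 line above, parse_line returns host crawl-66-249-66-52.googlebot.com and the full "Mozilla/5.0 (compatible; Googlebot/2.1; ...)" UA, so the host and UA can be compared side by side.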

8:06 pm on June 5, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 18, 2001
posts:889
votes: 0


I started seeing this one in early March coming from the IPs & dates listed below:

(no "compatible;" and no "+http://www.google.com/bot.html", just "Googlebot/2.1" as the entire UA)

66.249.65.65 5/31
66.249.66.101 4/26, 5/02
66.249.66.106 5/10 to 5/24
66.249.66.114 4/27
66.249.66.115 4/15 to 4/25
66.249.66.116 5/03 to 5/04
66.249.66.200 3/08, 3/31, 4/4 to 4/14
66.249.66.243 4/30, 5/01
66.249.66.73 4/28
66.249.66.75 5/05 to 5/09
66.249.66.99 5/25 to 5/30

Kept meaning to ask about them, too.

8:38 pm on June 5, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts:2038
votes: 1


FWIW, picking a couple at random:

IP address: 66.249.65.65
Reverse DNS: crawl-66-249-65-65.googlebot.com

IP address: 66.249.66.101
Reverse DNS: crawl-66-249-66-101.googlebot.com
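The manual reverse-DNS check above can be scripted. Below is a minimal Python sketch of forward-confirmed reverse DNS: look up the host name for the IP, require that it ends in .googlebot.com, then resolve that name forward and require it to return the same IP. The suffix list is an assumption based only on the host names seen in this thread, and the resolvers are parameters so the logic can be tested without network access.

```python
import socket
from typing import Callable, List, Tuple

# Same shape as socket.gethostbyaddr / socket.gethostbyname_ex: (name, aliases, addresses)
Lookup = Callable[[str], Tuple[str, List[str], List[str]]]

# Assumption: only the suffix seen in the logs quoted in this thread.
GOOGLEBOT_SUFFIXES = (".googlebot.com",)


def is_forward_confirmed_googlebot(
    ip: str,
    reverse: Lookup = socket.gethostbyaddr,
    forward: Lookup = socket.gethostbyname_ex,
) -> bool:
    """Reverse-resolve ip, require a googlebot.com host name, then resolve
    that name forward and require it to map back to the same ip."""
    try:
        host, _aliases, _addrs = reverse(ip)
    except OSError:  # socket.herror and socket.gaierror both subclass OSError
        return False
    if not host.lower().endswith(GOOGLEBOT_SUFFIXES):
        return False
    try:
        _name, _aliases, addrs = forward(host)
    except OSError:
        return False
    return ip in addrs
```

The forward step matters because whoever controls an IP block's reverse DNS can put any name in a PTR record; only a name that resolves back to the same IP ties the two together.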

FWIW redux, we talked about something kind of related a couple of weeks ago: how non-Googlebot UAs are using G's IPs as proxies [webmasterworld.com]. I now block a slew of G's IPs because Googlebot never came through them, but iffy visitors did.

8:59 pm on June 5, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 17, 2002
posts:2251
votes: 0


Dan, the IP address was 66.249.66.114.

Pfui, I'll check that thread out. Thanks.

Thanks everyone else.

9:04 pm on June 5, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 18, 2001
posts:889
votes: 0


Thanks Pfui. I knew they all came from Google IPs and were crawlers. My question is the same as GaryK's (I think): why the incomplete UA for Googlebot?

Or do you mean that these are not actually visits by a crawler, but someone using G's IP as a proxy?

9:07 pm on June 5, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 13, 2004
posts:1425
votes: 0


Whois comes back to Google for that IP #; it's within the NetRange 66.249.64.0 - 66.249.95.255.
It's not spoofing.
Can't connect directly, so probably a spider, maybe a special one. -Larry

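Checking an IP against the NetRange quoted above is easy with Python's standard ipaddress module. A minimal sketch, assuming that single range is the only one you care about; keep in mind that WHOIS ownership only says the block is Google's, not which Google service (or proxy user) sent the request.

```python
import ipaddress

# The WHOIS NetRange quoted above: 66.249.64.0 - 66.249.95.255 (works out to one /19)
GOOGLE_NETS = list(ipaddress.summarize_address_range(
    ipaddress.IPv4Address("66.249.64.0"),
    ipaddress.IPv4Address("66.249.95.255"),
))


def in_google_netrange(ip: str) -> bool:
    """True if ip falls inside the NetRange above (IPv6 addresses never match)."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in GOOGLE_NETS)
```

With this, in_google_netrange("66.249.66.114") is True and in_google_netrange("66.102.6.1") is False.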
9:09 pm on June 5, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 17, 2002
posts:2251
votes: 0


Nancy, yes that's the essence of my question. I just need to know if this is a legitimate crawler from Google.
10:35 pm on June 5, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts:2038
votes: 1


It wouldn't surprise me at all if someone was spoofing Googlebot through G's IPs. In four years, I've never seen Googlebot come in from anything other than .googlebot.com, and in recent months I've seen all too many non-Google UAs come in through G's IPs and go where Googlebot is not allowed to go.

Also, in addition to the missing 'parts' of the UA Gary reported, note the incorrect capitalization --

GoogleBot/2.1

-- and here's the typical form:

Googlebot/2.1
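The capitalization difference is easy to miss by eye. Here's a minimal Python sketch of a case-sensitive check; the two "known" strings are simply the full UAs quoted earlier in this thread, not an official list from Google, and the labels are made up for illustration.

```python
# Full UAs quoted earlier in this thread; labels are informal, not Google's names.
KNOWN_FORMS = {
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)": "current Googlebot",
    "Googlebot/2.1 (+http://www.googlebot.com/bot.html)": "older Googlebot (2002 logs)",
}


def classify_ua(ua: str) -> str:
    """Label a user agent string by how (and whether) it claims to be Googlebot."""
    if ua in KNOWN_FORMS:
        return KNOWN_FORMS[ua]
    if "googlebot" not in ua.lower():
        return "not claiming Googlebot"
    if "Googlebot/" not in ua:  # case-sensitive: catches "GoogleBot/2.1"
        return "wrong capitalization"
    return "unrecognized Googlebot variant"  # e.g. a bare "Googlebot/2.1"
```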

Now, might "GoogleBot" be a beta version, or even a brand-new one? I guess. So I'd assess it by the usual checks -- except it fails the Googlebot ID test, ditto the googlebot.com Host test. So, let's see.

Did it ask for robots.txt? Did it heed it?
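Both questions can be answered from the logs. A minimal Python sketch using the standard urllib.robotparser module; it assumes you've already grouped one visitor's requested paths in order and have the robots.txt text that was being served at the time.

```python
from typing import Iterable, List, Tuple
from urllib.robotparser import RobotFileParser


def robots_compliance(
    robots_txt: str, user_agent: str, paths: Iterable[str]
) -> Tuple[bool, List[str]]:
    """Report whether a visitor asked for /robots.txt at all, and which of
    its other requests that robots.txt disallowed for its user agent."""
    paths = list(paths)
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    asked_for_robots = "/robots.txt" in paths
    violations = [
        p for p in paths
        if p != "/robots.txt" and not parser.can_fetch(user_agent, p)
    ]
    return asked_for_robots, violations
```

An empty violations list alongside asked_for_robots == True is what a well-behaved crawler should produce.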

Absent more info, or confirmation from G one way or another, I'm betting it's a fake, using G's IP(s) as a proxy.

5:40 am on June 6, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts:2038
votes: 1


Gary, nancy, do you two 'do' Google ads? Because here's some more info [webmasterworld.com] from a current thread in Google Search News [webmasterworld.com]. The poster describes your same situ, says it's AdWords.

Then again, there's the new, dedicated AdsBot-Google [webmasterworld.com] for AdWords, so beats me whether "GoogleBot/2.1" is also AdWords-related.

Well, that sure clears things up -- clear as mud, that is:) Sorry!

3:44 pm on June 6, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 22, 2001
posts:2450
votes: 0


> Absent more info, or confirmation from G one way or another, I'm betting it's a fake, using G's IP(s) as a proxy.

Seems likely to me, too.

4:30 pm on June 6, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 17, 2002
posts:2251
votes: 0


> Gary, nancy, do you two 'do' Google ads?

I tried AdSense for a few months last year and then stopped using it.

So, in the absence of anything official from Google I guess we're considering this user agent a faker?

5:03 pm on June 6, 2006 (gmt 0)

Preferred Member

10+ Year Member

joined:May 14, 2002
posts:378
votes: 0


The robot with the UA "Googlebot/2.1", coming from 66.249.72.*, visits only the pages I'm advertising in AdWords. It does check robots.txt.

I haven't put up any ads in June, so I have no data on the editors' hand-checking. Prior to June, the editor would visit from 66.102.6.*

5:39 pm on June 6, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 18, 2001
posts:889
votes: 0


My post last night evidently got lost in cyberspace ...

> Gary, nancy, do you two 'do' Google ads?

No, never ads of any kind.

Also, "my" Googlebot is "Googlebot/2.1"; note that "bot" is in the correct form, without capitalization.

6:01 pm on June 6, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 17, 2002
posts:2251
votes: 0


Is there some way we can get Google to confirm if either of these two bots is legitimate?
10:21 am on June 7, 2006 (gmt 0)

Preferred Member

10+ Year Member

joined:Nov 2, 2005
posts:529
votes: 0


I haven't seen this one on my sites yet, but would add it's not totally implausible that it's something to do with Adwords.

I don't do Adwords, but I do have Adsense, and the bot for that is Mediapartners-Google/2.1, which then works its way through with the UA "compatible; Googlebot/2.1; +http://www.google.com/bot.html". Coincidence? No idea ;)

10:22 pm on June 7, 2006 (gmt 0)

Administrator from US 

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14664
votes: 99


Determining if it's Google isn't hard.

All you have to do to verify it's Google is do a WHOIS <ip> and it will typically show "OrgName: Google Inc." if it's really them.
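For scripting that WHOIS check, here's a minimal Python sketch of a plain WHOIS query (RFC 3912: one query line sent over TCP port 43) plus a parser for the "OrgName:" lines. Using whois.arin.net as the default server is an assumption that fits 66.x addresses; for space ARIN doesn't hold, the response may just be a referral to another registry.

```python
import socket
from typing import List


def whois_query(ip: str, server: str = "whois.arin.net", timeout: float = 10.0) -> str:
    """Send one WHOIS query line (CRLF-terminated) on port 43 and read until close."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(ip.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def org_names(whois_text: str) -> List[str]:
    """Collect every 'OrgName:' value from a WHOIS response."""
    names = []
    for line in whois_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "orgname":
            names.append(value.strip())
    return names
```

Something like "Google Inc." in org_names(whois_query(ip)) then mirrors the manual check, with the same caveat raised below: it tells you who owns the IP, not who is using it.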

Matter of fact, I bounce everything claiming to be Google not hosted on a Google block of IPs, just send it packing.

Why you might ask?

Because you'll find Google actually crawling through proxy servers that cloak directories of websites to Google. Google then crawls your site through the proxy, and pages can be hijacked this way because the SEs are stupid.

This is another reason I block proxies too: when the SEs can't crawl through them, it's no problem.

11:04 pm on June 7, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 22, 2001
posts:2450
votes: 0


> All you have to do to verify it's Google is do a WHOIS <ip> and it will typically show "OrgName: Google Inc." if it's really them.

Yeah but the point is that we think people are spoofing a user agent and routing through a google-owned proxy.

11:25 pm on June 7, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 18, 2001
posts:889
votes: 0


Thanks, volatilegx, for that clarification.

Can you explain if there is a way to determine if someone is spoofing through a google-owned proxy?

BTW, I don't do AdSense or AdWords, and I get many visits every day from the Mediapartners bot.

11:30 pm on June 7, 2006 (gmt 0)

Administrator from US 

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14664
votes: 99


> Yeah but the point is that we think people are spoofing a user agent and routing through a google-owned proxy

If Google permits that, then they get what they deserve, not a lot we can do about it unless they cough up a definitive list of crawler IPs.

However, even if they were, most people don't use NOARCHIVE, so Google caches your pages anyway and the content can be had there as well.

11:31 pm on June 7, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 17, 2002
posts:2251
votes: 0


Nancy, the way I do it is via Dan's list of IP addresses. Basically, if a user agent claims to be from Google and it's not on Dan's list, I serve it a robots.txt file that disallows everything. If it's not really from Google but ignores robots.txt and starts crawling, it will quickly fall into a spider trap that lets me know about it so I can investigate further.

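That approach can be sketched roughly as follows. This is a minimal Python sketch, not GaryK's actual code: Dan's list isn't reproduced here, so verified_nets is a hypothetical parameter, and the /trap/ path and both robots.txt bodies are hypothetical too. The trap path is disallowed in the normal robots.txt and would be linked somewhere people won't follow it, so only a crawler ignoring robots.txt should ever request it.

```python
import ipaddress
from typing import Iterable, Set

# Hypothetical trap path and robots.txt bodies, for illustration only.
TRAP_PREFIX = "/trap/"
NORMAL_ROBOTS = "User-agent: *\nDisallow: /trap/\n"
DISALLOW_ALL = "User-agent: *\nDisallow: /\n"


def claims_google(user_agent: str) -> bool:
    """Loose test: does the UA say it's from Google, in any capitalization?"""
    return "google" in user_agent.lower()


def robots_for(ip: str, user_agent: str,
               verified_nets: Iterable[ipaddress.IPv4Network]) -> str:
    """Serve a disallow-everything robots.txt to Google claimants not on the list."""
    addr = ipaddress.ip_address(ip)
    if claims_google(user_agent) and not any(addr in net for net in verified_nets):
        return DISALLOW_ALL
    return NORMAL_ROBOTS


class SpiderTrap:
    """Remember every client that has requested a URL under TRAP_PREFIX."""

    def __init__(self) -> None:
        self.flagged: Set[str] = set()

    def is_trapped(self, ip: str, path: str) -> bool:
        if path.startswith(TRAP_PREFIX):
            self.flagged.add(ip)
        return ip in self.flagged
```

Once an IP has hit the trap, is_trapped keeps returning True for it, so later requests can be refused or logged for a closer look.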
1:19 am on June 8, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:July 18, 2001
posts:889
votes: 0


Thanks GaryK. I know about Dan's list, but I just don't have the time to learn how to create a spider trap (I don't know a thing about PHP). I was hoping there was some other way to determine if it was someone using a proxy.

Of course ... that's why you all have spider traps, right? :)

1:31 am on June 8, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 17, 2002
posts:2251
votes: 0


That's part of the reason Nancy. Another reason is to stop abusive spiders from bringing a website to a screeching halt. For example, if I see a user agent taking more pages at a time than anyone could possibly read, or even skim, I will stop them dead in their tracks so my users don't have to suffer from a slow or non-responsive server.
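Here's a minimal Python sketch of that kind of throttle: a per-client sliding window that counts requests. The thresholds are hypothetical, since no specific numbers are given here, and keying on IP address is an assumption.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class PageRateLimiter:
    """Flag a client that requests more than max_pages within window_seconds."""

    def __init__(self, max_pages: int, window_seconds: float) -> None:
        self.max_pages = max_pages
        self.window = window_seconds
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client: str, now: float) -> bool:
        """Record a request at time `now`; return False while the client is over the limit."""
        q = self.hits[client]
        # Drop timestamps that have slid out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        # Refused requests are recorded too, so a client that keeps hammering stays blocked.
        q.append(now)
        return len(q) <= self.max_pages
```

For example, with PageRateLimiter(3, 10.0), requests from one IP at t = 0, 1, 2, 3 seconds give True, True, True, False; the same IP is allowed again at t = 12.5 once the early hits have aged out of the window.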

I don't do PHP either. Most of my code is compiled into .dll files written in C++ and VB.NET.

1:36 am on June 8, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5460
votes: 3


> I was hoping there was some other way to determine if it was someone using a proxy

[dsbl.org...]
[atgi.net...]

or Google provides
[google.com...]

SamSpade.org keeps track of many items, most of them for reasons related to mail spam.