Forum Moderators: Ocean10000 & incrediBILL

GoogleBot/2.1

Has anyone else seen this UA?

GaryK

2:59 pm on Jun 5, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



One of the persons who uses my browscap.ini file reported seeing the user agent "GoogleBot/2.1" (Note: That's the entire UA.) in their logs. The IP Address belongs to Google even though it's not in Dan's list. Is this something new or just something I've missed? Thanks.

volatilegx

6:09 pm on Jun 5, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



What's the IP, Gary?

Pfui

6:38 pm on Jun 5, 2006 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Going back awhile, all I ever show are these kinds of host+string combos in connection with "Googlebot/2.1" (from two different sites' logs):

2002

crawl4.googlebot.com - - [06/Mar/2002:05:52:29 -0800] "GET /robots.txt HTTP/1.0" 200 204 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
crawl4.googlebot.com - - [08/Apr/2002:12:04:55 -0700] "GET /robots.txt HTTP/1.0" 200 204 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
crawl5.googlebot.com - - [01/May/2002:18:25:27 -0700] "GET /robots.txt HTTP/1.0" 200 204 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"

2006

crawl-66-249-66-52.googlebot.com - - [01/Jun/2006:00:02:01 -0700] "GET /robots.txt HTTP/1.1" 200 9770 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
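If you want to pull host/UA pairs like these out of a raw access log for comparison, a short script is enough. The sketch below assumes the Apache "combined" log format with hostname lookups turned on, which is what the excerpts above look like; the function names are made up for illustration. It matches "googlebot" case-insensitively so variant spellings such as "GoogleBot/2.1" are caught too.

```python
import re

# Apache "combined" log format, as in the excerpts above:
# host ident authuser [time] "request" status bytes "referer" "user-agent"
COMBINED_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def parse_line(line):
    """Return the log fields as a dict, or None if the line doesn't match."""
    match = COMBINED_RE.match(line.strip())
    return match.groupdict() if match else None


def googlebot_claims(lines):
    """Yield (host, user agent) for entries whose UA mentions Googlebot in any case."""
    for line in lines:
        record = parse_line(line)
        if record and "googlebot" in record["agent"].lower():
            yield record["host"], record["agent"]
```

Feeding it a whole log file (for example `googlebot_claims(open("access_log"))`) lists every visitor that claims to be Googlebot, which makes odd host/UA combinations easy to spot.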

nancyb

8:06 pm on Jun 5, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I started seeing this one in early March coming from the IPs & dates listed below:

(no compatible; and no +http://www.google.com/bot.html, just Googlebot/2.1 as the entire UA)

66.249.65.65 5/31
66.249.66.101 4/26, 5/02
66.249.66.106 5/10 to 5/24
66.249.66.114 4/27
66.249.66.115 4/15 to 4/25
66.249.66.116 5/03 to 5/04
66.249.66.200 3/08, 3/31, 4/4 to 4/14
66.249.66.243 4/30, 5/01
66.249.66.73 4/28
66.249.66.75 5/05 to 5/09
66.249.66.99 5/25 to 5/30

Kept meaning to ask about them, too.

Pfui

8:38 pm on Jun 5, 2006 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



FWIW, picking a couple at random:

IP address: 66.249.65.65
Reverse DNS: crawl-66-249-65-65.googlebot.com

IP address: 66.249.66.101
Reverse DNS: crawl-66-249-66-101.googlebot.com

FWIW redux, we talked about something kind of related a couple of weeks ago, about how non-googlebot UAs are using G's IPs as proxies [webmasterworld.com]. I now block a slew of G's IPs because Googlebot never came through them, but iffy visitors did.
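The lookups above are the reverse half of a check that can be made stricter by resolving the returned hostname forward again. Anyone who controls the reverse DNS for their own IP block can make a PTR record say crawl-something.googlebot.com, but only Google controls what names under googlebot.com resolve to, so requiring the round trip to land back on the same IP is much harder to fake. A minimal Python sketch, assuming that genuine Googlebot always reverse-resolves under .googlebot.com (the pattern reported in this thread, not an official list); the function name and its parameters are illustrative.

```python
import socket


def is_googlebot_host(ip, suffixes=(".googlebot.com",),
                      reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check.

    1. Reverse-resolve the IP (PTR lookup) and require a hostname ending in
       one of `suffixes` (an assumption based on the logs in this thread).
    2. Forward-resolve that hostname and require the original IP among its
       addresses, so a PTR record alone can't vouch for itself.
    The resolver functions are parameters only so the check can be tested offline.
    """
    try:
        hostname, _aliases, _addrs = reverse(ip)
    except OSError:  # socket.herror / socket.gaierror are OSError subclasses
        return False
    if not hostname.lower().rstrip(".").endswith(suffixes):
        return False
    try:
        _name, _aliases, addresses = forward(hostname)
    except OSError:
        return False
    return ip in addresses
```

Calling `is_googlebot_host("66.249.65.65")` does live DNS lookups, so the answer depends on how Google's DNS is set up when you run it; caching results per IP avoids resolving on every hit.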

GaryK

8:59 pm on Jun 5, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Dan, the IP address was 66.249.66.114.

Pfui, I'll check that thread out. Thanks.

Thanks everyone else.

nancyb

9:04 pm on Jun 5, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks Pfui. I knew they all came from Google IPs and were crawlers. My question is the same as GaryK's (I think), why the incomplete UA for Googlebot?

Or do you mean that these are not actually visits by a crawler, but by someone using G's IP as a proxy?

larryhatch

9:07 pm on Jun 5, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Whois comes back to Google for that IP #; it's within the NetRange: 66.249.64.0 - 66.249.95.255.
It's not spoofing.
I can't connect to it directly, so it's probably a spider, maybe a special one. -Larry
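The NetRange Larry quotes can be turned into CIDR notation and checked with Python's standard ipaddress module. Note that this only answers "does Google own this address?", which is the WHOIS question, not "is this Googlebot?"; an address inside the range can still be a Google-hosted proxy, which is the concern raised elsewhere in this thread. The helper names are hypothetical.

```python
import ipaddress


def netrange_to_networks(first, last):
    """Convert a WHOIS 'NetRange: first - last' pair into a list of CIDR blocks."""
    return list(ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last)))


# NetRange from the WHOIS record quoted above.
GOOGLE_BLOCKS = netrange_to_networks("66.249.64.0", "66.249.95.255")


def in_google_block(ip, blocks=GOOGLE_BLOCKS):
    """True if the address falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in blocks)
```

The range works out to the single block 66.249.64.0/19, and every address in nancyb's list falls inside it.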

GaryK

9:09 pm on Jun 5, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Nancy, yes that's the essence of my question. I just need to know if this is a legitimate crawler from Google.

Pfui

10:35 pm on Jun 5, 2006 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



It wouldn't surprise me at all if someone was spoofing Googlebot through G's IPs. In four years, I've never seen Googlebot come in from other than .googlebot.com and in recent months I've seen all too many NON Google UAs come in through G's IPs, and go where Googlebot is not allowed to go.

Also, in addition to the missing 'parts' of the UA Gary reported, note the incorrect capitalization --

GoogleBot/2.1

-- and here's the typical form:

Googlebot/2.1

Now, might "GoogleBot" be a beta version, or even a brand-new one? I guess. So I'd assess it by the usual checks -- except it fails the Googlebot ID test, ditto the googlebot.com Host test. So, let's see.

Did it ask for robots.txt? Did it heed it?

Absent more info, or confirmation from G one way or another, I'm betting it's a fake, using G's IP(s) as a proxy.
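The capitalization point can be checked mechanically. The sketch below compares a UA against the two full Googlebot strings quoted earlier in this thread (an assumption: that set is only what posters here saw in their logs, not an official registry) and otherwise labels how it deviates. A UA string is trivially forged, so a result of "known" only means the string looks right; it says nothing about where the request came from.

```python
# The two full Googlebot UA strings quoted in this thread (not an official list).
KNOWN_GOOGLEBOT_UAS = {
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Googlebot/2.1 (+http://www.googlebot.com/bot.html)",
}


def classify_ua(agent):
    """Label a user agent relative to the Googlebot strings seen in real logs."""
    if agent in KNOWN_GOOGLEBOT_UAS:
        return "known"
    if "googlebot" not in agent.lower():
        return "other"  # doesn't claim to be Googlebot at all
    if "Googlebot/" not in agent:
        return "miscapitalized"  # e.g. "GoogleBot/2.1"
    return "nonstandard"  # right spelling, but not one of the full strings
```

Run against the two bare strings from this thread, "GoogleBot/2.1" comes back as "miscapitalized" and nancyb's "Googlebot/2.1" as "nonstandard".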

Pfui

5:40 am on Jun 6, 2006 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Gary, nancy, do you two 'do' Google ads? Because here's some more info [webmasterworld.com] from a current thread in Google Search News [webmasterworld.com]. The poster describes your same situ, says it's AdWords.

Then again, there's the new, dedicated AdsBot-Google [webmasterworld.com] for AdWords, so beats me whether, or if, "GoogleBot/2.1" is also AdWords-related?

Well, that sure clears things up -- clear as mud, that is:) Sorry!

volatilegx

3:44 pm on Jun 6, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> Absent more info, or confirmation from G one way or another, I'm betting it's a fake, using G's IP(s) as a proxy.

Seems likely to me, too.

GaryK

4:30 pm on Jun 6, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Gary, nancy, do you two 'do' Google ads?

I tried AdSense for a few months last year and then stopped using it.

So, in the absence of anything official from Google I guess we're considering this user agent a faker?

fiestagirl

5:03 pm on Jun 6, 2006 (gmt 0)

10+ Year Member



The robot with this UA "Googlebot/2.1", coming from 66.249.72.* visits only the pages I'm advertising in Adwords. It does check robots.txt.

I haven't put up any ads in June, so I have no data on the editor's hand-checking. Prior to June the editor would visit from 66.102.6.*

nancyb

5:39 pm on Jun 6, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



My post last night evidently got lost in cyberspace ...

Gary, nancy, do you two 'do' Google ads?

No, never ads of any kind.

Also, "my" googlebot is Googlebot/2.1; notice that "bot" is in the correct form, without capitalization.

GaryK

6:01 pm on Jun 6, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Is there some way we can get Google to confirm if either of these two bots is legitimate?

malachite

10:21 am on Jun 7, 2006 (gmt 0)

5+ Year Member



I haven't seen this one on my sites yet, but would add it's not totally implausible that it's something to do with Adwords.

I don't do Adwords, but I do have Adsense, and the bot for that is Mediapartners-Google/2.1, which then works its way through with the UA "compatible; Googlebot/2.1; +http://www.google.com/bot.html". Coincidence? No idea ;)

incrediBILL

10:22 pm on Jun 7, 2006 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Determining if it's Google isn't hard.

All you have to do to verify it's Google is do a WHOIS <ip> and it will typically show "OrgName: Google Inc." if it's really them.

Matter of fact, I bounce everything claiming to be Google not hosted on a Google block of IPs, just send it packing.

Why you might ask?

Because you'll find Google actually crawling through proxy servers that cloak directories of websites to Google. Google then crawls your site through the proxy, and your pages can be hijacked this way because the SEs are stupid.

This is another reason I block proxies too, so when SEs can't crawl through them it's no problem.

volatilegx

11:04 pm on Jun 7, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



All you have to do to verify it's Google is do a WHOIS <ip> and it will typically show "OrgName: Google Inc." if it's really them.

Yeah but the point is that we think people are spoofing a user agent and routing through a google-owned proxy.

nancyb

11:25 pm on Jun 7, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks, volatilegx, for that clarification.

Can you explain if there is a way to determine if someone is spoofing through a google-owned proxy?

BTW, I don't do AdSense or AdWords, and I get many visits every day from the Mediapartners bot.

incrediBILL

11:30 pm on Jun 7, 2006 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Yeah but the point is that we think people are spoofing a user agent and routing through a google-owned proxy

If Google permits that, then they get what they deserve; there's not a lot we can do about it unless they cough up a definitive list of crawler IPs.

However, even if they were, most people don't use NOARCHIVE, and the search engine caches your pages anyway, so the content can be had there as well.

GaryK

11:31 pm on Jun 7, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Nancy, the way I do it is via Dan's list of IP Addresses. Basically if a user agent claims to be from Google and it's not on Dan's list I serve it a robots.txt file that disallows everything. If it's not really from Google but ignores robots.txt and starts crawling it will quickly fall into a spider trap that lets me know about it so I can investigate further.
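The approach described above can be sketched roughly as follows. "Dan's list" isn't reproduced in the thread, so the allowlist here is a hypothetical stand-in you would fill from whatever list you trust, and /trap/ is a made-up trap path that the normal robots.txt disallows and that only a robot ignoring robots.txt would follow (for example via a hidden link). This is a framework-free sketch of the decision logic, not GaryK's actual code.

```python
import ipaddress

DISALLOW_ALL = "User-agent: *\nDisallow: /\n"


class GoogleGate:
    """UA claims Google but IP isn't on the allowlist -> disallow-all robots.txt.
    Any request for the trap path flags the IP; flagged IPs get 403 afterwards."""

    def __init__(self, allowlist, normal_robots, trap_path="/trap/"):
        # allowlist: CIDR strings such as "66.249.66.52/32" (hypothetical contents)
        self.allowlist = [ipaddress.ip_network(net) for net in allowlist]
        self.normal_robots = normal_robots  # should itself disallow trap_path
        self.trap_path = trap_path
        self.flagged = set()

    def claims_google(self, agent):
        return "googlebot" in agent.lower()

    def is_listed(self, ip):
        addr = ipaddress.ip_address(ip)
        return any(addr in net for net in self.allowlist)

    def robots_txt(self, ip, agent):
        """Choose which robots.txt body to serve this visitor."""
        if self.claims_google(agent) and not self.is_listed(ip):
            return DISALLOW_ALL
        return self.normal_robots

    def handle(self, ip, path):
        """Return the HTTP status to send for an ordinary page request."""
        if path.startswith(self.trap_path):
            self.flagged.add(ip)  # a real setup would also log or alert here
        return 403 if ip in self.flagged else 200
```

A legitimate crawler that obeys robots.txt never requests /trap/, so anything that lands in `flagged` has, at minimum, ignored the rules it was served.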

nancyb

1:19 am on Jun 8, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks GaryK, I know about Dan's list, but I just don't have the time to learn how to create a spider trap (I don't know a thing about PHP). I was hoping there was some other way to determine whether it was someone using a proxy.

Of course ... that's why you all have spider traps, right? :)

GaryK

1:31 am on Jun 8, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That's part of the reason, Nancy. Another reason is to stop abusive spiders from bringing a website to a screeching halt. For example, if I see a user agent taking more pages at a time than anyone could possibly read, or even skim, I will stop them dead in their tracks so my users don't have to suffer from a slow or non-responsive server.
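The "more pages at a time than anyone could possibly read" rule can be expressed as a sliding-window counter per IP. A minimal sketch; the default thresholds (30 pages per 60 seconds) are arbitrary placeholders, not figures from the thread, and the caller passes the current time (for example from time.monotonic()) so the logic is easy to test.

```python
from collections import defaultdict, deque


class PageRateLimiter:
    """Block an IP once it requests more than max_pages within window_seconds."""

    def __init__(self, max_pages=30, window_seconds=60.0):
        self.max_pages = max_pages
        self.window = window_seconds
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests
        self.blocked = set()

    def allow(self, ip, now):
        """Record a request at time `now` (seconds); return False if it should be refused."""
        if ip in self.blocked:
            return False
        recent = self.hits[ip]
        recent.append(now)
        # Drop timestamps that have slid out of the window.
        while recent and now - recent[0] > self.window:
            recent.popleft()
        if len(recent) > self.max_pages:
            self.blocked.add(ip)  # stays blocked until cleared by hand
            return False
        return True
```

Blocking permanently rather than just throttling matches "stop them dead in their tracks"; a gentler variant would simply return False for that one request and let the window drain.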

I don't do PHP either. Most of my code is compiled into .dll files written in C++ and VB.NET.

wilderness

1:36 am on Jun 8, 2006 (gmt 0)

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



I was hoping there was some other way to determine if it was someone using a proxy

[dsbl.org...]
[atgi.net...]

or Google provides
[google.com...]

SamSpade.org
keeps track of many items, mostly for mail-spam purposes.

 
