

Search Engine Spider and User Agent Identification Forum

    
GoogleBot/2.1
Has anyone else seen this UA?
GaryK




msg:406882
 2:59 pm on Jun 5, 2006 (gmt 0)

One of the people who uses my browscap.ini file reported seeing the user agent "GoogleBot/2.1" (note: that's the entire UA) in their logs. The IP address belongs to Google even though it's not in Dan's list. Is this something new or just something I've missed? Thanks.

 

volatilegx




msg:406883
 6:09 pm on Jun 5, 2006 (gmt 0)

What's the IP, Gary?

Pfui




msg:406884
 6:38 pm on Jun 5, 2006 (gmt 0)

Going back a while, all I ever show are these kinds of host+string combos in connection with "Googlebot/2.1" (from two different sites' logs):

2002

crawl4.googlebot.com - - [06/Mar/2002:05:52:29 -0800] "GET /robots.txt HTTP/1.0" 200 204 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
crawl4.googlebot.com - - [08/Apr/2002:12:04:55 -0700] "GET /robots.txt HTTP/1.0" 200 204 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
crawl5.googlebot.com - - [01/May/2002:18:25:27 -0700] "GET /robots.txt HTTP/1.0" 200 204 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"

2006

crawl-66-249-66-52.googlebot.com - - [01/Jun/2006:00:02:01 -0700] "GET /robots.txt HTTP/1.1" 200 9770 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
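For anyone who wants to pull the same host+string combos out of their own logs, here is a minimal sketch. It assumes lines shaped like the ones quoted above (host, two dash fields, timestamp, request, status, size, referer, user agent); the function names are made up for illustration and other log formats will need a different pattern.

```python
import re

# Pattern for log lines shaped like the samples quoted above:
# host ident user [time] "request" status size "referer" "user agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def parse_log_line(line):
    """Return a dict of the named fields, or None if the line does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def host_and_agent(line):
    """Return (host, user_agent) for a matching line, else None."""
    fields = parse_log_line(line)
    if fields is None:
        return None
    return fields["host"], fields["agent"]
```

Running `host_and_agent` over a log file and counting the distinct pairs gives a quick view of which hosts send which Googlebot-style strings.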

nancyb




msg:406885
 8:06 pm on Jun 5, 2006 (gmt 0)

I started seeing this one in early March coming from the IPs & dates listed below:

(no compatible; and no +http://www.google.com/bot.html, just Googlebot/2.1 as the entire UA)

66.249.65.65 5/31
66.249.66.101 4/26, 5/02
66.249.66.106 5/10 to 5/24
66.249.66.114 4/27
66.249.66.115 4/15 to 4/25
66.249.66.116 5/03 to 5/04
66.249.66.200 3/08, 3/31, 4/4 to 4/14
66.249.66.243 4/30, 5/01
66.249.66.73 4/28
66.249.66.75 5/05 to 5/09
66.249.66.99 5/25 to 5/30

Kept meaning to ask about them, too.

Pfui




msg:406886
 8:38 pm on Jun 5, 2006 (gmt 0)

FWIW, picking a couple at random:

IP address: 66.249.65.65
Reverse DNS: crawl-66-249-65-65.googlebot.com

IP address: 66.249.66.101
Reverse DNS: crawl-66-249-66-101.googlebot.com

FWIW redux, we talked about something kind of related a couple of weeks ago, about how non-googlebot UAs are using G's IPs as proxies [webmasterworld.com]. I now block a slew of G's IPs because Googlebot never came through them, but iffy visitors did.
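The reverse DNS lookups above can be scripted. The sketch below is one possible approach, not anything Google documents in this thread: it checks that the reverse name ends in `.googlebot.com` and then, as an added assumption, confirms that the name resolves back to the same IP. The function names are hypothetical, and the lookups are injectable so the logic can be exercised without a network.

```python
import socket


def hostname_is_googlebot(hostname):
    """True if the name sits under googlebot.com (suffix taken from the posts above)."""
    host = hostname.rstrip(".").lower()
    return host.endswith(".googlebot.com")


def verify_crawler_ip(ip, reverse_lookup=None, forward_lookup=None):
    """Reverse-resolve ip, check the suffix, then forward-resolve to confirm.

    reverse_lookup(ip) -> hostname and forward_lookup(hostname) -> list of IPs
    default to the standard socket calls; pass stand-ins for testing.
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda name: socket.gethostbyname_ex(name)[2]
    try:
        hostname = reverse_lookup(ip)
    except OSError:  # no PTR record or resolver failure
        return False
    if not hostname_is_googlebot(hostname):
        return False
    try:
        return ip in forward_lookup(hostname)
    except OSError:
        return False
```

Note that this only says something about the IP. It says nothing about who is sending the traffic through it, which is the proxy concern raised in this thread.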

GaryK




msg:406887
 8:59 pm on Jun 5, 2006 (gmt 0)

Dan, the IP address was 66.249.66.114.

Pfui, I'll check that thread out. Thanks.

Thanks everyone else.

nancyb




msg:406888
 9:04 pm on Jun 5, 2006 (gmt 0)

Thanks Pfui. I knew they all came from Google IPs and were crawlers. My question is the same as GaryK's (I think), why the incomplete UA for Googlebot?

Or do you mean that these are not actually visits by a crawler but someone using G's IP as a proxy?

larryhatch




msg:406889
 9:07 pm on Jun 5, 2006 (gmt 0)

Whois comes back to Google for that IP #; it's within the NetRange: 66.249.64.0 - 66.249.95.255.
It's not spoofing.
Can't connect directly, so probably a spider, maybe a special one. -Larry
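Checking an address against the NetRange quoted above does not need a WHOIS call each time. A small sketch using Python's standard `ipaddress` module follows; the range endpoints come from the post, while the function names are hypothetical.

```python
import ipaddress

# Endpoints of the NetRange quoted from WHOIS in the post above.
RANGE_START = ipaddress.IPv4Address("66.249.64.0")
RANGE_END = ipaddress.IPv4Address("66.249.95.255")


def in_quoted_netrange(ip):
    """True if ip falls inside 66.249.64.0 - 66.249.95.255."""
    return RANGE_START <= ipaddress.IPv4Address(ip) <= RANGE_END


def netrange_as_cidr():
    """Express the same range as CIDR blocks, e.g. for a firewall or allow list."""
    return [
        str(net)
        for net in ipaddress.summarize_address_range(RANGE_START, RANGE_END)
    ]
```

`netrange_as_cidr()` collapses the range to the single block `66.249.64.0/19`. The range reflects what WHOIS returned at the time of the post and may not match Google's current allocations.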

GaryK




msg:406890
 9:09 pm on Jun 5, 2006 (gmt 0)

Nancy, yes that's the essence of my question. I just need to know if this is a legitimate crawler from Google.

Pfui




msg:406891
 10:35 pm on Jun 5, 2006 (gmt 0)

It wouldn't surprise me at all if someone was spoofing Googlebot through G's IPs. In four years, I've never seen Googlebot come in from other than .googlebot.com and in recent months I've seen all too many NON Google UAs come in through G's IPs, and go where Googlebot is not allowed to go.

Also, in addition to the missing 'parts' of the UA Gary reported, note the incorrect capitalization --

GoogleBot/2.1

-- and here's the typical form:

Googlebot/2.1

Now, might "GoogleBot" be a beta version, or even a brand-new one? I guess. So I'd assess it by the usual checks -- except it fails the Googlebot ID test, ditto the googlebot.com Host test. So, let's see.

Did it ask for robots.txt? Did it heed it?

Absent more info, or confirmation from G one way or another, I'm betting it's a fake, using G's IP(s) as a proxy.
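The differences between the strings discussed in this post can be checked mechanically. This is a rough sketch with made-up labels and a hypothetical function name: it separates the odd "GoogleBot" capitalization, the bare "Googlebot/2.1", and the full string seen in the 2006 log line earlier in the thread. A label only describes the string; it is not a verdict on legitimacy.

```python
import re

# Full UA as it appears in the 2006 log line quoted earlier in the thread.
FULL_UA = re.compile(
    r"^Mozilla/5\.0 \(compatible; Googlebot/2\.1; "
    r"\+http://www\.google\.com/bot\.html\)$"
)


def classify_googlebot_ua(ua):
    """Return a descriptive label for a user-agent string (labels are illustrative)."""
    if "googlebot" not in ua.lower():
        return "not-googlebot"
    if "Googlebot" not in ua:
        # Mentions googlebot, but not with the usual capitalization.
        return "odd-capitalization"
    if ua == "Googlebot/2.1":
        return "bare"
    if FULL_UA.match(ua):
        return "full"
    return "other-googlebot"
```

With this, "GoogleBot/2.1" comes back as `odd-capitalization` and "Googlebot/2.1" as `bare`, which are the two cases the thread is trying to tell apart.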

Pfui




msg:406892
 5:40 am on Jun 6, 2006 (gmt 0)

Gary, nancy, do you two 'do' Google ads? Because here's some more info [webmasterworld.com] from a current thread in Google Search News [webmasterworld.com]. The poster describes your same situ, says it's AdWords.

Then again, there's the new, dedicated AdsBot-Google [webmasterworld.com] for AdWords, so beats me whether, or if, "GoogleBot/2.1" is also AdWords-related?

Well, that sure clears things up -- clear as mud, that is:) Sorry!

volatilegx




msg:406893
 3:44 pm on Jun 6, 2006 (gmt 0)

> Absent more info, or confirmation from G one way or another, I'm betting it's a fake, using G's IP(s) as a proxy.

Seems likely to me, too.

GaryK




msg:406894
 4:30 pm on Jun 6, 2006 (gmt 0)

> Gary, nancy, do you two 'do' Google ads?

I tried AdSense for a few months last year and then stopped using it.

So, in the absence of anything official from Google I guess we're considering this user agent a faker?

fiestagirl




msg:406895
 5:03 pm on Jun 6, 2006 (gmt 0)

The robot with this UA "Googlebot/2.1", coming from 66.249.72.* visits only the pages I'm advertising in Adwords. It does check robots.txt.

I haven't put up any ads in June so I have no data on the editors hand checking. Prior to June the editor would visit from 66.102.6.*

nancyb




msg:406896
 5:39 pm on Jun 6, 2006 (gmt 0)

My post last night evidently got lost in cyberspace ...

> Gary, nancy, do you two 'do' Google ads?

No, never ads of any kind.

Also, "my" googlebot is Googlebot/2.1 - notice that "bot" is in the usual form, without capitalization.

GaryK




msg:406897
 6:01 pm on Jun 6, 2006 (gmt 0)

Is there some way we can get Google to confirm if either of these two bots is legitimate?

malachite




msg:406898
 10:21 am on Jun 7, 2006 (gmt 0)

I haven't seen this one on my sites yet, but would add it's not totally implausible that it's something to do with Adwords.

I don't do Adwords, but I do have Adsense, and the bot for that is Mediapartners-Google/2.1, which then works its way through with the UA "compatible; Googlebot/2.1; +http://www.google.com/bot.html". Coincidence? No idea ;)

incrediBILL




msg:406899
 10:22 pm on Jun 7, 2006 (gmt 0)

Determining if it's Google isn't hard.

All you have to do to verify it's Google is do a WHOIS <ip> and it will typically show "OrgName: Google Inc." if it's really them.

Matter of fact, I bounce everything claiming to be Google not hosted on a Google block of IPs, just send it packing.

Why you might ask?

Because you'll find Google actually crawling through proxy servers that cloak directories of websites to Google. Google then crawls your site through the proxy, and pages can be hijacked in this manner because the SEs are stupid.

This is another reason I block proxies too, so when SEs can't crawl through them it's no problem.

volatilegx




msg:406900
 11:04 pm on Jun 7, 2006 (gmt 0)

> All you have to do to verify it's Google is do a WHOIS <ip> and it will typically show "OrgName: Google Inc." if it's really them.

Yeah but the point is that we think people are spoofing a user agent and routing through a google-owned proxy.

nancyb




msg:406901
 11:25 pm on Jun 7, 2006 (gmt 0)

Thanks, volatilegx, for that clarification.

Can you explain if there is a way to determine if someone is spoofing through a google-owned proxy?

BTW, I don't do AdSense or AdWords and I get many visits every day from the Mediapartners bot.

incrediBILL




msg:406902
 11:30 pm on Jun 7, 2006 (gmt 0)

> Yeah but the point is that we think people are spoofing a user agent and routing through a google-owned proxy

If Google permits that, then they get what they deserve, not a lot we can do about it unless they cough up a definitive list of crawler IPs.

However, even if they were, most people don't use NOARCHIVE and the server caches your pages anyway and the content can be had there as well.

GaryK




msg:406903
 11:31 pm on Jun 7, 2006 (gmt 0)

Nancy, the way I do it is via Dan's list of IP Addresses. Basically if a user agent claims to be from Google and it's not on Dan's list I serve it a robots.txt file that disallows everything. If it's not really from Google but ignores robots.txt and starts crawling it will quickly fall into a spider trap that lets me know about it so I can investigate further.
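The approach described in this post could be sketched roughly as follows. This is not GaryK's actual code: `trusted_ips` is a stand-in for Dan's list (whose contents are not shown in this thread), the `/trap/` prefix is a hypothetical spider-trap location, and the "claims to be Google" test is a simple substring check chosen for illustration.

```python
# robots.txt body that disallows everything.
DISALLOW_ALL = "User-agent: *\nDisallow: /\n"


def claims_google(user_agent):
    """Crude check for a UA that presents itself as a Google crawler."""
    return "google" in user_agent.lower()


def robots_txt_for(user_agent, ip, trusted_ips, normal_robots):
    """Pick the robots.txt body to serve for this request.

    A UA claiming to be Google from an IP that is not on the trusted list
    gets the disallow-all file; everyone else gets the normal one.
    """
    if claims_google(user_agent) and ip not in trusted_ips:
        return DISALLOW_ALL
    return normal_robots


def is_trap_hit(path, trap_prefix="/trap/"):
    """True if the request landed in the (hypothetical) spider-trap area."""
    return path.startswith(trap_prefix)
```

A client that was served the disallow-all file and later triggers `is_trap_hit` has ignored robots.txt, which is the signal for a closer look.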

nancyb




msg:406904
 1:19 am on Jun 8, 2006 (gmt 0)

Thanks GaryK, I know about Dan's list, but I just don't have the time to learn how to create a spider trap (don't know a thing about PHP). I was hoping there was some other way to determine if it was someone using a proxy.

Of course ... that's why you all have spider traps, right? :)

GaryK




msg:406905
 1:31 am on Jun 8, 2006 (gmt 0)

That's part of the reason Nancy. Another reason is to stop abusive spiders from bringing a website to a screeching halt. For example, if I see a user agent taking more pages at a time than anyone could possibly read, or even skim, I will stop them dead in their tracks so my users don't have to suffer from a slow or non-responsive server.

I don't do PHP either. Most of my code is compiled into .dll files written in C++ and VB.NET.
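The "more pages than anyone could possibly read" test amounts to a rate limit per client. Here is a minimal sliding-window sketch in Python rather than the compiled C++ and VB.NET mentioned above; the class name and the default thresholds (30 pages per 60 seconds) are arbitrary assumptions, not figures from this thread.

```python
from collections import defaultdict, deque


class PageRateLimiter:
    """Allow at most max_pages requests per client within window_seconds."""

    def __init__(self, max_pages=30, window_seconds=60.0):
        self.max_pages = max_pages
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client key -> timestamps of allowed hits

    def allow(self, client, now):
        """Record a request at time `now` (seconds); return False if over the limit."""
        hits = self._hits[client]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_pages:
            return False
        hits.append(now)
        return True
```

In a real server the `client` key would typically be the IP or the IP plus user agent, and `now` would come from something like `time.monotonic()`.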

wilderness




msg:406906
 1:36 am on Jun 8, 2006 (gmt 0)

> I was hoping there was some other way to determine if it was someone using a proxy

[dsbl.org...]
[atgi.net...]

or Google provides
[google.com...]

SamSpade.org
keeps track of many items, most of them related to mail spam.

