Does Google have a mobile crawler?

If so do I have the right UA?

         

GaryK

10:11 pm on Jan 28, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Someone who downloads my files just submitted a new user agent that claims to be Googlebot-Mobile. I'm embarrassed to say this is the first I've heard of it despite having several .mobi domains, so that makes me wonder if it's legit. The DoCoMo stuff really makes me wonder, 'cause that's decidedly un-Google-like and usually indicates the Japanese mobile operator NTT DOCOMO. The person who submitted it didn't even do an rDNS lookup and no longer has the IP address. So I need to ask if you all can confirm this is in fact Googlebot-Mobile. Thanks.

DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)

GaryK

10:45 pm on Jan 28, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Based on other WebmasterWorld threads I'm convinced this is the Google WAP Proxy.

Key_Master

10:47 pm on Jan 28, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes, that's correct. In fact, it's crawling my site now from 209.85.238.8 (no rdns).

GaryK

10:51 pm on Jan 28, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thank you. :)

Key_Master

10:59 pm on Jan 28, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You're welcome. Here's another agent for Googlebot-Mobile (also crawling my site at this moment from crawl-66-249-67-124.googlebot.com):

Nokia6820/2.0 (4.83) Profile/MIDP-1.0 Configuration/CLDC-1.0 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)

It's almost like they're taking turns crawling pages.

GaryK

11:33 pm on Jan 28, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks. I've got about a dozen of them in my database already.

Hey wait a minute. If it's crawling from *.googlebot.com it's a legit crawler right?

Key_Master

12:04 am on Jan 29, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If it's crawling from *.googlebot.com it's a legit crawler right?

In a perfect world, maybe. :)

If you search for that IP it's pretty clear it's a Google crawler. It even has the same headers as the Nokia crawler. Also, FeedFetcher uses that same IP.

Even though I've been chewed out in the past for suggesting it, use rDNS for those search engine bots that claim to support it; just don't rely on it. Oftentimes they crawl from IPs that don't reverse resolve.

GaryK

12:16 am on Jan 29, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Does that mean there are open proxies in the googlebot.com netrange? If so, oy! I've been letting anything in that comes from those kinds of search engine-designated host names. If so I'm glad I've got other bot traps that must be snaring some of them. There are so many these days I don't always bother keeping track of them. I just let the software do its job.

ADDED: I did see the most recent fake Googlebot thread where the legit Googlebot seems to get caught in someone else's open proxy and starts crawling from there.

[edited by: GaryK at 12:17 am (utc) on Jan. 29, 2009]

Key_Master

12:54 am on Jan 29, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Does that mean there are open proxies in the googlebot.com netrange?

What I meant was, if the IP doesn't resolve to googlebot.com, that doesn't mean it's not legit. In other words, even though 209.85.238.8 doesn't reverse resolve to googlebot.com, it's still a legit crawler.

I think it's pretty safe to say that there are no open proxies in the googlebot.com netrange. If so, Google would have a huge problem.

GaryK

2:04 am on Jan 29, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



if the IP doesn't resolve to googlebot.com, that doesn't mean it's not legit

Oh, OK. That makes sense in light of the thread I mentioned in my previous reply. Thanks again.

If so, Google would have a huge problem.

Agreed. I'm glad I misunderstood you.

Vimes

10:05 am on Jan 29, 2009 (gmt 0)

10+ Year Member



I just came across this thread, having tried to post something similar in the Google forum.

The question I'm interested in is why Google isn't following its own advice to us regarding reverse-forward lookups.

Crawling from this range:
209.85.128.0 - 209.85.255.255

Telling webmasters to use DNS to verify on a case-by-case basis seems like the best way to go. I think the recommended technique would be to do a reverse DNS lookup, verify that the name is in the googlebot.com domain, and then do a corresponding forward DNS->IP lookup using that googlebot.com

I'm still wondering whether I should keep blocking it :)

Vimes.
Added ip range

[edited by: Vimes at 10:08 am (utc) on Jan. 29, 2009]
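
For anyone who wants to automate the round-trip check quoted above, here is a minimal Python sketch. It is only an illustration under stated assumptions: the function name is_verified_googlebot and the injectable reverse_lookup / forward_lookup parameters are hypothetical, added so the logic can be tried without touching the network. The defaults use the standard library's socket.gethostbyaddr and socket.gethostbyname_ex.

```python
import socket


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True only if ip passes a reverse-then-forward DNS round trip."""
    # Default to the system resolver. The two parameters are illustrative
    # hooks (an assumption of this sketch) so the check can run offline.
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])

    # Step 1: reverse lookup (IP -> host name).
    try:
        host = reverse_lookup(ip)
    except OSError:
        return False  # no PTR record, or the lookup failed

    # Step 2: the name must sit inside the googlebot.com domain.
    if not host.lower().rstrip(".").endswith(".googlebot.com"):
        return False

    # Step 3: forward lookup (host name -> IPs) must lead back to the same IP.
    try:
        addresses = forward_lookup(host)
    except OSError:
        return False
    return ip in addresses
```

Note that a strict check like this rejects any request whose IP has no rDNS, which is exactly the situation being discussed in this thread for the 209.85 range. Whether a failed lookup should mean "block" or just "unverified" is a policy decision the sketch does not make for you.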

GaryK

5:23 pm on Jan 29, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm now wondering something similar, Vimes.

The person who contacted me about the Googlebot-Mobile/DoCoMo user agent contacted me again today about a different albeit still somewhat similar UA.

Using the IP Address I did a full roundtrip DNS lookup.


access_log.1230437105:66.249.72.98 - - [28/Dec/2008:05:09:43 +0000] "GET /parking/uk/ls19_7xs/ HTTP/1.1" 200 1368 "-" "Nokia6820/2.0 (4.83) Profile/MIDP-1.0 Configuration/CLDC-1.0 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)"

> 66.249.72.98
Name: crawl-66-249-72-98.googlebot.com
Address: 66.249.72.98

> crawl-66-249-72-98.googlebot.com
Non-authoritative answer:
Name: crawl-66-249-72-98.googlebot.com
Address: 66.249.72.98

GaryK

5:28 pm on Jan 30, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here it is again, this time requesting robots.txt. If it looks like a crawler and acts like a crawler surely it must be a crawler. Anyone wanna help me out on this one please?

66.249.66.172 - - [26/Jan/2009:20:31:57 +0000] "GET /robots.txt HTTP/1.1" 200 127 "-" "DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)"

66.249.66.172 - - [28/Jan/2009:21:04:52 +0000] "GET /robots.txt HTTP/1.1" 200 127 "-" "DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)"

> 66.249.66.172
Name: crawl-66-249-66-172.googlebot.com
Address: 66.249.66.172

> crawl-66-249-66-172.googlebot.com
Non-authoritative answer:
Name: crawl-66-249-66-172.googlebot.com
Address: 66.249.66.172
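
If you want to pull these entries out of a log automatically, a rough Python sketch follows. It assumes the log lines look like the ones pasted above (Apache combined format, optionally with a grep-style file name prefix); the names LOG_PATTERN, parse_line and claims_googlebot_mobile are made up for this example.

```python
import re

# Assumed layout: IP, two placeholder fields, [timestamp], "request",
# status, size, "referer", "user agent". Searching for a dotted-quad IP
# lets the pattern skip a leading "access_log.NNN:" prefix from grep.
LOG_PATTERN = re.compile(
    r'(?P<ip>\d{1,3}(?:\.\d{1,3}){3}) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return the named fields of one log line as a dict, or None if no match."""
    match = LOG_PATTERN.search(line)
    return match.groupdict() if match else None


def claims_googlebot_mobile(entry):
    """True if the user agent merely claims to be Googlebot-Mobile."""
    return entry is not None and "Googlebot-Mobile" in entry["agent"]
```

The second function only looks at the claim in the user agent string. The IP it returns alongside still needs the DNS round trip shown in the lookups above before you trust it.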

wilderness

6:05 pm on Jan 30, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Gary,
The majority of my pages are not conducive to hand-helds because the text is too long.

Not sure when I added Nokia to my UAs, however it's been more than a few years ago.

2003:
216.239.39.z - - [19/Jul/2003:03:59:05 -0700] "GET /mypage.html HTTP/1.0" 200 27123 "-" "Nokia3510i/1.0 (04.01) Profile/MIDP-1.0 Configuration/CLDC-1.0 (Google WAP Proxy/1.0)"

I have a "Nokia Group" reference from 2004, however the log entries are not within that reference. The Class A was a 192.

2006 (This thing attempted to grab about 20 pages, most with names that don't exist on my sites):
66.94.233.zz - - [29/Nov/2006:23:36:33 -0800] "GET /index.wml HTTP/1.1" 404 - "-" "Nokia6600/1.0 (4.09.1) SymbianOS/7.0s Series60/2.0 Profile/MIDP-2.0 Configuration/CLDC-1.0"

I've seen the google inclusions, however have not saved the references.

Don

Samizdata

6:24 pm on Jan 30, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Anyone wanna help me out on this one please?

It seems to be well covered in the rest of the thread, but the two user-agents are standard Googlebot-Mobile crawlers - the ones that made my mobile site so popular a few years ago.

The Nokia version is for western WAP/XHTML, the DoCoMo for Japanese cHTML/iMode (the DoCoMo version number seems to have incremented over time).

In the past both always used the 66.249 range - same as normal Googlebot for me - but lately they have increasingly used the 209.85 and 72.14 (Feedfetcher) ranges, with no rDNS.

Otherwise they seem pretty well-behaved, and rank me well.

And they are much busier these days.

...

GaryK

6:42 pm on Jan 30, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Perhaps I didn't quite understand the rest of the thread cause I wasn't clear these were legit Googlebot crawlers despite coming from googlebot.com.

I guess I'll be adding them to my database.

Thanks everyone. :)

Vimes

8:49 am on Jan 29, 2009 (gmt 0)

10+ Year Member




System: The following message was spliced on to this thread from: http://www.webmasterworld.com/search_engine_spiders/3837308.htm by incredibill - 5:48 pm on Jan. 30, 2009 (PST -8)


Hi,

Over the last few weeks I've been seeing an increasing number of user agents coming from a Google IP address failing my reverse-forward lookup.

user agent:
DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)
located on this ip range:
209.85.128.0 - 209.85.255.255

The lookup returns a blank DNS record.

I've continued to block it, but the requests are coming in more and more, and I'm beginning to think I should start allowing it, as the volume of requests can't be human.

But why aren't these records up to date if this is a bot?

Has anyone else had this happen with a mobile user agent?

Vimes.
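
As a side note, checking whether a requesting IP falls inside the range quoted above is easy with Python's standard ipaddress module. This is only a sketch: the names RANGE_START, RANGE_END, in_reported_range and as_cidr_blocks are invented for the example, and the range itself is simply the one reported in this post, not an official list of Google addresses.

```python
import ipaddress

# The range reported in this post (not an authoritative Google list).
RANGE_START = ipaddress.ip_address("209.85.128.0")
RANGE_END = ipaddress.ip_address("209.85.255.255")


def in_reported_range(ip):
    """True if ip lies between RANGE_START and RANGE_END inclusive."""
    return RANGE_START <= ipaddress.ip_address(ip) <= RANGE_END


def as_cidr_blocks():
    """Express the same range as CIDR blocks, e.g. for a firewall rule."""
    blocks = ipaddress.summarize_address_range(RANGE_START, RANGE_END)
    return [str(block) for block in blocks]
```

Being inside the range says nothing about who is behind the request, so treat it as one signal next to the user agent and DNS checks rather than as proof.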

GaryK

6:59 pm on Jan 31, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



When preparing my last few replies I saw the same thing intermittently.

I'd do an nslookup on an IP address and randomly get either a blank DNS record or an rDNS of *.googlebot.com. And then I'd do the forward lookup and see the same issue. Occasionally I'd get non-authoritative replies too.

The requests don't seem to be human cause, according to the person who reported this to me, they keep requesting robots.txt.

I'll know more for certain when I analyze my own log files this evening. I'll be looking for instances of Googlebot-Mobile because I've never seen it in any of the log files for the 100+ sites I analyze every weekend. And that seems odd to me because I've got a couple of .mobi domains. Although as yet no mobile-specific sitemaps on my other domains. I never knew there were mobile sitemaps until the person who's been reporting all this to me told me about them.

Samizdata

9:55 pm on Jan 31, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I never knew there were mobile sitemaps

Google Webmaster Tools has had them for years - but note that the format recently changed.

My main mobile project took off in a big way in 2006 thanks to Google rankings.

I'll be looking for instances of Googlebot-Mobile because I've never seen it in any of the log files

Although I don't run a project like yours I have collected over 600 genuine mobile user-agents.

Here's a couple to get you started:


Nokia6682/2.0 (3.01.1) SymbianOS/8.0 Series60/2.6 Profile/MIDP-2.0 configuration/CLDC-1.1 UP.Link/6.3.0.0.0 (compatible;YahooSeeker/M1A1-R2D2; [help.yahoo.com...]

MSMOBOT/1.1 (+http://search.msn.com/msnbot.htm)

...

GaryK

10:17 pm on Jan 31, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This one visited me in September 2008, but I seem to have not noticed it.

MSMOBOT/1.1 (+http://search.msn.com/msnbot.htm)

These haven't visited me since 2007 so I removed them. I never realized they were different than regular YahooSeeker or that they were still active.

LG-C1500 UP.Browser/6.2.3 (GUI) MMP/1.0 (compatible;YahooSeeker/M1A1-R2D2; [help.yahoo.com...]
LG-C1500 UP.Browser/6.2.3 (GUI) MMP/1.0 (compatible;YahooSeeker/M1A1-R2D2;mobile-search-customer-care AT yahoo-inc dot com)
MOT-V975/81.33.02I MIB/2.2.1 Profile/MIDP-2.0 Configuration/CLDC-1.1 (compatible;YahooSeeker/M1A1-R2D2; [help.yahoo.com...]
MOT-V975/81.33.02I MIB/2.2.1 Profile/MIDP-2.0 Configuration/CLDC-1.1 (compatible;YahooSeeker/M1A1-R2D2;mobile-search-customer-care AT yahoo-inc dot com)
Nokia6682/2.0 (3.01.1) SymbianOS/8.0 Series60/2.6 Profile/MIDP-2.0 configuration/CLDC-1.1 UP.Link/6.3.0.0.0 (compatible;YahooSeeker/M1A1-R2D2; [help.yahoo.com...]
Nokia6682/2.0 (3.01.1) SymbianOS/8.0 Series60/2.6 Profile/MIDP-2.0 configuration/CLDC-1.1 UP.Link/6.3.0.0.0 (compatible;YahooSeeker/M1A1-R2D2;mobile-search-customer-care AT yahoo-inc dot com)
SGH-Z130 SHP/VPP/R5 SMB3.1 SMM-MMS/1.1.0 profile/MIDP-2.0 configuration/CLDC-1.0 (compatible;YahooSeeker/M1A1-R2D2; [help.yahoo.com...]
SGH-Z130 SHP/VPP/R5 SMB3.1 SMM-MMS/1.1.0 profile/MIDP-2.0 configuration/CLDC-1.0 (compatible;YahooSeeker/M1A1-R2D2;mobile-search-customer-care AT yahoo-inc dot com)

I need to start paying more attention. :o

Thanks for sharing. :)

[edited by: GaryK at 10:20 pm (utc) on Jan. 31, 2009]

jdMorgan

10:35 pm on Jan 31, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Just stopping in, and apologies if this has already been stated: From what I've been able to observe, the DoCoMo Googlebot UA is the "WAP/1.0" crawler, looking for pages written using WAP/cHTML markup now most popular in Asian markets. The Nokia Googlebot UA is the "WAP/2.0" crawler, looking for pages written in XHTML+XML (Mobile Profile) markup, most popular in Europe and the U.S.

In each case, Googlebot is "sort of spoofing" an actual device that would use the markup-type that it is looking for by sending that device's actual user-agent string and request headers, and then adding-on the Googlebot stuff at the end of the user-agent string.

I usually see them "taking turns" on newer mobile pages until they conclude that the page is XHTML+XML, at which point DoCoMo Googlebot starts visiting less often and Nokia Googlebot carries on as before, looking for page updates at a frequency that corresponds (loosely) to the page-update-frequency declarations in my sitemap.xml file.

Jim
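
To make the "device string plus Googlebot suffix" structure concrete, here is a small Python sketch that splits such a user agent into the handset part and the crawler token. It is based only on the strings posted in this thread; CRAWLER_SUFFIX and split_user_agent are hypothetical names, and the pattern assumes the crawler part is a trailing "(compatible; Name/Version; ...)" group.

```python
import re

# Assumed shape of the trailing crawler group, taken from the UAs in
# this thread: "(compatible; <bot>/<version>; <contact or URL>)".
CRAWLER_SUFFIX = re.compile(
    r"\s*\(compatible;\s*(?P<bot>[^/;\s]+)/(?P<version>[^;\s]+);[^)]*\)\s*$"
)


def split_user_agent(ua):
    """Return (device_part, bot_name, bot_version); bot fields are None if absent."""
    match = CRAWLER_SUFFIX.search(ua)
    if not match:
        return ua.strip(), None, None
    device = ua[:match.start()].strip()
    return device, match.group("bot"), match.group("version")
```

For example, the DoCoMo string from the first post splits into the device part "DoCoMo/2.0 N905i(c100;TB;W24H16)" and the token Googlebot-Mobile with version 2.1, while a plain handset UA comes back with no bot name at all.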

GaryK

11:07 pm on Jan 31, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks, Jim. Every little bit helps.

Samizdata

11:19 pm on Jan 31, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



On checking this month's logs I observe that:

My WAP/1.0 (WML) pages are crawled by a standard Googlebot user-agent (non-mobile).

The Nokia and DoCoMo user-agents both regularly crawl all my other mobile content - which is written in XHTML Transitional because it works on both WAP/2.0 and cHTML phones if kept simple (as well as computer browsers).

I have two mobile sitemaps, one for WML and one for the XHTML pages.

I am in the awkward position of contradicting Jim, which makes me uncomfortable.

...

jdMorgan

11:36 pm on Jan 31, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



No, not a contradiction, really. I see "regular Googlebot" crawling my mobile pages, too.

And I don't have any cHTML/WAP/1.0 pages at all.

As with any "sample of one," what I posted was an opinion, and you shouldn't feel the least bit hesitant to post a differing opinion -- That's what forums are for!

All I have observed is that "DoCoMo Googlebot" seems to lose interest in my XHTML+XML-MP pages, while "Nokia Googlebot" revisits frequently.

Jim

GaryK

11:44 pm on Jan 31, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



My head hurts.

I think for now I'll just call them both Googlebot-Mobile instead of the more descriptive Googlebot-Mobile/WAP1 and Googlebot-Mobile/WAP2 that I was planning on using.

If webmasters need to know more than that I think they should be looking at the full user agent and especially the header data.

[edited by: GaryK at 11:46 pm (utc) on Jan. 31, 2009]

Samizdata

2:03 am on Feb 1, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



From what I've been able to observe, the DoCoMo Googlebot UA is the "WAP/1.0" crawler, looking for pages written using WAP/cHTML markup now most popular in Asian markets.

Apologies if I am misreading you Jim, but cHTML and WAP/1.0 are not equivalent.

"Compact HTML (cHTML) is defined as a subset of HTML 2.0, HTML 3.2 and HTML 4.0 specifications."

<html>
<head>
</head>
<body>
<p>
</p>
</body>
</html>

It is basically HTML with fewer tags, plus additional options for Japanese phones, and it can be viewed in a normal computer browser. I suspect it is your XHTML-MP DocType that frightens off the DoCoMo Googlebot, which is happy with my XHTML Transitional.

Whereas WAP/1.0 (WML) uses a different document structure:

"WML documents are XML documents that validate against the WML DTD"

<wml>
<head>
</head>
<card id="whatever" title="whatever">
<p>
</p>
</card>
</wml>

Almost every web-enabled cellphone outside the far east had a WAP/1.0 browser before the iPhone came along (though some also had an HTML browser). The only computer browser that has native WML support is Opera, as far as I am aware.

Neither the Nokia nor DoCoMo Googlebots crawl WML content, according to my logs.

Which leaves "WAP/2.0", a loose term that includes XHTML Mobile Profile - but XHTML-MP was largely made irrelevant at birth as mobile browsers started supporting most HTML 4.01 and XHTML Transitional markup.

The Nokia Googlebot crawls and indexes all of these, I believe.

I will stop there, as I am not a technical expert and I am making Gary's head hurt.

...

[edited by: Samizdata at 2:07 am (utc) on Feb. 1, 2009]
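
The structural difference between the two skeletons above comes down to the root element, which a few lines of Python can check. This is a rough sketch, not a validator: root_element and markup_family are invented names, and the regex approach is an assumption chosen because cHTML pages are not guaranteed to be well-formed XML.

```python
import re

# Matches the first real element tag once declarations are stripped.
ROOT_TAG = re.compile(r"<\s*([A-Za-z][\w:-]*)")


def root_element(markup):
    """Return the lowercased name of the first element, or None."""
    # Drop XML declarations, comments and DOCTYPE lines before searching.
    cleaned = re.sub(r"<\?.*?\?>|<!--.*?-->|<![^>]*>", "", markup, flags=re.DOTALL)
    match = ROOT_TAG.search(cleaned)
    return match.group(1).lower() if match else None


def markup_family(markup):
    """Rough bucket: WML decks versus anything rooted in <html>."""
    root = root_element(markup)
    if root == "wml":
        return "WML (WAP/1.0)"
    if root == "html":
        return "HTML family (cHTML, HTML, XHTML)"
    return "unknown"
```

It cannot tell cHTML from XHTML-MP or XHTML Transitional, since all three are rooted in html; for that you would have to look at the DocType, which is the part suspected above of making the difference to the DoCoMo crawler.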

GaryK

2:27 am on Feb 1, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Can I bum a few aspirin from someone in this thread?

BTW, we still have to deal with the issue that Vimes raised about DNS lookups not working consistently with the IP Address range of these bots.

dstiles

8:06 pm on Feb 1, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



GaryK - that could be as simple as internet delay time.

Often when I check rDNS for an IP I get an empty response, then a proper response if I repeat the check a few seconds later. I suppose it depends on how busy the intermediate DNS servers are: by the time of the second request some of the intermediates have found the authoritative answer and cached it, so the delay is shorter.

Apart from that, google does seem to crawl from IPs with no rDNS occasionally but it's rather inconsistent. Or possibly a VERY long lookup time. Certainly a lot of google IPs apart from crawlers do not have rDNS, in particular the 72.14.192.0 - 72.14.255.255 range which includes a few used for translate AND general proxies (on the same IP!).

Re: docomo - yahoo JP sends around robots with very primitive header attributes that frequently get trapped.

By the way: OOOOOOOOO - is that enough aspirin? :)
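
If an empty first answer followed by a good second answer is the pattern, a simple retry around the reverse lookup may be enough. Here is a minimal Python sketch under that assumption; reverse_lookup_with_retry is a hypothetical helper, the attempt count and delay are arbitrary, and the lookup and sleep parameters exist only so the logic can be tried without real DNS traffic.

```python
import socket
import time


def reverse_lookup_with_retry(ip, attempts=3, delay=2.0, lookup=None, sleep=time.sleep):
    """Try a reverse DNS lookup up to `attempts` times; return host name or None."""
    # Default to the system resolver via socket.gethostbyaddr.
    lookup = lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    for attempt in range(attempts):
        try:
            return lookup(ip)
        except OSError:
            # Empty or failed answer: wait a little, unless this was the last try.
            if attempt < attempts - 1:
                sleep(delay)
    return None
```

A None result after all attempts still only means "no rDNS seen", which per this thread does not by itself prove the request was not from Google.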

GaryK

4:10 am on Feb 2, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"Please, sir," replied Oliver aka Gary, "I want some more [aspirin]."

I'll take your word on the timeout issue even though I've never seen it myself. I do hundreds of lookups when I run my analysis reports on Sunday and I always get a valid host name. I allow a ten second timeout, but it never seems to take that long.

I dislike the thought of denying Googlebot because I couldn't do a full round trip DNS on it. And yet that's what'll happen cause that sort of thing is fully automatic on my sites that use it.

dstiles

11:48 pm on Feb 2, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Don't know where you live, Gary, but I'm in the UK. If you are local-ish to America it could be a lot faster. :)