I am posting the full headers, minus any referrer information that was supplied, and with the Host value replaced by example.com. I am also supplying the IP ranges I have seen them coming from, which I have verified as Google-owned via whois lookup.
Spoofing DoCoMo
<Headers>
<header name="Connection" value="Keep-alive" />
<header name="Accept" value="text/plain,text/html" />
<header name="Accept-Encoding" value="gzip,deflate" />
<header name="From" value="googlebot(at)googlebot.com" />
<header name="Host" value="example.com" />
<header name="User-Agent" value="DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)" />
</Headers>
<BotRange>
<range StartIp="66.249.64.0" EndIp="66.249.95.255" />
<range StartIp="209.85.128.0" EndIp="209.85.255.255" />
</BotRange>
<Info FirstSeen="10/10/2008 12:03:00 PM" LastVisit="9/29/2009 11:36:00 PM" />
This version fetches the same files as the regular Googlebot, and doesn't mind HTML/XHTML files at all.
Spoofing Phone.com
<Headers>
<header name="Connection" value="Keep-alive" />
<header name="Accept" value="application/vnd.wap.xhtml+xml,application/xhtml+xml;q=0.9,text/vnd.wap.wml;q=0.8,text/html;q=0.7,*/*;q=0.6" />
<header name="Accept-Encoding" value="gzip,deflate" />
<header name="From" value="googlebot(at)googlebot.com" />
<header name="Host" value="example.com" />
<header name="User-Agent" value="SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)" />
</Headers>
<BotRange>
<range StartIp="66.249.64.0" EndIp="66.249.95.255" />
</BotRange>
<Info FirstSeen="2/5/2009 6:02:00 AM" LastVisit="10/3/2009 9:58:00 PM" />
I see maybe one or two hits a week from this version. I don't think it likes the XHTML website it is crawling. I think this version is actually looking for content designed specifically for mobile phones.
Spoofing iPhone
<Headers>
<header name="Connection" value="Keep-alive" />
<header name="Accept" value="*/*" />
<header name="Accept-Encoding" value="gzip,deflate" />
<header name="From" value="googlebot(at)googlebot.com" />
<header name="Host" value="example.com" />
<header name="User-Agent" value="Mozilla/5.0 (iPhone; U; CPU like Mac OS X; en) AppleWebKit/420+ (KHTML, like Gecko) Version/3.0 Mobile/1A543a Safari/419.3 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)" />
</Headers>
<BotRange>
<range StartIp="66.249.64.0" EndIp="66.249.95.255" />
</BotRange>
<Info FirstSeen="5/9/2009 6:02:00 AM" LastVisit="8/7/2009 5:18:00 AM" />
This is the new kid on the block; I have only just started noticing it in my logs. I have too few hits from this version to tell much about it.
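For anyone who wants to check their own logs against the ranges listed in the BotRange blocks above, here is a minimal Python sketch. The function name `in_reported_ranges` and the constant name are my own invention, and the ranges are only the ones reported in this post, not an official or complete list from Google.

```python
import ipaddress

# Ranges copied from the BotRange blocks in this post (as observed, not official).
REPORTED_RANGES = [
    ("66.249.64.0", "66.249.95.255"),
    ("209.85.128.0", "209.85.255.255"),
]


def in_reported_ranges(ip, ranges=REPORTED_RANGES):
    """Return True if the IPv4 address falls inside any (start, end) range."""
    addr = ipaddress.IPv4Address(ip)
    return any(
        ipaddress.IPv4Address(start) <= addr <= ipaddress.IPv4Address(end)
        for start, end in ranges
    )


if __name__ == "__main__":
    for candidate in ("66.249.71.10", "209.85.200.1", "192.0.2.1"):
        print(candidate, in_reported_ranges(candidate))
```

A hit inside these ranges only says the address matches what was seen here; it is still worth confirming ownership with whois, as described at the top of the post.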
There are more variations of Googlebot-Mobile, but I have not seen them in the last year on the websites which I monitor.
You may have noticed all the different User-Agent variations still list Googlebot-Mobile as version 2.1.
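Since the common part of all three User-Agent strings is the trailing "(compatible; Googlebot-Mobile/2.1; ...)" section, one way to spot these visits regardless of the handset being spoofed is to match only that token. This is a rough sketch; the regular expression and the function name `googlebot_mobile_version` are my own assumptions based on the three strings above, and other variations may be formatted differently.

```python
import re

# Matches the "(compatible; Googlebot-Mobile/<version>;" part shared by the
# three User-Agent strings posted above.
TOKEN_RE = re.compile(r"\(compatible; Googlebot-Mobile/(\d+(?:\.\d+)*);")


def googlebot_mobile_version(user_agent):
    """Return the Googlebot-Mobile version string, or None if the token is absent."""
    match = TOKEN_RE.search(user_agent)
    return match.group(1) if match else None


if __name__ == "__main__":
    docomo = (
        "DoCoMo/2.0 N905i(c100;TB;W24H16) "
        "(compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)"
    )
    print(googlebot_mobile_version(docomo))
```

Matching on the User-Agent alone proves nothing about who sent the request, so it makes sense to combine it with an IP check.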
1.) If yes, did they heed it?
2.) If no, do you have "User-agent: Googlebot-Mobile" specifically allowed or disallowed?
Aside: That's the UA Google defines as one of its robots.txt testing agents in Webmaster Tools/Site Configuration/Crawler access.
(FWIW: Thus far, all of Googlebot's mobile variations read and heed robots.txt on my sites.)
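If you want to see what a "User-agent: Googlebot-Mobile" record would permit before relying on it, Python's standard `urllib.robotparser` can evaluate a robots.txt locally. The robots.txt content, the paths, and the helper name `is_allowed` below are made up for illustration, and Python's matching rules are not guaranteed to be identical to Google's, so treat this as a rough check only.

```python
import urllib.robotparser

# Hypothetical robots.txt: keep Googlebot-Mobile out of /private/, allow everyone else.
ROBOTS_TXT = """\
User-agent: Googlebot-Mobile
Disallow: /private/

User-agent: *
Disallow:
"""


def is_allowed(robots_txt, agent, url):
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Pass the robot token, not the full spoofed User-Agent string.
    print(is_allowed(ROBOTS_TXT, "Googlebot-Mobile", "http://example.com/private/page.html"))
    print(is_allowed(ROBOTS_TXT, "Googlebot-Mobile", "http://example.com/index.html"))
```

Note that the parser is given the token "Googlebot-Mobile" rather than a full header value such as the DoCoMo string, because the Python parser only looks at the part of the agent name before the first slash.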
Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html)
They follow it since I give Google full access to my website.