

What is DoCoMo/1.0/N505i/c20/TB/W20H10 ?


gringogigi

6:34 pm on Sep 8, 2008 (gmt 0)




Finally got around to blocking Baiduspider today (since it can't be verified by reverse+forward DNS), and went to check the logs for any new bots. Found this:

208.80.194.nnn [08/Sep/2008:03:57:41] "GET / HTTP/1.0" 202 "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; {4EB13121-6963-4D18-B470-6D20C8ED84A3}; Roadrunner; snprtzŠT04714137801828; .NET CLR 1.1.4322; .NET CLR 2.0.50727)" "MaybeRobot" "-"

72.14.199.nnn [08/Sep/2008:06:54:48] "GET /n.htm HTTP/1.1" 403 "DoCoMo/1.0/N505i/c20/TB/W20H10 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)" "Robot" "Imposterer"

66.55.151.nnn [08/Sep/2008:10:01:46] "HEAD / HTTP/1.1" 202 "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)" "MaybeRobot" "-"

Has anyone else seen the middle one?

It says it's from Google, but it doesn't verify with DNS. It *does* belong to Google if you look it up manually via WHOIS, though...

[edited by: incrediBILL at 7:26 pm (utc) on Sep. 8, 2008]
[edit reason] Obscured IPs [/edit]
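For anyone wanting to automate the reverse+forward DNS check mentioned above, here is a minimal Python sketch. It assumes genuine Googlebot hosts reverse-resolve to names ending in googlebot.com or google.com; `verify_crawler_ip` and `GOOGLE_SUFFIXES` are hypothetical names, and the resolver functions are parameters only so the logic can be tested without network access.

```python
import socket

# Assumption: suffixes that genuine Google crawler hostnames end with.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def verify_crawler_ip(ip, allowed_suffixes=GOOGLE_SUFFIXES,
                      reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """Reverse+forward DNS check: the PTR name must end in an allowed
    suffix, and that name must resolve back to the same IP."""
    try:
        hostname = reverse(ip)[0]          # reverse lookup (PTR)
    except OSError:                        # herror, gaierror, timeout
        return False                       # no PTR: cannot be verified
    if not hostname.lower().rstrip(".").endswith(allowed_suffixes):
        return False                       # PTR is not a Google crawler name
    try:
        addresses = forward(hostname)[2]   # forward lookup (A records)
    except OSError:
        return False
    return ip in addresses                 # must round-trip to the same IP
```

A request that fails either lookup, like the one described above, comes back False; only a round trip through a Google-owned hostname passes.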

keyplyr

10:00 pm on Sep 8, 2008 (gmt 0)




DoCoMo is a Japanese mobile browser.

Googlebot-Mobile is of course Google.

Haven't seen "MaybeRobot".

jdMorgan

11:07 pm on Sep 8, 2008 (gmt 0)




Docomo is the leading Japanese mobile Web provider.

This looks like someone spoofing Googlebot-mobile, and crawling while using Google's Web transcoder service as a proxy. Probably a scraper, but can't be sure because of the incomplete 72.14.199.nnn IP address.

"Imposter" is misspelled, and the User-agent string syntax is "iffy" at best; generally, no substrings in UAs should be enclosed in quotes.

I'd certainly block this one, myself.

Jim

[added]

Here is a real Googlebot-mobile request, crawling for cHTML Web content:

209.85.238.6 - - [31/Aug/2008:14:29:14 -0500] "GET /page.html HTTP/1.1" 200 1234 "-" "DoCoMo/1.0/N505i/c20/TB/W20H10 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)"

and here, crawling for xml+xhtml-MP content:

66.249.72.81 - - [31/Aug/2008:19:41:21 -0500] "GET /page.html HTTP/1.1" 200 1234 "-" "Nokia6820/2.0 (4.83) Profile/MIDP-1.0 Configuration/CLDC-1.0 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)"

[/added]

[edited by: jdMorgan at 11:17 pm (utc) on Sep. 8, 2008]
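To pick requests like these out of a raw access log, a small parser for Apache's combined log format is enough. This Python sketch assumes the combined format exactly as in the two lines quoted (quoted request, referer and User-agent fields); `parse_line` and `claims_googlebot_mobile` are hypothetical helpers. Matching the UA only shows what the client *claims* to be, so a hit still needs the reverse+forward DNS check before it is trusted.

```python
import re

# Apache "combined" log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)

def parse_line(line):
    """Return the log fields as a dict, or None if the line doesn't match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None

def claims_googlebot_mobile(entry):
    """True if the User-agent *claims* Googlebot-Mobile (not proof of origin)."""
    return entry is not None and "Googlebot-Mobile/" in entry["ua"]

sample = ('209.85.238.6 - - [31/Aug/2008:14:29:14 -0500] "GET /page.html HTTP/1.1" '
          '200 1234 "-" "DoCoMo/1.0/N505i/c20/TB/W20H10 (compatible; '
          'Googlebot-Mobile/2.1; +http://www.google.com/bot.html)"')
entry = parse_line(sample)
print(entry["ip"], claims_googlebot_mobile(entry))  # 209.85.238.6 True
```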

Samizdata

11:38 pm on Sep 8, 2008 (gmt 0)




First one is Websense, second one is a fake Google, third one is Choopa.

I would terminate all of them with extreme prejudice.

...

thetrasher

10:23 am on Sep 9, 2008 (gmt 0)




Maybe it's just a DNS glitch? Check for an "X-Forwarded-For" header to identify Google proxies.
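A minimal Python sketch of that check, assuming the request headers are available as a dict. `forwarded_chain` and `looks_proxied` are hypothetical helpers. Whether a given Google proxy actually adds X-Forwarded-For is the suggestion above, not something this code can confirm, and the header is client-supplied, so treat it as a hint rather than proof.

```python
def forwarded_chain(headers):
    """Return the IPs listed in X-Forwarded-For, leftmost (claimed original client) first."""
    # Header names are case-insensitive, so normalise before looking up.
    lowered = {name.lower(): value for name, value in headers.items()}
    value = lowered.get("x-forwarded-for", "")
    return [part.strip() for part in value.split(",") if part.strip()]

def looks_proxied(headers):
    """True if any proxy hop was reported (the header can be forged)."""
    return bool(forwarded_chain(headers))

print(forwarded_chain({"X-Forwarded-For": "203.0.113.5, 198.51.100.7"}))
# ['203.0.113.5', '198.51.100.7']
```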

FYI:
208.80.194.nnn
[webmasterworld.com...]

deny from 208.80.192.0/21

66.55.151.n
Seen here for 16 months on three domains

deny from 66.55.128.0/19
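The two `deny from` ranges can also be checked offline against IPs pulled from a log. This Python sketch uses the standard `ipaddress` module with the same CIDR blocks; `is_denied` is a hypothetical helper name. 208.80.192.0/21 covers 208.80.192.0 through 208.80.199.255, and 66.55.128.0/19 covers 66.55.128.0 through 66.55.159.255, so both sightings above fall inside.

```python
import ipaddress

# The ranges from the two "deny from" lines above.
DENY_RANGES = [ipaddress.ip_network(cidr)
               for cidr in ("208.80.192.0/21", "66.55.128.0/19")]

def is_denied(ip):
    """True if the address falls inside any of the denied CIDR blocks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in DENY_RANGES)

print(is_denied("208.80.194.1"), is_denied("72.14.199.1"))  # True False
```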

Samizdata

12:06 pm on Sep 9, 2008 (gmt 0)




Although the 72.14.199.nnn range is definitely Google, I have never seen it used for crawling.

I have seen it used as a proxy by various undesirables as well as tools such as the Transcoder.

Mobile proxies are notorious for allowing legit bots to crawl, which can have negative effects.

I would be entirely comfortable with serving a 403 in the circumstances.

Of course, I could be wrong.
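That approach can be sketched as a simple rule: a request that claims to be Googlebot but arrives from the 72.14.199.nnn proxy range gets a 403. This Python sketch assumes the whole 72.14.199.0/24 block (the thread only gives the first three octets) and that `status_for` is a hypothetical hook called once per request; other Googlebot claims would still want the reverse+forward DNS check.

```python
import ipaddress

# Assumption: treat all of 72.14.199.0/24 as the proxy range
# (the thread only shows 72.14.199.nnn).
PROXY_RANGE = ipaddress.ip_network("72.14.199.0/24")

def status_for(ip, user_agent):
    """403 for a Googlebot claim arriving via the proxy range; otherwise serve normally (200)."""
    claims_googlebot = "Googlebot" in user_agent
    if claims_googlebot and ipaddress.ip_address(ip) in PROXY_RANGE:
        return 403
    return 200
```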

...

wilderness

3:39 pm on Sep 9, 2008 (gmt 0)




I've used the following since Feb 2006.

RewriteCond %{REMOTE_ADDR} ^72\.14\.(19[2-9]|2[0-5][0-9])\. [OR]

Of course that may not be wise for everybody.
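For reference, here is what that pattern covers (with the broken-bar character restored to a pipe). The alternation matches a third octet of 192 to 199 or 200 to 259, which for valid addresses means 192 to 255: the same addresses as the CIDR block 72.14.192.0/18, including the 72.14.199.nnn range. This Python sketch checks that equivalence; `regex_blocks` and `cidr_blocks` are hypothetical names.

```python
import ipaddress
import re

# The RewriteCond pattern above, with "|" restored.
PATTERN = re.compile(r"^72\.14\.(19[2-9]|2[0-5][0-9])\.")

# Third octets 192-255, i.e. the same addresses as this CIDR block.
BLOCK = ipaddress.ip_network("72.14.192.0/18")

def regex_blocks(ip):
    return PATTERN.match(ip) is not None

def cidr_blocks(ip):
    return ipaddress.ip_address(ip) in BLOCK

# Compare the two for every possible third octet.
assert all(regex_blocks(f"72.14.{octet}.1") == cidr_blocks(f"72.14.{octet}.1")
           for octet in range(256))
```

So, in the `deny from` style used earlier in the thread, the same block could be written as `deny from 72.14.192.0/18`.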

Umbra

4:45 pm on Sep 12, 2008 (gmt 0)




I'm seeing it from 210.153.* and 210.136.*, and it sends no headers other than the DoCoMo User-Agent. The reverse IP is proxy*.docomo.ne.jp. I'm thinking this is a badly written tool, since it's missing all the other headers.
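One way to flag requests like these is to look for a User-Agent arriving without the headers a browser normally sends. This Python sketch assumes a hypothetical list of expected headers (`EXPECTED_HEADERS`); the list is illustrative rather than any standard, and mobile gateways may legitimately strip some headers, so treat a hit as a reason to look closer, not proof.

```python
# Assumption: an illustrative set of headers a normal browser sends,
# not a standard list. Compared in lower case.
EXPECTED_HEADERS = ("accept", "accept-language", "accept-encoding", "connection")

def missing_headers(headers):
    """Expected headers absent from the request, in EXPECTED_HEADERS order."""
    present = {name.lower() for name in headers}
    return [h for h in EXPECTED_HEADERS if h not in present]

def looks_like_bare_tool(headers):
    """True if a User-Agent arrives with none of the expected headers."""
    present = {name.lower() for name in headers}
    return "user-agent" in present and len(missing_headers(headers)) == len(EXPECTED_HEADERS)

print(looks_like_bare_tool({"User-Agent": "DoCoMo/1.0/N505i/c20/TB/W20H10"}))  # True
```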