Search Engine Spider and User Agent Identification Forum

    
Bender the web crawler
anyone know what this new googlebot is about?
Kelowna




msg:4254338
 7:46 pm on Jan 17, 2011 (gmt 0)

I saw a visitor in my logs from 206.123.88.26 that identifies itself as Mozilla/5.0 (compatible; Bender; http://sites.google.com/site/bendercrawler)

Anyone know what it is?

[edited by: incrediBILL at 10:15 pm (utc) on Jan 17, 2011]
[edit reason] fixed UA [/edit]

 

keyplyr




msg:4254491
 2:43 am on Jan 18, 2011 (gmt 0)

Anybody's guess. The botrunner's using a colo owned by Networld Internet Services:

NetRange: 206.123.88.0 - 206.123.89.255
CIDR: 206.123.88.0/23
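
For anyone who wants to double-check that the address from the first post really falls inside that block, here is a minimal sketch using Python's standard `ipaddress` module. The variable names are just for illustration; the range and the visitor IP are the ones quoted in this thread.

```python
import ipaddress

# CIDR block quoted from the whois output above.
colo_range = ipaddress.ip_network("206.123.88.0/23")

# Address the bot was seen from, per the opening post.
visitor = ipaddress.ip_address("206.123.88.26")

# First and last addresses of the block, to compare against the NetRange line.
first, last = colo_range[0], colo_range[-1]
in_range = visitor in colo_range

print(first, last, in_range)
```

The first and last addresses should line up with the NetRange line, which is a quick way to confirm the CIDR and the range describe the same block.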

AboutWeb




msg:4298827
 9:47 am on Apr 16, 2011 (gmt 0)

This spider just crawled a page on my site, and the spider had the ip 50.22.60.251 and hostname 50.22.60.251-static.reverse.softlayer.com

wilderness




msg:4298898
 1:18 pm on Apr 16, 2011 (gmt 0)

Here are two other Colo4 ranges, one from 2006 and one from 2008.

72.29.96.0 - 72.29.127.255
72.249.0.0 - 72.249.191.255

FWIW?
No wonder the www has run out of IP ranges:
72.249.0.0 - 72.249.191.255
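
If you prefer CIDR notation for deny rules, start-end ranges like the two above can be converted with Python's `ipaddress.summarize_address_range`. This is only a sketch; `range_to_cidrs` is a helper name made up for this example.

```python
import ipaddress

def range_to_cidrs(start, end):
    """Return the CIDR blocks that exactly cover start..end (inclusive)."""
    first = ipaddress.ip_address(start)
    last = ipaddress.ip_address(end)
    return [str(net) for net in ipaddress.summarize_address_range(first, last)]

# The two ranges listed in the post above.
cidrs_a = range_to_cidrs("72.29.96.0", "72.29.127.255")
cidrs_b = range_to_cidrs("72.249.0.0", "72.249.191.255")

print(cidrs_a)
print(cidrs_b)
```

Note that a range does not always collapse to a single block: the second range needs two CIDR entries to cover it exactly.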

wilderness




msg:4298903
 1:37 pm on Apr 16, 2011 (gmt 0)

Any bot that uses these common names [webmasterworld.com] (there are even more that I couldn't recall at that moment) in its UA deserves access denial.

wilderness




msg:4303175
 1:17 am on Apr 24, 2011 (gmt 0)

Now crawling from GoogleSites

50.16.84.28 - - [23/Apr/2011:18:38:24 -0600] "GET /robots.txt HTTP/1.1" 403 371 "-" "Mozilla/5.0 (compatible; Bender; http://sites.google.com/site/bendercrawler)"
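
For anyone sifting logs for this bot, a line in this layout can be split into fields with a regular expression. The sketch below assumes the common Apache "combined" log format, which the line above appears to follow; the pattern and field names are my own, and the UA is written without the stray space in the URL.

```python
import re

# Assumed layout: Apache "combined" log format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

line = (
    '50.16.84.28 - - [23/Apr/2011:18:38:24 -0600] '
    '"GET /robots.txt HTTP/1.1" 403 371 "-" '
    '"Mozilla/5.0 (compatible; Bender; '
    'http://sites.google.com/site/bendercrawler)"'
)

match = LOG_PATTERN.match(line)
# Empty dict if the line does not fit the assumed format.
entry = match.groupdict() if match else {}

print(entry.get("ip"), entry.get("status"), entry.get("agent"))
```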

dstiles




msg:4303383
 6:46 pm on Apr 24, 2011 (gmt 0)

50.16/14 is amazonaws. This month I've only seen the bot from softlayer - 173.192.127.nn

Looks as if this may be some kind of bot designed under google auspices and then sold on: a scraper of some kind, perhaps? From the URL given in the UA...

"Bender is a web crawler or "robot" used for research on local services content on the Internet.

"Bender identifies itself as user-agent "Bender". You can use this in your robots.txt if you want to specifically control how Bender accesses your site.

"Bender is not affiliated with Google, Inc."

BotsvBrowsers has half a dozen IPs for it and first saw the browser 1st April this year. User-agents.org has no record of it that I could find.
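
Going by the robots.txt advice quoted from the bot's page, a rule aimed at the "Bender" token might look like the one embedded below. This is a sketch only: the robots.txt content and the example.com URL are made up for illustration, and Python's standard `urllib.robotparser` is used just to show how a compliant parser would read such a rule. It says nothing about whether the bot actually obeys it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out Bender, leave everyone else alone.
robots_txt = """\
User-agent: Bender
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# example.com is a placeholder host, not a site from this thread.
url = "http://www.example.com/page.html"
bender_allowed = parser.can_fetch("Bender", url)
other_allowed = parser.can_fetch("SomeOtherBot", url)

print(bender_allowed, other_allowed)
```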

wilderness




msg:4303404
 7:37 pm on Apr 24, 2011 (gmt 0)

crawler


They deny themselves with this common and abusively used UA term.

FWIW, I realized after making the submission that it was amazon. My bad.
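
A rough sketch of that kind of term-based denial, as a plain substring test on the UA. Only "crawler" comes from this thread; "spider" is an illustrative addition, the function name is made up, and the Firefox string is just a sample browser UA for contrast.

```python
# "crawler" is the term discussed above; "spider" is an assumed extra example.
DENY_TERMS = ("crawler", "spider")

def is_denied(user_agent, deny_terms=DENY_TERMS):
    """Return True if the UA contains any deny term (case-insensitive)."""
    ua = user_agent.lower()
    return any(term in ua for term in deny_terms)

bender_ua = ("Mozilla/5.0 (compatible; Bender; "
             "http://sites.google.com/site/bendercrawler)")
# Sample browser UA, included only as a negative case.
browser_ua = "Mozilla/5.0 (Windows NT 6.1; rv:2.0) Gecko/20100101 Firefox/4.0"

bender_denied = is_denied(bender_ua)
browser_denied = is_denied(browser_ua)

print(bender_denied, browser_denied)
```

Here Bender is caught only because "crawler" happens to appear in the URL inside its UA, so a match like this depends on the bot-runner keeping that URL.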

caribguy




msg:4303413
 8:07 pm on Apr 24, 2011 (gmt 0)

First seen in Jan, seems well behaved (robots.txt only):

206.123.88.26 - - [18/Jan/2011:00:00:00 -0000] "GET /robots.txt HTTP/1.1" 200 312

And now regularly from here:
50.22.60.251 - - [25/Mar/2011:00:00:00 -0000] "GET /robots.txt HTTP/1.1" 200 321

Pfui




msg:4343832
 3:02 am on Jul 26, 2011 (gmt 0)

Coming around again with yet another name change and a new/additional host. Thus far behaved from AWS but not SoftLayer:

ec2-184-72-178-50.compute-1.amazonaws.com
Mozilla/5.0 (compatible; Bender; http://benderthewebrobot.tumblr.com)
robots.txt? Yes

50.22.60.251-static.reverse.softlayer.com
Mozilla/5.0 (compatible; Bender; http://sites.google.com/site/bendercrawler)
robots.txt? Yes, BUT promptly and routinely ignored.

Wish the bot-runner would pick a UA and a host and stick with 'em.
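
Since the info URL keeps changing while the "Bender" token stays put, it may be safer to key on the token alone. Here is a minimal sketch that pulls the name and URL out of a UA of the "(compatible; Name; URL)" shape seen above; the regex and the function name are my own assumptions, not anything the bot documents.

```python
import re

# Matches the "(compatible; Name; URL)" part of the UA strings in this thread.
UA_PATTERN = re.compile(r"\(compatible;\s*(?P<name>[^;]+);\s*(?P<url>[^)]+)\)")

def parse_bot_ua(user_agent):
    """Return (name, url) from the UA, or None if it does not fit the shape."""
    m = UA_PATTERN.search(user_agent)
    if m is None:
        return None
    return m.group("name").strip(), m.group("url").strip()

aws_ua = "Mozilla/5.0 (compatible; Bender; http://benderthewebrobot.tumblr.com)"
softlayer_ua = ("Mozilla/5.0 (compatible; Bender; "
                "http://sites.google.com/site/bendercrawler)")

aws_parsed = parse_bot_ua(aws_ua)
softlayer_parsed = parse_bot_ua(softlayer_ua)

print(aws_parsed, softlayer_parsed)
```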

Hmm. This thing may even date back to someone's student days:

bender/Nutch-0.8.1 (myd@cs.stanford.edu) [webmasterworld.com...]

lucy24




msg:4343849
 4:33 am on Jul 26, 2011 (gmt 0)

50.22.60.251-static.reverse.softlayer.com
Mozilla/5.0 (compatible; Bender; http://sites.google.com/site/bendercrawler)

Is that recent? I've got the identical IP + UA combination but just once, back in April.

This thing may even date back to someone's student days:

bender/Nutch-0.8.1 (myd@cs.stanford.edu)

Funny you should say that. I did a quick search for Bender and Nutch (thank you, Spotlight) and at the end of the list is

141.211.202.150 - - [10/Jul/2011:07:03:47 -0700] "GET / HTTP/1.0" 200 1851 "-" "CSVNutch/Nutch-1.3 (Academic test crawl; pshopcrawl at gmail)"

141.211 may or may not be UMich. Probably yes; there are other schools in the general neighborhood.

Uhmm... They're worried that if they give their e-mail address with an un-obfuscated @, they'll be spammed by every www site in the country?

Pfui




msg:4344039
 1:57 pm on Jul 26, 2011 (gmt 0)

re Bender: The same softlayer address w/ G-based info ("Bender the web crawler") hit in May and June. The amazonaws address w/ tumblr-based info ("Bender the Web Robot") hit yesterday.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved