

Search Engine Spider and User Agent Identification Forum

Bender the web crawler
anyone know what this new googlebot is about?

 7:46 pm on Jan 17, 2011 (gmt 0)

I saw a visitor in my logs that identifies itself as Mozilla/5.0 (compatible; Bender; http://sites.google.com/site/bendercrawler)

Anyone know what it is?

[edited by: incrediBILL at 10:15 pm (utc) on Jan 17, 2011]
[edit reason] fixed UA [/edit]



 2:43 am on Jan 18, 2011 (gmt 0)

Anybody's guess. The botrunner's using a colo owned by Networld Internet Services:

NetRange: -


 9:47 am on Apr 16, 2011 (gmt 0)

This spider just crawled a page on my site, and the spider had the ip and hostname


 1:18 pm on Apr 16, 2011 (gmt 0)

Here are two other Colo4 ranges.
One from 2006 & 2008. - -

No wonder the www has run out of IP ranges: -


 1:37 pm on Apr 16, 2011 (gmt 0)

Any bot that uses these common names [webmasterworld.com] (there are even more that I couldn't recall at that moment) in its UA deserves access denial.


 1:17 am on Apr 24, 2011 (gmt 0)

Now crawling from GoogleSites - - [23/Apr/2011:18:38:24 -0600] "GET /robots.txt HTTP/1.1" 403 371 "-" "Mozilla/5.0 (compatible; Bender; http://sites.google.com/site/bendercrawler)"
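For anyone who wants to pull these entries out of a log automatically, here is a minimal sketch of a parser for lines in this combined log format. The function name `parse_log_line` and the field names are my own choices, and the host `192.0.2.10` used in the usage note is a documentation placeholder, since the real address is not shown in the post.

```python
import re

# Combined log format: host ident user [time] "request" status size "referer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_log_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])  # numeric status is easier to filter on
    return entry
```

Feeding it a line such as `192.0.2.10 - - [23/Apr/2011:18:38:24 -0600] "GET /robots.txt HTTP/1.1" 403 371 "-" "Mozilla/5.0 (compatible; Bender; http://sites.google.com/site/bendercrawler)"` gives back the request, the 403 status, and the full UA string as separate fields.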


 6:46 pm on Apr 24, 2011 (gmt 0)

50.16/14 is amazonaws. This month I've only seen the bot from softlayer - 173.192.127.nn
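A quick way to check whether a logged address falls inside a range like the ones mentioned here is Python's standard `ipaddress` module. This is only a sketch: `in_range` is a helper name I made up, the shorthand 50.16/14 is written out as `50.16.0.0/14`, and treating 173.192.127.nn as the /24 block `173.192.127.0/24` is my assumption. Who actually owns each range should still be confirmed with a whois lookup.

```python
import ipaddress


def in_range(address, cidr):
    """Return True if the IPv4/IPv6 address lies inside the CIDR block."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(cidr)


# 50.16.0.0/14 spans 50.16.0.0 through 50.19.255.255
AWS_BLOCK = "50.16.0.0/14"
# Assumed /24 for the 173.192.127.nn addresses
SOFTLAYER_BLOCK = "173.192.127.0/24"
```

For example, `in_range("50.17.1.1", AWS_BLOCK)` is true, while `in_range("50.20.0.1", AWS_BLOCK)` is false.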

Looks as if this may be some kind of bot designed under Google auspices and then sold on: a scraper of some kind, perhaps? From the URL given in the UA...

"Bender is a web crawler or "robot" used for research on local services content on the Internet.

"Bender identifies itself as user-agent "Bender". You can use this in your robots.txt if you want to specifically control how Bender accesses your site.

"Bender is not affiliated with Google, Inc."

BotsvBrowsers has half a dozen IPs for it and first saw the browser 1st April this year. User-agents.org has no record of it that I could find.
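Since the bot's page says it answers to the user-agent "Bender" in robots.txt, a rule set can be sanity-checked before deploying it. The sketch below uses Python's standard `urllib.robotparser`; the robots.txt contents and the `example.com` URLs are hypothetical, and it only shows what a compliant crawler should do, not what Bender actually does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut Bender out entirely, restrict everyone else a little
ROBOTS_TXT = """\
User-agent: Bender
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_text):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
# Pass the short token, not the full "Mozilla/5.0 (compatible; ...)" string:
# robotparser matches on the part of the name before the first "/".
bender_allowed = parser.can_fetch("Bender", "http://example.com/index.html")
other_allowed = parser.can_fetch("SomeOtherBot", "http://example.com/index.html")
```

With these rules `bender_allowed` is false and `other_allowed` is true.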


 7:37 pm on Apr 24, 2011 (gmt 0)


They deny themselves with this common and abusively used UA term.

FWIW, I realized after making the submission that it was amazon. My bad.


 8:07 pm on Apr 24, 2011 (gmt 0)

First seen in Jan, seems well behaved (robots.txt only): - - [18/Jan/2011:00:00:00 -0000] "GET /robots.txt HTTP/1.1" 200 312

And now regularly from here: - - [25/Mar/2011:00:00:00 -0000] "GET /robots.txt HTTP/1.1" 200 321


 3:02 am on Jul 26, 2011 (gmt 0)

Coming around again with yet another name change and a new/additional host. Thus far behaved from AWS but not SoftLayer:

Mozilla/5.0 (compatible; Bender; http://benderthewebrobot.tumblr.com)
robots.txt? Yes
Mozilla/5.0 (compatible; Bender; http://sites.google.com/site/bendercrawler)
robots.txt? Yes, BUT promptly and routinely ignored.

Wish the bot-runner would pick a UA and a host and stick with 'em.
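To put a number on "fetched robots.txt, then ignored it", something like the following sketch could be run over parsed log entries. The function `find_violations`, the sample rules, and the sample requests are all hypothetical; it assumes you already have (user agent, path) pairs from your log and the text of your own robots.txt.

```python
from urllib.robotparser import RobotFileParser


def find_violations(requests, robots_lines, agent_token):
    """Return paths the named agent requested despite a Disallow rule.

    requests: iterable of (user_agent, path) tuples taken from an access log.
    robots_lines: the site's robots.txt as a list of lines.
    agent_token: the short robots.txt name of the bot, e.g. "Bender".
    """
    parser = RobotFileParser()
    parser.parse(robots_lines)
    violations = []
    for user_agent, path in requests:
        if agent_token.lower() not in user_agent.lower():
            continue  # request came from a different client
        if path == "/robots.txt":
            continue  # fetching robots.txt itself is never a violation
        if not parser.can_fetch(agent_token, path):
            violations.append(path)
    return violations


# Hypothetical sample data
BENDER_UA = "Mozilla/5.0 (compatible; Bender; http://sites.google.com/site/bendercrawler)"
SAMPLE_ROBOTS = ["User-agent: Bender", "Disallow: /private/"]
SAMPLE_REQUESTS = [
    (BENDER_UA, "/robots.txt"),
    (BENDER_UA, "/private/a.html"),
    (BENDER_UA, "/index.html"),
    ("OtherBot/1.0", "/private/b.html"),
]
```

Calling `find_violations(SAMPLE_REQUESTS, SAMPLE_ROBOTS, "Bender")` flags only `/private/a.html`, the one disallowed path that the Bender UA requested.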

Hmm. This thing may even date back to someone's student days:

bender/Nutch-0.8.1 (myd@cs.stanford.edu) [webmasterworld.com...]


 4:33 am on Jul 26, 2011 (gmt 0)
Mozilla/5.0 (compatible; Bender; http://sites.google.com/site/bendercrawler)

Is that recent? I've got the identical IP + UA combination but just once, back in April.

This thing may even date back to someone's student days:

bender/Nutch-0.8.1 (myd@cs.stanford.edu)

Funny you should say that. I did a quick search for Bender and Nutch (thank you, Spotlight) and at the end of the list is - - [10/Jul/2011:07:03:47 -0700] "GET / HTTP/1.0" 200 1851 "-" "CSVNutch/Nutch-1.3 (Academic test crawl; pshopcrawl at gmail)"

141.211 may or may not be UMich. Probably yes; there are other schools in the general neighborhood.

Uhmm... They're worried that if they give their e-mail address with an un-obfuscated @, they'll be spammed by every www site in the country?


 1:57 pm on Jul 26, 2011 (gmt 0)

re Bender: The same softlayer address w/ G-based info ("Bender the web crawler") hit in May and June. The amazonaws address w/ tumblr-based info ("Bender the Web Robot") hit yesterday.
